Installing IBM MDM Publisher in a Red Hat OpenShift cluster

To install an IBM® MDM Publisher instance in a Red Hat® OpenShift® cluster, run a file called publisher-helm-installer.bin to start the setup process.

Before you begin

Before you begin installing MDM Publisher in a Red Hat OpenShift cluster:
  • Ensure that the system where you plan to run the installation has access to the internet.
  • Review the prerequisites. Ensure that the prerequisites are in place before continuing. This includes provisioning your OpenShift cluster and setting up a storage provider such as Portworx, NFS, or Red Hat OpenShift Container Storage (OCS).
  • Download the installation assets.
  • If you intend to use MDM Publisher to set up ongoing synchronization between InfoSphere® MDM Advanced Edition or Standard Edition and IBM Master Data Connect, install the MDM ongoing synchronization server.

About this task

Note: If you are trying out MDM Publisher in a development or trial environment and do not have Red Hat OpenShift or Kubernetes, you can install it on Minikube instead. Minikube deployments are not supported in production environments. For information about installing MDM Publisher in a Minikube environment for development or trial use, see Installing IBM MDM Publisher on internet-connected Minikube (for trial or development environments only).

MDM Publisher is installed and deployed by using a Helm chart. The Helm chart is packaged in an installation binary file. You can install the chart either by running the scripts that are included in the installation file or in unattended mode, which uses Helm commands directly.

The MDM Publisher distribution comes with an installation file called publisher-helm-installer.bin. When you run the file, it creates a directory called mdm-publisher. This directory contains Helm charts, scripts, and other artifacts required to set up an MDM Publisher instance on Red Hat OpenShift. The file also provides you with information about using the artifacts to set up and configure your MDM Publisher instance.

Procedure

  1. On a computer connected to the internet, run publisher-helm-installer.bin.
    ./publisher-helm-installer.bin

    Confirm that the script created a directory called mdm-publisher.
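
The check in step 1 can be scripted. The following is a minimal sketch, assuming that you pass the location where you ran publisher-helm-installer.bin. The function name check_publisher_dir is hypothetical and is not part of the MDM Publisher distribution.

```shell
# Hypothetical helper: confirm that the installer created the mdm-publisher directory.
# Argument 1: the location where the installer was run (defaults to the current directory).
check_publisher_dir() {
  install_loc="${1:-.}"
  if [ -d "${install_loc}/mdm-publisher" ]; then
    echo "Found ${install_loc}/mdm-publisher"
    return 0
  fi
  echo "Missing ${install_loc}/mdm-publisher" >&2
  return 1
}
```

For example, run check_publisher_dir "${INSTALL_LOC}" and continue with the procedure only if the function returns successfully.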

  2. Depending on the amount of data that you intend to bulk load by using MDM Publisher, you might need to adjust the amount of CPU and memory that Red Hat OpenShift allocates to it. The default allocations are small (8 executors with 1280 MB of memory) and must be increased for larger workloads. To adjust the resource allocations:
    1. Open ${INSTALL_LOC}/mdm-publisher/ibm-publisher-services-prod/values-openshift.yaml.
    2. Update the resource allocations as required for your deployment. For more information about configuration and workload sizes, see Configuring IBM MDM Publisher.
    3. Specify the number of Spark Executor Red Hat OpenShift pods for running MDM Publisher jobs. Edit the following properties in the YAML file:
      spark:
      ........  
        sparktransform:
          executor:
            instances: "4"  
          shufflePartitions: "50"
          memoryOverheadFactor: "0.1"
          driver:
            memory: "2g"
          mem: "1024m"
          limit:
            cores: "1"
        sparkextract:
          largetable:
            executor:
              instances: "4"
          smalltable:
            executor:
              instances: "1"        
          memoryOverheadFactor: "0.1"
          driver:
            memory: "2g"
          mem: "1024m"
          limit:
            cores: "1"
        graphBatchCommitSize: 100 # Size of a single commit to graph in a spark job
      Tip: The number of executor pods can be different for each MDM Publisher job stage (extract and transform).
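Before you edit the values file in step 2, it can help to keep a copy of the original and to list the current resource settings. The following is a minimal sketch. The function name backup_and_show_executors and the .bak suffix are assumptions for illustration, and the search pattern covers only the property names that appear in the YAML example above.

```shell
# Hypothetical helper: back up a Helm values file, then list its resource settings.
# Argument 1: path to the values file, for example
#   ${INSTALL_LOC}/mdm-publisher/ibm-publisher-services-prod/values-openshift.yaml
backup_and_show_executors() {
  values_file="$1"
  # Keep a copy of the file as it was before editing (overwrites an earlier .bak copy).
  cp "$values_file" "${values_file}.bak" || return 1
  # Print matching lines with their line numbers.
  grep -n -E 'instances:|memory:|mem:|cores:' "$values_file"
}
```

The line numbers in the output show where each property is set, which is useful because the same property names occur under both sparktransform and sparkextract.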
  3. If you intend to use this MDM Publisher instance to connect to the Master Data Management service on IBM Cloud® or IBM Cloud Pak® for Data as a Service, edit the configuration to enable the connection. For more information, see Connecting MDM Publisher to the IBM Match 360 service on Cloud Pak for Data as a Service.
  4. Ensure that all of your secure endpoints are up and running.
    The MDM Publisher security setup wizard that runs as part of initialization supports the following endpoints:
    • Master Data Connect:
      • Master Data Connect server
      • IBM Aspera® High-Speed Transfer Server (HSTS)
    • InfoSphere MDM:
      • Ongoing synchronization server (Apache Kafka)
      • Database server (Db2®, Db2 for z/OS®, or Oracle)
      • For virtual MDM deployments, the MDM application server (WebSphere® Application Server)
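One way to confirm that a secure endpoint is up is to attempt a TLS connection to it. The following is a minimal sketch that uses the openssl s_client command. The function name check_tls_endpoint is hypothetical, and the host names and ports that you pass are specific to your environment. The check confirms only that the endpoint accepts a TLS connection. It does not validate the server certificate, and it can take some time to fail if the host is unreachable.

```shell
# Hypothetical helper: report whether a host and port accept a TLS connection.
# Argument 1: host name, argument 2: port.
check_tls_endpoint() {
  host="$1"
  port="$2"
  # Redirecting stdin from /dev/null makes s_client exit after the handshake.
  if openssl s_client -connect "${host}:${port}" </dev/null >/dev/null 2>&1; then
    echo "OK   ${host}:${port}"
  else
    echo "FAIL ${host}:${port}" >&2
    return 1
  fi
}
```

Call the function once for each endpoint in the list, for example the Master Data Connect server, the Kafka broker of the ongoing synchronization server, and the database server.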
  5. Initialize the MDM Publisher installation by running the following script:
    ${INSTALL_LOC}/mdm-publisher/bin/init_publisher.sh
    The initialization script includes a number of startup actions, some of which require your input:
    • Starts a security setup wizard. Use the wizard to configure secure SSL communication between MDM Publisher and other systems such as InfoSphere MDM, Master Data Connect, and their underlying systems. The wizard prompts you for parameters, imports server certificates into the corresponding MDM Publisher trust stores, and creates the artifacts that are required for secure communication.
    • Downloads and installs the MDM Publisher image.
    • Initializes the MDM Publisher container.
    Important: Do not try to access the MDM Publisher container until it is in a READY state. It can take several minutes for MDM Publisher to initialize. The first initialization takes longer than subsequent initializations.
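Rather than checking the pod status repeatedly, you can block until the pods report a Ready condition. The following is a minimal sketch that uses the oc wait command. The function name wait_for_publisher and the 900-second default timeout are assumptions. The default namespace, mdm-publisher, matches the namespace used in the route example in this procedure.

```shell
# Hypothetical helper: wait until all pods in the namespace are Ready.
# Argument 1: namespace (defaults to mdm-publisher).
# Argument 2: timeout (defaults to 900s, an assumed value).
wait_for_publisher() {
  namespace="${1:-mdm-publisher}"
  wait_timeout="${2:-900s}"
  oc wait --for=condition=Ready pod --all -n "$namespace" --timeout="$wait_timeout"
}
```

If the namespace also contains pods that have already completed, the --all option waits for them as well and the command times out. In that case, replace --all with a label selector (-l) that matches only the MDM Publisher pod.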
  6. Set the route for users to access the MDM Publisher user interface.
    1. Run the following sample command to create the route definition file.
      Note: If you installed MDM Publisher on a namespace other than mdm-publisher, be sure to update the value of the namespace property in the route script.
      cat <<EOF > ${INSTALL_LOC}/mdm-publisher/openshift/publisher-ui-route.yaml
      kind: Route
      apiVersion: route.openshift.io/v1
      metadata:
        name: mdm-publisher
        namespace: mdm-publisher
        labels:
          app: mdm-publisher-service
      spec:
        to:
          kind: Service
          name: mdm-publisher
          weight: 100
        port:
          targetPort: publisher-https
        tls:
          termination: passthrough
          insecureEdgeTerminationPolicy: Redirect
        wildcardPolicy: None
      EOF
    2. Apply the route definition file that you just created:
      oc create -f ${INSTALL_LOC}/mdm-publisher/openshift/publisher-ui-route.yaml
    3. Use the get route command to obtain the MDM Publisher user interface's web URL:
      oc get route -n mdm-publisher -o wide
      The response should be similar to the following:
      NAME            HOST/PORT                                      PATH   SERVICES        PORT              TERMINATION            WILDCARD
      mdm-publisher   mdm-publisher-mdm-publisher.apps.os.acme.com          mdm-publisher   publisher-https   passthrough/Redirect   None
      The HOST/PORT value corresponds to the MDM Publisher user interface's web URL. Using the above example, the URL would be https://mdm-publisher-mdm-publisher.apps.os.acme.com/gateway-security?originalUrl=/publisher.
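The URL can also be assembled from the route host in a script. The following is a minimal sketch. The function names route_host and publisher_ui_url are hypothetical. The route name, namespace, and URL path come from the example in this step, and .spec.host is the field of the Route resource that holds the host name.

```shell
# Hypothetical helper: print the host name of the MDM Publisher route.
# Argument 1: namespace (defaults to mdm-publisher).
# Argument 2: route name (defaults to mdm-publisher).
route_host() {
  oc get route "${2:-mdm-publisher}" -n "${1:-mdm-publisher}" -o jsonpath='{.spec.host}'
}

# Hypothetical helper: build the user interface URL from a route host name.
publisher_ui_url() {
  printf 'https://%s/gateway-security?originalUrl=/publisher\n' "$1"
}
```

For example, publisher_ui_url "$(route_host)" prints the URL for a deployment in the default mdm-publisher namespace.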
  7. Secure a connection between MDM Publisher and a Master Data Connect instance.
  8. To modify the MDM Publisher configuration for an existing MDM Publisher deployment, complete the following steps.
    1. Update the appropriate configuration map YAML file with your configuration changes:
      • ${INSTALL_LOC}/mdm-publisher/ibm-publisher-services-prod/templates/publish-config-openshift.yaml
      • ${INSTALL_LOC}/mdm-publisher/ibm-publisher-services-prod/templates/publisher-wlp-configmap.yaml
      • ${INSTALL_LOC}/mdm-publisher/ibm-publisher-services-prod/templates/data-transfer-configuration.yaml
    2. Run the following script to apply the new configuration:
      ${INSTALL_LOC}/mdm-publisher/bin/update_configuration.sh
      This command gracefully shuts down and deletes the running MDM Publisher pod. Kubernetes then creates a replacement pod that uses the new configuration.
      Note: This command does not delete persistent volumes associated with MDM Publisher, so all of the MDM Publisher job data is preserved.
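After the update script finishes, you can confirm that a new pod is running and that the persistent volume claims are still present. The following is a minimal sketch that uses standard oc get commands. The function name show_publisher_state is hypothetical, and the default namespace is the one used in the route example in this procedure.

```shell
# Hypothetical helper: list the pods and persistent volume claims in a namespace.
# Argument 1: namespace (defaults to mdm-publisher).
show_publisher_state() {
  namespace="${1:-mdm-publisher}"
  echo "Pods in ${namespace}:"
  oc get pods -n "$namespace" || return 1
  echo "Persistent volume claims in ${namespace}:"
  oc get pvc -n "$namespace"
}
```

A low value in the AGE column of the pod listing indicates that the pod was re-created, and the claims should still be listed with a status of Bound.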

What to do next