Installing IBM MDM Publisher in an internet-connected Kubernetes cluster

To install an IBM® MDM Publisher instance in an internet-connected Kubernetes cluster, run a file called publisher-helm-installer.bin to start the setup process.

Before you begin

Note: These instructions are for installing MDM Publisher in an internet-connected environment. If you intend to install in an offline environment, see Installing IBM MDM Publisher in a Kubernetes cluster that is not connected to the internet.
Before you begin installing MDM Publisher in an online Kubernetes cluster:
  • Ensure that the system where you plan to run the installation has access to the internet.
  • Review the prerequisites and ensure that they are in place before you continue. This includes provisioning your Kubernetes cluster and setting up a storage provider such as Portworx, NFS, or Rook-Ceph®.
    Tip: The MDM Publisher distribution includes a sample cluster using Rook and Ceph. To set up the provided Rook-Ceph sample cluster:
    1. Create the Rook-Ceph sample cluster by running the following command:
      kubectl apply -f ${INSTALL_LOC}/mdm-publisher/config/rwx_storage/sample-cluster.yaml
      This process can take up to 10 minutes.
    2. To check the sample cluster status, run the following command:
      kubectl get CephCluster rook-ceph -n rook-ceph
      When the cluster is successfully created, the STATE column shows Created. For example:
      NAME        DATADIRHOSTPATH   MONCOUNT   AGE    STATE
      rook-ceph   /var/lib/rook     3          122m   Created
    3. Create the Rook-Ceph file system by running the following command:
      kubectl apply -f ${INSTALL_LOC}/mdm-publisher/config/rwx_storage/filesystem.yaml
    4. Wait for the file system pod to start before you run the MDM Publisher initialization script (init_publisher.sh).
  • Ensure that NGINX Ingress Controller is also running on the Kubernetes cluster. This component routes incoming traffic to the components of an MDM Publisher deployment. For more information, see Setting up NGINX Ingress Controller.
  • Download the installation assets.
  • If you intend to use MDM Publisher to set up ongoing synchronization between InfoSphere® MDM Advanced Edition or Standard Edition and IBM Master Data Connect, install the MDM ongoing synchronization server.
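
The Rook-Ceph and ingress checks above can be scripted. The following Bash sketch is not part of the MDM Publisher distribution. It assumes that the CephCluster resource reports its state in .status.state, which is typically the field behind the STATE column shown above. It also assumes that Rook labels file system metadata server pods with app=rook-ceph-mds, and that NGINX Ingress Controller runs in the ingress-nginx namespace with the label app.kubernetes.io/name=ingress-nginx. Verify these names against your Rook and ingress versions before you rely on them. The function names are hypothetical.

```shell
#!/usr/bin/env bash
# Prerequisite checks (sketch). Assumed, not from MDM Publisher docs:
#   - CephCluster state is at .status.state
#   - metadata server pods carry the label app=rook-ceph-mds
#   - NGINX Ingress Controller runs in the ingress-nginx namespace

# wait_for TIMEOUT_SECONDS INTERVAL_SECONDS COMMAND [ARGS...]
# Re-runs COMMAND every INTERVAL_SECONDS until it succeeds; returns 1 on timeout.
wait_for() {
  local timeout=$1 interval=$2
  shift 2
  local deadline=$(( SECONDS + timeout ))
  until "$@"; do
    if (( SECONDS >= deadline )); then
      echo "Timed out after ${timeout}s waiting for: $*" >&2
      return 1
    fi
    sleep "$interval"
  done
}

# Succeeds once the Rook-Ceph sample cluster reports the Created state.
ceph_cluster_created() {
  [ "$(kubectl get cephcluster rook-ceph -n rook-ceph \
        -o jsonpath='{.status.state}' 2>/dev/null)" = "Created" ]
}

# Succeeds once at least one file system metadata server pod is Running.
ceph_filesystem_running() {
  kubectl get pods -n rook-ceph -l app=rook-ceph-mds \
    --field-selector=status.phase=Running -o name 2>/dev/null | grep -q .
}

# Succeeds if at least one NGINX Ingress Controller pod is Running.
ingress_controller_running() {
  kubectl get pods -n ingress-nginx -l app.kubernetes.io/name=ingress-nginx \
    --field-selector=status.phase=Running -o name 2>/dev/null | grep -q .
}

# Example usage (requires access to the cluster):
#   kubectl apply -f "${INSTALL_LOC}/mdm-publisher/config/rwx_storage/sample-cluster.yaml"
#   wait_for 600 15 ceph_cluster_created
#   kubectl apply -f "${INSTALL_LOC}/mdm-publisher/config/rwx_storage/filesystem.yaml"
#   wait_for 300 10 ceph_filesystem_running
#   ingress_controller_running || echo "NGINX Ingress Controller is not running" >&2
```

The generic wait_for helper keeps the polling logic in one place, so each readiness condition stays a small function that succeeds or fails.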

About this task

Note: If you are trying out MDM Publisher in a development or trial environment and do not have Kubernetes, you can install it on Minikube instead. Minikube deployments are not supported in production environments. For information about installing MDM Publisher in a Minikube environment for development or trial use, see Installing IBM MDM Publisher on internet-connected Minikube (for trial or development environments only).

MDM Publisher is installed and deployed by using a Helm chart, which is packaged in an installation binary (.bin) file. You can install the chart either by running the scripts that are included in the .bin file or by using unattended mode, which runs Helm commands directly.

The MDM Publisher distribution comes with an installation file called publisher-helm-installer.bin. When you run the file, it creates a directory called mdm-publisher. This directory contains Helm charts, scripts, and other artifacts required to set up an MDM Publisher instance on Kubernetes. The file also provides you with information about using the artifacts to set up and configure your MDM Publisher instance.

Procedure

  1. On a computer connected to the internet, run publisher-helm-installer.bin.
    ./publisher-helm-installer.bin

    Confirm that the installer created a directory called mdm-publisher.
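
    The INSTALL_LOC variable used throughout this topic is not set by the installer. The following Bash sketch assumes that INSTALL_LOC is the directory where you ran publisher-helm-installer.bin. It checks that the files this procedure refers to later exist under that directory. The function name check_publisher_layout is hypothetical.

```shell
#!/usr/bin/env bash
# check_publisher_layout ROOT
# Succeeds if ROOT/mdm-publisher contains the files used later in this
# procedure; prints each missing path to stderr and returns 1 otherwise.
check_publisher_layout() {
  local root=$1 missing=0 path
  for path in \
      "$root/mdm-publisher/bin/init_publisher.sh" \
      "$root/mdm-publisher/bin/update_configuration.sh" \
      "$root/mdm-publisher/ibm-publisher-services-prod/values-k8s.yaml"; do
    if [ ! -e "$path" ]; then
      echo "Missing: $path" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Example usage (assumes INSTALL_LOC is where you ran the installer):
#   chmod +x ./publisher-helm-installer.bin
#   ./publisher-helm-installer.bin
#   export INSTALL_LOC="$PWD"
#   check_publisher_layout "$INSTALL_LOC" && echo "mdm-publisher directory is ready"
```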

  2. Depending on how much data you intend to bulk load with MDM Publisher, you might need to adjust the CPU and memory that Kubernetes allocates to it. The default allocations are small (8 executors with 1280 MB of memory) and must be increased for larger workloads. To adjust the resource allocations:
    1. Open ${INSTALL_LOC}/mdm-publisher/ibm-publisher-services-prod/values-k8s.yaml and locate the resources section.
    2. Update the resource allocations as required for your deployment.
      For more information and for guidance about resource allocation and workload sizes, see Configuring IBM MDM Publisher.
      Tip: For MDM Publisher stability, always set requests and limits to equal values. This ensures that MDM Publisher pre-allocates the correct amount of memory.
    3. Specify the number of Spark executor Kubernetes pods for running MDM Publisher jobs. Edit the following properties in the YAML file:
      spark:
      ........  
        sparktransform:
          executor:
            instances: "4"  
          shufflePartitions: "50"
          memoryOverheadFactor: "0.1"
          driver:
            memory: "2g"
          mem: "1024m"
          limit:
            cores: "1"
        sparkextract:
          largetable:
            executor:
              instances: "4"
          smalltable:
            executor:
              instances: "1"        
          memoryOverheadFactor: "0.1"
          driver:
            memory: "2g"
          mem: "1024m"
          limit:
            cores: "1"
        graphBatchCommitSize: 100 # Size of a single commit to graph in a spark job
      Tip: The number of executor pods can be different for each MDM Publisher job stage (extract and transform).
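
    To judge whether the executor settings fit your cluster, you can estimate the memory that the executor pods of one job stage request. This Bash sketch makes two assumptions, neither of which this documentation confirms. First, it assumes that mem is the Spark executor memory. Second, it assumes that MDM Publisher follows the standard Spark on Kubernetes rule: each executor pod receives its executor memory plus an overhead of memoryOverheadFactor × memory, with a minimum of 384 MiB. The estimate excludes the driver (driver.memory) and the MDM Publisher pod itself. The function names are hypothetical.

```shell
#!/usr/bin/env bash
# estimate_executor_pod_mib MEMORY_MIB OVERHEAD_FACTOR
# Prints the estimated memory, in MiB, of one executor pod using the assumed
# Spark rule: memory + max(floor(factor * memory), 384).
# Pass MEMORY_MIB as a plain number of MiB (1024 for "1024m").
estimate_executor_pod_mib() {
  awk -v mem="$1" -v factor="$2" 'BEGIN {
    overhead = int(factor * mem)
    if (overhead < 384) overhead = 384
    print mem + overhead
  }'
}

# estimate_stage_mib INSTANCES MEMORY_MIB OVERHEAD_FACTOR
# Prints the estimated total executor memory, in MiB, for one job stage.
estimate_stage_mib() {
  local per_pod
  per_pod=$(estimate_executor_pod_mib "$2" "$3")
  echo $(( $1 * per_pod ))
}
```

    For example, with the transform-stage values shown above, estimate_stage_mib 4 1024 0.1 prints 5632 (4 pods × 1408 MiB). At this size, the 384 MiB minimum overhead outweighs the 10% factor.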
  3. If you intend to use this MDM Publisher instance to connect to the Master Data Management service on IBM Cloud® or IBM Cloud Pak® for Data as a Service, edit the configuration to enable the connection. For more information, see Connecting MDM Publisher to the IBM Match 360 service on Cloud Pak for Data as a Service.
  4. Ensure that all of your secure endpoints are up and running.
    The MDM Publisher security setup wizard that runs as part of initialization supports the following endpoints:
    • Master Data Connect:
      • Master Data Connect server
      • IBM Aspera® High-Speed Transfer Server (HSTS)
    • InfoSphere MDM:
      • Ongoing synchronization server (Apache Kafka)
      • Database server (Db2®, Db2 for z/OS®, or Oracle)
      • For virtual MDM deployments, the MDM application server (WebSphere® Application Server)
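
    Before you start the wizard, you can confirm that each endpoint accepts connections and preview the certificate it presents. This Bash sketch uses the bash /dev/tcp redirection, the coreutils timeout command, and openssl. The function names are hypothetical. The host names and ports in the example are the sample values used later in this topic, so replace them with your own.

```shell
#!/usr/bin/env bash
# check_endpoint HOST PORT [TIMEOUT_SECONDS]
# Succeeds if a TCP connection to HOST:PORT opens within the timeout (default 5s).
check_endpoint() {
  local host=$1 port=$2 secs=${3:-5}
  timeout "$secs" bash -c 'exec 3<>"/dev/tcp/$1/$2"' _ "$host" "$port" 2>/dev/null
}

# show_certificate HOST PORT
# Prints the subject, issuer, and expiry date of the certificate the server presents.
show_certificate() {
  openssl s_client -connect "$1:$2" -servername "$1" </dev/null 2>/dev/null \
    | openssl x509 -noout -subject -issuer -enddate
}

# Example usage with the sample endpoints from this topic:
#   for ep in jujitsu1.example.ibm.com:30299 jujitsu1.example.ibm.com:31000 \
#             rajumdm1.example.ibm.com:9443; do
#     if check_endpoint "${ep%:*}" "${ep#*:}"; then echo "OK   $ep"; else echo "FAIL $ep"; fi
#   done
#   show_certificate jujitsu1.example.ibm.com 30299
```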
  5. Initialize the MDM Publisher installation by running the following script:
    ${INSTALL_LOC}/mdm-publisher/bin/init_publisher.sh
    The initialization script includes a number of startup actions, some of which require your input:
    • Starts a security setup wizard. Use the wizard to configure secure SSL communication between MDM Publisher and other systems such as InfoSphere MDM, Master Data Connect, and their underlying systems. The wizard prompts you for parameters, imports server certificates into the corresponding MDM Publisher trust stores, and creates the artifacts needed for secure communication. Certificates are extracted and placed into the cert_management folder.
    • Downloads and installs the MDM Publisher image.
    • Initializes the MDM Publisher container.
    Important: Do not try to access the MDM Publisher container until it is in a READY state. It can take several minutes for MDM Publisher to initialize. The first initialization takes longer than subsequent initializations.
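
    Instead of polling by hand, you can block until the pod reports Ready. This Bash sketch assumes the mdm-publisher namespace and mdm-publisher-0 pod name that appear in the headless-mode example later in this topic. Confirm the names in your deployment with kubectl get pods. kubectl wait fails immediately if the pod does not exist yet, so run it after the initialization script has created the pod. The function name wait_for_publisher_ready is hypothetical.

```shell
#!/usr/bin/env bash
# Assumed names, taken from the headless-mode example in this topic.
PUBLISHER_NAMESPACE=${PUBLISHER_NAMESPACE:-mdm-publisher}
PUBLISHER_POD=${PUBLISHER_POD:-mdm-publisher-0}

# wait_for_publisher_ready [TIMEOUT]
# Blocks until the MDM Publisher pod has the Ready condition, or fails when
# TIMEOUT (a kubectl duration such as 20m, the default) expires.
wait_for_publisher_ready() {
  local timeout=${1:-20m}
  kubectl wait --namespace "$PUBLISHER_NAMESPACE" \
    --for=condition=Ready "pod/$PUBLISHER_POD" --timeout="$timeout"
}

# Example usage:
#   "${INSTALL_LOC}/mdm-publisher/bin/init_publisher.sh"
#   wait_for_publisher_ready 30m && kubectl get pods -n "$PUBLISHER_NAMESPACE"
```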
  6. Secure a connection between MDM Publisher and a Master Data Connect instance.
  7. To modify the MDM Publisher configuration for an existing MDM Publisher deployment, complete the following steps.
    1. Update the appropriate configuration map YAML file with your configuration changes:
      • ${INSTALL_LOC}/mdm-publisher/ibm-publisher-services-prod/templates/publish-config-k8s.yaml
      • ${INSTALL_LOC}/mdm-publisher/ibm-publisher-services-prod/templates/publisher-wlp-configmap.yaml
      • ${INSTALL_LOC}/mdm-publisher/ibm-publisher-services-prod/templates/data-transfer-configuration.yaml
    2. Run the following script to apply the new configuration:
      ${INSTALL_LOC}/mdm-publisher/bin/update_configuration.sh
      This script gracefully shuts down and deletes the running MDM Publisher pod. Kubernetes then recreates the pod with the new configuration.
      Note: This command does not delete persistent volumes associated with MDM Publisher, so all of the MDM Publisher job data is preserved.
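
    Because update_configuration.sh restarts the pod, you might want to catch template errors first. This Bash sketch runs helm lint on the chart and calls update_configuration.sh only if the lint passes. helm lint checks only chart structure and template rendering, not MDM Publisher settings. Depending on the chart, you might also need to pass it a values file. The function name apply_publisher_config is hypothetical.

```shell
#!/usr/bin/env bash
# apply_publisher_config
# Lints the MDM Publisher chart and, only if that passes, runs
# update_configuration.sh. Requires INSTALL_LOC to be set.
apply_publisher_config() {
  if [ -z "${INSTALL_LOC:-}" ]; then
    echo "Set INSTALL_LOC before running this function" >&2
    return 1
  fi
  local chart="${INSTALL_LOC}/mdm-publisher/ibm-publisher-services-prod"
  if ! helm lint "$chart"; then
    echo "helm lint reported errors; the configuration was not applied" >&2
    return 1
  fi
  "${INSTALL_LOC}/mdm-publisher/bin/update_configuration.sh"
}

# Example usage (hypothetical path):
#   export INSTALL_LOC=/opt/mdm-publisher-install
#   apply_publisher_config
```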
    Tip: As an alternative way to update the MDM Publisher configuration, you can use silent (headless) mode. This method is useful if you frequently need to update a large number of endpoints.
    1. Record a silent mode response file by running the following command.
      helm get values ibm-publisher-services-prod-369994195 --namespace mdm-publisher > /root/install_bin/values-headless.yaml
    2. Edit the values file /root/install_bin/values-headless.yaml to provide the updated information for each endpoint, one block per endpoint. For example:
      global:
        authorizedEndpoints:
          - type: MDC
            host: jujitsu1.example.ibm.com
            alias: mdc444a
            port: 30299
            aspera_host: jujitsu1.example.ibm.com
            aspera_alias: asp444a
            aspera_port: 31000
          - type: MDM_jetty
            host: dockermdm1.example.ibm.com
            alias: jet444a
            port: 4070
          - type: MDM_server
            host: rajumdm1.example.ibm.com
            alias: mdm444a
            port: 9443
          - type: MDM_jetty
            host: rajumdm1.example.ibm.com
            alias: jet444a2
            port: 4070
    3. Add a comment line that reads USER-SUPPLIED VALUES: as the first line of the /root/install_bin/values-headless.yaml file.
    4. Run the following two commands to apply the changes to your existing MDM Publisher deployment. Replace the example values with values relevant to your deployment.
      helm upgrade --namespace mdm-publisher --reuse-values --values /root/install_bin/values-headless.yaml ibm-publisher-services-prod-369994195 ./mdm-publisher/ibm-publisher-services-prod
      kubectl -n mdm-publisher delete pod mdm-publisher-0 mdm-publisher-aspera-client-sts-0
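
The headless steps above can be combined into one Bash sketch. Instead of hard-coding ibm-publisher-services-prod-369994195, it looks up the release name with helm list, assuming that the name starts with ibm-publisher-services-prod. It also assumes that the comment takes the form # USER-SUPPLIED VALUES: and prepends that line only if the file does not already start with it. The variable defaults mirror the example paths above. Like the function names, they are illustrative.

```shell
#!/usr/bin/env bash
# Illustrative defaults that mirror this topic's examples; override as needed.
NAMESPACE=${NAMESPACE:-mdm-publisher}
VALUES_FILE=${VALUES_FILE:-/root/install_bin/values-headless.yaml}
CHART_DIR=${CHART_DIR:-./mdm-publisher/ibm-publisher-services-prod}

# Prints the first Helm release in $NAMESPACE whose name starts with
# ibm-publisher-services-prod (an assumption about the naming scheme).
find_publisher_release() {
  helm list --namespace "$NAMESPACE" --short | grep '^ibm-publisher-services-prod' | head -n 1
}

# Prepends "# USER-SUPPLIED VALUES:" to FILE unless it is already the first
# line. Rewrites the file in place so that its permissions are kept.
ensure_values_header() {
  local file=$1 tmp rc
  [ "$(head -n 1 "$file")" = "# USER-SUPPLIED VALUES:" ] && return 0
  tmp=$(mktemp) || return 1
  { echo "# USER-SUPPLIED VALUES:"; cat "$file"; } > "$tmp" && cat "$tmp" > "$file"
  rc=$?
  rm -f "$tmp"
  return "$rc"
}

# Upgrades the release with the edited values file, then deletes the pods
# so that Kubernetes recreates them with the new configuration.
apply_headless_values() {
  local release
  release=$(find_publisher_release)
  if [ -z "$release" ]; then
    echo "No ibm-publisher-services-prod release found in $NAMESPACE" >&2
    return 1
  fi
  ensure_values_header "$VALUES_FILE" || return 1
  helm upgrade --namespace "$NAMESPACE" --reuse-values \
    --values "$VALUES_FILE" "$release" "$CHART_DIR" || return 1
  kubectl -n "$NAMESPACE" delete pod mdm-publisher-0 mdm-publisher-aspera-client-sts-0
}

# Example usage:
#   helm get values "$(find_publisher_release)" --namespace "$NAMESPACE" > "$VALUES_FILE"
#   # ...edit the authorizedEndpoints entries in $VALUES_FILE...
#   apply_headless_values
#   kubectl -n "$NAMESPACE" get pods --watch
```

Watching the pods after the upgrade shows when the recreated mdm-publisher-0 and mdm-publisher-aspera-client-sts-0 pods become ready again.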

What to do next

Now that you have installed and deployed MDM Publisher, you might want to take the following actions: