
Installing the Watson Language Translator service

An administrator can install the Watson™ Language Translator service on IBM® Cloud Pak for Data.

Before you begin

Required role: To complete this task, you must be an administrator of the project (namespace) where you will deploy the service.

If you are running the installation on a cluster that is connected to the internet, ensure that your repo.yaml file includes the appropriate registry entry for the service. For details, see Obtaining the installation files.

The systems that host the service must meet these additional requirements:
  • Intel 64-bit architecture.
  • Datastores (PostgreSQL, MinIO) only support block storage for persistence.
Identify Language Support

Before installing Language Translator, you must identify which translation models are required for your installation. The number of languages installed directly affects the system requirements and installation time.

Translation models are provided in three separate installation modules. You need to select at least one language pak module as a prerequisite to installing the Watson Language Translator service.

A language pak module has a collection of images, one for each translation model that you might want to install. Each image for a translation model (for example, English to German) takes between 1 GB and 2.5 GB on disk.
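
As a rough planning aid, the per-image size range above can be turned into a disk estimate. The following is a minimal sketch; estimate_model_disk_gb is a hypothetical helper name and is not part of the Cloud Pak for Data tooling.

```shell
# Estimate the on-disk size range for a number of translation model images,
# using the 1 GB to 2.5 GB per-image range stated above.
# estimate_model_disk_gb is a hypothetical helper, not an IBM command.
estimate_model_disk_gb() {
  count=$1
  min_gb=$count                 # 1 GB per image
  max_tenths=$((count * 25))    # 2.5 GB per image, counted in tenths of a GB
  printf '%d-%d.%d GB\n' "$min_gb" "$((max_tenths / 10))" "$((max_tenths % 10))"
}

# Usage: estimate_model_disk_gb 6
```

The estimate covers only the model images, not the datastores or other service components.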

To translate from a source language to a target language, download and enable the translation models for the source and target languages.

Language Translator can then translate between the source and target languages directly or automatically in two steps through English if no direct model is available.

For example, to translate directly from German to English, select Language Pak 1, which includes the German to English translation model, and then specify the German to English model (de-en) in the override configuration. To translate from German to Korean through English, select Language Pak 1, which includes the German to English model, and Language Pak 2, which includes the English to Korean model, and then specify the German to English (de-en) and English to Korean (en-ko) models in the override configuration.

The exception is Catalan, which supports translating only to and from Spanish.

Important: Make sure to download and enable all the translation models that you need, because you cannot change the installed languages after installation.
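
The routing rules above can be sketched as a small shell function that lists the translation models needed for a source and target language. This is only an illustration: required_models is a hypothetical helper, it does not check that the language codes are supported, and the direct non-English pairs it knows about are the ones listed for Language Pak 3 later in this topic.

```shell
# required_models SOURCE TARGET
# Prints the translation model codes needed to translate SOURCE into TARGET.
# Hypothetical helper that mirrors the rules described in this topic.
required_models() {
  src=$1
  tgt=$2
  case "$src-$tgt" in
    # Direct non-English models that are provided in Language Pak 3.
    ca-es|es-ca|de-fr|fr-de|de-it|it-de|fr-es|es-fr)
      echo "$src-$tgt"; return 0 ;;
  esac
  if [ "$src" = "$tgt" ]; then
    echo "source and target are the same language" >&2; return 1
  elif [ "$src" = "ca" ] || [ "$tgt" = "ca" ]; then
    echo "Catalan translates only to and from Spanish" >&2; return 1
  elif [ "$src" = "en" ] || [ "$tgt" = "en" ]; then
    echo "$src-$tgt"          # a single model to or from English
  else
    echo "$src-en en-$tgt"    # two steps through English
  fi
}

# Usage: required_models de ko
```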

Languages are grouped into the following modules:

IBM Watson Language Translator Language Pak 1

For each of the following languages, the module contains a translation model for English to the language and a reverse translation model for the language into English:

Module name: watson-language-pak-1

Language                 Language code
Arabic                   ar
Chinese (simplified)     zh
Chinese (traditional)    zh-TW
French                   fr
German                   de
Hebrew                   he
Italian                  it
Portuguese (Brazilian)   pt
Russian                  ru
Spanish                  es
Turkish                  tr
IBM Watson Language Translator Language Pak 2

For each of the following languages, the module contains a translation model for English to the language and a reverse translation model for the language into English:

Module name: watson-language-pak-2

Language     Language code
Bengali      bn
Gujarati     gu
Hindi        hi
Indonesian   id
Japanese     ja
Korean       ko
Malay        ms
Malayalam    ml
Maltese      mt
Nepali       ne
Sinhala      si
Tamil        ta
Telugu       te
Thai         th
Urdu         ur
Vietnamese   vi
IBM Watson Language Translator Language Pak 3

For each of the following languages, the module contains a translation model for English to the language and a reverse translation model for the language into English:

Module name: watson-language-pak-3

Language           Language code
Bulgarian          bg
Croatian           hr
Czech              cs
Danish             da
Dutch              nl
Estonian           et
Finnish            fi
Greek              el
Hungarian          hu
Irish              ga
Latvian            lv
Lithuanian         lt
Norwegian Bokmål   nb
Polish             pl
Romanian           ro
Slovak             sk
Slovenian          sl
Swedish            sv
Extra non-English translation models in Language Pak 3:

Translation model     Translation model codes
Catalan <-> Spanish   ca-es, es-ca
German <-> French     de-fr, fr-de
German <-> Italian    de-it, it-de
French <-> Spanish    fr-es, es-fr
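
The module lists above determine which language pak modules must be installed for a given set of translation models. The following sketch derives that list from model codes; pak_for_model and modules_for_models are hypothetical helper names, and the language groupings are copied from the lists in this topic.

```shell
# pak_for_model MODEL
# Prints the language pak module that contains a translation model code
# such as de-en or en-ko. Hypothetical helper based on the lists above.
pak_for_model() {
  case "$1" in
    # Non-English pairs are all provided in Language Pak 3.
    ca-es|es-ca|de-fr|fr-de|de-it|it-de|fr-es|es-fr)
      echo "watson-language-pak-3"; return 0 ;;
  esac
  lang=${1#en-}      # drop a leading "en-" (English to the language)
  lang=${lang%-en}   # drop a trailing "-en" (the language into English)
  case "$lang" in
    ar|zh|zh-TW|fr|de|he|it|pt|ru|es|tr)
      echo "watson-language-pak-1" ;;
    bn|gu|hi|id|ja|ko|ms|ml|mt|ne|si|ta|te|th|ur|vi)
      echo "watson-language-pak-2" ;;
    bg|hr|cs|da|nl|et|fi|el|hu|ga|lv|lt|nb|pl|ro|sk|sl|sv)
      echo "watson-language-pak-3" ;;
    *)
      echo "unknown translation model: $1" >&2; return 1 ;;
  esac
}

# modules_for_models MODEL...
# Prints a comma-separated, de-duplicated list of the modules required.
modules_for_models() {
  paks=""
  for model in "$@"; do
    pak=$(pak_for_model "$model") || return 1
    paks="$paks$pak
"
  done
  printf '%s' "$paks" | sort -u | paste -sd, -
}

# Usage: modules_for_models de-en en-ko
```

The output has the comma-separated form that the installation command expects for its optional modules.
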
Resources Required

In addition to the general hardware requirements and recommendations, the Watson Language Translator service has the following requirements:

Resource         Dev     Prod (HA)
Minimum CPU      8       16
Minimum memory   30 GB   80 GB

The dev requirements are based on:

  • single replicas for service components
  • 2 installed translation models

The prod (HA) requirements are based on:

  • 2 replicas (highly available mode) for service components
  • 6 installed translation models
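
A quick way to compare a planned allocation with these minimums is sketched below. meets_minimum is a hypothetical helper; the thresholds come from the resource requirements above and assume 2 installed translation models for dev and 6 for prod, so larger installations need more.

```shell
# meets_minimum PROFILE CPUS MEMORY_GB
# Compares available CPU cores and memory (in GB) with the documented
# minimums for the dev or prod (HA) profile. Hypothetical helper.
meets_minimum() {
  profile=$1
  cpus=$2
  mem_gb=$3
  case "$profile" in
    dev)  need_cpu=8;  need_mem=30 ;;
    prod) need_cpu=16; need_mem=80 ;;
    *) echo "unknown profile: $profile (use dev or prod)" >&2; return 2 ;;
  esac
  if [ "$cpus" -ge "$need_cpu" ] && [ "$mem_gb" -ge "$need_mem" ]; then
    echo "ok: $profile needs $need_cpu CPU and $need_mem GB"
  else
    echo "insufficient: $profile needs $need_cpu CPU and $need_mem GB" >&2
    return 1
  fi
}

# Usage: meets_minimum dev 8 32
```
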
Storage Requirements
Datastore    Space per PVC   Storage type    Supported storage classes
PostgreSQL   10 GB           Block storage   portworx, EBS, vsphere
MinIO        10 GB           Block storage   portworx, EBS, vsphere
Storage Class and Persistent Volume Set Up

A Persistent Volume (PV) is a unit of storage in the cluster. In the same way that a node is a cluster resource, a persistent volume is also a resource in the cluster. For an overview, see Persistent Volumes in the Cloud Pak for Data storage add-ons documentation.

You can use a Cloud Pak for Data storage add-on, or a storage option that is hosted outside the cluster, such as the vSphere Cloud Provider.

Note the storage type requirements listed under Storage Requirements when selecting your storage options.

To see the available storage classes in your cluster, or to verify that you have properly set up persistent volumes and storage classes, run the applicable command and confirm the storage class you configured is listed:

oc get storageclass
  1. If the portworx-sc storageclass does not exist (use the oc get sc portworx-sc command to check), create it by running the following command.
    oc create -f {manifest-file}

    where {manifest-file} is a YAML file that contains the following specifications:

    apiVersion: storage.k8s.io/v1
    kind: StorageClass
    metadata:
      name: portworx-sc
    parameters:
      block_size: 64k
      io_profile: db
      priority_io: high
      repl: "3"
      snap_interval: "0"
    provisioner: kubernetes.io/portworx-volume
    reclaimPolicy: Delete
    volumeBindingMode: Immediate
  2. To verify that the storageclass was created properly, run the following command.
    oc get storageclass | grep portworx-sc
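
The check-then-create steps above can be combined into one function. This is a minimal sketch that uses only the oc get storageclass and oc create -f commands shown in this topic; ensure_storageclass is a hypothetical helper name, and the manifest path is whatever file you saved the StorageClass definition in.

```shell
# ensure_storageclass NAME MANIFEST_FILE
# Creates the storage class from the manifest only if it does not exist yet.
# Hypothetical helper; requires the oc command-line interface and a login.
ensure_storageclass() {
  sc_name=$1
  manifest=$2
  if oc get storageclass "$sc_name" >/dev/null 2>&1; then
    echo "storageclass $sc_name already exists"
  else
    oc create -f "$manifest"
  fi
}

# Usage: ensure_storageclass portworx-sc ./portworx-sc.yaml
```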

Before you can use the service, you must complete the following steps:

  1. Create an lt-repo.yaml file.
    Add the following content to the file:
    registry:
      - url: cp.icr.io/cp/cpd
        username: "cp"
        apikey: {entitlement-key}
        namespace: ""
        name: base-registry
      - url: cp.icr.io
        username: "cp"
        apikey: {entitlement-key}
        namespace: "cp/watson-lt"
        name: lt-registry
    fileservers:
      - url: https://raw.github.com/IBM/cloud-pak/master/repo/cpd3
  2. If you do not have an entitlement key or do not know it, get the {entitlement-key} value from myibm.com.
  3. Replace the {entitlement-key} references in the YAML file with your entitlement key value, and then save and close the lt-repo.yaml file.
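
If you prefer to generate the file rather than edit it by hand, the steps above can be sketched as a function that prints the same content with the key filled in. write_lt_repo_yaml is a hypothetical helper, and ENTITLEMENT_KEY in the usage line is an assumed environment variable name.

```shell
# write_lt_repo_yaml ENTITLEMENT_KEY
# Prints the lt-repo.yaml content shown above with the key substituted.
# Hypothetical helper; redirect the output to lt-repo.yaml.
write_lt_repo_yaml() {
  key=$1
  if [ -z "$key" ]; then
    echo "usage: write_lt_repo_yaml ENTITLEMENT_KEY" >&2
    return 1
  fi
  cat <<EOF
registry:
  - url: cp.icr.io/cp/cpd
    username: "cp"
    apikey: $key
    namespace: ""
    name: base-registry
  - url: cp.icr.io
    username: "cp"
    apikey: $key
    namespace: "cp/watson-lt"
    name: lt-registry
fileservers:
  - url: https://raw.github.com/IBM/cloud-pak/master/repo/cpd3
EOF
}

# Usage: write_lt_repo_yaml "$ENTITLEMENT_KEY" > lt-repo.yaml
```

Because the file then contains a credential, consider restricting its permissions after you create it.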

Ensure that the Mac OS or Linux machine where you will run the commands meets the appropriate requirements for your environment:

The machine must meet the following requirements. Unless noted otherwise, each requirement applies both when the cluster is connected to the internet and when it is air-gapped:
  • Can connect to the cluster.
  • Is connected to the internet (cluster connected to the internet only).
  • Has the oc command-line interface. You can download the appropriate client tools for your operating system from Red Hat® OpenShift®.
  • Has the Cloud Pak for Data command-line interface.
  • Has the lt-repo.yaml file in the same directory as the Cloud Pak for Data command-line interface (cluster connected to the internet only).
  • Has the cpd-Operating_System-workspace directory, which contains the required files (air-gapped cluster only). See Preparing for air-gapped installations.
Important: When you follow the instructions, use these values for the parameters:
  • Replace repo.yaml with lt-repo.yaml
  • Use watson-language-translator as the assembly name.
  • Use 1.1.2 as the assembly version.
  • You can pass $(oc registry info)/{namespace} with the --transfer-image-to parameter.
  • When asked for credentials, specify the appropriate OpenShift administrator user name, such as kubeadmin or ocadmin. You can use the output of oc whoami -t as the associated password.
 

About this task

If you are installing multiple services on your cluster, you must run the installations one at a time and wait until the installation completes before installing another service. You cannot run the installations in parallel.

Procedure

  1. This service requires the restricted SecurityContextConstraints to be bound to the target namespace prior to installation. If this SCC is already applied to the control plane, skip this step.
    For a sample of the standard definition of the restricted SCC, see the README file for the service. Run the following command to bind the restricted SecurityContextConstraint to the Cloud Pak for Data namespace in which you will install the service:
    oc adm policy add-scc-to-group restricted system:serviceaccounts:{namespace}

    where {namespace} is the namespace in which Cloud Pak for Data is installed.

  2. Add the cluster namespace label to your service namespace.
    The label is needed to permit communication between your application's namespace and the Cloud Pak for Data namespace by using a network policy.
    1. Log in to OpenShift.
      oc login
    2. Add the label.
      oc label --overwrite namespace {namespace} ns={namespace}
      where {namespace} is the namespace in which Cloud Pak for Data is installed.
      For example:
      oc label --overwrite namespace zen ns=zen
      If you get a message that says namespace/zen not labeled, it means the namespace was already labeled. No action is required.
    3. Make sure you are pointing to the correct project.
      oc project {namespace}
      where {namespace} is the namespace in which Cloud Pak for Data is installed.
  3. From the namespace where the Cloud Pak for Data cluster is installed, get the name of the secret for pulling images from the internal Docker registry.
    oc get secrets | grep default-dockercfg
    Make a note of the secret. You will add it as the value for the global.image.pullSecret setting in the override file that you create in the next step. For example:
    global:
      image:
        pullSecret: "default-dockercfg-gqfb4"
  4. Create an override configuration.

    The default installation of Watson Language Translator does not include any translation models, so you must use an override file to specify which translation models are installed.

    1. Create an lt-override.yaml file and define any custom configuration settings. You can use the sample overrides.yaml file that is provided with the service installation files as a starting point.

      Set the storage class if it is different from portworx-sc (portworx-sc is the default value):

      global:
        storageClassName: "{storage_class}"

      where {storage_class} is the StorageClass name specified in the StorageClass definition.

    2. If the namespace is different from zen, set the following parameter:
      gateway:
        addonService:
          zenNamespace: <target_namespace>
    3. If you use persistent volumes for the datastores, set the following parameters to enable persistence:

      For MinIO (s3):

      s3:
        persistence:
          enabled: true
          size: 10Gi

      For PostgreSQL:

      postgres:
        persistence:
          enabled: true
          size: 10Gi
    4. Enable language support.

      You need to enable at least one translation model in the installation configuration before proceeding with the chart installation. In the predefined development and production configurations, all translation models are disabled by default. To enable a translation model, add the desired translation model and set the enabled parameter to true.

      All listed languages can be translated to and from English (8 non-English translation models are available in language pak 3). Each direction of translation is a separate translation model. For example de-en translates from German to English, and en-de translates from English to German.

      Specify a translation model in the override file by adding an object with the "from" and "to" language codes separated by a hyphen as follows:

       translationModels:
         ...
         de-en: # <-- This specifies the translation model to enable
           enabled: true # <--- set this to true to enable the German to English translation model
         ...
      

      Important: For every translation model that you enable in the configuration, ensure that the language pak module that contains it is included during deployment. For example, if you enable an Irish translation model in the configuration, you must specify language pak 3 during the deployment.

      Important: The CPU and memory requirements are based on installation of 2 translation models for development and 6 translation models for production.

    Example lt-override.yaml

    The following example override file is for the zen namespace and the portworx-sc storage class, and it enables two translation models.

    global:
      storageClassName: "portworx-sc"
    
    gateway:
      addonService:
        zenNamespace: zen
    
    s3:
      persistence:
        enabled: true
        size: 10Gi
    
    postgres:
      persistence:
        enabled: true
        size: 10Gi
    
    translationModels:
      ar-en:
        enabled: true
      de-en:
        enabled: true
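
An override file with the same layout as the example above can also be generated from a list of translation model codes. The sketch below assumes that persistence is enabled for both datastores with the 10Gi size used in the example; render_lt_override is a hypothetical helper, and it does not add the global.image.pullSecret setting, which you still need to add yourself.

```shell
# render_lt_override STORAGE_CLASS NAMESPACE MODEL...
# Prints an override file in the layout of the example above.
# Hypothetical helper; redirect the output to lt-override.yaml.
render_lt_override() {
  storage_class=$1
  namespace=$2
  shift 2
  if [ "$#" -eq 0 ]; then
    echo "at least one translation model is required" >&2
    return 1
  fi
  cat <<EOF
global:
  storageClassName: "$storage_class"

gateway:
  addonService:
    zenNamespace: $namespace

s3:
  persistence:
    enabled: true
    size: 10Gi

postgres:
  persistence:
    enabled: true
    size: 10Gi

translationModels:
EOF
  # One entry per requested translation model, each enabled.
  for model in "$@"; do
    printf '  %s:\n    enabled: true\n' "$model"
  done
}

# Usage: render_lt_override portworx-sc zen ar-en de-en > lt-override.yaml
```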

To install the service:

  1. Run the appropriate cpd command for your environment:
    Tip: For a list of all available options, enter the command: ./cpd-Operating_System --help.
    • To install the service on a cluster that can connect to the internet:
      1. Change to the directory where you stored the Cloud Pak for Data command-line interface and the lt-repo.yaml file.
      2. Log in to your Cloud Pak for Data cluster as a project administrator:
        oc login OpenShift_URL:port
      3. Run the following command to see a preview of what will be installed when you install the service.
        ./cpd-{Operating_System} --repo ./lt-repo.yaml \
                --assembly watson-language-translator \
                --version assembly_version \
                --namespace Project \
                --optional-modules Modules \
                --storageclass Storage_class_name \
                --transfer-image-to Registry_location \
                --target-registry-username OpenShift_Username \
                --target-registry-password OpenShift_Password \
                --insecure-skip-tls-verify \
                --cluster-pull-prefix Registry_from_cluster \
                --override Filepath_to_override.yaml \
                --dry-run
        • Replace the {Operating_System} in the cpd-{Operating_System} command:
          • Linux: linux
          • Mac OS: darwin
        • The lt-repo.yaml file is the file you created earlier.
        • For assembly_version, specify 1.1.2.
        • For the Registry_location, specify $(oc registry info)/{namespace}. The command oc registry info retrieves the registry location. Be sure to add /{namespace} to it.
        • For Registry_from_cluster, specify the address of the internal OpenShift docker registry and add /{namespace} to it. The values are typically:
            • OpenShift 4.x: image-registry.openshift-image-registry.svc:5000
            • OpenShift 3.x: docker-registry.default.svc:5000
        • For Project, specify the namespace that Cloud Pak for Data was installed into, which is typically zen. This is the {namespace} value used elsewhere in this list.
        • For Modules, specify one or more language pak modules, separated by commas. For example:
          • watson-language-pak-1,watson-language-pak-2,watson-language-pak-3
        • Provide the username and password for a user with access to the registry in the target-registry-username and target-registry-password parameters. The default username is typically:
            • OpenShift 4.x: kubeadmin
            • OpenShift 3.x: ocadmin
            If you specify $(oc whoami -t) as the password, the corresponding password is populated for you.
        • If you are using the internal Red Hat OpenShift registry and you are using the default self-signed certificate, specify the --insecure-skip-tls-verify flag to prevent x509 errors.
        • For Filepath_to_override.yaml, specify the path to the lt-override.yaml file that you created earlier.
        For example:
        ./cpd-linux --repo ./lt-repo.yaml \
                --assembly watson-language-translator \
                --version 1.1.2 \
                --namespace zen \
                --optional-modules watson-language-pak-1 \
                --storageclass portworx-sc \
                --transfer-image-to $(oc registry info)/zen \
                --target-registry-username kubeadmin \
                --target-registry-password $(oc whoami -t) \
                --insecure-skip-tls-verify \
                --cluster-pull-prefix image-registry.openshift-image-registry.svc:5000/zen \
                --override lt-override.yaml \
                --dry-run
        
      4. If the dry-run is successful, then you are ready to install the service. Remove the --dry-run parameter from the command and enter the command again. Otherwise, fix any problems that exist before you try to install the service.
    • To install the service on an air-gapped cluster:
      Important: You should have already performed the steps in Preparing for air-gapped installations to prepare for an air-gapped installation and used the following values for the parameters:
      • Replace repo.yaml with lt-repo.yaml
      • Use watson-language-translator as the assembly name.
      • Use 1.1.2 as the assembly version.
      • You can pass $(oc registry info)/{namespace} with the --transfer-image-to parameter.
      • When asked for credentials, specify the appropriate OpenShift administrator user name, such as kubeadmin or ocadmin. You can use the output of oc whoami -t as the associated password.
      1. Change to the directory where you placed the Cloud Pak for Data command-line interface.
      2. Log in to your Red Hat OpenShift cluster as a project administrator:
        oc login OpenShift_URL:port
      3. Run the following command to install the service.
        ./cpd-{Operating_System} \
                --load-from Image_directory_location \
                --assembly watson-language-translator \
                --optional-modules Modules \
                --version Assembly_version \
                --namespace Project \
                --storageclass Storage_class_name \
                --override Filepath_to_override.yaml \
                --cluster-pull-prefix Registry_from_cluster
                
        • Replace the {Operating_System} in the cpd-{Operating_System} command:
          • Linux: linux
          • Mac OS: darwin
        • For Image_directory_location, specify the location of the {cpd-Operating_System-workspace} directory.
        • For Modules, specify one or more language pak modules, separated by commas. For example:
          • watson-language-pak-1,watson-language-pak-2,watson-language-pak-3
        • For Assembly_version, specify 1.1.2.
        • For Registry_from_cluster, specify the address of the internal OpenShift docker registry and add /{namespace} to it. The values are typically:
            • OpenShift 4.x: image-registry.openshift-image-registry.svc:5000
            • OpenShift 3.x: docker-registry.default.svc:5000
        • For Filepath_to_override.yaml, specify the path to the lt-override.yaml file that you created earlier.
        • If you are using the internal Red Hat OpenShift registry, do not specify the --ask-pull-registry-credentials parameter.
        For example:
        ./cpd-linux --load-from ./cpd-linux-workspace \
                --assembly watson-language-translator \
                --optional-modules watson-language-pak-1 \
                --version 1.1.2 --namespace zen \
                --storageclass portworx-sc \
                --cluster-pull-prefix image-registry.openshift-image-registry.svc:5000/zen \
                --override lt-override.yaml
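
Two of the values in the commands above depend on your environment: the installer name (cpd-linux or cpd-darwin) and the typical internal registry address for your OpenShift version. The following sketch derives both; cpd_binary and cluster_pull_prefix are hypothetical helper names, and the registry addresses are the typical values listed above, which your cluster might not use.

```shell
# cpd_binary [OS_NAME]
# Prints the installer name for an operating system name as reported by
# uname -s. Defaults to the current machine. Hypothetical helper.
cpd_binary() {
  os=${1:-$(uname -s)}
  case "$os" in
    Linux)  echo "cpd-linux" ;;
    Darwin) echo "cpd-darwin" ;;
    *) echo "unsupported operating system: $os" >&2; return 1 ;;
  esac
}

# cluster_pull_prefix OPENSHIFT_MAJOR_VERSION NAMESPACE
# Prints the typical internal registry address with /NAMESPACE appended.
cluster_pull_prefix() {
  major=$1
  namespace=$2
  case "$major" in
    4) echo "image-registry.openshift-image-registry.svc:5000/$namespace" ;;
    3) echo "docker-registry.default.svc:5000/$namespace" ;;
    *) echo "unsupported OpenShift major version: $major" >&2; return 1 ;;
  esac
}

# Usage:
#   ./"$(cpd_binary)" --help
#   cluster_pull_prefix 4 zen
```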
        

What to do next

Verify the installation to ensure that all the assemblies and modules are running and that you can provision your instance of Watson Language Translator.
  1. Check the status of the assembly and modules
    ./cpd-linux status --namespace {namespace} --assembly watson-language-translator
    • {namespace} is the namespace IBM Cloud Pak for Data was installed into, normally zen.
  2. Set up your Helm environment:
    export TILLER_NAMESPACE=zen
    oc get secret helm-secret -n $TILLER_NAMESPACE -o yaml|grep -A3 '^data:'|tail -3 | awk -F: '{system("echo "$2" |base64 --decode > "$1)}'
    export HELM_TLS_CA_CERT=$PWD/ca.cert.pem
    export HELM_TLS_CERT=$PWD/helm.cert.pem
    export HELM_TLS_KEY=$PWD/helm.key.pem
    helm version --tls
    

    You should see output like this:

    Client: &version.Version{values}
    Server: &version.Version{values}
    
  3. After the Helm installation completes, follow the chart verification instructions from the NOTES.txt file in the chart. You can also view the instructions by running the following command:
    helm status {release-name} --tls
    
  4. Test the installation by running:
    helm test {release-name} --tls [--timeout=600] [--cleanup]
    
    • --timeout={time} sets how long, in seconds, to wait for the tests to run. Use a value of at least 600.
    • --cleanup deletes test pods upon completion
    • Remove the --tls flag for the described helm commands if your helm installation is not secured over tls.
    • Remove the --cleanup flag if you want to keep the test pod, e.g. to look at its logs.
  5. Navigate to your Cloud Pak for Data home page and provision a Watson Language Translator service instance:

    Get the hostname of the remote cluster where Watson Language Translator is being installed:

    oc get routes -n ${TARGET_NAMESPACE}

    In a browser, enter https://<hostname>:31843 in the address field and log in. Open the Add-ons page or Services page (located near the top right corner of the page) and select the Watson Language Translator tile. Select Provision instance in the menu.
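
The awk one-liner in the Helm environment setup step writes each entry of the helm-secret data section to a file named after its key, decoding the base64 value on the way. A more readable sketch of the same idea follows; decode_secret_data is a hypothetical helper that reads "key: value" lines on standard input, and it assumes the keys are the three file names referenced by the HELM_TLS_* variables.

```shell
# decode_secret_data [OUTPUT_DIR]
# Reads "key: base64-value" lines from standard input and writes each
# decoded value to OUTPUT_DIR/key. Hypothetical helper.
decode_secret_data() {
  outdir=${1:-.}
  while IFS=: read -r key value; do
    # Strip the indentation and spacing around the key and the value.
    key=$(printf '%s' "$key" | tr -d '[:space:]')
    value=$(printf '%s' "$value" | tr -d '[:space:]')
    [ -n "$key" ] && [ -n "$value" ] || continue
    printf '%s' "$value" | base64 --decode > "$outdir/$key"
  done
}

# Usage:
#   oc get secret helm-secret -n "$TILLER_NAMESPACE" -o yaml \
#     | grep -A3 '^data:' | tail -3 | decode_secret_data "$PWD"
```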