Configuring multipath.conf for Netezza Performance Server with connector nodes

Cloud Pak for Data System 2.0.2.0 is the first release that ships with a Red Hat OpenShift version that provides production support for configuring multipath while the system is online.

Learn how to configure the multipath.conf file for Netezza Performance Server when you are on Cloud Pak for Data System 2.0.2.0 or later.

In previous releases, you edited configuration files directly by using ssh and vi. With Cloud Pak for Data System 2.0.2.0 and later, a new procedure is required because Red Hat OpenShift 4.X with Red Hat CoreOS does not allow editing configuration files directly on the nodes; you must apply changes through machineconfig updates instead.

Before you begin

Cloud Pak for Data System 2.0.X with connector nodes comes preconfigured with multipath settings for two families of IBM storage products that are commonly tested and used with the system:
  • IBM FlashSystem
  • IBM Storwize

If you have a storage product from these families, you might not need to configure the multipath.conf file.

Gather the required multipath settings for your storage product and check whether the vendor and product values in those settings match the following preconfigured settings.

If the vendor and product match, skip this procedure. (A quick way to check the vendor and product strings that your attached storage reports is shown after the settings.)
devices {
    device {
        vendor "IBM"
        product "FlashSystem-9840"
        path_selector "service-time 0"
        path_grouping_policy multibus
        path_checker tur
        rr_min_io_rq 4
        rr_weight uniform
        no_path_retry fail
        failback immediate
    }
    device {
        vendor "IBM"
        product "2145"
        path_grouping_policy "group_by_prio"
        path_selector "service-time 0"
        prio "alua"
        path_checker "tur"
        failback "immediate"
        no_path_retry fail
        rr_weight uniform
        rr_min_io_rq "1"
    }
}
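
If the storage is already attached to a system that you can log in to (for example, an existing PureData System for Analytics system on the same SAN), the vendor and product strings that each LUN reports appear in the multipath -ll output (for example, IBM,2145 in the device summary line). A minimal check, assuming that multipath is already configured on that system:
multipath -ll
Compare the reported vendor and product values with the stanzas in the preceding settings.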

About this task

During the procedure, in step 5, Netezza Performance Server is stopped for 1 - 2 hours because a new configuration is applied and the nodes that are designated for Netezza Performance Server host pods are rebooted.

You can complete steps 1 - 4 any time before you must stop Netezza Performance Server.

Tip: If you actively use the instance in production, plan the configuration ahead of time to account for the system outage (around 2 hours).

Procedure

  1. Gather the vendor-specific multipath device settings from an existing Mako or other PureData System for Analytics system and save them in a text file.
    • If you already have a PureData System for Analytics system or another family of systems to refer to, the information is in the /etc/multipath.conf file that worked with the wanted SAN equipment or the same family of SAN equipment.

    • If the information is not available, gather the necessary or recommended multipath settings from vendor documentation or from a vendor contact.
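
    If you are copying the settings from an existing system's /etc/multipath.conf file, a minimal way to extract the devices section is the following sketch. It assumes the typical layout where only section-closing braces start in column 1; the output file name is an arbitrary choice:
      awk '/^devices/,/^}/' /etc/multipath.conf > /tmp/san_device_settings.txt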

  2. Prepare a working directory for the procedure:
    1. ssh to e1n1.
      ssh e1n1
    2. Create a /root/multipath_work directory.
      mkdir -p /root/multipath_work
    3. Change directories to /root/multipath_work.
      cd /root/multipath_work
  3. Edit the newmultipath.conf file:
    1. Create a copy of the /etc/multipath.conf file.
      cp /etc/multipath.conf /root/multipath_work/newmultipath.conf

      newmultipath.conf is the name of a temporary file that is used for editing.

    2. Open the newmultipath.conf file in an editor.
      vi /root/multipath_work/newmultipath.conf
    3. In the Devices section, add a device that represents the settings that are recommended by the SAN storage vendor.
      Important:

      Do not remove anything from the file. Add your device structure after the existing structures in the Devices section.

    4. Encode the changes so that you can use the information in later steps.
      base64 /root/multipath_work/newmultipath.conf | tr -d \\n > /root/multipath_work/base64multipath1.txt
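    Optional: to confirm that the encoding round-trips cleanly, decode the file and compare it with the edited copy. No output from diff means that the two match:
      base64 -d /root/multipath_work/base64multipath1.txt | diff - /root/multipath_work/newmultipath.conf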
  4. Edit the /root/multipath_work/multipath-mcp.yaml file:
    1. Open a new file that is called multipath-mcp.yaml with the vi editor and enter the insert mode.
      vi /root/multipath_work/multipath-mcp.yaml
    2. Copy and paste the following information into the file.
      apiVersion: machineconfiguration.openshift.io/v1
      kind: MachineConfig
      metadata:
        labels:
          machineconfiguration.openshift.io/role: nps-shared
        name: nps-shared-multipathing
      spec:
        config:
          ignition:
            version: 3.2.0
          storage:
            files:
            - contents:
                source: data:text/plain;charset=utf-8;base64,<--multipath_conf_file_data_base64_encoded-->
              filesystem: root
              mode: 420
              path: /etc/multipath.conf
          systemd:
            units:
            - name: iscsid.service
              enabled: true
            - name: multipathd.service
              enabled: true
    3. Replace the <--multipath_conf_file_data_base64_encoded--> placeholder with the contents of the /root/multipath_work/base64multipath1.txt file (alternatively, use the optional sed command that is shown at the end of this step).
    4. Exit the insert mode and write-quit to save the file and exit the vi session.
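    Optional: instead of pasting the contents manually, you can substitute the placeholder with sed and then confirm that the file still parses by using a client-side dry run. This is a sketch; it assumes that the placeholder string appears exactly once in the file (the | delimiter is used because base64 data can contain / characters):
      sed -i "s|<--multipath_conf_file_data_base64_encoded-->|$(cat /root/multipath_work/base64multipath1.txt)|" /root/multipath_work/multipath-mcp.yaml
      oc create -f /root/multipath_work/multipath-mcp.yaml --dry-run=client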
  5. Stop Netezza Performance Server:
    1. oc -n NPS_NAMESPACE exec -it pod/ipshost-0 -c ipshost -- bash
    2. su - nz
    3. nzstop
    4. exit
    5. exit
    6. oc -n NPS_NAMESPACE scale sts/ipshost --replicas=0

    Netezza Performance Server is stopped and you are back on e1n1.
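
    To confirm that the ipshost pod is gone before you continue, list the pods in the namespace. No output from the grep means that the pod is stopped:
      oc -n NPS_NAMESPACE get pods | grep ipshost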

  6. Unpause the nps-shared machineconfig pool so that subsequent steps do not hang.
    oc patch mcp nps-shared --type json --patch '[{"op": "replace", "path": "/spec/paused", "value": false}]'
    Example:
    oc patch mcp nps-shared --type json --patch '[{"op": "replace", "path": "/spec/paused", "value": false}]'
    machineconfigpool.machineconfiguration.openshift.io/nps-shared patched
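    To verify that the pool is no longer paused, you can query the paused field directly. The expected output is false:
    oc get mcp nps-shared -o jsonpath='{.spec.paused}{"\n"}'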
  7. Start the multipath reconfiguration.
    oc create -f /root/multipath_work/multipath-mcp.yaml

    This command triggers a rolling reboot that is called a machineconfig update and is managed by Red Hat OpenShift. The reboot takes place in a nondeterministic order among the nodes in the nps-shared pool, including the connector node.
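    To confirm that the MachineConfig object was created with the name from the YAML file, you can list it:
    oc get machineconfig nps-shared-multipathing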

  8. Monitor the reboot until all of the fields in the UPDATED column show True.
    oc get mcp

    The reboot might take 5 - 15 minutes per node on the nps-shared pool.
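
    If you prefer not to rerun the commands manually, you can refresh both views on an interval; the 30-second interval here is an arbitrary choice:
    watch -n 30 "oc get mcp; oc get nodes"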

    Example:

    First, the UPDATING column for nps-shared changes to True and the status of one nps-shared node changes to Ready,SchedulingDisabled.
    oc get mcp
    NAME         CONFIG                                                 UPDATED   UPDATING   DEGRADED   MACHINECOUNT   READYMACHINECOUNT   UPDATEDMACHINECOUNT   DEGRADEDMACHINECOUNT   AGE
    master       rendered-master-d8537132ffb6d789ce8b2a7257833bf9       True      False      False      3              3                   3                     0                      14d
    nps-shared   rendered-nps-shared-053d2105bc50eeb67b4cb50614e9a0da   False     True      False       3              0                   0                     0                      8d
    unset        rendered-unset-442d08db52ce65c62d60b906718744f6        True      False      False      0              0                   0                     0                      14d
    worker       rendered-worker-442d08db52ce65c62d60b906718744f6       True      False      False      3              3                   3                     0                      14d
    
    oc get nodes
    NAME                STATUS   ROLES               AGE   VERSION
    e1n1-master.fbond   Ready    master              14d   v1.21.8+ee73ea2
    e1n2-master.fbond   Ready    master              14d   v1.21.8+ee73ea2
    e1n3-master.fbond   Ready    master              14d   v1.21.8+ee73ea2
    e1n4.fbond          Ready    worker              14d   v1.21.8+ee73ea2
    e2n1.fbond          Ready    worker              14d   v1.21.8+ee73ea2
    e2n2.fbond          Ready    worker              14d   v1.21.8+ee73ea2
    e2n3.fbond          Ready    nps-shared,worker   14d   v1.21.8+ee73ea2
    e2n4.fbond          Ready,SchedulingDisabled  nps-shared,worker   14d   v1.21.8+ee73ea2
    e5n1.fbond          Ready    nps-shared,worker   13d   v1.21.8+ee73ea2
    Next, the nps-shared node status changes to NotReady:
    oc get nodes
    NAME                STATUS   ROLES               AGE   VERSION
    e1n1-master.fbond   Ready    master              14d   v1.21.8+ee73ea2
    e1n2-master.fbond   Ready    master              14d   v1.21.8+ee73ea2
    e1n3-master.fbond   Ready    master              14d   v1.21.8+ee73ea2
    e1n4.fbond          Ready    worker              14d   v1.21.8+ee73ea2
    e2n1.fbond          Ready    worker              14d   v1.21.8+ee73ea2
    e2n2.fbond          Ready    worker              14d   v1.21.8+ee73ea2
    e2n3.fbond          Ready    nps-shared,worker   14d   v1.21.8+ee73ea2
    e2n4.fbond          NotReady,SchedulingDisabled    nps-shared,worker   14d   v1.21.8+ee73ea2
    e5n1.fbond          Ready    nps-shared,worker   13d   v1.21.8+ee73ea2
    Then, the status goes back to Ready:
    oc get nodes
    NAME                STATUS   ROLES               AGE   VERSION
    e1n1-master.fbond   Ready    master              14d   v1.21.8+ee73ea2
    e1n2-master.fbond   Ready    master              14d   v1.21.8+ee73ea2
    e1n3-master.fbond   Ready    master              14d   v1.21.8+ee73ea2
    e1n4.fbond          Ready    worker              14d   v1.21.8+ee73ea2
    e2n1.fbond          Ready    worker              14d   v1.21.8+ee73ea2
    e2n2.fbond          Ready    worker              14d   v1.21.8+ee73ea2
    e2n3.fbond          Ready    nps-shared,worker   14d   v1.21.8+ee73ea2
    e2n4.fbond          Ready,SchedulingDisabled    nps-shared,worker   14d   v1.21.8+ee73ea2
    e5n1.fbond          Ready    nps-shared,worker   13d   v1.21.8+ee73ea2

    The cycle repeats for the other nodes in the nps-shared pool.

    When the update completes, UPDATED changes to True and UPDATING changes to False:
    oc get mcp
    NAME         CONFIG                                                 UPDATED   UPDATING   DEGRADED   MACHINECOUNT   READYMACHINECOUNT   UPDATEDMACHINECOUNT   DEGRADEDMACHINECOUNT   AGE
    master       rendered-master-d8537132ffb6d789ce8b2a7257833bf9       True      False      False      3              3                   3                     0                      14d
    nps-shared   rendered-nps-shared-053d2105bc50eeb67b4cb50614e9a0da   True      False      False      3              3                   3                     0                      8d
    unset        rendered-unset-442d08db52ce65c62d60b906718744f6        True      False      False      0              0                   0                     0                      14d
    worker       rendered-worker-442d08db52ce65c62d60b906718744f6       True      False      False      3              3                   3                     0                      14d
    
  9. SSH to the connector node and verify the multipath settings.
    Important:

    Do not modify any files during an ssh session. If you want to modify files, you must do so through machineconfig updates.

    Cloud Pak for Data System 2.0.X runs on Red Hat OpenShift 4.X with Red Hat CoreOS as the operating system for the nodes. With CoreOS, the root file system and all config files are immutable. Any changes that you make manually during an ssh session are automatically wiped away by CoreOS at unpredictable times.

    Also, any changes that you make during an ssh session might cause the OS to become out of sync with the Red Hat OpenShift cluster state and prevent future operations (for example, Cloud Pak for Data System upgrades) from completing.

    • If the connector node is e5n1.fbond:
      1. On Cloud Pak for Data System, from e1n1, ssh into e5n1.fbond.
        ssh core@e5n1
      2. sudo su
      3. Verify that the settings were changed successfully:
        cat /etc/multipath.conf
        Ensure that all the LUNs that were configured for use are in the output of the multipath -ll command, and that all paths are Active Ready Running.
        multipath -ll
        Tip:
        If you do not see the multipath devices, verify the following items:
        1. The LUNs are configured properly and have access to the WWNs of the Fibre Channel cards on the connector node.
        2. The FC connections are physically cabled between the SAN storage device and the SAN switch or switches and between the SAN switch or switches and the connector node or nodes.
        3. The relevant ports on the SAN switches are enabled and show link.
        4. The multipath settings, path_checker specifically, are correct.

        If issues occur, contact IBM Support and the vendors of the customer-owned SAN equipment.
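
        For the first item in the tip, the WWNs (port names) of the Fibre Channel cards on the connector node can usually be read from sysfs while you are root on that node (a sketch, assuming standard Linux FC host adapters):
        cat /sys/class/fc_host/host*/port_name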

  10. Pause the nps-shared machineconfig pool to return it to the wanted state.
    The nps-shared pool is paused except for select times to avoid an inadvertent outage.
    oc patch mcp nps-shared --type json --patch '[{"op": "replace", "path": "/spec/paused", "value": true}]'
    Example:
    oc patch mcp nps-shared --type json --patch '[{"op": "replace", "path": "/spec/paused", "value": true}]'
    machineconfigpool.machineconfiguration.openshift.io/nps-shared patched
  11. Restart Netezza Performance Server:
    1. oc -n NPS_NAMESPACE scale sts/ipshost --replicas=1
      Wait for the ipshost pod to spawn and verify that it is on a connector node. If the pod is not up within 5 minutes, check the pod status and events for issues before you continue.
      watch "oc -n NPS_NAMESPACE get pods -o wide | grep ipshost"
    2. oc -n NPS_NAMESPACE exec -it pod/ipshost-0 -c ipshost -- bash
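      After you are inside the ipshost pod, you can check that the database is back online as the nz user (a quick check; if the state is not reported as online, you can start the system with nzstart):
      su - nz
      nzstate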

What to do next

Deploying backup and restore setups