One-time setup before entering production

Complete this one-time setup before the system enters production. It is an operational prerequisite for DR to function, and it assumes that you already met the connector node, SAN, storage, and storage-level replication prerequisites.

Before you begin

Make sure that:

  • Both the production site and the DR site are upgraded to the desired release, and the output of the ap issues command is clear of any hardware issues or issues that are fundamental to the health of the system.
  • At both the production and DR sites:
    • The ap node -d command shows all nodes, including the connector nodes, as ENABLED, and the connector nodes show the personality CN,VDB_HOST.
    • On all connector nodes, /etc/multipath.conf is edited to add the multipath settings that the storage vendor requires for its storage products. Do NOT remove any content from the existing /etc/multipath.conf. Instead, follow the example of the device entries that already exist in the file and add a device entry with the settings that the vendor requires.
      Note: If IBM® storage devices are used for the SAN, you do not need to add anything to /etc/multipath.conf because the file already covers IBM storage device settings. Other vendors might have specific multipath requirements or recommendations, which the storage admin must know about. If you cannot find any specific settings to add, you can continue with the defaults.
    • All connector nodes are physically cabled to site-local FC SAN storage.
    • Site-local FC SAN storage has the same number of volumes, and the same size per volume, at both sites.
      Note: If you have storage-level replication software, you can use it to replicate between the production and DR sites.
  • The storage admin is aware of the WWPNs on all relevant connector node HBAs at both sites. You can ssh to the connector nodes and run the following to get them.
    cat /sys/class/fc_host/host*/port_name
  • At both the production and DR sites, the storage admin has added the WWPNs for the connector nodes to the hosts access list for the BnR volumes on the site-local SAN storage device or appliance.
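To illustrate the /etc/multipath.conf prerequisite above, the following is a generic sketch of a vendor device entry. The vendor, product, and setting values shown are placeholders, not real vendor values; use the exact stanza that your storage vendor documents.

```
devices {
    device {
        vendor                  "EXAMPLEVEND"   # placeholder vendor string
        product                 "EXAMPLEPROD"   # placeholder product string
        path_grouping_policy    group_by_prio
        path_checker            tur
        failback                immediate
        no_path_retry           12
    }
}
```

Add the new device entry alongside the existing ones; do not remove or modify the entries that are already in the file.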
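The port_name values that the cat command above prints are raw hexadecimal strings. Many storage administration tools expect WWPNs in colon-separated form; the following is a minimal sketch of that conversion, using a hypothetical example value:

```shell
# Raw value as printed by: cat /sys/class/fc_host/host*/port_name
raw=0x10000090fa8b4a21   # hypothetical example WWPN

# Strip the 0x prefix and insert a colon after every two hex digits.
wwpn=$(echo "$raw" | sed -e 's/^0x//' -e 's/../&:/g' -e 's/:$//')
echo "$wwpn"             # 10:00:00:90:fa:8b:4a:21
```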
When these prerequisites are fulfilled, both the production and DR systems are online with NPS® running. Only the production system can take backups to, or restore from, the SAN. These prerequisites make the DR system ready to act as a failover target when a disaster occurs.
Important: If, in the future, the SAN storage that is used for backups must be expanded or reduced, or the vendor or model of the SAN equipment changes, special steps are required, including an additional outage. Those steps are not documented here. For assistance, contact IBM support.

Procedure

  1. At the production site, perform steps 1 through 6 in the Using connector nodes for FC backup section. Do NOT run this process at the DR site.
  2. While you are still logged in through ssh to the first connector node, as explained in step 3a of Using connector nodes for FC backup, run:
    1. apstop
      Watch ap state -d until the system is stopped.
      Note: This involves an outage at the production site.
    2. mmunmount ext_mnt -a
      Note: If the command fails with a target is busy error, watch ap node until the connector node shows DISABLED in the output, and then restart the connector node. When the connector node restarts, run mmunmount ext_mnt -a again. If it still fails with target is busy, repeat the same steps for the other connector node on the same system.
    3. mmlsmount all -L
      Note: It might take 2 minutes for the ext_mnt file system to be unmounted from all of the nodes.
    4. mmexportfs ext_mnt -o /home/ext_san.config
    5. Use scp to copy /home/ext_san.config to e1n1 on the DR system, at the same location. Then, ssh to e1n1 on the DR system and scp that file to all connector nodes, at the same location.
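The copy described in the previous sub-step can be sketched as a small loop. The hostnames below are hypothetical placeholders for e1n1 on the DR system and its connector nodes; substitute your own addresses. The sketch only prints the copy commands so that you can review them before running them.

```shell
CONFIG=/home/ext_san.config
DR_E1N1=dr-e1n1                  # hypothetical DR system e1n1 address
DR_CONNECTORS="dr-cn1 dr-cn2"    # hypothetical DR connector node names

# Print the copy to e1n1, then the fan-out from e1n1 to each
# DR connector node at the same location.
print_copy_cmds() {
  echo "scp $CONFIG root@$DR_E1N1:$CONFIG"
  for cn in $DR_CONNECTORS; do
    echo "ssh root@$DR_E1N1 scp $CONFIG root@$cn:$CONFIG"
  done
}
print_copy_cmds
```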
  3. On the production system, run:
    1. mmimportfs ext_mnt -i /home/ext_san.config
    2. mmmount ext_mnt -a
  4. Monitor mmlsmount all -L until ext_mnt is mounted on all expected nodes (typically five nodes when two connector nodes exist).
  5. Run:
    apstart
    Watch ap state -d until it shows the system is Ready.
  6. Proceed with step 7 of Using connector nodes for FC backup (bringing NPS online).
    Note: With this step, the outage at the production site is over.
  7. At the DR site, make sure that the LUNs or volumes that are intended for use are scanned and visible, so that you can discover them when you need them. To do so:
    1. Confirm with your storage administrator that they gathered the WWPNs for the ports on the connector nodes at the DR site and added them to the access list for the volumes that they provisioned for this project.
    2. Run the following command on the DR system to see which node is the VDB_MASTER. If it is a connector node, ssh to that connector node.
      ap node -d 
    3. Run the following command on the connector node to see whether the WWIDs for the storage LUNs that your storage admin provisioned for this project are visible to the connector node. If they are, jump to step 9.
      multipath -ll
    4. Run the following command.
      ls /sys/class/fc_host/
      Note: The host numbers are typically 18,19,20,21 or similar. Change the numbers in the example commands to match your output if needed.
    5. Run the following commands:
      • /usr/bin/rescan-scsi-bus.sh --hosts={18,19,20,21}
      • systemctl reload multipathd
      • multipath -v3
      • multipath -ll
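The host numbers for the rescan command above can also be derived from the ls /sys/class/fc_host/ output instead of being typed by hand. A minimal sketch, using the example host names from the note above as sample input; it prints the command rather than running it:

```shell
# Sample output of: ls /sys/class/fc_host/  (replace with your own)
fc_hosts="host18 host19 host20 host21"

# Strip the "host" prefix and join the numbers with commas.
nums=$(echo $fc_hosts | tr ' ' '\n' | sed 's/^host//' | paste -sd, -)
echo "/usr/bin/rescan-scsi-bus.sh --hosts={$nums}"
```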
  8. Repeat step 7 for all connector nodes on the DR system.
  9. Optional: Remove access or un-present the LUNs from the DR system connector node WWPNs.
    Note: Your third-party storage-level replication software might require this step to activate the replication going from the production site to the DR site. Typically, the software might require that the storage LUNs are only presented to one side at a time.