Configuring switch failover for a Db2 pureScale environment on an InfiniBand network (AIX)

The configuration procedure detailed in this topic is specific to switches in environments with AIX® systems and an InfiniBand (IB) network. Switch failover capability is a high availability feature provided by the switch subnet manager (SM) that can be used in multiple-switch environments.

Before you begin

Important: Starting from version 11.5.5, support for InfiniBand (IB) adapters as the high-speed communication network between members and CFs in Db2® pureScale® on all supported platforms is deprecated and will be removed in a future release. Use a Remote Direct Memory Access over Converged Ethernet (RoCE) network as the replacement.
  1. Ensure you have created your Db2 pureScale Feature installation plan. Your installation plan helps ensure that your system meets the prerequisites and that you have performed the preinstallation tasks.
  2. Ensure you have read about supported network topologies for Db2 pureScale environments in Network topology configuration support for Db2 pureScale environments.
  3. Power on the switch and connect an RJ11 serial cable or Ethernet cable to the switch.

About this task

The procedure details the steps for configuring multiple switches to support switch failover. Configuring a single switch includes all steps except the last. Switch failover capability improves the resiliency, or fault tolerance, of a network: if the switch that is the subnet manager fails, another switch becomes the subnet manager, which reduces the detrimental effects of the switch failure. Disabling the subnet manager failback setting helps to reduce the effect that the failure of the subnet manager has on network availability. When failback is disabled, the secondary subnet manager remains the subnet manager when the original subnet manager rejoins the network after a failure.


Restrictions

Administrative access is required on the switches.

Procedure

  1. Connect a console, for example a notebook computer, to the switch.
    You can use a serial cable to connect to the switch. Alternatively, if you do not have access to a serial cable you can use an Ethernet cable. Follow the instructions to establish a connection for the cabling method you choose:
    Cable Instructions to establish a connection
    Serial cable
    1. Connect a console to the switch with a serial cable.
    2. Open a terminal session from the console to the switch with the following settings:
      • 8 data bits
      • no parity bits
      • 1 stop bit
      • 57.6K baud
      • VT100 emulation
      • Flow control = XON/XOFF
    Ethernet cable
    1. Connect a console to the switch with an Ethernet cable.
    2. Create a network connection, or modify an existing connection, to use an IP address on the same subnet as the switch. For example, if the IP address of the switch is 192.168.100.10 and the default netmask is 255.255.255.0, configure your console to have the IP address 192.168.100.9 with 255.255.255.0 as the netmask. If you do not know the IP address and netmask of the switch, see the documentation packaged with the switch for information about the default settings.
    3. Verify that you can ping the IP of the switch from the console.
    4. Open a telnet session to the switch.
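The same-subnet requirement for the Ethernet connection can be checked before you change the console network settings. The following is a minimal Python sketch that is not part of the switch tooling. The function name same_subnet is hypothetical, and the addresses are the example values from this step.

```python
import ipaddress


def same_subnet(console_ip: str, switch_ip: str, netmask: str) -> bool:
    """Return True if console_ip lies in the network defined by switch_ip and netmask."""
    # strict=False lets a host address (not only a network address) define the network.
    network = ipaddress.ip_network(f"{switch_ip}/{netmask}", strict=False)
    return ipaddress.ip_address(console_ip) in network


# Example values taken from the step above.
print(same_subnet("192.168.100.9", "192.168.100.10", "255.255.255.0"))   # True
# A console address on a different subnet cannot reach the switch directly.
print(same_subnet("192.168.101.9", "192.168.100.10", "255.255.255.0"))   # False
```

If the check returns False, choose a console address that shares the network portion of the switch address before you try to ping the switch.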
  2. Configure the default IP and gateway for each switch.
    1. Log on to the command-line interface of the switch with the admin user ID and password.
      For information about the default admin ID and password, see the documentation for the switch. For information about switch name and type, see the installation prerequisites for Db2 pureScale Feature topic.
    2. Set the IP and subnet mask of the switch.
      Run the setChassisIpAddr command with the -h parameter to specify the IP address and the -m parameter to specify the subnet mask.
      setChassisIpAddr -h IP-address -m subnet-mask
    3. Set the default route for the switch with the setDefaultRoute command to use the default gateway IP.
      setDefaultRoute -h default-gateway
  3. Reboot the switches so that they use the new configuration.
    reboot
    
  4. Get the field replaceable unit (Fru) Global Unique Identifier (GUID) for each switch.
    You can use the web interface for the switch or the command-line interface (CLI):
    • In the web interface, click View Fru and take note of the Fru guid field.
    • In the CLI, run the captureChassis command or the fruInfo command and take note of the FruGuid field.
    The field replaceable unit Global Unique Identifier is required to activate the license key for each switch.
  5. Activate the subnet manager license keys.
    You must activate the subnet manager license keys to allow connections to the switches. For information about activating the subnet manager license keys, see the documentation packaged with your switch.
    To activate switches that use Intel firmware (such as the IBM 7874 DDR switches), contact Intel support and activate the keys for each switch:
    1. Click the License Key Activation link from the navigation menu. You might receive a prompt for input on how to handle an untrusted security certificate for the Intel website. You must accept the certificate to activate the license key.
    2. Enter the serial number of the switch you want to activate and click Continue. The serial number of the switch is in an envelope packaged with the switch. You might be required to enter an email address so that Intel can send the license key. Provide the email address of the network administrator responsible for the switch, or forward the email to the network administrator.
    3. Apply the license key by using the switch CLI or the web interface:
      • On the CLI of the switch, run the addkey command.
      • In the web interface, click License Keys > Key administration > Add key, enter the license key and click Apply.
    addkey XVARFW-5AKCQS-HDIWS1-EOCTKW-9J3K82-1
    showKeys
    --------------------------------------------------------
    Key number:  1
    Key:         XVARFW-5AKCQS-HDIWS1-EOCTKW-9J3K82-1
    Description: Subnet Manager License
    Status:      Active
    
    Note: New firmware versions from Intel do not require license key activation for the subnet manager license.
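When several switches are activated, it can help to confirm from captured CLI output that each license key is active. The following Python sketch is an illustration only. It assumes that the showKeys output uses the "name: value" layout shown in the example above, which might differ on other firmware levels. The function name parse_show_keys is hypothetical.

```python
from typing import Dict, List


def parse_show_keys(output: str) -> List[Dict[str, str]]:
    """Split captured showKeys output into one dictionary per license key."""
    keys: List[Dict[str, str]] = []
    current = None
    for line in output.splitlines():
        if ":" not in line:
            continue  # skip separator lines and blank lines
        name, _, value = line.partition(":")
        name, value = name.strip(), value.strip()
        if name == "Key number":
            # A "Key number" field starts a new key entry.
            current = {}
            keys.append(current)
        if current is not None:
            current[name] = value
    return keys


# Sample text copied from the example output in this step.
SAMPLE = """--------------------------------------------------------
Key number:  1
Key:         XVARFW-5AKCQS-HDIWS1-EOCTKW-9J3K82-1
Description: Subnet Manager License
Status:      Active
"""

for key in parse_show_keys(SAMPLE):
    print(key["Description"], "->", key["Status"])
```

A key whose Status field is anything other than Active indicates that the activation steps need to be repeated for that switch.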
  6. Configure the switches so that the selection of the master subnet manager and standby subnet manager is automatic.
    Use the web interface or the CLI of the switches to start the subnet manager and configure the subnet manager to start when the switch reboots:
    • From the switch CLI, run the commands smControl start and smConfig startAtBoot.
    • Enter the web interface of the switch by entering its IP address into a browser. Click subnet manager > control > start to start the subnet manager. Click subnet manager > configuration > start at boot to start the subnet manager when the switch reboots.
    If the subnet manager is already running, you might encounter an error message reporting that the subnet manager is running. You can ignore this message:
    smControl start
    Starting the SM...
    Error trying to control the Subnet manager.
    Subnet manager is running. (master)
  7. Verify that the subnet manager is running.
    Run the smControl command with the status parameter. The subnet manager starts as master or standby:
    smControl status
    Subnet manager is running. (master)

    or

    smControl status
    Subnet manager is running. (standby)
    If the subnet manager starts as inactive, you must restart the subnet manager until it starts as either master or standby.
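If you collect the smControl status output from several switches, the role of each subnet manager can be classified with a short script. The following Python sketch assumes the output format shown in this step. The exact text that the switch prints for an inactive subnet manager is not shown in this topic, so any other output is reported as "unknown". The function name sm_state is hypothetical.

```python
import re


def sm_state(status_output: str) -> str:
    """Return 'master' or 'standby' if the output reports that role, else 'unknown'."""
    match = re.search(
        r"Subnet manager is running\.\s*\((master|standby)\)", status_output
    )
    return match.group(1) if match else "unknown"


print(sm_state("Subnet manager is running. (master)"))    # master
print(sm_state("Subnet manager is running. (standby)"))   # standby
print(sm_state(""))                                        # unknown
```

A result of "unknown" is a prompt to inspect the switch directly and, if the subnet manager is inactive, to restart it as described in this step.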
  8. Optional: If using multiple switches, you can change the priority on each switch to disable automatic failback of the subnet manager.
    In most switches, there are usually two priorities:
    • Switch priority - the switch priority determines which switch is selected as the subnet manager. A switch priority of 0 on all the switches results in the switches electing a subnet manager. Always set the switch priority to 0.
    • Elevated priority - the second priority (referred to as the elevated priority) is used to disable automatic failback to the original subnet manager. If this priority is set to 1 and the subnet manager fails, the switch that takes over as the subnet manager continues to be the subnet manager after the failed switch comes back online. This setting helps reduce unnecessary network delays that are incurred by failing back to the original subnet manager. Set the elevated priority to 1.

    The steps to set the priorities are different for DDR and QDR InfiniBand switches.

    • DDR InfiniBand switch
      To set the two priorities, enter:
      smPriority 0 1
    • QDR InfiniBand switch
      To set the two priorities, modify the XML configuration file to set the priority and elevated priority:
      1. Download the file from the GUI. Go to Config File Admin > Subnet Manager Config File.
      2. Right click the name beside Current Config File, and save the file.
      3. Open the XML file and change <Priority> to 0, and <ElevatedPriority> to 1. For example:
        <!-- Priority and Elevated Priority control failover for SM, PM and BM. -->
        <!-- Priority is used during initial negotiation, high Priority wins. -->
        <!-- ElevatedPriority is assumed by winning master.  This can prevent -->
        <!-- fallback when previous master comes back on line.  -->
        <Priority>0</Priority> <!-- 0 to 15, higher wins -->
        <ElevatedPriority>1</ElevatedPriority> <!-- 0 to 15, higher wins -->
      4. Save the XML file.
      5. Upload the modified XML file back onto the switch. Click Browse... beside the Upload config file: field. Select the modified file, and click Upload.
      6. To have the configuration file take effect, reboot the switch.
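For QDR switches, the edit to the downloaded configuration file can also be scripted instead of made by hand. The following Python sketch is an illustration under stated assumptions: the <Sm> wrapper element and the function name set_sm_priorities are hypothetical, and the real configuration file contains many more elements than this sample. The sketch sets every <Priority> and <ElevatedPriority> element that it finds, so review the result before you upload it. Comment preservation relies on the insert_comments option, which requires Python 3.8 or later.

```python
import xml.etree.ElementTree as ET


def set_sm_priorities(xml_text: str, priority: int = 0, elevated: int = 1) -> str:
    """Return xml_text with Priority and ElevatedPriority set to the given values."""
    # insert_comments=True keeps the explanatory comments in the file.
    parser = ET.XMLParser(target=ET.TreeBuilder(insert_comments=True))
    root = ET.fromstring(xml_text, parser=parser)
    for tag, value in (("Priority", priority), ("ElevatedPriority", elevated)):
        for element in root.iter(tag):
            element.text = str(value)
    return ET.tostring(root, encoding="unicode")


# Hypothetical minimal fragment; a real file has a different, larger structure.
SAMPLE_CONFIG = """<Sm>
  <Priority>5</Priority> <!-- 0 to 15, higher wins -->
  <ElevatedPriority>0</ElevatedPriority> <!-- 0 to 15, higher wins -->
</Sm>"""

print(set_sm_priorities(SAMPLE_CONFIG))
```

The XML declaration and any comments outside the root element are not carried over by this approach, so compare the output with the original file before uploading it to the switch.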

Results

The switch, or switches, are now configured for the Db2 pureScale environment.

What to do next

Configure the network settings of the hosts. See Configuring the network settings of hosts in a Db2 pureScale environment on an InfiniBand network (AIX).