IBM Support

Diagnosing Problems with SMC-R and SMC-D

Product Documentation


Abstract

Things to look for when using SMC-R (RoCE) or SMC-D on z/OS Communications Server

Content

Wireshark supports SMC-R starting with Wireshark 2.0. The Wireshark 2.0 release notes confirm that SMC-R support is included (see section 2.3, New Protocol Support):

https://www.wireshark.org/docs/relnotes/wireshark-2.0.0.html

Note: Links to publications listed below will be added after z/OS GA.

SMC-R problems are often related to switch configuration, physical network ID (PNetID) configuration, and other configuration issues.

Common SMC-R communication problems include the following:

  • Switch configuration issues
  • Physical network ID configuration issues
  • No associated subnet mask
  • PFID status remains STARTING
  • Problem with SMC-R interaction with security function

The SMCReason field of the Netstat ALL/-A report and the SMCR field of the Netstat DEvlinks/-d report provide information about SMC-R problems. For a complete list of SMCReason codes in the Netstat ALL/-A report and SMCR Disabled reasons in the Netstat DEvlinks/-d report, see z/OS V2R1 Communications Server IP System Administrator's Commands.
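As an illustration, the reason values that this technote discusses can be collected into a small triage table that points to the relevant section below. This is a hypothetical Python sketch: the tables hold only the values mentioned in this document (not the complete lists in the Commands manual), the code does not parse real Netstat output, and it normalizes codes by stripping leading zeros because this document writes them both as 00005013 and as 521E.

```python
from typing import Dict, Tuple

# Hypothetical triage tables. They contain ONLY the values discussed in this
# technote, not the complete lists in the IP System Administrator's Commands.
SMCREASON_HINTS: Dict[str, Tuple[str, str]] = {
    "5013": ("RDMA CONNECTIVITY FAILURE", "Switch configuration issues"),
    "521E": ("PEER SUBNET/PREFIX MISMATCH", "No associated subnet mask"),
}

SMCR_DISABLED_HINTS: Dict[str, str] = {
    "NO PNETID": "Physical network ID configuration issues",
    "NO SUBNET MASK": "No associated subnet mask",
}

NOT_COVERED = "Not covered here; see the IP System Administrator's Commands"


def normalize_reason(code: str) -> str:
    """Uppercase and drop leading zeros so that '00005013' and '5013' match."""
    return code.strip().upper().lstrip("0") or "0"


def triage_smcreason(code: str) -> str:
    """Name the section of this technote to read for a Netstat ALL/-A SMCReason."""
    entry = SMCREASON_HINTS.get(normalize_reason(code))
    if entry is None:
        return NOT_COVERED
    text, section = entry
    return f"{text}: see '{section}'"


def triage_smcr_disabled(reason: str) -> str:
    """Map the text inside 'SMCR: DISABLED (...)' from Netstat DEvlinks/-d."""
    return SMCR_DISABLED_HINTS.get(reason.strip().upper(), NOT_COVERED)


if __name__ == "__main__":
    print(triage_smcreason("00005013"))
    print(triage_smcreason("0000521E"))
    print(triage_smcr_disabled("NO PNETID"))
```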

Switch configuration issues

RDMA processing requires a standard 10 GbE switch, and distance limitations apply. On the switch, enable global pause frames (a standard Ethernet switch feature for flow control, described in the IEEE 802.3x standard).

When the SMCReason field of the Netstat ALL/-A report is 00005013 - RDMA CONNECTIVITY FAILURE, SMC-R was not able to complete the Link Confirm flow, which usually indicates a switch configuration issue. The Link Confirm message is the first data sent over the RoCE fabric. Check for the following issues:

  • If you are using VLANs, verify that the VLAN configuration on the RoCE switch ports is consistent with the VLAN configuration on the OSD switch ports.
    For example, the OSD switch ports might be configured properly with no VLAN ID or the default VLAN ID, but the RoCE switch ports have a different VLAN ID configured, such as trunk mode with VLAN IDs 400 and 500.
  • Verify that your cable is plugged into the correct port on the RoCE Express feature and into the correct port on the switch.
    For example, the cable might be plugged into the correct port on the RoCE Express feature but the wrong port on the switch, or into the correct port on the switch but the wrong port on the RoCE Express feature.
  • Verify that the MTU value configured on the switch is large enough to support your configured MTU size for this interface.
    Hint: When using a 2K MTU, enable jumbo frame support on the RoCE switch ports.
  • If multiple switches are in use, verify that the switch uplinks are configured properly.

Verify that Ethernet flow control, which is implemented by using global pause frames, is enabled on your switch. If it is not enabled, the switch can be overrun, which leads to packet loss.
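The VLAN and MTU checks in the list above can be sketched as a simple comparison of two switch port configurations. This is a hypothetical Python sketch, not an IBM tool: the SwitchPort record stands in for values you would read from your switch by hand, "consistent" VLAN configuration is modeled (as an assumption) as identical VLAN ID sets, and the MTU check compares MTU values only, ignoring frame header overhead.

```python
from dataclasses import dataclass
from typing import FrozenSet, List

STANDARD_ETHERNET_MTU = 1500  # bytes; larger frames need jumbo frame support


@dataclass(frozen=True)
class SwitchPort:
    """Hypothetical record of values read from a switch port's configuration."""
    name: str
    vlan_ids: FrozenSet[int] = frozenset()  # empty set = no VLAN ID / default VLAN
    mtu: int = STANDARD_ETHERNET_MTU


def check_switch_ports(osd: SwitchPort, roce: SwitchPort, interface_mtu: int) -> List[str]:
    """Return findings for the VLAN and MTU checks; an empty list means none found."""
    findings = []
    # Assumption: "consistent" is modeled as identical VLAN ID sets.
    if osd.vlan_ids != roce.vlan_ids:
        findings.append(
            f"VLAN mismatch: {osd.name} has {sorted(osd.vlan_ids)}, "
            f"{roce.name} has {sorted(roce.vlan_ids)}")
    # Simplification: compares MTU values only, ignoring frame header overhead.
    if roce.mtu < interface_mtu:
        msg = f"{roce.name} MTU {roce.mtu} is smaller than interface MTU {interface_mtu}"
        if interface_mtu > STANDARD_ETHERNET_MTU:
            msg += "; enable jumbo frame support"
        findings.append(msg)
    return findings


if __name__ == "__main__":
    # The VLAN example from the list above: OSD ports untagged, RoCE ports trunked.
    osd = SwitchPort("OSD switch port")
    roce = SwitchPort("RoCE switch port", frozenset({400, 500}))
    for finding in check_switch_ports(osd, roce, interface_mtu=2048):
        print(finding)
```

A clean result here does not rule out cabling, uplink, or flow control problems, which still need to be checked on the switch itself.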

For more information about configuring VLANs, see VLANID considerations in z/OS Communications Server IP Configuration Guide.

Physical network ID configuration issues

The TCP/IP stack must be able to determine which physical network is connected to a particular 10GbE RoCE Express interface. It can then associate that 10GbE RoCE Express interface (as the associated RNIC interface) with the SMC-R capable IPAQENET or IPAQENET6 interfaces that connect to the same physical network.

Use the Netstat DEvlinks/-d and D NET,TRL,TRLE=xxxx commands to verify the physical network ID (PNetID) value on the OSD interfaces and the 10GbE RoCE Express interfaces.

  • If the Netstat DEvlinks/-d report for your OSD interface indicates SMCR: DISABLED (NO PNETID), ensure that you configured the PNetID value on the correct OSD port in the HCD definitions.
  • If you receive message EZD2028I with reason PNETID IS NOT CONFIGURED during 10GbE RoCE Express interface activation, ensure that you configured the PNetID value on the correct 10GbE RoCE Express port in the HCD definitions.
  • If the Netstat DEvlinks/-d report for your OSD interface indicates SMCR: Yes and your 10GbE RoCE Express interfaces initialized successfully, verify that the PNetID value of the OSD interface matches the PNetID value of the intended 10GbE RoCE Express interfaces.

In the HCD definitions, the Physical network ID 1 value is for port 1 on 10GbE RoCE Express features and port 0 on OSD adapters, and the Physical network ID 2 value is for port 2 on 10GbE RoCE Express features and port 1 on OSD adapters. The Physical network ID 3 and Physical network ID 4 values are not used.
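The one-off port numbering between the two feature types is an easy place to make a mistake. The hypothetical Python sketch below encodes the HCD mapping described above and checks whether an OSD port and a 10GbE RoCE Express port end up with the same PNetID. The dictionary layout and the PNetID value NETA are illustrative assumptions; the code does not read real HCD definitions.

```python
from typing import Dict, Optional

# HCD "Physical network ID" slot -> adapter port, as described above.
# Slots 3 and 4 are not used, so they are deliberately absent.
SLOT_TO_PORT: Dict[str, Dict[int, int]] = {
    "ROCE": {1: 1, 2: 2},  # 10GbE RoCE Express feature: ports 1 and 2
    "OSD": {1: 0, 2: 1},   # OSD adapter: ports 0 and 1
}


def pnetid_for_port(feature: str, port: int, hcd_pnetids: Dict[int, str]) -> Optional[str]:
    """Return the PNetID that applies to `port`, or None if that slot is empty.

    `hcd_pnetids` is a hypothetical dict of slot number -> PNetID value.
    """
    for slot, mapped_port in SLOT_TO_PORT[feature].items():
        if mapped_port == port:
            return hcd_pnetids.get(slot, "").strip() or None
    raise ValueError(f"{feature} has no port {port}")


def pnetids_match(osd_port: int, osd_ids: Dict[int, str],
                  roce_port: int, roce_ids: Dict[int, str]) -> bool:
    """True only if both ports have a PNetID and the values are identical."""
    osd = pnetid_for_port("OSD", osd_port, osd_ids)
    roce = pnetid_for_port("ROCE", roce_port, roce_ids)
    return osd is not None and osd == roce


if __name__ == "__main__":
    osd_ids = {1: "NETA"}   # Physical network ID 1 -> OSD port 0
    roce_ids = {1: "NETA"}  # Physical network ID 1 -> RoCE port 1
    print(pnetids_match(0, osd_ids, 1, roce_ids))  # same physical network
    print(pnetids_match(1, osd_ids, 1, roce_ids))  # OSD port 1 has no PNetID
```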

For more information about configuring PNetIDs, see Physical network considerations in z/OS V2R1 Communications Server IP Configuration Guide.

No associated subnet mask

SMC-R is used only between peers whose IPv4 interfaces have the same subnet value or whose IPv6 interfaces have at least one prefix in common.

  • For IPv4, when a subnet mask value is not configured for the OSD interface, the SMCR field of the Netstat DEvlinks/-d report is DISABLED (NO SUBNET MASK).
  • For IPv4, you might also see that the SMCReason code in the Netstat ALL/-A report is 521E PEER SUBNET/PREFIX MISMATCH.
  • For IPv6, the SMCReason code in the Netstat ALL/-A report is 521E PEER SUBNET/PREFIX MISMATCH.
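The eligibility rule above (same IPv4 subnet, or at least one common IPv6 prefix) can be checked offline with Python's standard ipaddress module. This is a minimal sketch under stated assumptions: addresses are given as hypothetical "address/mask" strings, an IPv4 address without a mask is treated as ineligible (mirroring SMCR: DISABLED (NO SUBNET MASK)), and a matching subnet is assumed to require both the same network address and the same mask.

```python
import ipaddress
from typing import Iterable


def ipv4_same_subnet(local: str, peer: str) -> bool:
    """Compare 'address/mask' strings such as '10.1.1.1/255.255.255.0'.

    Assumption: both the network address and the mask must match.
    """
    if "/" not in local or "/" not in peer:
        # No mask: comparable to SMCR: DISABLED (NO SUBNET MASK).
        return False
    return ipaddress.IPv4Interface(local).network == ipaddress.IPv4Interface(peer).network


def ipv6_common_prefix(local_prefixes: Iterable[str], peer_prefixes: Iterable[str]) -> bool:
    """True if at least one IPv6 prefix appears on both sides."""
    mine = {ipaddress.IPv6Network(p) for p in local_prefixes}
    theirs = {ipaddress.IPv6Network(p) for p in peer_prefixes}
    return bool(mine & theirs)


if __name__ == "__main__":
    print(ipv4_same_subnet("10.1.1.1/255.255.255.0", "10.1.1.2/24"))
    print(ipv4_same_subnet("10.1.1.1/255.255.255.0", "10.1.2.2/24"))
    print(ipv6_common_prefix(["2001:db8:1::/64", "2001:db8:2::/64"], ["2001:db8:2::/64"]))
```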

For information about associating your interfaces with the appropriate subnet or prefix, see Configuring Shared Memory Communications – RDMA in z/OS V2R1 Communications Server IP Configuration Guide.

PFID status remains STARTING

The PFIDSTATUS field is the RNIC interface PFID status. The following list describes the possible status values:

  • READY
    Indicates that the initialization sequence with the PFID is complete and the PFID is now ready.
  • NOT ACTIVE
    Indicates that the PFID was never started or was stopped after it was started.
  • STARTING
    Indicates that a START of the PFID was issued and TCP/IP sent an activation request to the Data Link Control (DLC) layer, but z/OS Communications Server has not yet received a port state change event from the RoCE Express adapter indicating that the port is active. The PFIDSTATUS remains STARTING until that event is received.

If the PFIDSTATUS field does not change from STARTING to READY, take the following actions:

  • Check that your cables are connected properly.
  • Verify that the switch ports are enabled.
  • If the RoCE adapters are directly connected (hard-wired) to each other, the STARTING status is expected until the partner side has started the RNIC interface.
  • Verify that the optical cable used for the RoCE adapter is not damaged.
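
The status descriptions and follow-up actions above can be summarized as a small lookup that returns the next steps for a given PFIDSTATUS value. This is a hypothetical Python sketch: the status strings come from this document, the directly_connected flag is an illustrative assumption, and the code does not query TCP/IP or issue any commands.

```python
from enum import Enum
from typing import List


class PfidStatus(Enum):
    READY = "READY"
    NOT_ACTIVE = "NOT ACTIVE"
    STARTING = "STARTING"


# Checks for a PFID that stays in STARTING, taken from the list above.
STARTING_CHECKS = [
    "Check that your cables are connected properly.",
    "Verify that the switch ports are enabled.",
    "Verify that the optical cable used for the RoCE adapter is not damaged.",
]


def next_steps(status_text: str, directly_connected: bool = False) -> List[str]:
    """Return follow-up actions for a PFIDSTATUS value (raises ValueError if unknown)."""
    status = PfidStatus(status_text.strip().upper())
    if status is PfidStatus.READY:
        return []  # initialization is complete; nothing to do
    if status is PfidStatus.NOT_ACTIVE:
        return ["The PFID was never started or was stopped; start it if SMC-R should use it."]
    # STARTING: still waiting for the adapter's "port active" state change event.
    steps = list(STARTING_CHECKS)
    if directly_connected:
        steps.insert(0, "Expected until the partner side starts its RNIC interface.")
    return steps


if __name__ == "__main__":
    for line in next_steps("STARTING", directly_connected=True):
        print(line)
```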

Problem with SMC-R interaction with security function

Generally, security functions that require TCP/IP to examine TCP packets cannot be used with SMC-R communications because data that is sent over SMC-R links is not converted into TCP packets. For more information, see Security functions in z/OS V2R1 Communications Server IP Configuration Guide.

Recommended Maintenance

See Info APAR II14751.

Product: z/OS Communications Server
Platform: z/OS
Versions: 2.1, 2.2

Document Information

Modified date:
17 June 2018

UID

swg27039578