Analyzing adjacency failures

OMPROUTE reports an adjacency failure through the neighbor event code in its informational message. The events that are associated with adjacency failures are indicated by codes 7 – 15:

Table 1. Neighbor event codes
Event | Description | Explanation
1 | Hello received | OMPROUTE received hello packets from a neighbor.
2 | Start sending hellos | OMPROUTE is ready to start sending hello packets to a neighbor.
3 | Two-way communication | OMPROUTE reached bidirectional communication with a neighbor. The neighbor and OMPROUTE received and acknowledged hello packets from each other.
4 | Ready to form adjacency | OMPROUTE is ready to establish adjacency with a neighbor.
5 | Master/slave role negotiation is done | OMPROUTE completed negotiating the master and slave roles with a neighbor. This signals the start of sending and receiving of database descriptor packets.
6 | Database exchange is done | OMPROUTE completed exchanging the database descriptor packets for the network topology information. Each router now knows which part of its link state database (LSDB) is outdated.
7 | Sequence number mismatch | OMPROUTE received a sequence number mismatch in a database descriptor packet. A neighbor might be attempting to restart the adjacency for some reason, resulting in sequence number mismatches. This event indicates that the neighbor was not receiving hello packets from OMPROUTE, experienced event 12 on its side, and is trying to restart.
8 | Bad link state request | OMPROUTE received a bad link state request packet from a neighbor. A subverted router in the network possibly modified the contents of a link state advertisement (LSA) to mount a maximum sequence number or maximum age attack. As the LSA is flooded, the neighbors replace the good LSA in their databases with the bad LSA, which appears newer, until it ages out naturally after a maximum of 1 hour.
9 | Loading is done | OMPROUTE completed the loading of the link state information for the link state database, which is based on the link state request packets from a neighbor.
10 | One-way communication only | OMPROUTE does not see its router ID in the hello packets that are received from a neighbor. If a neighbor is not receiving the hello packets with the OMPROUTE router ID, it assumes that OMPROUTE is down and removes the OMPROUTE router ID from its list of neighbors. The neighbor does not include the OMPROUTE router ID in the hello packets sent to OMPROUTE.
11 | Neighbor is down | Because the OMPROUTE routing interface is down, OMPROUTE is unable to communicate with a neighbor over that interface. The neighbor is set to down state.
12 | No hellos were seen recently | OMPROUTE did not receive hello packets from a neighbor for a full DEAD_ROUTER_INTERVAL and, as a consequence, OMPROUTE assumes that the neighbor is down.
14 | Start adjacency establishment | OMPROUTE starts to establish an adjacency with a neighbor after it reaches bidirectional communication with the neighbor. OMPROUTE sends database descriptor packets to negotiate master or slave status with the neighbor to establish an adjacency.
15 | Failure to thrive | OMPROUTE was trying to establish adjacency with a neighbor but failed to complete within the DB_EXCHANGE_INTERVAL.
16 | Adjacency attempts threshold is reached | OMPROUTE has reached the futile neighbor state loop threshold (DR_MAX_ADJ_ATTEMPT) for adjacency attempts with a neighbor. For more information about the actions to be taken, see Preventing futile neighbor state loops during adjacency formation in z/OS Communications Server: IP Configuration Guide.
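
As a reading aid, the event codes in Table 1 can be held in a small lookup. The following Python sketch is illustrative only and is not part of OMPROUTE: the names NEIGHBOR_EVENTS, FAILURE_RANGE, and describe_event are hypothetical, and the code assumes that you have already extracted the numeric event code from the informational message yourself.

```python
# Event codes and descriptions copied from Table 1 (code 13 is not listed there).
NEIGHBOR_EVENTS = {
    1: "Hello received",
    2: "Start sending hellos",
    3: "Two-way communication",
    4: "Ready to form adjacency",
    5: "Master/slave role negotiation is done",
    6: "Database exchange is done",
    7: "Sequence number mismatch",
    8: "Bad link state request",
    9: "Loading is done",
    10: "One-way communication only",
    11: "Neighbor is down",
    12: "No hellos were seen recently",
    14: "Start adjacency establishment",
    15: "Failure to thrive",
    16: "Adjacency attempts threshold is reached",
}

# The text associates adjacency failures with codes 7 through 15.
FAILURE_RANGE = range(7, 16)


def describe_event(code: int) -> str:
    """Return a one-line description of a neighbor event code."""
    name = NEIGHBOR_EVENTS.get(code, "unknown event")
    flag = " (in adjacency-failure range)" if code in FAILURE_RANGE else ""
    return f"event {code}: {name}{flag}"
```

For example, describe_event(12) labels the event as "No hellos were seen recently" and marks it as falling in the range that the text associates with adjacency failures.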

With the neighbor event code, the old neighbor state indicates the highest neighbor state that OMPROUTE reached for the adjacency attempt and the new neighbor state indicates the changed state. When OMPROUTE cannot reach the two-way state (8) or loses the bidirectional communication with a neighboring router, OMPROUTE reverts to the lesser neighbor states 1 – 4. On non-multiaccess networks such as point-to-point links, OMPROUTE attempts to reach full adjacency with all neighboring routers. On multiaccess networks such as LANs (including HiperSockets™ LANs), OMPROUTE attempts to reach full adjacency only with neighbors that are designated routers or backup designated routers. If OMPROUTE is the designated router or the backup designated router on a multiaccess network, it attempts to reach full adjacency with all neighbors.
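
The rule for when full adjacency is expected can be sketched as a small predicate. This is a minimal illustration of the paragraph above, not OMPROUTE code; the function name and its parameters are hypothetical.

```python
def expects_full_adjacency(multiaccess: bool,
                           local_is_dr_or_bdr: bool,
                           neighbor_is_dr_or_bdr: bool) -> bool:
    """Return True if full adjacency with this neighbor is expected.

    multiaccess: True for LANs (including HiperSockets LANs),
        False for non-multiaccess networks such as point-to-point links.
    local_is_dr_or_bdr: OMPROUTE is the designated router or backup.
    neighbor_is_dr_or_bdr: the neighbor is the designated router or backup.
    """
    if not multiaccess:
        # Full adjacency is attempted with all neighboring routers.
        return True
    # On multiaccess networks, at least one side must be the DR or BDR.
    return local_is_dr_or_bdr or neighbor_is_dr_or_bdr
```

When this predicate is False for a neighbor, a neighbor that stays in the two-way state (8) is not by itself a sign of an adjacency failure.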

Guideline: OMPROUTE is not intended to be a designated router or a backup designated router on multiaccess networks that include dedicated routers. The dedicated routers can provide the designated router function. Configure OMPROUTE to assume the designated router role only when no dedicated routers are available, such as in a HiperSockets LAN in which all the neighbors are z/OS® LPARs.

Starting at the two-way state for bidirectional communication, the neighbor states 8 – 128 are used by OMPROUTE in attempts to reach full adjacency with a neighboring router, as shown in the following table:

Table 2. Neighbor states
State | Description | Explanation
1 | Neighbor is down or unreachable | No hello packets are received from the neighbor. Possible causes are as follows:
  • The routing interface in OMPROUTE is down
  • The neighbor is inactive
  • The neighbor is not OSPF-enabled
  • The neighbor is not sending hello packets with its unique router ID and matching attributes (hello interval, dead router interval, subnet mask, area ID, and security scheme)
  • Hello packets were dropped between OMPROUTE and the neighbor
  • Hello packets are corrupted
2 | Attempting to contact neighbor | OMPROUTE sent hello packets to the network but no hello packets were received from a neighbor. This state is valid only for manually configured neighbors in a non-broadcast multiaccess (NBMA) environment. For possible causes, see state 1.
4 | One-way communication | OMPROUTE received hello packets from a neighbor with the router ID of the neighbor. If OMPROUTE remained in this state, acknowledgments to hello packets that contain the OMPROUTE router ID were not received from the neighbor. Possible causes are as follows:
  • The neighbor did not receive the hello packets from OMPROUTE that contain its router ID
  • The neighbor is not including the OMPROUTE router ID in its hello packets that are sent to OMPROUTE
  • Hello packets were dropped between OMPROUTE and the neighbor
  • Hello packets are corrupted
  • A new designated router must be elected after communication with the designated router and backup designated router was lost
8 | Two-way communication | The neighbor and OMPROUTE received and acknowledged hello packets from each other. In the list of neighbors, OMPROUTE has the neighbor router ID and the neighbor has the OMPROUTE router ID as learned from the hello packets. If OMPROUTE remained in or reverted to this state during the adjacency formation, possible causes are as follows:
  • Neither OMPROUTE nor the neighbor is the designated router or backup designated router for the network. On a multiaccess network, full adjacencies are established only with the designated router and its backup.
  • A database descriptor packet is received with:
    • Mismatched sequence number
    • Unexpected init bit set
    • Options field differs from the last Options field received
  • Acknowledgments to database descriptor packets were dropped between OMPROUTE and the neighbor.
16 | Database exchange start | OMPROUTE is negotiating master and slave roles with a neighbor. Designated routers and backup designated routers establish a master-slave relationship and choose the initial sequence number for the adjacency formation. The neighbor with the higher router ID becomes the master and starts the exchange. The master is also responsible for the sequence number increment. If OMPROUTE remained in this state, possible causes are as follows:
  • Mismatched interface maximum transmission unit (MTU) values between OMPROUTE and the neighbor result in packet losses
  • Duplicate router IDs on neighbors
  • An access list is blocking the unicast packets
  • NAT is translating the unicast packets
32 | Database exchange | OMPROUTE is exchanging database information with a neighbor in the form of database descriptor packets. These packets contain LSA headers only and describe the contents of the entire link state database (LSDB). Each database descriptor packet has a sequence number that only the master can increment and that the slave explicitly acknowledges. The packet contents that are received are compared with the information contained in the LSDB for new or more current link state information. Routers also send link state request packets and link state update packets that contain the entire LSA. If OMPROUTE remained in this state, possible causes are as follows:
  • Corrupted database descriptor packets that are sent by a neighbor or by a network switch
  • See state 16 for other possible causes
64 | Loading | Based on the information that the database descriptor packets provide, OMPROUTE sends link state request packets to request the more up-to-date pieces of the neighbor database. The neighbor then provides the requested link state information in link state update packets. During the adjacency formation, if a router finds that it has an outdated or missing LSA, it requests that LSA by sending a link state request packet. All link state update packets are acknowledged. If OMPROUTE remained in this state, possible causes are as follows:
  • Corrupted link state request packets that are sent by a neighbor or by a network switch
  • See state 16 for other possible causes
128 | Full | OMPROUTE established full adjacency with a neighbor. On a multiaccess network, routers achieve the full state with their designated router and backup designated router only; all other neighbors remain in the two-way state (8) with each other.
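
Because the old neighbor state in a message is the highest state that was reached and the new neighbor state is the changed state, comparing the two shows how far an adjacency attempt progressed before it fell back. The following Python sketch illustrates that comparison with the state values from Table 2. The names NEIGHBOR_STATES and summarize_transition are hypothetical, and the two state values are assumed to have been read from the message by hand.

```python
# State values and descriptions copied from Table 2.
NEIGHBOR_STATES = {
    1: "Neighbor is down or unreachable",
    2: "Attempting to contact neighbor",
    4: "One-way communication",
    8: "Two-way communication",
    16: "Database exchange start",
    32: "Database exchange",
    64: "Loading",
    128: "Full",
}


def summarize_transition(old_state: int, new_state: int) -> str:
    """Describe a neighbor state change and whether it moved forward or back."""
    old_name = NEIGHBOR_STATES[old_state]
    new_name = NEIGHBOR_STATES[new_state]
    if new_state > old_state:
        direction = "progressed"
    elif new_state < old_state:
        direction = "regressed"
    else:
        direction = "unchanged"
    return f"{old_state} ({old_name}) -> {new_state} ({new_name}): {direction}"
```

For example, summarize_transition(32, 8) reports that the attempt reached the database exchange state and then regressed to two-way communication.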

The order of the neighbor transit states for establishing adjacency is 8, 16, 32, 64, and 128. Whenever a problem is detected between those states before full adjacency is reached, OMPROUTE resets the neighbor state to two-way (8) and repeats the process continuously, even to the point where it becomes futile. A futile neighbor state loop appears as a repeating pattern of transit states that never reaches full adjacency. For example, typical patterns are 8-16, 8-16 or 8-16-32, 8-16-32. After each adjacency failure, OMPROUTE continues to attempt to establish adjacency with a neighbor over the same network interface. If futile neighbor state loop detection is enabled and redundant parallel interfaces (primary or backup) that are attached to the same LAN segment are available, OMPROUTE suspends the problematic interface and retries the adjacency attempt over the alternative interface. Alternatively, use the MODIFY OMPROUTE commands to manually suspend the problematic interface and activate an alternative redundant parallel OSPF interface so that adjacency with the neighbor is attempted over that interface.
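
To make the loop pattern concrete, the following Python sketch counts adjacency attempts that left the two-way state (8) and fell back before reaching full (128). It is an illustration of the pattern only and does not reproduce the internal OMPROUTE detection logic. The function names and the max_attempts parameter are hypothetical; resetting the count on a successful adjacency is also an assumption of this sketch, and no value for DR_MAX_ADJ_ATTEMPT is implied.

```python
from typing import Iterable


def count_failed_attempts(states: Iterable[int]) -> int:
    """Count attempts that rose above state 8 and fell back before state 128.

    The count restarts whenever full adjacency (128) is reached
    (an assumption of this sketch).
    """
    failed = 0
    in_attempt = False
    for state in states:
        if state == 128:
            # Adjacency completed: clear the streak of failures.
            in_attempt = False
            failed = 0
        elif state > 8:
            # States 16, 32, and 64 mean an attempt is in progress.
            in_attempt = True
        elif in_attempt:
            # Fell back to two-way (8) or lower before reaching full.
            failed += 1
            in_attempt = False
    return failed


def looks_futile(states: Iterable[int], max_attempts: int) -> bool:
    """Return True if the failed-attempt count reaches the given threshold."""
    return count_failed_attempts(states) >= max_attempts
```

For the observed sequence 8, 16, 32, 8, 16, 32, 8, the sketch counts two failed attempts, which matches the 8-16-32, 8-16-32 pattern described above.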

For information about futile neighbor state loops, see the topic about network design considerations with z/OS Communications Server in z/OS Communications Server: IP Configuration Guide. For details about MODIFY OMPROUTE commands, see z/OS Communications Server: IP System Administrator's Commands.

OMPROUTE can drop adjacencies under the following conditions:

  • Other workloads on the z/OS system keep OMPROUTE from getting enough processor cycles:
    • Dumps are being taken while OMPROUTE is running. All address spaces are marked non-dispatchable during dump processing. If the dump takes longer than a DEAD_ROUTER_INTERVAL, the adjacencies fail.
    • There are too many other address spaces that are running at a higher priority than OMPROUTE. Because OMPROUTE is a time-sensitive application and manages the TCP/IP routing table, set the OMPROUTE priority to be one less than the TCP/IP dispatching priority. If you are using WLM goal modes, place OMPROUTE in the same service class as TCP/IP.
  • Not enough dispatching for OMPROUTE:
    • The dispatching priority for OMPROUTE is too low. Either increase the OMPROUTE dispatching priority or increase the DEAD_ROUTER_INTERVAL values.
    • OMPROUTE is running as a BPXBATCH program. Other applications that use the BPXBATCH program might steal processor cycles from OMPROUTE. Change OMPROUTE so that it does not use the BPXBATCH program.
  • Increased workload in OMPROUTE:
    • OMPROUTE is a designated router or a backup designated router, and link state database management-related tasks can contribute to high workloads. These high workloads can affect OMPROUTE in processing of inbound hello packets necessary to maintain adjacencies with its neighbors. Either increase the DEAD_ROUTER_INTERVAL values or change the ROUTER_PRIORITY values to reduce the likelihood of OMPROUTE becoming elected as a designated router. A z/OS system is not designed to be a full-fledged router and it is best to offload this work of link state database management to the neighboring network routers. That is, configure the network routers on the attached LAN segment to be elected as designated routers when possible.
    • OMPROUTE is running with too much tracing. OMPROUTE debug trace with file I/O can contribute to adjacency failures (for example, missed hello packets). Use the OMPROUTE CTRACE method when possible.
    • The OMPROUTE routing table is large. When OMPROUTE has more than 1000 – 2000 routes, adjacency failures (for example, missed hello packets) might occur because of the increased workload from processing routing table and link state updates. Configure OMPROUTE to use stub areas when possible and try to keep z/OS out of backbone areas.
    • OMPROUTE has too many adjacencies. Too many adjacencies might be notable when you are using XCF in a sysplex environment. Because of increased workload from adjacency communications, adjacency failures (for example, missed hello packets) might occur. Determine whether it is necessary for XCF interfaces to be configured to use OSPF.
  • Network hardware problem:
    • Attached switch or router not functioning correctly.
    • Poor or faulty network cable connections.

      A network hardware problem that is beyond detection by TCP/IP or OMPROUTE can contribute to adjacency failures and futile neighbor state loops. If futile neighbor state loop detection is enabled and if there are redundant parallel interfaces that are attached to the same LAN segment, OMPROUTE attempts adjacency with the neighbor over an alternative interface. When necessary, use the MODIFY OMPROUTE commands to manually suspend and activate an alternative redundant parallel OSPF interface so that adjacency with the neighbor is attempted over that interface. OMPROUTE might circumvent the network hardware problem by using the alternative interface.

For the symptom of missed inbound or outbound hello packets, the TCP/IP stack might not be getting dispatched often enough to forward the hello packets to or from OMPROUTE. In this case, ensure that appropriate dispatching priorities are assigned to the TCP/IP stack and OMPROUTE.
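
The timing condition behind several of these cases is that a period in which OMPROUTE or the TCP/IP stack is not dispatched (for example, during a dump) must not exceed the DEAD_ROUTER_INTERVAL. The following Python sketch expresses that comparison. The function names are hypothetical, and the interval is assumed to be the value, in seconds, that is configured for the interface.

```python
def pause_margin(pause_seconds: float, dead_router_interval: float) -> float:
    """Seconds of slack left before the pause exceeds the dead router interval."""
    return dead_router_interval - pause_seconds


def adjacency_at_risk(pause_seconds: float, dead_router_interval: float) -> bool:
    """Return True if the pause lasts longer than the dead router interval."""
    return pause_margin(pause_seconds, dead_router_interval) < 0
```

For example, with an illustrative interval of 40 seconds, a 55-second pause puts the adjacencies at risk, while a 30-second pause leaves 10 seconds of margin.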

To track adjacency problems, take the following steps:

  • Issue the command to display the OSPF interfaces and analyze the following fields from the report:
    • STATE for the current interface state
    • #NBRS for the total number of neighbors whose hellos were received, plus those that were configured.
    • #ADJS for the total number of neighbors in state exchange or greater. These neighbors are the neighbors with whom the router is synchronized or is in the process of synchronization.
  • Issue the command to display a detailed OSPF interface and analyze the following fields from the report:
    • DESIGNATED ROUTER to determine whether OMPROUTE is a designated router or not.
    • BACKUP DR to determine whether OMPROUTE is a backup designated router or not.
    • DR PRIORITY to determine the interface router priority. A higher value indicates that this OMPROUTE is more likely to become the designated router. A value of 0 indicates that OMPROUTE can never become the designated router.
    • #NEIGHBORS for the total number of neighbors whose hellos were received, plus those that were configured.
    • #ADJACENCIES for the total number of neighbors in state Exchange or greater. These neighbors are the neighbors with whom the router is synchronized or is in the process of synchronization.
    • #FULL ADJS for the total number of full adjacencies. This number is the number of neighbors whose state is Full (and therefore with which the router synchronized databases).
    • #MCAST FLOODS for the total number of link state updates that flooded the interface (not counting retransmissions).
  • Issue the command to display the OSPF neighbors and analyze the following field from the report:
    • STATE for the current neighbor state.
  • Issue the command to display a detailed OSPF neighbor and analyze the following fields from the report:
    • NEIGHBOR STATE for the current neighbor state.
    • DR PRIORITY to determine the neighbor router priority.
    • #ADJ RESETS for the total number of transitions to state ExStart from a higher state.
    • #NBR LOSSES for the total number of times the neighbor made the transition to the down state.
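
The fields listed in the steps above can be cross-checked against each other. The following Python sketch shows one possible set of checks; it does not parse real display output. The function names are hypothetical, and the dictionaries are assumed to have been filled in by hand from the detailed interface and detailed neighbor reports, keyed by the field names given above.

```python
def interface_findings(report: dict) -> list:
    """Flag conditions in detailed OSPF interface fields worth a closer look."""
    findings = []
    # #ADJACENCIES counts neighbors in Exchange or greater, including Full.
    pending = report["#ADJACENCIES"] - report["#FULL ADJS"]
    if pending > 0:
        findings.append(f"{pending} neighbor(s) in Exchange or greater but not Full")
    if report["DR PRIORITY"] == 0:
        findings.append("DR PRIORITY 0: OMPROUTE can never become the designated router")
    return findings


def neighbor_findings(report: dict) -> list:
    """Flag conditions in detailed OSPF neighbor fields worth a closer look."""
    findings = []
    if report["#ADJ RESETS"] > 0:
        findings.append(f"{report['#ADJ RESETS']} transition(s) to ExStart from a higher state")
    if report["#NBR LOSSES"] > 0:
        findings.append(f"{report['#NBR LOSSES']} transition(s) to the down state")
    return findings
```

A steadily increasing #ADJ RESETS value for one neighbor, taken across repeated displays, is consistent with the futile neighbor state loop patterns described earlier in this topic.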