IBM Support

Troubleshooting a tftpd issue in the xcatd service

Troubleshooting


Problem

In an HPC system that uses the xcatd service, if the tftpd server on the head node is not working correctly, compute node installation hangs because the files needed for installation cannot be downloaded from the head node. The compute node's screen shows output similar to the following:
TFTP BOOT ----------------------------
Server IP...............10.116.54.66
Client IP...............10.116.54.120
Gateway IP..............10.116.54.66
Subnet Mask.............255.255.255.192
(1) Filename............/boot/grub2/grub2.ppc
TFTP Retries............5
Block Size..............512

Diagnosing The Problem

Use the following commands to check the status of the tftpd service:

1. Check the status of the tftpd process on the head node:

# ps -eaf | grep tftpd

Sample output when the tftpd daemon is running:
[root@cluster sbin]# ps -eaf |grep tftp
root 23974 1 0 Dec12 ? 00:00:00 /usr/sbin/in.tftpd -v -l -s /tftpboot -m /etc/tftpmapfile4xcat.conf

2. Use lsof -i to check that tftpd is listening on UDP port 69:

# lsof -i udp:69

Sample output when tftpd is listening:
[root@cluster sbin]# lsof -i udp:69
COMMAND    PID USER   FD   TYPE DEVICE SIZE/OFF NODE NAME
in.tftpd 23974 root    4u  IPv4  77490      0t0  UDP *:tftp
in.tftpd 23974 root    5u  IPv6  77491      0t0  UDP *:tftp

In the problem case, the tftpd daemon process is missing, so neither command shows an in.tftpd entry.
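The two checks above can be combined into a small script, sketched below. It assumes ps (procps) and lsof are installed and that it runs as root on the head node, so lsof can see sockets owned by root. The function names `has_tftpd_process`, `has_tftpd_listener`, and `check_tftpd` are illustrative only and are not part of xCAT or Platform HPC.

```shell
#!/bin/sh
# Minimal sketch of the two tftpd checks. Assumes ps (procps) and lsof are
# installed and that this runs as root on the head node.
# All function names here are hypothetical, for illustration only.

# Succeeds if captured `ps -eaf` output contains an in.tftpd process.
has_tftpd_process() {
    printf '%s\n' "$1" | grep -q 'in\.tftpd'
}

# Succeeds if captured `lsof -i udp:69` output shows in.tftpd bound to the
# tftp port. lsof prints the port as "tftp" (or "69" when run with -P).
has_tftpd_listener() {
    printf '%s\n' "$1" |
        awk '$1 == "in.tftpd" && $NF ~ /:(tftp|69)$/ { found = 1 } END { exit !found }'
}

# Runs both checks against the live system and prints a one-line verdict.
# Returns non-zero in the problem case (daemon missing or not listening).
check_tftpd() {
    if ! has_tftpd_process "$(ps -eaf)"; then
        echo "tftpd: no in.tftpd process found"
        return 1
    fi
    if ! has_tftpd_listener "$(lsof -i udp:69 2>/dev/null)"; then
        echo "tftpd: in.tftpd is running but not listening on UDP port 69"
        return 1
    fi
    echo "tftpd: in.tftpd is running and listening on UDP port 69"
}
```

For example, source the file on the head node and run `check_tftpd` as root. The parsing is kept in separate functions that take captured command output, so they can be checked against sample output such as the transcripts in this note without touching the live system.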

Resolving The Problem

To fix this issue, restart the tftpd service on the head node, or restart the head node.
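This note does not say which service manager starts tftpd on the head node. Where that is unclear, the sketch below starts the daemon directly if no instance is running. It reuses the binary path and options from the healthy `ps -eaf` sample in the diagnosis steps; that reuse is an assumption, so confirm the path, TFTP root, and map file against your own head node first. The names `start_tftpd_if_missing`, `TFTPD_BIN`, `TFTP_ROOT`, and `TFTP_MAPFILE` are hypothetical. Restarting the tftpd service or the head node, as described above, remains the recommended fix.

```shell
#!/bin/sh
# Minimal sketch: start in.tftpd by hand when no instance is running.
# ASSUMPTION: binary path and options are copied from the healthy
# `ps -eaf` sample in this note; check them on your head node first.
# Variable and function names are hypothetical. Must run as root.

TFTPD_BIN=/usr/sbin/in.tftpd
TFTP_ROOT=/tftpboot
TFTP_MAPFILE=/etc/tftpmapfile4xcat.conf

start_tftpd_if_missing() {
    # pgrep -x matches the process name exactly, so only in.tftpd counts.
    if pgrep -x in.tftpd >/dev/null; then
        echo "in.tftpd is already running; nothing to start"
        return 0
    fi
    # -l: standalone (listen) mode; the daemon detaches from the terminal
    # -s: change root to TFTP_ROOT so client paths resolve under it
    # -m: apply the filename remapping rules in TFTP_MAPFILE
    # -v: increase logging verbosity
    "$TFTPD_BIN" -v -l -s "$TFTP_ROOT" -m "$TFTP_MAPFILE" || return 1
    # Give the daemon a moment, then show what is bound to UDP port 69.
    sleep 1
    lsof -i udp:69
}
```

Run `start_tftpd_if_missing` as root and compare the printed lsof output with the healthy sample in step 2 of the diagnosis.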

Product: Platform HPC for System x
Component: PXE-TFTP
Platform: Linux
Version: 4.2
Business Unit: IBM Software w/o TPS
Line of Business: Data and AI

Document Information

Modified date:
17 May 2021

UID

isg3T1026724