Verifying a repair
Learn how to verify hardware operation after you make repairs to the system.
1. Power on the system.
2. Did you replace a graphics processing unit (GPU), PCIe adapter, disk drive, or solid-state drive?
   - Yes: Go to step 5.
   - No: Continue with the next step.
3. Scan the system event logs (SELs) for serviceable events that occurred after system hardware was replaced. For information about SELs that require a service action, see Identifying a service action by using system event logs.
4. Did any serviceable SEL events occur after hardware was replaced?
   - Yes: The problem is not resolved. Go to Identifying a service action by using system event logs and complete the service actions indicated. This ends the procedure.
   - No: The problem is resolved. This ends the procedure.
5. Use the following table to determine the verification action to complete:
Table 1. Determining a verification action for GPUs, PCIe adapters, and devices

Adapter type: Devices that are controlled by a RAID adapter
Verification action: Complete the following steps:
- Install the arcconf utility for the RAID adapter.
- Type ARCCONF GETSMARTSTATS 1 at the command prompt and press Enter.
- Verify that the Self-Monitoring, Analysis, and Reporting Technology (SMART) health assessment for the device passed.

Adapter type: Devices that are not controlled by a RAID adapter
Verification action: Complete the following steps:
- Install the smartmontools utility: type apt-get install smartmontools at the command prompt and press Enter.
- At the command prompt, type smartctl --all /dev/sdx, where x is the letter that is associated with the drive.
- Verify that the SMART health assessment passed.

Adapter type: GPU
Verification action: Complete the following steps:
- Type nvidia-smi -L at the command prompt and press Enter. Verify that the GPU is listed.
- Type nvidia-smi -q at the command prompt and press Enter. Verify that no errors are listed.

Adapter type: Network adapter
Verification action: Complete the following steps:
- At the command prompt, type ethtool ethx, where x is the number of the physical port that you are testing. Verify that the connection speed that is indicated in the output is correct.
- Perform a ping test to verify the network connectivity.

Adapter type: RAID adapter
Verification action: Complete the following steps:
- Install the arcconf utility for the RAID adapter.
- Type ARCCONF GETLOGS 1 STATS at the command prompt and press Enter.
- Verify that usage statistics are returned. The presence of usage statistics indicates that the adapter is functioning properly.
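
Several of the checks in Table 1 involve reading command output by eye, and they can be scripted. The following is a minimal sketch, not part of the documented procedure. The function names smart_passed, gpu_count, and link_speed are hypothetical, and the text patterns assume the usual output wording of smartctl, nvidia-smi -L, and ethtool, which might differ between tool versions. Each function reads command output on standard input, so it can also be tried against saved output.

```shell
#!/bin/sh
# Hypothetical helpers for the verification checks in Table 1.
# They are illustrations only and are not provided by any vendor tool.

# Succeeds (exit status 0) if smartctl output reports a passing SMART
# health assessment. Assumption: ATA and NVMe drives print
# "... test result: PASSED" and SCSI/SAS drives print
# "SMART Health Status: OK".
smart_passed() {
    grep -Eq 'test result: PASSED|SMART Health Status: OK'
}

# Prints the number of GPUs in "nvidia-smi -L" output.
# Assumption: each GPU is listed on its own line that starts with "GPU n:".
gpu_count() {
    grep -c '^GPU [0-9][0-9]*:'
}

# Prints the value of the "Speed:" field from "ethtool <port>" output,
# so that it can be compared with the expected connection speed.
link_speed() {
    sed -n 's/^[[:space:]]*Speed:[[:space:]]*//p'
}
```

For example, after the functions are loaded into the shell, type smartctl --all /dev/sdx | smart_passed && echo passed to report a passing drive, nvidia-smi -L | gpu_count to count the listed GPUs, or ethtool ethx | link_speed to print the connection speed. As in the table, x stands for the drive letter or the port number. These helpers only summarize the output. Review the full output of nvidia-smi -q and of the arcconf commands manually.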
Last updated: Thu, December 02, 2021