I just came across the AIX 7.1 Troubleshooting documentation. It covers many solutions and problem-solving strategies on key topics. Like all official documentation, it has to cover some pretty obscure topics, too. But there are enough big ticket items to make it worth bookmarking.
I've put in some of my comments to whet your appetite for further reading, or to spare you the trouble where a topic covers configurations which are not so common these days.
Find troubleshooting information located in the AIX® documentation library.
Select from the following topics:
- Accounting errors encountered when running the runacct command
- ATE troubleshooting <- That's the Asynchronous Terminal Emulator. If you're old enough to remember it, you make me feel young.
- Booting a system that crashed <- Summary - plug everything back in, run diags and power it all on again.
- BNU troubleshooting <- uucp commands, primarily for the days of dial up modems.
- Deactivating a SLIP connection
- Debug flags for sendmail
- EtherChannel troubleshooting
- Fixing a corrupt magic number in the file system superblock <- This is for JFS (not JFS2).
- Fixing a damaged file system <- unmount it (if it isn't already unmounted); run fsck. No luck? Then restore.
- Fixing incorrect accounting file permissions
- Fixing accounting errors <- Accounting system. Probably better managed by IBM Systems Director or other utilities.
- Fixes for stalled or unwanted processes <- Links to 3 down (Freeing a terminal taken over by processes)
- Fixing tacct errors
- Fixing wtmp errors
- Freeing a terminal taken over by processes <- Search and destroy (ps -ef and kill - not kill -9). Or if you're feeling nice, use renice to downgrade the process from business class.
- Modem troubleshooting <- What's a modem?
- NIS and NIS+ troubleshooting
- Paging space troubleshooting <- Increase paging space (not to mention allocating more memory, tuning and killing runaway processes).
- QoS troubleshooting
- Reactivation of an inactive system <- Hit that power button, baby! (Here "system" refers to the hardware - the whole managed system/server, not a single LPAR/virtual server.)
- Recreating a corrupted boot image <- Boot off product media (or NIM, or Virtual CD-ROM) and run bosboot.
- Rebooting an unresponsive system remotely <- Sounds impressive, doesn't it? Bring in the big guns. The system will respond to nothing other than a boot. I know people like that.
- Responding to screen messages
- Restoring user files from a backup image <- assumes you've used the AIX backup command.
- Resolving telnet or rlogin problems
- SLIP troubleshooting
- Systems that will not boot <- Here "system" refers to the operating system, not the entire managed system / server.
- Shutting down the system in an emergency <- shutdown -F (many people think -F stands for "forced", but in fact it means "fast" - don't give the users a minute's warning, or yourself a minute to rethink).
- SNMP daemon troubleshooting
- TCP/IP troubleshooting <- Lots of very relevant sub-topics, including name resolution, routing and other TCP/IP configuration problems.
- Terminating a process
- Testing the system battery <- diag -B -c (although I wonder whether this is more a function of the HMC or SDMC these days)
- Troubleshooting a system that does not boot from the hard disk
- Troubleshooting disk drive problems <- really for directly attached disk rather than SAN LUNs.
- Troubleshooting I/O devices <- for dedicated devices, rather than fully virtualised LPARs / virtual servers.
- Troubleshooting LVM <- Read this if synclvodm, varyonvg or exportvg/importvg don't do the trick.
- Troubleshooting mobile IPv6
- Troubleshooting your installation <- covers installing from mksysb, migrations to new AIX releases, full /usr file system and BOS installation procedures and error messages.
- Updating the holidays file <- Part of the accounting system.
- Workload Manager troubleshooting
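To make the "search and destroy, but not kill -9" comment above a little more concrete, here is a minimal sketch of a polite kill in plain POSIX shell. It is not taken from the IBM documentation: the function name polite_kill and the five-second default grace period are my own assumptions. It sends the default signal (SIGTERM), waits for the process to go away, and never escalates to SIGKILL.

```shell
#!/bin/sh
# polite_kill is a hypothetical helper, not an AIX command.
# Usage: polite_kill PID [SECONDS]
# Returns 0 if the process exited, 1 if it could not be signalled,
# 2 if it is still running after the grace period.
polite_kill() {
    pid=$1
    timeout=${2:-5}    # grace period in seconds; the default is arbitrary

    # Plain kill sends SIGTERM, which the process may catch to clean up.
    kill "$pid" 2>/dev/null || return 1

    while [ "$timeout" -gt 0 ]; do
        # kill -0 sends no signal; it only checks that the PID still exists.
        kill -0 "$pid" 2>/dev/null || return 0
        sleep 1
        timeout=$((timeout - 1))
    done

    # One last check after the final sleep.
    kill -0 "$pid" 2>/dev/null || return 0
    return 2
}
```

You would first find the offending PID with ps -ef (filtering on the terminal name, for example), then call polite_kill with that PID. If it returns 2, the process is ignoring SIGTERM, and that is the point at which you decide for yourself whether kill -9 is really warranted.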
Once again, the main page is the AIX 7.1 Troubleshooting documentation.