HSM entries in the syslog file

The HSM client writes entries to the syslog file. Together with the dsmerror.log file, these entries can help you identify HSM problems.

The following HSM events are logged to the syslog file:

Start and stop of daemon processes
The start and stop times and the process ID (PID) of daemon processes are logged, as shown in the following example.
Nov 20 08:48:38 nodeA dsmrecalld: HSM(pid:13756): start master 
Nov 20 08:48:38 nodeA dsmrecalld: HSM(pid:13797): start distributor 
Nov 20 08:48:38 nodeA dsmrecalld: HSM(pid:13798): start receiver 
Nov 20 08:48:38 nodeA dsmrecalld: HSM(pid:24026): start PERMANENT recall worker (ID:3;MIN:3;MAX:5)
Nov 20 08:48:38 nodeA dsmrecalld: HSM(pid:24027): start TEMPORARY recall worker (ID:4;MIN:3;MAX:5)
Nov 20 08:48:38 nodeA dsmrecalld: HSM(pid:24027): stop TEMPORARY recall worker (ID:4)
Nov 20 08:48:38 nodeA dsmrecalld: HSM(pid:20964): stop master

Log entries are created for the following daemon types:

Master

The master daemon owns the DMAPI sessions for a space-managed file system and responds to DMAPI data and system events. Exactly one master daemon must run on each node where HSM is active. If the master daemon stops, the distributor and receiver daemons also stop.

Distributor

The distributor daemon manages recall operations on the node where it runs and starts the recall worker daemons. Exactly one distributor daemon must run on each node where HSM is active.

Receiver

The receiver daemon accepts recall requests from other cluster nodes and sends them to the local distributor daemon. Exactly one receiver daemon must run on each node where HSM is active.

Recall worker

The recall worker daemon completes recall operations.

If a recall worker daemon starts while fewer recall worker daemons are running than the value of the MINRECALLDAEMONS option, the status of the new daemon is permanent (PERMANENT). Otherwise, the status of the new daemon is temporary (TEMPORARY). A temporary daemon is stopped after its file recall operation is finished.

The value of the MINRECALLDAEMONS option is indicated by the value of MIN in the log entry. The value of the MAXRECALLDAEMONS option is indicated by the value of MAX in the log entry.

Tip: Inspect the recall-worker daemon log entries and determine whether you have to adjust the value of the MAXRECALLDAEMONS and MINRECALLDAEMONS options to maximize recall processing.

If the syslog file contains many log entries where the recall worker daemon ID value equals the MAX value, increase the value of the MAXRECALLDAEMONS option.

If the syslog file indicates that many temporary recall worker daemons are stopped, increase the value of the MINRECALLDAEMONS option.
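
The two checks in the preceding tip can be sketched as a small script. This is only an illustration, not part of the HSM client: the regular expressions are derived from the sample log entries in this topic, the function name summarize_recall_workers is hypothetical, and the documentation does not define how many entries count as "many", so the script only reports counts.

```python
import re
from collections import Counter

# Patterns derived from the sample log entries in this topic (assumption:
# other HSM versions might format these entries differently).
WORKER_START = re.compile(
    r"start (?P<status>PERMANENT|TEMPORARY) recall worker "
    r"\(ID:(?P<id>\d+);MIN:(?P<min>\d+);MAX:(?P<max>\d+)\)"
)
WORKER_STOP = re.compile(
    r"stop (?P<status>PERMANENT|TEMPORARY) recall worker \(ID:\d+\)"
)


def summarize_recall_workers(lines):
    """Count worker starts, starts where ID equals MAX, and temporary stops."""
    counts = Counter()
    for line in lines:
        start = WORKER_START.search(line)
        if start:
            counts["starts"] += 1
            if int(start["id"]) == int(start["max"]):
                counts["starts_at_max"] += 1
            continue
        stop = WORKER_STOP.search(line)
        if stop and stop["status"] == "TEMPORARY":
            counts["temporary_stops"] += 1
    return counts


# Recall worker entries copied from the example at the start of this topic.
sample = [
    "Nov 20 08:48:38 nodeA dsmrecalld: HSM(pid:24026): start PERMANENT recall worker (ID:3;MIN:3;MAX:5)",
    "Nov 20 08:48:38 nodeA dsmrecalld: HSM(pid:24027): start TEMPORARY recall worker (ID:4;MIN:3;MAX:5)",
    "Nov 20 08:48:38 nodeA dsmrecalld: HSM(pid:24027): stop TEMPORARY recall worker (ID:4)",
]
summary = summarize_recall_workers(sample)
print("starts at MAX:", summary["starts_at_max"])
print("temporary workers stopped:", summary["temporary_stops"])
```

A high starts_at_max count relative to starts suggests reviewing the MAXRECALLDAEMONS option. A high temporary_stops count suggests reviewing the MINRECALLDAEMONS option.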

Sent and received signals

All signals that are sent from an HSM process or received by an HSM process are logged as shown in the following figure. Typically, a daemon process stops after the process receives a signal.

Nov 20 08:48:09 nodeA dsmwatchd: HSM(pid:7823): signal:15 (Terminated) send to pid:30579
Nov 20 08:48:09 nodeA dsmrecalld: HSM(pid:30579): signal:15 (Terminated) received
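
To find out which process stopped a daemon, the sent and received entries can be matched on the target PID and the signal number. The following sketch assumes that signal entries always have the layout of the two sample entries above. The function name parse_signal and the dictionary keys are hypothetical.

```python
import re

# Pattern derived from the two sample signal entries (assumption: the
# layout is stable across HSM versions).
SIGNAL = re.compile(
    r"(?P<daemon>\S+): HSM\(pid:(?P<pid>\d+)\): signal:(?P<signo>\d+) "
    r"\((?P<name>[^)]*)\) (?:send to pid:(?P<target>\d+)|received)"
)


def parse_signal(line):
    """Return a dict for a signal entry, or None if the line does not match."""
    m = SIGNAL.search(line)
    if m is None:
        return None
    target = m["target"]
    return {
        "daemon": m["daemon"],
        "pid": int(m["pid"]),
        "signal": int(m["signo"]),
        "name": m["name"],
        # target is set only for entries that record a sent signal
        "target": int(target) if target is not None else None,
    }


sample = [
    "Nov 20 08:48:09 nodeA dsmwatchd: HSM(pid:7823): signal:15 (Terminated) send to pid:30579",
    "Nov 20 08:48:09 nodeA dsmrecalld: HSM(pid:30579): signal:15 (Terminated) received",
]
events = [e for e in map(parse_signal, sample) if e is not None]
sent = [e for e in events if e["target"] is not None]
received = [e for e in events if e["target"] is None]

# Pair each sent signal with the entry of the process that received it.
pairs = [
    (s, r)
    for s in sent
    for r in received
    if s["target"] == r["pid"] and s["signal"] == r["signal"]
]
for s, r in pairs:
    print(f"{s['daemon']} (pid {s['pid']}) sent signal {s['signal']} "
          f"to {r['daemon']} (pid {r['pid']})")
```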

File system mount events

All mount events that are received by the HSM client are logged as shown in the following figure.

Nov 20 08:41:17 nodeA dsmrecalld: HSM(pid:30539): received DM_EVENT_MOUNT for fs:/gpfs2 type:DM_LOCAL_MOUNT
Nov 20 08:41:52 nodeA dsmrecalld: HSM(pid:30539): received DM_EVENT_MOUNT for fs:/gpfs1

Creation of dump files

A log entry is created when the dsmwatchd daemon creates a dump file.

Nov 20 08:41:52 nodeA dsmwatchd: HSM(pid:19418): created dump file: /tmp/hsm/dump.dmapi.2015.12.8.8.43.33
Nov 20 08:41:52 nodeA dsmwatchd: HSM(pid:19418): created dump file: /tmp/hsm/dump.dsmwatchd.2015.12.8.8.43.33

System events that stop the recall service
The following conditions can stop the dsmrecalld recall service:
  • The GPFS file system on a cluster node stops.
  • The dsmrecalld service does not respond. The PID file time stamp is not updated.
  • The number or combination of dsmrecalld daemon processes is not correct.
The following log entries show system events that stop or restart the recall service.
Nov 20 08:40:41 nodeA dsmwatchd: HSM(pid:7823): Stop local recall service. Reason: GPFS down
Nov 20 08:48:32 nodeA dsmwatchd: HSM(pid:7823): Restart local recall service. Reasons: invalid process list
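
To see how often each condition occurs, the stop and restart entries can be extracted and grouped by reason. This is a minimal sketch that assumes the entry layout of the two sample entries above, including the "Reason:" and "Reasons:" spellings. The function name recall_service_events is hypothetical.

```python
import re
from collections import Counter

# Pattern derived from the two sample entries; "Reasons?" accepts both the
# singular and the plural label that appear in the sample.
RECALL_SERVICE = re.compile(
    r"(?P<action>Stop|Restart) local recall service\. Reasons?: (?P<reason>.+)"
)


def recall_service_events(lines):
    """Return (action, reason) tuples for recall service stop and restart entries."""
    events = []
    for line in lines:
        m = RECALL_SERVICE.search(line)
        if m:
            events.append((m["action"], m["reason"].strip()))
    return events


sample = [
    "Nov 20 08:40:41 nodeA dsmwatchd: HSM(pid:7823): Stop local recall service. Reason: GPFS down",
    "Nov 20 08:48:32 nodeA dsmwatchd: HSM(pid:7823): Restart local recall service. Reasons: invalid process list",
]
events = recall_service_events(sample)
by_reason = Counter(reason for _, reason in events)
for reason, count in by_reason.items():
    print(f"{reason}: {count}")
```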