Determining the health of the integrated SMB server
IBM Storage Scale provides several commands for determining the health of the SMB server.
Use the following commands to check the health of SMB services:
- To check the overall CES cluster state, issue the following command:
    mmlscluster --ces
The system displays output similar to this:
    GPFS cluster information
    ========================
      GPFS cluster name:         boris.nsd001st001
      GPFS cluster id:           3992680047366063927

    Cluster Export Services global parameters
    -----------------------------------------
      Shared root directory:                /gpfs/fs0
      Enabled Services:                     NFS SMB
      Log level:                            2
      Address distribution policy:          even-coverage

     Node  Daemon node name  IP address     CES IP address list
    -----------------------------------------------------------------------
       4   prt001st001       172.31.132.1   10.18.24.25 10.18.24.32 10.18.24.34 10.18.24.36 9.11.102.89
       5   prt002st001       172.31.132.2   9.11.102.90 10.18.24.19 10.18.24.21 10.18.24.23 10.18.24.30
       6   prt003st001       172.31.132.3   10.18.24.38 10.18.24.39 10.18.24.41 10.18.24.42 9.11.102.43
       7   prt004st001       172.31.132.4   9.11.102.37 10.18.24.26 10.18.24.28 10.18.24.18 10.18.24.44
       8   prt005st001       172.31.132.5   9.11.102.36 10.18.24.17 10.18.24.33 10.18.24.35 10.18.24.37
       9   prt006st001       172.31.132.6   9.11.102.41 10.18.24.24 10.18.24.20 10.18.24.22 10.18.24.40
      10   prt007st001       172.31.132.7   9.11.102.42 10.18.24.31 10.18.24.27 10.18.24.29 10.18.24.43
This output shows at a glance whether nodes have failed and whether they host public IP addresses. For successful SMB operation, at least one CES node must be HEALTHY and must host at least one IP address.
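The check described above can be scripted. The following is a minimal sketch, assuming the node table has the same column layout as the sample output (node number, daemon node name, daemon IP address, then the CES IP address list). The function name `ces_nodes_with_ips` is hypothetical, and the sample text is a shortened copy of the output shown above.

```python
import re

# Matches a dotted-quad IPv4 address such as 10.18.24.25.
IPV4 = re.compile(r"^\d{1,3}(?:\.\d{1,3}){3}$")

def ces_nodes_with_ips(output):
    """Map each daemon node name to the CES IP addresses listed on its row."""
    nodes = {}
    for line in output.splitlines():
        fields = line.split()
        # Node rows start with a numeric node id, then the node name,
        # then the daemon IP address (assumed layout, taken from the sample).
        if len(fields) >= 3 and fields[0].isdigit() and IPV4.match(fields[2]):
            nodes[fields[1]] = [f for f in fields[3:] if IPV4.match(f)]
    return nodes

sample = """\
 Node  Daemon node name  IP address     CES IP address list
-----------------------------------------------------------------------
   4   prt001st001       172.31.132.1   10.18.24.25 10.18.24.32
   5   prt002st001       172.31.132.2   9.11.102.90
"""

nodes = ces_nodes_with_ips(sample)
# Nodes that currently host at least one CES IP address.
hosting = [name for name, ips in nodes.items() if ips]
print(hosting)
```

An empty `hosting` list would indicate that no CES node holds a public IP address, which is worth investigating before looking at SMB itself.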
- To show which services are enabled, issue the following command:
    mmces service list
The system displays output similar to this:
    Enabled services: NFS SMB
    NFS is running, SMB is running
For successful SMB operation, SMB needs to be enabled and running.
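Both conditions (enabled and running) can be checked from the command output. This is a sketch only, assuming the two-line output format shown above; the helper name `smb_enabled_and_running` is hypothetical.

```python
def smb_enabled_and_running(output):
    """Return True if SMB appears as both enabled and running."""
    enabled = set()
    running = set()
    for line in output.splitlines():
        line = line.strip()
        if line.startswith("Enabled services:"):
            # Everything after the colon is a space-separated service list.
            enabled = set(line.split(":", 1)[1].split())
        else:
            # Status line: comma-separated "<SERVICE> is running" phrases.
            for part in line.split(","):
                words = part.split()
                if len(words) == 3 and words[1:] == ["is", "running"]:
                    running.add(words[0])
    return "SMB" in enabled and "SMB" in running

sample = "Enabled services: NFS SMB\nNFS is running, SMB is running\n"
print(smb_enabled_and_running(sample))
```

Any status phrase other than the exact form "is running" is treated as not running, which is a deliberately conservative choice.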
- To determine the overall health state of SMB on all CES nodes, issue the following
command:
    mmces state show SMB -a
The system displays output similar to this:
    NODE         SMB
    prt001st001  HEALTHY
    prt002st001  HEALTHY
    prt003st001  HEALTHY
    prt004st001  HEALTHY
    prt005st001  HEALTHY
    prt006st001  HEALTHY
    prt007st001  HEALTHY
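On a large cluster it can help to list only the nodes whose SMB state is not HEALTHY. The sketch below assumes the two-column NODE/SMB layout shown above. The function name `unhealthy_smb_nodes` is hypothetical, and the FAILED row in the sample is invented for illustration; it does not come from a real system.

```python
def unhealthy_smb_nodes(output):
    """Return {node: state} for every node whose SMB state is not HEALTHY."""
    result = {}
    for line in output.splitlines():
        fields = line.split()
        # Skip blank lines and the "NODE SMB" header row.
        if len(fields) < 2 or fields[0] == "NODE":
            continue
        node, state = fields[0], fields[1]
        if state != "HEALTHY":
            result[node] = state
    return result

sample = """\
NODE         SMB
prt001st001  HEALTHY
prt002st001  FAILED
prt003st001  HEALTHY
"""
# The FAILED row above is a made-up example value.
print(unhealthy_smb_nodes(sample))
```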
- To show the reason for a currently active (failed) state on all
nodes, issue the following command:
    mmces events active SMB -a
The system displays output similar to this:
    NODE COMPONENT EVENT NAME SEVERITY DETAILS
In this case, nothing is listed because all nodes are healthy and there are no active events. If a node were unhealthy, the output would look similar to this:
    NODE         COMPONENT  EVENT NAME  SEVERITY  DETAILS
    prt001st001  SMB        ctdb_down   ERROR     CTDB process not running
    prt001st001  SMB        smbd_down   ERROR     SMBD process not running
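The active-event table can be turned into records for alerting or reporting. This is a minimal sketch, assuming whitespace-separated columns in the order shown above, with the free-text DETAILS column last; `parse_active_events` is a hypothetical helper name.

```python
def parse_active_events(output):
    """Parse rows of the active-events table into dictionaries."""
    events = []
    for line in output.splitlines():
        # Split into at most five fields so DETAILS keeps its spaces.
        fields = line.split(None, 4)
        if len(fields) < 4 or fields[0] == "NODE":
            continue  # blank line or header row
        events.append({
            "node": fields[0],
            "component": fields[1],
            "event": fields[2],
            "severity": fields[3],
            "details": fields[4].strip() if len(fields) == 5 else "",
        })
    return events

sample = """\
NODE         COMPONENT  EVENT NAME  SEVERITY  DETAILS
prt001st001  SMB        ctdb_down   ERROR     CTDB process not running
prt001st001  SMB        smbd_down   ERROR     SMBD process not running
"""
for ev in parse_active_events(sample):
    print(ev["node"], ev["event"], ev["severity"])
```

A header-only output, as in the healthy case, yields an empty list.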
- To show the history of events generated by the monitoring framework, issue the following
command:
    mmces events list SMB
The system displays output similar to this:
    NODE         TIMESTAMP                            EVENT NAME      SEVERITY  DETAILS
    prt001st001  2015-05-27 14:15:48.540577+07:07MST  smbd_up         INFO      SMBD process now running
    prt001st001  2015-05-27 14:16:03.572012+07:07MST  smbport_up      INFO      SMB port 445 is now active
    prt001st001  2015-05-27 14:28:19.306654+07:07MST  ctdb_recovery   WARNING   CTDB Recovery detected
    prt001st001  2015-05-27 14:28:34.329090+07:07MST  ctdb_recovered  INFO      CTDB Recovery finished
    prt001st001  2015-05-27 14:33:06.002599+07:07MST  ctdb_recovery   WARNING   CTDB Recovery detected
    prt001st001  2015-05-27 14:33:19.619583+07:07MST  ctdb_recovered  INFO      CTDB Recovery finished
    prt001st001  2015-05-27 14:43:50.331985+07:07MST  ctdb_recovery   WARNING   CTDB Recovery detected
    prt001st001  2015-05-27 14:44:20.285768+07:07MST  ctdb_recovered  INFO      CTDB Recovery finished
    prt001st001  2015-05-27 15:06:07.302641+07:07MST  ctdb_recovery   WARNING   CTDB Recovery detected
    prt001st001  2015-05-27 15:06:21.609064+07:07MST  ctdb_recovered  INFO      CTDB Recovery finished
    prt001st001  2015-05-27 22:19:31.773404+07:07MST  ctdb_recovery   WARNING   CTDB Recovery detected
    prt001st001  2015-05-27 22:19:46.839876+07:07MST  ctdb_recovered  INFO      CTDB Recovery finished
    prt001st001  2015-05-27 22:22:47.346001+07:07MST  ctdb_recovery   WARNING   CTDB Recovery detected
    prt001st001  2015-05-27 22:23:02.050512+07:07MST  ctdb_recovered  INFO      CTDB Recovery finished
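An event history like the one above pairs each ctdb_recovery warning with a later ctdb_recovered event, so the time spent in recovery can be estimated from it. The sketch below assumes the column layout shown in the sample and ignores the zone suffix on the timestamp, since only differences on the same node are computed. The function name `recovery_durations` is hypothetical.

```python
import re
from datetime import datetime

# Captures the HH:MM:SS.ffffff part and drops any trailing zone suffix.
CLOCK = re.compile(r"^\d{2}:\d{2}:\d{2}\.\d{1,6}")

def recovery_durations(output):
    """Return (node, seconds) for each ctdb_recovery/ctdb_recovered pair."""
    started = {}
    durations = []
    for line in output.splitlines():
        fields = line.split(None, 5)
        if len(fields) < 5 or fields[0] == "NODE":
            continue  # blank line or header row
        node, date, clock, event = fields[:4]
        match = CLOCK.match(clock)
        if not match:
            continue  # timestamp not in the assumed format
        stamp = datetime.strptime(date + " " + match.group(0),
                                  "%Y-%m-%d %H:%M:%S.%f")
        if event == "ctdb_recovery":
            started[node] = stamp
        elif event == "ctdb_recovered" and node in started:
            delta = stamp - started.pop(node)
            durations.append((node, delta.total_seconds()))
    return durations

sample = """\
NODE         TIMESTAMP                            EVENT NAME      SEVERITY  DETAILS
prt001st001  2015-05-27 14:28:19.306654+07:07MST  ctdb_recovery   WARNING   CTDB Recovery detected
prt001st001  2015-05-27 14:28:34.329090+07:07MST  ctdb_recovered  INFO      CTDB Recovery finished
"""
for node, seconds in recovery_durations(sample):
    print(node, round(seconds, 1))
```

Frequent or long recoveries in this list are a hint to look at the CTDB and network state on the affected node.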
- To retrieve the monitoring state from the health monitoring component,
issue the following command:
    mmces state show
The system displays output similar to this:
    NODE         AUTH      NETWORK  NFS      OBJECT    SMB       CES
    prt001st001  DISABLED  HEALTHY  HEALTHY  DISABLED  DISABLED  HEALTHY
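Because this output has one column per component, a script can read it by pairing the header names with each row. This is a sketch assuming a single header line followed by one row per node, as in the sample above; `parse_component_states` is a hypothetical helper name.

```python
def parse_component_states(output):
    """Return {node: {component: state}} from the state table."""
    rows = [line.split() for line in output.splitlines() if line.strip()]
    if not rows:
        return {}
    header, body = rows[0], rows[1:]
    # header[0] is "NODE"; the remaining names are component columns.
    return {row[0]: dict(zip(header[1:], row[1:])) for row in body}

sample = """\
NODE         AUTH      NETWORK  NFS      OBJECT    SMB       CES
prt001st001  DISABLED  HEALTHY  HEALTHY  DISABLED  DISABLED  HEALTHY
"""
states = parse_component_states(sample)
print(states["prt001st001"]["SMB"])
```

In the sample row the SMB column reads DISABLED, so a check built on this helper would flag that node as not serving SMB.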
- To check the monitor log, issue the following
command:
    grep smb /var/adm/ras/mmsysmonitor.log | head -n 10
The system displays output similar to this:
    2016-04-27T03:37:12.2 prt2st1 I Monitor smb service LocalState:HEALTHY Events:0 Entities:0 - Service.monitor:596
    2016-04-27T03:37:27.2 prt2st1 I Monitor smb service LocalState:HEALTHY Events:0 Entities:0 - Service.monitor:596
    2016-04-27T03:37:42.3 prt2st1 I Monitor smb service LocalState:HEALTHY Events:0 Entities:0 - Service.monitor:596
    2016-04-27T03:37:57.2 prt2st1 I Monitor smb service LocalState:HEALTHY Events:0 Entities:0 - Service.monitor:596
    2016-04-27T03:38:12.4 prt2st1 I Monitor smb service LocalState:HEALTHY Events:0 Entities:0 - Service.monitor:596
    2016-04-27T03:38:27.2 prt2st1 I Monitor smb service LocalState:HEALTHY Events:0 Entities:0 - Service.monitor:596
    2016-04-27T03:38:42.5 prt2st1 I Monitor smb service LocalState:HEALTHY Events:0 Entities:0 - Service.monitor:596
    2016-04-27T03:38:57.2 prt2st1 I Monitor smb service LocalState:HEALTHY Events:0 Entities:0 - Service.monitor:596
    2016-04-27T03:39:12.2 prt2st1 I Monitor smb service LocalState:HEALTHY Events:0 Entities:0 - Service.monitor:596
    2016-04-27T03:39:27.6 prt2st1 I Monitor smb service LocalState:HEALTHY Events:0 Entities:0 - Service.monitor:596
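When the log is long, it is usually the entries that are not HEALTHY, or that report events, that matter. The following sketch filters monitor lines of the form shown above. The name `smb_monitor_problems` is hypothetical, and the second sample line (LocalState:DEGRADED) is invented for illustration rather than taken from a real log.

```python
import re

# Layout assumed from the sample: timestamp, host, level, then the message.
MONITOR_LINE = re.compile(
    r"^(?P<time>\S+)\s+(?P<host>\S+)\s+\S+\s+Monitor smb service "
    r"LocalState:(?P<state>\w+)\s+Events:(?P<events>\d+)"
)

def smb_monitor_problems(log_text):
    """Return (time, host, state, events) for entries that need attention."""
    problems = []
    for line in log_text.splitlines():
        m = MONITOR_LINE.match(line.strip())
        if m and (m.group("state") != "HEALTHY" or int(m.group("events")) > 0):
            problems.append((m.group("time"), m.group("host"),
                             m.group("state"), int(m.group("events"))))
    return problems

sample = """\
2016-04-27T03:37:12.2 prt2st1 I Monitor smb service LocalState:HEALTHY Events:0 Entities:0 - Service.monitor:596
2016-04-27T03:37:27.2 prt2st1 I Monitor smb service LocalState:DEGRADED Events:1 Entities:0 - Service.monitor:596
"""
# The DEGRADED line above is a made-up example entry.
for entry in smb_monitor_problems(sample):
    print(entry)
```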
- The following logs can also be checked:
    /var/adm/ras/*
    /var/log/messages