EGO log files
Log files contain important runtime information about the general health of EGO daemons, workload submissions, and other EGO system events. Log files are an essential troubleshooting tool during production and testing.
The naming convention for most EGO log files is the name of the daemon plus the host name the daemon is running on.
The following table outlines the daemons and their associated log file names. Log files on Windows hosts have a .txt extension.
Daemon | Log file name |
---|---|
ESC (EGO service controller) | esc.log.hostname |
named | named.log.hostname |
PEM (Process Execution Manager) | pem.log.hostname |
VEMKD (enterprise grid orchestrator kernel daemon) | vemkd.log.hostname |
WSG (web service gateway) | wsg.log |
Most log entries are informational in nature. It is not uncommon to have a large (and growing) log file and still have a healthy cluster.
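The naming convention in the table can be sketched as a small helper. This is only an illustration: the function name `ego_log_file_name` and the host name `hosta` are hypothetical, and the code assumes that the Windows `.txt` extension is appended to the end of the full file name.

```python
def ego_log_file_name(daemon: str, hostname: str, windows: bool = False) -> str:
    """Build a log file name of the form '<daemon>.log.<hostname>'."""
    # wsg is the exception in the table: its log file name carries no host name.
    if daemon == "wsg":
        name = "wsg.log"
    else:
        name = f"{daemon}.log.{hostname}"
    # Assumption: on Windows hosts, .txt is appended to the complete name.
    return name + ".txt" if windows else name


print(ego_log_file_name("vemkd", "hosta"))  # vemkd.log.hosta
```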
EGO log file locations
By default, most IBM® Spectrum LSF log files are found in LSF_LOGDIR.
- The service controller log files are found in LSF_LOGDIR/ego/cluster_name/eservice/esc/log (Linux) or LSF_LOGDIR\ego\cluster_name\eservice\esc\log (Windows).
- Web service gateway log files are found in the following locations:
  - On UNIX and Linux: LSF_LOGDIR/ego/cluster_name/eservice/wsg/log
  - On Windows: LSF_LOGDIR\ego\cluster_name\eservice\wsg\log
- The service director log files, logged by BIND, are found in the following locations:
  - On UNIX and Linux: LSF_LOGDIR/ego/cluster_name/eservice/esd/conf/named/namedb/named.log.hostname
  - On Windows: LSF_LOGDIR\ego\cluster_name\eservice\esd\conf\named\namedb\named.log.hostname
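As a rough sketch, the directory layout listed above could be assembled programmatically as follows. The function name `service_log_dir`, the mapping `SERVICE_LOG_SUBDIRS`, and the sample values `/var/log/lsf` and `cluster1` are hypothetical; only the directory components come from the list.

```python
from pathlib import PurePosixPath, PureWindowsPath

# Sub-directories below LSF_LOGDIR/ego/<cluster_name>/eservice, as listed above.
SERVICE_LOG_SUBDIRS = {
    "esc": ("esc", "log"),
    "wsg": ("wsg", "log"),
    "named": ("esd", "conf", "named", "namedb"),
}


def service_log_dir(lsf_logdir, cluster_name, service, windows=False):
    """Return the directory that holds the log files of an EGO service."""
    # Pure paths only format the path; they never touch the file system.
    path_type = PureWindowsPath if windows else PurePosixPath
    return path_type(lsf_logdir, "ego", cluster_name, "eservice",
                     *SERVICE_LOG_SUBDIRS[service])


print(service_log_dir("/var/log/lsf", "cluster1", "esc"))
# /var/log/lsf/ego/cluster1/eservice/esc/log
```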
EGO log entry format
Log file entries have the following format:
date time_zone log_level [process_id:thread_id] action:description/message
where the date is expressed in the form YYYY-MM-DD hh:mm:ss.sss. For example:
2006-03-14 11:02:44.000 Eastern Standard Time ERROR [2488:1036] vemkdexit: vemkd is halting
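When scanning log files with a script, the entry format can be split into its fields with a regular expression. The following is a minimal sketch, not an official parser: the names `LogEntry`, `ENTRY_RE`, and `parse_entry` are hypothetical, and the pattern assumes that the time zone may contain spaces and that the log level is a single uppercase token, as in the sample entry.

```python
import re
from typing import NamedTuple, Optional


class LogEntry(NamedTuple):
    timestamp: str
    time_zone: str
    log_level: str
    process_id: int
    thread_id: int
    action: str
    message: str


# date time_zone log_level [process_id:thread_id] action:description/message
ENTRY_RE = re.compile(
    r"^(?P<timestamp>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3}) "
    r"(?P<time_zone>.+?) "            # may span several words
    r"(?P<log_level>[A-Z_]+) "        # assumed to be one uppercase token
    r"\[(?P<pid>\d+):(?P<tid>\d+)\] "
    r"(?P<action>[^:]+):\s*(?P<message>.*)$"
)


def parse_entry(line: str) -> Optional[LogEntry]:
    """Return the parsed fields, or None if the line does not match the format."""
    match = ENTRY_RE.match(line.strip())
    if match is None:
        return None
    return LogEntry(
        match["timestamp"], match["time_zone"], match["log_level"],
        int(match["pid"]), int(match["tid"]),
        match["action"], match["message"],
    )


entry = parse_entry(
    "2006-03-14 11:02:44.000 Eastern Standard Time ERROR [2488:1036] "
    "vemkdexit: vemkd is halting"
)
print(entry.log_level, entry.action)  # ERROR vemkdexit
```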
EGO log classes
Every log entry belongs to a log class. You can use log class as a mechanism to filter log entries by area. Log classes in combination with log levels allow you to troubleshoot using log entries that only address, for example, configuration.
Use egosh debug to adjust log classes at run time.
Class | Description |
---|---|
LC_ALLOC | Logs messages related to the resource allocation engine |
LC_AUTH | Logs messages related to users and authentication |
LC_CLIENT | Logs messages related to clients |
LC_COMM | Logs messages related to communications |
LC_CONF | Logs messages related to configuration |
LC_CONTAINER | Logs messages related to activities |
LC_EVENT | Logs messages related to the event notification service |
LC_MEM | Logs messages related to memory allocation |
LC_PEM | Logs messages related to the process execution manager (pem) |
LC_PERF | Logs messages related to performance |
LC_QUERY | Logs messages related to client queries |
LC_RECOVER | Logs messages related to recovery and data persistence |
LC_RSRC | Logs messages related to resources, including host status changes |
LC_SYS | Logs messages related to system calls |
LC_TRACE | Logs the steps of the program |
EGO log levels
There are nine log levels that allow administrators to control the level of event information that is logged.
When you are troubleshooting, increase the log level to obtain as much detailed information as you can. When you are finished troubleshooting, decrease the log level to prevent the log files from becoming too large.
Number | Level | Description |
---|---|---|
0 | LOG_EMERG | Log only those messages in which the system is unusable. |
1 | LOG_ALERT | Log only those messages for which action must be taken immediately. |
2 | LOG_CRIT | Log only those messages that are critical. |
3 | LOG_ERR | Log only those messages that indicate error conditions. |
4 | LOG_WARNING | Log only those messages that are warnings or more serious messages. This is the default level of debug information. |
5 | LOG_NOTICE | Log those messages that indicate normal but significant conditions or warnings and more serious messages. |
6 | LOG_INFO | Log all informational messages and more serious messages. |
7 | LOG_DEBUG | Log all debug-level messages. |
8 | LOG_TRACE | Log all available messages. |
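The table implies a simple threshold rule: a lower number means a more serious message, and a message is logged when it is at least as serious as the configured level. The sketch below models that rule only. The names `EgoLogLevel` and `is_logged` are hypothetical and do not correspond to any EGO API.

```python
from enum import IntEnum


class EgoLogLevel(IntEnum):
    """Log level numbers as listed in the table above."""
    LOG_EMERG = 0
    LOG_ALERT = 1
    LOG_CRIT = 2
    LOG_ERR = 3
    LOG_WARNING = 4
    LOG_NOTICE = 5
    LOG_INFO = 6
    LOG_DEBUG = 7
    LOG_TRACE = 8


def is_logged(message_level: EgoLogLevel, configured_level: EgoLogLevel) -> bool:
    """A message is logged if its number does not exceed the configured level."""
    return message_level <= configured_level


# With the default LOG_WARNING, errors are logged but informational messages are not.
print(is_logged(EgoLogLevel.LOG_ERR, EgoLogLevel.LOG_WARNING))   # True
print(is_logged(EgoLogLevel.LOG_INFO, EgoLogLevel.LOG_WARNING))  # False
```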
EGO log level and class information retrieved from configuration files
When EGO is enabled, the pem and vemkd daemons read the ego.conf file to retrieve the following information (as corresponds to the particular daemon):
- EGO_LOG_MASK: The log level used to determine the amount of detail logged.
- EGO_DEBUG_PEM: The log class setting for pem.
- EGO_DEBUG_VEMKD: The log class setting for vemkd.
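To check which of these values a host is configured with, the three parameters can be pulled out of the file with a few lines of script. This sketch assumes that ego.conf uses one `NAME=value` pair per line with `#` comments. The function name `read_log_settings` and the sample values are hypothetical.

```python
def read_log_settings(conf_text: str) -> dict:
    """Extract the logging parameters named above from ego.conf-style text."""
    wanted = {"EGO_LOG_MASK", "EGO_DEBUG_PEM", "EGO_DEBUG_VEMKD"}
    settings = {}
    for raw_line in conf_text.splitlines():
        line = raw_line.strip()
        # Skip blank lines and comments.
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if sep and key.strip() in wanted:
            # Drop optional surrounding double quotes from the value.
            settings[key.strip()] = value.strip().strip('"')
    return settings


# Hypothetical excerpt, for illustration only.
sample = """\
# logging settings
EGO_LOG_MASK=LOG_WARNING
EGO_DEBUG_VEMKD="LC_CONF"
"""
print(read_log_settings(sample))
```

Parameters that are absent from the file are simply missing from the returned dictionary, so a caller can tell an unset parameter from an empty one.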
The wsg daemon reads the wsg.conf file to retrieve the following information:
- WSG_PORT: The port on which the web service gateway runs.
- WSG_SSL: Whether the daemon should use Secure Socket Layer (SSL) for communication.
- WSG_DEBUG_DETAIL: The log level used to determine the amount of detail logged for debugging purposes.
- WSG_LOGDIR: The directory location where the wsg.log files are written.
The service director daemon (named) reads named.conf to retrieve the log class and severity information. The configured severity controls the level of event information that is logged (critical, error, warning, notice, info, debug, or dynamic). When the severity is set to debug, a log level is also required to determine the amount of detail logged for debugging purposes.
Why do log files grow so quickly?
Every time an EGO system event occurs, a log file entry is added to a log file. Most entries are informational in nature, except when there is an error condition. If your log levels provide entries for all information (for example, if you have set them to LOG_DEBUG), the files will grow quickly.
The following are suggested settings:
- During regular EGO operation, set your log level to LOG_WARNING. With this setting, warnings and more serious messages are logged but informational entries are not, keeping the log file size to a minimum.
- For troubleshooting purposes, set your log level to LOG_DEBUG. Because of the quantity of messages you will receive when subscribed to this log level, change the level back to LOG_WARNING as soon as you are finished troubleshooting.
How often should I maintain log files?
The growth rate of the log files depends on the log level and the complexity of your cluster. If you have a large cluster, daily log file maintenance may be required.
You should use a log file rotation utility to do unattended maintenance of your log files. Failure to do timely maintenance could result in a full file system, which hinders system performance and operation.
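Alongside a rotation utility, a small check can report which log files have grown past a chosen size so that maintenance can be scheduled before the file system fills. The following is a sketch under stated assumptions: the function name `find_large_logs` and the size threshold are hypothetical, and log files are recognized only by `.log` appearing in the file name. The script reports files and does not rotate or delete anything.

```python
from pathlib import Path


def find_large_logs(log_dir, max_bytes):
    """Return (path, size) pairs for log files larger than max_bytes, largest first."""
    large = []
    for path in Path(log_dir).rglob("*"):
        # Matches names such as vemkd.log.hostname and wsg.log.
        if path.is_file() and ".log" in path.name:
            size = path.stat().st_size
            if size > max_bytes:
                large.append((path, size))
    return sorted(large, key=lambda item: item[1], reverse=True)
```

For example, `find_large_logs(log_dir, 500 * 1024 * 1024)` would list every log file above 500 MB under `log_dir`, where `log_dir` is the value of LSF_LOGDIR on the host.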