How To
Summary
This document contains SQL queries for analyzing IBM i History Log (QHST) using the QSYS2.HISTORY_LOG_INFO table function. These queries help identify system issues, job failures, save, restore problems, and operational trends.
Objective
This document provides 10 comprehensive queries for operational analysis of IBM i History Log:
- Job Completion Analysis - Track job success and failure patterns using CPF1164 with End Code analysis
- Jobs That Ended Abnormally (CPF1164 Analysis) - Detailed analysis of abnormal job terminations with end code classification
- Subsystem Activity Monitoring - Monitor subsystem lifecycle and abnormal ends
- Message Frequency Analysis - Identify high-frequency messages and patterns
- Operational Summary Dashboard - Executive-level daily health metrics with trend analysis
- Device and Hardware Issues - Monitor device errors and hardware failures (excludes CPF2234)
- Communication and Network Issues - Track network errors and connectivity problems
- Save/Restore and Backup Operations - Monitor ENDSBS, SAVSYS, SAVLIB, SAVDLO, and SAV operations
- Performance and Resource Issues - Track storage, ASP, and system resource problems
- Comprehensive Message Pattern Analysis - Advanced pattern analysis across all message categories
All queries include:
- Comprehensive documentation
- Operational context
- Performance optimization with appropriate time windows
- Specific message ID references
- Use case examples
- Best practices for implementation
Use these queries as part of a comprehensive IBM i operational monitoring strategy to maintain system health, identify issues early, and optimize performance.
Environment
Minimum Version Required: IBM i V7R3M0 or higher
Steps
1. Job Completion Analysis
Description: Analyze job completion patterns to identify normal vs abnormal terminations
-- category: IBM i Services
-- description: Operations - Job completion analysis
-- minvrm: v7r3m0
-- =====================================================================
-- Query: Job Completion Analysis
-- Purpose: Analyze job completion patterns:
-- - Normal completions
-- - Abnormal terminations
-- - Job timing patterns
-- - Completion trends
-- =====================================================================
SELECT
HL.MESSAGE_TIMESTAMP,
HL.FROM_JOB,
HL.FROM_USER,
HL.MESSAGE_ID,
HL.MESSAGE_TEXT,
-- Classify completion type
CASE HL.MESSAGE_ID
WHEN 'CPC1221' THEN 'NORMAL_COMPLETION'
WHEN 'CPC1224' THEN 'ABNORMAL_COMPLETION'
WHEN 'CPF1164' THEN 'JOB_ENDED'
WHEN 'CPF1124' THEN 'JOB_STARTED'
ELSE 'OTHER'
END AS COMPLETION_TYPE,
-- Extract job information from FROM_JOB (format: number/user/jobname)
CASE
WHEN LOCATE('/', HL.FROM_JOB) > 0 THEN
SUBSTR(HL.FROM_JOB, 1, LOCATE('/', HL.FROM_JOB) - 1)
ELSE ''
END AS JOB_NUMBER,
CASE
WHEN LOCATE('/', HL.FROM_JOB, LOCATE('/', HL.FROM_JOB) + 1) > 0 THEN
SUBSTR(HL.FROM_JOB, LOCATE('/', HL.FROM_JOB) + 1,
LOCATE('/', HL.FROM_JOB, LOCATE('/', HL.FROM_JOB) + 1) - LOCATE('/', HL.FROM_JOB) - 1)
ELSE ''
END AS JOB_USER,
CASE
WHEN LOCATE('/', HL.FROM_JOB, LOCATE('/', HL.FROM_JOB) + 1) > 0 THEN
SUBSTR(HL.FROM_JOB, LOCATE('/', HL.FROM_JOB, LOCATE('/', HL.FROM_JOB) + 1) + 1)
ELSE HL.FROM_JOB
END AS JOB_NAME_ONLY,
-- Time of day analysis
HOUR(HL.MESSAGE_TIMESTAMP) AS HOUR_OF_DAY,
DAYNAME(HL.MESSAGE_TIMESTAMP) AS DAY_OF_WEEK
FROM TABLE(QSYS2.HISTORY_LOG_INFO(
START_TIME => CURRENT_TIMESTAMP - 7 DAYS
)) AS HL
WHERE HL.MESSAGE_ID IN (
'CPC1221', -- Job completed normally
'CPC1224', -- Job completed abnormally
'CPF1164', -- Job ended
'CPF1124' -- Job started
)
ORDER BY HL.MESSAGE_TIMESTAMP DESC
WITH UR;
Use Cases:
- Monitor job completion rates
- Identify jobs with frequent abnormal terminations
- Analyze job timing patterns
- Track batch job execution
Key Message IDs:
- CPC1221: Job completed normally
- CPC1224: Job completed abnormally (investigate cause)
- CPF1164: Job ended (general termination)
- CPF1124: Job started (for duration analysis)
2. Jobs That Ended Abnormally (CPF1164 Analysis)
Description: Analyze jobs that ended with CPF1164 message, focusing on abnormal end codes (20+)
-- category: IBM i Services
-- description: Operations - Jobs that ended abnormally with end code analysis
-- minvrm: v7r3m0
-- =====================================================================
-- Query: Jobs That Ended Abnormally (CPF1164 with End Code >= 20)
-- Purpose: Identify jobs that ended abnormally by analyzing CPF1164
-- messages and extracting end codes to determine severity:
-- - End Code 20: Job exceeded end severity
-- - End Code 30: Job ended abnormally
-- - End Code 40+: Various abnormal termination scenarios
-- =====================================================================
WITH CPF1164_BASE AS (
SELECT
HL.MESSAGE_TIMESTAMP,
HL.FROM_JOB,
HL.FROM_USER,
HL.MESSAGE_ID,
HL.MESSAGE_TEXT,
-- Job number
SUBSTR(HL.FROM_JOB, 1, LOCATE('/', HL.FROM_JOB) - 1) AS JOB_NUMBER,
-- Job user
SUBSTR(
HL.FROM_JOB,
LOCATE('/', HL.FROM_JOB) + 1,
LOCATE('/', HL.FROM_JOB, LOCATE('/', HL.FROM_JOB) + 1)
- LOCATE('/', HL.FROM_JOB) - 1
) AS JOB_USER,
-- Job name
SUBSTR(
HL.FROM_JOB,
LOCATE('/', HL.FROM_JOB, LOCATE('/', HL.FROM_JOB) + 1) + 1
) AS JOB_NAME_ONLY,
-- Extract End Code (lowercase "end code")
CASE
WHEN LOCATE('end code', HL.MESSAGE_TEXT) > 0 THEN
INTEGER(
RTRIM(
SUBSTR(
HL.MESSAGE_TEXT,
LOCATE('end code', HL.MESSAGE_TEXT)
+ LENGTH('end code ')
),
'.'
)
)
ELSE NULL
END AS END_CODE
FROM TABLE (
QSYS2.HISTORY_LOG_INFO(
START_TIME => CURRENT_TIMESTAMP - 7 DAYS
)
) AS HL
WHERE HL.MESSAGE_ID = 'CPF1164'
)
SELECT
MESSAGE_TIMESTAMP,
FROM_JOB,
FROM_USER,
MESSAGE_ID,
MESSAGE_TEXT,
JOB_NUMBER,
JOB_USER,
JOB_NAME_ONLY,
END_CODE,
-- End Code Meaning
CASE END_CODE
WHEN 0 THEN 'The job completed normally.'
WHEN 10 THEN 'The job completed normally during controlled ending or controlled subsystem ending.'
WHEN 20 THEN 'The job exceeded end severity (ENDSEV job attribute).'
WHEN 30 THEN 'The job ended abnormally.'
WHEN 40 THEN 'The job ended before becoming active.'
WHEN 50 THEN 'The job ended while the job was active.'
WHEN 60 THEN 'The subsystem ended abnormally while the job was active.'
WHEN 70 THEN 'The system ended abnormally while the job was active.'
WHEN 80 THEN 'The job ended (ENDJOBABN command).'
WHEN 90 THEN 'The job was forced to end after the time limit ended (ENDJOBABN).'
ELSE 'Unknown end code'
END AS END_CODE_MEANING,
HOUR(MESSAGE_TIMESTAMP) AS HOUR_OF_DAY,
DAYNAME(MESSAGE_TIMESTAMP) AS DAY_OF_WEEK
FROM CPF1164_BASE
-- Only End Codes 20 and above (abnormal endings)
WHERE END_CODE >= 20
ORDER BY MESSAGE_TIMESTAMP DESC
WITH UR;
Use Cases:
- Identify jobs that ended abnormally vs. normally
- Understand the severity of job terminations
- Track forced job endings (ENDJOBABN)
- Detect system or subsystem-related job failures
- Monitor jobs that exceeded end severity thresholds
End Code Severity Guide:
- End Code 20: Job exceeded ENDSEV - Review job logs for errors
- End Code 30: Abnormal termination - Investigate root cause
- End Code 40: Job ended before activation - Check job submission
- End Code 50: Job ended while active - Application or system issue
- End Code 60: Subsystem ended abnormally - Subsystem stability issue
- End Code 70: System ended abnormally - Critical system event
- End Code 80: Forced ending (ENDJOBABN) - Manual intervention
- End Code 90: Time limit exceeded - Job timeout issue
Key Insights:
- End codes 20-30 typically indicate application-level issues
- End codes 40-70 suggest system or subsystem problems
- End codes 80-90 indicate administrative intervention or timeouts
- Filtering for end codes >= 20 focuses on problematic terminations
3. Subsystem Activity Monitoring
Description: Monitor subsystem starts, ends, and activity patterns
-- category: IBM i Services
-- description: Operations - Subsystem activity monitoring
-- minvrm: v7r3m0
-- =====================================================================
-- Query: Subsystem Activity Monitoring
-- Purpose: Track subsystem lifecycle events:
-- - Subsystem starts and ends
-- - Unexpected terminations
-- - Activity patterns
-- =====================================================================
SELECT
HL.MESSAGE_TIMESTAMP,
HL.MESSAGE_ID,
HL.MESSAGE_TEXT,
HL.MESSAGE_SECOND_LEVEL_TEXT,
HL.FROM_USER,
HL.FROM_JOB,
-- Extract subsystem name from message text
CASE
WHEN HL.MESSAGE_TEXT LIKE '%subsystem%' THEN
TRIM(SUBSTR(HL.MESSAGE_TEXT,
LOCATE('subsystem', LOWER(HL.MESSAGE_TEXT)) + 10,
10))
ELSE 'UNKNOWN'
END AS SUBSYSTEM_NAME,
-- Classify event type
CASE HL.MESSAGE_ID
WHEN 'CPF1103' THEN 'SUBSYSTEM_STARTED'
WHEN 'CPF1804' THEN 'SUBSYSTEM_STARTING'
WHEN 'CPC1206' THEN 'SUBSYSTEM_ENDING_CONTROLLED'
WHEN 'CPC1207' THEN 'SUBSYSTEM_ENDING_IMMEDIATE'
WHEN 'CPD1418' THEN 'SUBSYSTEM_ENDING'
WHEN 'CPF0927' THEN 'SUBSYSTEM_ENDED'
ELSE 'OTHER_SUBSYSTEM_EVENT'
END AS EVENT_TYPE,
-- Time analysis
HOUR(HL.MESSAGE_TIMESTAMP) AS EVENT_HOUR,
DAYNAME(HL.MESSAGE_TIMESTAMP) AS DAY_OF_WEEK,
-- Severity
CASE HL.MESSAGE_ID
WHEN 'CPC1207' THEN 'HIGH' -- Immediate shutdown
WHEN 'CPC1206' THEN 'NORMAL' -- Controlled shutdown
WHEN 'CPF1103' THEN 'NORMAL' -- Subsystem started
WHEN 'CPF1804' THEN 'NORMAL' -- Subsystem starting
WHEN 'CPD1418' THEN 'MEDIUM' -- Subsystem ending
WHEN 'CPF0927' THEN 'MEDIUM' -- Subsystem ended
ELSE 'MEDIUM'
END AS SEVERITY
FROM TABLE(QSYS2.HISTORY_LOG_INFO(
START_TIME => CURRENT_TIMESTAMP - 30 DAYS
)) AS HL
WHERE HL.MESSAGE_ID IN (
'CPC1206', -- Subsystem is ending controlled
'CPC1207', -- Subsystem ending immediately
'CPD1418', -- Subsystem ending
'CPF0927', -- Subsystem ended
'CPF1103', -- Subsystem started
'CPF1804' -- Subsystem starting
)
ORDER BY HL.MESSAGE_TIMESTAMP DESC
WITH UR;
Use Cases:
- Monitor subsystem availability
- Detect unexpected subsystem terminations
- Track subsystem restart patterns
- Identify subsystem stability issues
Key Events:
- CPF1103: Subsystem started (normal operation)
- CPF1804: Subsystem starting (subsystem initialization)
- CPC1206: Subsystem is ending controlled (planned shutdown)
- CPC1207: Subsystem ending immediately (immediate shutdown - investigate)
- CPD1418: Subsystem ending (subsystem termination in progress)
- CPF0927: Subsystem ended (subsystem has terminated)
4. Message Frequency Analysis
Description: Identify most frequent messages and their patterns
-- category: IBM i Services
-- description: Operations - Message frequency analysis
-- minvrm: v7r3m0
-- =====================================================================
-- Query: Message Frequency Analysis
-- Purpose: Identify most common messages:
-- - High-frequency messages
-- - Message trends
-- - Potential noise vs signal
-- =====================================================================
SELECT
HL.MESSAGE_ID,
HL.MESSAGE_TYPE,
COUNT(*) AS OCCURRENCE_COUNT,
COUNT(DISTINCT HL.FROM_USER) AS UNIQUE_USERS,
COUNT(DISTINCT HL.FROM_PROGRAM) AS UNIQUE_PROGRAMS,
COUNT(DISTINCT DATE(HL.MESSAGE_TIMESTAMP)) AS DAYS_OCCURRED,
-- Sample message text (first occurrence)
MIN(HL.MESSAGE_TEXT) AS SAMPLE_MESSAGE_TEXT,
-- Time span
MIN(HL.MESSAGE_TIMESTAMP) AS FIRST_OCCURRENCE,
MAX(HL.MESSAGE_TIMESTAMP) AS LAST_OCCURRENCE,
-- Calculate frequency
DECIMAL(COUNT(*) / NULLIF(COUNT(DISTINCT DATE(HL.MESSAGE_TIMESTAMP)), 0), 10, 2) AS AVG_PER_DAY,
-- Classify frequency
CASE
WHEN COUNT(*) >= 1000 THEN 'VERY_HIGH'
WHEN COUNT(*) >= 500 THEN 'HIGH'
WHEN COUNT(*) >= 100 THEN 'MEDIUM'
WHEN COUNT(*) >= 50 THEN 'LOW'
ELSE 'RARE'
END AS FREQUENCY_LEVEL,
-- Assess if this is likely noise or signal
CASE
WHEN COUNT(*) >= 1000 AND HL.MESSAGE_TYPE = '*INFO' THEN 'LIKELY_NOISE'
WHEN COUNT(*) >= 500 AND HL.MESSAGE_TYPE IN ('*ESCAPE', '*NOTIFY') THEN 'INVESTIGATE'
WHEN COUNT(*) >= 100 THEN 'MONITOR'
ELSE 'NORMAL'
END AS ASSESSMENT
FROM TABLE(QSYS2.HISTORY_LOG_INFO(
START_TIME => CURRENT_TIMESTAMP - 30 DAYS
)) AS HL
GROUP BY HL.MESSAGE_ID, HL.MESSAGE_TYPE
HAVING COUNT(*) >= 50 -- Minimum threshold for analysis
ORDER BY OCCURRENCE_COUNT DESC
WITH UR;
Use Cases:
- Identify noisy messages for filtering
- Detect abnormal message patterns
- Optimize history log retention
- Focus on actionable messages
Frequency Levels:
- VERY_HIGH (1000+): May need filtering or investigation
- HIGH (500-999): Monitor for patterns
- MEDIUM (100-499): Normal operational messages
- LOW (50-99): Occasional events
5. Operational Summary Dashboard
Description: Executive dashboard of operational metrics
-- category: IBM i Services
-- description: Operations - Operational summary dashboard
-- minvrm: v7r3m0
-- =====================================================================
-- Query: Operational Summary Dashboard
-- Purpose: High-level operational metrics:
-- - Job completion rates
-- - Error trends
-- - System health indicators
-- =====================================================================
WITH DailyMetrics AS (
SELECT
DATE(HL.MESSAGE_TIMESTAMP) AS METRIC_DATE,
-- Job metrics using CPF1164 with End Code extraction
SUM(CASE
WHEN HL.MESSAGE_ID = 'CPF1164'
AND (LOCATE('end code', HL.MESSAGE_TEXT) = 0
OR INTEGER(RTRIM(SUBSTR(HL.MESSAGE_TEXT,
LOCATE('end code', HL.MESSAGE_TEXT) + LENGTH('end code ')), '.')) < 20)
THEN 1
ELSE 0
END) AS JOBS_COMPLETED_NORMAL,
SUM(CASE
WHEN HL.MESSAGE_ID = 'CPF1164'
AND LOCATE('end code', HL.MESSAGE_TEXT) > 0
AND INTEGER(RTRIM(SUBSTR(HL.MESSAGE_TEXT,
LOCATE('end code', HL.MESSAGE_TEXT) + LENGTH('end code ')), '.')) >= 20
THEN 1
ELSE 0
END) AS JOBS_COMPLETED_ABNORMAL,
-- Error metrics
SUM(CASE WHEN HL.MESSAGE_TYPE = '*ESCAPE' THEN 1 ELSE 0 END) AS ESCAPE_MESSAGES,
SUM(CASE WHEN HL.MESSAGE_ID LIKE 'MCH%' THEN 1 ELSE 0 END) AS MACHINE_ERRORS,
-- Subsystem metrics
SUM(CASE WHEN HL.MESSAGE_ID = 'CPI0955' THEN 1 ELSE 0 END) AS SUBSYSTEM_ABNORMAL_ENDS,
-- Total message volume
COUNT(*) AS TOTAL_MESSAGES
FROM TABLE(QSYS2.HISTORY_LOG_INFO(
START_TIME => CURRENT_TIMESTAMP - 30 DAYS
)) AS HL
GROUP BY DATE(HL.MESSAGE_TIMESTAMP)
)
SELECT
DM.METRIC_DATE,
DM.JOBS_COMPLETED_NORMAL,
DM.JOBS_COMPLETED_ABNORMAL,
-- Calculate job success rate
CASE
WHEN (DM.JOBS_COMPLETED_NORMAL + DM.JOBS_COMPLETED_ABNORMAL) > 0 THEN
DECIMAL(DM.JOBS_COMPLETED_NORMAL * 100.0 /
(DM.JOBS_COMPLETED_NORMAL + DM.JOBS_COMPLETED_ABNORMAL), 5, 2)
ELSE 0
END AS JOB_SUCCESS_RATE_PCT,
DM.ESCAPE_MESSAGES,
DM.MACHINE_ERRORS,
DM.SUBSYSTEM_ABNORMAL_ENDS,
DM.TOTAL_MESSAGES,
-- Health score (0-100)
CASE
WHEN DM.MACHINE_ERRORS > 10 OR DM.SUBSYSTEM_ABNORMAL_ENDS > 5 THEN 'POOR'
WHEN DM.ESCAPE_MESSAGES > 100 OR DM.JOBS_COMPLETED_ABNORMAL > 50 THEN 'FAIR'
WHEN DM.ESCAPE_MESSAGES > 50 OR DM.JOBS_COMPLETED_ABNORMAL > 20 THEN 'GOOD'
ELSE 'EXCELLENT'
END AS DAILY_HEALTH_STATUS,
-- Trend vs previous day
LAG(DM.ESCAPE_MESSAGES, 1) OVER (ORDER BY DM.METRIC_DATE) AS PREV_DAY_ESCAPES,
CASE
WHEN DM.ESCAPE_MESSAGES > LAG(DM.ESCAPE_MESSAGES, 1) OVER (ORDER BY DM.METRIC_DATE) * 1.5
THEN 'WORSENING'
WHEN DM.ESCAPE_MESSAGES < LAG(DM.ESCAPE_MESSAGES, 1) OVER (ORDER BY DM.METRIC_DATE) * 0.5
THEN 'IMPROVING'
ELSE 'STABLE'
END AS TREND
FROM DailyMetrics DM
ORDER BY DM.METRIC_DATE DESC
WITH UR;
Use Cases:
- Daily operational reporting
- Trend analysis and forecasting
- Executive dashboards
- SLA monitoring and compliance
Health Status Criteria:
- EXCELLENT: Minimal errors, high job success rate
- GOOD: Normal operational levels
- FAIR: Elevated error rates, needs monitoring
- POOR: Critical issues requiring immediate attention
6. Device and Hardware Issues
Description: Monitor device errors, hardware failures, and I/O problems
-- category: IBM i Services
-- description: Operations - Device and hardware issue monitoring
-- minvrm: v7r3m0
-- =====================================================================
-- Query: Device and Hardware Issue Monitoring
-- Purpose: Track device-related problems:
-- - Device errors and failures
-- - I/O errors
-- - Hardware issues
-- - Communication problems
-- =====================================================================
SELECT
HL.MESSAGE_TIMESTAMP,
HL.MESSAGE_ID,
HL.MESSAGE_TEXT,
HL.MESSAGE_SECOND_LEVEL_TEXT,
HL.FROM_USER,
HL.FROM_JOB,
HL.FROM_PROGRAM,
-- Classify device issue type
CASE
WHEN HL.MESSAGE_ID LIKE 'CPF%' AND HL.MESSAGE_TEXT LIKE '%device%' THEN 'DEVICE_ERROR'
WHEN HL.MESSAGE_ID LIKE 'CPF%' AND HL.MESSAGE_TEXT LIKE '%I/O%' THEN 'IO_ERROR'
WHEN HL.MESSAGE_ID LIKE 'CPF%' AND HL.MESSAGE_TEXT LIKE '%disk%' THEN 'DISK_ERROR'
WHEN HL.MESSAGE_ID LIKE 'CPF%' AND HL.MESSAGE_TEXT LIKE '%tape%' THEN 'TAPE_ERROR'
WHEN HL.MESSAGE_ID LIKE 'CPF%' AND HL.MESSAGE_TEXT LIKE '%printer%' THEN 'PRINTER_ERROR'
WHEN HL.MESSAGE_ID LIKE 'CPF%' AND HL.MESSAGE_TEXT LIKE '%communication%' THEN 'COMM_ERROR'
ELSE 'OTHER_HARDWARE'
END AS ISSUE_TYPE,
-- Extract device name if present
CASE
WHEN LOCATE('device', LOWER(HL.MESSAGE_TEXT)) > 0 THEN
TRIM(SUBSTR(HL.MESSAGE_TEXT,
LOCATE('device', LOWER(HL.MESSAGE_TEXT)) + 7,
10))
ELSE 'UNKNOWN'
END AS DEVICE_NAME,
-- Severity assessment
CASE HL.MESSAGE_TYPE
WHEN '*ESCAPE' THEN 'HIGH'
WHEN '*NOTIFY' THEN 'MEDIUM'
ELSE 'LOW'
END AS SEVERITY
FROM TABLE(QSYS2.HISTORY_LOG_INFO(
START_TIME => CURRENT_TIMESTAMP - 7 DAYS
)) AS HL
WHERE (HL.MESSAGE_ID LIKE 'CPF%' OR HL.MESSAGE_ID LIKE 'CPI%')
AND HL.MESSAGE_ID != 'CPF2234' -- Exclude CPF2234
AND (LOWER(HL.MESSAGE_TEXT) LIKE '%device%'
OR LOWER(HL.MESSAGE_TEXT) LIKE '%i/o%'
OR LOWER(HL.MESSAGE_TEXT) LIKE '%disk%'
OR LOWER(HL.MESSAGE_TEXT) LIKE '%tape%'
OR LOWER(HL.MESSAGE_TEXT) LIKE '%printer%'
OR LOWER(HL.MESSAGE_TEXT) LIKE '%hardware%'
OR LOWER(HL.MESSAGE_TEXT) LIKE '%communication%')
ORDER BY HL.MESSAGE_TIMESTAMP DESC
WITH UR;
Use Cases:
- Monitor hardware health
- Detect failing devices early
- Track I/O performance issues
- Plan hardware maintenance
Common Device Issues:
- Disk errors (may indicate failing drives)
- Tape errors (backup/restore problems)
- Printer errors (output queue issues)
- Communication errors (network problems)
7. Communication and Network Issues
Description: Monitor network errors, communication failures, and connectivity problems
-- category: IBM i Services
-- description: Operations - Communication and network issue monitoring
-- minvrm: v7r3m0
-- =====================================================================
-- Query: Communication and Network Issue Monitoring
-- Purpose: Track network-related problems:
-- - Communication errors
-- - Network failures
-- - TCP/IP issues
-- - Line errors
-- =====================================================================
SELECT
HL.MESSAGE_TIMESTAMP,
HL.MESSAGE_ID,
HL.MESSAGE_TEXT,
HL.MESSAGE_SECOND_LEVEL_TEXT,
HL.FROM_USER,
HL.FROM_JOB,
HL.FROM_PROGRAM,
-- Classify communication issue
CASE
WHEN HL.MESSAGE_ID LIKE 'TCP%' THEN 'TCP_IP_ERROR'
WHEN HL.MESSAGE_TEXT LIKE '%network%' THEN 'NETWORK_ERROR'
WHEN HL.MESSAGE_TEXT LIKE '%communication%' THEN 'COMM_ERROR'
WHEN HL.MESSAGE_TEXT LIKE '%line%' THEN 'LINE_ERROR'
WHEN HL.MESSAGE_TEXT LIKE '%connection%' THEN 'CONNECTION_ERROR'
WHEN HL.MESSAGE_TEXT LIKE '%socket%' THEN 'SOCKET_ERROR'
ELSE 'OTHER_COMM'
END AS ISSUE_TYPE,
-- Extract IP or host information if present
CASE
WHEN HL.MESSAGE_TEXT LIKE '%[0-9]%.[0-9]%.[0-9]%.[0-9]%' THEN 'IP_PRESENT'
WHEN LOCATE('host', LOWER(HL.MESSAGE_TEXT)) > 0 THEN 'HOST_PRESENT'
ELSE 'NO_ADDRESS'
END AS ADDRESS_INFO,
-- Time pattern analysis
HOUR(HL.MESSAGE_TIMESTAMP) AS ERROR_HOUR,
DAYNAME(HL.MESSAGE_TIMESTAMP) AS DAY_OF_WEEK,
-- Severity
CASE HL.MESSAGE_TYPE
WHEN '*ESCAPE' THEN 'HIGH'
WHEN '*NOTIFY' THEN 'MEDIUM'
ELSE 'LOW'
END AS SEVERITY
FROM TABLE(QSYS2.HISTORY_LOG_INFO(
START_TIME => CURRENT_TIMESTAMP - 7 DAYS
)) AS HL
WHERE (HL.MESSAGE_ID LIKE 'TCP%'
OR HL.MESSAGE_ID LIKE 'CPF%')
AND (LOWER(HL.MESSAGE_TEXT) LIKE '%network%'
OR LOWER(HL.MESSAGE_TEXT) LIKE '%communication%'
OR LOWER(HL.MESSAGE_TEXT) LIKE '%tcp%'
OR LOWER(HL.MESSAGE_TEXT) LIKE '%line%'
OR LOWER(HL.MESSAGE_TEXT) LIKE '%connection%'
OR LOWER(HL.MESSAGE_TEXT) LIKE '%socket%')
ORDER BY HL.MESSAGE_TIMESTAMP DESC
WITH UR;
Use Cases:
- Monitor network connectivity
- Detect communication failures
- Track TCP/IP issues
- Identify intermittent network problems
Common Communication Issues:
- TCP/IP errors: Network protocol problems
- Connection errors: Failed connections
- Line errors: Physical line problems
- Socket errors: Application communication issues
8. Save/Restore and Backup Operations
Description: Monitor backup, save, and restore operations
-- category: IBM i Services
-- description: Operations - Save/Restore and backup monitoring
-- minvrm: v7r3m0
-- =====================================================================
-- Query: Save/Restore and Backup Operation Monitoring
-- Purpose: Track backup and restore activities:
-- - ENDSBS operations
-- - SAVSYS operations
-- - SAVLIB operations
-- - SAVDLO operations
-- - SAV operations
-- - Backup failures and warnings
-- =====================================================================
SELECT
HL.MESSAGE_TIMESTAMP,
HL.MESSAGE_ID,
HL.MESSAGE_TEXT,
HL.MESSAGE_SECOND_LEVEL_TEXT,
HL.FROM_USER,
HL.FROM_JOB,
HL.FROM_PROGRAM,
-- Classify operation type
CASE
WHEN HL.MESSAGE_ID IN ('CPF0994', 'CPF0968') THEN 'ENDSBS'
WHEN HL.MESSAGE_ID IN ('CPC3702', 'CPF3772') THEN 'SAVSYS'
WHEN HL.MESSAGE_ID IN ('CPC3701', 'CPC3707', 'CPC3722', 'CPC3704', 'CPF3751', 'CPF3763', 'CPF3770', 'CPF3771', 'CPF3777') THEN 'SAVLIB'
WHEN HL.MESSAGE_ID IN ('CPC9410', 'CPF9410', 'CPF902E') THEN 'SAVDLO'
WHEN HL.MESSAGE_ID IN ('CPC370C', 'CPF3837') THEN 'SAV'
ELSE 'OTHER_BACKUP'
END AS OPERATION_TYPE,
-- Operation status
CASE
WHEN HL.MESSAGE_ID IN ('CPC3702', 'CPC3701', 'CPC3707', 'CPC3722', 'CPC9410', 'CPC370C') THEN 'SUCCESS'
WHEN HL.MESSAGE_ID IN ('CPC3704', 'CPF3772', 'CPF3751', 'CPF3771', 'CPF3777', 'CPF9410', 'CPF902E', 'CPF3837') THEN 'PARTIAL'
WHEN HL.MESSAGE_ID IN ('CPF3763', 'CPF3770') THEN 'WARNING'
WHEN HL.MESSAGE_ID IN ('CPF0994', 'CPF0968') THEN 'SYSTEM_END'
ELSE 'UNKNOWN'
END AS OPERATION_STATUS,
-- Extract library or object count if present
CASE
WHEN LOCATE('objects', LOWER(HL.MESSAGE_TEXT)) > 0 THEN
TRIM(SUBSTR(HL.MESSAGE_TEXT, 1, LOCATE('objects', LOWER(HL.MESSAGE_TEXT)) + 7))
WHEN LOCATE('libraries', LOWER(HL.MESSAGE_TEXT)) > 0 THEN
TRIM(SUBSTR(HL.MESSAGE_TEXT, 1, LOCATE('libraries', LOWER(HL.MESSAGE_TEXT)) + 9))
ELSE 'N/A'
END AS OPERATION_DETAIL,
-- Time analysis (backups often run at specific times)
HOUR(HL.MESSAGE_TIMESTAMP) AS OPERATION_HOUR,
-- Severity
CASE
WHEN HL.MESSAGE_ID LIKE 'CPC%' THEN 'NORMAL'
WHEN HL.MESSAGE_ID IN ('CPF3772', 'CPF3751', 'CPF3771', 'CPF3777', 'CPF9410', 'CPF902E', 'CPF3837') THEN 'WARNING'
WHEN HL.MESSAGE_ID IN ('CPF3763', 'CPF3770') THEN 'MEDIUM'
ELSE 'HIGH'
END AS SEVERITY
FROM TABLE(QSYS2.HISTORY_LOG_INFO(
START_TIME => CURRENT_TIMESTAMP - 30 DAYS
)) AS HL
WHERE HL.MESSAGE_ID IN (
-- ENDSBS
'CPF0994', 'CPF0968',
-- SAVSYS
'CPC3702', 'CPF3772',
-- SAVLIB
'CPC3701', 'CPC3707', 'CPC3722', 'CPC3704', 'CPF3751', 'CPF3763', 'CPF3770', 'CPF3771', 'CPF3777',
-- SAVDLO
'CPC9410', 'CPF9410', 'CPF902E',
-- SAV
'CPC370C', 'CPF3837'
)
ORDER BY HL.MESSAGE_TIMESTAMP DESC
WITH UR;
Use Cases:
- Monitor backup success rates
- Track save operations (SAVSYS, SAVLIB, SAVDLO, SAV)
- Identify partial saves or failures
- Verify backup schedules
- Monitor system end operations (ENDSBS)
Key Message IDs:
ENDSBS:
- CPF0994: ENDSBS SBS(*ALL) command being processed
- CPF0968: System ended to restricted condition
SAVSYS:
- CPC3702: Save System (SAVSYS) command completed
- CPF3772: SAVSYS completed with objects not saved
SAVLIB:
- CPC3701: Objects saved from library (success)
- CPC3707: Libraries saved (success)
- CPC3722: Objects saved from library (success)
- CPC3704: Objects saved with some not included (partial)
- CPF3751: Some libraries not saved (warning)
- CPF3763: Previously damaged objects (warning)
- CPF3770: No objects saved or restored (warning)
- CPF3771: Objects saved with some not saved (partial)
- CPF3777: Libraries saved with some partially saved (partial)
SAVDLO:
- CPC9410: Document library objects saved (success)
- CPF9410: Document library objects saved with some not saved (partial)
- CPF902E: Document library objects saved with some not saved (partial)
SAV:
- CPC370C: Objects saved (success)
- CPF3837: Objects saved with some not saved (partial)
9. Performance and Resource Issues
Description: Identify performance problems and resource constraints
-- category: IBM i Services
-- description: Operations - Performance and resource issue monitoring
-- minvrm: v7r3m0
-- =====================================================================
-- Query: Performance and Resource Issue Monitoring
-- Purpose: Track performance-related problems:
-- - Storage issues
-- - Memory problems
-- - CPU constraints
-- - ASP issues
-- =====================================================================
SELECT
HL.MESSAGE_TIMESTAMP,
HL.MESSAGE_ID,
HL.MESSAGE_TEXT,
HL.MESSAGE_SECOND_LEVEL_TEXT,
HL.FROM_USER,
HL.FROM_JOB,
HL.FROM_PROGRAM,
-- Classify performance issue
CASE
WHEN HL.MESSAGE_TEXT LIKE '%storage%' OR HL.MESSAGE_TEXT LIKE '%ASP%' THEN 'STORAGE_ISSUE'
WHEN HL.MESSAGE_TEXT LIKE '%memory%' THEN 'MEMORY_ISSUE'
WHEN HL.MESSAGE_TEXT LIKE '%CPU%' OR HL.MESSAGE_TEXT LIKE '%processor%' THEN 'CPU_ISSUE'
WHEN HL.MESSAGE_TEXT LIKE '%threshold%' THEN 'THRESHOLD_EXCEEDED'
WHEN HL.MESSAGE_TEXT LIKE '%capacity%' THEN 'CAPACITY_ISSUE'
WHEN HL.MESSAGE_TEXT LIKE '%performance%' THEN 'PERFORMANCE_DEGRADATION'
ELSE 'OTHER_RESOURCE'
END AS ISSUE_TYPE,
-- Extract percentage or threshold if present
CASE
WHEN HL.MESSAGE_TEXT LIKE '%[0-9]%percent%' OR HL.MESSAGE_TEXT LIKE '%[0-9]%%%' THEN 'PERCENTAGE_PRESENT'
ELSE 'NO_PERCENTAGE'
END AS METRIC_INFO,
-- Severity based on message type and content
CASE
WHEN HL.MESSAGE_TYPE = '*ESCAPE' THEN 'CRITICAL'
WHEN HL.MESSAGE_TEXT LIKE '%90%' OR HL.MESSAGE_TEXT LIKE '%95%' THEN 'HIGH'
WHEN HL.MESSAGE_TEXT LIKE '%80%' OR HL.MESSAGE_TEXT LIKE '%85%' THEN 'MEDIUM'
ELSE 'LOW'
END AS SEVERITY
FROM TABLE(QSYS2.HISTORY_LOG_INFO(
START_TIME => CURRENT_TIMESTAMP - 7 DAYS
)) AS HL
WHERE (LOWER(HL.MESSAGE_TEXT) LIKE '%storage%'
OR LOWER(HL.MESSAGE_TEXT) LIKE '%asp%'
OR LOWER(HL.MESSAGE_TEXT) LIKE '%memory%'
OR LOWER(HL.MESSAGE_TEXT) LIKE '%cpu%'
OR LOWER(HL.MESSAGE_TEXT) LIKE '%threshold%'
OR LOWER(HL.MESSAGE_TEXT) LIKE '%capacity%'
OR LOWER(HL.MESSAGE_TEXT) LIKE '%performance%')
ORDER BY HL.MESSAGE_TIMESTAMP DESC
WITH UR;
Use Cases:
- Monitor system resource utilization
- Detect capacity issues early
- Track performance degradation
- Plan capacity upgrades
Common Performance Issues:
- Storage issues: ASP threshold exceeded
- Memory issues: Insufficient memory
- CPU issues: High CPU utilization
- Capacity issues: Resource constraints
10. Comprehensive Message Pattern Analysis
Description: Advanced pattern analysis across all message types
-- category: IBM i Services
-- description: Operations - Comprehensive message pattern analysis
-- minvrm: v7r3m0
-- =====================================================================
-- Query: Comprehensive Message Pattern Analysis
-- Purpose: Analyze message patterns across categories:
-- - Identify trending issues
-- - Detect anomalies
-- - Correlate related problems
-- =====================================================================
WITH MessageCategories AS (
SELECT
HL.MESSAGE_TIMESTAMP,
HL.MESSAGE_ID,
HL.MESSAGE_TYPE,
HL.FROM_USER,
HL.FROM_JOB,
DATE(HL.MESSAGE_TIMESTAMP) AS MESSAGE_DATE,
HOUR(HL.MESSAGE_TIMESTAMP) AS MESSAGE_HOUR,
-- Categorize messages
CASE
WHEN HL.MESSAGE_ID IN ('CPC1221', 'CPC1224', 'CPF1164', 'CPF1124') THEN 'JOB_CONTROL'
WHEN HL.MESSAGE_ID LIKE 'CPI095%' THEN 'SUBSYSTEM'
WHEN HL.MESSAGE_ID LIKE 'SQL%' OR HL.MESSAGE_ID LIKE 'CPF50%' THEN 'DATABASE'
WHEN HL.MESSAGE_ID LIKE 'TCP%' OR LOWER(HL.MESSAGE_TEXT) LIKE '%network%' THEN 'NETWORK'
WHEN HL.MESSAGE_ID LIKE 'CPC37%' OR HL.MESSAGE_ID LIKE 'CPC38%' THEN 'BACKUP_RESTORE'
WHEN LOWER(HL.MESSAGE_TEXT) LIKE '%device%' OR LOWER(HL.MESSAGE_TEXT) LIKE '%hardware%' THEN 'HARDWARE'
WHEN LOWER(HL.MESSAGE_TEXT) LIKE '%storage%' OR LOWER(HL.MESSAGE_TEXT) LIKE '%asp%' THEN 'STORAGE'
WHEN HL.MESSAGE_TYPE = '*ESCAPE' THEN 'ERROR'
ELSE 'OTHER'
END AS MESSAGE_CATEGORY
FROM TABLE(QSYS2.HISTORY_LOG_INFO(
START_TIME => CURRENT_TIMESTAMP - 14 DAYS
)) AS HL
)
SELECT
MC.MESSAGE_DATE,
MC.MESSAGE_CATEGORY,
COUNT(*) AS MESSAGE_COUNT,
COUNT(DISTINCT MC.MESSAGE_ID) AS UNIQUE_MESSAGE_IDS,
COUNT(DISTINCT MC.FROM_USER) AS UNIQUE_USERS,
-- Calculate hourly distribution
SUM(CASE WHEN MC.MESSAGE_HOUR BETWEEN 0 AND 5 THEN 1 ELSE 0 END) AS OVERNIGHT_COUNT,
SUM(CASE WHEN MC.MESSAGE_HOUR BETWEEN 6 AND 11 THEN 1 ELSE 0 END) AS MORNING_COUNT,
SUM(CASE WHEN MC.MESSAGE_HOUR BETWEEN 12 AND 17 THEN 1 ELSE 0 END) AS AFTERNOON_COUNT,
SUM(CASE WHEN MC.MESSAGE_HOUR BETWEEN 18 AND 23 THEN 1 ELSE 0 END) AS EVENING_COUNT,
-- Trend analysis
LAG(COUNT(*), 1) OVER (PARTITION BY MC.MESSAGE_CATEGORY ORDER BY MC.MESSAGE_DATE) AS PREV_DAY_COUNT,
CASE
WHEN COUNT(*) > LAG(COUNT(*), 1) OVER (PARTITION BY MC.MESSAGE_CATEGORY ORDER BY MC.MESSAGE_DATE) * 2
THEN 'SPIKE'
WHEN COUNT(*) > LAG(COUNT(*), 1) OVER (PARTITION BY MC.MESSAGE_CATEGORY ORDER BY MC.MESSAGE_DATE) * 1.5
THEN 'INCREASING'
WHEN COUNT(*) < LAG(COUNT(*), 1) OVER (PARTITION BY MC.MESSAGE_CATEGORY ORDER BY MC.MESSAGE_DATE) * 0.5
THEN 'DECREASING'
ELSE 'STABLE'
END AS TREND,
-- Risk assessment
CASE
WHEN MC.MESSAGE_CATEGORY IN ('ERROR', 'STORAGE', 'DATABASE') AND COUNT(*) > 100 THEN 'HIGH'
WHEN COUNT(*) > 500 THEN 'MEDIUM'
ELSE 'LOW'
END AS RISK_LEVEL
FROM MessageCategories MC
GROUP BY MC.MESSAGE_DATE, MC.MESSAGE_CATEGORY
HAVING COUNT(*) >= 10 -- Minimum threshold
ORDER BY MC.MESSAGE_DATE DESC, MESSAGE_COUNT DESC
WITH UR;
Use Cases:
- Identify emerging issues across categories
- Detect unusual activity patterns
- Correlate related problems
- Executive-level trend reporting
Pattern Indicators:
- SPIKE: Sudden increase (2x previous day)
- INCREASING: Growing trend (1.5x previous day)
- DECREASING: Declining trend (0.5x previous day)
- STABLE: Normal variation
Best Practices and Guidelines
Performance Optimization
Always specify START_TIME: Limits data scanned
START_TIME => CURRENT_TIMESTAMP - 7 DAYS- Use appropriate time windows:
- Real-time monitoring: Last 1-2 hours
- Daily analysis: Last 24 hours
- Trend analysis: Last 7-30 days
- Historical review: Last 90 days (if retained)
Filter early in WHERE clause: Reduces result set
WHERE MESSAGE_ID IN ('CPC1221', 'CPC1224')Use WITH UR: Avoids locking for read-only queries
WITH UR;
Query Guidelines
- History log retention: Default is typically 30 days (QHSTLOGSIZ)
- Table function syntax: Always use TABLE() wrapper
- Message types: *ESCAPE, *NOTIFY, *INFO, *STATUS, *DIAGNOSTIC
- Time zones: Timestamps are in system time zone
- Job name format: number/user/jobname
Operational Monitoring Strategy
- Real-time (every 15-30 minutes):
- Job failures
- Subsystem abnormal ends
- Critical system errors
- Daily:
- Job completion analysis
- Error trend review
- Operational summary dashboard
- Weekly:
- Message frequency analysis
- Program failure patterns
- Trend analysis
- Monthly:
- Historical trend reports
- Capacity planning metrics
- Performance baseline updates
Common Message ID Reference
Job Control
- CPF1124: Job started
- CPF1164: Job ended
- CPC1221: Job completed normally
- CPC1224: Job completed abnormally
- CPF1301: Job ended abnormally
- CPF1302: Job ended abnormally
Subsystem Control
- CPI0953: Subsystem started
- CPI0954: Subsystem ended normally
- CPI0955: Subsystem ended abnormally
System Errors
- CPFxxxx: System errors (various)
- MCHxxxx: Machine check errors
- RNXxxxx: Runtime errors
- SQLxxxx: SQL errors
Enhanced Message ID Reference
Job Control Messages
- CPF1124: Job started
- CPF1164: Job ended
- CPC1221: Job completed normally
- CPC1224: Job completed abnormally
- CPF1301: Job ended abnormally
- CPF1302: Job ended abnormally
- CPF1321: Job ended abnormally
- CPF1322: Job ended abnormally
Subsystem Messages
- CPI0953: Subsystem started
- CPI0954: Subsystem ended normally
- CPI0955: Subsystem ended abnormally
- CPF0001: Error in subsystem
Database and File Messages
- SQL0xxx: SQL errors (various)
- CPF5001: Duplicate key value
- CPF5006: Record format not found
- CPF5025: Duplicate record key
- CPF5029: Member not found
- CPF5035: Data mapping error
- CPF50xx: File operation errors
Communication Messages
- TCP0xxx: TCP/IP errors
- CPFxxxx: Communication errors (various)
Save/Restore Messages
- CPC3702: Objects saved
- CPC3703: Save operation completed
- CPC3708: Objects restored
- CPC3709: Restore operation completed
- CPF37xx: Save operation errors
- CPF38xx: Restore operation errors
Device and Hardware Messages
- Device-related CPF messages (identified by message text)
- I/O error messages
- Hardware failure messages
Performance Messages
- Storage threshold messages
- ASP capacity messages
- Memory constraint messages
- CPU utilization messages
Updated Monitoring Strategy
Critical (Check every 15-30 minutes)
- Job abnormal completions (CPC1224)
- Subsystem abnormal ends (CPI0955)
- Storage threshold exceeded
- Database errors (SQL, CPF50xx)
- Hardware failures
High Priority (Check hourly)
- Communication errors
- Device issues
- Backup/restore failures
- Performance degradation
Medium Priority (Check daily)
- Job completion trends
- Message frequency patterns
- Error trends by category
Low Priority (Check weekly/monthly)
- Historical trend analysis
- Capacity planning metrics
- Long-term pattern analysis
Additional Information
Document Location
Worldwide
Was this topic helpful?
Document Information
Modified date:
05 May 2026
UID
ibm17271431