IBM Support

IBM i History Log Analysis SQL Queries - QSYS2.HISTORY_LOG_INFO

How To


Summary

This document contains SQL queries for analyzing IBM i History Log (QHST) using the QSYS2.HISTORY_LOG_INFO table function. These queries help identify system issues, job failures, save, restore problems, and operational trends.

Objective

This document provides 10 comprehensive queries for operational analysis of IBM i History Log:

  1. Job Completion Analysis - Track job success and failure patterns using CPF1164 with End Code analysis
  2. Jobs That Ended Abnormally (CPF1164 Analysis) - Detailed analysis of abnormal job terminations with end code classification
  3. Subsystem Activity Monitoring - Monitor subsystem lifecycle and abnormal ends
  4. Message Frequency Analysis - Identify high-frequency messages and patterns
  5. Operational Summary Dashboard - Executive-level daily health metrics with trend analysis
  6. Device and Hardware Issues - Monitor device errors and hardware failures (excludes CPF2234)
  7. Communication and Network Issues - Track network errors and connectivity problems
  8. Save/Restore and Backup Operations - Monitor ENDSBS, SAVSYS, SAVLIB, SAVDLO, and SAV operations
  9. Performance and Resource Issues - Track storage, ASP, and system resource problems
  10. Comprehensive Message Pattern Analysis - Advanced pattern analysis across all message categories

 

All queries include:

  • Comprehensive documentation
  • Operational context
  • Performance optimization with appropriate time windows
  • Specific message ID references
  • Use case examples
  • Best practices for implementation

 

Use these queries as part of a comprehensive IBM i operational monitoring strategy to maintain system health, identify issues early, and optimize performance.

Environment

Minimum Version Required: IBM i V7R3M0 or higher

Steps

1. Job Completion Analysis

Description: Analyze job completion patterns to identify normal vs abnormal terminations

--  category:  IBM i Services
--  description:  Operations - Job completion analysis
--  minvrm:  v7r3m0

-- =====================================================================
-- Query: Job Completion Analysis
-- Purpose: Analyze job completion patterns:
--          - Normal completions
--          - Abnormal terminations
--          - Job timing patterns
--          - Completion trends
-- =====================================================================

SELECT
    HL.MESSAGE_TIMESTAMP,
    HL.FROM_JOB,
    HL.FROM_USER,
    HL.MESSAGE_ID,
    HL.MESSAGE_TEXT,
    
    -- Classify completion type
    CASE HL.MESSAGE_ID
        WHEN 'CPC1221' THEN 'NORMAL_COMPLETION'
        WHEN 'CPC1224' THEN 'ABNORMAL_COMPLETION'
        WHEN 'CPF1164' THEN 'JOB_ENDED'
        WHEN 'CPF1124' THEN 'JOB_STARTED'
        ELSE 'OTHER'
    END AS COMPLETION_TYPE,
    
    -- Extract job information from FROM_JOB (format: number/user/jobname)
    CASE
        WHEN LOCATE('/', HL.FROM_JOB) > 0 THEN
            SUBSTR(HL.FROM_JOB, 1, LOCATE('/', HL.FROM_JOB) - 1)
        ELSE ''
    END AS JOB_NUMBER,
    
    CASE
        WHEN LOCATE('/', HL.FROM_JOB, LOCATE('/', HL.FROM_JOB) + 1) > 0 THEN
            SUBSTR(HL.FROM_JOB, LOCATE('/', HL.FROM_JOB) + 1,
                   LOCATE('/', HL.FROM_JOB, LOCATE('/', HL.FROM_JOB) + 1) - LOCATE('/', HL.FROM_JOB) - 1)
        ELSE ''
    END AS JOB_USER,
    
    CASE
        WHEN LOCATE('/', HL.FROM_JOB, LOCATE('/', HL.FROM_JOB) + 1) > 0 THEN
            SUBSTR(HL.FROM_JOB, LOCATE('/', HL.FROM_JOB, LOCATE('/', HL.FROM_JOB) + 1) + 1)
        ELSE HL.FROM_JOB
    END AS JOB_NAME_ONLY,
    
    -- Time of day analysis
    HOUR(HL.MESSAGE_TIMESTAMP) AS HOUR_OF_DAY,
    DAYNAME(HL.MESSAGE_TIMESTAMP) AS DAY_OF_WEEK

FROM TABLE(QSYS2.HISTORY_LOG_INFO(
    START_TIME => CURRENT_TIMESTAMP - 7 DAYS
)) AS HL

WHERE HL.MESSAGE_ID IN (
    'CPC1221',  -- Job completed normally
    'CPC1224',  -- Job completed abnormally
    'CPF1164',  -- Job ended
    'CPF1124'   -- Job started
)

ORDER BY HL.MESSAGE_TIMESTAMP DESC
WITH UR;

 

Use Cases:

  • Monitor job completion rates
  • Identify jobs with frequent abnormal terminations
  • Analyze job timing patterns
  • Track batch job execution

Key Message IDs:

  • CPC1221: Job completed normally
  • CPC1224: Job completed abnormally (investigate cause)
  • CPF1164: Job ended (general termination)
  • CPF1124: Job started (for duration analysis)

 

2. Jobs That Ended Abnormally (CPF1164 Analysis)

Description: Analyze jobs that ended with CPF1164 message, focusing on abnormal end codes (20+)

--  category:  IBM i Services
--  description:  Operations - Jobs that ended abnormally with end code analysis
--  minvrm:  v7r3m0

-- =====================================================================
-- Query: Jobs That Ended Abnormally (CPF1164 with End Code >= 20)
-- Purpose: Identify jobs that ended abnormally by analyzing CPF1164
--          messages and extracting end codes to determine severity:
--          - End Code 20: Job exceeded end severity
--          - End Code 30: Job ended abnormally
--          - End Code 40+: Various abnormal termination scenarios
-- =====================================================================

WITH CPF1164_BASE AS (
    SELECT
        HL.MESSAGE_TIMESTAMP,
        HL.FROM_JOB,
        HL.FROM_USER,
        HL.MESSAGE_ID,
        HL.MESSAGE_TEXT,

        -- Job number
        SUBSTR(HL.FROM_JOB, 1, LOCATE('/', HL.FROM_JOB) - 1) AS JOB_NUMBER,

        -- Job user
        SUBSTR(
            HL.FROM_JOB,
            LOCATE('/', HL.FROM_JOB) + 1,
            LOCATE('/', HL.FROM_JOB, LOCATE('/', HL.FROM_JOB) + 1)
              - LOCATE('/', HL.FROM_JOB) - 1
        ) AS JOB_USER,

        -- Job name
        SUBSTR(
            HL.FROM_JOB,
            LOCATE('/', HL.FROM_JOB, LOCATE('/', HL.FROM_JOB) + 1) + 1
        ) AS JOB_NAME_ONLY,

        -- Extract End Code (lowercase "end code")
        CASE
            WHEN LOCATE('end code', HL.MESSAGE_TEXT) > 0 THEN
                INTEGER(
                    RTRIM(
                        SUBSTR(
                            HL.MESSAGE_TEXT,
                            LOCATE('end code', HL.MESSAGE_TEXT)
                              + LENGTH('end code ')
                        ),
                        '.'
                    )
                )
            ELSE NULL
        END AS END_CODE

    FROM TABLE (
        QSYS2.HISTORY_LOG_INFO(
            START_TIME => CURRENT_TIMESTAMP - 7 DAYS
        )
    ) AS HL
    WHERE HL.MESSAGE_ID = 'CPF1164'
)

SELECT
    MESSAGE_TIMESTAMP,
    FROM_JOB,
    FROM_USER,
    MESSAGE_ID,
    MESSAGE_TEXT,
    JOB_NUMBER,
    JOB_USER,
    JOB_NAME_ONLY,
    END_CODE,

    -- End Code Meaning
    CASE END_CODE
        WHEN 0  THEN 'The job completed normally.'
        WHEN 10 THEN 'The job completed normally during controlled ending or controlled subsystem ending.'
        WHEN 20 THEN 'The job exceeded end severity (ENDSEV job attribute).'
        WHEN 30 THEN 'The job ended abnormally.'
        WHEN 40 THEN 'The job ended before becoming active.'
        WHEN 50 THEN 'The job ended while the job was active.'
        WHEN 60 THEN 'The subsystem ended abnormally while the job was active.'
        WHEN 70 THEN 'The system ended abnormally while the job was active.'
        WHEN 80 THEN 'The job ended (ENDJOBABN command).'
        WHEN 90 THEN 'The job was forced to end after the time limit ended (ENDJOBABN).'
        ELSE 'Unknown end code'
    END AS END_CODE_MEANING,

    HOUR(MESSAGE_TIMESTAMP) AS HOUR_OF_DAY,
    DAYNAME(MESSAGE_TIMESTAMP) AS DAY_OF_WEEK

FROM CPF1164_BASE

-- Only End Codes 20 and above (abnormal endings)
WHERE END_CODE >= 20

ORDER BY MESSAGE_TIMESTAMP DESC
WITH UR;

 

Use Cases:

  • Identify jobs that ended abnormally vs. normally
  • Understand the severity of job terminations
  • Track forced job endings (ENDJOBABN)
  • Detect system or subsystem-related job failures
  • Monitor jobs that exceeded end severity thresholds

End Code Severity Guide:

  • End Code 20: Job exceeded ENDSEV - Review job logs for errors
  • End Code 30: Abnormal termination - Investigate root cause
  • End Code 40: Job ended before activation - Check job submission
  • End Code 50: Job ended while active - Application or system issue
  • End Code 60: Subsystem ended abnormally - Subsystem stability issue
  • End Code 70: System ended abnormally - Critical system event
  • End Code 80: Forced ending (ENDJOBABN) - Manual intervention
  • End Code 90: Time limit exceeded - Job timeout issue

Key Insights:

  • End codes 20-30 typically indicate application-level issues
  • End codes 40-70 suggest system or subsystem problems
  • End codes 80-90 indicate administrative intervention or timeouts
  • Filtering for end codes >= 20 focuses on problematic terminations

 

3. Subsystem Activity Monitoring

Description: Monitor subsystem starts, ends, and activity patterns

--  category:  IBM i Services
--  description:  Operations - Subsystem activity monitoring
--  minvrm:  v7r3m0

-- =====================================================================
-- Query: Subsystem Activity Monitoring
-- Purpose: Track subsystem lifecycle events:
--          - Subsystem starts and ends
--          - Unexpected terminations
--          - Activity patterns
-- =====================================================================

SELECT
    HL.MESSAGE_TIMESTAMP,
    HL.MESSAGE_ID,
    HL.MESSAGE_TEXT,
    HL.MESSAGE_SECOND_LEVEL_TEXT,
    HL.FROM_USER,
    HL.FROM_JOB,
    
    -- Extract subsystem name from message text
    CASE
        WHEN HL.MESSAGE_TEXT LIKE '%subsystem%' THEN
            TRIM(SUBSTR(HL.MESSAGE_TEXT,
                 LOCATE('subsystem', LOWER(HL.MESSAGE_TEXT)) + 10,
                 10))
        ELSE 'UNKNOWN'
    END AS SUBSYSTEM_NAME,
    
    -- Classify event type
    CASE HL.MESSAGE_ID
        WHEN 'CPF1103' THEN 'SUBSYSTEM_STARTED'
        WHEN 'CPF1804' THEN 'SUBSYSTEM_STARTING'
        WHEN 'CPC1206' THEN 'SUBSYSTEM_ENDING_CONTROLLED'
        WHEN 'CPC1207' THEN 'SUBSYSTEM_ENDING_IMMEDIATE'
        WHEN 'CPD1418' THEN 'SUBSYSTEM_ENDING'
        WHEN 'CPF0927' THEN 'SUBSYSTEM_ENDED'
        ELSE 'OTHER_SUBSYSTEM_EVENT'
    END AS EVENT_TYPE,
    
    -- Time analysis
    HOUR(HL.MESSAGE_TIMESTAMP) AS EVENT_HOUR,
    DAYNAME(HL.MESSAGE_TIMESTAMP) AS DAY_OF_WEEK,
    
    -- Severity
    CASE HL.MESSAGE_ID
        WHEN 'CPC1207' THEN 'HIGH'        -- Immediate shutdown
        WHEN 'CPC1206' THEN 'NORMAL'      -- Controlled shutdown
        WHEN 'CPF1103' THEN 'NORMAL'      -- Subsystem started
        WHEN 'CPF1804' THEN 'NORMAL'      -- Subsystem starting
        WHEN 'CPD1418' THEN 'MEDIUM'      -- Subsystem ending
        WHEN 'CPF0927' THEN 'MEDIUM'      -- Subsystem ended
        ELSE 'MEDIUM'
    END AS SEVERITY

FROM TABLE(QSYS2.HISTORY_LOG_INFO(
    START_TIME => CURRENT_TIMESTAMP - 30 DAYS
)) AS HL

WHERE HL.MESSAGE_ID IN (
    'CPC1206',  -- Subsystem is ending controlled
    'CPC1207',  -- Subsystem ending immediately
    'CPD1418',  -- Subsystem ending
    'CPF0927',  -- Subsystem ended
    'CPF1103',  -- Subsystem started
    'CPF1804'   -- Subsystem starting
)

ORDER BY HL.MESSAGE_TIMESTAMP DESC
WITH UR;

 

Use Cases:

  • Monitor subsystem availability
  • Detect unexpected subsystem terminations
  • Track subsystem restart patterns
  • Identify subsystem stability issues

Key Events:

  • CPF1103: Subsystem started (normal operation)
  • CPF1804: Subsystem starting (subsystem initialization)
  • CPC1206: Subsystem is ending controlled (planned shutdown)
  • CPC1207: Subsystem ending immediately (immediate shutdown - investigate)
  • CPD1418: Subsystem ending (subsystem termination in progress)
  • CPF0927: Subsystem ended (subsystem has terminated)

 

4. Message Frequency Analysis

Description: Identify most frequent messages and their patterns

--  category:  IBM i Services
--  description:  Operations - Message frequency analysis
--  minvrm:  v7r3m0

-- =====================================================================
-- Query: Message Frequency Analysis
-- Purpose: Identify most common messages:
--          - High-frequency messages
--          - Message trends
--          - Potential noise vs signal
-- =====================================================================

SELECT 
    HL.MESSAGE_ID,
    HL.MESSAGE_TYPE,
    COUNT(*) AS OCCURRENCE_COUNT,
    COUNT(DISTINCT HL.FROM_USER) AS UNIQUE_USERS,
    COUNT(DISTINCT HL.FROM_PROGRAM) AS UNIQUE_PROGRAMS,
    COUNT(DISTINCT DATE(HL.MESSAGE_TIMESTAMP)) AS DAYS_OCCURRED,
    
    -- Sample message text (first occurrence)
    MIN(HL.MESSAGE_TEXT) AS SAMPLE_MESSAGE_TEXT,
    
    -- Time span
    MIN(HL.MESSAGE_TIMESTAMP) AS FIRST_OCCURRENCE,
    MAX(HL.MESSAGE_TIMESTAMP) AS LAST_OCCURRENCE,
    
    -- Calculate frequency
    DECIMAL(COUNT(*) / NULLIF(COUNT(DISTINCT DATE(HL.MESSAGE_TIMESTAMP)), 0), 10, 2) AS AVG_PER_DAY,
    
    -- Classify frequency
    CASE 
        WHEN COUNT(*) >= 1000 THEN 'VERY_HIGH'
        WHEN COUNT(*) >= 500 THEN 'HIGH'
        WHEN COUNT(*) >= 100 THEN 'MEDIUM'
        WHEN COUNT(*) >= 50 THEN 'LOW'
        ELSE 'RARE'
    END AS FREQUENCY_LEVEL,
    
    -- Assess if this is likely noise or signal
    CASE 
        WHEN COUNT(*) >= 1000 AND HL.MESSAGE_TYPE = '*INFO' THEN 'LIKELY_NOISE'
        WHEN COUNT(*) >= 500 AND HL.MESSAGE_TYPE IN ('*ESCAPE', '*NOTIFY') THEN 'INVESTIGATE'
        WHEN COUNT(*) >= 100 THEN 'MONITOR'
        ELSE 'NORMAL'
    END AS ASSESSMENT

FROM TABLE(QSYS2.HISTORY_LOG_INFO(
    START_TIME => CURRENT_TIMESTAMP - 30 DAYS
)) AS HL

GROUP BY HL.MESSAGE_ID, HL.MESSAGE_TYPE

HAVING COUNT(*) >= 50  -- Minimum threshold for analysis

ORDER BY OCCURRENCE_COUNT DESC
WITH UR;

 

Use Cases:

  • Identify noisy messages for filtering
  • Detect abnormal message patterns
  • Optimize history log retention
  • Focus on actionable messages

Frequency Levels:

  • VERY_HIGH (1000+): May need filtering or investigation
  • HIGH (500-999): Monitor for patterns
  • MEDIUM (100-499): Normal operational messages
  • LOW (50-99): Occasional events

 

5. Operational Summary Dashboard

Description: Executive dashboard of operational metrics

--  category:  IBM i Services
--  description:  Operations - Operational summary dashboard
--  minvrm:  v7r3m0

-- =====================================================================
-- Query: Operational Summary Dashboard
-- Purpose: High-level operational metrics:
--          - Job completion rates
--          - Error trends
--          - System health indicators
-- =====================================================================

WITH DailyMetrics AS (
    SELECT
        DATE(HL.MESSAGE_TIMESTAMP) AS METRIC_DATE,
        
        -- Job metrics using CPF1164 with End Code extraction
        SUM(CASE
            WHEN HL.MESSAGE_ID = 'CPF1164'
                AND (LOCATE('end code', HL.MESSAGE_TEXT) = 0
                     OR INTEGER(RTRIM(SUBSTR(HL.MESSAGE_TEXT,
                                             LOCATE('end code', HL.MESSAGE_TEXT) + LENGTH('end code ')), '.')) < 20)
            THEN 1
            ELSE 0
        END) AS JOBS_COMPLETED_NORMAL,
        
        SUM(CASE
            WHEN HL.MESSAGE_ID = 'CPF1164'
                AND LOCATE('end code', HL.MESSAGE_TEXT) > 0
                AND INTEGER(RTRIM(SUBSTR(HL.MESSAGE_TEXT,
                                         LOCATE('end code', HL.MESSAGE_TEXT) + LENGTH('end code ')), '.')) >= 20
            THEN 1
            ELSE 0
        END) AS JOBS_COMPLETED_ABNORMAL,
        
        -- Error metrics
        SUM(CASE WHEN HL.MESSAGE_TYPE = '*ESCAPE' THEN 1 ELSE 0 END) AS ESCAPE_MESSAGES,
        SUM(CASE WHEN HL.MESSAGE_ID LIKE 'MCH%' THEN 1 ELSE 0 END) AS MACHINE_ERRORS,
        
        -- Subsystem metrics
        SUM(CASE WHEN HL.MESSAGE_ID = 'CPI0955' THEN 1 ELSE 0 END) AS SUBSYSTEM_ABNORMAL_ENDS,
        
        -- Total message volume
        COUNT(*) AS TOTAL_MESSAGES
        
    FROM TABLE(QSYS2.HISTORY_LOG_INFO(
        START_TIME => CURRENT_TIMESTAMP - 30 DAYS
    )) AS HL
    
    GROUP BY DATE(HL.MESSAGE_TIMESTAMP)
)

SELECT
    DM.METRIC_DATE,
    DM.JOBS_COMPLETED_NORMAL,
    DM.JOBS_COMPLETED_ABNORMAL,
    
    -- Calculate job success rate
    CASE
        WHEN (DM.JOBS_COMPLETED_NORMAL + DM.JOBS_COMPLETED_ABNORMAL) > 0 THEN
            DECIMAL(DM.JOBS_COMPLETED_NORMAL * 100.0 /
                   (DM.JOBS_COMPLETED_NORMAL + DM.JOBS_COMPLETED_ABNORMAL), 5, 2)
        ELSE 0
    END AS JOB_SUCCESS_RATE_PCT,
    
    DM.ESCAPE_MESSAGES,
    DM.MACHINE_ERRORS,
    DM.SUBSYSTEM_ABNORMAL_ENDS,
    DM.TOTAL_MESSAGES,
    
    -- Health score (0-100)
    CASE
        WHEN DM.MACHINE_ERRORS > 10 OR DM.SUBSYSTEM_ABNORMAL_ENDS > 5 THEN 'POOR'
        WHEN DM.ESCAPE_MESSAGES > 100 OR DM.JOBS_COMPLETED_ABNORMAL > 50 THEN 'FAIR'
        WHEN DM.ESCAPE_MESSAGES > 50 OR DM.JOBS_COMPLETED_ABNORMAL > 20 THEN 'GOOD'
        ELSE 'EXCELLENT'
    END AS DAILY_HEALTH_STATUS,
    
    -- Trend vs previous day
    LAG(DM.ESCAPE_MESSAGES, 1) OVER (ORDER BY DM.METRIC_DATE) AS PREV_DAY_ESCAPES,
    
    CASE
        WHEN DM.ESCAPE_MESSAGES > LAG(DM.ESCAPE_MESSAGES, 1) OVER (ORDER BY DM.METRIC_DATE) * 1.5
            THEN 'WORSENING'
        WHEN DM.ESCAPE_MESSAGES < LAG(DM.ESCAPE_MESSAGES, 1) OVER (ORDER BY DM.METRIC_DATE) * 0.5
            THEN 'IMPROVING'
        ELSE 'STABLE'
    END AS TREND

FROM DailyMetrics DM

ORDER BY DM.METRIC_DATE DESC
WITH UR;

 

Use Cases:

  • Daily operational reporting
  • Trend analysis and forecasting
  • Executive dashboards
  • SLA monitoring and compliance

Health Status Criteria:

  • EXCELLENT: Minimal errors, high job success rate
  • GOOD: Normal operational levels
  • FAIR: Elevated error rates, needs monitoring
  • POOR: Critical issues requiring immediate attention

 

6. Device and Hardware Issues

Description: Monitor device errors, hardware failures, and I/O problems

--  category:  IBM i Services
--  description:  Operations - Device and hardware issue monitoring
--  minvrm:  v7r3m0

-- =====================================================================
-- Query: Device and Hardware Issue Monitoring
-- Purpose: Track device-related problems:
--          - Device errors and failures
--          - I/O errors
--          - Hardware issues
--          - Communication problems
-- =====================================================================

SELECT 
    HL.MESSAGE_TIMESTAMP,
    HL.MESSAGE_ID,
    HL.MESSAGE_TEXT,
    HL.MESSAGE_SECOND_LEVEL_TEXT,
    HL.FROM_USER,
    HL.FROM_JOB,
    HL.FROM_PROGRAM,
    
    -- Classify device issue type
    CASE 
        WHEN HL.MESSAGE_ID LIKE 'CPF%' AND HL.MESSAGE_TEXT LIKE '%device%' THEN 'DEVICE_ERROR'
        WHEN HL.MESSAGE_ID LIKE 'CPF%' AND HL.MESSAGE_TEXT LIKE '%I/O%' THEN 'IO_ERROR'
        WHEN HL.MESSAGE_ID LIKE 'CPF%' AND HL.MESSAGE_TEXT LIKE '%disk%' THEN 'DISK_ERROR'
        WHEN HL.MESSAGE_ID LIKE 'CPF%' AND HL.MESSAGE_TEXT LIKE '%tape%' THEN 'TAPE_ERROR'
        WHEN HL.MESSAGE_ID LIKE 'CPF%' AND HL.MESSAGE_TEXT LIKE '%printer%' THEN 'PRINTER_ERROR'
        WHEN HL.MESSAGE_ID LIKE 'CPF%' AND HL.MESSAGE_TEXT LIKE '%communication%' THEN 'COMM_ERROR'
        ELSE 'OTHER_HARDWARE'
    END AS ISSUE_TYPE,
    
    -- Extract device name if present
    CASE 
        WHEN LOCATE('device', LOWER(HL.MESSAGE_TEXT)) > 0 THEN
            TRIM(SUBSTR(HL.MESSAGE_TEXT, 
                 LOCATE('device', LOWER(HL.MESSAGE_TEXT)) + 7, 
                 10))
        ELSE 'UNKNOWN'
    END AS DEVICE_NAME,
    
    -- Severity assessment
    CASE HL.MESSAGE_TYPE
        WHEN '*ESCAPE' THEN 'HIGH'
        WHEN '*NOTIFY' THEN 'MEDIUM'
        ELSE 'LOW'
    END AS SEVERITY

FROM TABLE(QSYS2.HISTORY_LOG_INFO(
    START_TIME => CURRENT_TIMESTAMP - 7 DAYS
)) AS HL

WHERE (HL.MESSAGE_ID LIKE 'CPF%' OR HL.MESSAGE_ID LIKE 'CPI%')
  AND HL.MESSAGE_ID != 'CPF2234'  -- Exclude CPF2234
  AND (LOWER(HL.MESSAGE_TEXT) LIKE '%device%'
       OR LOWER(HL.MESSAGE_TEXT) LIKE '%i/o%'
       OR LOWER(HL.MESSAGE_TEXT) LIKE '%disk%'
       OR LOWER(HL.MESSAGE_TEXT) LIKE '%tape%'
       OR LOWER(HL.MESSAGE_TEXT) LIKE '%printer%'
       OR LOWER(HL.MESSAGE_TEXT) LIKE '%hardware%'
       OR LOWER(HL.MESSAGE_TEXT) LIKE '%communication%')

ORDER BY HL.MESSAGE_TIMESTAMP DESC
WITH UR;

 

Use Cases:

  • Monitor hardware health
  • Detect failing devices early
  • Track I/O performance issues
  • Plan hardware maintenance

Common Device Issues:

  • Disk errors (may indicate failing drives)
  • Tape errors (backup/restore problems)
  • Printer errors (output queue issues)
  • Communication errors (network problems)

 

7. Communication and Network Issues

Description: Monitor network errors, communication failures, and connectivity problems

--  category:  IBM i Services
--  description:  Operations - Communication and network issue monitoring
--  minvrm:  v7r3m0

-- =====================================================================
-- Query: Communication and Network Issue Monitoring
-- Purpose: Track network-related problems:
--          - Communication errors
--          - Network failures
--          - TCP/IP issues
--          - Line errors
-- =====================================================================

SELECT 
    HL.MESSAGE_TIMESTAMP,
    HL.MESSAGE_ID,
    HL.MESSAGE_TEXT,
    HL.MESSAGE_SECOND_LEVEL_TEXT,
    HL.FROM_USER,
    HL.FROM_JOB,
    HL.FROM_PROGRAM,
    
    -- Classify communication issue
    CASE 
        WHEN HL.MESSAGE_ID LIKE 'TCP%' THEN 'TCP_IP_ERROR'
        WHEN HL.MESSAGE_TEXT LIKE '%network%' THEN 'NETWORK_ERROR'
        WHEN HL.MESSAGE_TEXT LIKE '%communication%' THEN 'COMM_ERROR'
        WHEN HL.MESSAGE_TEXT LIKE '%line%' THEN 'LINE_ERROR'
        WHEN HL.MESSAGE_TEXT LIKE '%connection%' THEN 'CONNECTION_ERROR'
        WHEN HL.MESSAGE_TEXT LIKE '%socket%' THEN 'SOCKET_ERROR'
        ELSE 'OTHER_COMM'
    END AS ISSUE_TYPE,
    
    -- Extract IP or host information if present
    CASE 
        WHEN HL.MESSAGE_TEXT LIKE '%[0-9]%.[0-9]%.[0-9]%.[0-9]%' THEN 'IP_PRESENT'
        WHEN LOCATE('host', LOWER(HL.MESSAGE_TEXT)) > 0 THEN 'HOST_PRESENT'
        ELSE 'NO_ADDRESS'
    END AS ADDRESS_INFO,
    
    -- Time pattern analysis
    HOUR(HL.MESSAGE_TIMESTAMP) AS ERROR_HOUR,
    DAYNAME(HL.MESSAGE_TIMESTAMP) AS DAY_OF_WEEK,
    
    -- Severity
    CASE HL.MESSAGE_TYPE
        WHEN '*ESCAPE' THEN 'HIGH'
        WHEN '*NOTIFY' THEN 'MEDIUM'
        ELSE 'LOW'
    END AS SEVERITY

FROM TABLE(QSYS2.HISTORY_LOG_INFO(
    START_TIME => CURRENT_TIMESTAMP - 7 DAYS
)) AS HL

WHERE (HL.MESSAGE_ID LIKE 'TCP%'
       OR HL.MESSAGE_ID LIKE 'CPF%')
  AND (LOWER(HL.MESSAGE_TEXT) LIKE '%network%'
       OR LOWER(HL.MESSAGE_TEXT) LIKE '%communication%'
       OR LOWER(HL.MESSAGE_TEXT) LIKE '%tcp%'
       OR LOWER(HL.MESSAGE_TEXT) LIKE '%line%'
       OR LOWER(HL.MESSAGE_TEXT) LIKE '%connection%'
       OR LOWER(HL.MESSAGE_TEXT) LIKE '%socket%')

ORDER BY HL.MESSAGE_TIMESTAMP DESC
WITH UR;

 

Use Cases:

  • Monitor network connectivity
  • Detect communication failures
  • Track TCP/IP issues
  • Identify intermittent network problems

Common Communication Issues:

  • TCP/IP errors: Network protocol problems
  • Connection errors: Failed connections
  • Line errors: Physical line problems
  • Socket errors: Application communication issues

 

8. Save/Restore and Backup Operations

Description: Monitor backup, save, and restore operations

--  category:  IBM i Services
--  description:  Operations - Save/Restore and backup monitoring
--  minvrm:  v7r3m0

-- =====================================================================
-- Query: Save/Restore and Backup Operation Monitoring
-- Purpose: Track backup and restore activities:
--          - ENDSBS operations
--          - SAVSYS operations
--          - SAVLIB operations
--          - SAVDLO operations
--          - SAV operations
--          - Backup failures and warnings
-- =====================================================================

SELECT
    HL.MESSAGE_TIMESTAMP,
    HL.MESSAGE_ID,
    HL.MESSAGE_TEXT,
    HL.MESSAGE_SECOND_LEVEL_TEXT,
    HL.FROM_USER,
    HL.FROM_JOB,
    HL.FROM_PROGRAM,
    
    -- Classify operation type
    CASE
        WHEN HL.MESSAGE_ID IN ('CPF0994', 'CPF0968') THEN 'ENDSBS'
        WHEN HL.MESSAGE_ID IN ('CPC3702', 'CPF3772') THEN 'SAVSYS'
        WHEN HL.MESSAGE_ID IN ('CPC3701', 'CPC3707', 'CPC3722', 'CPC3704', 'CPF3751', 'CPF3763', 'CPF3770', 'CPF3771', 'CPF3777') THEN 'SAVLIB'
        WHEN HL.MESSAGE_ID IN ('CPC9410', 'CPF9410', 'CPF902E') THEN 'SAVDLO'
        WHEN HL.MESSAGE_ID IN ('CPC370C', 'CPF3837') THEN 'SAV'
        ELSE 'OTHER_BACKUP'
    END AS OPERATION_TYPE,
    
    -- Operation status
    CASE
        WHEN HL.MESSAGE_ID IN ('CPC3702', 'CPC3701', 'CPC3707', 'CPC3722', 'CPC9410', 'CPC370C') THEN 'SUCCESS'
        WHEN HL.MESSAGE_ID IN ('CPC3704', 'CPF3772', 'CPF3751', 'CPF3771', 'CPF3777', 'CPF9410', 'CPF902E', 'CPF3837') THEN 'PARTIAL'
        WHEN HL.MESSAGE_ID IN ('CPF3763', 'CPF3770') THEN 'WARNING'
        WHEN HL.MESSAGE_ID IN ('CPF0994', 'CPF0968') THEN 'SYSTEM_END'
        ELSE 'UNKNOWN'
    END AS OPERATION_STATUS,
    
    -- Extract library or object count if present
    CASE
        WHEN LOCATE('objects', LOWER(HL.MESSAGE_TEXT)) > 0 THEN
            TRIM(SUBSTR(HL.MESSAGE_TEXT, 1, LOCATE('objects', LOWER(HL.MESSAGE_TEXT)) + 7))
        WHEN LOCATE('libraries', LOWER(HL.MESSAGE_TEXT)) > 0 THEN
            TRIM(SUBSTR(HL.MESSAGE_TEXT, 1, LOCATE('libraries', LOWER(HL.MESSAGE_TEXT)) + 9))
        ELSE 'N/A'
    END AS OPERATION_DETAIL,
    
    -- Time analysis (backups often run at specific times)
    HOUR(HL.MESSAGE_TIMESTAMP) AS OPERATION_HOUR,
    
    -- Severity
    CASE
        WHEN HL.MESSAGE_ID LIKE 'CPC%' THEN 'NORMAL'
        WHEN HL.MESSAGE_ID IN ('CPF3772', 'CPF3751', 'CPF3771', 'CPF3777', 'CPF9410', 'CPF902E', 'CPF3837') THEN 'WARNING'
        WHEN HL.MESSAGE_ID IN ('CPF3763', 'CPF3770') THEN 'MEDIUM'
        ELSE 'HIGH'
    END AS SEVERITY

FROM TABLE(QSYS2.HISTORY_LOG_INFO(
    START_TIME => CURRENT_TIMESTAMP - 30 DAYS
)) AS HL

WHERE HL.MESSAGE_ID IN (
    -- ENDSBS
    'CPF0994', 'CPF0968',
    -- SAVSYS
    'CPC3702', 'CPF3772',
    -- SAVLIB
    'CPC3701', 'CPC3707', 'CPC3722', 'CPC3704', 'CPF3751', 'CPF3763', 'CPF3770', 'CPF3771', 'CPF3777',
    -- SAVDLO
    'CPC9410', 'CPF9410', 'CPF902E',
    -- SAV
    'CPC370C', 'CPF3837'
)

ORDER BY HL.MESSAGE_TIMESTAMP DESC
WITH UR;

 

Use Cases:

  • Monitor backup success rates
  • Track save operations (SAVSYS, SAVLIB, SAVDLO, SAV)
  • Identify partial saves or failures
  • Verify backup schedules
  • Monitor system end operations (ENDSBS)

Key Message IDs:

ENDSBS:

  • CPF0994: ENDSBS SBS(*ALL) command being processed
  • CPF0968: System ended to restricted condition

SAVSYS:

  • CPC3702: Save System (SAVSYS) command completed
  • CPF3772: SAVSYS completed with objects not saved

SAVLIB:

  • CPC3701: Objects saved from library (success)
  • CPC3707: Libraries saved (success)
  • CPC3722: Objects saved from library (success)
  • CPC3704: Objects saved with some not included (partial)
  • CPF3751: Some libraries not saved (warning)
  • CPF3763: Previously damaged objects (warning)
  • CPF3770: No objects saved or restored (warning)
  • CPF3771: Objects saved with some not saved (partial)
  • CPF3777: Libraries saved with some partially saved (partial)

SAVDLO:

  • CPC9410: Document library objects saved (success)
  • CPF9410: Document library objects saved with some not saved (partial)
  • CPF902E: Document library objects saved with some not saved (partial)

SAV:

  • CPC370C: Objects saved (success)
  • CPF3837: Objects saved with some not saved (partial)

 

9. Performance and Resource Issues

Description: Identify performance problems and resource constraints

--  category:  IBM i Services
--  description:  Operations - Performance and resource issue monitoring
--  minvrm:  v7r3m0

-- =====================================================================
-- Query: Performance and Resource Issue Monitoring
-- Purpose: Track performance-related problems:
--          - Storage issues
--          - Memory problems
--          - CPU constraints
--          - ASP issues
-- =====================================================================

SELECT 
    HL.MESSAGE_TIMESTAMP,
    HL.MESSAGE_ID,
    HL.MESSAGE_TEXT,
    HL.MESSAGE_SECOND_LEVEL_TEXT,
    HL.FROM_USER,
    HL.FROM_JOB,
    HL.FROM_PROGRAM,
    
    -- Classify performance issue
    CASE 
        WHEN HL.MESSAGE_TEXT LIKE '%storage%' OR HL.MESSAGE_TEXT LIKE '%ASP%' THEN 'STORAGE_ISSUE'
        WHEN HL.MESSAGE_TEXT LIKE '%memory%' THEN 'MEMORY_ISSUE'
        WHEN HL.MESSAGE_TEXT LIKE '%CPU%' OR HL.MESSAGE_TEXT LIKE '%processor%' THEN 'CPU_ISSUE'
        WHEN HL.MESSAGE_TEXT LIKE '%threshold%' THEN 'THRESHOLD_EXCEEDED'
        WHEN HL.MESSAGE_TEXT LIKE '%capacity%' THEN 'CAPACITY_ISSUE'
        WHEN HL.MESSAGE_TEXT LIKE '%performance%' THEN 'PERFORMANCE_DEGRADATION'
        ELSE 'OTHER_RESOURCE'
    END AS ISSUE_TYPE,
    
    -- Extract percentage or threshold if present
    CASE 
        WHEN HL.MESSAGE_TEXT LIKE '%[0-9]%percent%' OR HL.MESSAGE_TEXT LIKE '%[0-9]%%%' THEN 'PERCENTAGE_PRESENT'
        ELSE 'NO_PERCENTAGE'
    END AS METRIC_INFO,
    
    -- Severity based on message type and content
    CASE 
        WHEN HL.MESSAGE_TYPE = '*ESCAPE' THEN 'CRITICAL'
        WHEN HL.MESSAGE_TEXT LIKE '%90%' OR HL.MESSAGE_TEXT LIKE '%95%' THEN 'HIGH'
        WHEN HL.MESSAGE_TEXT LIKE '%80%' OR HL.MESSAGE_TEXT LIKE '%85%' THEN 'MEDIUM'
        ELSE 'LOW'
    END AS SEVERITY

FROM TABLE(QSYS2.HISTORY_LOG_INFO(
    START_TIME => CURRENT_TIMESTAMP - 7 DAYS
)) AS HL

WHERE (LOWER(HL.MESSAGE_TEXT) LIKE '%storage%'
       OR LOWER(HL.MESSAGE_TEXT) LIKE '%asp%'
       OR LOWER(HL.MESSAGE_TEXT) LIKE '%memory%'
       OR LOWER(HL.MESSAGE_TEXT) LIKE '%cpu%'
       OR LOWER(HL.MESSAGE_TEXT) LIKE '%threshold%'
       OR LOWER(HL.MESSAGE_TEXT) LIKE '%capacity%'
       OR LOWER(HL.MESSAGE_TEXT) LIKE '%performance%')

ORDER BY HL.MESSAGE_TIMESTAMP DESC
WITH UR;

 

Use Cases:

  • Monitor system resource utilization
  • Detect capacity issues early
  • Track performance degradation
  • Plan capacity upgrades

Common Performance Issues:

  • Storage issues: ASP threshold exceeded
  • Memory issues: Insufficient memory
  • CPU issues: High CPU utilization
  • Capacity issues: Resource constraints

 

10. Comprehensive Message Pattern Analysis

Description: Advanced pattern analysis across all message types

--  category:  IBM i Services
--  description:  Operations - Comprehensive message pattern analysis
--  minvrm:  v7r3m0

-- =====================================================================
-- Query: Comprehensive Message Pattern Analysis
-- Purpose: Analyze message patterns across categories:
--          - Identify trending issues
--          - Detect anomalies
--          - Correlate related problems
-- =====================================================================

WITH MessageCategories AS (
    SELECT 
        HL.MESSAGE_TIMESTAMP,
        HL.MESSAGE_ID,
        HL.MESSAGE_TYPE,
        HL.FROM_USER,
        HL.FROM_JOB,
        DATE(HL.MESSAGE_TIMESTAMP) AS MESSAGE_DATE,
        HOUR(HL.MESSAGE_TIMESTAMP) AS MESSAGE_HOUR,
        
        -- Categorize messages
        CASE 
            WHEN HL.MESSAGE_ID IN ('CPC1221', 'CPC1224', 'CPF1164', 'CPF1124') THEN 'JOB_CONTROL'
            WHEN HL.MESSAGE_ID LIKE 'CPI095%' THEN 'SUBSYSTEM'
            WHEN HL.MESSAGE_ID LIKE 'SQL%' OR HL.MESSAGE_ID LIKE 'CPF50%' THEN 'DATABASE'
            WHEN HL.MESSAGE_ID LIKE 'TCP%' OR LOWER(HL.MESSAGE_TEXT) LIKE '%network%' THEN 'NETWORK'
            WHEN HL.MESSAGE_ID LIKE 'CPC37%' OR HL.MESSAGE_ID LIKE 'CPC38%' THEN 'BACKUP_RESTORE'
            WHEN LOWER(HL.MESSAGE_TEXT) LIKE '%device%' OR LOWER(HL.MESSAGE_TEXT) LIKE '%hardware%' THEN 'HARDWARE'
            WHEN LOWER(HL.MESSAGE_TEXT) LIKE '%storage%' OR LOWER(HL.MESSAGE_TEXT) LIKE '%asp%' THEN 'STORAGE'
            WHEN HL.MESSAGE_TYPE = '*ESCAPE' THEN 'ERROR'
            ELSE 'OTHER'
        END AS MESSAGE_CATEGORY
        
    FROM TABLE(QSYS2.HISTORY_LOG_INFO(
        START_TIME => CURRENT_TIMESTAMP - 14 DAYS
    )) AS HL
)

SELECT 
    MC.MESSAGE_DATE,
    MC.MESSAGE_CATEGORY,
    COUNT(*) AS MESSAGE_COUNT,
    COUNT(DISTINCT MC.MESSAGE_ID) AS UNIQUE_MESSAGE_IDS,
    COUNT(DISTINCT MC.FROM_USER) AS UNIQUE_USERS,
    
    -- Calculate hourly distribution
    SUM(CASE WHEN MC.MESSAGE_HOUR BETWEEN 0 AND 5 THEN 1 ELSE 0 END) AS OVERNIGHT_COUNT,
    SUM(CASE WHEN MC.MESSAGE_HOUR BETWEEN 6 AND 11 THEN 1 ELSE 0 END) AS MORNING_COUNT,
    SUM(CASE WHEN MC.MESSAGE_HOUR BETWEEN 12 AND 17 THEN 1 ELSE 0 END) AS AFTERNOON_COUNT,
    SUM(CASE WHEN MC.MESSAGE_HOUR BETWEEN 18 AND 23 THEN 1 ELSE 0 END) AS EVENING_COUNT,
    
    -- Trend analysis
    LAG(COUNT(*), 1) OVER (PARTITION BY MC.MESSAGE_CATEGORY ORDER BY MC.MESSAGE_DATE) AS PREV_DAY_COUNT,
    
    CASE 
        WHEN COUNT(*) > LAG(COUNT(*), 1) OVER (PARTITION BY MC.MESSAGE_CATEGORY ORDER BY MC.MESSAGE_DATE) * 2 
            THEN 'SPIKE'
        WHEN COUNT(*) > LAG(COUNT(*), 1) OVER (PARTITION BY MC.MESSAGE_CATEGORY ORDER BY MC.MESSAGE_DATE) * 1.5 
            THEN 'INCREASING'
        WHEN COUNT(*) < LAG(COUNT(*), 1) OVER (PARTITION BY MC.MESSAGE_CATEGORY ORDER BY MC.MESSAGE_DATE) * 0.5 
            THEN 'DECREASING'
        ELSE 'STABLE'
    END AS TREND,
    
    -- Risk assessment
    CASE 
        WHEN MC.MESSAGE_CATEGORY IN ('ERROR', 'STORAGE', 'DATABASE') AND COUNT(*) > 100 THEN 'HIGH'
        WHEN COUNT(*) > 500 THEN 'MEDIUM'
        ELSE 'LOW'
    END AS RISK_LEVEL

FROM MessageCategories MC

GROUP BY MC.MESSAGE_DATE, MC.MESSAGE_CATEGORY

HAVING COUNT(*) >= 10  -- Minimum threshold

ORDER BY MC.MESSAGE_DATE DESC, MESSAGE_COUNT DESC
WITH UR;

 

Use Cases:

  • Identify emerging issues across categories
  • Detect unusual activity patterns
  • Correlate related problems
  • Executive-level trend reporting

Pattern Indicators:

  • SPIKE: Sudden increase (2x previous day)
  • INCREASING: Growing trend (1.5x previous day)
  • DECREASING: Declining trend (0.5x previous day)
  • STABLE: Normal variation

 

Best Practices and Guidelines

Performance Optimization

  1. Always specify START_TIME: Limits data scanned

    START_TIME => CURRENT_TIMESTAMP - 7 DAYS
  2. Use appropriate time windows:
    • Real-time monitoring: Last 1-2 hours
    • Daily analysis: Last 24 hours
    • Trend analysis: Last 7-30 days
    • Historical review: Last 90 days (if retained)
  3. Filter early in WHERE clause: Reduces result set

    WHERE MESSAGE_ID IN ('CPC1221', 'CPC1224')
  4. Use WITH UR: Avoids locking for read-only queries

    WITH UR;

 

Query Guidelines

  1. History log retention: Default is typically 30 days (QHSTLOGSIZ)
  2. Table function syntax: Always use TABLE() wrapper
  3. Message types: *ESCAPE, *NOTIFY, *INFO, *STATUS, *DIAGNOSTIC
  4. Time zones: Timestamps are in system time zone
  5. Job name format: number/user/jobname

 

Operational Monitoring Strategy

  1. Real-time (every 15-30 minutes):
    • Job failures
    • Subsystem abnormal ends
    • Critical system errors
  2. Daily:
    • Job completion analysis
    • Error trend review
    • Operational summary dashboard
  3. Weekly:
    • Message frequency analysis
    • Program failure patterns
    • Trend analysis
  4. Monthly:
    • Historical trend reports
    • Capacity planning metrics
    • Performance baseline updates

 

Common Message ID Reference

Job Control

  • CPF1124: Job started
  • CPF1164: Job ended
  • CPC1221: Job completed normally
  • CPC1224: Job completed abnormally
  • CPF1301: Job ended abnormally
  • CPF1302: Job ended abnormally

Subsystem Control

  • CPI0953: Subsystem started
  • CPI0954: Subsystem ended normally
  • CPI0955: Subsystem ended abnormally

System Errors

  • CPFxxxx: System errors (various)
  • MCHxxxx: Machine check errors
  • RNXxxxx: Runtime errors
  • SQLxxxx: SQL errors

 

Enhanced Message ID Reference

Job Control Messages

  • CPF1124: Job started
  • CPF1164: Job ended
  • CPC1221: Job completed normally
  • CPC1224: Job completed abnormally
  • CPF1301: Job ended abnormally
  • CPF1302: Job ended abnormally
  • CPF1321: Job ended abnormally
  • CPF1322: Job ended abnormally

Subsystem Messages

  • CPI0953: Subsystem started
  • CPI0954: Subsystem ended normally
  • CPI0955: Subsystem ended abnormally
  • CPF0001: Error in subsystem

Database and File Messages

  • SQL0xxx: SQL errors (various)
  • CPF5001: Duplicate key value
  • CPF5006: Record format not found
  • CPF5025: Duplicate record key
  • CPF5029: Member not found
  • CPF5035: Data mapping error
  • CPF50xx: File operation errors

Communication Messages

  • TCP0xxx: TCP/IP errors
  • CPFxxxx: Communication errors (various)

Save/Restore Messages

  • CPC3702: Objects saved
  • CPC3703: Save operation completed
  • CPC3708: Objects restored
  • CPC3709: Restore operation completed
  • CPF37xx: Save operation errors
  • CPF38xx: Restore operation errors

Device and Hardware Messages

  • Device-related CPF messages (identified by message text)
  • I/O error messages
  • Hardware failure messages

Performance Messages

  • Storage threshold messages
  • ASP capacity messages
  • Memory constraint messages
  • CPU utilization messages

 

Updated Monitoring Strategy

Critical (Check every 15-30 minutes)

  • Job abnormal completions (CPC1224)
  • Subsystem abnormal ends (CPI0955)
  • Storage threshold exceeded
  • Database errors (SQL, CPF50xx)
  • Hardware failures

High Priority (Check hourly)

  • Communication errors
  • Device issues
  • Backup/restore failures
  • Performance degradation

Medium Priority (Check daily)

  • Job completion trends
  • Message frequency patterns
  • Error trends by category

Low Priority (Check weekly/monthly)

  • Historical trend analysis
  • Capacity planning metrics
  • Long-term pattern analysis

Document Location

Worldwide

[{"Type":"MASTER","Line of Business":{"code":"LOB68","label":"Power HW"},"Business Unit":{"code":"BU070","label":"IBM Infrastructure"},"Product":{"code":"SWG60","label":"IBM i"},"ARM Category":[{"code":"a8mgJ00000005kTQAQ","label":"IBM i-\u003EWork Management"}],"ARM Case Number":"","Platform":[{"code":"PF012","label":"IBM i"}],"Version":"and future releases;7.4.0;7.5.0;7.6.0"}]

Document Information

Modified date:
05 May 2026

UID

ibm17271431