IBM Support

How to fix the issue about null value in column "end_time" in postgresql database log

Troubleshooting


Problem

How to fix the issue about null value in column "end_time" in postgresql database log?

Symptom

The detail log is as below:
2018-01-26 00:00:00 CET xcatdb xcatadm ERROR: null value in column "end_time" violates not-null constraint
2018-01-26 00:00:00 CET xcatdb xcatadm STATEMENT: INSERT INTO LSF_FINISH_JOB (JOB_ID, JOB_ARRAY_INDEX, JOB_NAME, USER_NAME, JOB_NAME_FULL, QUEUE, SUBMIT_TIME, START_TIME, END_TIME, EXEC_HOSTSTR, PRIORITY, FROM_HOST, COMMAND, IN_FILE, OUT_FILE, PROJECT_NAME, CWD, EXECCWD, ASKHOSTSTR, APPLICATION, DESCRIPTION, CLUSTER_NAME, SUBMIT_TIME_GMT, START_TIME_GMT, USER_GROUP, JOB_GROUP, RES_REQ, EXIT_STATUS, FROM_HOST_CLUSTER, PEND_REASONS, JOBPID, LICENSEPROJECT, RESERVETIME, ESTIMATEDRUNTIME, EXECHOME, EXECUSER, MEM, SWAP, NTHREADS, PGIDSTR, PIDSTR, JOB_STATUS, NUM_PROCESSORS, RU_UTIME, RU_STIME, CPU_TIME, NUM_SLOTS, NUM_EXEC_PROCS, CPU_UTIL, WORKLOAD_TYPE, DURATION, PATH, CREATE_TIME_GMT, GRAPHIC_SESSION_ID, GRAPHIC_HOST, APPLICATION_NAME, PATH_ALIAS, NOTIFICATION, BPOST_MSG, RUNTIME, FLOW_ID, END_TIME_GMT, PATH_REMOVABLE, INSERT_SEQ) SELECT JOB_ID, JOB_ARRAY_INDEX, JOB_NAME, USER_NAME, JOB_NAME_FULL, QUEUE, SUBMIT_TIME, START_TIME, END_TIME, EXEC_HOSTSTR, PRIORITY, FROM_HOST, COMMAND, IN_FILE, OUT_FILE, PROJECT_NAME, CWD, EXECCWD, ASKHOSTSTR, APPLICATION, DESCRIPTION, CLUSTER_NAME, SUBMIT_TIME_GMT, START_TIME_GMT, USER_GROUP, JOB_GROUP, RES_REQ, EXIT_STATUS, FROM_HOST_CLUSTER, PEND_REASONS, JOBPID, LICENSEPROJECT, RESERVETIME, ESTIMATEDRUNTIME, EXECHOME, EXECUSER, MEM, SWAP, NTHREADS, PGIDSTR, PIDSTR, JOB_STATUS, NUM_PROCESSORS, RU_UTIME, RU_STIME, CPU_TIME, NUM_SLOTS, NUM_EXEC_PROCS, CPU_UTIL, WORKLOAD_TYPE, DURATION, PATH, CREATE_TIME_GMT, GRAPHIC_SESSION_ID, GRAPHIC_HOST, APPLICATION_NAME, PATH_ALIAS, NOTIFICATION, BPOST_MSG, RUNTIME, FLOW_ID, END_TIME_GMT, PATH_REMOVABLE, INSERT_SEQ FROM LSF_LIVE_JOB a WHERE a.JOB_ID = $1 AND a.JOB_ARRAY_INDEX = $2 AND a.CLUSTER_NAME = $3

Cause

When this issue happens, there are corresponding errors in commondataloader log as below:
2018-01-26 00:00:00.878 CET WARN com.platform.perf.dataloader.lsf.commonjobloader.RecoverJobs.recoverJobs(RecoverJobs.java:112) RecoverJobs: Exception when moving jobs. org.postgresql.util.PSQLException: ERROR: null value in column "end_time" violates not-null constraint

This issue belongs to the FinishedJobThread of commonjobloader.

Diagnosing The Problem

Because there are some dangling records with null end_time in LSF_LIVE_JOB table, to resolve this issue, the dangling records need be found out and be deleted.

Resolving The Problem

According to the below steps, the issue can be resolved:

1. Open the debug log, you can see the detail job ID information for the dangling records (Dangling job ID are 159017 and 159018).


2018-01-26 00:00:00.876 CET DEBUG com.platform.perf.dataloader.lsf.commonjobloader.RecoverJobs.recoverJobs(RecoverJobs.java:89) Dangling job found: 159017:0
2018-01-26 00:00:00.876 CET DEBUG com.platform.perf.dataloader.lsf.commonjobloader.RecoverJobs.recoverJobs(RecoverJobs.java:89) Dangling job found: 159018:0

2. Enter the database admin command, execute this command: #psql -U xcatadm xcatdb

3. Run the below command to find out the dangling records,


xcatdb=> select job_id,job_name,command,submit_time,start_time,end_time from LSF_LIVE_JOB where JOB_ID=159017;
 job_id | job_name | command  |     submit_time     |     start_time      |      end_time      
--------+----------+----------+---------------------+---------------------+---------------------
 159017 | Wal_opti |#!/bin/bash;... | 2017-06-16 09:25:31 | 2017-06-16 09:25:32 |

xcatdb=> select job_id,job_name,command,submit_time,start_time,end_time from LSF_LIVE_JOB where JOB_ID=159018;
 job_id | job_name | command  |     submit_time     |     start_time      |      end_time      
--------+----------+----------+---------------------+---------------------+---------------------
 159018 | Wal_opti |#!/bin/bash;.. | 2017-06-16 09:25:31 | 2017-06-16 09:25:32 |
4. As you see, <1> The commonjobloader debug log show this two records can't be moved to finished table; <2> According to the database checking, you can find that the submit_time and start_time of two records are very old from now. So you can make a conclusion that the two records are dangling records which need be removed from database. After the below database action, the issue is fixed:
xcatdb=> DELETE FROM LSF_LIVE_JOB WHERE JOB_ID = 159017;
xcatdb=> DELETE FROM LSF_LIVE_JOB WHERE JOB_ID = 159018;
[{"Product":{"code":"SSENRW","label":"Platform HPC for System x"},"Business Unit":{"code":"BU059","label":"IBM Software w\/o TPS"},"Component":"Database-Postgres","Platform":[{"code":"PF016","label":"Linux"}],"Version":"4.2","Edition":"","Line of Business":{"code":"LOB10","label":"Data and AI"}}]

Document Information

More support for:
Platform HPC for System x

Software version:
4.2

Operating system(s):
Linux

Document number:
634031

Modified date:
17 June 2018

UID

isg3T1026936