Question & Answer
Question
This document describes:
- Various scenarios that prevent a core dump from being generated when a process terminates abnormally
- How to avoid these problems by using the chcore and syscorepath commands.
Answer
Introduction
The default core dump facility on AIX normally creates a file named core in the current working directory for the process that terminated abnormally. If a core file is created successfully, a CORE_DUMP entry is written into the error report. Sometimes a core file is not created, and a CORE_DUMP_FAILED error might be added to the error report to log the failure. This error contains a reason code that can be used to help determine why the core file was not created. The reason code is an errno code, a system error code that is used to report errors from library functions. errno codes are listed in the AIX header file /usr/include/sys/errno.h.
Some of the causes for core dump failure can be avoided by configuring the core dump facility with the chcore command or the older syscorepath command. These commands enable a user to set up a directory where all core files will be written. If the chcore -n on option is used, the syscorepath and chcore commands will create unique core file names with the following format:
core.pid.ddhhmmss
where:
pid: Process ID
dd: Day of the month
hh: Hour in 24-hour format
mm: Minutes
ss: Seconds
See the man pages for chcore and syscorepath for details, and the AIX Core Dump Facility technical note.
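The core.pid.ddhhmmss format described above can be split back into its parts when sorting through a directory of core files. The following is a minimal sketch in POSIX shell. The function name parse_core_name and the path in the sample call are hypothetical, and the sketch assumes that the pid and the time stamp contain only digits.

```shell
#!/bin/sh
# Split a unique core file name of the form core.pid.ddhhmmss into its parts.
# parse_core_name is a hypothetical helper, not an AIX command.
parse_core_name() {
    name=${1##*/}                        # drop any leading directory
    case $name in
        core.*.????????) ;;              # core.<pid>.<8-character stamp>
        *) echo "not a unique core file name: $name" >&2; return 1 ;;
    esac
    rest=${name#core.}                   # pid.ddhhmmss
    pid=${rest%.*}
    stamp=${rest##*.}
    case $pid in
        '') echo "not a unique core file name: $name" >&2; return 1 ;;
    esac
    case $pid$stamp in
        *[!0-9]*) echo "not a unique core file name: $name" >&2; return 1 ;;
    esac
    dd=${stamp%??????}                   # first two characters
    hhmmss=${stamp#??}
    hh=${hhmmss%????}
    mmss=${hhmmss#??}
    mm=${mmss%??}
    ss=${mmss#??}
    echo "pid=$pid day=$dd time=$hh:$mm:$ss"
}

# Hypothetical file name: process 57812, day 17, 14:15:43
parse_core_name /tmp/corefiles/core.57812.17141543
# prints: pid=57812 day=17 time=14:15:43
```

The name carries only the day of the month, so the month and year have to come from the file's modification time.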
CORE_DUMP_FAILED Error
The following output is an example CORE_DUMP_FAILED error. Note the REASON CODE field near the bottom of the entry.
LABEL: CORE_DUMP_FAILED
IDENTIFIER: 45C7A35B

Date/Time: Mon Jan 17 14:15:43 MST
Sequence Number: 39603
Machine Id: 0008ADAA4C00
Node Id: p620
Class: S
Type: PERM
Resource Name: SYSPROC

Description
SOFTWARE PROGRAM ABNORMALLY TERMINATED

Probable Causes
INTERNAL SOFTWARE ERROR

User Causes
USER GENERATED SIGNAL

Failure Causes
CORE DUMP FAILED - SEE A REASON CODE BELOW

Recommended Actions
RERUN THE APPLICATION PROGRAM
IF PROBLEM PERSISTS THEN DO THE FOLLOWING
CONTACT APPROPRIATE SERVICE REPRESENTATIVE

Detail Data
SIGNAL NUMBER
11
USER'S PROCESS ID:
57812
REASON CODE
11
USER ID
232
PROCESSOR ID
0
CORE FILE NAME
/u1/GA.PROD/core
PROGRAM NAME
uvsh

The SIGNAL NUMBER section contains the signal that caused the program to terminate. These signals can be listed by running the command kill -l. The CORE FILE NAME section contains the location and name of the core file that would have been written if there was no failure. The PROGRAM NAME section contains the name of the program that terminated. The REASON CODE section contains an errno constant that can be used to diagnose the cause of the core dump failure. The errno constants can be viewed in the file /usr/include/sys/errno.h. Only some of the errno codes are used as reason codes.
Note: On some older versions of AIX, the Probable Causes section contains the line "SYSTEM RUNNING OUT OF PAGING SPACE", and the Recommended Actions section contains the line "DEFINE ADDITIONAL PAGING SPACE". These messages are misleading and can be ignored.
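When many entries have to be checked, the Detail Data fields can be pulled out of the report text with a small filter. The following is a minimal sketch in POSIX shell and awk. The function names errpt_field and sample_entry are hypothetical, and the sketch assumes that each label sits on its own line with its value on the following line. The sample text repeats the Detail Data of the entry shown above, so the sketch does not need access to an AIX error log.

```shell
#!/bin/sh
# Print the value that follows a Detail Data label in `errpt -a` output.
# Assumption: each label is on its own line and its value is on the next line.
# errpt_field is a hypothetical helper, not an AIX command.
errpt_field() {
    awk -v label="$1" '
        found { gsub(/^[ \t]+|[ \t]+$/, ""); print; exit }    # value line, trimmed
        { line = $0; gsub(/^[ \t]+|[ \t:]+$/, "", line) }      # label without spaces or colon
        line == label { found = 1 }
    '
}

# Detail Data copied from the example entry (indentation is illustrative).
sample_entry() {
    cat <<'EOF'
Detail Data
SIGNAL NUMBER
          11
USER'S PROCESS ID:
       57812
REASON CODE
          11
USER ID
         232
PROCESSOR ID
           0
CORE FILE NAME
/u1/GA.PROD/core
PROGRAM NAME
uvsh
EOF
}

sample_entry | errpt_field "REASON CODE"       # prints: 11
sample_entry | errpt_field "CORE FILE NAME"    # prints: /u1/GA.PROD/core
```

On an AIX system the same filter could be fed from errpt -aJ CORE_DUMP_FAILED. As written, it prints the field from the first matching entry only.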
errno Codes
Here are some of the errno codes that can appear as the reason code in a CORE_DUMP_FAILED error.
#define EPERM 1 /* Operation not permitted */
#define EIO 5 /* I/O error */
#define EAGAIN 11 /* Resource temporarily unavailable */
#define EACCES 13 /* Permission denied */
#define EBUSY 16 /* Resource busy */
#define EEXIST 17 /* File exists */
#define ENFILE 23 /* Too many open files in system */
#define EMFILE 24 /* Too many open files */
#define EFBIG 27 /* File too large */
#define ENOSPC 28 /* No space left on device */

Failure Scenarios
The following table contains various scenarios that can keep a core file from being created when a process terminates abnormally. For each scenario, information is provided about the CORE_DUMP_FAILED error if one is added to the error report.
| Scenario | CORE_DUMP_FAILED |
| There is not enough space in the file system to write the core file. | REASON CODE ENOSPC 28 |
| The ulimit for core is set to 0 in the account where the program is running. This disables core file creation. | REASON CODE EPERM 1 CORE FILE NAME blank |
| The process sets a current working directory where it does not have write permissions. Since the core file is written into the current working directory, the core file cannot be written. Note: Use the chcore or syscorepath command to avoid this failure. | REASON CODE EACCES 13 CORE FILE NAME path (path to where the system attempted to write the core file) |
| By default, all core files that are generated on an AIX system will have the name core. If a process is core dumping and the core file is being written, and another process terminates and attempts to write a core file in the same directory, the file core will be busy and the second process will not be able to write to the file. Note: Use the chcore or syscorepath command and unique core file naming to avoid this failure. | REASON CODE EAGAIN 11 OR EACCES 13 |
| The process has set the SA_NODUMP flag in the call to sigaction(). You would need the source code for the program to verify that this is the reason for the core dump failure. Any program can prevent a core dump by setting this flag in a sigaction request. | REASON CODE EPERM 1 |
| If the suid or sgid bit is set on the executable, then it is possible that a core file will not be created. This can happen if the real user or group id is not identical to the effective user or group id. Note: See Example 1. | REASON CODE EPERM 1 CORE FILE NAME blank |
| A process attempts to write a core file into a directory where a core file already exists and the ownership and permissions on the file do not allow it to be overwritten. Notes: See Example 2. Use the chcore or syscorepath command to avoid this failure. | REASON CODE EACCES 13 CORE FILE NAME path (path to where the system attempted to write the core file) |
| A process attempts to write a core file into a directory where a core file already exists. This core file is owned by another user but has write permissions enabled on either group or other. The attempt to write the new core file results in the core file being zeroed out. Notes: See Example 3. Use the chcore or syscorepath command to avoid this failure. | REASON CODE EPERM 1 CORE FILE NAME path (path to where the system attempted to write the core file) Note: Some versions of AIX might not add the CORE_DUMP_FAILED entry to the error report. |
| A process traps the signal whose default action is to create a core file but does not call the abort() function to actually create the core file. | None |
| A process ignores a signal that would, by default, generate a core file. Note: See Example 4. | None |
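The table above can be read in the other direction: starting from the reason code in the error report and listing the scenarios that produce it. The following is a minimal sketch in POSIX shell. The function name explain_reason is hypothetical, and the sketch covers only the reason codes that appear in the table.

```shell
#!/bin/sh
# Map a CORE_DUMP_FAILED reason code to its errno name and to the scenarios
# from the table that report it. explain_reason is a hypothetical helper.
explain_reason() {
    case $1 in
        1)  echo "EPERM: core ulimit is 0, SA_NODUMP is set, executable is suid/sgid, or an existing core file owned by another user was zeroed out" ;;
        11) echo "EAGAIN: another process is already writing a core file with the same name" ;;
        13) echo "EACCES: no write permission on the directory or on an existing core file, or another process is writing the core file" ;;
        28) echo "ENOSPC: not enough space in the file system" ;;
        *)  echo "reason code $1 is not in the scenario table; look it up in /usr/include/sys/errno.h" ;;
    esac
}

explain_reason 13
```

A blank CORE FILE NAME together with reason code 1 narrows the list to the ulimit and suid/sgid scenarios, according to the table.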
Example 1
If the suid or sgid bit is set on the executable, then a core file may not be created. This can happen if the real user or group id is not identical to the effective user or group id. According to the man page for core, a core dump is not created if the saved user id and the effective user id are not the same, or if the saved group id and the effective group id are not the same.
chmod +s program.exe
This command turns on both suid and sgid. This prevents creation of a core file.
chmod u+s program.exe
This command turns on only suid.
If sgid is turned on, then the core file is not created, because the real group id and the effective group id are not the same.
- Example A
Permissions of program.exe are root:fnusr, 0755
chmod +s program.exe
Permissions of program.exe are root:fnusr, 6755
From root, execute program.exe:
Real/Saved user id : root
Effective user id : root
Real/Saved group id : system
Effective group id : fnusr
Note: The saved group id is not the same as the effective group id, so no core file is created.
- Example B
Permissions of program.exe are root:fnusr, 0755
chmod u+s program.exe
Permissions of program.exe are root:fnusr, 4755
From root, execute program.exe:
Real/Saved user id : root
Effective user id : root
Real/Saved group id : system
Effective group id : system
Note: The saved and effective user ids are the same, and the saved and effective group ids are the same, so a core file is created.
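The modes in Examples A and B differ only in the leading octal digit, where 4 is suid and 2 is sgid. The following is a minimal sketch in POSIX shell that decodes that digit. The function name special_bits is hypothetical, and the sketch assumes a four-digit octal mode such as the ones shown above.

```shell
#!/bin/sh
# Report which of the suid/sgid bits are set in a four-digit octal mode.
# special_bits is a hypothetical helper, not an AIX command.
special_bits() {
    mode=$1
    case $mode in
        [0-7][0-7][0-7][0-7]) ;;
        *) echo "expected a four-digit octal mode" >&2; return 1 ;;
    esac
    first=${mode%???}                 # leading digit: 4 = suid, 2 = sgid, 1 = sticky
    bits=
    if [ $((first & 4)) -ne 0 ]; then bits="suid"; fi
    if [ $((first & 2)) -ne 0 ]; then bits="${bits:+$bits }sgid"; fi
    echo "${bits:-none}"
}

special_bits 0755    # prints: none
special_bits 4755    # prints: suid       (Example B)
special_bits 6755    # prints: suid sgid  (Example A)
```

For a file on disk, the standard test operators give the same answer without parsing the mode: [ -u program.exe ] is true when suid is set, and [ -g program.exe ] is true when sgid is set.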
Example 2
A process attempts to write a core file into a directory where a core file already exists, and the ownership and permissions on the file do not allow it to be overwritten.
$ ls -l core
-rw-r--r-- 1 rej staff 769727 Oct 04 08:59 core
$ id
uid=709(chris) gid=1(staff)
$ sleep 100 &
[1] 352458
$ kill -6 352458
$
[1] + IOT/Abort trap sleep 100 &
$ ls -l core
-rw-r--r-- 1 rej staff 769727 Oct 04 08:59 core
$ errpt -aJ CORE_DUMP_FAILED
---------------------------------------------------------------------------
LABEL: CORE_DUMP_FAILED
IDENTIFIER: FAA1D46F

Date/Time: Tue Oct 4 09:04:01 CDT 2005
Sequence Number: 543
Machine Id: 000870664C00
Node Id: vegas
Class: S
Type: PERM
Resource Name: SYSPROC

Description
SOFTWARE PROGRAM ABNORMALLY TERMINATED

Probable Causes
INTERNAL SOFTWARE ERROR

User Causes
USER GENERATED SIGNAL

Failure Causes
CORE DUMP FAILED - SEE A REASON CODE BELOW

Recommended Actions
RERUN THE APPLICATION PROGRAM
IF PROBLEM PERSISTS THEN DO THE FOLLOWING
CONTACT APPROPRIATE SERVICE REPRESENTATIVE

Detail Data
SIGNAL NUMBER
6
USER'S PROCESS ID:
352458
REASON CODE
13
USER ID
709
PROCESSOR ID
-1
CORE FILE NAME
/home/chris/core
PROGRAM NAME
sleep

Example 3
A process attempts to write a core file into a directory where a core file already exists. This core file is owned by another user but has write permissions enabled on either group or other. The attempt to write the new core file results in the core file being zeroed out.
$ ls -l core
-rw-rw-r-- 1 rej staff 769727 Oct 04 08:49 core
$ id
uid=709(chris) gid=1(staff)
$ sleep 100 &
[1] 237786
$ kill -6 237786
$
[1] + IOT/Abort trap sleep 100 &
$ ls -l core
-rw-rw-r-- 1 rej staff 0 Oct 04 08:52 core
$ errpt -aJ CORE_DUMP_FAILED
---------------------------------------------------------------------------
LABEL: CORE_DUMP_FAILED
IDENTIFIER: FAA1D46F

Date/Time: Tue Oct 4 08:52:36 CDT 2005
Sequence Number: 541
Machine Id: 000870664C00
Node Id: vegas
Class: S
Type: PERM
Resource Name: SYSPROC

Description
SOFTWARE PROGRAM ABNORMALLY TERMINATED

Probable Causes
INTERNAL SOFTWARE ERROR

User Causes
USER GENERATED SIGNAL

Failure Causes
CORE DUMP FAILED - SEE A REASON CODE BELOW

Recommended Actions
RERUN THE APPLICATION PROGRAM
IF PROBLEM PERSISTS THEN DO THE FOLLOWING
CONTACT APPROPRIATE SERVICE REPRESENTATIVE

Detail Data
SIGNAL NUMBER
6
USER'S PROCESS ID:
237786
REASON CODE
1
USER ID
709
PROCESSOR ID
-1
CORE FILE NAME
/home/chris/core
PROGRAM NAME
sleep

Example 4
A process ignores a signal that would, by default, generate a core file. We can determine if a signal is ignored by using the procsig command.
This command will list all signal actions defined for process 237786:
procsig 237786
The output of this command might look like this:
HUP caught
INT caught
QUIT caught
ILL caught
TRAP caught
ABRT caught
EMT caught
FPE caught
KILL default RESTART
BUS caught
SEGV default
SYS caught
PIPE caught
ALRM caught
TERM ignored
URG default
STOP default
TSTP ignored
CONT default
...

chcore and syscorepath
To avoid some of the problems that can prevent a core file from being generated, the chcore or syscorepath command can be used to direct core files into a user-specified directory. In this example, the directory where the core files are written is /tmp/corefiles.
chcore -p on -n on -l /tmp/corefiles -d
The older syscorepath command can also be used to direct core files to a central location. Unlike chcore, syscorepath can be used to generate core files from suid and sgid executable files.
syscorepath -p /tmp/corefiles
See the man pages for these commands for more details, and the AIX Core Dump Facility technical note.
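A central core directory only helps if processes can actually write into it, so it can be worth checking the directory before pointing chcore or syscorepath at it. The following is a minimal sketch in POSIX shell. The function name check_core_dir is hypothetical, and the check runs with the permissions of the user who calls it, which is an assumption that may not match the user whose process later dumps core.

```shell
#!/bin/sh
# Check that a directory can serve as a central core file location.
# check_core_dir is a hypothetical helper; it only inspects the directory
# and does not run chcore or syscorepath.
check_core_dir() {
    dir=$1
    if [ ! -d "$dir" ]; then
        echo "$dir: not a directory"
        return 1
    fi
    # Creating a file in a directory needs both write and search permission.
    if [ ! -w "$dir" ] || [ ! -x "$dir" ]; then
        echo "$dir: not writable by the current user"
        return 1
    fi
    echo "$dir: ok"
}

# Example use with the directory from the text (not executed here):
#   check_core_dir /tmp/corefiles && chcore -p on -n on -l /tmp/corefiles -d
```

The sketch does not look at free space, so the ENOSPC scenario still has to be checked separately.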
Conclusion
Normally a core file is written when a process terminates abnormally. The core file can be analyzed to help determine why the process failed. However, there are a number of scenarios that will prohibit a core file from being created. In some of these cases, a CORE_DUMP_FAILED entry is written into the error report. The REASON CODE section in this entry can be used to determine why the core file was not created. For cases where a CORE_DUMP_FAILED entry is not written into the error report, the running process, the process executable file, or the process source code must be investigated to determine why a core file was not generated.
Document Information
Modified date:
06 December 2019
UID
isg3T1011240