lzrycki
29 Posts

Pinned topic Cobol for AIX - Return-Code

‏2012-10-04T19:04:46Z |
Hi people,

Running a program, I got a SIGSEGV cancellation.

About cancellations and return codes, the COBOL for AIX Programming Guide says:

Unrecoverable exception: When a program encounters an unrecoverable exception,
the user return code is set to 128 plus the signal number.

For instance, SIGABRT (data exceptions, etc.) generates the return code 134 = 128 + 6 (the SIGABRT number).

I was surprised when my SIGSEGV also gave me 134, because the SIGSEGV number is 11 and, following the Programming Guide, the return code should be 128 + 11 = 139.

Could someone tell me why the SIGSEGV cancellation gives return code 134 and not 139?

Thanks

Regards

Leonardo

prudebug-19:44:50-PASO 2IWZ995C SIGSEGV signal received while executing code at location 0xf838.
prudebug-19:44:50-PASO 2 Message routine called from offset 0x14c of routine EVS29336.
prudebug-19:44:50-PASO 2IWZ901S Program exits due to severe or critical error.
prudebug-19:44:50-PASO 2-======================================
prudebug-19:44:50-PASO 2-DEBUG IKJRP_RC --->>> 134
prudebug-19:44:50-PASO 2-======================================
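
The arithmetic in the question can be checked with a small sketch. It only encodes the "128 plus the signal number" rule quoted from the Programming Guide; the helper names are made up for illustration, and signal numbers are platform-specific (6 and 11 are the values quoted in this thread).

```python
import signal


def expected_return_code(sig):
    """Return code predicted by the quoted rule: 128 plus the signal number."""
    return 128 + int(sig)


def signal_from_return_code(rc):
    """Map a return code above 128 back to a signal, or None if it encodes none."""
    if rc <= 128:
        return None
    try:
        return signal.Signals(rc - 128)
    except ValueError:
        # The number above 128 does not correspond to a known signal here.
        return None


if __name__ == "__main__":
    for sig in (signal.SIGABRT, signal.SIGSEGV):
        print(sig.name, int(sig), expected_return_code(sig))
```

Decoding the observed value the other way, `signal_from_return_code(134)`, points at SIGABRT rather than SIGSEGV, which is exactly the mismatch being asked about.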
Updated on 2013-01-23T21:35:34Z by outlaw
  • SystemAdmin
    403 Posts

    Re: Cobol for AIX - Return-Code

    ‏2012-10-12T20:29:20Z  
    This sounds like a possible product defect; you should call IBM service and report this problem.

    COBOL is the Language of the Future!
    Tom
  • lzrycki
    29 Posts

    Re: Cobol for AIX - Return-Code

    ‏2012-10-15T20:15:25Z  
    Thanks Tom.

    We have another problem with return codes. First of all, all my experience is on the mainframe: z/OS, Assembler, COBOL, CICS, DB2, MQ, etc.

    At the moment I'm upgrading an AIX system from COBOL 2, Oracle 10g and TX Series 6 to COBOL 4.1.1.6, Oracle 11g and TX Series 7, testing our application system in a Lab environment (COBOL 4.1.1.6) and comparing it with Production (COBOL 2).

    The same program with the same data is cancelled with an overpunch error in COBOL 2 and with a data exception in COBOL 4. Overpunch is a synonym for data exception, the classic mainframe 0C7. The issue: in our Production environment the return code is 134, but in the Lab (COBOL 4) the RC is 1. This is a very big problem, because RC 1 is a common RC that all our programs handle.

    RC 134 is the correct RC.

    Could someone tell me why we receive RC 1 in COBOL 4.1.1.6?

    Regards

    Leonardo

    Production
    IWZ039S An invalid overpunched sign was detected.
    Message routine called from offset 0x34 of routine

    iwzWriteERRmsg called from offset 0xa0 of routine
    _iwzcBCD_CONV_Pckd_To_ZndUS.
    _iwzcBCD_CONV_Pckd_To_ZndUS called from offset 0x298fc of

    CCR00270.
    CCR00270 called from offset 0x2d9e8 of routine CCB00010.
    IWZ901S Program exits due to severe or critical error.
    IKJEFT01 RUNPROGRAM CCB00010 RC:134

    LAB
    <Thread 1> Traceback:
    <Thread 1> Offset 0x00000564 in procedure writeERRmsg
    <Thread 1> Offset 0x00000044 in procedure iwzWriteERRmsg
    <Thread 1> Offset 0x000000c8 in procedure _iwzcBCD_CONV_Pckd_To_ZndUS
    <Thread 1> Offset 0x00033d88 in procedure CCR00270
    <Thread 1> Offset 0x000343fc in procedure CCB00010
    <Thread 1> --- End of call chain ---
    IWZ903S The system detected a data exception.
    IWZ901S Program exits due to severe or critical error.
    IKJEFT01 RUNPROGRAM CCB00010 RC:1
  • lzrycki
    29 Posts

    Re: Cobol for AIX - Return-Code

    ‏2012-10-23T15:34:41Z  
    Hi people, sorry for being so persistent with this topic, but I'm very confused about cancellations in our environment (AIX + COBOL 4.1.1.6 + Oracle 11g).

    Digging deeper, we found a general problem that can be shown with a very simple COBOL program.

           IDENTIFICATION DIVISION.
           PROGRAM-ID. EVS29336.
           ENVIRONMENT DIVISION.
           DATA DIVISION.
           WORKING-STORAGE SECTION.
           01 USERNAME PIC X VALUE '/'.
           01 CUENTA   PIC 9(9).
          *
          * Trick to generate a Data Exception
          *
           01 HORA  PIC X(3) VALUE X'001000'.
           01 HORAC REDEFINES HORA PIC S9(5) COMP-3.
          ****************************************************
               EXEC SQL
                   INCLUDE SQLCA
               END-EXEC.

           PROCEDURE DIVISION.

               COMPUTE CUENTA = HORAC * 10

               EXEC SQL
                   CONNECT :USERNAME
               END-EXEC

               STOP RUN.

    When this version of the program runs, a cancellation occurs in the COMPUTE statement; the SQL CONNECT comes after the COMPUTE and is never executed. In this case the return code is 134, consistent with the documentation:
    cancellation RC = 128 + 6 (SIGABRT number) = 134
    If we change the order of the statements:

    EXEC SQL
    CONNECT :USERNAME
    END-EXEC

    COMPUTE CUENTA = HORAC * 10

    the abend still occurs in the COMPUTE statement, but the return code is 1. In our real programs, once any SQL statement has been executed (no matter which one), an abend at any point in the program always gives RC 1.

    How is it possible that SQL statements generated by ProCobol, the Oracle precompiler, can influence the RC of a later abend?

    For us this is a big problem, because many programs deal internally with RC = 1; for our scheduler, TWS, RC = 1 is a warning.

    Any ideas about this issue?

    Thanks

    Regards

    Leonardo
  • outlaw
    39 Posts

    Re: Cobol for AIX - Return-Code

    ‏2012-10-23T21:10:27Z  
    You get rc=134 because we captured the SEGV, issued some (hopefully) useful data, then we abort the program - so you wind up with SIGABRT as the final signal.

    There are some cases where we might be able to provide more specific values, assuming no cascading of signals, but doing so across the board (thread, non-thread, etc.) quickly becomes a quagmire :(

    The salient information is that an rc > 127 means something went 'orribly wrong.

    If you require a higher level of detail than is currently provided, you'll need to go through channels to get a feature request started.
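
The sequence described above (catch the SIGSEGV, write diagnostics, then abort) can be imitated with a small sketch. This is an analogy in Python, not the COBOL runtime's actual code: the child script, the message text and the helper names are all invented for illustration, and the fault is simulated by the process sending SIGSEGV to itself. It assumes a Unix-like system.

```python
import subprocess
import sys
import textwrap

# Child program: catch SIGSEGV, report it, then abort, so that SIGABRT
# (not SIGSEGV) is the signal that finally ends the process.
CHILD = textwrap.dedent("""
    import os, resource, signal, time

    resource.setrlimit(resource.RLIMIT_CORE, (0, 0))   # no core file for this demo

    def on_segv(signum, frame):
        os.write(2, b"SIGSEGV caught, diagnostics written, aborting\\n")
        os.abort()                                      # raises SIGABRT

    signal.signal(signal.SIGSEGV, on_segv)
    os.kill(os.getpid(), signal.SIGSEGV)                # simulate the fault
    for _ in range(50):                                 # give the handler time to run
        time.sleep(0.1)
""")


def shell_style_return_code(returncode):
    """Convert subprocess's negative 'killed by signal N' value to the shell's 128 + N."""
    return 128 - returncode if returncode < 0 else returncode


def run_child():
    """Run the child and report its status the way a shell script would see it."""
    proc = subprocess.run([sys.executable, "-c", CHILD], stderr=subprocess.DEVNULL)
    return shell_style_return_code(proc.returncode)


if __name__ == "__main__":
    print(run_child())
```

Because the handler ends the process with an abort, the status seen by the caller encodes SIGABRT even though SIGSEGV was the original signal.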

    Richard A Nelson (Rick)
    COBOL Development IBM Silicon Valley Laboratory
    http://www.ibm.com/software/awdtools/cobol/
  • outlaw
    39 Posts

    Re: Cobol for AIX - Return-Code

    ‏2012-10-23T21:31:56Z  
    Ah, now we're getting somewhere! I was all set to blame IKJEFT01, but it appears to be an Oracle issue, which you can likely find in your Oracle product documentation.

    Oracle, not surprisingly, appears to be using atexit() handlers to facilitate commit/rollback handling, and setting the final step rc (likely 0==ok/commit, 1==bad/rollback).

    The atexit() hook will be driven after COBOL issues the abort() (or workalike), and COBOL has no control over other products' use of atexit().

    The fact that you get rc=134 if you abort before the SQL call, and rc=1 afterwards supports this hypothesis, in that Oracle doesn't get control, and therefore can't set the atexit() hook until the first call.

    These days, there are vastly superior methods of handling most things done in an atexit() hook - mostly thanks to C++ requirements for constructor/destructor, and these techniques alleviate most of the atexit() issues that have long been known.

    However, use of atexit() for commit/rollback still seems to be a common technique for a variety of database products... These products want you to know whether the transaction was committed or rolled back - hence the overwriting of the rc value.

    The long and short of it all is that you need to peruse the Oracle documentation to see what its return code conventions are - and likely avoid that set of values from your own programs.
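
The hypothesis above can be illustrated with a sketch in which an exit hook replaces the status a program was about to report. Everything here is hypothetical: it is not Oracle's code, the hook name and mode strings are invented, and it assumes the failing runtime ends through an exit()-style path (the "workalike" case), since exit hooks only run on that path.

```python
import subprocess
import sys
import textwrap

# Child program: optionally registers an exit hook, then exits with 134.
CHILD = textwrap.dedent("""
    import atexit, os, sys

    def library_cleanup():
        # Hypothetical stand-in for a library's rollback-and-report hook.
        os._exit(1)               # ends the process now, replacing the pending status

    if sys.argv[1] == "after-first-call":
        atexit.register(library_cleanup)   # hook installed by the "first SQL call"

    sys.exit(134)                 # the status the failing program meant to report
""")


def final_status(mode):
    """Return the exit status the parent actually observes for the given mode."""
    return subprocess.run([sys.executable, "-c", CHILD, mode]).returncode


if __name__ == "__main__":
    for mode in ("before-first-call", "after-first-call"):
        print(mode, final_status(mode))
```

With no hook registered the parent sees 134; once the hook is registered it sees 1. That matches the pattern reported in this thread, where the return code depends on whether the failure comes before or after the first SQL call.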

    Richard A Nelson (Rick)
    COBOL Development IBM Silicon Valley Laboratory
    http://www.ibm.com/software/awdtools/cobol/
  • lzrycki
    29 Posts

    Re: Cobol for AIX - Return-Code

    ‏2012-10-24T19:51:01Z  
    Hi Rick,

    Thanks for your explanation. IKJEFT01 comes from the mainframe, because our application system was downsized from mainframe z/OS to ISeries AIX. All the JCL was translated to AIX scripts 7 years ago.

    The Oracle ProCobol documentation says:

    "The contents of the RETURN-CODE special register (for those systems that support it) are unpredictable after any SQL statement or SQLLIB function."

    Because of this, our programs save the return code before executing a SQL statement and restore it afterwards.

    This is the only thing I found in the Oracle documentation about RC; nowhere is it written that Oracle's routines will intercept a cancellation and set the final RC.

    We had some support discussions with IBM; yesterday they were resolved and we opened a PMR with all our issues.

    Dan K. and/or David G. will work with us; I'll send all your comments.

    Thanks a lot for your time.

    Regards

    Leonardo
  • outlaw
    39 Posts

    Re: Cobol for AIX - Return-Code

    ‏2012-10-24T21:30:31Z  
    Oracle libraries have no access to the RETURN-CODE special register, unless their pre-processor adds references to it in their translated COBOL.

    Your saving and restoring of RETURN-CODE will not help with the 134 vs 1 issue :(

    Oracle does document that a program failure will cause an automatic roll-back - and the only way to do that is with one or more of library init/term routines, atexit() hooks, and (percolating) signal handlers.

    It is very likely that this auto-rollback process is changing the 134 to 1.

    If that is, indeed, what is happening, and I'll wager heavily that it is, there is nothing IBM can do, and it is something that very likely is documented somewhere in the Oracle documentation, but finding it may not be trivial.

    Interestingly enough, I wonder if IBM's DB2 has this same problem?

    Richard A Nelson (Rick)
    COBOL Development IBM Silicon Valley Laboratory
    http://www.ibm.com/software/awdtools/cobol/
  • outlaw
    39 Posts

    Re: Cobol for AIX - Return-Code

    ‏2013-01-23T21:35:34Z  
    What, if anything, have you found out from the Oracle folks on this issue?

    Richard A Nelson (Rick)
    COBOL Development IBM Silicon Valley Laboratory
    http://www.ibm.com/software/awdtools/cobol/