IBM Support

IT31102: THE "RECLAIM CONTAINERS" PROCESS FOR CLOUD-CONTAINER STORAGE POOL MAY LEAD TO SERVER CRASH

APAR status

  • Closed as program error.

Error description

  • The "RECLAIM CONTAINERS" process for a cloud-container storage
    pool may cause the server to crash.
    The call stack shows a segmentation fault in the call to the
    Java method that initiates a multipart upload to the cloud.
    
    
    Example from getcoreinfo.txt on Linux:
    
    Program terminated with signal 11, Segmentation fault.
    
    #0  0x00007f7aed4b1caa in runCallInMethod ()
    #1  0x00007f7aed4cb4c9 in gpProtectedRunCallInMethod ()
    #2  0x00007f7aecda128b in omrsig_protect ()
    #3  0x00007f7aed50a08c in gpProtectAndRun ()
    #4  0x00007f7aed4cc57c in gpCheckCallin ()
    #5  0x00007f7aed4c888c in callVirtualObjectMethod ()
    #6  0x0000000000e299ae in SdCloudWriteInitPartUpload (cntrP=0x7f77b40168c0,
        resultP=resultP@entry=0x7f798e090de0,
        bytesWritten=bytesWritten@entry=0x7f798e090b98, msgList=0x7f77bc0c98e0,
        cenv=0x7f77bc0c5150) at sdcloud.c:3123
    #7  0x0000000000e7684c in SdMoveContainer (ctlP=0x7f7800276350,
        srcCntrId=srcCntrId@entry=102577,
        cntrType=cntrType@entry=SdContainerTypeCloud, newDirP=<optimized out>,
        newDirP@entry=0x0, procCtlDescP=procCtlDescP@entry=0x0) at sddefrag.c:2738
    #8  0x0000000000e799ec in sdCloudReclamationMoveContainer (cntrId=102577,
        poolId=32, procId=81) at sddefrag.c:1486
    #9  0x0000000000e029ec in ReclWorkerThread (childCtlP=0x7f780403ed90)
        at screclaim.c:929
    #10 0x0000000001335350 in StartThread (startInfoP=<optimized out>)
        at pkthread.c:4039
    #11 0x00007f7b96854aa1 in __pthread_initialize_minimal_internal ()
        from /lib64/libpthread.so.0
    #12 0x0000000000000000 in ?? ()
    
    
    
    
    
    Customer/L2 Diagnostics:
    The crash occurs when a request to commit a multipart upload
    to the cloud fails (during reclamation or tiering). A defect in
    the error handling path for that failure can then cause the
    server to crash.
    
    The FFDC log may show the following exception at the time of
    the problem:
    .....
    Exception: com.amazonaws.services.s3.model.AmazonS3Exception:
    One or more of the specified parts could not be found. The part
    might not have been uploaded, or the specified entity tag might
    not have matched the part's entity tag. (Service: Amazon S3;
    Status Code: 400; Error Code: InvalidPart; Request ID:
    656947673), S3 Extended Request ID: null One or more of the
    specified parts could not be found. The part might not have been
    uploaded, or the specified entity tag might not have matched the
    part's entity tag. (Service: Amazon S3; Status Code: 400; Error
    Code: InvalidPart; Request ID: 656947673)
     com.amazonaws.http.AmazonHttpClient$RequestExecutor.handleError
    Response(AmazonHttpClient.java:1588)
     com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeOneR
    equest(AmazonHttpClient.java:1258)
     com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeHelp
    er(AmazonHttpClient.java:1030)
     com.amazonaws.http.AmazonHttpClient$Req'.
    16:58:35.850 [382][jvm.c][1615][JavaSideTrace]: Argument
    [ID1:16:58:35.847], part 2
    text(len=889)='uestExecutor.doExecute(AmazonHttpClient.java:742)
     com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeWith
    Timer(AmazonHttpClient.java:716)
     com.amazonaws.http.AmazonHttpClient$RequestExecutor.execute(Ama
    zonHttpClient.java:699)
     com.amazonaws.http.AmazonHttpClient$RequestExecutor.access$500(
    AmazonHttpClient.java:667)
     com.amazonaws.http.AmazonHttpClient$RequestExecutionBuilderImpl
    .execute(AmazonHttpClient.java:649)
     com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.ja
    va:513)
     com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.
    java:4221)
     com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.
    java:4168)
     com.amazonaws.services.s3.AmazonS3Client.completeMultipartUploa
    d(AmazonS3Client.java:2970)
     com.tivoli.dsm.cloud.api.S3Client.completeMultipartUpload(S3Cli
    ent.java:911)
     com.tivoli.dsm.cloud.api.ProviderS3.completeWriteObjectPart(Pro
    viderS3'.
    
    
    IBM Spectrum Protect versions affected:
    Server 8.1.7 and higher on all supported platforms
    
    
    
    
    Initial Impact:
    High
    
    
    Additional Keywords:
    TSM "Spectrum Protect" TS002409913 reclaim container cloud crash
    core abend
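
The two symptoms above (the gdb stack and the FFDC exception) can be checked for mechanically. The following is a minimal sketch, not an IBM tool: the function names `core_matches_it31102` and `ffdc_matches_it31102` are hypothetical, and the matching rules are an assumption derived only from the excerpts in this APAR (signal 11 with `callVirtualObjectMethod` directly above `SdCloudWriteInitPartUpload`, and an `InvalidPart` error together with a `completeMultipartUpload` frame).

```python
import re

# Matches the function name on the first line of each gdb frame, e.g.
# "#6  0x0000000000e299ae in SdCloudWriteInitPartUpload".
# Frames without a symbol name ("in ?? ()") are skipped.
FRAME_RE = re.compile(r"^\s*#\d+\s+0x[0-9a-fA-F]+ in (\w+)", re.MULTILINE)


def core_matches_it31102(core_text: str) -> bool:
    """Return True if the text shows signal 11 with the JVM frame
    callVirtualObjectMethod directly above SdCloudWriteInitPartUpload.

    Assumption: this frame pair is taken from the example stack only.
    """
    if "signal 11" not in core_text:
        return False
    frames = FRAME_RE.findall(core_text)
    return any(
        inner == "callVirtualObjectMethod"
        and outer == "SdCloudWriteInitPartUpload"
        for inner, outer in zip(frames, frames[1:])
    )


def ffdc_matches_it31102(ffdc_text: str) -> bool:
    """Return True if the text contains both the InvalidPart error code
    and a completeMultipartUpload frame, as in the example exception."""
    # The FFDC excerpt wraps lines mid-word, so drop all whitespace first.
    squeezed = re.sub(r"\s+", "", ffdc_text)
    return (
        "ErrorCode:InvalidPart" in squeezed
        and "completeMultipartUpload" in squeezed
    )
```

A match from either function is only an indication that the files resemble the examples in this APAR; it does not confirm the defect. The caller is expected to read getcoreinfo.txt and the FFDC log into strings before calling these functions.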
    

Local fix

Problem summary

  • ****************************************************************
    * USERS AFFECTED:                                              *
    * All IBM Spectrum Protect server users.                       *
    ****************************************************************
    * PROBLEM DESCRIPTION:                                         *
    * See error description.                                       *
    ****************************************************************
    * RECOMMENDATION:                                              *
    * Apply fixing level when available. This problem is currently *
    * projected to be fixed in levels 8.1.8.200 and 8.1.9. Note    *
    * that this is subject to change at the discretion of IBM.     *
    ****************************************************************
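
The level information in this APAR can be turned into a simple range check. This is a sketch under stated assumptions: it takes "8.1.7 and higher" as the first affected level and the projected levels 8.1.8.200 and 8.1.9 as the first fixed levels, which this APAR notes are subject to change. The names `parse_level` and `is_in_affected_range` are hypothetical, and levels are assumed to be dotted strings such as "8.1.8.100".

```python
FIRST_AFFECTED = (8, 1, 7, 0)   # "Server 8.1.7 and higher"
FIRST_FIXED = (8, 1, 8, 200)    # projected fix level; 8.1.9 sorts after it


def parse_level(level: str) -> tuple:
    """Convert a dotted level such as "8.1.8.100" into a 4-part tuple,
    padding missing trailing parts with zero ("8.1.9" -> (8, 1, 9, 0))."""
    parts = [int(p) for p in level.strip().split(".")]
    if not 1 <= len(parts) <= 4:
        raise ValueError(f"unexpected level format: {level!r}")
    return tuple(parts + [0] * (4 - len(parts)))


def is_in_affected_range(level: str) -> bool:
    """Return True if the level is at or above the first affected level
    and below the projected fix level (assumption: see lead-in)."""
    return FIRST_AFFECTED <= parse_level(level) < FIRST_FIXED
```

For example, `is_in_affected_range("8.1.8.100")` evaluates to True and `is_in_affected_range("8.1.9")` to False. The result should be verified against the fix level that IBM actually delivered.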
    

Problem conclusion

  • This problem was fixed.
    Affected platforms for reported release:  AIX, Linux, and
    Windows.
    Platforms fixed:  AIX, Linux, and Windows.
    

Temporary fix

Comments

APAR Information

  • APAR number

    IT31102

  • Reported component name

    TSM SERVER

  • Reported component ID

    5698ISMSV

  • Reported release

    81L

  • Status

    CLOSED PER

  • PE

    NoPE

  • HIPER

    NoHIPER

  • Special Attention

    NoSpecatt / Xsystem

  • Submitted date

    2019-11-28

  • Closed date

    2019-12-05

  • Last modified date

    2019-12-05

  • APAR is sysrouted FROM one or more of the following:

  • APAR is sysrouted TO one or more of the following:

Fix information

  • Fixed component name

    TSM SERVER

  • Fixed component ID

    5698ISMSV

Applicable component levels

  • Business unit

    Software (BU029)

  • Product

    Tivoli Storage Manager (SSGSG7)

  • Platform

    Platform Independent (PF025)

  • Version

    81L

Document Information

Modified date:
14 September 2023