IBM Support

JR49560: Unstructured Data stage throws OutOfMemoryError when reading a large Excel (xlsx) file

APAR status

  • Closed as fixed if next.

Error description

  • Jobs that read small Excel files run successfully. When the
    Excel file is large (over 50 MB), the Unstructured Data stage
    fails with an OutOfMemoryError because it needs additional
    Java heap space.
    

Local fix

  • NO
    

Problem summary

  • The Unstructured Data stage internally uses an Apache POI
    library API that loads the entire uncompressed contents of an
    Excel (xlsx) file into memory. As a result, it requires more
    than 10 times the file size in Java heap memory.
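
An xlsx file is a ZIP package of XML parts, so its size on disk understates what has to be held in memory once the contents are uncompressed. The following is a minimal sketch, not part of the APAR, that compares the compressed and uncompressed sizes of such a package. The function names and the demo archive are hypothetical, and the demo entry is a stand-in rather than a valid workbook.

```python
import io
import zipfile


def zip_sizes(source):
    """Return (compressed_bytes, uncompressed_bytes) summed over all ZIP entries.

    `source` may be a file path or a binary file-like object.
    """
    with zipfile.ZipFile(source) as archive:
        entries = archive.infolist()
        compressed = sum(e.compress_size for e in entries)
        uncompressed = sum(e.file_size for e in entries)
    return compressed, uncompressed


def build_demo_archive():
    """Build an in-memory ZIP holding one repetitive XML entry.

    Hypothetical stand-in for a worksheet part; not a valid xlsx workbook.
    """
    buffer = io.BytesIO()
    row = b"<row><c><v>12345</v></c></row>"
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        archive.writestr("xl/worksheets/sheet1.xml", row * 10000)
    buffer.seek(0)
    return buffer


compressed, uncompressed = zip_sizes(build_demo_archive())
print(f"compressed={compressed} bytes, uncompressed={uncompressed} bytes")
```

Passing the path of a real xlsx file to `zip_sizes` gives a rough view of how much larger the uncompressed parts are than the file itself. The heap actually used by the stage is expected to be larger still, because parsed objects take more room than raw XML.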
    

Problem conclusion

Temporary fix

  • Set the Java heap size to about 10 to 20 times the file size
    by using the CC_UNST_JAVA_HEAP environment variable.
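
The sizing rule above can be sketched as a small calculation. This is an illustration only: the helper names are hypothetical, and because this record does not state the unit or value format that CC_UNST_JAVA_HEAP expects, the sketch reports the range in megabytes and leaves setting the variable to the administrator.

```python
import math
import os

BYTES_PER_MB = 1024 * 1024


def heap_range_mb(file_size_bytes, low_factor=10, high_factor=20):
    """Return (low, high) heap estimates in whole MB for a file of the given size.

    The default factors follow the 10x to 20x guidance in this APAR.
    """
    low = math.ceil(file_size_bytes * low_factor / BYTES_PER_MB)
    high = math.ceil(file_size_bytes * high_factor / BYTES_PER_MB)
    return low, high


def heap_range_for_file(path):
    """Apply heap_range_mb to the on-disk size of the file at `path`."""
    return heap_range_mb(os.path.getsize(path))


# Example: the 50 MB threshold mentioned in the error description.
low, high = heap_range_mb(50 * BYTES_PER_MB)
print(f"Suggested Java heap: {low} MB to {high} MB")
```

For a 50 MB file, the rule gives a range of 500 MB to 1000 MB. Check that the engine host has that much memory available before raising the heap size.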
    

Comments

APAR Information

  • APAR number

    JR49560

  • Reported component name

    WIS DATASTAGE

  • Reported component ID

    5724Q36DS

  • Reported release

    810

  • Status

    CLOSED FIN

  • PE

    NoPE

  • HIPER

    NoHIPER

  • Special Attention

    NoSpecatt

  • Submitted date

    2014-03-07

  • Closed date

    2014-03-27

  • Last modified date

    2014-04-14

  • APAR is sysrouted FROM one or more of the following:

  • APAR is sysrouted TO one or more of the following:

Fix information

Applicable component levels

  • R910 PSY

       UP

  • R912 PSY

       UP


Document Information

Modified date:
12 October 2021