Dictionary Dataflow Scenario Fails with java.lang.OutOfMemoryError: Java heap space

Problem

Some database DictionaryDataflow scenarios fail with PLATFORM_CLI_ERRORS FAILED_EXECUTING_SCENARIO and java.lang.OutOfMemoryError: Java heap space after processing all the batches. The complete error message in the log is:

2023-08-04 17:34:24.243 [CLI] 0 INFO  eu.profinit.manta.dataflow.generator.modelutils.GraphScenario Processed 438 of 438 files in 1,486,121 ms.
2023-08-04 17:34:29.585 [CLI] 0 INFO  eu.profinit.manta.dataflow.repository.merger.client.AbstractMergerWriter Sending data to server http://localhost:8080/manta-dataflow-server/api/merge.
2023-08-04 17:34:32.858 [CLI] 0 INFO  eu.profinit.manta.dataflow.repository.merger.client.AbstractMergerWriter Response from server: {"processingReport":{"time":2531,"newObjects":{"node":0,"edge_attribute":0,"edge":0,"resource":0,"node_attribute":20059,"source_code":0,"layer":0},"errorObjects":{"node":0,"edge_attribute":0,"edge":0,"resource":0,"node_attribute":0,"source_code":0,"layer":0},"existedObjects":{"node":4100,"edge_attribute":0,"edge":0,"resource":1,"node_attribute":1,"source_code":0,"layer":1},"unknownTypesCount":0,"processedObjectsCount":24162,"requestedSourceCodes":0},"processingTime":2563}. 
2023-08-04 17:34:32.863 [CLI] 0 INFO  eu.profinit.manta.dataflow.repository.merger.client.StandardSourceCodeService Waiting for source code upload to finish.
2023-08-04 17:34:32.867 [CLI] 0 INFO  eu.profinit.manta.dataflow.repository.merger.client.StandardSourceCodeService Source code upload has been completed.

2023-08-04 17:34:32.891 [CLI] ISSUE eu.profinit.manta.platform.cli.CliImpl
Event: SCENARIO_FAILURE_EVENT
Message: Scenario 'snowflakeDictionaryDataflowScenario' failed to execute.
Type: ISSUE
Priority: HIGH


2023-08-04 17:34:32.915 [CLI] 0 ERROR eu.profinit.manta.platform.cli.CliImpl 
PLATFORM_CLI_ERRORS FAILED_EXECUTING_SCENARIO
User message: Failed executing scenario snowflakeDictionaryDataflowScenario
Technical message: Failed executing scenario snowflakeDictionaryDataflowScenario
Solution: Please contact MANTA Support at portal.getmanta.com and submit a support bundle/log export.
Impact: SCENARIO

java.lang.OutOfMemoryError: Java heap space
    at org.apache.commons.io.output.AbstractByteArrayOutputStream.needNewBuffer(AbstractByteArrayOutputStream.java:106) ~[manta-connector-jms-artemis-client-37.1.0.jar:?]
    at org.apache.commons.io.output.AbstractByteArrayOutputStream.writeImpl(AbstractByteArrayOutputStream.java:135) ~[manta-connector-jms-artemis-client-37.1.0.jar:?]
    at org.apache.commons.io.output.ByteArrayOutputStream.write(ByteArrayOutputStream.java:66) ~[manta-connector-jms-artemis-client-37.1.0.jar:?]
    at sun.nio.cs.StreamEncoder.writeBytes(StreamEncoder.java:233) ~[?:?]
    at sun.nio.cs.StreamEncoder.implWrite(StreamEncoder.java:303) ~[?:?]
    at sun.nio.cs.StreamEncoder.implWrite(StreamEncoder.java:281) ~[?:?]
    at sun.nio.cs.StreamEncoder.write(StreamEncoder.java:125) ~[?:?]
    at sun.nio.cs.StreamEncoder.write(StreamEncoder.java:135) ~[?:?]
    at java.io.OutputStreamWriter.write(OutputStreamWriter.java:226) ~[?:?]
    at java.io.PrintWriter.write(PrintWriter.java:542) ~[?:?]
    at java.io.PrintWriter.write(PrintWriter.java:542) ~[?:?]
    at java.io.PrintWriter.write(PrintWriter.java:559) ~[?:?]
    at au.com.bytecode.opencsv.CSVWriter.writeNext(CSVWriter.java:243) ~[?:?]
    at eu.profinit.manta.dataflow.repository.merger.common.GraphCsvSerializationHelper.printNodeAtributes(GraphCsvSerializationHelper.java:181) ~[?:?]
    at eu.profinit.manta.dataflow.repository.merger.common.GraphCsvSerializationHelper.writeGraph(GraphCsvSerializationHelper.java:82) ~[?:?]
    at eu.profinit.manta.dataflow.repository.merger.client.MergerWriter.createData(MergerWriter.java:103) ~[?:?]
    at eu.profinit.manta.dataflow.repository.merger.client.MergerWriter.writeInnerGraph(MergerWriter.java:82) ~[?:?]
    at eu.profinit.manta.dataflow.repository.merger.client.MergerWriter.write(MergerWriter.java:52) ~[?:?]
    at eu.profinit.manta.dataflow.repository.merger.client.MergerWriter.write(MergerWriter.java:31) ~[?:?]
    at eu.profinit.manta.dataflow.generator.modelutils.GraphScenario.doExecute(GraphScenario.java:151) ~[?:?]
    at eu.profinit.manta.platform.automation.AbstractScenario.execute(AbstractScenario.java:97) ~[manta-platform-automation-37.1.0.jar:?]
    at eu.profinit.manta.platform.cli.CliImpl.execute(CliImpl.java:261) [manta-platform-cli-37.1.0.jar:?]
    at eu.profinit.manta.platform.cli.launcher.Main.lambda$main$1(Main.java:66) [manta-platform-cli-launcher-37.1.0.jar:37.1.0]
    at eu.profinit.manta.platform.cli.launcher.Main$$Lambda$2/0x0000000800067440.run(Unknown Source) [manta-platform-cli-launcher-37.1.0.jar:37.1.0]
    at java.lang.Thread.run(Thread.java:829) [?:?]

More Details

The OutOfMemoryError may appear when the dictionary dataflow analysis runs for a large database whose objects do not all fit into the allocated memory. The memory requirement is about 1 kB per object (schema, table, column, procedure, data type, etc.). With the default 3 GB of RAM per scenario, approximately 3 million objects can be processed. For larger databases, it is necessary to increase the memory allocated to the dictionary dataflow scenario.
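
The sizing rule above can be turned into a quick back-of-the-envelope calculation. The following is a minimal sketch, not part of MANTA: the function names are hypothetical, 1 GB is taken as 10^9 bytes, and the 1 kB per object figure is only the approximate estimate quoted above, so treat the result as a lower bound and add headroom as you see fit.

```python
# Rough heap estimate for a dictionary dataflow scenario.
# Assumptions (illustrative only): ~1 kB per object and a 3 GB default,
# both taken from the text above; 1 GB is treated as 10**9 bytes.

BYTES_PER_OBJECT = 1_000        # approximate memory per database object
BYTES_PER_GB = 1_000_000_000    # decimal gigabyte
DEFAULT_HEAP_GB = 3             # default scenario memory


def estimate_heap_gb(object_count: int) -> int:
    """Return the estimated heap size in whole GB, rounded up."""
    if object_count < 0:
        raise ValueError("object_count must not be negative")
    needed_bytes = object_count * BYTES_PER_OBJECT
    # Integer ceiling division avoids floating-point rounding surprises.
    return -(-needed_bytes // BYTES_PER_GB)


def needs_more_memory(object_count: int) -> bool:
    """Return True if the estimate exceeds the default scenario memory."""
    return estimate_heap_gb(object_count) > DEFAULT_HEAP_GB


if __name__ == "__main__":
    for count in (3_000_000, 8_500_000):
        print(count, estimate_heap_gb(count), needs_more_memory(count))
```

For instance, a database with about 8.5 million objects yields an estimate of 9 GB under these assumptions, which is above the default and therefore calls for a higher memory setting.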

Solution

  1. Estimate how much memory will likely be needed: number of objects × 1 kB gives the approximate heap size (for example, 5 million objects need about 5 GB).

  2. Increase the memory for the failing Dictionary Dataflow scenario using SCENARIO_LOAD_MEMORY, as described in the Scenario Execution Runtime Limits section of Configure Runtime and Limitations.

  3. Rerun the lineage analysis (the Extraction step does not necessarily need to be rerun).