
Configure Runtime and Limitations

Learn how Process Manager allocates available resources such as memory and CPU during execution, and how to control it to optimize runtime, throughput, and performance.

Execution Restrictions in IBM Manta Data Lineage

Each scenario belongs to a specific Manta Data Lineage phase: extraction, analysis, or export, and the phases must be executed in this order. Process Manager plans the execution of these scenarios in the correct order.

Each scenario also belongs to a technology, and each technology belongs to one particular technology group: databases, data integrations, analytical tools, reporting tools, programming languages, data modeling, or Open Manta. The execution of scenarios belonging to any one group can be performed in any order, but all of them must be executed before the first scenario of the next group is started.

This order only applies to analysis and export scenarios; extraction is not limited by this restriction. The execution order of technology groups is as follows.

All scenarios that belong to one phase and one technology (there are usually one to three scenarios) always have a specific order, and they can’t run simultaneously. Every scenario must be executed for each connection, so these restrictions only apply to scenarios executed for one connection. For example, the dictionary mapping scenario cannot be executed for connection A after the extractor scenario is executed for connection B. Here are the scenario restrictions.

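The ordering rules above can be sketched as a sort key. This is only an illustration, not Process Manager's planner: the `Scenario` class and the `plan` function are hypothetical names, and `GROUP_ORDER` simply reuses the order in which the technology groups are listed in the text, which is an assumption and may not match the real execution order.

```python
from dataclasses import dataclass

# Phases must run in this order (stated in the text).
PHASE_ORDER = ["extraction", "analysis", "export"]

# ASSUMPTION: this is only the order in which the groups are listed in the
# text; the actual execution order of groups may differ.
GROUP_ORDER = [
    "databases",
    "data integrations",
    "analytical tools",
    "reporting tools",
    "programming languages",
    "data modeling",
    "Open Manta",
]


@dataclass(frozen=True)
class Scenario:  # hypothetical type, for illustration only
    name: str
    phase: str
    group: str


def plan_key(scenario):
    """Sort key: phase first, then technology group (analysis/export only)."""
    phase_rank = PHASE_ORDER.index(scenario.phase)
    # Extraction is not limited by the group order, so all groups tie there.
    if scenario.phase == "extraction":
        group_rank = 0
    else:
        group_rank = GROUP_ORDER.index(scenario.group)
    return (phase_rank, group_rank)


def plan(scenarios):
    """Return scenarios in an order that respects the phase and group rules."""
    # sorted() is stable, so scenarios with equal keys keep their input order.
    return sorted(scenarios, key=plan_key)
```

Scenarios that share a key may run in any order relative to each other, so the sketch leaves them in input order.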

Parallelism in Process Manager

Process Manager provides the benefits of advanced parallelism.

Process Manager executes each scenario as a child process and gives it one Manta Data Lineage connection to perform the operation. This gives Process Manager better control over scenarios and allows it to execute scenarios from different technologies in parallel. It is therefore not necessary to wait for a technology with one connection to finish its one scenario, because scenarios from another technology can run in parallel with it. The number of processes that can run in parallel is configurable for each workflow. If not otherwise specified, the default value is 4 parallel processes.

Keep in mind that a workflow’s maximum number of parallel processes is a hard limit. Even if there are enough free resources (resources are explained in the next section) for Process Manager to run another scenario, it won’t do it because of this limit. If you don’t want to use this hard limit, increase the value to a high number such as 100.
If you ever need to completely turn off parallelism, set the workflow’s maximum number of parallel processes to 1.

Parallelism in Process Manager is different for each phase.

Customization of Parallelism in Process Manager

Process Manager uses advanced parallelism that is built on a resource system. This advanced parallelism monitors the available resources and tries to execute as many scenarios as possible to utilize the full potential of all available resources.

Keep in mind that these resources are only used for scenarios run from Process Manager, not for all of Manta Data Lineage. You need to leave some free resources for the operating system, Manta Admin UI, and Manta Flow Server. Do not set the resource values to the full capacity of your hardware.

This resource system is built on multiple resources, but for now, only two are available to users. These are the available memory and available CPU cores. See the section “Configuring Available Resources” for more details.

Configuring Available Resources

The Process Manager resource system allows the configuration of the following resources: memory and available CPU cores.

The available resources are configurable in the Admin UI under the Configuration tab → CLI → Common → Common Config.

CPU Cores

Each scenario can be single-threaded or multi-threaded. Process Manager knows how many threads each scenario may use and starts as many scenarios as fit within the configured Available CPU Cores value. Process Manager may exceed this value by up to 200%, so it may execute scenarios that together use more threads than are configured.

The thread count is fixed for some scenarios; for others, it can be configured in the connection’s configuration.

The Available CPU Cores configuration is available under the Configuration tab → CLI → Common → Common Config.

Memory

Process Manager Planner Limits

In Common Config, it is possible to define the available memory that Process Manager can use for scheduling scenarios. This is a hard limit, and Process Manager never exceeds this value. The default value is 4096 MB. This value controls how many scenarios Process Manager will try to run in parallel. For example, if your Available Memory is 10,240 MB and there are two scenarios that each require 5120 MB, Process Manager can run both at the same time. If your Available Memory were 7168 MB, only one of these scenarios could run at a time. This setting only sets the upper limit. There are other conditions that control the workflow execution. For example, Analysis cannot run in parallel with Extraction, as one depends on the results of the other.
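The memory example above can be reproduced with a small admission sketch. It is a simplified model, not the actual planner: the function name `admit` and the greedy first-fit strategy are assumptions made for illustration. The defaults (4096 MB and 4 parallel processes) are the ones stated in this document.

```python
def admit(pending, available_memory_mb=4096, max_parallel=4):
    """Pick scenarios that could start together under both hard limits.

    pending: list of (name, required_mb) tuples in planning order.
    Returns the names of the scenarios admitted to run in parallel.
    """
    running = []
    used_mb = 0
    for name, required_mb in pending:
        # The workflow's maximum number of parallel processes is a hard limit.
        if len(running) >= max_parallel:
            break
        # Available Memory is a hard limit too: never exceed it.
        if used_mb + required_mb <= available_memory_mb:
            running.append(name)
            used_mb += required_mb
    return running


scenarios = [("scenario_a", 5120), ("scenario_b", 5120)]
print(admit(scenarios, available_memory_mb=10240))  # both fit
print(admit(scenarios, available_memory_mb=7168))   # only the first fits
```

Setting `max_parallel=1` in this sketch models turning parallelism off: only one scenario is admitted regardless of free memory.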


Scenario Execution Runtime Limits

Process Manager knows the memory expectations for each scenario, but it is possible that these values might not be high enough, especially for large systems.

This can easily be changed by setting an environment variable for the particular scenario. The variable name is SCENARIO_LOAD_MEMORY; its value is in megabytes. Keep in mind that if you set this variable for an inner node (for example, Extraction or MSSQL), it is propagated to all its children (all extraction scenarios or all MS SQL scenarios, respectively), which may not be desirable and may slow down the workflow execution.


As of R40.1, it is possible to set this variable permanently for all workflows by adding the property:

manta.admin-ui.default-scenario-load-memory=<ValueInMB>

in <mantaflowdirectory>/serviceutility/manta-admin-ui-dir/conf/application.properties. An example for 51200 megabytes:

manta.admin-ui.default-scenario-load-memory=51200

It is important to know that SCENARIO_LOAD_MEMORY is translated into the JVM Xmx setting for the CLI processes. Increasing Xmx directly through JAVA_OPTS is discouraged, as it circumvents the Process Manager planner and can lead to issues when running the scenarios.
SCENARIO_LOAD_MEMORY can only be used when scenarios are run from Process Manager. It doesn’t work if the legacy CLI shell or batch scripts in mantaflow/cli/scenarios/manta-dataflow-cli/bin are used. These should not be used anymore, as they were removed in R39. The newly supported shell and batch scripts are available in mantaflow/serviceutility/webapps/manta-admin-gui/WEB-INF/bin.

The value set through SCENARIO_LOAD_MEMORY has no effect on Process Manager planner limits. The planner only considers the default values built in for each scenario.
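To make the relationship between SCENARIO_LOAD_MEMORY and Xmx concrete, here is a minimal sketch of how a megabyte value maps to a JVM heap option. The function `jvm_heap_option` is hypothetical, and how Process Manager actually builds the JVM command line is not described here; only the variable name, its unit (megabytes), and the standard `-Xmx<size>m` JVM option syntax are taken as given.

```python
import os


def jvm_heap_option(env=None):
    """Map SCENARIO_LOAD_MEMORY (in MB) to a JVM -Xmx option string.

    Returns None when the variable is not set, meaning the scenario's
    built-in default would apply.
    """
    env = os.environ if env is None else env
    raw = env.get("SCENARIO_LOAD_MEMORY")
    if raw is None:
        return None
    megabytes = int(raw)  # raises ValueError for non-numeric input
    if megabytes <= 0:
        raise ValueError("SCENARIO_LOAD_MEMORY must be a positive number of MB")
    # The 'm' suffix tells the JVM that the size is given in megabytes.
    return f"-Xmx{megabytes}m"


print(jvm_heap_option({"SCENARIO_LOAD_MEMORY": "51200"}))
```

Because the planner does not take this value into account, a scenario given a larger heap this way can use more memory than the planner budgeted for it.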

Large input memory protection

As of R42.3, this feature is implemented only for PostgreSQL.

This feature protects against large inputs that can cause an OutOfMemoryError, which would fail the processing of the whole scenario. Instead of letting the whole scenario fail, it proactively terminates the processing of the individual input.

The protection is turned on by default. It can be configured in Manta Admin UI → Configuration → Common → Common Config → Available Resources → Enable large input memory protection.

Whenever the processing of an input is terminated, it is logged. The log message indicates how much memory is required for that particular input. Using the maximum across all such messages in the log can help determine how much RAM must be assigned to the corresponding analysis scenario. Another option, if viable, is to split those large input scripts into smaller chunks.

2024-01-16 12:21:05.889 [pool-2-thread-3] 3 ERROR eu.profinit.manta.connector.common.memory.MemoryGuardServiceImpl [Context: \myDB\mySch\test02.sql]
MEMORY_GUARD_ERRORS LARGE_INPUT_ABORTED
User message: Aborting processing of input "\myDB\mySch\test02.sql" as it is expected to request 233941162 B of memory.
Technical message: Aborting processing of input "\myDB\mySch\test02.sql" as it is expected to request 233941162 B of memory. Step: BY_AST_SIZE, already granted 20983958 B, total available: 232783872 B, limitsTolerance: 100.
Solution: Please provide more RAM memory, or set manta.memoryGuardService.limitsTolerance to 101 at the cost of increased probability of OutOfMemoryException, or please contact Manta Support at portal.getmanta.com and submit a support bundle/log export.
Impact: SINGLE_INPUT
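Finding the maximum across such messages can be automated. The following is a minimal sketch that assumes the message wording shown in the sample log above; the function names are hypothetical, and the result is only a hint about the largest single request, not a complete sizing formula.

```python
import math
import re

# Matches the "Aborting processing of input ..." wording from the sample log.
ABORT_PATTERN = re.compile(
    r'Aborting processing of input "(?P<input>[^"]+)" '
    r"as it is expected to request (?P<bytes>\d+) B of memory"
)


def requested_bytes_per_input(log_text):
    """Return {input path: largest requested size in bytes} found in the log."""
    per_input = {}
    for match in ABORT_PATTERN.finditer(log_text):
        path = match.group("input")
        size = int(match.group("bytes"))
        # The user and technical messages repeat the figure; keep the maximum.
        per_input[path] = max(per_input.get(path, 0), size)
    return per_input


def largest_request_mb(log_text):
    """Return the largest single request in MB (rounded up), or None."""
    per_input = requested_bytes_per_input(log_text)
    if not per_input:
        return None
    return math.ceil(max(per_input.values()) / (1024 * 1024))
```

For the sample message above, the request of 233941162 B rounds up to 224 MB.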