Configure Runtime and Limitations
Learn how Process Manager allocates available resources such as memory and CPU during execution, and how to control it to optimize runtime, throughput, and performance.
Execution Restrictions in IBM Manta Data Lineage
Each scenario belongs to a specific Manta Data Lineage phase: extraction, analysis, or export, and the phases must be executed in this order. Process Manager plans the execution of these scenarios in the correct order.
Each scenario also belongs to a technology, and each technology belongs to one particular technology group: databases, data integrations, analytical tools, reporting tools, programming languages, data modeling, or Open Manta. The execution of scenarios belonging to any one group can be performed in any order, but all of them must be executed before the first scenario of the next group is started.
This order only applies to analysis and export scenarios; extraction is not limited by this restriction. The execution order of technology groups is as follows.
- Export
- Databases
- Data integrations
- Analytical tools
- Reporting tools
- Modeling tools
-
All scenarios that belong to one phase and one technology (there are usually one to three scenarios) always have a specific order, and they can’t run simultaneously. Every scenario must be executed for each connection, so these restrictions only apply to scenarios executed for one connection. For example, the dictionary mapping scenario cannot be executed for connection A after the extractor scenario is executed for connection B. Here are the scenario restrictions.
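As a rough illustration of the phase and technology-group ordering described in this section, the following Python sketch groups scenarios into stages that must finish one after another. It is not Process Manager's actual planner: the scenario records, the `plan_stages` function, and the abbreviated group list are assumptions made for this example only.

```python
# Illustrative only: the phase names come from the text; the group list and
# the scenario records are assumptions made for this sketch.
PHASE_ORDER = ["extraction", "analysis", "export"]
GROUP_ORDER = [
    "databases",
    "data integrations",
    "analytical tools",
    "reporting tools",
    "modeling tools",
]


def plan_stages(scenarios):
    """Return lists of scenario names; each list must finish before the next starts."""
    stages = {}
    for scenario in scenarios:
        phase_idx = PHASE_ORDER.index(scenario["phase"])
        # Extraction is not subject to the technology-group ordering.
        if scenario["phase"] == "extraction":
            group_idx = 0
        else:
            group_idx = GROUP_ORDER.index(scenario["group"])
        stages.setdefault((phase_idx, group_idx), []).append(scenario["name"])
    return [stages[key] for key in sorted(stages)]


example = [
    {"name": "report-analysis", "phase": "analysis", "group": "reporting tools"},
    {"name": "db-extraction", "phase": "extraction", "group": "databases"},
    {"name": "db-analysis", "phase": "analysis", "group": "databases"},
    {"name": "report-extraction", "phase": "extraction", "group": "reporting tools"},
]
```

With this input, `plan_stages(example)` places both extraction scenarios in the first stage, the database analysis in the second, and the reporting analysis in the third.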
Parallelism in Process Manager
Process Manager provides the benefits of advanced parallelism.
Process Manager executes each scenario as a child process and gives it one Manta Data Lineage connection to perform the operation. This gives Process Manager better control over scenarios and allows it to execute scenarios from different technologies in parallel. It is therefore not necessary to wait for a technology with one connection to finish its single scenario, because scenarios from another technology can run in parallel with it. The number of processes that can run in parallel is configurable for each workflow. If not otherwise specified, the default value is 4 parallel processes.
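The idea of running scenarios as child processes under a configurable parallelism limit can be sketched in Python as follows. This is only an analogy, not how Process Manager is implemented; `run_scenarios` and the stand-in commands are hypothetical, and only the default of 4 is taken from the text.

```python
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor


def run_scenarios(commands, max_parallel=4):
    """Run each command as a child process, at most max_parallel at a time.

    Returns the exit codes in the same order as the commands.
    """
    def run_one(command):
        # check=False: a failing child does not raise; its exit code is returned.
        return subprocess.run(command, check=False).returncode

    with ThreadPoolExecutor(max_workers=max_parallel) as pool:
        return list(pool.map(run_one, commands))


if __name__ == "__main__":
    # Stand-in commands; real scenario launch commands are not shown here.
    demo = [[sys.executable, "-c", f"print('scenario {i} done')"] for i in range(6)]
    print(run_scenarios(demo))
```

Each worker thread only waits for its child process, so the number of workers directly bounds how many child processes run at once.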
Parallelism in Process Manager is different for each phase.
- The extraction phase is fully parallelized. Only the restrictions mentioned in the previous section must be reflected. Only scenarios from the same connection wait for each other.
- During the analysis phase, all technologies and connections are executed simultaneously.
- The export phase is parallelized in the same way as the extraction phase, but it reflects the same technology group restrictions as the analysis phase. You may expect that multiple technologies will be exported simultaneously, but a technology waits until the other technologies are exported.
Customization of Parallelism in Process Manager
Process Manager uses advanced parallelism that is built on a resource system. It monitors the available resources and tries to execute as many scenarios as possible at the same time to make full use of them.
This resource system is built on multiple resources, but for now, only two are available to users. These are the available memory and available CPU cores. See the section “Configuring Available Resources” for more details.
Configuring Available Resources
The Process Manager resource system allows the configuration of the following resources: memory and available CPU cores.
The available resources are configurable in the Admin UI under the Configuration tab → CLI → Common → Common Config.
CPU Cores
Each scenario can be single-threaded or multi-threaded. Process Manager knows how many threads each scenario may use and starts only as many scenarios as fit within the configured Available CPU Cores value. Process Manager may exceed this value by up to 200%, so it may execute scenarios with more threads in total than are configured.
For some scenarios the number of threads is fixed, but for others it can be configured in the connection’s configuration.
The Available CPU Cores configuration is available under the Configuration tab → CLI → Common → Common Config.
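The CPU check can be pictured with a small Python sketch. The exact rule Process Manager applies is not spelled out in this document; `can_start` is a hypothetical helper, and `overshoot=2.0` is only one literal reading of "exceed by up to 200%" (a total budget of three times the configured cores). Treat the parameter as an assumption to adjust, not as documented behavior.

```python
def can_start(running_threads, scenario_threads, available_cores, overshoot=2.0):
    """Check whether another scenario fits into the CPU thread budget.

    overshoot is an assumption of this sketch: 2.0 reads "exceed by up to
    200%" as allowing available_cores * (1 + 2.0) threads in total.
    """
    limit = available_cores * (1 + overshoot)
    return running_threads + scenario_threads <= limit
```

Setting `overshoot=0.0` models a strict limit equal to the configured number of cores.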
Memory
Process Manager Planner Limits
In Common Config, it is possible to define the available memory that Process Manager can use for scheduling scenarios. This is a hard limit, and Process Manager never exceeds it. The default value is 4096 MB. This value controls how many scenarios Process Manager will try to run in parallel. For example, if your Available Memory is 10240 MB and there are two scenarios that each require 5120 MB, Process Manager can run both at the same time. If your Available Memory were 7168 MB, only one of them could run at a time. This setting only sets the upper limit; other conditions also control the workflow execution. For example, analysis cannot run in parallel with extraction, because one depends on the results of the other.
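The arithmetic in this example can be reproduced with a short Python sketch. It assumes a simple greedy, first-come admission rule, which is an illustration rather than the planner's documented algorithm; `pick_parallel` is a hypothetical name.

```python
def pick_parallel(requirements_mb, available_mb=4096):
    """Greedily admit scenarios in the given order while they fit in memory.

    Returns the indices of the scenarios that can run at the same time.
    """
    admitted = []
    used = 0
    for index, need in enumerate(requirements_mb):
        if used + need <= available_mb:
            admitted.append(index)
            used += need
    return admitted
```

For two scenarios that each need 5120 MB, `pick_parallel([5120, 5120], 10240)` admits both, while `pick_parallel([5120, 5120], 7168)` admits only the first.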
Scenario Execution Runtime Limits
Process Manager knows the memory expectations for each scenario, but it is possible that these values might not be high enough, especially for large systems.
This can easily be changed by setting an environment variable for the particular scenario. The variable name is SCENARIO_LOAD_MEMORY; its value is in megabytes. Keep in mind that if you set this variable for an inner node (for example, Extraction or MSSQL), it is propagated to all its children (all extraction scenarios or all MS SQL scenarios, respectively), which may not be desirable and may slow down the workflow execution.
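The propagation from an inner node to its children can be illustrated with a small Python sketch. The tree layout, the scenario names, and the rule that the nearest setting wins are assumptions made for this example; only the node names Extraction and MSSQL and the variable name come from the text.

```python
def effective_load_memory(tree, settings, node, inherited=None):
    """Map each leaf scenario under `node` to the SCENARIO_LOAD_MEMORY it inherits.

    tree maps a node to its children; settings maps a node to a value in MB.
    A value set on a node overrides the one inherited from its parent
    (an assumption of this sketch).
    """
    value = settings.get(node, inherited)
    children = tree.get(node, [])
    if not children:
        return {node: value}
    result = {}
    for child in children:
        result.update(effective_load_memory(tree, settings, child, value))
    return result


# Hypothetical workflow tree: only "Extraction" and "MSSQL" appear in the text.
tree = {
    "Extraction": ["MSSQL", "Oracle"],
    "MSSQL": ["mssql-extractor", "mssql-dictionary-mapping"],
    "Oracle": ["oracle-extractor"],
}
```

Setting the variable on MSSQL, as in `effective_load_memory(tree, {"MSSQL": 8192}, "Extraction")`, gives both MS SQL scenarios 8192 MB and leaves the Oracle scenario without an override (`None`).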
As of R40.1, it is possible to set this variable permanently for all workflows by adding the property
manta.admin-ui.default-scenario-load-memory=<ValueInMB>
to
<mantaflowdirectory>/serviceutility/manta-admin-ui-dir/conf/application.properties.
An example for 51200 megabytes:
manta.admin-ui.default-scenario-load-memory=51200
SCENARIO_LOAD_MEMORY is converted into a change of the Xmx setting of the JVM for CLI processes. Increasing Xmx directly through JAVA_OPTS is discouraged, as it circumvents the Process Manager planner and can lead to issues when running the scenarios.
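To make the relationship between the variable and the JVM heap setting concrete, here is a minimal Python sketch. How Process Manager performs the conversion internally is not described in this document; `xmx_flag` is a hypothetical helper, and the assumption is simply that a value in megabytes maps to the standard JVM option form `-Xmx<N>m`.

```python
import os


def xmx_flag(environ=os.environ):
    """Build a JVM heap flag from SCENARIO_LOAD_MEMORY (megabytes), or None if unset."""
    raw = environ.get("SCENARIO_LOAD_MEMORY")
    if raw is None:
        return None
    megabytes = int(raw)
    if megabytes <= 0:
        raise ValueError("SCENARIO_LOAD_MEMORY must be a positive number of megabytes")
    # Assumed mapping: megabytes map directly onto the JVM's -Xmx<N>m form.
    return f"-Xmx{megabytes}m"
```

For example, `xmx_flag({"SCENARIO_LOAD_MEMORY": "51200"})` yields `-Xmx51200m`.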
SCENARIO_LOAD_MEMORY can only be used when scenarios are run from Process Manager. It does not work with the legacy CLI shell or batch scripts in mantaflow/cli/scenarios/manta-dataflow-cli/bin. These scripts should no longer be used, as they were removed in R39. The currently supported shell and batch scripts are available in mantaflow/serviceutility/webapps/manta-admin-gui/WEB-INF/bin.
The value set through SCENARIO_LOAD_MEMORY has no effect on Process Manager planner limits. The planner only considers the default values built in for each scenario.
Large input memory protection
As of R42.3, this feature is implemented only for PostgreSQL.
This feature serves as internal protection against large inputs that can cause an OutOfMemoryError, which would fail the processing of the whole scenario. It proactively terminates the processing of individual inputs instead of letting the whole scenario fail.
This feature is turned on by default. It can be configured in Manta Admin UI → Configuration → Common → Common Config → Available Resources → Enable large input memory protection.
Whenever the processing of an input is terminated, it is logged. The log message indicates how much memory is required for that particular input. Taking the maximum across all such messages in the log can help determine how much RAM must be assigned to the corresponding analysis scenario. Another option, if viable, is to split those large input scripts into smaller chunks.
2024-01-16 12:21:05.889 [pool-2-thread-3] 3 ERROR eu.profinit.manta.connector.common.memory.MemoryGuardServiceImpl [Context: \myDB\mySch\test02.sql]
MEMORY_GUARD_ERRORS LARGE_INPUT_ABORTED
User message: Aborting processing of input "\myDB\mySch\test02.sql" as it is expected to request 233941162 B of memory.
Technical message: Aborting processing of input "\myDB\mySch\test02.sql" as it is expected to request 233941162 B of memory. Step: BY_AST_SIZE, already granted 20983958 B, total available: 232783872 B, limitsTolerance: 100.
Solution: Please provide more RAM memory, or set manta.memoryGuardService.limitsTolerance to 101 at the cost of increased probability of OutOfMemoryException, or please contact Manta Support at portal.getmanta.com and submit a support bundle/log export.
Impact: SINGLE_INPUT
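Finding the maximum requested memory across such log messages can be automated. The following Python sketch scans log lines for the "expected to request ... B of memory" wording shown in the sample above; the regular expression is derived only from that sample, so other message variants may need a different pattern, and `max_requested_bytes` is a hypothetical helper name.

```python
import re

# Pattern derived from the sample log message above (an assumption, not a spec).
REQUEST_PATTERN = re.compile(
    r'input "(?P<input>[^"]+)" as it is expected to request (?P<bytes>\d+) B of memory'
)


def max_requested_bytes(log_lines):
    """Return (bytes, input_name) for the largest aborted input, or None if none found."""
    best = None
    for line in log_lines:
        match = REQUEST_PATTERN.search(line)
        if match:
            candidate = (int(match.group("bytes")), match.group("input"))
            if best is None or candidate[0] > best[0]:
                best = candidate
    return best
```

A typical use is to pass an open log file to `max_requested_bytes` and use the reported byte count as a lower bound when sizing memory for the affected analysis scenario.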