IBM Support

Effects of multithread execution in CPLEX Optimization Studio models

Question & Answer


Question

How can you estimate the performance of OPL models in CPLEX Optimization Studio when they are executed with multiple threads?

Answer

If the hardware allows it, CPLEX Optimization Studio will use multithreading to solve the optimization model. Parallel algorithms will distribute the computations over several threads, decreasing the time required to find a solution but increasing the memory used.

You can control the multithread mode by specifying:
- the number of threads to be used
- the parallel mode (deterministic or opportunistic) used. The deterministic mode ensures that multiple executions using the same input data produce identical results. The opportunistic mode trades determinism for faster execution times.

The sections below describe the memory requirements, the execution times, and the control parameters for the multithread mode.

1. Memory requirements

In a multithread execution, each thread requires its own copy of the model data. Increasing the number of threads therefore results in an approximately linear increase in memory:

total_memory_requirement = number_of_threads * thread_memory_requirement + base_requirement

If a time limit is used to stop the optimization, then using more threads allows the engine to explore a larger search space within that limit. As a result, the thread memory requirement itself also grows with the number of threads, causing a superlinear increase in total memory.

If the memory requirement approaches the total available memory then it is not beneficial to further increase the number of threads. Memory management penalties will cancel multithreading advantages.
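
The linear estimate above can be turned into a quick check of how many threads fit in memory. The following is a minimal Python sketch, not part of CPLEX Optimization Studio; the function names and the figures in the example are hypothetical, and it assumes the per-thread requirement stays constant (which, as noted above, does not hold when a time limit is used).

```python
def total_memory_requirement(number_of_threads, thread_memory_requirement, base_requirement):
    """Linear memory model from the formula above (all sizes in the same unit)."""
    return number_of_threads * thread_memory_requirement + base_requirement


def max_threads_within_memory(available_memory, thread_memory_requirement, base_requirement):
    """Largest thread count whose estimated footprint stays within available_memory."""
    if thread_memory_requirement <= 0:
        raise ValueError("thread_memory_requirement must be positive")
    usable = available_memory - base_requirement
    if usable < thread_memory_requirement:
        return 0  # under this model, not even one thread fits
    return int(usable // thread_memory_requirement)


if __name__ == "__main__":
    # Hypothetical figures in MB, chosen only for illustration.
    print(total_memory_requirement(4, 1500, 500))       # 6500
    print(max_threads_within_memory(16000, 1500, 500))  # 10
```

In practice the per-thread and base requirements would have to be measured on the actual model, for example by comparing peak memory for runs with one and two threads.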

2. Execution times

Increasing the number of workers will reduce the execution time. However, the existence of sequential algorithm steps and the parallelism efficiency factor limit the execution time improvement.

parallel_execution_time = [ p / ( parallelism_efficiency * number_of_threads ) + (1-p) ] * sequential_execution_time

where:
- p is the fraction of the computation that can be parallelized. The remaining fraction (1-p) represents inherently sequential steps. Their execution time is not reduced in a multithread execution.
- parallelism_efficiency describes how efficient the parallel execution is compared to the sequential one.
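
To see how the sequential fraction limits the gain, the formula above can be evaluated for a few thread counts. This is a minimal Python sketch; the function name mirrors the formula, and the values of p, the efficiency, and the sequential time are hypothetical inputs, not measurements of CPLEX.

```python
def parallel_execution_time(sequential_execution_time, p, number_of_threads,
                            parallelism_efficiency=1.0):
    """Evaluate the execution-time formula above.

    p is the parallelizable fraction (0..1); parallelism_efficiency is
    assumed constant here, although in practice it varies with the thread count.
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be between 0 and 1")
    if number_of_threads < 1:
        raise ValueError("number_of_threads must be at least 1")
    if parallelism_efficiency <= 0.0:
        raise ValueError("parallelism_efficiency must be positive")
    parallel_part = p / (parallelism_efficiency * number_of_threads)
    sequential_part = 1.0 - p
    return (parallel_part + sequential_part) * sequential_execution_time


if __name__ == "__main__":
    # Hypothetical model: 100 s sequentially, 90% of the work parallelizable.
    for threads in (1, 2, 4, 8, 16, 32):
        t = parallel_execution_time(100.0, 0.9, threads)
        print(f"{threads:2d} threads: {t:6.2f} s")
```

With p = 0.9, the estimate can never fall below 10% of the sequential time, however many threads are added, because the sequential part (1-p) is unaffected.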

2.1. Parallelism efficiency

As the thread count increases, the parallel algorithm becomes less and less efficient compared to the sequential one. This is due to thread synchronization and thread contention over computational tasks.

The parallelism efficiency value describes this effect:
parallelism_efficiency = sequential_execution_time / ( multithread_execution_time * number_of_threads )

For CPLEX Optimization Studio, the parallelism efficiency decreases once more than 4 to 8 threads are used. Increasing the thread count past this range will not significantly reduce the execution time.
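
The efficiency definition above can be applied directly to measured run times. The sketch below is plain Python; the timings in the example are invented for illustration and are not CPLEX benchmark results.

```python
def parallelism_efficiency(sequential_execution_time, multithread_execution_time,
                           number_of_threads):
    """Efficiency as defined above: 1.0 means perfect linear speedup."""
    if multithread_execution_time <= 0 or number_of_threads < 1:
        raise ValueError("time must be positive and thread count at least 1")
    return sequential_execution_time / (multithread_execution_time * number_of_threads)


if __name__ == "__main__":
    # Hypothetical wall-clock times in seconds, keyed by thread count.
    measured = {1: 120.0, 2: 66.0, 4: 40.0, 8: 30.0, 16: 27.0}
    sequential = measured[1]
    for threads, elapsed in sorted(measured.items()):
        eff = parallelism_efficiency(sequential, elapsed, threads)
        print(f"{threads:2d} threads: {elapsed:6.1f} s, efficiency {eff:.2f}")
```

A table like this, built from real measurements, shows the thread count beyond which additional threads stop paying off for a given model.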

2.2. Percentage of non-sequential steps

Some optimization steps (such as data initialization and presolve) require sequential algorithms and do not benefit from multithreading.

The time taken by each of these steps is model dependent and thus a general formula cannot be given.
If an approximation is required, you can measure the actual execution times by using time stamps.
Extrapolation can then be used to predict changes in execution time as a function of:
- problem size
- number of threads

The API for obtaining the time stamps:
OPL script built-in value: date.getTime()
CPLEX/Java API: IloCplex.getTime()
.NET API: Cplex.Time()
C API: CPXgettime()
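
Once run times have been measured with time stamps, the execution-time formula from section 2 can be inverted to approximate p. The following Python sketch is one possible way to do this, not a method prescribed by CPLEX; the function name is hypothetical, and it assumes a known, constant parallelism efficiency (1.0 by default), which is an idealization.

```python
def estimate_parallel_fraction(sequential_execution_time, parallel_execution_time,
                               number_of_threads, parallelism_efficiency=1.0):
    """Solve the section 2 formula for p, given two measured run times.

    From  T_par / T_seq = p / (e * n) + (1 - p)
    it follows that  p = (1 - T_par / T_seq) / (1 - 1 / (e * n)).
    """
    effective_threads = parallelism_efficiency * number_of_threads
    if effective_threads == 1.0:
        raise ValueError("p cannot be determined when efficiency * threads equals 1")
    ratio = parallel_execution_time / sequential_execution_time
    return (1.0 - ratio) / (1.0 - 1.0 / effective_threads)


if __name__ == "__main__":
    # Hypothetical measurements: 100 s with 1 thread, 32.5 s with 4 threads.
    p = estimate_parallel_fraction(100.0, 32.5, 4)
    print(f"estimated parallelizable fraction: {p:.2f}")
```

Because the estimate depends on the model and on the assumed efficiency, it is best repeated for several problem sizes and thread counts before extrapolating.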


A special step is the execution of the OPL script. The CPLEX Engine always executes the OPL script sequentially, regardless of whether it is parallelizable.

If you do not want sequential execution, you must implement the optimization model in a language that allows multithreading. You can either:
- write the entire optimization model in C++, Java or .NET using the CPLEX libraries, or
- write only part of the model in multithreaded Java and call the Java code from the OPL script using IloOplImportJava() to import the classes implementing the computation and IloOplCallJava() to call the desired Java method inside those classes.

3. Multithread control parameters

3.1 Thread Count

If the processor cores allow hyperthreading (multiple threads can be executed simultaneously on the same core), then the total number of threads that can be executed simultaneously is:

number_of_logical_cores = number_of_physical_cores * hyperthreading_per_core

By default, CPLEX Optimization Studio uses a number of threads equal to the number of processor cores or 32 threads, whichever is smaller.
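
The two rules above can be illustrated with a short Python sketch. The function names are hypothetical; os.cpu_count() is the Python standard library call that reports the number of logical CPUs visible to the operating system, and using it as the "number of processor cores" in the default rule is an assumption made here for illustration.

```python
import os


def number_of_logical_cores(number_of_physical_cores, hyperthreading_per_core):
    """Logical core count from the formula above."""
    return number_of_physical_cores * hyperthreading_per_core


def default_thread_count(number_of_cores, cap=32):
    """Default described above: the core count or 32, whichever is smaller."""
    return min(number_of_cores, cap)


if __name__ == "__main__":
    # os.cpu_count() may return None on some platforms; fall back to 1.
    logical = os.cpu_count() or 1
    print("logical cores on this machine:", logical)
    print("default thread count under the stated rule:", default_thread_count(logical))
```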

If you want to change the number of threads used, two control parameters are available:

For the CPLEX Engine the control parameter is:
global default thread count (OPL script name: cplex.threads, C++/Java/.NET name: Threads, C name: CPX_PARAM_THREADS)

For the CP Optimizer Engine, the control parameter is:
workers (OPL script name: cp.param.workers, C++/Java/.NET name: Workers)

Increasing the number of threads past the number of logical cores brings no benefit.
A memory shortage also reduces performance. If a memory shortage occurs, the number of threads must be lowered.

3.2 Determinism preservation

For the CPLEX Engine, the parallel mode switch parameter determines whether the deterministic mode (multiple executions using the same input data produce identical results) or the opportunistic mode (determinism is exchanged for faster execution times) is used.
The values are: deterministic, opportunistic, automatic (default).
The automatic mode reverts to deterministic unless the thread count is explicitly set to a value greater than 1.

(OPL name: parallelmode, C++/Java/.NET name: ParallelMode, C name: CPX_PARAM_PARALLELMODE)

For MIP CPLEX models, if callbacks other than informational callbacks are used for solving a MIP, the order in which the callbacks are called cannot be guaranteed to remain deterministic, not even with the stronger thread synchronization. Thus, to make sure of deterministic runs when the parallel mode parameter is set to deterministic, CPLEX reverts to sequential solving of the MIP in the presence of query callbacks, diagnostic callbacks, or control callbacks.

For the CP Optimizer Engine, multithreading may allow the same solution to be produced by more than one worker. This will result in duplicate solutions if the model is designed to collect all solutions produced.
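
If duplicates matter, the collected solutions can be filtered after the search. The sketch below is a generic Python illustration, not a CP Optimizer API; it assumes, hypothetically, that each collected solution has been exported as a dictionary mapping variable names to hashable values.

```python
def unique_solutions(solutions):
    """Return the solutions with duplicates removed, keeping first occurrences."""
    seen = set()
    unique = []
    for solution in solutions:
        # Sort the items so that key order in the dictionary does not matter.
        key = tuple(sorted(solution.items()))
        if key not in seen:
            seen.add(key)
            unique.append(solution)
    return unique


if __name__ == "__main__":
    # Hypothetical solutions, two of which are identical assignments.
    collected = [{"x": 1, "y": 2}, {"y": 2, "x": 1}, {"x": 3, "y": 0}]
    print(len(unique_solutions(collected)))  # 2
```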


Document Information

Modified date:
16 June 2018

UID

swg21653811