Import job design
An import is a job that you design to load data from an external source, such as an external data file, into the PIM system. You use imports not only during the initial data migration but also to keep the PIM system up to date on an ongoing basis. Generally, you design approximately four to five imports per implementation.
An import job does not necessarily acquire an exclusive lock
on the catalog or hierarchy on which it was defined. If the catalog
or hierarchy is exclusively locked, concurrent changes to its contents
are not allowed.
Whether the catalog or hierarchy is locked depends on the job type and semantics.
If a catalog or hierarchy is not exclusively
locked by an import job, then the job may acquire locks on the entries
in the catalog or hierarchy that it modifies. The duration for which
these locks are held on the entries is affected by:
- whether the release_locks_early parameter is set to true or false
- how many objects are contained within one transaction
When the release_locks_early parameter is set to false, all of the modified objects within
a scheduled job will remain locked until the job has finished. When
set to true, the locks are released at the completion of the transaction
within which the objects were modified. If you need to guarantee a
high level of concurrency while running imports, set the release_locks_early parameter
to true and ensure that the transactions are committed
at short intervals. The API that you use for the scheduled
job affects when transactions are committed. For example:
- When using the Java™ API, you have explicit control over transaction boundaries by using the startTransaction() and commit() methods.
- When using the Scripting API, the aggregation_queue_size parameter in the common.properties file defines, by default, after how many modified objects the transaction is committed.
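The idea of committing at short intervals can be sketched in Java as follows. This is a minimal illustration, not the product API: TxContext is a hypothetical stand-in for the context object, and only the method names startTransaction() and commit() are taken from the text above. CountingContext, ChunkedImport, and the chunk size of 100 are assumptions made so that the sketch runs without a PIM installation. Error handling is omitted.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the product's context object. Only the two
// method names come from the documentation text; the interface is assumed.
interface TxContext {
    void startTransaction();
    void commit();
}

// In-memory implementation that only counts commits, so the sketch can run
// without a PIM installation.
class CountingContext implements TxContext {
    int commits = 0;
    boolean open = false;

    @Override
    public void startTransaction() {
        if (open) throw new IllegalStateException("transaction already open");
        open = true;
    }

    @Override
    public void commit() {
        if (!open) throw new IllegalStateException("no open transaction");
        open = false;
        commits++;
    }
}

class ChunkedImport {
    // Processes the records in transactions of at most chunkSize records
    // and returns the number of records processed.
    static int run(TxContext ctx, List<String> records, int chunkSize) {
        if (chunkSize < 1) throw new IllegalArgumentException("chunkSize must be >= 1");
        int processed = 0;
        for (int start = 0; start < records.size(); start += chunkSize) {
            int end = Math.min(start + chunkSize, records.size());
            ctx.startTransaction();
            for (String rec : records.subList(start, end)) {
                // Placeholder for the real work on one record
                // (look up the item, set attributes, save it).
                processed++;
            }
            // With release_locks_early set to true, locks acquired in this
            // chunk would be released when the transaction completes here.
            ctx.commit();
        }
        return processed;
    }

    public static void main(String[] args) {
        List<String> records = new ArrayList<>();
        for (int i = 0; i < 250; i++) records.add("item-" + i);
        CountingContext ctx = new CountingContext();
        int processed = run(ctx, records, 100);
        System.out.println(processed + " records in " + ctx.commits + " transactions");
    }
}
```

In this sketch, 250 records with a chunk size of 100 are committed in three transactions, so no entry stays locked for longer than the processing time of one chunk. A smaller chunk size shortens lock duration at the cost of more commits.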
You can also use the startBatchProcessing() and flushBatch() methods
to define how many item updates are batched (using JDBC batching)
to minimize database update calls. Batching can improve
overall performance, particularly when many attributes
of an item or category spec are marked as indexed and when
network latency between the scheduler service and
the database server is high.
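The effect of batching on the number of database calls can be illustrated with a toy model. The following Java sketch is not the product implementation: only the method names startBatchProcessing() and flushBatch() come from the text above, while ModelBatchContext, saveItem(), BatchedImport, and the assumption that one flush corresponds to one database call are hypothetical simplifications.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of a batching context. Only the names startBatchProcessing()
// and flushBatch() come from the documentation text; the rest is assumed.
class ModelBatchContext {
    private final List<String> pending = new ArrayList<>();
    private boolean batching = false;
    int databaseCalls = 0;

    void startBatchProcessing() {
        batching = true;
    }

    // Hypothetical save operation: queues the update while batching is
    // active, otherwise counts one modeled database call immediately.
    void saveItem(String itemKey) {
        if (batching) {
            pending.add(itemKey);
        } else {
            databaseCalls++;
        }
    }

    // Sends all queued updates as one modeled database call.
    void flushBatch() {
        if (!pending.isEmpty()) {
            databaseCalls++;
            pending.clear();
        }
    }
}

class BatchedImport {
    // Saves all keys, flushing after every batchSize updates, and returns
    // the number of modeled database calls.
    static int run(ModelBatchContext ctx, List<String> keys, int batchSize) {
        if (batchSize < 1) throw new IllegalArgumentException("batchSize must be >= 1");
        ctx.startBatchProcessing();
        int inBatch = 0;
        for (String key : keys) {
            ctx.saveItem(key);
            if (++inBatch == batchSize) {
                ctx.flushBatch();
                inBatch = 0;
            }
        }
        ctx.flushBatch(); // flush the final, possibly partial, batch
        return ctx.databaseCalls;
    }
}
```

Under these assumptions, saving 250 items with a batch size of 100 results in three modeled database calls instead of 250, which is why batching matters most when each call to the database server carries high latency.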