Import job design

An import is a job that you design to load data, typically from an external data file, into the PIM system. You use imports not only during the initial data migration but also to keep the PIM system up to date on an ongoing basis. A typical implementation includes approximately four to five imports.

An import job does not necessarily acquire an exclusive lock on the catalog or hierarchy on which it is defined; whether it does depends on the job type and its semantics. While a catalog or hierarchy is exclusively locked, no concurrent changes can be made to its contents.

If a catalog or hierarchy is not exclusively locked by an import job, the job may still acquire locks on the individual entries that it modifies. How long these entry locks are held depends on:
  • whether the release_locks_early parameter is set to true or false
  • how many objects are contained in one transaction
When release_locks_early is set to false, all objects modified by a scheduled job remain locked until the job finishes. When it is set to true, the locks are released as soon as the transaction that modified the objects is committed. If you need to guarantee a high level of concurrency while imports run, set release_locks_early to true and ensure that the transactions are committed at short intervals.
The API that you use for the scheduled job affects when transactions are committed. For example:
  • When using the Java™ API, you have explicit control over transaction boundaries through the startTransaction() and commit() methods, as shown in the sketch after this list.
  • When using the Scripting API, the aggregation_queue_size parameter, set by default in the common.properties file, defines how many modified objects are processed before the transaction is committed.
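
The following sketch shows the commit-at-short-intervals pattern with the Java API. Only the startTransaction() and commit() method names come from the description above; the Context and ImportEntry types, the processEntry() helper, and the commit interval of 100 are illustrative assumptions, not the product's actual API shape.

    import java.util.List;

    // Minimal stand-in types so the sketch compiles on its own; in a real
    // scheduled job these come from the product's Java API (assumption).
    interface Context {
        void startTransaction();
        void commit();
    }

    interface ImportEntry { }

    public class ShortIntervalImport {

        // Hypothetical commit interval: smaller values release entry locks
        // sooner (with release_locks_early=true) at the cost of more commits.
        private static final int COMMIT_INTERVAL = 100;

        public static void run(Context ctx, List<ImportEntry> entries) {
            int modified = 0;
            ctx.startTransaction();
            for (ImportEntry entry : entries) {
                processEntry(entry); // modify one catalog entry (placeholder)
                modified++;
                if (modified % COMMIT_INTERVAL == 0) {
                    // Committing ends the transaction; with
                    // release_locks_early=true, entry locks are released here.
                    ctx.commit();
                    ctx.startTransaction();
                }
            }
            ctx.commit(); // commit the final, possibly partial, transaction
        }

        private static void processEntry(ImportEntry entry) {
            // Placeholder for the import's actual mapping and update logic.
        }
    }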
When using the Java API in scheduled jobs, you can use the startBatchProcessing() and flushBatch() methods to control how many item updates are batched (using JDBC batching), which minimizes the number of database update calls. Batching can improve overall performance, particularly when many attributes of an item or category spec are marked as indexed, or when there is high latency for network calls between the scheduler service and the database server.
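
As a sketch of that batching pattern, under the same caveats: only the startBatchProcessing() and flushBatch() method names come from the description above; the Context and Item types, the updateItem() helper, and the flush threshold of 500 are illustrative assumptions.

    import java.util.List;

    // Stand-in types so the sketch is self-contained; the real interfaces
    // come from the scheduler's Java API (assumption).
    interface Context {
        void startBatchProcessing();
        void flushBatch();
    }

    interface Item { }

    public class BatchedItemUpdates {

        // Hypothetical flush threshold; larger batches mean fewer database
        // round trips, at the cost of holding more queued work in memory.
        private static final int BATCH_SIZE = 500;

        public static void updateAll(Context ctx, List<Item> items) {
            ctx.startBatchProcessing(); // queue updates for JDBC batching
            int queued = 0;
            for (Item item : items) {
                updateItem(item); // stage one item update (placeholder)
                if (++queued % BATCH_SIZE == 0) {
                    ctx.flushBatch(); // send the queued updates in one batch
                }
            }
            ctx.flushBatch(); // flush any remaining queued updates
        }

        private static void updateItem(Item item) {
            // Placeholder for setting attribute values on the item.
        }
    }

A larger threshold reduces the number of database round trips, which matters most when network latency between the scheduler service and the database server is high; how batching interacts with transaction boundaries is product-specific and not shown here.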