Import job design considerations
Ensure you are familiar with the following import job design considerations and best practices.
- If the same kind of information is coming into the PIM system from multiple systems, make sure you use a single file format. If you use multiple file formats, you need to create an import job for each file format that you use. In general, use an Enterprise Application Integration (EAI) tool between the upstream system and the PIM system.
- Ensure that all transformations of the data from upstream systems are completed outside of the PIM system. Use Extract, Transform, and Load (ETL) tools, such as IBM® InfoSphere® DataStage®, to cleanse and harmonize the data before importing it into your PIM system.
- Push validations outside of IBM Product Master whenever possible. Almost all Product Master implementations have an upstream system that feeds data into Product Master. Whenever possible, provide valid, cleansed, and harmonized data so that the import job can be light on validations.
- Parallelize import jobs to load large input data when possible. This requires that there is no dependency between the records that you receive in your imports. You then need to perform the following steps (a file-splitting sketch follows this list):
- Break down the import data file into smaller chunks.
- Create multiple import jobs (using the same import script), each using one of the generated smaller data files.
- Run the jobs in parallel.
- You might need to increase your hardware or memory requirements for your PIM system depending on the volume of data in your import jobs and the frequency with which you run your import jobs.
- Ensure that you define how you handle errors that occur during import, including how errors are logged, how users are notified of errors, and whether a retry mechanism is included (a retry sketch follows this list).
- Product Master provides the capability to tag versions for containers such as catalogs and hierarchies. Each import run creates a version of the destination catalog, and the container is tagged with that version when the import completes, which enables difference (delta) exports and rollbacks. The import also locks any container that it affects so that no concurrent changes are made; neither users nor other jobs can update information in the catalog while the import is running.
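The splitting step for parallel imports is independent of Product Master and can be done with an ETL tool or a small utility. The following is a minimal sketch in Java, assuming a delimited input file with a header row and one record per line; the file names, chunk size, and output directory are illustrative.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

/**
 * Splits a large delimited input file into smaller chunk files so that
 * several import jobs (all using the same import script) can load them
 * in parallel. Assumes one record per line and no dependencies between records.
 */
public class ImportFileSplitter {

    public static List<Path> split(Path source, Path targetDir, int recordsPerChunk)
            throws IOException {
        Files.createDirectories(targetDir);
        List<Path> chunks = new ArrayList<>();
        try (BufferedReader reader = Files.newBufferedReader(source)) {
            String header = reader.readLine();          // repeat the header in every chunk
            String line;
            int recordCount = 0;
            int chunkIndex = 0;
            BufferedWriter writer = null;
            while ((line = reader.readLine()) != null) {
                if (recordCount % recordsPerChunk == 0) {
                    if (writer != null) {
                        writer.close();
                    }
                    Path chunk = targetDir.resolve("import_chunk_" + (++chunkIndex) + ".csv");
                    chunks.add(chunk);
                    writer = Files.newBufferedWriter(chunk);
                    writer.write(header);
                    writer.newLine();
                }
                writer.write(line);
                writer.newLine();
                recordCount++;
            }
            if (writer != null) {
                writer.close();
            }
        }
        return chunks;
    }

    public static void main(String[] args) throws IOException {
        // Example: split items.csv into files of 50,000 records each, then
        // schedule one import job per chunk file.
        List<Path> chunks = split(Paths.get("items.csv"), Paths.get("chunks"), 50_000);
        System.out.println("Created " + chunks.size() + " chunk files");
    }
}
```

Each generated chunk file then becomes the data source of one import job, and the jobs can be scheduled to run at the same time.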
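The error-handling policy itself can also be expressed outside of any Product Master API. The following sketch shows one generic way to wrap a unit of import work with logging, a bounded number of retries, and a placeholder notification step; the step name, attempt count, and notification mechanism are illustrative assumptions.

```java
import java.util.concurrent.Callable;
import java.util.logging.Level;
import java.util.logging.Logger;

/**
 * A simple retry wrapper that makes the error-handling policy of an import
 * explicit: how many attempts are made, where failures are logged, and when
 * an operator is notified. The notification is a placeholder; a real
 * implementation might write an error report, send an email, or raise an alert.
 */
public class RetryingImportStep {

    private static final Logger LOG = Logger.getLogger(RetryingImportStep.class.getName());

    public static <T> T runWithRetry(String stepName, Callable<T> step, int maxAttempts)
            throws Exception {
        Exception lastFailure = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return step.call();
            } catch (Exception e) {
                lastFailure = e;
                LOG.log(Level.WARNING, "Step '" + stepName + "' failed on attempt "
                        + attempt + " of " + maxAttempts, e);
            }
        }
        notifyOperator(stepName, lastFailure);          // all retries exhausted
        throw lastFailure;
    }

    private static void notifyOperator(String stepName, Exception failure) {
        // Placeholder notification: replace with email, alerting, or an entry
        // in an error report that users review after the import completes.
        LOG.log(Level.SEVERE, "Step '" + stepName + "' failed permanently", failure);
    }

    public static void main(String[] args) throws Exception {
        // Example: retry a (simulated) batch load up to three times.
        int loaded = runWithRetry("load-item-batch",
                () -> { /* parse and save one batch of records here */ return 500; }, 3);
        System.out.println("Loaded " + loaded + " records");
    }
}
```

How failed records are reported back to users depends on the implementation; the point is to decide and document the policy before the import is built.

An import uses the following components: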
- File spec
- A Product Master spec that represents the structure of the incoming data file. A file spec is required. An import script API is typically used for XML imports or for imports that require enhanced business logic, validations, or multi-occurring attributes.
- Primary spec
- The spec of the catalog or hierarchy into which the data is to be loaded.
- Spec map
- A graphical representation that provides:
- A mapping of the file spec attributes to the catalog attributes
- Validations across specifications
- Catalog, hierarchy, or lookup table
- An item import job loads data into a catalog or lookup table, and a hierarchy import job loads data into a hierarchy.
- Data source
- The source of the input file. For example, the document store, an FTP site, or an upload through a browser.
- Import script API
- A script API that inserts customized business and processing logic.
The script API can integrate the import into a larger framework of
functionality such as the Validation Framework. Ninety percent of
custom import development is done in the import script API.
You can use an existing script API, a generated script API, or a new script API. Generated script APIs can be used in simple imports where a file spec and spec map exist and there are no special processing requirements. Generated script APIs use these components to dynamically generate the processing logic that is required to import data.
Input parameters can be defined for an import job by associating an input parameter spec to the import script.
Best practices
Imports that you create with the user interface (the Import Console) without a script API are best suited for fixed file specifications from a delimited (CSV format) file that are imported into a fixed catalog spec, with no multi-occurring attributes and few business validations. The spec maps can perform attribute-level validations that are identified within the specs, but the more complex business logic from a validation framework or trigger script APIs will be absent. The more complex the data model, validations, and business logic, the more complex the imports become. Use the script API or Java™ API for these types of customizations.
- Data transformations (Inbound)
- For data transformation from the upstream system to the PIM system, ensure that all the data transformations are completed outside of the PIM system.
- File formats
- If the same kind of information is flowing into the PIM system from multiple systems, then use a single file format. If you use multiple file formats, you need to create one import job for each file format. Generally, you use an Enterprise Application Integration (EAI) layer between the upstream system and the PIM system.
- Type of input or job
- Provide a delta import even if the upstream system sends out snapshot feeds, by using ETL tools such as IBM InfoSphere DataStage to derive the delta (a comparison sketch follows).
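Where an ETL tool is not available for this step, a delta can be derived by comparing the current snapshot with the previous one before the import runs. The following is a minimal sketch, assuming CSV snapshots whose first column is the primary key; the file names are illustrative, and in most implementations this comparison belongs in the ETL or EAI layer rather than in custom code.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.HashMap;
import java.util.Map;

/**
 * Derives a delta file from two full snapshots so that the import job only
 * processes new or changed records. Assumes a simple CSV layout where the
 * first column is the primary key; deleted records would need separate handling.
 */
public class SnapshotDelta {

    /** Loads a snapshot into a map of primary key -> full record line. */
    private static Map<String, String> load(Path snapshot) throws IOException {
        Map<String, String> records = new HashMap<>();
        try (BufferedReader reader = Files.newBufferedReader(snapshot)) {
            reader.readLine();                              // skip header
            String line;
            while ((line = reader.readLine()) != null) {
                String key = line.substring(0, line.indexOf(','));
                records.put(key, line);
            }
        }
        return records;
    }

    public static void writeDelta(Path previous, Path current, Path delta) throws IOException {
        Map<String, String> old = load(previous);
        try (BufferedReader reader = Files.newBufferedReader(current);
             BufferedWriter writer = Files.newBufferedWriter(delta)) {
            String header = reader.readLine();
            if (header != null) {
                writer.write(header);
                writer.newLine();
            }
            String line;
            while ((line = reader.readLine()) != null) {
                String key = line.substring(0, line.indexOf(','));
                // Keep only records that are new or whose content changed.
                if (!line.equals(old.get(key))) {
                    writer.write(line);
                    writer.newLine();
                }
            }
        }
    }

    public static void main(String[] args) throws IOException {
        writeDelta(Paths.get("snapshot_yesterday.csv"),
                   Paths.get("snapshot_today.csv"),
                   Paths.get("delta_today.csv"));
    }
}
```

The resulting delta file contains only new and changed records, which keeps the import job small even when the upstream system can produce only full snapshots.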