Talend Manual Inputs
This page explains the specific structure and files accepted by this scanner. See Manta Flow Usage: Preparing Scanner Inputs for additional ways to provide these inputs via Process Manager or Orchestration API. See Ingest Source Support for how to use Manta Agent to obtain input files from a remote machine or Git.
IBM Automatic Data Lineage supports analysis of both Talend export formats (zip and item), but there can be differences in the results.
- Talend jobs provided as individual *.item files are analyzed independently; therefore, no direct relationship is created between job or joblet components in different jobs.
- When a zip archive is analyzed, the whole archive, along with all the jobs and joblets inside, is analyzed as one task, so the relationships between jobs are analyzed as well. An example of such a relationship is one job calling another through a tRunJob component, with the results of the called job propagated to the calling job. In the zip archive format, data lineage is created between the result of the called job and tRunJob.
Note that the content and structure of a Talend archive export must match the directory and file structure generated by Talend Studio.
Exporting Jobs
Jobs have to be exported and provided manually for dataflow analysis. See https://community.qlik.com/t5/Official-Support-Articles/Exporting-items-from-Talend-Studio/ta-p/2150920 or https://help.qlik.com/talend/en-US/studio-user-guide/8.0-R2024-08/exporting-items.
By default, assuming that <proj_name> is the name of the project, the exported files related to jobs are located in /../<proj_name>/process. Inside the process folder, jobs can be further organized into subfolders. All files related to one job are always located in the same (sub)directory and have the same filename, differing only in extension. This is mandatory; otherwise, Talend Studio is unable to import the job back into the workspace.
The whole folder <proj_name> can also be exported as a .zip archive.
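The same-directory, same-filename rule can be checked before the export is handed to the scanner. The following is a minimal sketch, not part of the product: the function name `find_unpaired_items` is hypothetical, and it assumes that each job consists of at least an .item file and a .properties file, as the export file structure on this page suggests.

```python
from pathlib import Path


def find_unpaired_items(process_dir):
    """Return .item files under process_dir that lack a sibling .properties file.

    Assumption: every job has an .item and a .properties file that share
    a directory and a filename and differ only in extension.
    """
    unpaired = []
    for item in sorted(Path(process_dir).rglob("*.item")):
        # with_suffix swaps only the last extension, so a versioned name
        # such as load_0.1.item maps to load_0.1.properties.
        if not item.with_suffix(".properties").is_file():
            unpaired.append(item)
    return unpaired
```

An empty result means every job definition found under the process folder has its matching .properties file next to it.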
Building Jobs
Talend Studio allows you to build jobs as standalone units containing scripts for their execution or scheduling (a .bat file for Windows). See https://help.qlik.com/talend/en-US/studio-user-guide/8.0-R2024-08/building-job-as-standalone-job. The structure is very similar to that of exported jobs. By default, the built job is zipped into <job_name>.zip.
Among other things, the archive contains the folder <proj_name> where job definitions are stored as described in the previous section. However, when setting the build properties, the user can choose not to include the .item files, in which case the job definitions are not stored in the build. This is not desirable.
If the built job references (depends on) other jobs within the same project (i.e., it calls some child jobs as a part of its ETL operations), then all necessary definitions of the referenced jobs are also included in the build automatically.
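Because a build can be created without its .item files, it may help to inspect the archive before using it. The sketch below uses Python's standard `zipfile` module; the function name `item_files_in_archive` is hypothetical, and it assumes only that job definitions are stored as entries ending in .item somewhere in the archive.

```python
import zipfile


def item_files_in_archive(zip_path):
    """Return the names of all .item entries in a zip archive."""
    with zipfile.ZipFile(zip_path) as archive:
        # namelist() returns entry names with forward slashes,
        # regardless of the platform that produced the archive.
        return [name for name in archive.namelist() if name.endswith(".item")]
```

If the returned list is empty, the build was most likely created without the .item files and should be rebuilt with them included.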
Input Folder Structure
To analyze Talend project files, create a directory structure as follows.
- input/talend/${talend.system.id}
  - All Talend *.zip and/or *.item file exports (mandatory, as per above)
  - context directory with context files (optional, used only if contexts are used)
  - contextReplacement.txt (optional, used only when needed as per Talend Context Configuration)
  - talendExpressionOverrides.csv (optional, used only when needed as per Talend Scanner Guide)
The folder above can be provided to Automatic Data Lineage for execution as per Manta Flow Usage: Preparing Scanner Inputs.
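The input folder can also be assembled with a short script. This is a sketch under stated assumptions rather than a supported tool: the function `prepare_talend_input` and its parameters are hypothetical, `input_root` stands for the input directory, `system_id` stands for the value of `${talend.system.id}`, and the context files are assumed to be placed in a subdirectory named context.

```python
import shutil
from pathlib import Path


def prepare_talend_input(input_root, system_id, exports, context_dir=None, extra_files=()):
    """Copy Talend exports and optional files into <input_root>/talend/<system_id>."""
    target = Path(input_root) / "talend" / system_id
    target.mkdir(parents=True, exist_ok=True)

    for export in exports:
        export = Path(export)
        # Only zip archives and individual .item files are accepted as exports.
        if export.suffix not in {".zip", ".item"}:
            raise ValueError(f"Unsupported export file: {export}")
        shutil.copy2(export, target / export.name)

    if context_dir is not None:
        # Optional: needed only when the jobs use contexts.
        shutil.copytree(context_dir, target / "context", dirs_exist_ok=True)

    for extra in extra_files:
        # Optional: contextReplacement.txt or talendExpressionOverrides.csv.
        shutil.copy2(extra, target / Path(extra).name)

    return target
```

For example, `prepare_talend_input("input", "my_talend", ["my_project.zip"])` would create input/talend/my_talend containing my_project.zip, where my_talend and my_project.zip are placeholder names.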
Export File Structure
The zip export structure generated by Talend can be used directly; however, Manta only uses the following files:
- process/
  - optional folder hierarchy
    - *.item
    - *.properties
- maps/
  - optional folder hierarchy
    - *.xml
- joblets/
  - optional folder hierarchy
    - *.item
    - *.properties
- context/
  - *.item
  - *.properties
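The listing above can be expressed as a simple filter, for example to trim a large export down to the files that matter. This is an illustrative sketch only: the names `USED_EXTENSIONS` and `is_used_by_scanner` are hypothetical, paths are assumed to be relative to the project folder, and the check looks only at the top-level folder and the file extension, not at the depth of the folder hierarchy.

```python
from pathlib import PurePosixPath

# Top-level export folders and the file extensions listed for each of them.
USED_EXTENSIONS = {
    "process": {".item", ".properties"},
    "maps": {".xml"},
    "joblets": {".item", ".properties"},
    "context": {".item", ".properties"},
}


def is_used_by_scanner(relative_path):
    """Return True if a path relative to the project folder matches the listing."""
    path = PurePosixPath(relative_path)
    if len(path.parts) < 2:
        # A bare filename has no top-level folder to match against.
        return False
    return path.suffix in USED_EXTENSIONS.get(path.parts[0], set())
```

With this filter, a path such as process/etl/load_0.1.item is kept, while files in other folders or with other extensions are skipped.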