Data provenance
LSF allows you to use data provenance tools to trace files that are generated by LSF jobs.
You can use LSF data provenance tools to trace where your data comes from and how it was generated. You can also use the provenance information to reproduce your results by rerunning the same steps with the same job input.
LSF includes the following scripts to support data provenance:
- tag.sh: This post-execution script marks provenance data in the job-generated files.
- esub.dprov: This esub application automatically enables data provenance. The parameters are the input file names, which are recorded in the job output file as part of the provenance data.
- showhist.py: This script generates an image that shows the relationships among the job data files.
Data provenance marks the output files that LSF generates for the job: the files that are specified as arguments for the bsub -o, -oo, -e, and -eo options, and the destination file that is specified as an argument for the bsub -f option. It also marks any files in the current working directory for the job that are newer than the job execution time.
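The rule for files in the current working directory can be illustrated with a short Python sketch. This is not how tag.sh is implemented; it only shows one way to read "newer than the job execution time", assuming that "newer" means a file modification time later than a job start timestamp. The function name files_newer_than and its parameters are hypothetical and are not part of LSF.

```python
import os
import tempfile
import time


def files_newer_than(directory, start_time):
    """Return the sorted names of regular files in `directory` whose
    modification time is later than `start_time` (seconds since the epoch).

    Hypothetical helper for illustration only; not part of LSF.
    """
    newer = []
    for entry in os.scandir(directory):
        # Only regular files are considered; subdirectories are skipped.
        if entry.is_file() and entry.stat().st_mtime > start_time:
            newer.append(entry.name)
    return sorted(newer)


if __name__ == "__main__":
    # Simulate a job working directory with one file that predates the job
    # and one file that was written after the job started.
    with tempfile.TemporaryDirectory() as workdir:
        job_start = time.time()
        before = os.path.join(workdir, "input.dat")
        after = os.path.join(workdir, "result.dat")
        for path in (before, after):
            open(path, "w").close()
        # Force the modification times so the example does not depend on timing.
        os.utime(before, (job_start - 60, job_start - 60))
        os.utime(after, (job_start + 60, job_start + 60))
        print(files_newer_than(workdir, job_start))
```

In this sketch only result.dat is selected, because input.dat was last modified before the assumed job start time.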