Scenario: Processing a stream of files with the ingest utility
The following scenario shows how you can configure your data warehouse to automatically ingest an ongoing stream of data files.
The problem: In some data warehouses, files arrive in an ongoing stream throughout the day and need to be processed as they arrive. This means that each time a new file arrives, another INGEST command needs to be run specifying the new file to process.
The solution: You can write a script that automatically checks for new files, generates a new INGEST command, and runs that command. The ingest_files.sh script is a sample of such a script. You also need to create a crontab entry to specify how frequently the shell script runs.
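The check, generate, and run cycle described above might look roughly like the following. This is a minimal sketch, not the shipped ingest_files.sh: the function names, the `processed` subdirectory, the `DB2_CLP` variable, and all default values are assumptions made for illustration, and the INGEST command is only a simple delimited insert that would be replaced by the verified command. Treating every non-zero exit code as a failure is also a simplification.

```shell
#!/bin/sh
# Hypothetical polling sketch; all defaults below are placeholders.
INPUT_FILES_DIRECTORY="${INPUT_FILES_DIRECTORY:-/tmp/ingest_input}"
DATABASE_NAME="${DATABASE_NAME:-sample}"
SCHEMA_NAME="${SCHEMA_NAME:-myschema}"
TABLE_NAME="${TABLE_NAME:-table1}"
# Program that receives each command (assumed variable name).
# Set DB2_CLP=echo to print the commands instead of running them.
DB2_CLP="${DB2_CLP:-db2}"

# Generate the INGEST command text for one input file.
# Assumes file names without spaces or quote characters.
build_ingest_command() {
    printf 'INGEST FROM FILE %s FORMAT DELIMITED INSERT INTO %s.%s' \
        "$1" "$SCHEMA_NAME" "$TABLE_NAME"
}

# Run one pass: ingest each regular file in the source directory, then
# move it aside so that the next pass does not pick it up again.
process_new_files() {
    done_dir="$INPUT_FILES_DIRECTORY/processed"
    mkdir -p "$done_dir" || return 1
    "$DB2_CLP" connect to "$DATABASE_NAME" || return 1
    for f in "$INPUT_FILES_DIRECTORY"/*; do
        [ -f "$f" ] || continue   # skips subdirectories and an unmatched glob
        if "$DB2_CLP" "$(build_ingest_command "$f")"; then
            mv "$f" "$done_dir/"
        fi                        # on failure the file stays for the next pass
    done
    "$DB2_CLP" connect reset
}

# Do nothing if the source directory does not exist yet.
if [ -d "$INPUT_FILES_DIRECTORY" ]; then
    process_new_files
fi
```

Moving each file after a successful run is one simple way to record that it has been processed; comparing file timestamps against the time of the last run would be another.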
Before implementing this mechanism (that is, the script and the crontab entry) for processing the stream of files, the user must meet the following prerequisites and dependencies:
- The target table has been created in the target database
- The ingest utility is ready to use (that is, it is installed and set up on a client machine)
- An INGEST command has been specified and verified by running it manually with a test file
- The objects, such as the exception table, referenced in the INGEST command have been created
- A crontab file has been created on the system on which the ingest utility is running
- The user has a process for creating the input files and moving them into the source directory that the script uses
With the prerequisites in place, the user sets up the mechanism as follows:
- The user creates a new script, using ingest_files.sh as a template, by doing the following:
  - Replace the following sample input values to reflect the user's values:
    - INPUT_FILES_DIRECTORY
    - DATABASE_NAME
    - SCHEMA_NAME
    - TABLE_NAME
    - SCRIPT_PATH
  - Replace the sample INGEST command
  - Save the script as populate_table1_script
- The user adds an entry to the crontab file to specify how frequently the script is to run. Because the user wants the script to run once a minute, 24 hours a day, every day of the year, the user adds the following line, in which an asterisk in each of the five time fields (minute, hour, day of month, month, day of week) matches every value:
* * * * * $HOME/bin/populate_table1_script
- The user tests the script by creating new input files and adding them to the source directory.
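For the test step, one possible way to add a file to the source directory is to write it somewhere else first and then move it in, so that a script running every minute is unlikely to pick up a half-written file. The sketch below assumes a hypothetical helper name (`drop_test_file`), a placeholder directory, and made-up sample rows; the real test data must match the columns of the target table.

```shell
#!/bin/sh
# Hypothetical test helper; the default directory is a placeholder.
INPUT_FILES_DIRECTORY="${INPUT_FILES_DIRECTORY:-/tmp/ingest_input}"

# Create a small delimited file next to the source directory, then move
# it in under the given name. Staging it beside the directory keeps it on
# the same file system, so the move is a rename rather than a copy.
drop_test_file() {
    name="$1"
    staging=$(mktemp "${INPUT_FILES_DIRECTORY%/}.staging.XXXXXX") || return 1
    printf '1,alpha\n2,beta\n' > "$staging"   # made-up sample rows
    mv "$staging" "$INPUT_FILES_DIRECTORY/$name"
}
```

Calling `drop_test_file test1.del` and then waiting for the next scheduled run gives a quick check that the crontab entry and the script work together.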