Running the Oozie workflows from the command line
You can run Big Match applications for deriving, comparing, and linking data (and more) as Oozie workflows from a command-line interface.
About this task
The oozieApps and oozieAppPropTemp directories can be found on HDFS under the /bigmatch/oozie directory. To extract them, you can use a command like this example:
hadoop fs -get /bigmatch/oozie/oozieAppPropTemp
The general syntax for the commands is:
${OOZIE_HOME}/bin/oozie job -oozie ${OOZIE_URL} -config /home/oozie/{application_name}.properties -run
where OOZIE_HOME is the installation directory for Oozie and OOZIE_URL is the URL of the Oozie service. For example, the following command runs the PME Derive application for an Oozie installation at /usr/hdp/current/oozie-client with the Oozie service running at http://mdmbigmatch01.somedomain.com:11000/oozie:
/usr/hdp/current/oozie-client/bin/oozie job -oozie http://mdmbigmatch01.somedomain.com:11000/oozie -config /home/oozie/derive.properties -run
Note: If you are running Oozie with SSL enabled, then the bigmatch user must have access to the Oozie client.
- Copy the oozie.truststore file onto the client machine and ensure that the bigmatch user can access it.
- Pass the trustStore to the JVM. The command syntax is as follows:
  export OOZIE_CLIENT_OPTS='-Djavax.net.ssl.trustStore=<path to oozie.truststore>'
  For example:
  export OOZIE_CLIENT_OPTS='-Djavax.net.ssl.trustStore=/home/bigmatch/oozie.truststore'
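The two truststore steps above could be wrapped in a small shell helper. This is a minimal sketch, not part of the product: the function name use_oozie_truststore and the readability check are illustrative assumptions, and only the OOZIE_CLIENT_OPTS setting itself comes from the text above.

```shell
#!/usr/bin/env bash
# Hypothetical helper (name and checks are assumptions for illustration).
# Points the Oozie client JVM at a truststore for the current shell session.
use_oozie_truststore() {
  local truststore="$1"

  # Step 1: the truststore must be readable by the current user
  # (the bigmatch user in the scenario described above).
  if [ ! -r "$truststore" ]; then
    echo "Cannot read truststore: $truststore" >&2
    return 1
  fi

  # Step 2: pass the truststore to the JVM through OOZIE_CLIENT_OPTS.
  export OOZIE_CLIENT_OPTS="-Djavax.net.ssl.trustStore=${truststore}"
}
```

With the function defined in the current session, calling use_oozie_truststore /home/bigmatch/oozie.truststore sets the variable only if the file is readable, which may surface a permissions problem before the Oozie client is ever started.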
With SSL, the syntax for executing the Big Match Oozie workflow remains the same as without it, but the Oozie URL will be different. The secure Oozie URL follows the format:
-oozie https://<oozie_server>:<secure_port>/oozie
For example:
-oozie https://node1.domain.com:11443/oozie
Running the command returns a job ID. You can then use the following command to see the status of the Oozie workflow:
${OOZIE_HOME}/bin/oozie job -oozie ${OOZIE_URL} -info ${JOB_ID}
As explained elsewhere, the derive, compare, and link applications run automatically by default as you load data. If you are running the applications manually, you would typically run the applications in the following order:
- PME Derive
- PME Compare
- PME Link
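Running the three applications in this order by hand can be sketched as a shell script that submits each workflow, captures the returned job ID, and checks the status before moving on. This is a rough sketch under several assumptions that are not confirmed by the text above: the properties files are named compare.properties and link.properties alongside the derive.properties shown earlier, the submit command prints a line of the form "job: <id>", the -info report contains a line of the form "Status : <state>", and the function names and the 30-second polling interval are arbitrary choices.

```shell
#!/usr/bin/env bash
# Sketch only. Expects OOZIE_HOME and OOZIE_URL to be set in the environment.

# Pull the job ID out of the submit output (assumes a "job: <id>" line).
extract_job_id() {
  sed -n 's/^job:[[:space:]]*//p' | head -n 1
}

# Pull the workflow state out of the -info report
# (assumes a "Status : <state>" line; the first such line is used).
extract_status() {
  sed -n 's/^Status[[:space:]]*:[[:space:]]*//p' | head -n 1
}

# Submit one application, then poll until it is no longer PREP or RUNNING.
# The properties file name is derived from the application name (assumption).
run_and_wait() {
  local app="$1"
  local job_id status

  job_id="$("${OOZIE_HOME}/bin/oozie" job -oozie "${OOZIE_URL}" \
    -config "/home/oozie/${app}.properties" -run | extract_job_id)"
  if [ -z "$job_id" ]; then
    echo "${app}: no job ID returned" >&2
    return 1
  fi

  while :; do
    status="$("${OOZIE_HOME}/bin/oozie" job -oozie "${OOZIE_URL}" \
      -info "$job_id" | extract_status)"
    case "$status" in
      PREP|RUNNING) sleep 30 ;;
      SUCCEEDED)    return 0 ;;
      *)            echo "${app}: job ${job_id} ended as ${status:-unknown}" >&2
                    return 1 ;;
    esac
  done
}

# Run Derive, Compare, and Link in order, stopping at the first failure.
run_typical_order() {
  local app
  for app in derive compare link; do
    run_and_wait "$app" || return 1
  done
}
```

After exporting OOZIE_HOME and OOZIE_URL, calling run_typical_order starts the chain. The sketch treats every state other than PREP, RUNNING, and SUCCEEDED as a failure, so a suspended workflow also stops the chain; adjust that if suspended jobs should be waited on instead.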
For particular needs, you can also run the following applications:
- Batch Processing
  - PME Derive
  - PME Generate Weights
  - PME Compare
  - PME Link
  - PME Re-index
  - PME Unlink
- Analysis
  - PME Bulk Search
  - PME Bucket Analysis
  - PME Entity Analysis
  - PME Score Analysis
  - PME Export Sample Pairs
  - PME Token Frequency Analysis
- Administration
  - PME Export Records
  - PME Extract Entities
  - PME Cache Indexes
These applications are not necessarily part of a typical workflow. See the information about each application for more detail.