Distributing transform libraries to compute nodes
If you cannot globally cross-mount the Projects directory (for example, in a Microsoft Windows MPP configuration), you must set up a method for distributing compiled transform libraries to all physical compute nodes in the cluster.
About this task
Method | Configuration | Advantages | Disadvantages |
---|---|---|---|
Manually distribute the transform libraries. | Use operating system commands to copy the transform libraries to all compute nodes. | This is the simplest method. | This method is time-consuming and error-prone. Also, it requires a time window when jobs are not running on the system. |
Copy the transform libraries automatically at runtime. | Set the APT_COPY_TRANSFORM_OPERATOR environment variable to True. The environment variable can be set at the job or project level. | This method is automatic and does not require manual intervention. | This method adds startup overhead. Also, it works correctly only with single-instance jobs. |
Copy the transform libraries automatically at compile time. | Set the APT_DIST_TRANSFORM_OPERATOR environment variable to a list of all server names participating in the cluster. The environment variable can be set at the job or project level. The list of server names is merged with the list of fastnames referenced in the configuration file that is used to compile the jobs. If no configuration file is specified, the default configuration file is used. | This method is also automatic and does not require manual intervention. It does not incur any startup overhead. It works correctly with both single-instance and multi-instance jobs. It ensures that compiled transform libraries are distributed to all participating compute nodes regardless of the configuration file that is used at compile time. | This method requires that a compiler be present on the conductor node. This requirement could be difficult to meet in production environments. |
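
The merge described for compile-time distribution (an explicit server list combined with the fastnames in the configuration file) can be illustrated with a short sketch. This is not the product's implementation; it only shows the idea of combining the two lists and dropping duplicates. The function names, the sample host names, and the case-insensitive comparison are assumptions made for illustration, and the server list is passed as a Python list because the separator used in the environment variable is not stated here.

```python
import re

# Matches configuration file entries such as: fastname "compute01"
FASTNAME_RE = re.compile(r'fastname\s+"([^"]+)"')


def fastnames_from_config(config_text):
    """Return the fastname values found in the configuration text, in order."""
    return FASTNAME_RE.findall(config_text)


def merge_hosts(server_names, config_text):
    """Merge an explicit server list with the configuration file's fastnames.

    Explicit server names come first; duplicates are dropped.
    """
    merged = []
    seen = set()
    for host in list(server_names) + fastnames_from_config(config_text):
        name = host.strip()
        key = name.lower()  # assumption: host names compare case-insensitively
        if key and key not in seen:
            seen.add(key)
            merged.append(name)
    return merged


# Hypothetical configuration file content with two nodes.
SAMPLE_CONFIG = '''
{
    node "node1" { fastname "conductor01" pools "" }
    node "node2" { fastname "compute01" pools "" }
}
'''

if __name__ == "__main__":
    # Hypothetical server list; compute01 appears in both sources.
    print(merge_hosts(["compute01", "compute02"], SAMPLE_CONFIG))
    # prints ['compute01', 'compute02', 'conductor01']
```

Keeping the explicit list first means that hosts absent from the compile-time configuration file still appear in the result, which mirrors the advantage listed in the table.
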
Note: If you use compile-time distribution, you do not need to use runtime distribution.
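
For the manual method, the copy step can be scripted rather than typed by hand on each node. The following is a minimal sketch, assuming that each compute node's library directory is reachable from the conductor node as an ordinary file system path (for example, a network share). The function name, the file pattern, and the paths in the commented example are hypothetical. As with any manual copy, run it only when no jobs are running.

```python
import shutil
from pathlib import Path


def distribute_libraries(source_dir, target_dirs, pattern="*"):
    """Copy every file in source_dir that matches pattern into each target directory.

    Returns a list of (target_dir, file_name) pairs, one per file copied.
    """
    source = Path(source_dir)
    libraries = sorted(p for p in source.glob(pattern) if p.is_file())
    copied = []
    for target in target_dirs:
        target_path = Path(target)
        # Create the target directory if it does not exist yet.
        target_path.mkdir(parents=True, exist_ok=True)
        for lib in libraries:
            # copy2 preserves timestamps, which helps when comparing versions.
            shutil.copy2(lib, target_path / lib.name)
            copied.append((str(target_path), lib.name))
    return copied


# Example call with hypothetical paths (not executed here):
# distribute_libraries(
#     r"C:\Projects\myproject\RT_BP1.O",
#     [r"\\compute01\Projects\myproject\RT_BP1.O",
#      r"\\compute02\Projects\myproject\RT_BP1.O"],
# )
```

The returned list gives a simple record of what was copied to which node, which can be checked against the expected set of libraries to catch the omissions that make the manual method error-prone.
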