Installing connectors on remote data sources in Data Virtualization

To access data that is stored in remote data sources in Data Virtualization, you must install a remote connector.

Before you begin

Required role: To complete this task, you must have the Data Virtualization Admin or Engineer role.

About this task

A remote connector enables Data Virtualization to automatically access data, such as files, that is located on a remote data source. If you need to virtualize data that is stored on a remote data source or file system, you must install a remote connector on the data source where the data is located. The credentials that are used to establish the connection to the data source determine which data in the data source the Data Virtualization service can access.

Data Virtualization provides a way to add a connector on a remote data source: you generate the dv-endpoint.sh (Linux and Mac OS) or dv-endpoint.bat (Windows) configuration script and run it on the remote data source. The script performs the following tasks:
  • Ensures that you meet all prerequisites to add a remote connector.
  • Sets the specified parameters to add the remote connector.
  • Downloads and extracts the remote connector installation package.
  • Starts the remote connector and checks its status.
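The prerequisite checks that the generated script performs are not described here. If you want to confirm the Java installation directory and node port on a Linux or Mac OS host before you generate the script, a minimal sketch such as the following might help. The function names are hypothetical and are not part of dv-endpoint.sh, and the port check assumes bash, because it relies on bash's /dev/tcp redirection.

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight helpers for a Linux or Mac OS host. They are NOT part of
# dv-endpoint.sh; they only check values that the procedure asks you to enter.

# Print the Java installation directory for a java binary path, for example
# /usr/lib/jvm/java-21.0.1.12-ibm/bin/java -> /usr/lib/jvm/java-21.0.1.12-ibm
java_home_from_bin() {
  case "$1" in
    */bin/java) printf '%s\n' "${1%/bin/java}" ;;
    *) return 1 ;;
  esac
}

# Succeed if the directory contains an executable bin/java.
check_java_home() {
  [ -x "$1/bin/java" ]
}

# Succeed if the argument is a whole number from 1 to 65535.
valid_port() {
  case "$1" in
    ''|*[!0-9]*) return 1 ;;
  esac
  [ "$1" -ge 1 ] && [ "$1" -le 65535 ]
}

# Succeed if a process already accepts TCP connections on the port on this host.
# Requires bash, because /dev/tcp is a bash redirection feature.
port_in_use() {
  (exec 3<>"/dev/tcp/127.0.0.1/$1") 2>/dev/null
}

# Example (assumes java is on the PATH and a readlink that supports -f):
#   java_home_from_bin "$(readlink -f "$(command -v java)")"
#   valid_port 6414 && ! port_in_use 6414 && echo "port 6414 is free"
```

The directory check only confirms that an executable bin/java exists; it does not verify the Java version. On Mac OS, /usr/bin/java is a launcher stub, so the directory that /usr/libexec/java_home reports is usually a better starting point than the path of java on the PATH.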

Procedure

To add connectors to remote data sources, complete the following steps:

  1. On the navigation menu, click Data > Data virtualization.
    The service menu opens to the Data sources page by default.
  2. Click Set up remote connector.
  3. Enter the name and description of the remote connector.
  4. To generate the dv-endpoint configuration script, follow these steps.
    1. Select the operating system of the remote data source.
    2. Specify the directory where Java is installed on the remote data source.
      The path must point to the installation directory of the Java SE Development Kit (JDK) or Java Runtime Environment (JRE), not to the java binary itself. For example, if your Java binary is installed in /usr/lib/jvm/java-21.0.1.12-ibm/bin/java, then use /usr/lib/jvm/java-21.0.1.12-ibm as your Java installation directory.

      Similarly, on Windows, if your Java binary is located in C:\JAVA21SR1FP12\sdk\bin\java, then use C:\JAVA21SR1FP12\sdk as your Java installation directory.

    3. Specify the directory where you want to install the remote connector.
      For example, you can use the /home/user/<user-id>/dvendpoint directory.
    4. Specify the node port that the connector uses on the remote data source.
      Each connector that you install on the remote data source can use a different node port. By default, the remote connector uses port 6414.
      Note: On Microsoft Windows, the installation of the endpoint fails if the port is already in use. Error messages are written to the dvendpoint.log file in the endpoint installation directory. For example, you might see error messages that are similar to the following example:
      An exception occurred during the Install phase.
      System.ComponentModel.Win32Exception: The specified service already exists
    5. Optional: To create connections to data sources that have Kerberos authentication enabled, specify the location of the Kerberos configuration file (krb5.conf) on the remote connector host in the Advanced options.
      Data Virtualization supports Kerberos authentication for connections to the following data sources:
      • Apache Hive
      • Apache Impala
      • Apache Spark SQL
    6. Click Generate script.
  5. If you download the script, follow these steps to run the script on the remote data source.
    1. Use SCP or FTP to transfer the dv-endpoint.sh or dv-endpoint.bat script to the remote data source.
    2. On Linux® and Mac OS, make the script executable.
      chmod +x dv-endpoint.sh
    3. On Linux and Mac OS, run the script on the remote data source.
      ./dv-endpoint.sh
    4. On Windows, run the dv-endpoint.bat script as a user with Administrator privileges, because the endpoint is installed as a service. You can create a dedicated user for this purpose so that you can restrict that user's directory access later.
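  6. If you copy the script to the clipboard, follow these steps to run the script on the remote data source.
    1. Create a file on the remote host, such as dv-endpoint.sh on Linux and Mac OS or dv-endpoint.bat on Windows.
    2. Paste the script content from the clipboard and save the file.
    3. Run the script as described in step 5.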
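If you have SSH access to a Linux or Mac OS remote data source, the transfer, chmod, and run steps can be scripted from your workstation. The following is a hypothetical sketch, not a Data Virtualization tool: the function names, host, and directory are placeholders, it assumes that dv-endpoint.sh runs without interactive prompts, and setting DRY_RUN=1 prints the commands instead of running them.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not part of Data Virtualization: copy the generated
# dv-endpoint.sh to a Linux or Mac OS host, make it executable, and run it over SSH.

# Run a command, or only print it when DRY_RUN=1.
run() {
  if [ "${DRY_RUN:-0}" = 1 ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Usage: install_remote_connector USER@HOST REMOTE_DIR [SCRIPT]
# REMOTE_DIR and SCRIPT must not contain single quotes, because they are
# single-quoted for the remote shell.
install_remote_connector() {
  local host="$1" dir="$2" script="${3:-dv-endpoint.sh}"
  run scp "$script" "$host:$dir/"
  run ssh "$host" "cd '$dir' && chmod +x '$script' && ./'$script'"
}

# Example (placeholder host and directory):
#   DRY_RUN=1 install_remote_connector user@remote-host /home/user/dv
```

Running the script over SSH keeps it on the same host as the data, so the connector is installed with the permissions of the SSH user, just as if that user had run it in a local shell.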

Results

You can now virtualize data that is stored on the remote data source or file system.

What to do next

Linux and Mac OS
By default on Linux and Mac OS, the remote connector has the access permissions of the user who runs the script. To limit the data that the connector can access, it is recommended that you complete the following steps:
  1. Create a functional ID and start the remote connector under this functional ID.
  2. Create a group to which the data to be virtualized can be added.
  3. Grant the group read access to the data to be virtualized.
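On a Linux host, these steps can be sketched with the standard useradd, groupadd, usermod, chgrp, and chmod commands, run as root. The user name dvconn, the group name dvdata, and the data path are hypothetical placeholders. Adding the functional ID to the group is an assumption that the steps imply but do not state. Mac OS uses different account tools (such as sysadminctl and dscl), so the sketch does not apply there as written. With DRY_RUN=1, the function prints the commands instead of running them.

```shell
#!/usr/bin/env bash
# Hypothetical sketch for a Linux host, run as root. dvconn, dvdata, and the data
# path are placeholders, not product defaults.

# Run a command, or only print it when DRY_RUN=1.
run() {
  if [ "${DRY_RUN:-0}" = 1 ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Usage: setup_connector_identity USER GROUP DATA_DIR
setup_connector_identity() {
  local user="$1" group="$2" data_dir="$3"
  # 1. Create the functional ID that runs the remote connector.
  run useradd --create-home "$user"
  # 2. Create the group for the data to be virtualized, and (assumption) add the ID to it.
  run groupadd "$group"
  run usermod --append --groups "$group" "$user"
  # 3. Give the group read access. Capital X adds search permission on directories
  #    (and on files only if they are already executable), so the group can traverse them.
  run chgrp -R "$group" "$data_dir"
  run chmod -R g+rX "$data_dir"
}

# Example:
#   DRY_RUN=1 setup_connector_identity dvconn dvdata /srv/data
```

One way to start the connector under the functional ID might be to run the generated script as that user (for example, with sudo -u dvconn) from an installation directory that the user can write to; treat this as an assumption rather than a documented procedure.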
Windows
On Windows, the remote connector must be installed by using a new account that is a member of the Administrators group. Grant this account read access only to the data to be virtualized. You can do this by using the Windows file system security settings.

To manage remote connectors, see Managing connectors on remote data sources.

To troubleshoot issues with remote connectors, see Remote connector does not start after restart in Data Virtualization.