Network requirements for Data Virtualization

The Data Virtualization service exposes the following network communication ports to allow connections from outside of the Cloud Pak for Data cluster. Configuring access to these ports is an optional task.

Ports exposed by Data Virtualization

The following table lists the ports that are exposed by Data Virtualization and their usage.

Table 1. Ports exposed by the Data Virtualization service
JDBC with SSL (internal port 32052, TCP)
External client applications use this port to connect to Data Virtualization via JDBC with SSL. To get the external port:
  1. Go to Data > Data Virtualization > Connection details.
  2. Select the With SSL option in the Connection configuration resources section.
The external port is the value of the Port number field.
Optionally, you can run the following command:
oc get -n Project -o jsonpath="{.spec.ports[?(@.name=='bigsqldb2jdbcssl')].nodePort}" services dv-server
Replace Project with the project (namespace) where the Data Virtualization service is installed.
For example,
oc get -n dv-project -o jsonpath="{.spec.ports[?(@.name=='bigsqldb2jdbcssl')].nodePort}" services dv-server
31961
In this example, the external port is 31961.

JDBC without SSL (internal port 32051, TCP)
External client applications use this port to connect to Data Virtualization via JDBC without SSL. To get the external port:
  1. Go to Data > Data Virtualization > Connection details.
  2. Select the Without SSL option in the Connection configuration resources section.
The external port is the value of the Port number field.
Optionally, you can run the following command:
oc get -n Project -o jsonpath="{.spec.ports[?(@.name=='bigsqldb2jdbc')].nodePort}" services dv-server
Replace Project with the project (namespace) where the Data Virtualization service is installed.
For example,
oc get -n dv-project -o jsonpath="{.spec.ports[?(@.name=='bigsqldb2jdbc')].nodePort}" services dv-server
32162
In this example, the external port is 32162.

Automated discovery (internal port 7777, TCP)
Automated discovery streamlines the process of accessing remote data sources. See Discovering remote data sources. To get the external port, run the following command:
oc get -n Project -o jsonpath="{.spec.ports[?(@.name=='qpdiscovery')].nodePort}" services dv-server
Replace Project with the project (namespace) where the Data Virtualization service is installed.
For example,
oc get -n dv-project -o jsonpath="{.spec.ports[?(@.name=='qpdiscovery')].nodePort}" services dv-server
30503
In this example, the external port is 30503.
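
To illustrate how the external port is used, the following is a minimal sketch that retrieves the external port for JDBC with SSL and assembles a JDBC URL. The namespace (dv-project), the host name, and the database name are placeholders; use the values that are shown on your Connection details page.

# Retrieve the external (NodePort) port for JDBC with SSL; dv-project is a placeholder namespace.
PORT=$(oc get -n dv-project -o jsonpath="{.spec.ports[?(@.name=='bigsqldb2jdbcssl')].nodePort}" services dv-server)
# Assemble a JDBC URL from a cluster node (or load balancer) address and the database name
# that is shown on the Connection details page; both values here are placeholders.
echo "jdbc:db2://cluster-node.example.com:${PORT}/YOUR_DATABASE:sslConnection=true;"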
To get the list of Kubernetes NodePort ports that are exposed by the Data Virtualization service and the internal-to-external port mapping, run the following command:
oc get -n Project services dv-server
Replace Project with the project (namespace) where the Data Virtualization service is installed.
For example,
oc get -n dv-project services dv-server

NAME        TYPE       CLUSTER-IP       EXTERNAL-IP   PORT(S)                                          AGE
dv-server   NodePort   172.30.140.105   <none>        7777:30503/TCP,32051:32162/TCP,32052:31961/TCP   2d
In the PORT(S) column, each entry is shown as internal port:external port/protocol.

Network requirements for load-balancing environments

By using the iptables utility or the firewall-cmd command, you can ensure that the external ports listed in Table 1, and the traffic that they carry, are not blocked by local firewall rules or load balancers.
Note: For more information about checking ports for communication blockages, see Managing data using the NCAT utility in the Red Hat® documentation.
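As a quick alternative to the ncat procedure, the following is a minimal sketch that checks whether an external port accepts TCP connections by using the bash /dev/tcp facility. The host name and port are placeholders; replace them with a cluster node (or load balancer) address and an external port from Table 1.

# Check whether the JDBC with SSL external port accepts TCP connections.
# cluster-node.example.com and 31961 are placeholder values.
if timeout 5 bash -c '</dev/tcp/cluster-node.example.com/31961'; then
    echo "port is reachable"
else
    echo "port is blocked or closed"
fi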

If your Cloud Pak for Data deployment uses a load balancer and you get a timeout error when you try to connect to the Data Virtualization service, increase the load balancer timeout values by updating the /etc/haproxy/haproxy.cfg file. For more information, see Limitations and known issues in Data Virtualization.

Defining gateway configuration to access isolated remote connectors
A remote connector acts as a gateway to remote data sources. If the remote connector host machine has network access to the Cloud Pak for Data cluster, the remote connector automatically contacts the cluster and the cluster connects back to it. However, if the physical network allows access in one direction only, from Cloud Pak for Data to the remote connector, you must manually configure the connection from Data Virtualization to the remote connector by using the DVSYS.DEFINEGATEWAYS() stored procedure, as follows:
  1. Go to Data > Data Virtualization > SQL editor.
  2. Run the DVSYS.DEFINEGATEWAYS() stored procedure. For example:
    CALL DVSYS.DEFINEGATEWAYS('host1:6414, host2:6414')
    Replace the host1 and host2 variables with the remote connector hostnames or IP addresses. This example uses port 6414, which you specify when you generate the dv_endpoint.sh configuration script. To determine which port to use in the DVSYS.DEFINEGATEWAYS() stored procedure, check the Queryplex_config.log file on your remote connector and search for the GAIAN_NODE_PORT value, as shown in the sketch after this procedure. For example:
    GAIAN_NODE_PORT=6414
    If you use port forwarding (for example, NAT or VPN) to reach the remote connector, you must specify two ports:
    CALL DVSYS.DEFINEGATEWAYS('host1:37400:6414, host2:37400:6414')
    In this example, two remote connectors are listening internally on port 6414, but these ports are not exposed externally by the host. For example, the remote connectors might be accessible from Cloud Pak for Data only through a VPN server that is configured to map external VPN port 37400 to internal port 6414. Defining the gateway enables Data Virtualization to open a connection to the remote connectors that run on host1 and host2. Data Virtualization connects to port 37400 on the remote host, and the VPN forwards traffic to the remote connector's internal port 6414.
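
For reference, a minimal sketch of finding the GAIAN_NODE_PORT value on the remote connector host; the path to the log file is an assumption and depends on where the remote connector is installed.

# Run on the remote connector host. The directory is a placeholder; adjust it to your installation path.
grep GAIAN_NODE_PORT /path/to/remote-connector/Queryplex_config.log
# Example output: GAIAN_NODE_PORT=6414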

Updating HAProxy configuration file

If you use an external infrastructure node to route external Data Virtualization traffic into the Red Hat OpenShift® cluster, you must ensure that traffic is forwarded to the master nodes of your cluster:
  1. On the infrastructure node, open the HAProxy configuration file located at /etc/haproxy/haproxy.cfg.
  2. Update the haproxy.cfg file to specify port information.

    You must update the file directly; do not copy and paste the following code sample. The values that are specified in the haproxy.cfg file come from the cluster, and they are different for each Data Virtualization service instance that is provisioned, even if you use the same cluster or namespace to provision the service.

    To update the haproxy.cfg file, ensure that:
    • You include each master node in the cluster in the backend sections, so that if one master node goes down, the connection can go through a different master node.
    • Sections in the file are uniquely named if your cluster runs multiple namespaces, and each namespace has a Data Virtualization service instance.

      For example, suppose that you have the following sections in the haproxy.cfg file for Data Virtualization in the zen namespace. If you also have a Data Virtualization service instance in the abc namespace, you must add node ports for the abc namespace and ensure that the sections in the haproxy.cfg file have different names, such as dv-abc-ssl, dv-abc-nonssl, and dv-abc-discovery.

    defaults
           log                     global
           option                  dontlognull
           option  tcp-smart-accept
           option  tcp-smart-connect
           retries                 3
           timeout queue           1m
           timeout connect         10s
           timeout client          1m
           timeout server          1m
           timeout check           10s
           maxconn                 3000
    
    frontend dv-nonssl
           bind *:NodePort for 32051
           default_backend dv-nonssl
           mode tcp
           option tcplog
    backend dv-nonssl
           balance source
           mode tcp
           server master0 Master0-PrivateIP:NodePort for 32051
           server master1 Master1-PrivateIP:NodePort for 32051
          (repeat for each master node in the cluster)
    
    frontend dv-ssl
           bind *:NodePort for 32052
           default_backend dv-ssl
           mode tcp
           option tcplog
    backend dv-ssl
           balance source
           mode tcp
           server master0 Master0-PrivateIP:NodePort for 32052
           server master1 Master1-PrivateIP:NodePort for 32052
          (repeat for each master node in the cluster)
    
    
    frontend dv-discovery
           bind *:NodePort for 7777
           default_backend dv-discovery
           mode tcp
           option tcplog
    backend dv-discovery
           balance source
           mode tcp
           server master0 Master0-PrivateIP:NodePort for 7777
           server master1 Master1-PrivateIP:NodePort for 7777
          (repeat for each master node in the cluster)
  3. Reload HAProxy:
    systemctl reload haproxy
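
If the reload fails, or if you want to validate the edited file before you reload, you can check its syntax; a minimal sketch, assuming HAProxy is installed in the default location:

    # Check the updated configuration file for syntax errors.
    # HAProxy exits with a nonzero status and prints the errors if the file is not valid.
    haproxy -c -f /etc/haproxy/haproxy.cfg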