Network requirements for Data Virtualization
The Data Virtualization service exposes the following network communication ports to allow connections from outside of the Cloud Pak for Data cluster. This is an optional task.
- Ports exposed by Data Virtualization
- Network requirements for load-balancing environments
- Updating HAProxy configuration file
Ports exposed by Data Virtualization
The following table lists the ports that are exposed by Data Virtualization and their usage.
Replace Project with the project (namespace) where the Data Virtualization service is installed.
oc get -n Project services dv-server
oc get -n dv-project services dv-server
NAME        TYPE       CLUSTER-IP       EXTERNAL-IP   PORT(S)                                          AGE
dv-server   NodePort   172.30.140.105   <none>        7777:30503/UDP,32051:32162/TCP,32052:31961/TCP   2d
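Each entry in the PORT(S) column has the form service-port:node-port/protocol. As a minimal sketch, the following helper splits that column into one mapping per line. The name list_node_ports is hypothetical (it is not part of the oc CLI), and the sketch assumes the default column layout shown in the sample output, where PORT(S) is the fifth column.

```shell
#!/bin/sh
# list_node_ports (hypothetical helper): read `oc get services` output on
# stdin and print "service-port node-port protocol" for each PORT(S) entry.
# Assumes the default table layout, with PORT(S) as the fifth column.
list_node_ports() {
  awk 'NR > 1 { print $5 }' |                      # skip the header, keep PORT(S)
    tr ',' '\n' |                                  # one mapping per line
    awk -F'[:/]' 'NF == 3 { print $1, $2, $3 }'    # split port:nodePort/protocol
}

# Usage against a live cluster (Project is the namespace placeholder):
#   oc get -n Project services dv-server | list_node_ports
```

For the sample output above, the helper would print 7777 30503 UDP, 32051 32162 TCP, and 32052 31961 TCP on separate lines.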
Network requirements for load-balancing environments
By using the iptables utility or the firewall-cmd command, you can ensure that the external ports listed in Table 1, and the traffic on them, are not blocked by local firewall rules or load balancers.
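As an illustration of the firewall-cmd approach, the following sketch prints the commands that would open a set of ports permanently, so that you can review them before running anything. It assumes that firewalld is the active firewall and that the node ports from the sample dv-server output are the ones to open; print_firewall_cmds is a hypothetical helper name, and the hosts on which the commands belong depend on your topology.

```shell
#!/bin/sh
# print_firewall_cmds (hypothetical helper): for each "port/protocol"
# argument, print the firewall-cmd invocation that would open that port
# permanently, followed by the reload that activates permanent rules.
# Nothing is executed; the output is meant for review.
print_firewall_cmds() {
  for spec in "$@"; do
    port=${spec%/*}
    # firewall-cmd expects the protocol in lowercase (tcp, udp)
    proto=$(printf '%s' "${spec#*/}" | tr '[:upper:]' '[:lower:]')
    printf 'firewall-cmd --permanent --add-port=%s/%s\n' "$port" "$proto"
  done
  printf 'firewall-cmd --reload\n'
}

# Node ports taken from the sample output; substitute your own values.
print_firewall_cmds 30503/UDP 32162/TCP 31961/TCP
```

Printing instead of executing keeps the sketch safe to run on any host; pipe the output to a root shell only after you have checked the port list.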
If your Cloud Pak for Data uses a load balancer and you get a timeout error when trying to connect to the Data Virtualization service, increase the load balancer timeout values by updating the /etc/haproxy/haproxy.cfg file. For more information, see Limitations and known issues in Data Virtualization.
- Switching service port from UDP to TCP
If the Cloud Pak for Data load balancer does not support UDP traffic, you can switch the port from UDP to TCP in the Data Virtualization service:
  - Run the following command:
    oc edit svc dv-server
  - Change the protocol parameter from UDP to TCP for the qpdiscovery port. For example:
    - name: qpdiscovery
      nodePort: 32071
      port: 7777
      protocol: TCP
      targetPort: 7777
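After saving the change, one way to confirm it is to read the protocol back from the service. The sketch below uses the jsonpath output option of oc get with a filter on the port name; it assumes that your current project is the one where dv-server is installed (otherwise add -n Project), and qpdiscovery_protocol is a hypothetical helper name.

```shell
#!/bin/sh
# qpdiscovery_protocol (hypothetical helper): print the protocol that is
# currently configured for the port named qpdiscovery on the dv-server
# service. Assumes the current project contains dv-server.
qpdiscovery_protocol() {
  oc get svc dv-server \
    -o jsonpath='{.spec.ports[?(@.name=="qpdiscovery")].protocol}'
}

# Usage on a logged-in cluster session:
#   qpdiscovery_protocol
```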
- Defining alternative gateway host
If the Cloud Pak for Data load balancer does not support UDP port mapping, Data Virtualization provides an alternative gateway host, which handles incoming requests to map UDP ports for remote connectors. To manually define the gateway:
  - Run the QPLEXSYS.DEFINEGATEWAYS() stored procedure. For example:
    QPLEXSYS.DEFINEGATEWAYS ('host1:6414, host2:6414')
    Replace the host1 and host2 variables with the remote connector hostname or IP address. This example uses port 6414, which you specify while generating the dv-endpoint configuration script.
    To determine which port mapping to use in the QPLEXSYS.DEFINEGATEWAYS() stored procedure, check the QueryPlex_config.log file on your remote connector and search for the GAIAN_NODE_PORT value. For example:
    GAIAN_NODE_PORT=6414
    If you use port forwarding (for example, NAT or VPN) to reach the remote connector, you must specify two ports:
    QPLEXSYS.DEFINEGATEWAYS ('host1:37400:6414, host2:37400:6414')
    In this example, two remote connectors are listening internally on port 6414, but that port is not exposed externally by the host. For example, the remote connectors might be accessible from Cloud Pak for Data only through a VPN server that is configured to map external VPN port 37400 to internal port 6414. Defining the gateway enables Data Virtualization to open a connection to the remote connectors running on host1 and host2. Data Virtualization connects to port 37400 on the remote host, and the VPN forwards traffic to the remote connector's internal port 6414.
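The gateway argument is a comma-separated list of host:port or host:external-port:internal-port entries, so it can be assembled mechanically. The following sketch reads GAIAN_NODE_PORT from a copy of the remote connector log and builds the call text for a list of hosts. Both helper names are hypothetical, and the sketch assumes that the log contains a line of the form GAIAN_NODE_PORT=6414, as in the example above.

```shell
#!/bin/sh
# node_port_from_log (hypothetical helper): print the last GAIAN_NODE_PORT
# value found in the given QueryPlex_config.log file.
# Assumes the value appears as GAIAN_NODE_PORT=<digits>.
node_port_from_log() {
  sed -n 's/^.*GAIAN_NODE_PORT=\([0-9][0-9]*\).*$/\1/p' "$1" | tail -n 1
}

# build_gateway_list (hypothetical helper): PORTSPEC HOST...
# PORTSPEC is "6414" for direct access, or "37400:6414" when external
# port 37400 is forwarded to internal port 6414.
build_gateway_list() {
  portspec=$1
  shift
  list=
  for host in "$@"; do
    # append ", " only between entries
    list="${list:+$list, }$host:$portspec"
  done
  printf "QPLEXSYS.DEFINEGATEWAYS ('%s')\n" "$list"
}

# Reproduce the two calls shown in this section.
build_gateway_list 6414 host1 host2
build_gateway_list 37400:6414 host1 host2

# With a log file at hand (the path is an example):
#   build_gateway_list "$(node_port_from_log ./QueryPlex_config.log)" host1 host2
```

The printed text is only the procedure call; run it with whichever SQL client you normally use against Data Virtualization.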
Updating HAProxy configuration file
- On the infrastructure node, open the HAProxy configuration file located at /etc/haproxy/haproxy.cfg.
- Modify the haproxy.cfg file to include the following configuration, where dv NodePort and Master1-privateIP are placeholders for the values in your environment:
defaults
    log global
    option dontlognull
    option tcp-smart-accept
    option tcp-smart-connect
    retries 3
    timeout queue 1m
    timeout connect 10s
    timeout client 1m
    timeout server 1m
    timeout check 10s
    maxconn 3000

frontend dv-nonssl
    bind *:dv NodePort
    default_backend dv-nonssl
    mode tcp
    option tcplog

backend dv-nonssl
    balance source
    mode tcp
    server master1 Master1-privateIP:dv NodePort
- Reload HAProxy:
systemctl reload haproxy
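Because a syntax error in haproxy.cfg can prevent the new configuration from being applied, it can help to validate the file before reloading. The following sketch runs HAProxy in check mode (the -c and -f options) and reloads only if the check passes. The helper name reload_haproxy_if_valid is hypothetical, and the default path is the one used in this section.

```shell
#!/bin/sh
# reload_haproxy_if_valid (hypothetical helper): validate the HAProxy
# configuration file and reload the service only when validation succeeds.
# The optional argument overrides the default configuration path.
reload_haproxy_if_valid() {
  cfg=${1:-/etc/haproxy/haproxy.cfg}
  if haproxy -c -f "$cfg"; then
    systemctl reload haproxy
  else
    echo "Validation of $cfg failed; HAProxy was not reloaded." >&2
    return 1
  fi
}

# Usage on the infrastructure node, as root:
#   reload_haproxy_if_valid
```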