Configuring network requirements for Data Virtualization
The Data Virtualization service exposes network communication ports to allow connections from outside of the IBM Software Hub cluster.
Finding ports exposed by Data Virtualization
Data Virtualization supports external client connections by providing three ports that you can connect to from outside the cluster. These ports are exposed externally as Kubernetes NodePort ports. The NodePort configuration maps a randomly generated port number from a predefined range (referred to as the external port) to the port that the Data Virtualization pods use internally (referred to as the internal port).
Alternatively, you can establish an SSL connection to Data Virtualization by using the predefined OpenShift® route named c-db2u-dv-db2u.
Ensure that you source the environment variables before you run the commands in this task.
For more information about the ports that Data Virtualization exposes and their usage, see the port table.
Data Virtualization exposes ports for the following purposes:

- External client applications to connect to Data Virtualization by using JDBC with SSL.
  - Internal port: 50001
  - Communication: TCP

  Note: To establish SSL connections to Data Virtualization, download the SSL certificate. From the Data Virtualization menu, select , and then select Download SSL Certificate.

  Upon provisioning, Data Virtualization automatically creates a passthrough OpenShift route named c-db2u-dv-db2u in its OpenShift project.
  Note: If the route does not exist, see Data Virtualization passthrough route is missing after upgrade.
  To get the route's host, run the following command. Then use the returned route host with port 443 to establish the SSL JDBC connection to Data Virtualization.

  ```
  oc get route c-db2u-dv-db2u -n ${DV_INSTANCE_NAMESPACE}
  ```

  The following steps are an alternative method to establish an SSL connection to Data Virtualization. To get the external port, follow these steps:
  1. Navigate to Configure connection resources.
  2. Select the With SSL option.

  Optionally, you can run the following command instead:

  ```
  oc get -n ${PROJECT_CPD_INST_OPERANDS} -o jsonpath="{.spec.ports[?(@.name=='ssl-server')].nodePort}" services c-db2u-dv-db2u-engn-svc
  ```
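  For reference, a resulting SSL JDBC URL might look like the following sketch. The database name bigsql is the usual Data Virtualization database name, but verify it for your instance; the certificate path is a placeholder, and the sslCertLocation property requires a recent IBM Db2 JDBC driver (older drivers use a truststore through sslTrustStoreLocation instead).

  ```
  jdbc:db2://<route-host>:443/bigsql:sslConnection=true;sslCertLocation=/path/to/dv-ssl-cert.crt;
  ```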
- External client applications to connect to Data Virtualization by using JDBC without SSL.
  - Internal port: 50000
- Automated discovery to streamline the process of accessing remote data sources. See Discovering remote data sources.
  - Internal port: 7777
To find the external ports that are mapped to these internal ports, run the following command and examine the output.

```
oc get -n ${PROJECT_CPD_INST_OPERANDS} services c-db2u-dv-db2u-engn-svc
```
```
NAME                      TYPE       CLUSTER-IP      EXTERNAL-IP   PORT(S)                                          AGE
c-db2u-dv-db2u-engn-svc   NodePort   172.30.13.109   <none>        50000:30662/TCP,50001:32337/TCP,7777:32178/TCP   2d4h
```
In this example output, the ports are mapped as follows:

| Description | Internal port | External port |
|---|---|---|
| JDBC SSL | 50001 | 32337 |
| JDBC Non-SSL | 50000 | 30662 |
| DV Automated Discovery | 7777 | 32178 |
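If you prefer to script the lookup, the following sketch queries the external NodePort for each internal port by port number. It assumes the same service name and environment variable that are used in the preceding command.

```sh
# Print the external NodePort that is mapped to each internal Data Virtualization port
for p in 50000 50001 7777; do
  ext=$(oc get service c-db2u-dv-db2u-engn-svc -n ${PROJECT_CPD_INST_OPERANDS} \
    -o jsonpath="{.spec.ports[?(@.port==$p)].nodePort}")
  echo "internal port $p -> external NodePort $ext"
done
```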
Data Virtualization ports for external connections
Data Virtualization provides the following ports for external connections.
Inside the Data Virtualization head pods, Data Virtualization opens the following ports, which are mapped to Kubernetes NodePort ports.
| Port | From | To | Function |
|---|---|---|---|
| Port 443 | External JDBC client connecting to the host of OpenShift route c-db2u-dv-db2u | Data Virtualization head pod | SSL encrypted for database communication |
| Corresponding NodePort value for internal port 50000 | External JDBC client | Data Virtualization head pod | Non-SSL for database communication |
| Corresponding NodePort value for internal port 50001 | External JDBC client | Data Virtualization head pod | SSL encrypted for database communication |
| Corresponding NodePort value for internal port 7777 | Remote connectors | Data Virtualization head pod | Encrypted data stream, but non-SSL |
Setting network requirements for load-balancing environments
By using the iptables utility or the firewall-cmd command, you can ensure that the external ports that are listed in the preceding table and their communication are not blocked by local firewall rules or load balancers.
For more information about checking ports for communication blockages, see Managing data using the NCAT utility in the Red Hat® documentation.
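For example, a minimal reachability check with the nc command might look like the following sketch; the host and port are placeholders for your cluster's values.

```sh
# Verify that the SSL NodePort is reachable from outside the cluster
nc -zv <cluster-node-or-load-balancer-host> <external port for internal port 50001>
```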
If your IBM Software Hub deployment uses a load balancer and you get a timeout error when you try to connect to the Data Virtualization service, increase the load balancer timeout values by updating the /etc/haproxy/haproxy.cfg file.
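For example, raising the client and server timeouts in the defaults section of haproxy.cfg might look like the following sketch; the 5-minute values are illustrative only.

```
defaults
    # Increase idle timeouts to avoid dropping long-running Data Virtualization connections
    timeout client 5m
    timeout server 5m
```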
Updating HAProxy configuration file
If you use an external infrastructure node to route external Data Virtualization traffic into the Red Hat OpenShift cluster, you must forward traffic to the master nodes of your cluster.
Ensure that you replace <DV_instance_namespace> with the Data Virtualization instance namespace.

- Exposing the non-SSL NodePort
  To expose the non-SSL NodePort, you must define frontend dv-nonssl-<DV_instance_namespace> and backend dv-nonssl-<DV_instance_namespace>.
- Exposing the SSL NodePort
  To expose the SSL NodePort, you must define frontend dv-ssl-<DV_instance_namespace> and backend dv-ssl-<DV_instance_namespace>.
  Exception: This step is not required if you plan to connect to Data Virtualization by using the c-db2u-dv-db2u OpenShift secure route on port 443 only.
- Deploying remote agents
  If you plan to deploy remote agents, you must define frontend dv-discovery-<DV_instance_namespace> and backend dv-discovery-<DV_instance_namespace>.
- On the infrastructure node, open the HAProxy configuration file at /etc/haproxy/haproxy.cfg.
- Update the haproxy.cfg file to specify port information.
You must update the file directly. Don't copy and paste the following code sample. The values that are specified in the haproxy.cfg file come from the cluster. These values are different for each Data Virtualization service instance that is provisioned, even if you use the same cluster or namespace to provision the service.
- Find the NodePort ports for the namespace. See Finding ports exposed by Data Virtualization for more instructions.
- To get Master<n>-PrivateIP for each master node in the cluster, use the following command and look at the INTERNAL-IP column.

  ```
  oc get nodes -o wide
  ```

- To update the haproxy.cfg file, ensure that the following requirements are met.
  - Include each master node in the cluster in the backend sections, so that if one master node goes down, the connection can go through a different master node.
  - If your cluster runs multiple namespaces and each namespace has a Data Virtualization service instance, name the sections in the file uniquely. For example, you have the following sections in the haproxy.cfg file for Data Virtualization in namespace zen. However, if you also have a Data Virtualization service instance in namespace abc, you must add NodePort values for namespace abc and ensure that the sections in the haproxy.cfg file have different names, such as dv-abc-ssl, dv-abc-nonssl, and dv-abc-discovery.
  - If you have multiple Data Virtualization instances, each one must have a different set of NodePort ports. For example, you can append the namespace to the end of each set of NodePort ports.

  The following example shows the haproxy.cfg file with NodePort ports set for multiple instances in different namespaces:
```
defaults
    log global
    option dontlognull
    option tcp-smart-accept
    option tcp-smart-connect
    retries 3
    timeout queue 1m
    timeout connect 10s
    timeout client 1m
    timeout server 1m
    timeout check 10s
    maxconn 3000

frontend dv-nonssl-<namespace1>
    bind *:<namespace1 NodePort for internal port 50000>
    default_backend dv-nonssl-<namespace1>
    mode tcp
    option tcplog

backend dv-nonssl-<namespace1>
    balance source
    mode tcp
    server master0 <Master0-PrivateIP>:<namespace1 NodePort for internal port 50000> check
    server master1 <Master1-PrivateIP>:<namespace1 NodePort for internal port 50000> check
    # repeat for each master node in the cluster

frontend dv-nonssl-<namespace2>
    bind *:<namespace2 NodePort for internal port 50000>
    default_backend dv-nonssl-<namespace2>
    mode tcp
    option tcplog

backend dv-nonssl-<namespace2>
    balance source
    mode tcp
    server master0 <Master0-PrivateIP>:<namespace2 NodePort for internal port 50000> check
    server master1 <Master1-PrivateIP>:<namespace2 NodePort for internal port 50000> check
    # repeat for each master node in the cluster

frontend dv-ssl-<namespace1>
    bind *:<namespace1 NodePort for internal port 50001>
    default_backend dv-ssl-<namespace1>
    mode tcp
    option tcplog

backend dv-ssl-<namespace1>
    balance source
    mode tcp
    server master0 <Master0-PrivateIP>:<namespace1 NodePort for internal port 50001> check
    server master1 <Master1-PrivateIP>:<namespace1 NodePort for internal port 50001> check
    # repeat for each master node in the cluster

frontend dv-ssl-<namespace2>
    bind *:<namespace2 NodePort for internal port 50001>
    default_backend dv-ssl-<namespace2>
    mode tcp
    option tcplog

backend dv-ssl-<namespace2>
    balance source
    mode tcp
    server master0 <Master0-PrivateIP>:<namespace2 NodePort for internal port 50001> check
    server master1 <Master1-PrivateIP>:<namespace2 NodePort for internal port 50001> check
    # repeat for each master node in the cluster

frontend dv-discovery-<namespace1>
    bind *:<namespace1 NodePort for internal port 7777>
    default_backend dv-discovery-<namespace1>
    mode tcp
    option tcplog

backend dv-discovery-<namespace1>
    balance source
    mode tcp
    server master0 <Master0-PrivateIP>:<namespace1 NodePort for internal port 7777> check
    server master1 <Master1-PrivateIP>:<namespace1 NodePort for internal port 7777> check
    # repeat for each master node in the cluster

frontend dv-discovery-<namespace2>
    bind *:<namespace2 NodePort for internal port 7777>
    default_backend dv-discovery-<namespace2>
    mode tcp
    option tcplog

backend dv-discovery-<namespace2>
    balance source
    mode tcp
    server master0 <Master0-PrivateIP>:<namespace2 NodePort for internal port 7777> check
    server master1 <Master1-PrivateIP>:<namespace2 NodePort for internal port 7777> check
    # repeat for each master node in the cluster
```
- Reload HAProxy by using the following command.

  ```
  systemctl reload haproxy
  ```
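Optionally, you can validate the configuration file before reloading; the -c flag runs HAProxy in check mode without starting the service.

```sh
haproxy -c -f /etc/haproxy/haproxy.cfg
```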
Configuring a public load balancer service to allow external traffic into a Red Hat OpenShift on IBM Cloud cluster
1. Log in to Red Hat OpenShift Container Platform as a cluster administrator.

   ```
   oc login ${OCP_URL}
   ```

2. Change to the project where the IBM Software Hub control plane is installed. This command uses an environment variable so that you can run the command exactly as written. For information about sourcing environment variables, see Setting up installation environment variables.

   ```
   oc project ${PROJECT_CPD_INST_OPERANDS}
   ```
3. In the IBM Cloud console, go to and find the public subnet.
4. Find the Subnet ID, which looks like the following example. You need this value when you create the load balancer file.

   ```
   0245-a1b123a4-1234-1234-1a2b-1b23d3bv9a45c
   ```

5. Create a load balancer .yaml file with the following details:

   ```yaml
   apiVersion: v1
   kind: Service
   metadata:
     name: lb-dv
     annotations:
       service.kubernetes.io/ibm-load-balancer-cloud-provider-vpc-subnets: "<Subnet ID from step 4>"
   spec:
     ports:
     - name: db
       protocol: TCP
       port: <Port value of your choice to be the non-SSL port that is used by external clients to connect to Data Virtualization>
       targetPort: 50000
     type: LoadBalancer
     selector:
       app: db2u-dv
       component: db2dv
       formation_id: db2u-dv
       role: db
       type: engine
       name: dashmpp-head-0
   ```
6. Run the following command to create the load balancer service in the VPC:

   ```
   oc create -f db2-lb.yaml
   ```

7. Run the following command to see the details:

   ```
   oc get svc lb-dv
   ```

   The following example shows the details of a load balancer named lb-db2-2:

   ```
   NAME       TYPE           CLUSTER-IP       EXTERNAL-IP                         PORT(S)                           AGE
   lb-db2-2   LoadBalancer   172.21.100.200   fbec480d-eu-de.lb.appdomain.cloud   51000:32149/TCP,51001:32514/TCP   21m
   ```
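   External clients connect to the EXTERNAL-IP hostname on the port that you chose in the service definition. As a convenience, the following sketch prints only the assigned hostname.

   ```sh
   # Print the external hostname that is assigned to the load balancer service
   oc get svc lb-dv -o jsonpath='{.status.loadBalancer.ingress[0].hostname}'
   ```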
Defining gateway configuration to access isolated remote connectors
A remote connector acts as a gateway to remote data sources. If the remote connector host machine has network access to the IBM Software Hub cluster, the remote connector automatically contacts and connects to the cluster. If the automated discovery port is not exposed through HAProxy and the firewall rules, or if the physical network configuration allows only one-way communication from IBM Software Hub to the Data Virtualization remote connector, you might need to establish the connection manually.
After you deploy the remote connector and validate that it is running on the host, if the remote connector does not appear in the , you can manually configure the connection from Data Virtualization to the remote connector by using the DEFINEGATEWAYS() API.
- Click .
- Run the DVSYS.DEFINEGATEWAYS() stored procedure. This stored procedure takes an argument that contains a comma-separated list of hosts where remote connectors are running. In the following example, two remote connectors are running: one on host1 and another on host2.

  ```
  CALL DVSYS.DEFINEGATEWAYS('host1:6414, host2:6414')
  ```

  Replace host1 and host2 with the remote connector hostname or IP address. This example uses port 6414, which you specify when you generate the dv_endpoint.sh configuration script. To determine which port mapping to use in the DVSYS.DEFINEGATEWAYS() stored procedure, check the Queryplex_config.log file on your remote connector and search for the GAIAN_NODE_PORT value, as shown in the following example.

  ```
  GAIAN_NODE_PORT=6414
  ```

  If you use port forwarding (for example, NAT or VPN) to the remote connector, you must specify two ports, as shown in the following example.

  ```
  CALL DVSYS.DEFINEGATEWAYS('host1:37400:6414, host2:37400:6414')
  ```

  In this example, two remote connectors listen internally on port 6414, but these ports are not exposed externally by the hosts. For example, the remote connectors might be accessible from IBM Software Hub only through a VPN server that is configured to map external VPN port 37400 to internal port 6414. Defining the gateway enables Data Virtualization to open a connection to the remote connectors that run on host1 and host2. Data Virtualization connects to port 37400 on the remote host, and the VPN forwards traffic to the remote connector's internal port 6414.
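  To locate the GAIAN_NODE_PORT value quickly, you can search the log from a shell on the remote connector host; the log path in this sketch is a placeholder for the actual location of Queryplex_config.log on your installation.

  ```sh
  grep GAIAN_NODE_PORT /path/to/Queryplex_config.log
  ```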
Removing defined gateway configuration
If a defined gateway is no longer necessary or is unreachable, it can negatively impact query performance. You can remove the gateway by using the REMOVEGATEWAYS() API. This API takes a comma-separated list of gateway IDs as a single string literal argument, as shown in the following example.
```
CALL DVSYS.REMOVEGATEWAYS('GTW0,GTW2')
```
To find the IDs of the defined gateways, use the listrdbc API.