Accessing data sources by using remote connectors (Data Virtualization)
Data Virtualization supports the use of remote connectors to access local files on remote systems or to access remote data sources.
- Access remote data sources or services
- Remote connectors provide access to data sources or other data services that are not directly accessible from the Cloud Pak for Data cluster. Additionally, remote connectors facilitate data source discovery with remote port scanning. For more information, see Discovering remote data sources.
- Access data stored in files
- You can access file data in formats such as CSV, TSV, and XLS that is located on remote file systems. Additionally, remote connectors provide remote browsing and data preview to facilitate virtualization configuration.
- Improve query performance
- Remote connectors enable distributed aggregations and join filters, and accelerate query processing on multiple worker nodes. Connectors also enable greater numbers of data source connections and enhance parallelism during processing. As the number of connected sources increases, the distribution and parallelism of processing benefit query performance. Thus, moving the connector closer to the data source moves that processing closer to the data source.
Recommendations:
- Locate the remote connector as close as possible to the data source. When the connector is on the same machine as the data source, you eliminate network latency between the two. When it is located within the same data center, you have a stable high-speed network between them. Latency increases as the remote connector moves farther from the data source. Latency still exists along the rest of the connector communications path, but the connector performs more of the operations on the result data close to the data source.
- The maximum recommended number of data sources per remote connector is 10, due to memory settings defined for each connector.
- Simplify access control to remote data sources
- You can control access to remote data sources by stopping or starting remote connectors.
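The two recommendations above (place each connector near its data source, and keep to at most 10 data sources per connector) can be sketched as a small planning helper. This is an illustrative sketch only, not part of Data Virtualization: the function `plan_connectors`, the host names, and the data source names are all hypothetical, and the code only computes a grouping on paper without installing or configuring any connector.

```python
# Hypothetical planning helper: not a Data Virtualization API.
# The limit below is the recommended maximum stated in the text.
MAX_SOURCES_PER_CONNECTOR = 10


def plan_connectors(sources_by_host):
    """Return one planned connector per group of at most 10 sources on a host.

    sources_by_host maps a host name (assumed to be where a connector
    would be installed) to the list of data sources reachable from it.
    """
    plan = []
    for host, sources in sorted(sources_by_host.items()):
        # Split the host's sources into chunks that respect the limit.
        for start in range(0, len(sources), MAX_SOURCES_PER_CONNECTOR):
            chunk = sources[start:start + MAX_SOURCES_PER_CONNECTOR]
            plan.append({"host": host, "sources": chunk})
    return plan


if __name__ == "__main__":
    # Made-up inventory: 23 databases on one host, one file share on another.
    inventory = {
        "hostA": [f"db{i}" for i in range(23)],
        "hostB": ["files"],
    }
    for connector in plan_connectors(inventory):
        print(connector["host"], len(connector["sources"]))
```

With this made-up inventory, the host with 23 data sources is split across three planned connectors and the second host gets one, so no planned connector exceeds the recommended limit.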
How to access data on remote data sources
Use the following workflow to understand how to access data on remote data sources.
To try it out, see Tutorial: Remote connectors on Data Virtualization.