Configuring access to an HDFS Parquet data source
To configure a federated server to access HDFS Parquet files, you must provide the federated server with information about the Hadoop file system and the Parquet files that you want to access. For the Hadoop file system, the NameNode server address and port are required.
Before you begin
- Ensure that the restservicewrapper.jar file and its dependencies are installed and configured in the CLASSPATH on the server that acts as the federated server. The restservicewrapper.jar file should be installed in the $INST_DIR/sqllib/federation/restservice/ path.
- Check the Db2 database manager configuration parameter JAVA_HEAP_SZ and set it to a reasonable value.
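As a sketch, the JAVA_HEAP_SZ parameter can be inspected and updated from the Db2 command line processor; the value 4096 (measured in 4 KB pages) is only an illustrative setting, not a recommendation:

```sql
-- Display the current Java heap size setting
GET DBM CFG;
-- Set the Java interpreter heap to an illustrative value of 4096 pages
UPDATE DBM CFG USING JAVA_HEAP_SZ 4096;
```

A change to JAVA_HEAP_SZ takes effect after the instance is restarted (db2stop followed by db2start).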
Procedure
- Test the network connection to the HDFS data source to verify that the HDFS data source server is properly started.
- Register the NoSQL wrapper, which is used to access HDFS data sources. Federated servers use wrappers to communicate with and retrieve data from data sources. Wrappers are implemented as a set of library files.
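A wrapper registration might look like the following sketch; the library name libdb2rest.so is an assumption, so check the wrapper library name for your platform in the product documentation:

```sql
-- Register the NoSQL wrapper; the library name shown is illustrative
CREATE WRAPPER NOSQL LIBRARY 'libdb2rest.so' OPTIONS (DB2_FENCED 'Y');
```

Running the wrapper fenced (DB2_FENCED 'Y') isolates wrapper code from the database engine process.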
- Register the server definitions for an HDFS data source. You must register the HDFS data source server that you want to access in the federated database.
- Create the user mapping for an HDFS data source. When you attempt to access an HDFS data source, the federated server establishes a connection to the NameNode server by using a user ID and password that are valid for that data source.
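A server definition can be sketched as follows; the server name HDFS_SERVER, the host name, and the port 8020 (a common HDFS NameNode RPC port) are assumptions for illustration:

```sql
-- Register an HDFS server definition against the previously registered wrapper
CREATE SERVER HDFS_SERVER TYPE HDFS
  WRAPPER NOSQL
  OPTIONS (HOST 'namenode.example.com', PORT '8020');
```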
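A user mapping associates a federated-database authorization ID with credentials on the data source. In this sketch, DBUSER, HDFSUSER, and the server name HDFS_SERVER are placeholder names:

```sql
-- Map the local user DBUSER to an HDFS account on the server definition
CREATE USER MAPPING FOR DBUSER SERVER HDFS_SERVER
  OPTIONS (REMOTE_AUTHID 'hdfsuser', REMOTE_PASSWORD 'hdfspassword');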
- Register nicknames for an HDFS data source. For each HDFS data source server definition that you register, you must register a nickname for each collection that you want to access. Use these nicknames instead of the names of the data source objects when you query the HDFS servers.
- Test querying data from the HDFS data source. After you successfully register a nickname for a Parquet file, you can test querying data from that file.
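A nickname registration might be sketched as below; the nickname SALES_PARQUET, the HDFS path, and the FILEPATH option name are assumptions, so verify the supported nickname options in the product documentation:

```sql
-- Create a local nickname over a Parquet file stored in HDFS
-- (nickname, path, and option name are illustrative)
CREATE NICKNAME SALES_PARQUET FOR SERVER HDFS_SERVER
  OPTIONS (FILEPATH '/user/hive/warehouse/sales/sales.parquet');
```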
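Assuming a nickname such as SALES_PARQUET has been registered, a quick validation query can be issued against it like any local table:

```sql
-- Query the Parquet file through its nickname; limit rows for a quick check
SELECT * FROM SALES_PARQUET FETCH FIRST 10 ROWS ONLY;
```

If the query returns rows, the wrapper, server definition, user mapping, and nickname are all working end to end.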