Import a custom JDBC data source

If a user plans to connect to a relational database that is not supported by default, an IBM® Watson™ Studio Local administrator must import a JDBC driver for the relational database.

For a list of supported data sources, see Add data sources and remote data sets

To use data from an unsupported relational database, complete the following tasks:

Import a JDBC driver

This task must be completed by the IBM Watson Studio Local administrator.

To import a JDBC driver:

  1. From the command line, change to the /wdp/utils/ directory on the master node of the cluster.
  2. Enter the following command:
    moveJarClasspath.sh -add jarfile-name

    Replace jarfile-name with the fully qualified path to the JAR file that contains the JDBC driver. For example:

    moveJarClasspath.sh -add /wdp/tmp/postgresql-42.1.1.jar

Remove a JDBC driver

This task must be completed by the IBM Watson Studio Local administrator.

To remove a JDBC driver:

  1. From the command line, change to the /wdp/utils/ directory on the master node of the cluster.
  2. Enter the following command:
    moveJarClasspath.sh -remove jarfile-name
    Replace jarfile-name with the name of the file (not the fully qualified path of the file). For example:
    moveJarClasspath.sh -remove postgresql-42.1.1.jar

Create a custom data source and remote data set

Any collaborator with data source information and credentials can create a data source in the project. To add a data source to a project:
  1. From your project, go to the Data Sources page and click add data source.
  2. In the Create Data Source window, specify all of the required fields.
  3. In the Data Source Type pull-down menu, select Custom JDBC.
  4. Type in the JDBC driver class name, for example, org.postgresql.Driver.
  5. Type in the JDBC URL, for example, jdbc:postgresql://<host>:29922/compose.
  6. If you select the Shared check box, the data source and its credentials become accessible to all collaborators in the project. If you do not select Shared, then each collaborator must provide their own credentials to use the data source.
Note: When a data source is shared, it cannot be made private again.
Note: A remote data source cannot be shared for an external Git project.
You can add or edit remote data sets that connect directly to a schema and table name in the data source. Notebooks within the project can then refer to each remote data set by a name to retrieve its information. You can either define remote data sets when you create the data source, or add them later.

All collaborators in a project can use a shared data source in their notebook, or add their own private data source. Collaborators with viewer privileges cannot edit data sources.

Load data in the notebook

To load data from a relational database into a notebook, you must manually add a connection to the database in your notebook.

The following table lists the recommended APIs for adding a connection:

Language Recommended APIs
Scala Spark APIs, java.sql.{Connection,DriverManager}
Python Spark APIs, jaydebeapi
R Spark APIs

You can use the following examples to get started:

Scala syntax
java.sql.{Connection,DriverManager}

import java.sql.{Connection,DriverManager}
val url = "jdbc:postgresql://9.87.654.321:5432/postgres"
val driver = "org.postgresql.Driver"
val username = "postgres"
val password = "password"var connection:Connection = _
try {
   Class.forName(driver)
   connection = DriverManager.getConnection(url, username, password)
   val statement = connection.createStatement
   val rs = statement.executeQuery("SELECT * FROM myschema.cars")
   while (rs.next) {
      val make = rs.getString("make")
      println("make = %s".format(make))
   }
} catch {
   case e: Exception => e.printStackTrace
}
connection.close
Python syntax
import jaydebeapi, sys

#Enter the values for you database connection
dsn_database = "postgres"# e.g. "BLUDB"
dsn_hostname = "9.87.654.321"# e.g.: "hostname.site.com"
dsn_port = "5432"# e.g. "50000"
dsn_uid = "postgres"# e.g. "dash104434"
dsn_pwd = "password"# e.g. "7dBZ3jWt9xN6$o0JiX!m"

connection_string='jdbc:postgresql://'+dsn_hostname+':'+dsn_port+'/'+dsn_database

print connection_string

if (sys.version_info >= (3,0)):
   conn = jaydebeapi.connect(dataSource['driver_class'], dataSource['URL'], [dataSource['user'], dataSource['password']])
else:
   conn = jaydebeapi.connect(dataSource['driver_class'], [dataSource['URL'], dataSource['user'], dataSource['password']])
query = 'select * from ' + (dataSet['schema'] + '.'if (len(dataSet['schema'].strip()) != 0) else'') + dataSet['table']

curs = conn.cursor()
curs.execute("select * from myschema.cars")
curs.close()
R syntax
url <- "jdbc:postgresql://9.87.654.321:5432/postgres"
user <- "postgres"
password <- "password"
dbtable <- "myschema.cars"

spark<-sparkR.session();
df <- read.jdbc(url, dbtable, user=user, password=password)
collect(df)