Supported data sources

IBM® Cloud Pak for Data lets you connect to your data no matter where it lives.

The Connections page

The Connections page displays a list of the connections that you can use for governance, data virtualization, data integration, and analytics projects. However, the list of connections is filtered when you access the connections from a service. For example, if you are using a connection to add a data source to an analytics project, only connections that are supported for analytics projects are displayed.
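The filtering behavior described above can be pictured with a minimal sketch. The `Connection` record, the service labels, and the sample entries are all hypothetical placeholders. They do not reflect the product's data model or its actual support matrix.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Connection:
    """Hypothetical record for one entry on the Connections page."""
    name: str
    supported_services: frozenset


def connections_for_service(connections, service):
    """Return only the connections that list the given service as supported."""
    return [c for c in connections if service in c.supported_services]


# Placeholder data: names and support sets are invented for illustration only.
ALL_CONNECTIONS = [
    Connection("connection-a", frozenset({"analytics", "governance"})),
    Connection("connection-b", frozenset({"data-virtualization"})),
    Connection("connection-c", frozenset({"analytics"})),
]

if __name__ == "__main__":
    for conn in connections_for_service(ALL_CONNECTIONS, "analytics"):
        print(conn.name)
```

In this sketch, opening the list from an analytics project would show only the entries whose support set includes the analytics label.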

To create connections by using third-party JDBC drivers, see Importing JDBC drivers for data sources.

Restriction: Some connections can be created only from within a specific service. If a connection type is not listed on the Connections page, you must create it from the appropriate page within the service that supports it. For details, see Connecting to data sources (by service).

Additionally, some connections, such as IBM Guardium® connections, are not data sources. These connections enable you to integrate with external applications, but cannot be used as data sources for other services. For details, see Auditing your sensitive data with IBM Guardium.

For Watson™ Knowledge Catalog, this table shows the data sources that are supported in catalogs. For the data sources supported by automated discovery and quick scan, see Discovering assets (Watson Knowledge Catalog).

For more information on connections supported for the synchronization of information assets to the default catalog, see Information assets view.

IBM data sources

The following table lists the IBM data sources that you can connect to from Cloud Pak for Data.

Connection type Cognos® Dashboards (analytics dashboards) DataStage® Data Virtualization Watson Knowledge Catalog Watson Studio Notes
Analytics Engine HDFS      
Supported encryption
(Optional) SSL certificate or SSL certificate file.
Credentials
User name and password.

For more information on connections supported for the synchronization of information assets to the default catalog, see Information assets view.

Classic Federation          
Cloud Object Storage
(IBM Cloud Storage)
   
Credentials
You must have your API key and resource instance ID.

If you plan to use the S3 API, you must also have your access key and your secret key.

For more information on connections supported for the synchronization of information assets to the default catalog, see Information assets view.
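As a rough illustration of the Cloud Object Storage credential requirements above, the following sketch checks which values are still missing before a connection is created. The field names (`api_key`, `resource_instance_id`, `access_key`, `secret_key`) are assumed labels chosen for this example, not the product's actual property names.

```python
def missing_cos_credentials(supplied, use_s3_api=False):
    """List the credential fields that are absent or empty.

    `supplied` is a dict of hypothetical field names to values.
    """
    # The API key and resource instance ID are always required.
    required = ["api_key", "resource_instance_id"]
    if use_s3_api:
        # The S3 API additionally needs the access key and secret key.
        required += ["access_key", "secret_key"]
    return [name for name in required if not supplied.get(name)]


if __name__ == "__main__":
    creds = {"api_key": "example-key", "resource_instance_id": "example-id"}
    print(missing_cos_credentials(creds))
    print(missing_cos_credentials(creds, use_s3_api=True))
```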

Cloud Object Storage (infrastructure)      
Credentials
You must have your secret key.

If you plan to use the S3 API, you must also have your access key.

For more information on connections supported for the synchronization of information assets to the default catalog, see Information assets view.

Cloudant®      
Credentials
User name and password.
When you create your Cloudant service on IBM Cloud, you must choose Use both legacy credentials and IAM for the Available authentication methods.

For more information on connections supported for the synchronization of information assets to the default catalog, see Information assets view.

Cognos Analytics      

Source connections only. You cannot use this connection as a storage target.

Supported encryption
(Optional) SSL certificate or SSL certificate file.
Credentials
User name and password.
Anonymous access is also supported.

For more information on connections supported for the synchronization of information assets to the default catalog, see Information assets view.

Compose for MySQL      
Credentials
User name and password.

For more information on connections supported for the synchronization of information assets to the default catalog, see Information assets view.

Data Set          
Data Virtualization

You can point to an instance of Data Virtualization on the same instance of Cloud Pak for Data or on a different instance of Cloud Pak for Data.

Supported encryption
(Optional) SSL certificate or SSL certificate file.
Credentials
User name and password.

For more information on connections supported for the synchronization of information assets to the default catalog, see Information assets view.

IBM Data Virtualization Manager for z/OS®        
Supported encryption
(Optional) SSL certificate or SSL certificate file.
Credentials
User name and password.
Databases for PostgreSQL    
Credentials
User name and password.

For more information on connections supported for the synchronization of information assets to the default catalog, see Information assets view.

Db2®  
Supported encryption
(Optional) SSL certificate or SSL certificate file.
Credentials
User name and password.

For more information on connections supported for the synchronization of information assets to the default catalog, see Information assets view.

Db2 Big SQL
(Big SQL)
   
Supported encryption
(Optional) SSL certificate or SSL certificate file.
Credentials
User name and password.

For more information on connections supported for the synchronization of information assets to the default catalog, see Information assets view.

Db2 Event Store
(Event Store)
       
Supported encryption
(Optional) SSL certificate or SSL certificate file.
Credentials
User name and password.
Db2 for i    
Supported encryption
(Optional) SSL certificate or SSL certificate file.
Credentials
User name and password.

For more information on connections supported for the synchronization of information assets to the default catalog, see Information assets view.

Db2 for z/OS  
Supported encryption
(Optional) SSL certificate or SSL certificate file.
Restriction: This option is only available from the Connections page.
Credentials
User name and password.

For more information on connections supported for the synchronization of information assets to the default catalog, see Information assets view.

Db2 Hosted      
Credentials
User name and password.

For more information on connections supported for the synchronization of information assets to the default catalog, see Information assets view.

Db2 on Cloud  
Credentials
User name and password.

For more information on connections supported for the synchronization of information assets to the default catalog, see Information assets view.

Db2 Warehouse
Credentials
User name and password.

For more information on connections supported for the synchronization of information assets to the default catalog, see Information assets view.

Db2 Warehouse on Cloud
Supported encryption
(Optional) SSL certificate or SSL certificate file.
Credentials
User name and password.

For more information on connections supported for the synchronization of information assets to the default catalog, see Information assets view.

Distributed Transaction          
DRS          
External Source          
External Target          
HDFS via Execution Engine for Hadoop      

Requires the Execution Engine for Apache Hadoop service. See Installing the Execution Engine for Apache Hadoop service.

Supported encryption
(Optional) SSL certificate or SSL certificate file.
Credentials
The Use Watson Studio authorization for the connection option uses your IBM Cloud Pak for Data credentials to authenticate to the data source.

For more information on connections supported for the synchronization of information assets to the default catalog, see Information assets view.

Hierarchical          
Hive via Execution Engine for Hadoop      

Requires the Execution Engine for Apache Hadoop service. See Installing the Execution Engine for Apache Hadoop service.

You also need the HiveJDBC41.jar file, which you can download from the Cloudera website. (Click GET IT NOW, and then unzip the downloaded file.)

Supported encryption
(Optional) SSL certificate or SSL certificate file.
Credentials
The Use Watson Studio authorization for the connection option uses your IBM Cloud Pak for Data credentials to authenticate to the data source.

For more information on connections supported for the synchronization of information assets to the default catalog, see Information assets view.

Replaces the Hive JDBC - HDP connection.

Impala via Execution Engine for Hadoop      

Requires the Execution Engine for Apache Hadoop service. See Installing the Execution Engine for Apache Hadoop service.

You also need the ImpalaJDBC41.jar file, which you can download from the Cloudera website. (Click GET IT NOW, and then unzip the downloaded file.)

Supported encryption
(Optional) SSL certificate or SSL certificate file.
Credentials
The Use Watson Studio authorization for the connection option uses your IBM Cloud Pak for Data credentials to authenticate to the data source.

For more information on connections supported for the synchronization of information assets to the default catalog, see Information assets view.

Informix®  
Supported encryption
(Optional) SSL certificate or SSL certificate file.
Restriction: This option is only available from the Connections page.
Credentials
User name and password.

For more information on connections supported for the synchronization of information assets to the default catalog, see Information assets view.

Informix Enterprise          
Informix Load          
ISD Input          
ISD Output          
Java Integration          
Lookup File Set          
Netezza®      
Supported encryption
(Optional) SSL certificate or SSL certificate file.
Credentials
User name and password.
Planning Analytics      
Supported encryption
(Optional) SSL certificate or SSL certificate file.
Credentials
  • Basic
  • CAM Credentials
  • CAM Password
  • Windows Integrated Authentication Token

For more information on connections supported for the synchronization of information assets to the default catalog, see Information assets view.

Restriction: You can connect only to an external Planning Analytics TM1 server. You cannot connect to an instance of Planning Analytics that is deployed on Cloud Pak for Data.
PureData® System for Analytics    
Credentials
User name and password.

For more information on connections supported for the synchronization of information assets to the default catalog, see Information assets view.

WebSphere® MQ          

Third-party data sources

Connection type Cognos Dashboards DataStage Data Virtualization Watson Knowledge Catalog Watson Studio Notes
Amazon Redshift
(Redshift)
   
Supported encryption
(Optional) SSL certificate or SSL certificate file.
Restriction: This option is only available from the Connections page.
Credentials
User name and password.

For more information on connections supported for the synchronization of information assets to the default catalog, see Information assets view.

Amazon S3    
Credentials
You must have your access key and secret key.

For more information on connections supported for the synchronization of information assets to the default catalog, see Information assets view.

Apache Cassandra          
Apache HBase          
Apache HDFS    
Supported encryption
(Optional) SSL certificate or SSL certificate file.
Credentials
User name and password.

For more information on connections supported for the synchronization of information assets to the default catalog, see Information assets view.

Apache Hive  

Source connections only. You cannot use this connection as a storage target.

Supported encryption
(Optional) SSL certificate or SSL certificate file.
Credentials
User name and password.

For more information on connections supported for the synchronization of information assets to the default catalog, see Information assets view.

Apache Kafka          
Big Data File Stage (BDFS)          
Cloudera Impala  

Source connections only. You cannot use this connection as a storage target.

Supported encryption
(Optional) SSL certificate or SSL certificate file.
Restriction: This option is only available from the Connections page.
Credentials
User name and password.

For more information on connections supported for the synchronization of information assets to the default catalog, see Information assets view.

Dropbox      
Credentials
You must have your access token.

To obtain the access token, follow the instructions in the Dropbox OAuth guide.

For more information on connections supported for the synchronization of information assets to the default catalog, see Information assets view.

File system        
Important: This type of connection is not recommended for use in IBM Cloud Pak for Data.

For more information on connections supported for the synchronization of information assets to the default catalog, see Information assets view.

FTP enterprise          
FTP (remote file system transfer)    

You can create a connection to a remote file system to access files that reside on the remote system. For information about supported files, see Data files.

Connection mode
Use the appropriate connection method based on the configuration of the FTP server:
  • Anonymous
  • Basic authentication (with a user name and password)
  • SSL
  • SSH

For more information on connections supported for the synchronization of information assets to the default catalog, see Information assets view.
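To make the FTP connection modes listed above more concrete, here is a minimal sketch that uses Python's standard `ftplib` module. It is not how the product connects internally. The mode names and the `connect` helper are assumptions made for this example. The SSH mode is left unimplemented because SFTP is a separate protocol that `ftplib` does not cover.

```python
import ftplib


def client_class_for_mode(mode):
    """Pick a standard-library FTP client class for a connection mode."""
    if mode in ("anonymous", "basic"):
        return ftplib.FTP
    if mode == "ssl":
        return ftplib.FTP_TLS
    if mode == "ssh":
        # SFTP runs over SSH and is not part of ftplib; a third-party
        # SSH library would be needed instead.
        raise NotImplementedError("SSH (SFTP) is not supported by ftplib")
    raise ValueError(f"unknown connection mode: {mode!r}")


def connect(host, mode, user="", password=""):
    """Open a session to `host` and log in according to the chosen mode."""
    client = client_class_for_mode(mode)(host)  # connects immediately
    if mode == "anonymous":
        client.login()  # ftplib falls back to the anonymous user
    else:
        client.login(user, password)
    if mode == "ssl":
        client.prot_p()  # also protect the data channel with TLS
    return client
```

A call such as `connect("ftp.example.com", "basic", "user", "secret")` would return a logged-in client. The host name here is a placeholder.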

Generic JDBC    

You must upload the JDBC JAR files when you create the connection.

Supported encryption
(Optional) SSL certificate or SSL certificate file.
Credentials
User name and password.

For more information on connections supported for the synchronization of information assets to the default catalog, see Information assets view.

Generic Service           This connection type is intended for use only with the connections API.
Google BigQuery
(Big Query)
 
Credentials
You must enter either the Credentials (the contents of the Google service account key JSON file) or the Credentials file path (the path of the Google service account file).

For more information on connections supported for the synchronization of information assets to the default catalog, see Information assets view.
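The two ways of supplying Google credentials described above can be sketched as follows. The function and parameter names are hypothetical, and treating the two inputs as mutually exclusive is an assumption of this example. The check on the `type` field relies on the standard layout of Google service account key files.

```python
import json
from pathlib import Path


def load_service_account_key(credentials=None, credentials_file_path=None):
    """Return the parsed key from inline JSON content or from a file path."""
    # Assumption: exactly one of the two inputs is supplied.
    if (credentials is None) == (credentials_file_path is None):
        raise ValueError(
            "supply exactly one of credentials or credentials_file_path"
        )
    if credentials is None:
        credentials = Path(credentials_file_path).read_text(encoding="utf-8")
    key = json.loads(credentials)
    # Service account key files carry a "type" field with this value.
    if key.get("type") != "service_account":
        raise ValueError("the JSON content is not a service account key")
    return key
```

Checking the content before you create the connection can catch a mistakenly pasted file path or a key of the wrong type early.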

Google Cloud Storage    
Credentials
You must enter either the Credentials (the contents of the Google service account key JSON file) or the Credentials file path (the path of the Google service account file).
Supported authentication for Data Virtualization
Alternatively, you can use an access token.

To obtain the access token, follow the instructions in the Google BigQuery documentation.

For more information on connections supported for the synchronization of information assets to the default catalog, see Information assets view.

HDFS via File connector         The File connector uses the WebHDFS API or the HttpFS API to connect to HDFS.
HDFS - CDH           This connection type is deprecated.
HDFS - HDP           This connection type is deprecated.
Hive JDBC      
Supported encryption
(Optional) SSL certificate or SSL certificate file.
Credentials
User name and password.
Hive JDBC - CDH        
Supported encryption
(Optional) SSL certificate or SSL certificate file.
Credentials
User name and password.
Hive JDBC - HDP        
Supported encryption
(Optional) SSL certificate or SSL certificate file.
Credentials
User name and password.
Looker      

Source connections only. You cannot use this connection as a storage target.

Credentials
You must have your client ID and your client secret.

Before you configure the connection, set up API3 credentials for your Looker instance. For details, see Looker API Authentication.

For more information on connections supported for the synchronization of information assets to the default catalog, see Information assets view.

MariaDB        
Supported encryption
(Optional) SSL certificate or SSL certificate file.
Credentials
User name and password.
Microsoft Azure (Blob and File)          
Microsoft Azure Data Lake Store
(Azure Data Lake)
   
Supported encryption
(Optional) SSL certificate or SSL certificate file.
Restriction: This option is only available from the Connections page.
Credentials
You must have your client ID, tenant ID, and client secret.

Before you configure the connection, you must create an Azure Active Directory (Azure AD) web application and get an application ID, an authentication key, and a tenant ID. Then you must assign the Azure AD application to the Azure Data Lake Store account file or folder. Follow Steps 1, 2, and 3 at Service-to-service authentication with Data Lake Store using Azure Active Directory.

For more information on connections supported for the synchronization of information assets to the default catalog, see Information assets view.

Microsoft Azure SQL Database
(Azure SQL)
     
Credentials
User name and password.

For more information on connections supported for the synchronization of information assets to the default catalog, see Information assets view.

Microsoft SQL Server
Supported encryption
(Optional) SSL certificate or SSL certificate file.
Restriction: This option is only available from the Connections page.
Credentials
User name and password.

For more information on connections supported for the synchronization of information assets to the default catalog, see Information assets view.

Minio      
Credentials
You must have your access key and your secret key.

For more information on connections supported for the synchronization of information assets to the default catalog, see Information assets view.

Mongo        
Supported encryption
(Optional) SSL certificate or SSL certificate file.
Credentials
User name and password.
MySQL
(My SQL Community Edition)
(My SQL Enterprise Edition)
 
Supported encryption
(Optional) SSL certificate or SSL certificate file.
Credentials
User name and password.

For more information on connections supported for the synchronization of information assets to the default catalog, see Information assets view.

ODBC          
OData      
Supported encryption
(Optional) SSL certificate or SSL certificate file.
Credentials
Select the appropriate authentication method based on the configuration of the data source:
  • API key requires your API key.
  • Basic requires your user name and password.
  • None requires no credentials.

For more information on connections supported for the synchronization of information assets to the default catalog, see Information assets view.
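The three OData authentication methods above differ only in what the client sends with each request. The sketch below builds HTTP headers for each method. The Basic case follows the standard HTTP Basic scheme, but the `X-API-Key` header name is purely an assumption, because the header that carries an API key depends on how the data source is configured.

```python
import base64


def odata_auth_headers(method, api_key=None, user=None, password=None):
    """Build request headers for a hypothetical OData client."""
    if method == "none":
        return {}  # no credentials are sent
    if method == "basic":
        raw = f"{user}:{password}".encode("utf-8")
        token = base64.b64encode(raw).decode("ascii")
        return {"Authorization": f"Basic {token}"}
    if method == "api_key":
        # Assumed header name; check what the data source expects.
        return {"X-API-Key": api_key}
    raise ValueError(f"unknown authentication method: {method!r}")


if __name__ == "__main__":
    print(odata_auth_headers("basic", user="user", password="pass"))
```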

Oracle  
Supported encryption
(Optional) SSL certificate or SSL certificate file.
Credentials
User name and password.

For more information on connections supported for the synchronization of information assets to the default catalog, see Information assets view.

Pivotal Greenplum
(Greenplum)
   
Supported encryption
(Optional) SSL certificate or SSL certificate file.
Restriction: This option is only available from the Connections page.
Credentials
User name and password.

For more information on connections supported for the synchronization of information assets to the default catalog, see Information assets view.

PostgreSQL    
Supported encryption
(Optional) SSL certificate or SSL certificate file.
Restriction: This option is only available from the Connections page.
Credentials
User name and password.

For more information on connections supported for the synchronization of information assets to the default catalog, see Information assets view.

Salesforce.com    

Source connections only. You cannot use this connection as a storage target.

Credentials
User name and password.

For more information on connections supported for the synchronization of information assets to the default catalog, see Information assets view.

SAP HANA        
Credentials
User name and password.
SAP OData    
Supported encryption
(Optional) SSL certificate or SSL certificate file.
Credentials
Select the appropriate authentication method based on the configuration of the data source:
  • API key requires your API key.
  • Basic requires your user name and password.
  • None requires no credentials.

For more information on connections supported for the synchronization of information assets to the default catalog, see Information assets view.

Snowflake   ✓*
Credentials
User name and password.

* See Watson Knowledge Catalog known issues.

For more information on connections supported for the synchronization of information assets to the default catalog, see Information assets view.

Sybase  

Source connections only. You cannot use this connection as a storage target.

Supported encryption
(Optional) SSL certificate or SSL certificate file.
Restriction: This option is only available from the Connections page.
Credentials
User name and password.

For more information on connections supported for the synchronization of information assets to the default catalog, see Information assets view.

Sybase IQ    

Source connections only. You cannot use this connection as a storage target.

Credentials
User name and password.

For more information on connections supported for the synchronization of information assets to the default catalog, see Information assets view.

Tableau      

Source connections only. You cannot use this connection as a storage target.

Credentials
User name and password for the site that you want to connect to.

For more information on connections supported for the synchronization of information assets to the default catalog, see Information assets view.

Teradata   ✓*

Teradata JDBC Driver 15.10 Copyright (C) 2015 - 2017 by Teradata. All rights reserved. IBM provides embedded usage of the Teradata JDBC Driver under license from Teradata solely for use as part of the IBM Watson service offering.

Credentials
User name and password.

* To create a connection to a Teradata data source, an administrator must import a JDBC driver.

For more information on connections supported for the synchronization of information assets to the default catalog, see Information assets view.

Other data sources

An administrator can upload JDBC drivers to enable connections to additional data sources. See Importing JDBC drivers for data sources.

The Data Virtualization service supports connections that are established using third-party JDBC drivers.

See the product roadmap at http://ibm.biz/AnalyticsRoadmaps for information about support for additional data sources.

Data files

In addition to using data from remote data sources or an integrated database, you can use data from files. You can work with data from the following types of files:

Table 1. Supported data sources by service
Data source Cognos Dashboards DataStage Data Virtualization Watson Knowledge Catalog Watson Studio Notes
CSV files
Data Virtualization
To access CSV files on remote data sources, you must install a remote connector on the data source. See Installing connectors on remote data sources (Data Virtualization).
Watson Studio
You either:
Microsoft Excel spreadsheets    
Data Virtualization
To access spreadsheets on remote data sources, you must install a remote connector on the data source. See Installing connectors on remote data sources (Data Virtualization).
Watson Studio
You either:
Sequential File          
TSV files          

Connecting to data sources (by service)

Use the following resources to create connections in your application:

Service Learn more
Cognos Dashboards You can use the local and remote data sets that already exist in your analytics projects.

Alternatively, you can create connections that can be used in an analytics dashboard by selecting Add data source from the analytics dashboard menu.

Restriction: Analytics dashboards support only JDBC-based connections.

You can also add data from files by selecting Add data set from the analytics dashboard menu.

DataStage You can transform data that is in a catalog by searching for the data that you want to use and selecting Transform.
Alternatively:
  • If a connection type is supported for data discovery and data transformation, you can import a discovered connection on the Connections page of the data transformation project.
    Important: The credentials for the connection are not imported. After you import the connection, you must edit it to specify your user name and password.
  • If a connection type is supported only for data transformation, you can create connections from the following locations:
    • The Connections page of the data transformation project
    • The job canvas

To use data from local files, add the file to the job canvas.

For details, see Creating a data transformation job.

Data Virtualization You can create connections that can be used to virtualize data from the following locations:
  • The Connections page
  • The Data Sources page in the Data Virtualization add-on

For details, see Adding data sources (Data Virtualization).

Watson Knowledge Catalog

You can create connections that can be used in the catalog and connections that can be used to curate data.

For connections that can be used in a catalog, you can create connections from the catalog Overview page.

For details, see Adding a connection asset to a catalog (Watson Knowledge Catalog).

For connections that can be used to curate data, you can create connections from the following locations:
  • The Connections page
  • The Organize > Curation > Data discovery page
For details, see:
Watson Studio

Ideally, you should use data that is already in a catalog. Search for the data you want in a catalog and add it to an analytics project.

Alternatively, you can create connections that can be used in analytics projects from the following locations:
  • The Connections page
  • The Assets page of the analytics project

You can also add data from files. To add data from files, go to the Assets page of the analytics project.

For details, see Adding data to an analytics project.