Supported data sources

IBM® Cloud Pak for Data lets you connect to your data no matter where it lives.

Ways to connect to your data

The way that you connect to your data depends on several factors, including the services that are installed on Cloud Pak for Data. Some services can use connections that are defined at the platform level, while other services use connections that are specific to the service.

Use the following table to determine which method is appropriate for your use case.

Method When should I use this method?
Creating connections at the platform level
In general, platform-level connections simplify the process of creating and maintaining connections. You create the connection once and services can refer to the connection. Additionally, if you update the connection, the changes are automatically picked up by the analytics projects that use the connection.

You can create platform-level connections from the Platform connections page. These connections can be used by various services across the platform. However, this page is available only if the Cloud Pak for Data common core services are installed.

You should consider creating connections at the platform level if the following statements are true:

  • The services support platform-level connections.
  • The same connection needs to be used by multiple services or instances, or across multiple projects.
  • You have the appropriate permissions to create platform-level connections.

    You must have the Editor or Admin role on the Platform connections page. For details, see Managing collaborators on platform connections.

Platform connections are visible to all platform users. However, only users with the credentials for the data source can use the connection.

If the type of data source that you want to connect to is not defined, a Cloud Pak for Data administrator can upload the JDBC driver JAR files to enable you to create a generic JDBC connection to the data source.

Not all services support the same types of connections. If you want to use a connection from the Platform connections catalog, the list of connections is filtered based on the types of connections that the service supports. For example, if you are using a connection to add a data source to an analytics project, only connections that are supported for analytics projects are displayed.

Creating connections at the service level
You should create connections at the service level if any of the following statements are true:
  • The service that you are using does not support platform-level connections
  • You don't have the appropriate permissions to create platform-level connections
  • You don't want the connection to be included in the Connections catalog for security reasons
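
The criteria in the table above can be read as a simple decision rule. The following Python sketch is illustrative only: the `ConnectionContext` class, its field names, and the `choose_connection_level` function are hypothetical and are not part of any Cloud Pak for Data API. It assumes that any one of the service-level conditions is enough to rule out a platform-level connection.

```python
from dataclasses import dataclass


@dataclass
class ConnectionContext:
    """Hypothetical summary of your situation; field names are illustrative."""
    service_supports_platform: bool        # the service supports platform-level connections
    can_create_platform_connections: bool  # you have the Editor or Admin role
    keep_out_of_catalog: bool = False      # you want to keep the connection out of the catalog


def choose_connection_level(ctx: ConnectionContext) -> str:
    """Return "service" if any service-level condition holds, else "platform"."""
    if not ctx.service_supports_platform:
        return "service"
    if not ctx.can_create_platform_connections:
        return "service"
    if ctx.keep_out_of_catalog:
        return "service"
    return "platform"


if __name__ == "__main__":
    print(choose_connection_level(ConnectionContext(True, True)))
```

The sketch does not model the remaining platform-level criterion (whether the connection is shared by multiple services or projects), because the table presents it as a reason to prefer platform-level connections rather than as a hard requirement.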
Restriction:

For Watson™ Knowledge Catalog, this table shows the data sources that are supported in catalogs. For the data sources supported by automated discovery and quick scan, see Discovering assets (Watson Knowledge Catalog).

For more information on connections supported for the synchronization of information assets to the default catalog, see Information assets view.

IBM data sources

The following table lists the IBM data sources that you can connect to from Cloud Pak for Data.

Connection type Cognos® Analytics Cognos Dashboards (analytics dashboards) DataStage® Data Virtualization Watson Knowledge Catalog Watson Studio Notes
Analytics Engine HDFS        
Supported encryption
(Optional) SSL certificate.
Credentials
Username and password.

Watson Knowledge Catalog discovery: Not supported.

Classic Federation            
Cloud Object Storage
(IBM Cloud Storage)
     
Credentials
You must have your API key and resource instance ID.

If you plan to use the S3 API, you must also have your access key and your secret key.

Watson Knowledge Catalog discovery: Not supported.

Cloud Object Storage (infrastructure)        
Credentials
You must have your secret key.

If you plan to use the S3 API, you must also have your access key.

Watson Knowledge Catalog discovery: Not supported.

Cloudant®        
Credentials
Username and password.
When you create your Cloudant service on IBM Cloud, you must choose Use both legacy credentials and IAM for the Available authentication methods.

Watson Knowledge Catalog discovery: Not supported.

Cognos Analytics        

Source connections only. You cannot use this connection as a storage target.

Supported encryption
(Optional) SSL certificate.
Credentials
Username and password.
or
Cloud Pak for Data credentials
Anonymous access also supported.

Watson Knowledge Catalog discovery: Not supported.

Compose for MySQL        
Credentials
Username and password.

Watson Knowledge Catalog discovery: Not supported.

Data Set            
Data Virtualization    

You can point to an instance of Data Virtualization on the same instance of Cloud Pak for Data or on a different instance of Cloud Pak for Data.

Supported encryption
(Optional) SSL certificate.
Credentials
Username and password.
or
Cloud Pak for Data credentials

Watson Knowledge Catalog discovery: Not supported.

IBM Data Virtualization Manager for z/OS®      
Supported encryption
(Optional) SSL certificate.
Credentials
Username and password.

Watson Knowledge Catalog discovery: Supported with restrictions (see Discovering assets).

Databases for PostgreSQL      
Credentials
Username and password.

Watson Knowledge Catalog discovery: Not supported.

Db2®    
Supported encryption
(Optional) SSL certificate.
Credentials
Username and password.
or
Cloud Pak for Data credentials

For Data Virtualization only: This connection has been optimized to take advantage of the native query capabilities in this data source.

Watson Knowledge Catalog discovery: Supported (see Discovering assets).

Db2 Big SQL
(Big SQL)
     
Supported encryption
(Optional) SSL certificate.
Credentials
Username and password.

For Data Virtualization only: This connection has been optimized to take advantage of the native query capabilities in this data source.

Watson Knowledge Catalog discovery: Not supported.

Db2 Event Store
(Event Store)
     
Supported encryption
(Optional) SSL certificate.
Credentials
Username and password.

Watson Knowledge Catalog discovery: Not supported.

Db2 for i      
Supported encryption
(Optional) SSL certificate.
Credentials
Username and password.

Watson Knowledge Catalog discovery: Not supported.

Db2 for z/OS    
Supported encryption
(Optional) SSL certificate.
Credentials
Username and password.

For Data Virtualization only: This connection has been optimized to take advantage of the native query capabilities in this data source.

Watson Knowledge Catalog discovery: Not supported.

Db2 Hosted        
Credentials
Username and password.

Watson Knowledge Catalog discovery: Not supported.

Db2 on Cloud    
Credentials
Username and password.

Watson Knowledge Catalog discovery: Not supported.

Db2 Warehouse  
Supported encryption
(Optional) SSL certificate.
Credentials
Username and password.

Watson Knowledge Catalog discovery: Supported (see Discovering assets).

Distributed Transaction            
DRS            
External Source
Execution Engine for Hadoop
External Target            
HDFS via Execution Engine for Hadoop        

Requires the Execution Engine for Apache Hadoop service. See Installing the Execution Engine for Apache Hadoop service.

This connection supports connecting to a Hadoop environment that is secured by Kerberos.
Supported encryption
SSL certificate.
Credentials
Cloud Pak for Data credentials

Watson Knowledge Catalog discovery: Not supported.

Hierarchical            
Hive via Execution Engine for Hadoop        

Requires the Execution Engine for Apache Hadoop service. See Installing the Execution Engine for Apache Hadoop service.

You also need the HiveJDBC41.jar file, which you can download from the Cloudera website. (Click GET IT NOW, and then unzip the downloaded file.)

This connection supports connecting to a Hadoop environment that is secured by Kerberos.
Supported encryption
SSL certificate.
Credentials
Cloud Pak for Data credentials

Replaces the Hive JDBC - HDP connection.

Watson Knowledge Catalog discovery: Not supported.

Impala via Execution Engine for Hadoop        

Requires the Execution Engine for Apache Hadoop service. See Installing the Execution Engine for Apache Hadoop service.

You also need the ImpalaJDBC41.jar file, which you can download from the Cloudera website. (Click GET IT NOW, and then unzip the downloaded file.)

This connection supports connecting to a Hadoop environment that is secured by Kerberos.
Supported encryption
(Optional) SSL certificate.
Credentials
Cloud Pak for Data credentials

Watson Knowledge Catalog discovery: Not supported.

Informix®    
Supported encryption
(Optional) SSL certificate.
Credentials
Username and password.

Watson Knowledge Catalog discovery: Not supported.

Informix Enterprise            
Informix Load            
ISD Input            
ISD Output            
Java Integration            
Lookup File Set            
Netezza®          
Supported encryption
(Optional) SSL certificate or SSL certificate file.
Credentials
Username and password.
Planning Analytics        
Supported encryption
(Optional) SSL certificate.
Credentials
  • Basic
  • CAM Credentials
  • CAM Password
  • Windows Integrated Authentication Token
Restriction: You can connect only to an external Planning Analytics TM1 server. You cannot connect to an instance of Planning Analytics that is deployed on Cloud Pak for Data.

Watson Knowledge Catalog discovery: Not supported.

Netezza (PureData® System for Analytics)    
Credentials
Username and password.

Watson Knowledge Catalog discovery: Supported with restrictions (see Discovering assets).

For Data Virtualization only: This connection has been optimized to take advantage of the native query capabilities in this data source.

SPSS® Analytic Server        
Supported encryption
(Optional) SSL certificate.
Credentials
Username and password.

Watson Knowledge Catalog discovery: Not supported.

Storage volume        
Credentials
Cloud Pak for Data credentials

Watson Knowledge Catalog discovery: Not supported.

WebSphere® MQ            
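
As an illustration of the credential notes in the table above, the following Python sketch checks which values are still missing for the Cloud Object Storage (IBM Cloud Storage) entry before you create the connection. The function name and the dictionary keys (`api_key`, `resource_instance_id`, `access_key`, `secret_key`) are hypothetical labels chosen for this example; they are not field names from the product.

```python
def missing_cos_credentials(credentials: dict, use_s3_api: bool = False) -> list:
    """Return the names of required credential values that are absent or empty.

    Per the table: an API key and a resource instance ID are always needed;
    the access key and secret key are needed only when the S3 API is used.
    """
    required = ["api_key", "resource_instance_id"]
    if use_s3_api:
        required += ["access_key", "secret_key"]
    return [name for name in required if not credentials.get(name)]


if __name__ == "__main__":
    creds = {"api_key": "example-key", "resource_instance_id": "example-id"}
    print(missing_cos_credentials(creds, use_s3_api=True))
```

A similar check could be written for any other entry by swapping in the credentials that its Notes column lists.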

Third-party data sources

Connection type Cognos Dashboards DataStage Data Virtualization Watson Knowledge Catalog Watson Studio Notes
Amazon RDS for MySQL      
Supported encryption
(Optional) SSL certificate.
Credentials
Username and password.

Watson Knowledge Catalog discovery: Not supported.

Amazon RDS for PostgreSQL      
Supported encryption
(Optional) SSL certificate.
Credentials
Username and password.

Watson Knowledge Catalog discovery: Not supported.

Amazon Redshift
(Redshift)
 
Supported encryption
(Optional) SSL certificate.
Credentials
Username and password.

For Data Virtualization only: This connection has been optimized to take advantage of the native query capabilities in this data source.

Watson Knowledge Catalog discovery: Not supported.

Amazon S3    
Credentials
You must have your access key and secret key.

Watson Knowledge Catalog discovery: Supported with restrictions (see Discovering assets).

Apache Cassandra    
Supported encryption
(Optional) SSL certificate.
Credentials
Username and password.

Watson Knowledge Catalog discovery: Supported with restrictions (see Discovering assets).

Apache Derby
(Derby)
   
Supported encryption
(Optional) SSL certificate.
Credentials
Username and password.

For Data Virtualization only: This connection has been optimized to take advantage of the native query capabilities in this data source.

Watson Knowledge Catalog discovery: Not supported.

Apache Hbase          
Apache HDFS    
Supported encryption
(Optional) SSL certificate.
Credentials
Username and password.

Watson Knowledge Catalog discovery: Supported with restrictions (see Discovering assets).

Apache Hive  

Source connections only. You cannot use this connection as a storage target.

Supported encryption
(Optional) SSL certificate.
Credentials
Username and password.

For Data Virtualization only: This connection has been optimized to take advantage of the native query capabilities in this data source.

Watson Knowledge Catalog discovery: Supported with restrictions (see Discovering assets).

Apache Kafka          
Big Data File Stage (BDFS)          
Box      
Credentials
  • Private Key
  • Private Key Password
  • Public Key

Watson Knowledge Catalog discovery: Not supported.

Cloudera Impala  

Source connections only. You cannot use this connection as a storage target.

Supported encryption
(Optional) SSL certificate.
Credentials
Username and password.

Watson Knowledge Catalog discovery: Not supported.

Dropbox      
Credentials
You must have your access token.

To obtain the access token, follow the instructions in the Dropbox OAuth guide.

Watson Knowledge Catalog discovery: Not supported.

Elasticsearch      
Supported encryption
(Optional) SSL certificate.
Credentials
Username and password.
Anonymous access also supported.

Watson Knowledge Catalog discovery: Not supported.

File system        
Important: This type of connection is not recommended for use in IBM Cloud Pak for Data.
FTP enterprise          
FTP (remote file system transfer)    

You can create a connection to a remote file system to access files that reside on the remote system. For information about supported files, see Data files.

Connection mode
Use the appropriate connection method based on the configuration of the FTP server:
  • Anonymous
  • Basic authentication (with a username and password)
  • SSL
  • SSH

Watson Knowledge Catalog discovery: Not supported.

Generic JDBC  

You must upload the JDBC JAR files when you create the connection.

Supported encryption
(Optional) SSL certificate.
Credentials
Username and password.
Generic Service           This connection type is intended for use only with the connections API.
Google BigQuery
(Big Query)
 
Credentials
You must enter either the Credentials (the contents of the Google service account key JSON file) or the Credentials file path (the path of the Google service account file).
Google Cloud Storage
Credentials
You must enter either the Credentials (the contents of the Google service account key JSON file) or the Credentials file path (the path of the Google service account file).
Supported authentication for Data Virtualization
Alternatively, you can use an access token.

To obtain the access token, follow the instructions in the Google BigQuery documentation.

Watson Knowledge Catalog discovery: Not supported.

HDFS via File connector         The File connector uses the WebHDFS API or the HttpFS API to connect to HDFS.
HDFS - CDH           This connection type is deprecated.
HDFS - HDP           This connection type is deprecated.
Hive JDBC      
Supported encryption
(Optional) SSL certificate.
Credentials
Username and password.
Hive JDBC - CDH        
Supported encryption
(Optional) SSL certificate.
Credentials
Username and password.
Hive JDBC - HDP        
Supported encryption
(Optional) SSL certificate.
Credentials
Username and password.
HTTP      

Watson Knowledge Catalog discovery: Not supported.

Looker      

Source connections only. You cannot use this connection as a storage target.

Credentials
You must have your client ID and your client secret.

Before you configure the connection, set up API3 credentials for your Looker instance. For details, see Looker API Authentication.

Watson Knowledge Catalog discovery: Not supported.

MariaDB    
Supported encryption
(Optional) SSL certificate.
Credentials
Username and password.

For Data Virtualization only: This connection has been optimized to take advantage of the native query capabilities in this data source.

Watson Knowledge Catalog discovery: Not supported.

Microsoft Azure (Blob and File)          
Microsoft Azure Blob Storage      
Credentials
Authentication is managed by the Azure portal access keys.

Watson Knowledge Catalog discovery: Not supported.

Microsoft Azure Cosmos DB      
Credentials
Azure Cosmos DB master key

Watson Knowledge Catalog discovery: Not supported.

Microsoft Azure Data Lake Store
(Azure Data Lake)
   
Supported encryption
SSL is implicit in the URL prefix https.
Credentials
Authentication is handled by the tenant ID, client (or application) ID, and the client secret.

Watson Knowledge Catalog discovery: Supported with restrictions (see Discovering assets).

Microsoft Azure SQL Database
(Azure SQL)
     
Credentials
Username and password.

Watson Knowledge Catalog discovery: Not supported.

Microsoft SQL Server
Supported encryption
(Optional) SSL certificate.
Credentials
Username and password.

For Data Virtualization only: This connection has been optimized to take advantage of the native query capabilities in this data source.

Watson Knowledge Catalog discovery: Supported (see Discovering assets).

MinIO      
Credentials
You must have your access key and your secret key.

Watson Knowledge Catalog discovery: Not supported.

MongoDB    
Supported encryption
(Optional) SSL certificate.
Credentials
Username and password.

Watson Knowledge Catalog discovery: Supported (see Discovering assets).

MySQL
(My SQL Community Edition)
(My SQL Enterprise Edition)
 
Supported encryption
(Optional) SSL certificate.
Credentials
Username and password.

Watson Knowledge Catalog discovery: Supported with restrictions (see Discovering assets).

ODBC          
OData      
Supported encryption
(Optional) SSL certificate.
Credentials
Select the appropriate authentication method based on the configuration of the data source:
  • API key requires your API key.
  • Basic requires your username and password.
  • None requires no credentials.

Watson Knowledge Catalog discovery: Not supported.

Oracle  
Supported encryption
(Optional) SSL certificate.
Credentials
Username and password.

For Data Virtualization only: This connection has been optimized to take advantage of the native query capabilities in this data source.

Watson Knowledge Catalog discovery: Supported (see Discovering assets).

Pivotal Greenplum
(Greenplum)
 
Supported encryption
(Optional) SSL certificate.
Credentials
Username and password.

Watson Knowledge Catalog discovery: Supported with restrictions (see Discovering assets).

PostgreSQL    
Supported encryption
(Optional) SSL certificate.
Credentials
Username and password.

For Data Virtualization only: This connection has been optimized to take advantage of the native query capabilities in this data source.

Watson Knowledge Catalog discovery: Supported with restrictions (see Discovering assets).

Salesforce.com  

Source connections only. You cannot use this connection as a storage target.

Credentials
Username and password.

Watson Knowledge Catalog discovery: Not supported.

SAP HANA    
Credentials
Username and password.

Watson Knowledge Catalog discovery: Supported with restrictions (see Discovering assets).

SAP OData  
Supported encryption
(Optional) SSL certificate.
Credentials
Select the appropriate authentication method based on the configuration of the data source:
  • API key requires your API key.
  • Basic requires your username and password.
  • None requires no credentials.

Watson Knowledge Catalog discovery: Not supported.

Snowflake  
Credentials
Username and password.

Watson Knowledge Catalog discovery: Supported with restrictions (see Discovering assets).

Sybase  

Source connections only. You cannot use this connection as a storage target.

Supported encryption
(Optional) SSL certificate.
Credentials
Username and password.

Watson Knowledge Catalog discovery: Supported with restrictions (see Discovering assets).

Sybase IQ    

Source connections only. You cannot use this connection as a storage target.

Credentials
Username and password.

Watson Knowledge Catalog discovery: Not supported.

Tableau      

Source connections only. You cannot use this connection as a storage target.

Credentials
Username and password for the site that you want to connect to.

Watson Knowledge Catalog discovery: Not supported.

Teradata  

Teradata JDBC Driver 15.10 Copyright (C) 2015 - 2017 by Teradata. All rights reserved. IBM provides embedded usage of the Teradata JDBC Driver under license from Teradata solely for use as part of the IBM Watson service offering.

Credentials
Username and password.

For Data Virtualization only: This connection has been optimized to take advantage of the native query capabilities in this data source.

Watson Knowledge Catalog discovery: Supported with restrictions (see Discovering assets).
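
The connection modes listed for the FTP (remote file system transfer) entry in the table above map onto different client types. The following Python sketch uses the standard `ftplib` module to show one way to pick a client for each mode outside of Cloud Pak for Data, for example to test that a server is reachable before you define the connection. The `make_ftp_client` function and the mode strings are hypothetical names for this example. The sketch assumes that SSL means FTP over TLS and that SSH means SFTP, which `ftplib` does not implement.

```python
import ftplib


def make_ftp_client(mode: str) -> ftplib.FTP:
    """Return an unconnected ftplib client that suits the given connection mode."""
    if mode in ("anonymous", "basic"):
        # Plain FTP; the two modes differ only in how login() is called later.
        return ftplib.FTP()
    if mode == "ssl":
        # FTP over TLS.
        return ftplib.FTP_TLS()
    if mode == "ssh":
        # SFTP is a different protocol and needs a third-party library.
        raise NotImplementedError("SSH (SFTP) is not supported by ftplib")
    raise ValueError(f"unknown connection mode: {mode!r}")


if __name__ == "__main__":
    client = make_ftp_client("ssl")
    print(type(client).__name__)
```

After calling `connect()` on the returned client, `login()` with no arguments performs an anonymous login, and `login(user, password)` performs basic authentication. For an `FTP_TLS` client, calling `prot_p()` after login also protects the data channel.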

Other data sources

An administrator can upload JDBC drivers to enable connections to additional data sources. See Importing JDBC drivers for data sources.

The Data Virtualization service supports connections that are established using third-party JDBC drivers.

See the product roadmap at http://ibm.biz/AnalyticsRoadmaps for information about support for additional data sources.

Data files

In addition to using data from remote data sources or integrated databases, you can use data from files. You can work with data from the following types of files:

Table 1. Supported data sources by service
Data source Cognos Dashboards DataStage Data Virtualization Watson Knowledge Catalog Watson Studio Notes
Complex flat files          
CSV files
Data Virtualization
To access CSV files on remote data sources, you must install a remote connector on the data source. See Installing connectors on remote data sources (Data Virtualization)
Watson Studio
You either:
Microsoft Excel spreadsheets    
Data Virtualization
To access spreadsheets on remote data sources, you must install a remote connector on the data source. See Installing connectors on remote data sources (Data Virtualization)
Watson Studio
You either:
Sequential File          
TSV files      
z/OS files          

Connecting to data sources (by service)

Use the following resources to create connections in your application:

Service Learn more
Cognos Dashboards You can use the local and remote data sets that already exist in your analytics projects.

Alternatively, you can create connections that can be used in an analytics dashboard by selecting Add data source from the analytics dashboard menu.

Restriction: Analytics dashboards support only JDBC-based connections.

You can also add data from files by selecting Add data set from the analytics dashboard menu.

DataStage You can transform data that is in a catalog by searching for the data that you want to use and selecting Transform.
Alternatively:
  • If a connection type is supported for data discovery and data transformation, you can import a discovered connection on the Connections page of the data transformation project.
    Important: The credentials for the connection are not imported. After you import the connection, you must edit the connection to specify your username and password for the connection.
  • If a connection type is supported only for data transformation, you can create connections from the following locations:
    • The Connections page of the data transformation project
    • The job canvas

To use data from local files, add the file to the job canvas.

For details, see Creating a data transformation job.

Data Virtualization You can create connections that can be used to virtualize data from the following locations:
  • The Connections page
  • The Data Sources page in the Data Virtualization service.

For details, see Adding data sources (Data Virtualization).

Watson Knowledge Catalog

You can create connections that can be used in the catalog and connections that can be used to curate data.

For connections that can be used in a catalog, you can create connections from the catalog Overview page.

For details, see Adding a connection asset to a catalog (Watson Knowledge Catalog).

For connections that can be used to curate data, you can create connections from the following locations:
  • The Platform connections page
  • The Governance > Data discovery page when you create a new discovery job
For details, see:
Watson Studio

Ideally, you should use data that is already in a catalog. Search for the data you want in a catalog and add it to an analytics project.

Alternatively, you can create connections that can be used in analytics projects from the following locations:
  • The Connections page
  • The Assets page of the analytics project

You can also add data from files. To add data from files, go to the Assets page of the analytics project.

For details, see Adding data to an analytics project.
