Supported data sources

IBM® Cloud Pak for Data lets you connect to your data no matter where it lives.

Ways to connect to your data

The way that you connect to your data depends on several factors, including the services that are installed on Cloud Pak for Data. Some services can use connections that are defined at the platform level, while other services use connections that are specific to the service.

Use the following table to determine which method is appropriate for your use case.

Method

When should I use this method?

Creating connections at the platform level

In general, platform-level connections simplify the process of creating and maintaining connections. You create the connection once and services can refer to the connection. Additionally, if you update the connection, the changes are automatically picked up by the analytics projects that use the connection.

You can create platform-level connections from the Platform connections page. These connections can be used by various services across the platform. However, this page is available only if the Cloud Pak for Data common core services are installed.

You should consider creating connections at the platform level if the following statements are true:

The services support platform-level connections.
The same connection needs to be used by multiple services or instances, or across multiple projects.
You have the appropriate permissions to create platform-level connections. You must have the Editor or Admin role on the Platform connections page. For details, see Managing collaborators on platform connections.

Platform connections are visible to all platform users. However, only users with the credentials for the data source can use the connection.

If the type of data source that you want to connect to is not defined, a Cloud Pak for Data administrator can upload the JDBC driver JAR files to enable you to create a generic JDBC connection to the data source.

Not all services support the same types of connections. If you want to use a connection from the Platform connections catalog, the list of connections is filtered based on the types of connections that the service supports. For example, if you are using a connection to add a data source to an analytics project, only connections that are supported for analytics projects are displayed.
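The filtering described above can be pictured with a minimal sketch. The support matrix below is a hand-copied excerpt of four rows from the tables on this page. The dictionary, the function name `connections_for_service`, and the service labels are illustrative assumptions and are not part of any Cloud Pak for Data API.

```python
# Illustrative excerpt of the support tables on this page (hypothetical
# data structure, not an actual Cloud Pak for Data API).
SUPPORT = {
    "Db2 Warehouse": {
        "Cognos Dashboards", "DataStage", "Data Virtualization",
        "Watson Knowledge Catalog", "Watson Studio",
    },
    "Amazon S3": {"DataStage", "Watson Knowledge Catalog", "Watson Studio"},
    "MongoDB": {"Data Virtualization", "Watson Knowledge Catalog", "Watson Studio"},
    "Apache Kafka": {"DataStage"},
}


def connections_for_service(service, support=SUPPORT):
    """Return the connection types that the given service supports, sorted by name."""
    return sorted(name for name, services in support.items() if service in services)


print(connections_for_service("Watson Studio"))
# ['Amazon S3', 'Db2 Warehouse', 'MongoDB']
```

In this sketch, Apache Kafka is filtered out for Watson Studio because the table lists it for DataStage only.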

Creating connections at the service level

You should create connections at the service level if any of the following statements are true:

The service that you are using does not support platform-level connections
You don't have the appropriate permissions to create platform-level connections
You don't want the connection to be included in the Connections catalog for security reasons
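The three conditions above can be sketched as a small decision helper. This is only an illustration of the guidance in this section: the function and parameter names are hypothetical, and the sketch does not cover the additional platform-level consideration of sharing one connection across multiple services or projects.

```python
def choose_connection_level(service_supports_platform,
                            can_create_platform,
                            hide_from_catalog):
    """Return "service" if any service-level condition holds, else "platform".

    All three parameters are hypothetical booleans that mirror the
    statements listed in this section.
    """
    if not service_supports_platform:
        # The service does not support platform-level connections.
        return "service"
    if not can_create_platform:
        # You lack permission to create platform-level connections.
        return "service"
    if hide_from_catalog:
        # The connection must stay out of the Connections catalog.
        return "service"
    return "platform"


print(choose_connection_level(True, True, False))   # platform
print(choose_connection_level(True, False, False))  # service
```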

Restriction:

For Watson™ Knowledge Catalog, this table shows the data sources that are supported in catalogs. For the data sources supported by automated discovery and quick scan, see Discovering assets (Watson Knowledge Catalog).

For more information on connections supported for the synchronization of information assets to the default catalog, see Information assets view.

IBM data sources

The following table lists the IBM data sources that you can connect to from Cloud Pak for Data.

Connection type	Cognos Dashboards (analytics dashboards)	DataStage®	Data Virtualization	Watson Knowledge Catalog	Watson Studio	Notes
Analytics Engine HDFS				✓	✓	Supported encryption (Optional) SSL certificate. Credentials Username and password. Watson Knowledge Catalog discovery: Not supported.
Classic Federation		✓
Cloud Object Storage (IBM Cloud Storage)		✓		✓	✓	Credentials You must have your API key and resource instance ID. If you plan to use the S3 API, you must also have your access key and your secret key. Watson Knowledge Catalog discovery: Not supported.
Cloud Object Storage (infrastructure)				✓	✓	Credentials You must have your secret key. If you plan to use the S3 API, you must also have your access key. Watson Knowledge Catalog discovery: Not supported.
Cloudant®				✓	✓	Credentials Username and password. When you create your Cloudant service on IBM Cloud, you must choose Use both legacy credentials and IAM for the Available authentication methods. Watson Knowledge Catalog discovery: Not supported.
Cognos Analytics				✓	✓	Source connections only. You cannot use this connection as a storage target. Supported encryption (Optional) SSL certificate. Credentials Username and password, or Cloud Pak for Data credentials. Anonymous access is also supported. Watson Knowledge Catalog discovery: Not supported.
Compose for MySQL				✓	✓	Credentials Username and password. Watson Knowledge Catalog discovery: Not supported.
Data Set		✓
Data Virtualization	✓	✓		✓	✓	You can point to an instance of Data Virtualization on the same instance of Cloud Pak for Data or on a different instance of Cloud Pak for Data. Supported encryption (Optional) SSL certificate. Credentials Username and password, or Cloud Pak for Data credentials. Watson Knowledge Catalog discovery: Not supported.
IBM Data Virtualization Manager for z/OS®			✓	✓	✓	Supported encryption (Optional) SSL certificate. Credentials Username and password. Watson Knowledge Catalog discovery: Supported with restrictions (see Discovering assets).
Databases for PostgreSQL	✓			✓	✓	Credentials Username and password. Watson Knowledge Catalog discovery: Not supported.
Db2®		✓	✓	✓	✓	Supported encryption (Optional) SSL certificate. Credentials Username and password, or Cloud Pak for Data credentials. For Data Virtualization only: This connection has been optimized to take advantage of the native query capabilities in this data source. Watson Knowledge Catalog discovery: Supported (see Discovering assets).
Db2 Big SQL (Big SQL)			✓	✓	✓	Supported encryption (Optional) SSL certificate. Credentials Username and password. For Data Virtualization only: This connection has been optimized to take advantage of the native query capabilities in this data source. Watson Knowledge Catalog discovery: Not supported.
Db2 Event Store (Event Store)			✓	✓	✓	Supported encryption (Optional) SSL certificate. Credentials Username and password. Watson Knowledge Catalog discovery: Not supported.
Db2 for i			✓	✓	✓	Supported encryption (Optional) SSL certificate. Credentials Username and password. Watson Knowledge Catalog discovery: Not supported.
Db2 for z/OS		✓	✓	✓	✓	Supported encryption (Optional) SSL certificate. Credentials Username and password. For Data Virtualization only: This connection has been optimized to take advantage of the native query capabilities in this data source. Watson Knowledge Catalog discovery: Not supported.
Db2 Hosted				✓	✓	Credentials Username and password. Watson Knowledge Catalog discovery: Not supported.
Db2 on Cloud		✓	✓	✓	✓	Credentials Username and password. Watson Knowledge Catalog discovery: Not supported.
Db2 Warehouse	✓	✓	✓	✓	✓	Supported encryption (Optional) SSL certificate. Credentials Username and password. Watson Knowledge Catalog discovery: Supported (see Discovering assets).
Distributed Transaction		✓
DRS		✓
External Source		✓
External Target		✓
HDFS via Execution Engine for Hadoop				✓	✓	Requires the Execution Engine for Apache Hadoop service. See Installing the Execution Engine for Apache Hadoop service. This connection supports connecting to a Hadoop environment that is secured by Kerberos. Supported encryption SSL certificate. Credentials Cloud Pak for Data credentials. Watson Knowledge Catalog discovery: Not supported.
Hierarchical		✓
Hive via Execution Engine for Hadoop				✓	✓	Requires the Execution Engine for Apache Hadoop service. See Installing the Execution Engine for Apache Hadoop service. You also need the HiveJDBC41.jar file, which you can download from the Cloudera website (click GET IT NOW, and then unzip the downloaded file). This connection supports connecting to a Hadoop environment that is secured by Kerberos. Supported encryption SSL certificate. Credentials Cloud Pak for Data credentials. Replaces the Hive JDBC - HDP connection. Watson Knowledge Catalog discovery: Not supported.
Impala via Execution Engine for Hadoop				✓	✓	Requires the Execution Engine for Apache Hadoop service. See Installing the Execution Engine for Apache Hadoop service. You also need the ImpalaJDBC41.jar file, which you can download from the Cloudera website (click GET IT NOW, and then unzip the downloaded file). This connection supports connecting to a Hadoop environment that is secured by Kerberos. Supported encryption (Optional) SSL certificate. Credentials Cloud Pak for Data credentials. Watson Knowledge Catalog discovery: Not supported.
Informix®		✓	✓	✓	✓	Supported encryption (Optional) SSL certificate. Credentials Username and password. Watson Knowledge Catalog discovery: Not supported.
Informix Enterprise		✓
Informix Load		✓
ISD Input		✓
ISD Output		✓
Java Integration		✓
Lookup File Set		✓
Netezza®		✓				Supported encryption (Optional) SSL certificate or SSL certificate file. Credentials Username and password.
Planning Analytics				✓	✓	Supported encryption (Optional) SSL certificate. Credentials Basic CAM Credentials CAM Password Windows Integrated Authentication Token Restriction: You can connect only to an external Planning Analytics TM1 server. You cannot connect to an instance of Planning Analytics that is deployed on Cloud Pak for Data. Watson Knowledge Catalog discovery: Not supported.
Netezza (PureData® System for Analytics)	✓		✓	✓	✓	Credentials Username and password. Watson Knowledge Catalog discovery: Supported with restrictions (see Discovering assets). For Data Virtualization only: This connection has been optimized to take advantage of the native query capabilities in this data source.
SPSS® Analytic Server				✓	✓	Supported encryption (Optional) SSL certificate. Credentials Username and password. Watson Knowledge Catalog discovery: Not supported.
Storage volume				✓	✓	Credentials Cloud Pak for Data credentials. Watson Knowledge Catalog discovery: Not supported.
WebSphere® MQ		✓
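
The credential notes for the two Cloud Object Storage connection types in the table above can be expressed as a simple pre-flight check. This is a minimal sketch: the field names (`api_key`, `resource_instance_id`, `access_key`, `secret_key`) and the function `missing_credentials` are hypothetical labels chosen for illustration, not names used by the product.

```python
# Credentials that the table lists as always required (hypothetical field names).
REQUIRED = {
    "Cloud Object Storage": {"api_key", "resource_instance_id"},
    "Cloud Object Storage (infrastructure)": {"secret_key"},
}

# Additional credentials that the table lists when you plan to use the S3 API.
S3_EXTRA = {
    "Cloud Object Storage": {"access_key", "secret_key"},
    "Cloud Object Storage (infrastructure)": {"access_key"},
}


def missing_credentials(connection_type, provided, use_s3_api=False):
    """Return the sorted names of required credentials that were not provided."""
    needed = set(REQUIRED[connection_type])
    if use_s3_api:
        needed |= S3_EXTRA[connection_type]
    return sorted(needed - set(provided))


print(missing_credentials("Cloud Object Storage", {"api_key"}))
# ['resource_instance_id']
```

An empty list means that every credential named in the table for that connection type is present.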

Third-party data sources

Connection type	Cognos Dashboards	DataStage	Data Virtualization	Watson Knowledge Catalog	Watson Studio	Notes
Amazon RDS for MySQL				✓	✓	Supported encryption (Optional) SSL certificate. Credentials Username and password. Watson Knowledge Catalog discovery: Not supported.
Amazon RDS for PostgreSQL				✓	✓	Supported encryption (Optional) SSL certificate. Credentials Username and password. Watson Knowledge Catalog discovery: Not supported.
Amazon Redshift (Redshift)		✓	✓	✓	✓	Supported encryption (Optional) SSL certificate. Credentials Username and password. For Data Virtualization only: This connection has been optimized to take advantage of the native query capabilities in this data source. Watson Knowledge Catalog discovery: Not supported.
Amazon S3		✓		✓	✓	Credentials You must have your access key and secret key. Watson Knowledge Catalog discovery: Supported with restrictions (see Discovering assets).
Apache Cassandra		✓		✓	✓	Supported encryption (Optional) SSL certificate. Credentials Username and password. Watson Knowledge Catalog discovery: Supported with restrictions (see Discovering assets).
Apache Derby (Derby)			✓	✓	✓	Supported encryption (Optional) SSL certificate. Credentials Username and password. For Data Virtualization only: This connection has been optimized to take advantage of the native query capabilities in this data source. Watson Knowledge Catalog discovery: Not supported.
Apache Hbase		✓
Apache HDFS		✓		✓	✓	Supported encryption (Optional) SSL certificate. Credentials Username and password. Watson Knowledge Catalog discovery: Supported with restrictions (see Discovering assets).
Apache Hive		✓	✓	✓	✓	Source connections only. You cannot use this connection as a storage target. Supported encryption (Optional) SSL certificate. Credentials Username and password. For Data Virtualization only: This connection has been optimized to take advantage of the native query capabilities in this data source. Watson Knowledge Catalog discovery: Supported with restrictions (see Discovering assets).
Apache Kafka		✓
Big Data File Stage (BDFS)		✓
Box				✓	✓	Credentials Private Key, Private Key Password, and Public Key. Watson Knowledge Catalog discovery: Not supported.
Cloudera Impala		✓	✓	✓	✓	Source connections only. You cannot use this connection as a storage target. Supported encryption (Optional) SSL certificate. Credentials Username and password. Watson Knowledge Catalog discovery: Not supported.
Dropbox				✓	✓	Credentials You must have your access token. To obtain the access token, follow the instructions in the Dropbox OAuth guide. Watson Knowledge Catalog discovery: Not supported.
Elasticsearch				✓	✓	Supported encryption (Optional) SSL certificate. Credentials Username and password. Anonymous access also supported. Watson Knowledge Catalog discovery: Not supported.
File system		✓				Important: This type of connection is not recommended for use in IBM Cloud Pak for Data.
FTP enterprise		✓
FTP (remote file system transfer)		✓		✓	✓	You can create a connection to a remote file system to access files that reside on the remote system. For information about supported files, see Data files. Connection mode Use the appropriate connection method based on the configuration of the FTP server: Anonymous, Basic authentication (with a username and password), SSL, or SSH. Watson Knowledge Catalog discovery: Not supported.
Generic JDBC		✓	✓	✓	✓	You must upload the JDBC JAR files when you create the connection. Supported encryption (Optional) SSL certificate. Credentials Username and password.
Generic Service						This connection type is intended for use only with the connections API.
Google BigQuery (Big Query)		✓	✓	✓	✓	Credentials You must enter either the Credentials (The contents of the Google service account key JSON file) or the Credentials file path (The path of the Google service account file).
Google Cloud Storage		✓		✓	✓	Credentials You must enter either the Credentials (The contents of the Google service account key JSON file) or the Credentials file path (The path of the Google service account file). Supported authentication for Data Virtualization Alternatively, you can use an access token. To obtain the access token, follow the instructions in the Google BigQuery documentation. Watson Knowledge Catalog discovery: Not supported.
HDFS via File connector		✓				The File connector uses the WebHDFS API or the HttpFS API to connect to HDFS.
HDFS - CDH						This connection type is deprecated.
HDFS - HDP						This connection type is deprecated.
Hive JDBC		✓	✓			Supported encryption (Optional) SSL certificate. Credentials Username and password.
Hive JDBC - CDH		✓				Supported encryption (Optional) SSL certificate. Credentials Username and password.
Hive JDBC - HDP		✓				Supported encryption (Optional) SSL certificate. Credentials Username and password.
HTTP				✓	✓	Watson Knowledge Catalog discovery: Not supported.
Looker				✓	✓	Source connections only. You cannot use this connection as a storage target. Credentials You must have your client ID and your client secret. Before you configure the connection, set up API3 credentials for your Looker instance. For details, see Looker API Authentication. Watson Knowledge Catalog discovery: Not supported.
MariaDB			✓	✓	✓	Supported encryption (Optional) SSL certificate. Credentials Username and password. For Data Virtualization only: This connection has been optimized to take advantage of the native query capabilities in this data source. Watson Knowledge Catalog discovery: Not supported.
Microsoft Azure (Blob and File)		✓
Microsoft Azure Blob Storage				✓	✓	Credentials Authentication is managed by the Azure portal access keys. Watson Knowledge Catalog discovery: Not supported.
Microsoft Azure Cosmos DB				✓	✓	Credentials Azure Cosmos DB master key. Watson Knowledge Catalog discovery: Not supported.
Microsoft Azure Data Lake Store (Azure Data Lake)		✓		✓	✓	Supported encryption SSL is implicit in the URL prefix `https`. Credentials Authentication is handled by the tenant ID, client (or application) ID, and the client secret. Watson Knowledge Catalog discovery: Supported with restrictions (see Discovering assets).
Microsoft Azure SQL Database (Azure SQL)				✓	✓	Credentials Username and password. Watson Knowledge Catalog discovery: Not supported.
Microsoft SQL Server	✓	✓	✓	✓	✓	Supported encryption (Optional) SSL certificate. Credentials Username and password. For Data Virtualization only: This connection has been optimized to take advantage of the native query capabilities in this data source. Watson Knowledge Catalog discovery: Supported (see Discovering assets).
MinIO				✓	✓	Credentials You must have your access key and your secret key. Watson Knowledge Catalog discovery: Not supported.
MongoDB			✓	✓	✓	Supported encryption (Optional) SSL certificate. Credentials Username and password. Watson Knowledge Catalog discovery: Supported (see Discovering assets).
MySQL (My SQL Community Edition) (My SQL Enterprise Edition)		✓	✓	✓	✓	Supported encryption (Optional) SSL certificate. Credentials Username and password. Watson Knowledge Catalog discovery: Supported with restrictions (see Discovering assets).
ODBC		✓
OData				✓	✓	Supported encryption (Optional) SSL certificate. Credentials Select the appropriate authentication method based on the configuration of the data source: API key requires your API key. Basic requires your username and password. None requires no credentials. Watson Knowledge Catalog discovery: Not supported.
Oracle		✓	✓	✓	✓	Supported encryption (Optional) SSL certificate. Credentials Username and password. For Data Virtualization only: This connection has been optimized to take advantage of the native query capabilities in this data source. Watson Knowledge Catalog discovery: Supported (see Discovering assets).
Pivotal Greenplum (Greenplum)		✓	✓	✓	✓	Supported encryption (Optional) SSL certificate. Credentials Username and password. Watson Knowledge Catalog discovery: Supported with restrictions (see Discovering assets).
PostgreSQL			✓	✓	✓	Supported encryption (Optional) SSL certificate. Credentials Username and password. For Data Virtualization only: This connection has been optimized to take advantage of the native query capabilities in this data source. Watson Knowledge Catalog discovery: Supported with restrictions (see Discovering assets).
Salesforce.com		✓	✓	✓	✓	Source connections only. You cannot use this connection as a storage target. Credentials Username and password. Watson Knowledge Catalog discovery: Not supported.
SAP HANA			✓	✓	✓	Credentials Username and password. Watson Knowledge Catalog discovery: Supported with restrictions (see Discovering assets).
SAP OData		✓	✓	✓	✓	Supported encryption (Optional) SSL certificate. Credentials Select the appropriate authentication method based on the configuration of the data source: API key requires your API key. Basic requires your username and password. None requires no credentials. Watson Knowledge Catalog discovery: Not supported.
Snowflake		✓	✓	✓	✓	Credentials Username and password. Watson Knowledge Catalog discovery: Supported with restrictions (see Discovering assets).
Sybase		✓	✓	✓	✓	Source connections only. You cannot use this connection as a storage target. Supported encryption (Optional) SSL certificate. Credentials Username and password. Watson Knowledge Catalog discovery: Supported with restrictions (see Discovering assets).
Sybase IQ		✓		✓	✓	Source connections only. You cannot use this connection as a storage target. Credentials Username and password. Watson Knowledge Catalog discovery: Not supported.
Tableau				✓	✓	Source connections only. You cannot use this connection as a storage target. Credentials Username and password for the site that you want to connect to. Watson Knowledge Catalog discovery: Not supported.
Teradata		✓	✓	✓	✓	Teradata JDBC Driver 15.10 Copyright (C) 2015 - 2017 by Teradata. All rights reserved. IBM provides embedded usage of the Teradata JDBC Driver under license from Teradata solely for use as part of the IBM Watson service offering. Credentials Username and password. For Data Virtualization only: This connection has been optimized to take advantage of the native query capabilities in this data source. Watson Knowledge Catalog discovery: Supported with restrictions (see Discovering assets).

Other data sources

An administrator can upload JDBC drivers to enable connections to additional data sources. See Importing JDBC drivers for data sources.

The Data Virtualization service supports connections that are established using third-party JDBC drivers.

See the product roadmap at http://ibm.biz/AnalyticsRoadmaps for information about support for additional data sources.

Data files

In addition to using data from remote data sources or integrated databases, you can use data from files. You can work with data from the following types of files:

Table 1. Supported data sources by service
Data source	Cognos Dashboards	DataStage	Data Virtualization	Watson Knowledge Catalog	Watson Studio	Notes
Complex flat files		✓
CSV files	✓	✓	✓	✓	✓	Data Virtualization: To access CSV files on remote data sources, you must install a remote connector on the data source. See Installing connectors on remote data sources (Data Virtualization). Watson Studio: You can either add local files or create a connection to an FTP server.
Microsoft Excel spreadsheets			✓	✓	✓	Data Virtualization: To access spreadsheets on remote data sources, you must install a remote connector on the data source. See Installing connectors on remote data sources (Data Virtualization). Watson Studio: You can either add local files or create a connection to an FTP server.
Sequential File		✓
TSV files			✓	✓	✓
z/OS files		✓

Connecting to data sources (by service)

Use the following resources to create connections in your application:

Service	Learn more
Cognos Dashboards	You can use the local and remote data sets that already exist in your analytics projects. Alternatively, you can create connections that can be used in an analytics dashboard by selecting Add data source from the analytics dashboard menu. Restriction: Analytics dashboards support only JDBC-based connections. You can also add data from files by selecting Add data set from the analytics dashboard menu.
DataStage	You can transform data that is in a catalog by searching for the data that you want to use and selecting Transform. Alternatively: If a connection type is supported for data discovery and data transformation, you can import a discovered connection on the Connections page of the data transformation project. Important: The credentials for the connection are not imported. After you import the connection, you must edit the connection to specify your username and password for the connection. If a connection type is supported only for data transformation, you can create connections from the following locations: the Connections page of the data transformation project, or the job canvas. To use data from local files, add the file to the job canvas. For details, see Creating a data transformation job.
Data Virtualization	You can create connections that can be used to virtualize data from the following locations: the Connections page, or the Data Sources page in the Data Virtualization service. For details, see Adding data sources (Data Virtualization).
Watson Knowledge Catalog	You can create connections that can be used in the catalog and connections that can be used to curate data. For connections that can be used in a catalog, you can create connections from the catalog Overview page. For details, see Adding a connection asset to a catalog (Watson Knowledge Catalog). For connections that can be used to curate data, you can create connections from the following locations: the Platform connections page, or the Governance > Data discovery page when you create a new discovery job. For details, see Running automated discovery and Running a quick scan.
Watson Studio	Ideally, you should use data that is already in a catalog. Search for the data you want in a catalog and add it to an analytics project. Alternatively, you can create connections that can be used in analytics projects from the following locations: the Connections page, or the Assets page of the analytics project. You can also add data from files. To add data from files, go to the Assets page of the analytics project. For details, see Adding data to an analytics project.