Supported data sources
IBM® Cloud Pak for Data lets you connect to your data no matter where it lives.
Ways to connect to your data
The way that you connect to your data depends on several factors, including the services that are installed on Cloud Pak for Data. Some services can use connections that are defined at the platform level, while other services use connections that are specific to the service.
Use the following table to determine which method is appropriate for your use case.
Method | When should I use this method? |
---|---|
Creating connections at the platform level | In general, platform-level connections simplify the process of creating and maintaining connections. You create the connection once and services can refer to the connection. Additionally, if you update the connection, the changes are automatically picked up by the analytics projects that use the connection. You can create platform-level connections from the Platform connections page. These connections can be used by various services across the platform. However, this page is available only if the Cloud Pak for Data common core services are installed. You should consider creating connections at the platform level if the following statements are true:
Platform connections are visible to all platform users. However, only users with the credentials for the data source can use the connection. If the type of data source that you want to connect to is not defined, a Cloud Pak for Data administrator can upload the JDBC driver JAR files to enable you to create a generic JDBC connection to the data source. Not all services support the same types of connections. If you want to use a connection from the Platform connections catalog, the list of connections is filtered based on the types of connections that the service supports. For example, if you are using a connection to add a data source to an analytics project, only connections that are supported for analytics projects are displayed. |
Creating connections at the service level | You should create connections at the service level if any of the following statements are true:
|
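The filtering behavior described for platform connections can be pictured with a minimal sketch. Everything in it is hypothetical: the `Connection` class, the `SUPPORTED_TYPES` map, the service keys, and the sample connections are illustrative names, not part of any Cloud Pak for Data API, and the support values are not copied from the tables in this topic.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Connection:
    """A platform-level connection (hypothetical model for illustration)."""
    name: str
    connection_type: str


# Illustrative support map only. Consult the tables in this topic for the
# connection types that each service actually supports.
SUPPORTED_TYPES = {
    "analytics_project": {"Db2 Warehouse", "Microsoft SQL Server"},
    "data_virtualization": {"Db2 Warehouse"},
}


def connections_for_service(connections, service):
    """Return only the connections whose type the given service supports."""
    supported = SUPPORTED_TYPES.get(service, set())
    return [c for c in connections if c.connection_type in supported]


# Sample platform connections (made-up names).
platform_connections = [
    Connection("sales_dw", "Db2 Warehouse"),
    Connection("crm_db", "Microsoft SQL Server"),
    Connection("events", "Apache Kafka"),
]

visible = connections_for_service(platform_connections, "analytics_project")
print([c.name for c in visible])
```

In this sketch, a service that is missing from the map sees no connections at all, which mirrors the idea that a connection is offered only when the service is known to support its type.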
For Watson™ Knowledge Catalog, this table shows the data sources that are supported in catalogs. For the data sources supported by automated discovery and quick scan, see Discovering assets (Watson Knowledge Catalog).
For more information on connections supported for the synchronization of information assets to the default catalog, see Information assets view.
IBM data sources
The following table lists the IBM data sources that you can connect to from Cloud Pak for Data.
Connection type | Cognos® Analytics | Cognos Dashboards (analytics dashboards) | DataStage® | Data Virtualization | Watson Knowledge Catalog | Watson Studio | Notes |
---|---|---|---|---|---|---|---|
Analytics Engine HDFS | ✓ | ✓ |
Watson Knowledge Catalog discovery: Not supported. |
||||
Classic Federation | ✓ | ||||||
Cloud Object Storage (IBM Cloud Storage)
|
✓ | ✓ | ✓ |
Watson Knowledge Catalog discovery: Not supported. |
|||
Cloud Object Storage (infrastructure) | ✓ | ✓ |
Watson Knowledge Catalog discovery: Not supported. |
||||
Cloudant® | ✓ | ✓ |
Watson Knowledge Catalog discovery: Not supported. |
||||
Cognos Analytics | ✓ | ✓ |
Source connections only. You cannot use this connection as a storage target.
Watson Knowledge Catalog discovery: Not supported. |
||||
Compose for MySQL | ✓ | ✓ |
Watson Knowledge Catalog discovery: Not supported. |
||||
Data Set | ✓ | ||||||
Data Virtualization | ✓ | ✓ | ✓ | ✓ |
You can point to an instance of Data Virtualization on the same instance of Cloud Pak for Data or on a different instance of Cloud Pak for Data.
Watson Knowledge Catalog discovery: Not supported. |
||
IBM Data Virtualization Manager for z/OS® | ✓ | ✓ | ✓ |
Watson Knowledge Catalog discovery: Supported with restrictions (see Discovering assets). |
|||
Databases for PostgreSQL | ✓ | ✓ | ✓ |
Watson Knowledge Catalog discovery: Not supported. |
|||
Db2® | ✓ | ✓ | ✓ | ✓ |
For Data Virtualization only: This connection has been optimized to take advantage of the native query capabilities in this data source. Watson Knowledge Catalog discovery: Supported (see Discovering assets). |
||
Db2 Big SQL (Big SQL)
|
✓ | ✓ | ✓ |
For Data Virtualization only: This connection has been optimized to take advantage of the native query capabilities in this data source. Watson Knowledge Catalog discovery: Not supported. |
|||
Db2 Event Store (Event Store)
|
✓ | ✓ | ✓ |
Watson Knowledge Catalog discovery: Not supported. |
|||
Db2 for i | ✓ | ✓ | ✓ |
Watson Knowledge Catalog discovery: Not supported. |
|||
Db2 for z/OS | ✓ | ✓ | ✓ | ✓ |
For Data Virtualization only: This connection has been optimized to take advantage of the native query capabilities in this data source. Watson Knowledge Catalog discovery: Not supported. |
||
Db2 Hosted | ✓ | ✓ |
Watson Knowledge Catalog discovery: Not supported. |
||||
Db2 on Cloud | ✓ | ✓ | ✓ | ✓ |
Watson Knowledge Catalog discovery: Not supported. |
||
Db2 Warehouse | ✓ | ✓ | ✓ | ✓ | ✓ |
Watson Knowledge Catalog discovery: Supported (see Discovering assets). |
|
Distributed Transaction | ✓ | ||||||
DRS | ✓ | ||||||
External Source | ✓ | ||||||
External Target | ✓ | ||||||
HDFS via Execution Engine for Hadoop | ✓ | ✓ | Requires the Execution Engine for Apache Hadoop service. See Installing the Execution Engine for Apache Hadoop service. This connection supports connecting to a Hadoop environment that is secured by Kerberos.
Watson Knowledge Catalog discovery: Not supported. |
||||
Hierarchical | ✓ | ||||||
Hive via Execution Engine for Hadoop | ✓ | ✓ | Requires the Execution Engine for Apache Hadoop service. See Installing the Execution Engine for Apache Hadoop service. You also need the HiveJDBC41.jar file, which you can download from the Cloudera website (click GET IT NOW, and then unzip the downloaded file). This connection supports connecting to a Hadoop environment that is secured by Kerberos.
Replaces the Hive JDBC - HDP connection. Watson Knowledge Catalog discovery: Not supported. |
||||
Impala via Execution Engine for Hadoop | ✓ | ✓ | Requires the Execution Engine for Apache Hadoop service. See Installing the Execution Engine for Apache Hadoop service. You also need the ImpalaJDBC41.jar file, which you can download from the Cloudera website (click GET IT NOW, and then unzip the downloaded file). This connection supports connecting to a Hadoop environment that is secured by Kerberos.
Watson Knowledge Catalog discovery: Not supported. |
||||
Informix® | ✓ | ✓ | ✓ | ✓ |
Watson Knowledge Catalog discovery: Not supported. |
||
Informix Enterprise | ✓ | ||||||
Informix Load | ✓ | ||||||
ISD Input | ✓ | ||||||
ISD Output | ✓ | ||||||
Java Integration | ✓ | ||||||
Lookup File Set | ✓ | ||||||
Netezza® | ✓ |
|
|||||
Planning Analytics | ✓ | ✓ |
Restriction: You can connect only to an external Planning Analytics TM1 server. You cannot connect to an instance of Planning Analytics that is deployed on Cloud Pak for Data.
Watson Knowledge Catalog discovery: Not supported. |
||||
Netezza (PureData® System for Analytics) | ✓ | ✓ | ✓ | ✓ |
Watson Knowledge Catalog discovery: Supported with restrictions (see Discovering assets). For Data Virtualization only: This connection has been optimized to take advantage of the native query capabilities in this data source. |
||
SPSS® Analytic Server | ✓ | ✓ |
Watson Knowledge Catalog discovery: Not supported. |
||||
Storage volume | ✓ | ✓ |
Watson Knowledge Catalog discovery: Not supported. |
||||
WebSphere® MQ | ✓ |
Third-party data sources
Connection type | Cognos Dashboards | DataStage | Data Virtualization | Watson Knowledge Catalog | Watson Studio | Notes |
---|---|---|---|---|---|---|
Amazon RDS for MySQL | ✓ | ✓ |
Watson Knowledge Catalog discovery: Not supported. |
|||
Amazon RDS for PostgreSQL | ✓ | ✓ |
Watson Knowledge Catalog discovery: Not supported. |
|||
Amazon Redshift (Redshift)
|
✓ | ✓ | ✓ | ✓ |
For Data Virtualization only: This connection has been optimized to take advantage of the native query capabilities in this data source. Watson Knowledge Catalog discovery: Not supported. |
|
Amazon S3 | ✓ | ✓ | ✓ |
Watson Knowledge Catalog discovery: Supported with restrictions (see Discovering assets). |
||
Apache Cassandra | ✓ | ✓ | ✓ |
Watson Knowledge Catalog discovery: Supported with restrictions (see Discovering assets). |
||
Apache Derby (Derby)
|
✓ | ✓ | ✓ |
For Data Virtualization only: This connection has been optimized to take advantage of the native query capabilities in this data source. Watson Knowledge Catalog discovery: Not supported. |
||
Apache HBase | ✓ | |||||
Apache HDFS | ✓ | ✓ | ✓ |
Watson Knowledge Catalog discovery: Supported with restrictions (see Discovering assets). |
||
Apache Hive | ✓ | ✓ | ✓ | ✓ |
Source connections only. You cannot use this connection as a storage target.
For Data Virtualization only: This connection has been optimized to take advantage of the native query capabilities in this data source. Watson Knowledge Catalog discovery: Supported with restrictions (see Discovering assets). |
|
Apache Kafka | ✓ | |||||
Big Data File Stage (BDFS) | ✓ | |||||
Box | ✓ | ✓ |
Watson Knowledge Catalog discovery: Not supported. |
|||
Cloudera Impala | ✓ | ✓ | ✓ | ✓ |
Source connections only. You cannot use this connection as a storage target.
Watson Knowledge Catalog discovery: Not supported. |
|
Dropbox | ✓ | ✓ |
Watson Knowledge Catalog discovery: Not supported. |
|||
Elasticsearch | ✓ | ✓ |
Watson Knowledge Catalog discovery: Not supported. |
|||
File system | ✓ |
Important: This type of connection is not recommended for use in IBM Cloud Pak for Data.
|
||||
FTP enterprise | ✓ | |||||
FTP (remote file system transfer) | ✓ | ✓ | ✓ |
You can create a connection to a remote file system to access files that reside on the remote system. For information about supported files, see Data files.
Watson Knowledge Catalog discovery: Not supported. |
||
Generic JDBC | ✓ | ✓ | ✓ | ✓ |
You must upload the JDBC JAR files when you create the connection.
|
|
Generic Service | This connection type is intended for use only with the connections API. | |||||
Google BigQuery (Big Query)
|
✓ | ✓ | ✓ | ✓ |
|
|
Google Cloud Storage | ✓ | ✓ | ✓ |
Watson Knowledge Catalog discovery: Not supported. |
||
HDFS via File connector | ✓ | The File connector uses the WebHDFS API or the HttpFS API to connect to HDFS. | ||||
HDFS - CDH | This connection type is deprecated. | |||||
HDFS - HDP | This connection type is deprecated. | |||||
Hive JDBC | ✓ | ✓ |
|
|||
Hive JDBC - CDH | ✓ |
|
||||
Hive JDBC - HDP | ✓ |
|
||||
HTTP | ✓ | ✓ |
Watson Knowledge Catalog discovery: Not supported. |
|||
Looker | ✓ | ✓ |
Source connections only. You cannot use this connection as a storage target.
Watson Knowledge Catalog discovery: Not supported. |
|||
MariaDB | ✓ | ✓ | ✓ |
For Data Virtualization only: This connection has been optimized to take advantage of the native query capabilities in this data source. Watson Knowledge Catalog discovery: Not supported. |
||
Microsoft Azure (Blob and File) | ✓ | |||||
Microsoft Azure Blob Storage | ✓ | ✓ |
Watson Knowledge Catalog discovery: Not supported. |
|||
Microsoft Azure Cosmos DB | ✓ | ✓ |
Watson Knowledge Catalog discovery: Not supported. |
|||
Microsoft Azure Data Lake Store (Azure Data Lake)
|
✓ | ✓ | ✓ |
Watson Knowledge Catalog discovery: Supported with restrictions (see Discovering assets). |
||
Microsoft Azure SQL Database (Azure SQL)
|
✓ | ✓ |
Watson Knowledge Catalog discovery: Not supported. |
|||
Microsoft SQL Server | ✓ | ✓ | ✓ | ✓ | ✓ |
For Data Virtualization only: This connection has been optimized to take advantage of the native query capabilities in this data source. Watson Knowledge Catalog discovery: Supported (see Discovering assets). |
MinIO | ✓ | ✓ |
Watson Knowledge Catalog discovery: Not supported. |
|||
MongoDB | ✓ | ✓ | ✓ |
Watson Knowledge Catalog discovery: Supported (see Discovering assets). |
||
MySQL (MySQL Community Edition) (MySQL Enterprise Edition) |
✓ | ✓ | ✓ | ✓ |
Watson Knowledge Catalog discovery: Supported with restrictions (see Discovering assets). |
|
ODBC | ✓ | |||||
OData | ✓ | ✓ |
Watson Knowledge Catalog discovery: Not supported. |
|||
Oracle | ✓ | ✓ | ✓ | ✓ |
For Data Virtualization only: This connection has been optimized to take advantage of the native query capabilities in this data source. Watson Knowledge Catalog discovery: Supported (see Discovering assets). |
|
Pivotal Greenplum (Greenplum)
|
✓ | ✓ | ✓ | ✓ |
Watson Knowledge Catalog discovery: Supported with restrictions (see Discovering assets). |
|
PostgreSQL | ✓ | ✓ | ✓ |
For Data Virtualization only: This connection has been optimized to take advantage of the native query capabilities in this data source. Watson Knowledge Catalog discovery: Supported with restrictions (see Discovering assets). |
||
Salesforce.com | ✓ | ✓ | ✓ | ✓ |
Source connections only. You cannot use this connection as a storage target.
Watson Knowledge Catalog discovery: Not supported. |
|
SAP HANA | ✓ | ✓ | ✓ |
Watson Knowledge Catalog discovery: Supported with restrictions (see Discovering assets). |
||
SAP OData | ✓ | ✓ | ✓ | ✓ |
Watson Knowledge Catalog discovery: Not supported. |
|
Snowflake | ✓ | ✓ | ✓ | ✓ |
Watson Knowledge Catalog discovery: Supported with restrictions (see Discovering assets). |
|
Sybase | ✓ | ✓ | ✓ | ✓ |
Source connections only. You cannot use this connection as a storage target.
Watson Knowledge Catalog discovery: Supported with restrictions (see Discovering assets). |
|
Sybase IQ | ✓ | ✓ | ✓ |
Source connections only. You cannot use this connection as a storage target.
Watson Knowledge Catalog discovery: Not supported. |
||
Tableau | ✓ | ✓ |
Source connections only. You cannot use this connection as a storage target.
Watson Knowledge Catalog discovery: Not supported. |
|||
Teradata | ✓ | ✓ | ✓ | ✓ |
Teradata JDBC Driver 15.10 Copyright (C) 2015 - 2017 by Teradata. All rights reserved. IBM provides embedded usage of the Teradata JDBC Driver under license from Teradata solely for use as part of the IBM Watson service offering.
For Data Virtualization only: This connection has been optimized to take advantage of the native query capabilities in this data source. Watson Knowledge Catalog discovery: Supported with restrictions (see Discovering assets). |
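The HDFS via File connector entry in the preceding table refers to the WebHDFS API and the HttpFS API. As a rough illustration of what a request to that kind of REST interface looks like, the following sketch builds a URL in the layout that Apache Hadoop documents for WebHDFS (`/webhdfs/v1/<path>?op=<OPERATION>`). The host name, port, and path are placeholder assumptions, the `webhdfs_url` helper is hypothetical, and the sketch does not describe how the File connector itself is configured.

```python
from urllib.parse import quote, urlencode


def webhdfs_url(host, port, path, op="LISTSTATUS", scheme="http"):
    """Build a WebHDFS-style REST URL for an HDFS path (hypothetical helper)."""
    # Make sure the HDFS path is absolute, then percent-encode it.
    absolute_path = path if path.startswith("/") else "/" + path
    encoded_path = quote(absolute_path)
    # The REST prefix /webhdfs/v1 is followed by the HDFS path and the operation.
    return f"{scheme}://{host}:{port}/webhdfs/v1{encoded_path}?{urlencode({'op': op})}"


# Placeholder host, port, and path; substitute the values for your cluster.
print(webhdfs_url("namenode.example.com", 9870, "/data/sales"))
```

HttpFS exposes the same REST path layout through a gateway, so under that assumption only the host and port would differ.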
Other data sources
An administrator can upload JDBC drivers to enable connections to additional data sources. See Importing JDBC drivers for data sources.
The Data Virtualization service supports connections that are established using third-party JDBC drivers.
See the product roadmap at http://ibm.biz/AnalyticsRoadmaps for information about support for additional data sources.
Data files
In addition to using data from remote data sources or integrated databases, you can use data from files. You can work with data from the following types of files:
Data source | Cognos Dashboards | DataStage | Data Virtualization | Watson Knowledge Catalog | Watson Studio | Notes |
---|---|---|---|---|---|---|
Complex flat files | ✓ | |||||
CSV files | ✓ | ✓ | ✓ | ✓ | ✓ |
|
Microsoft Excel spreadsheets | ✓ | ✓ | ✓ |
|
||
Sequential File | ✓ | |||||
TSV files | ✓ | ✓ | ✓ | |||
z/OS files | ✓ |
Connecting to data sources (by service)
Use the following resources to create connections in your application:
Service | Learn more |
---|---|
Cognos Dashboards | You can use the local and remote data sets that already exist in your analytics projects.
Alternatively, you can create connections that can be used in an analytics dashboard by selecting Add data source from the analytics dashboard menu. Restriction: Analytics dashboards support only JDBC-based connections.
You can also add data from files by selecting Add data set from the analytics dashboard menu. |
DataStage | You can transform data that is in a catalog by searching for the data that you want to use
and selecting Transform. Alternatively:
To use data from local files, add the file to the job canvas. For details, see Creating a data transformation job. |
Data Virtualization | You can create connections that can be used to virtualize data from the following locations:
For details, see Adding data sources (Data Virtualization). |
Watson Knowledge Catalog |
You can create connections that can be used in the catalog and connections that can be used to curate data. For connections that can be used in a catalog, you can create connections from the catalog Overview page. For details, see Adding a connection asset to a catalog (Watson Knowledge Catalog). For connections that can be used to curate data, you can create connections from the
following locations:
|
Watson Studio | Ideally, you should use data that is already in a catalog. Search for the data you want in a catalog and add it to an analytics project. Alternatively, you can create connections that can be used in analytics projects from the following locations:
You can also add data from files. To add data from files, go to the Assets page of the analytics project. For details, see Adding data to an analytics project. |