Supported data sources

In IBM Cloud Pak® for Data, you can connect to your data no matter where it lives.

Ways to connect to your data

The way that you connect to your data depends on several factors, including the services that are installed on Cloud Pak for Data. Some services can use connections that are defined at the platform level, while other services use connections that are specific to the service.

Use the following information to determine which method is appropriate for your use case.

Creating connections at the platform level
In general, platform-level connections simplify the process of creating and maintaining connections. You create the connection and then multiple services can refer to the connection. If you update the connection, the changes are automatically picked up by the projects that use the connection.
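The sharing behavior described above amounts to reference semantics: each project points at one connection definition instead of holding its own copy. The following minimal Python sketch illustrates the idea. The `Connection` and `Project` classes and the host names are hypothetical and are not part of any Cloud Pak for Data API.

```python
import copy
from dataclasses import dataclass, field


@dataclass
class Connection:
    """Hypothetical stand-in for a platform-level connection definition."""
    name: str
    host: str
    port: int


@dataclass
class Project:
    """Hypothetical project that refers to connections rather than copying them."""
    name: str
    connections: list = field(default_factory=list)


# One platform-level connection, referenced by two projects.
shared = Connection(name="sales-db", host="db-old.example.com", port=50000)
forecasting = Project("forecasting", [shared])
reporting = Project("reporting", [shared])

# For contrast: a project that holds its own private copy of the definition.
isolated = Project("isolated", [copy.deepcopy(shared)])

# Update the single shared definition once.
shared.host = "db-new.example.com"

for project in (forecasting, reporting, isolated):
    print(project.name, project.connections[0].host)
```

Both referencing projects see the new host after a single update, while the project that holds a copy keeps the old value and would have to be updated separately.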

You can create platform-level connections from the Platform connections page. These connections can be used by various services across the platform. However, the Platform connections page is available only if the Cloud Pak for Data common core services are installed.

For more information, see Connecting to data sources at the platform level.

Consider creating connections at the platform level if the following statements are true:

  • The services support platform-level connections.
  • The same connection needs to be used by multiple services or instances or across multiple projects.
  • You have the appropriate permissions to create platform-level connections.

    You must have the Editor or Admin role on the Platform connections page. For more information, see Managing collaborators on platform connections.

Platform connections are visible to all platform users. However, only users with the credentials for the data source can use the connection.

If you don't see the type of data source that you want to connect to, a Cloud Pak for Data administrator can upload the JDBC driver JAR files so that you can create a generic JDBC connection to the data source. For more information, see Importing JDBC drivers for data sources.

Not all services support the same types of connections. If you want to use a connection from the Platform connections catalog, the list of connections is filtered based on the types of connections that the service supports. For example, if you are using a connection to add a data source to a project, only connections that are supported for projects are displayed.
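The filtering can be pictured as a lookup from each context to the set of connection types that it supports. The following Python sketch is illustrative only: the `SUPPORTED_TYPES` mapping, its context names (such as "virtualization"), its type names, and the connection names are hypothetical placeholders, not the actual support matrix, which is given in the Connectors table.

```python
from typing import NamedTuple


class PlatformConnection(NamedTuple):
    name: str
    conn_type: str


# Hypothetical support matrix: context -> connection types it accepts.
# Placeholder values only; they do not reflect real product support.
SUPPORTED_TYPES = {
    "project": {"db2", "postgresql", "cloud-object-storage"},
    "virtualization": {"db2", "postgresql"},
}


def connections_for(context: str, catalog: list[PlatformConnection]) -> list[PlatformConnection]:
    """Return only the catalog connections whose type the context supports."""
    allowed = SUPPORTED_TYPES.get(context, set())
    return [c for c in catalog if c.conn_type in allowed]


catalog = [
    PlatformConnection("sales-db", "db2"),
    PlatformConnection("archive-bucket", "cloud-object-storage"),
    PlatformConnection("events", "kafka"),
]

print([c.name for c in connections_for("project", catalog)])         # ['sales-db', 'archive-bucket']
print([c.name for c in connections_for("virtualization", catalog)])  # ['sales-db']
```

The catalog itself is unchanged; each context sees a filtered view of it, which is why the same connection can appear in one service's list and not in another's.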

Creating connections at the service level
Create connections at the service level if any of the following statements are true:
  • The service that you are using does not support platform-level connections.
  • You don't have the appropriate permissions to create platform-level connections.
  • You don't want the connection to be included in the Platform connections catalog for security reasons.
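The two sets of criteria can be combined into a small decision helper. This Python sketch encodes only the conditions listed on this page. The `Situation` fields, the `recommend` function, and the "either" result for cases that neither list fully covers are illustrative assumptions, and any role name other than Editor or Admin (such as "Viewer" below) is a placeholder.

```python
from dataclasses import dataclass


@dataclass
class Situation:
    service_supports_platform: bool           # does the service accept platform-level connections?
    shared_across_services_or_projects: bool  # is the connection reused by several services, instances, or projects?
    user_role: str                            # role on the Platform connections page
    keep_out_of_catalog: bool                 # security reason to keep it out of the catalog


# Roles that allow creating platform-level connections, per this page.
PLATFORM_ROLES = {"Editor", "Admin"}


def recommend(s: Situation) -> str:
    """Return 'service', 'platform', or 'either' based on the criteria above."""
    has_permission = s.user_role in PLATFORM_ROLES
    # Any one of the service-level conditions is enough.
    if not s.service_supports_platform or not has_permission or s.keep_out_of_catalog:
        return "service"
    # Every platform-level condition holds, including sharing.
    if s.shared_across_services_or_projects:
        return "platform"
    # Neither list fully applies; the choice is left to the user.
    return "either"


print(recommend(Situation(True, True, "Editor", False)))  # platform
print(recommend(Situation(True, True, "Viewer", False)))  # service
print(recommend(Situation(True, False, "Admin", False)))  # either
```

The service-level checks run first because any single one of them rules out a platform-level connection, whereas the platform-level case requires all of its conditions.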

For more information, see Connecting to data sources at the service level.

Connectors

The following table lists the data sources that you can connect to from Cloud Pak for Data.

Connector | Watson™ Knowledge Catalog, Watson Studio | SPSS® Modeler | DataStage® | Watson Query
Amazon RDS for MySQL See note
Amazon RDS for Oracle See note
Amazon RDS for PostgreSQL See note
Amazon Redshift See note
Amazon S3 See note
Apache Cassandra See note  
Apache Cassandra (optimized)      
Apache Derby See note  
Apache HBase      
Apache HDFS See note  
Apache Hive See note
Apache Kafka See note    
Box See note  
Cloudera Impala See note
Dremio See note    
Dropbox See note  
Elasticsearch See note  
Exasol See note  
File system    
FTP (remote file system transfer) See note  
Generic JDBC See note
Generic S3 See note  
Google BigQuery See note
Google Cloud Pub/Sub      
Google Cloud Storage See note  
Greenplum See note
HDFS via Execution Engine for Hadoop See note    
Hive JDBC    
Hive via Execution Engine for Hadoop See note    
HTTP See note  
IBM Cloud® Data Engine See note    
IBM Cloud Databases for MongoDB See note
IBM Cloud Databases for MySQL See note
IBM Cloud Databases for PostgreSQL See note
IBM® Cloud Object Storage See note
IBM Cloud Object Storage (infrastructure) See note    
IBM Cloudant® See note    
IBM Cognos® Analytics See note  
IBM Data Virtualization Manager for z/OS® See note
IBM Db2® See note
IBM Db2 (optimized)      
IBM Db2 Big SQL See note
IBM Db2 for i See note
IBM Db2 for z/OS See note
IBM Db2 on Cloud See note
IBM Db2 Warehouse See note
IBM Informix® See note
IBM Match 360 See note    
IBM MQ      
IBM Netezza® Performance Server See note
IBM Netezza Performance Server (optimized)      
IBM Planning Analytics See note  
IBM Product Master See note      
IBM SPSS Analytic Server See note    
IBM Watson Query See note  
Impala via Execution Engine for Hadoop See note    
Looker See note    
MariaDB See note
Microsoft Azure Blob Storage See note  
Microsoft Azure Cosmos DB See note  
Microsoft Azure Data Lake Storage See note  
Microsoft Azure File Storage See note  
Microsoft Azure SQL Database See note
Microsoft Power BI (Azure) Watson Knowledge Catalog      
Microsoft Power BI (Local) Watson Knowledge Catalog      
Microsoft SQL Server See note
Microsoft SQL Server Integration Services Watson Knowledge Catalog      
Microsoft SQL Server Reporting Services Watson Knowledge Catalog      
MinIO See note  
MongoDB See note
MySQL (MySQL Community Edition, MySQL Enterprise Edition) See note
OData See note      
ODBC    
Oracle See note
Oracle (optimized)      
Oracle Business Intelligence Enterprise Edition Watson Knowledge Catalog      
Oracle Data Integrator Watson Knowledge Catalog      
PostgreSQL See note
Presto See note    
Qlik Sense Watson Knowledge Catalog      
Salesforce.com See note
Salesforce.com (optimized)      
SAP ASE See note
SAP Bulk Extract      
SAP Delta Extract      
SAP HANA See note
SAP IDoc      
SAP IQ See note  
SAP OData See note
SingleStoreDB See note    
Snowflake See note
Storage volume See note  
Tableau See note  
Teradata See note
Teradata (optimized)      
watsonx.data Watson Knowledge Catalog      
Note: The Watson Knowledge Catalog, Watson Studio column shows the data sources that are supported in catalogs and projects. Some tools in these services support only a subset of those data sources. Follow the link for a specific data source to see the list of tools that support it. See also Supported connectors by tool.

Other data sources

An administrator can upload JDBC drivers to enable connections to more data sources. See Importing JDBC drivers for data sources.

The Watson Query service supports connections that are established by using third-party JDBC drivers.

Data files

In addition to using data from remote data sources or integrated databases, you can use data from files. You can work with data from the following types of files.

Type of data file | Supported in
Avro | DataStage, SPSS Modeler, Watson Knowledge Catalog, Watson Studio
CSV | DataStage, SPSS Modeler, Watson Knowledge Catalog, Watson Query, Watson Studio
JSON | DataStage, Watson Knowledge Catalog, Watson Query, Watson Studio
Microsoft Excel spreadsheets | DataStage, SPSS Modeler, Watson Knowledge Catalog, Watson Query, Watson Studio
ORC | Watson Query
Parquet | DataStage, Watson Knowledge Catalog, Watson Query, Watson Studio
SAS | SPSS Modeler, Watson Studio (Data Refinery)
SAV | SPSS Modeler
TSV | DataStage, Watson Knowledge Catalog, Watson Query, Watson Studio (Data Refinery)
XML | DataStage, SPSS Modeler
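To check a file against this table programmatically, the table can be transcribed into a lookup structure. The following Python sketch does that. The `FILE_SUPPORT` and `EXTENSIONS` names and the mapping from file extensions (for example `.xlsx`, `.xls`, and `.sas7bdat`) to file types are assumptions for illustration, and the sketch does not query Cloud Pak for Data.

```python
import os

# Transcribed from the data files table on this page.
FILE_SUPPORT = {
    "avro": {"DataStage", "SPSS Modeler", "Watson Knowledge Catalog", "Watson Studio"},
    "csv": {"DataStage", "SPSS Modeler", "Watson Knowledge Catalog", "Watson Query", "Watson Studio"},
    "json": {"DataStage", "Watson Knowledge Catalog", "Watson Query", "Watson Studio"},
    "excel": {"DataStage", "SPSS Modeler", "Watson Knowledge Catalog", "Watson Query", "Watson Studio"},
    "orc": {"Watson Query"},
    "parquet": {"DataStage", "Watson Knowledge Catalog", "Watson Query", "Watson Studio"},
    "sas": {"SPSS Modeler", "Watson Studio (Data Refinery)"},
    "sav": {"SPSS Modeler"},
    "tsv": {"DataStage", "Watson Knowledge Catalog", "Watson Query", "Watson Studio (Data Refinery)"},
    "xml": {"DataStage", "SPSS Modeler"},
}

# Assumed extension-to-type mapping; real files may use other extensions.
EXTENSIONS = {
    ".avro": "avro", ".csv": "csv", ".json": "json", ".xlsx": "excel", ".xls": "excel",
    ".orc": "orc", ".parquet": "parquet", ".sas7bdat": "sas", ".sav": "sav",
    ".tsv": "tsv", ".xml": "xml",
}


def services_for(filename: str) -> list[str]:
    """Return the services that the table lists for the file's type, sorted by name."""
    ext = os.path.splitext(filename)[1].lower()
    file_type = EXTENSIONS.get(ext)
    if file_type is None:
        return []
    return sorted(FILE_SUPPORT[file_type])


print(services_for("orders.parquet"))  # ['DataStage', 'Watson Knowledge Catalog', 'Watson Query', 'Watson Studio']
print(services_for("survey.SAV"))      # ['SPSS Modeler']
print(services_for("notes.txt"))       # []
```

Extensions are compared in lowercase, so `survey.SAV` and `survey.sav` resolve to the same entry; an unknown extension returns an empty list rather than raising an error.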

Connecting to data sources (by service)

Use the following resources to create connections in your application.

Cognos Dashboards
You can use CSV files, Microsoft Excel spreadsheets, connected data assets, and Watson Query assets as data sources for a dashboard. You must add all of these data sources to a project before you can use them in a dashboard.

Add data sources to a dashboard by clicking the Add a source (+) button in the Selected sources pane.

For more information, see Supported data sources for Cognos Dashboards.

Data Refinery

You can cleanse and refine tabular data with Data Refinery, a graphical flow editor. To refine data, you must add connections to your data sources and be aware of the limitations of the source files. For more information, see Refining data (Data Refinery) and Supported data sources for Data Refinery.

DataStage
DataStage uses connectors on the DataStage canvas to work with remote data sources. Before you can use a data source in DataStage, you must create a connection asset for the associated DataStage connector.

SPSS Modeler
Data sources in the SPSS Modeler service support read-only access, read/write access, and SQL pushback.

The SPSS Modeler service also supports several other file types.

For more information, see Supported data sources for SPSS Modeler.

Watson Knowledge Catalog
You can create connections for use in catalogs or projects, as well as connections for curating data. In general, you can create connections from the Platform connections page. In addition, you can create connections as follows:

Watson Query
You can create connections for virtualizing data in either of the following locations:
  • The Platform connections page
  • The Data sources page in the Watson Query service

For more information, see Adding data sources (Watson Query).

Watson Studio

Ideally, use data that is already in a catalog. Search for the data you want in a catalog and add it to a project.

Alternatively, you can create connections that can be used in projects from the following locations:
  • The Connections page
  • The Assets page of the project

You can also add data from files. To add data from files, go to the Assets page of the project.

For more information, see Adding data to a project.
