Supported data sources

You can connect to many data sources in Cloud Pak for Data. Some services support connections to data sources that are defined at the platform level, while other services use connections that are specific to the service.

Ways to connect to your data

Use the following list to choose a method to connect to your data for your use case.

Creating connections at the platform level
In general, platform-level connections simplify the process of creating and maintaining connections. You create the connection and then multiple services can refer to the connection. If you update the connection, the changes are automatically picked up by the projects that use the connection.

You can create platform-level connections from the Platform connections page. These connections can be used by various services across the platform. However, the Platform connections page is available only if the Cloud Pak for Data common core services are installed.

For more information, see Connecting to data sources at the platform level.

Consider creating connections at the platform level if the following statements are true:

  • The services support platform-level connections.
  • The same connection needs to be used by multiple services or instances or across multiple projects.
  • You have the appropriate permissions to create platform-level connections.

    You must have the Editor or Admin role on the Platform connections page. For more information, see Managing collaborators on platform connections.

Platform connections are visible to all platform users. However, only users with the credentials for the data source can use the connection.

If you don't see the type of data source that you want to connect to, a Cloud Pak for Data administrator can create a custom JDBC connector for the data source. If you are connecting to only one data source and users do not need a repeatable method to connect to it, you can create a Generic JDBC connection.

Not all services support the same types of connections. If you want to use a connection from the Platform connections catalog, the list of connections is filtered based on the types of connections that the service supports. For example, if you are using a connection to add a data source to a project, only connections that are supported for projects are displayed.
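The filtering described above can be pictured with a short sketch. This is an illustration only, not the product's implementation: the `Connection` class, the `filter_for_service` function, the connection names, and the set of supported types are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Connection:
    """Hypothetical stand-in for a platform connection entry."""
    name: str
    connector_type: str


def filter_for_service(connections, supported_types):
    """Return only the connections whose type the service supports."""
    return [c for c in connections if c.connector_type in supported_types]


# Hypothetical platform connections.
catalog = [
    Connection("sales-db", "IBM Db2"),
    Connection("events", "Apache Kafka"),
    Connection("archive", "Amazon S3"),
]

# Assumed set of connection types that some service supports.
supported = {"IBM Db2", "Amazon S3"}

visible = filter_for_service(catalog, supported)
print([c.name for c in visible])  # prints ['sales-db', 'archive']
```

In this sketch, the Apache Kafka connection still exists in the list of platform connections. It is hidden only because the assumed service does not support that connection type.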

Creating connections at the service level
Create connections at the service level if any of the following statements are true:
  • The service that you are using does not support platform-level connections.
  • You don't have the appropriate permissions to create platform-level connections.
  • You don't want the connection to be included in the Connections catalog for security reasons.

For more information, see Connecting to data sources at the service level.
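The criteria for choosing between platform-level and service-level connections can be summarized as a small decision function. This is a simplified sketch of the guidance on this page, not a product API. The function name, parameter names, and return values are hypothetical.

```python
def suggest_connection_level(
    service_supports_platform: bool,
    can_create_platform: bool,
    keep_out_of_catalog: bool,
    shared_across_services: bool,
) -> str:
    """Suggest where to create a connection, based on the criteria on this page."""
    # Any one of these conditions points to a service-level connection.
    if (
        not service_supports_platform
        or not can_create_platform
        or keep_out_of_catalog
    ):
        return "service"
    # All platform-level criteria hold, including the need to share the connection.
    if shared_across_services:
        return "platform"
    # Nothing forces a choice; this fallback is an assumption of the sketch.
    return "either"


print(suggest_connection_level(True, True, False, True))    # prints platform
print(suggest_connection_level(True, False, False, True))   # prints service
```

The service-level checks come first because any single one of them rules out a platform-level connection, whereas the platform-level criteria must all be true.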

Connectors

The following table lists the data sources that you can connect to from Cloud Pak for Data.

Connector    IBM® Knowledge Catalog, Watson™ Studio    SPSS® Modeler    DataStage®    Watson Query
Amazon RDS for MySQL See note
Amazon RDS for Oracle See note
Amazon RDS for PostgreSQL See note
Amazon Redshift See note
Amazon S3 See note
Apache Cassandra See note  
Apache Cassandra (optimized)      
Apache Derby See note
Apache HBase      
Apache HDFS See note  
Apache Hive See note
Apache Kafka See note    
Box See note  
Cloudera Impala See note
DataStax Enterprise      
Dremio See note    
Dropbox See note  
Elasticsearch See note  
Exasol See note  
File system    
FTP (remote file system transfer) See note  
Generic JDBC See note
Generic S3 See note  
Google BigQuery See note
Google Cloud Pub/Sub      
Google Cloud Storage See note  
Greenplum See note
HDFS via Execution Engine for Hadoop See note    
Hive JDBC    
Hive via Execution Engine for Hadoop See note    
HTTP See note  
IBM Cloud® Data Engine See note  
IBM Cloud Databases for MongoDB See note
IBM Cloud Databases for MySQL See note
IBM Cloud Databases for PostgreSQL See note
IBM Cloud Object Storage See note
IBM Cloud Object Storage (infrastructure) See note    
IBM Cloudant® See note    
IBM Cognos® Analytics See note  
IBM Data Virtualization Manager for z/OS® See note
IBM Db2® See note
IBM Db2 (optimized)      
IBM Db2 Big SQL See note
IBM Db2 for i See note
IBM Db2 for z/OS See note
IBM Db2 on Cloud See note
IBM Db2 Warehouse See note
IBM Informix® See note
IBM Match 360 See note    
IBM MQ      
IBM Netezza® Performance Server See note
IBM Netezza Performance Server (optimized)      
IBM Planning Analytics See note  
IBM Product Master See note      
IBM SPSS Analytic Server See note    
IBM Watson Query See note  
IBM watsonx.data See note    
Impala via Execution Engine for Hadoop See note    
Looker See note  
MariaDB See note
Microsoft Azure Blob Storage See note  
Microsoft Azure Cosmos DB See note  
Microsoft Azure Data Lake Storage See note  
Microsoft Azure File Storage See note  
Microsoft Azure SQL Database See note
Microsoft Power BI (Azure) IBM Knowledge Catalog      
Microsoft Power BI (Local) IBM Knowledge Catalog      
Microsoft SQL Server See note
Microsoft SQL Server Integration Services IBM Knowledge Catalog      
Microsoft SQL Server Reporting Services IBM Knowledge Catalog      
MinIO See note  
MongoDB See note
MySQL (MySQL Community Edition, MySQL Enterprise Edition) See note
OData See note      
ODBC    
Oracle See note
Oracle (optimized)      
Oracle Business Intelligence Enterprise Edition IBM Knowledge Catalog      
Oracle Data Integrator IBM Knowledge Catalog      
PostgreSQL See note
Presto See note
Qlik Sense IBM Knowledge Catalog      
Salesforce.com See note
Salesforce.com (optimized)      
SAP ASE See note
SAP BusinessObjects See note    
SAP Bulk Extract      
SAP Delta Extract      
SAP HANA See note
SAP IDoc      
SAP IQ See note  
SAP OData See note
SingleStoreDB See note  
Snowflake See note
Storage volume See note  
Tableau See note  
Teradata See note
Teradata (optimized)      
Note: In the IBM Knowledge Catalog, Watson Studio column, this table shows the data sources that are supported in catalogs and projects. Some tools for these services support only a subset of those data sources. Follow the link for a specific data source to see the list of tools that support that data source. See also Supported connectors by tool.

Other data sources

An administrator can upload JDBC drivers to enable connections to more data sources. See Importing JDBC drivers for data sources.

The Watson Query service supports connections that are established by using third-party JDBC drivers.

Data files

In addition to using data from remote data sources or integrated databases, you can use data from files. You can work with data from the following types of files.

Type of data file             Supported in
Avro                          DataStage, IBM Knowledge Catalog, SPSS Modeler, Watson Studio
CSV                           DataStage, Decision Optimization, IBM Knowledge Catalog, SPSS Modeler, Watson Query, Watson Studio
JSON                          DataStage, Decision Optimization (JSON tabular form), IBM Knowledge Catalog, Watson Query, Watson Studio
Microsoft Excel spreadsheets  DataStage, IBM Knowledge Catalog, SPSS Modeler, Watson Query, Watson Studio
ORC                           DataStage, Watson Query
Parquet                       DataStage, IBM Knowledge Catalog, Watson Query, Watson Studio
SAS                           SPSS Modeler, Watson Studio (Data Refinery)
SAV                           DataStage, SPSS Modeler
TSV                           DataStage, IBM Knowledge Catalog, Watson Query, Watson Studio (Data Refinery)
XML                           DataStage, Decision Optimization (XML tabular form), SPSS Modeler
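If you need to check file-type support programmatically, for example in a planning script, the table above can be encoded as a simple lookup. The following sketch is only a convenience illustration: the `SUPPORTED_IN` mapping and the `file_types_for` helper are hypothetical names, and the qualifiers from the table (tabular form, Data Refinery) are omitted for brevity.

```python
# File type -> services that support it, transcribed from the table above.
# Qualifiers such as "(JSON tabular form)" and "(Data Refinery)" are dropped.
SUPPORTED_IN = {
    "Avro": ["DataStage", "IBM Knowledge Catalog", "SPSS Modeler", "Watson Studio"],
    "CSV": ["DataStage", "Decision Optimization", "IBM Knowledge Catalog",
            "SPSS Modeler", "Watson Query", "Watson Studio"],
    "JSON": ["DataStage", "Decision Optimization", "IBM Knowledge Catalog",
             "Watson Query", "Watson Studio"],
    "Microsoft Excel": ["DataStage", "IBM Knowledge Catalog", "SPSS Modeler",
                        "Watson Query", "Watson Studio"],
    "ORC": ["DataStage", "Watson Query"],
    "Parquet": ["DataStage", "IBM Knowledge Catalog", "Watson Query", "Watson Studio"],
    "SAS": ["SPSS Modeler", "Watson Studio"],
    "SAV": ["DataStage", "SPSS Modeler"],
    "TSV": ["DataStage", "IBM Knowledge Catalog", "Watson Query", "Watson Studio"],
    "XML": ["DataStage", "Decision Optimization", "SPSS Modeler"],
}


def file_types_for(service: str) -> list[str]:
    """Return the file types that the given service supports, sorted by name."""
    return sorted(t for t, services in SUPPORTED_IN.items() if service in services)


print(file_types_for("Decision Optimization"))  # prints ['CSV', 'JSON', 'XML']
```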

Connecting to data sources (by service)

Use the following resources to create connections in your application.

Cognos Dashboards
You can use CSV files, Microsoft Excel spreadsheets, connected data assets, and Watson Query assets as data sources for a dashboard. You must add all of these data sources to a project before you can use them as data sources.

Add data sources to a dashboard by clicking the Add a source (+) button in the Selected sources pane.

For more information, see Supported data sources for Cognos Dashboards.

Data Refinery

You can cleanse and refine tabular data with a graphical flow editor tool called Data Refinery. To refine data, you must add connections to your data sources and you must understand source file limitations. For more information, see Refining data (Data Refinery) and Supported data sources for Data Refinery.

DataStage
DataStage uses connectors on the DataStage canvas to work with remote data sources. Before you can use a connector in DataStage, you must create a connection asset for the associated data source.

Decision Optimization
You can use CSV, JSON (tabular form), or XML (tabular form) files or connected assets to build and deploy Decision Optimization models.

For more information, see Supported data sources for Decision Optimization.

IBM Knowledge Catalog
You can create connections that can be used in the catalog or in projects and connections that can be used to curate data. In general, you can create connections from the Platform connections page. In addition, you can create connections as follows:

SPSS Modeler
Data sources in the SPSS Modeler service support read-only access, read/write access, and SQL pushback.

The SPSS Modeler service also supports several other file types.

For more information, see Supported data sources for SPSS Modeler.

Synthetic Data Generator
Data sources in the Synthetic Data Generator service support read-only access and read/write access.

The Synthetic Data Generator service also supports several other file types.

For more information, see Supported data sources for Synthetic Data Generator.

Watson Machine Learning Accelerator
You can create connections that can be used in projects from the following locations:
  • The Connections page
  • The Assets page of the project

You can also add data from files. To add data from files, go to the Assets page of the project.

For more information, see Adding data to a project.

See also Supported data sources for Watson Machine Learning Accelerator.

Watson Query
You can create connections that can be used to virtualize data from the following locations:
  • The Platform connections page
  • The Data sources page in the Watson Query service

For more information, see Connecting to data sources in Watson Query.

See also Supported data sources in Watson Query.

Watson Studio

Ideally, use data that is already in a catalog. Search for the data you want in a catalog and add it to a project.

Alternatively, you can create connections that can be used in projects from the following locations:
  • The Connections page
  • The Assets page of the project

You can also add data from files. To add data from files, go to the Assets page of the project.

For more information, see Adding data to a project.

Learn more