Supported data sources
In IBM® Cloud Pak for Data, you can connect to your data no matter where it lives.
Ways to connect to your data
The way that you connect to your data depends on several factors, including the services that are installed on Cloud Pak for Data. Some services can use connections that are defined at the platform level, while other services use connections that are specific to the service.
Use the following list to determine which method is appropriate for your use case.
- Creating connections at the platform level
- In general, platform-level connections simplify the process of creating and maintaining
connections. You create the connection and then multiple services can refer to the connection. If
you update the connection, the changes are automatically picked up by the analytics projects that
use the connection. Connections used in automated discovery might not be updated automatically. For
more information, see Supported connections for automated discovery.
You can create platform-level connections from the Platform connections page. These connections can be used by various services across the platform. However, the Platform connections page is available only if the Cloud Pak for Data common core services are installed.
For more information, see Connecting to data sources at the platform level.
Consider creating connections at the platform level if the following statements are true:
- The services support platform-level connections.
- The same connection needs to be used by multiple services or instances or across multiple projects.
- You have the appropriate permissions to create platform-level connections.
You must have the Editor or Admin role on the Platform connections page. For more information, see Managing collaborators on platform connections.
Platform connections are visible to all platform users. However, only users with the credentials for the data source can use the connection.
If you don't see the type of data source that you want to connect to, a Cloud Pak for Data administrator can upload the JDBC driver JAR files so that you can create a generic JDBC connection to the data source. For more information, see Importing JDBC drivers for data sources.
Not all services support the same types of connections. If you want to use a connection from the Platform connections catalog, the list of connections is filtered based on the types of connections that the service supports. For example, if you are using a connection to add a data source to an analytics project, only connections that are supported for analytics projects are displayed.
- Creating connections at the service level
- Create connections at the service level if any of the following statements are true:
- The service that you are using does not support platform-level connections.
- You don't have the appropriate permissions to create platform-level connections.
- You don't want the connection to be included in the Connections catalog for security reasons.
For more information, see Connecting to data sources at the service level.
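The two checklists above can be read as a small decision procedure. The following Python sketch is only an illustration of that reading. The class name, field names, and the `"either"` result are hypothetical and are not part of any Cloud Pak for Data API.

```python
from dataclasses import dataclass


@dataclass
class ConnectionContext:
    """Hypothetical inputs that mirror the statements in the lists above."""
    service_supports_platform_connections: bool
    can_create_platform_connections: bool  # Editor or Admin role on the Platform connections page
    shared_across_services_or_projects: bool
    must_stay_out_of_catalog: bool  # security requirement


def choose_connection_level(ctx: ConnectionContext) -> str:
    """Return "service", "platform", or "either" based on the guidance above."""
    # Any one of these statements points to a service-level connection.
    if (not ctx.service_supports_platform_connections
            or not ctx.can_create_platform_connections
            or ctx.must_stay_out_of_catalog):
        return "service"
    # All platform-level statements hold, including the need to share.
    if ctx.shared_across_services_or_projects:
        return "platform"
    # Assumption: the guidance does not prescribe a level for this case.
    return "either"
```

For example, a context in which the service supports platform-level connections, you have the Editor or Admin role, and the connection is shared across projects yields `"platform"`.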
Connection types
The following table lists the data sources that you can connect to from Cloud Pak for Data.
Connection type | Watson Knowledge Catalog, Watson Studio | SPSS® Modeler | DataStage® | Data Virtualization |
---|---|---|---|---|
Amazon RDS for MySQL | ✓ | ✓ | ✓ | ✓ |
Amazon RDS for Oracle | ✓ | ✓ | ✓ | |
Amazon RDS for PostgreSQL | ✓ | ✓ | ✓ | ✓ |
Amazon Redshift | ✓ | ✓ | ✓ | ✓ |
Amazon S3 | ✓ | ✓ | ✓ | ✓ |
Analytics Engine HDFS | ✓ | ✓ | ||
Apache Cassandra | ✓ | ✓ | ✓ | |
Apache Derby | ✓ | ✓ | ✓ | |
Apache HBase | ✓ | |||
Apache HDFS | ✓ | ✓ | ✓ | |
Apache Hive | ✓ | ✓ | ✓ | ✓ |
Apache Kafka | ✓ | ✓ | ||
Box | ✓ | ✓ | ✓ | |
Cloudera Impala | ✓ | ✓ | ✓ | ✓ |
Dropbox | ✓ | ✓ | ||
Elasticsearch | ✓ | |||
Exasol | ✓ | ✓ | ||
File system | ✓ | ✓ | ||
FTP (remote file system transfer) | ✓ | ✓ | ✓ | |
Generic JDBC | ✓ | ✓ | ✓ | |
Generic S3 | ✓ | ✓ | ||
Google BigQuery | ✓ | ✓ | ✓ | ✓ |
Google Cloud Pub/Sub | ✓ | |||
Google Cloud Storage | ✓ | ✓ | ✓ | |
Greenplum | ✓ | ✓ | ✓ | ✓ |
HDFS via Execution Engine for Hadoop | ✓ | ✓ | ||
Hive JDBC | ✓ | ✓ | ||
Hive via Execution Engine for Hadoop | ✓ | ✓ | ||
HTTP | ✓ | ✓ | ✓ | |
IBM Cloud® Compose for MySQL | ✓ | ✓ | ✓ | ✓ |
IBM Cloud Data Engine | ✓ | ✓ | ||
IBM Cloud Databases for DataStax | ✓ | ✓ | ||
IBM Cloud Databases for MongoDB | ✓ | ✓ | ✓ | ✓ |
IBM Cloud Databases for PostgreSQL | ✓ | ✓ | ✓ | ✓ |
IBM Cloud Object Storage | ✓ | ✓ | ✓ | ✓ |
IBM Cloud Object Storage (infrastructure) | ✓ | ✓ | ||
IBM Cloudant® | ✓ | ✓ | ||
IBM Cognos® Analytics | ✓ | ✓ | ✓ | |
IBM Data Virtualization | ✓ | ✓ | ✓ | |
IBM Data Virtualization Manager for z/OS® | ✓ | ✓ | ✓ | ✓ |
IBM Db2® | ✓ | ✓ | ✓ | ✓ |
IBM Db2 Big SQL | ✓ | ✓ | ✓ | ✓ |
IBM Db2 for i | ✓ | ✓ | ✓ | ✓ |
IBM Db2 for z/OS | ✓ | ✓ | ✓ | ✓ |
IBM Db2 Hosted | ✓ | ✓ | ✓ | ✓ |
IBM Db2 on Cloud | ✓ | ✓ | ✓ | ✓ |
IBM Db2 Warehouse | ✓ | ✓ | ✓ | ✓ |
IBM Informix® | ✓ | ✓ | ✓ | ✓ |
IBM Match 360 | ✓ | ✓ | ||
IBM MQ | ✓ | |||
IBM Netezza® Performance Server | ✓ | ✓ | ✓ | ✓ |
IBM Planning Analytics | ✓ | ✓ | ✓ | |
IBM SPSS Analytic Server | ✓ | ✓ | ||
Impala via Execution Engine for Hadoop | ✓ | ✓ | ||
Looker | ✓ | ✓ | ||
MariaDB | ✓ | ✓ | ✓ | ✓ |
Microsoft Azure Blob Storage | ✓ | ✓ | ✓ | |
Microsoft Azure Cosmos DB | ✓ | ✓ | ✓ | |
Microsoft Azure Data Lake Store | ✓ | ✓ | ✓ | |
Microsoft Azure File Storage | ✓ | ✓ | ✓ | |
Microsoft Azure SQL Database | ✓ | ✓ | ✓ | ✓ |
Microsoft SQL Server | ✓ | ✓ | ✓ | ✓ |
MinIO | ✓ | ✓ | ||
MongoDB | ✓ | ✓ | ✓ | ✓ |
MySQL (MySQL Community Edition, MySQL Enterprise Edition) | ✓ | ✓ | ✓ | ✓ |
OData | ✓ | |||
ODBC | ✓ | ✓ | ||
Oracle | ✓ | ✓ | ✓ | ✓ |
PostgreSQL | ✓ | ✓ | ✓ | ✓ |
Power BI (Azure) | Watson Knowledge Catalog | |||
Power BI (Local) | Watson Knowledge Catalog | |||
Salesforce.com | ✓ | ✓ | ✓ | ✓ |
SAP ASE | ✓ | ✓ | ✓ | ✓ |
SAP Bulk Extract | ✓ | |||
SAP Delta Extract | ✓ | |||
SAP HANA | ✓ | ✓ | ✓ | ✓ |
SAP IQ | ✓ | ✓ | ✓ | |
SAP OData | ✓ | ✓ | ✓ | ✓ |
Snowflake | ✓ | ✓ | ✓ | ✓ |
Storage volume | ✓ | ✓ | ✓ | |
Tableau | ✓ | ✓ | ||
Teradata | ✓ | ✓ | ✓ | ✓ |
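
The table can be treated as a lookup from connection type to supporting services, which is also how a service-filtered connection list behaves. The following Python sketch assumes a small hand-copied subset of the table: three rows that are checked for every service, plus the Power BI (Azure) row. The dictionary and function names are hypothetical and do not represent a product API.

```python
from typing import Dict, FrozenSet, List

ALL_SERVICES: FrozenSet[str] = frozenset({
    "Watson Knowledge Catalog",
    "Watson Studio",
    "SPSS Modeler",
    "DataStage",
    "Data Virtualization",
})

# Illustrative subset of the table above, not the full list.
SUPPORT: Dict[str, FrozenSet[str]] = {
    "Amazon S3": ALL_SERVICES,
    "IBM Db2": ALL_SERVICES,
    "Snowflake": ALL_SERVICES,
    "Power BI (Azure)": frozenset({"Watson Knowledge Catalog"}),
}


def connections_for(service: str) -> List[str]:
    """Return the connection types in SUPPORT that the given service supports."""
    return sorted(name for name, services in SUPPORT.items() if service in services)


print(connections_for("DataStage"))
# ['Amazon S3', 'IBM Db2', 'Snowflake']
```

Storing the supporting services as a set per connection type keeps the filter to a single membership test, and extending the sketch to the full table only means adding rows.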
Other data sources
An administrator can upload JDBC drivers to enable connections to more data sources. See Importing JDBC drivers for data sources.
The Data Virtualization service supports connections that are established by using third-party JDBC drivers.
See the product roadmap at http://ibm.biz/AnalyticsRoadmaps for information about support for more data sources.
Data files
In addition to using data from remote data sources or integrated databases, you can use data from files. You can work with data from the following types of files.
Type of data file | Supported in |
---|---|
Avro | DataStage, SPSS Modeler, Watson Knowledge Catalog, Watson Studio |
CSV | Data Virtualization, DataStage, SPSS Modeler, Watson Knowledge Catalog, Watson Studio |
JSON | Data Virtualization, DataStage, Watson Knowledge Catalog, Watson Studio |
Microsoft Excel spreadsheets | Data Virtualization, SPSS Modeler, Watson Knowledge Catalog, Watson Studio |
ORC | Data Virtualization |
Parquet | Data Virtualization, DataStage, Watson Knowledge Catalog, Watson Studio |
SAS | SPSS Modeler, Watson Studio (Data Refinery) |
SAV | SPSS Modeler |
TSV | Data Virtualization, DataStage, Watson Knowledge Catalog, Watson Studio (Data Refinery) |
XML | SPSS Modeler |
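
To check which file types a particular service can work with, the table can be inverted. The following Python sketch copies the table into a dictionary and looks up file types by service. The names `FILE_TYPE_SUPPORT` and `file_types_for` are hypothetical. Matching on the start of the service name is an assumption made so that entries such as "Watson Studio (Data Refinery)" count toward Watson Studio.

```python
from typing import Dict, List, Tuple

# Copied from the table above: file type -> services that support it.
FILE_TYPE_SUPPORT: Dict[str, Tuple[str, ...]] = {
    "Avro": ("DataStage", "SPSS Modeler", "Watson Knowledge Catalog", "Watson Studio"),
    "CSV": ("Data Virtualization", "DataStage", "SPSS Modeler",
            "Watson Knowledge Catalog", "Watson Studio"),
    "JSON": ("Data Virtualization", "DataStage", "Watson Knowledge Catalog", "Watson Studio"),
    "Microsoft Excel spreadsheets": ("Data Virtualization", "SPSS Modeler",
                                     "Watson Knowledge Catalog", "Watson Studio"),
    "ORC": ("Data Virtualization",),
    "Parquet": ("Data Virtualization", "DataStage", "Watson Knowledge Catalog", "Watson Studio"),
    "SAS": ("SPSS Modeler", "Watson Studio (Data Refinery)"),
    "SAV": ("SPSS Modeler",),
    "TSV": ("Data Virtualization", "DataStage", "Watson Knowledge Catalog",
            "Watson Studio (Data Refinery)"),
    "XML": ("SPSS Modeler",),
}


def file_types_for(service: str) -> List[str]:
    """Return the file types whose supporting services start with the given name."""
    return sorted(
        file_type
        for file_type, services in FILE_TYPE_SUPPORT.items()
        if any(s.startswith(service) for s in services)
    )


print(file_types_for("SPSS Modeler"))
# ['Avro', 'CSV', 'Microsoft Excel spreadsheets', 'SAS', 'SAV', 'XML']
```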
Connecting to data sources (by service)
Use the following resources to create connections in your application.
- Cognos Dashboards
- You can use the local and remote data sets that exist in your analytics projects.
Alternatively, you can create connections that can be used in an analytics dashboard by selecting Add data source from the analytics dashboard menu.
Restriction: Analytics dashboards support only JDBC-based connections.
You can also add data from files by selecting Add data set from the analytics dashboard menu.
- Data Refinery
- You can cleanse and refine tabular data with a graphical flow editor tool called Data Refinery. To refine data, you must add connections to your data sources and you must understand source file limitations. For more information, see Refining data (Data Refinery) and Supported data sources for Data Refinery.
- DataStage
- DataStage uses connectors on the DataStage canvas to interact with remote data sources. To connect to a data source, you must create a project connection asset for the associated DataStage connector before you can use the connector in DataStage.
- For instructions on creating a project connection asset, see Adding connections to analytics projects.
- For the list of available DataStage connectors, see DataStage connectors.
- To add a local file such as a CSV file, see Adding data to an analytics project.
- Data Virtualization
- You can create connections that can be used to virtualize data from the following locations:
- The Platform connections page
- The Data sources page in the Data Virtualization service.
For more information, see Adding data sources (Data Virtualization).
- SPSS Modeler
- Data sources in the SPSS Modeler service support read-only access, read/write access, and SQL pushback.
The SPSS Modeler service also supports several other file types.
For more information, see Supported data sources for SPSS Modeler.
- Watson Knowledge Catalog
- You can create connections that can be used in the catalog or in analytics projects and connections that can be used to curate data. In general, you can create connections from the Platform connections page. In addition, you can create connections as follows:
- Connections that can be used in a catalog from the catalog Assets page. For more information, see Adding a connection asset to a catalog (Watson Knowledge Catalog).
- Connections that can be used in analytics projects from the Assets page of the analytics project. For more information, see Adding data to an analytics project.
- Connections that can be used for metadata import in analytics projects when you create the metadata import asset. For more information, see Managing metadata imports.
- Connections that can be used in data discovery or data quality projects from the page where you create a new discovery job. For more information, see Running automated discovery.
- Watson Studio
- Ideally, use data that is already in a catalog. Search for the data that you want in a catalog and add it to an analytics project.
Alternatively, you can create connections that can be used in analytics projects from the following locations:
- The Connections page
- The Assets page of the analytics project
You can also add data from files. To add data from files, go to the Assets page of the analytics project.
For more information, see Adding data to an analytics project.