Connecting to data sources

You can connect to your data sources in IBM® Cloud Pak for Data in several ways.

Connecting to data sources at the platform level

You can create connections that can be used by various services across the platform. Any user who has access to the platform can see these connections. However, only users with the credentials for the data source can use a connection.

These platform-level connections are available from the Platform connections page. However, the Platform connections page is available only if the Cloud Pak for Data common core services are installed.

Currently, the following services can use connections from the Platform connections page:
  • Cognos® Analytics
  • Data Virtualization
  • DataStage®
  • Watson™ Knowledge Catalog
  • Watson Studio

    Many of the tools that work with Watson Studio can use data from these connections after the connection is added to a project.

Restriction: Not all services support the same types of connections. Most services support a subset of the connections that are supported by the platform. For more information, see Connecting to data sources (by service).

The Platform connections page is a specialized view of the Platform assets catalog. The connections that are defined on the Platform connections page are also included in the Platform assets catalog.

The Platform connections page shows the list of connections that can be used by various services on the platform. At a minimum, all users have the Viewer role on the catalog, which means that they can see the connections that are defined. For more information, see Managing collaborators on platform connections.

Required permissions
To create a platform-level connection, you must be an Editor or Administrator on the Platform assets catalog.
Tip: Work with your data source administrator to ensure that you have the correct information to connect to your data source.
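
The permission rules above can be pictured as a simple role check. The following Python sketch is illustrative only: `CatalogRole`, `can_view_connections`, and `can_create_connection` are hypothetical names and are not part of any Cloud Pak for Data API. It encodes only what this topic states: every user has at least the Viewer role, and only Editors and Administrators can create platform-level connections.

```python
from enum import Enum


class CatalogRole(Enum):
    """Roles on the Platform assets catalog that this topic mentions."""
    VIEWER = "Viewer"
    EDITOR = "Editor"
    ADMINISTRATOR = "Administrator"


def can_view_connections(role: CatalogRole) -> bool:
    # Every user has at least the Viewer role, so any role can see connections.
    return isinstance(role, CatalogRole)


def can_create_connection(role: CatalogRole) -> bool:
    # Only Editors and Administrators can create platform-level connections.
    return role in (CatalogRole.EDITOR, CatalogRole.ADMINISTRATOR)
```

Seeing a connection is not the same as using it: a user still needs the credentials for the data source, which this sketch does not model.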

To create a platform-level connection:

  1. Log in to the Cloud Pak for Data web client.
  2. From the navigation menu, select Data > Platform connections.
  3. Click New connection.
  4. Select the type of data source that you want to connect to.
    The following types of connections have additional requirements that must be met before you can use them:
    Generic JDBC
    If you want to connect to an unsupported data source by creating a Generic JDBC connection, a Cloud Pak for Data administrator must upload the JDBC drivers for that data source. For more information, see Importing JDBC drivers for data sources.
    Storage volume
    If you want to connect to a storage volume, such as an external NFS server or a persistent volume claim, a user with the Create service instances permission must add the volume to Cloud Pak for Data. For more information, see Managing storage volumes.
  5. Enter a name and description for the connection.
  6. Enter the details for the connection.

    The type of connection that you are creating determines the information that you must specify. Typically, a connection requires either:

    • A hostname and port number
    • A URL

    You might also need to specify the database that you want to connect to.

  7. Enter your credentials for the connection.

    The connection type determines the type of credentials that you must specify. Typically, a connection requires a username and password or an API key and secret key. Some data sources allow you to connect anonymously.

    You might need to specify how you want to provide your credentials. The options that are available depend on how the platform is configured.
    Enter credentials manually
    With this option, you manually enter your credentials in the web client. The platform stores these credentials and uses them to authenticate you.

    This is the default method for entering credentials. However, an administrator can optionally disable this method. For more information, see Requiring users to use secrets for credentials when creating connections.

    Use secrets from a vault
    With this option, you select the secrets that contain the appropriate credentials. For example, if you need to specify your username and password, select the secret that contains your username and the secret that contains your password. The platform uses the secrets (which are stored in a vault) to authenticate you.

    To use this option, an administrator must enable the vaults feature. For more information, see Enabling vaults for the Cloud Pak for Data web client. Additionally, if you are using secrets from an external vault, you must have the appropriate permissions to connect to external vaults or an administrator must share the appropriate secrets with you. For more information, see Managing secrets and vaults.

    Use my platform login credentials
    With this option, the platform uses your platform credentials to authenticate you.

    This option is available only if the data source is a service that is deployed on the instance of Cloud Pak for Data where you are creating the connection.

  8. If applicable, specify the SSL information required to connect to your data source.

    Some data sources require you to use SSL for secure communication. Other data sources support it but do not require it. Ensure that you understand what information you need to provide to communicate securely with your data source:

    • If you specified a port number that is configured to accept SSL connections, ensure that you select The port is configured to accept SSL connections.
    • If the data source uses a self-signed certificate, you must specify the contents of the certificate to enable secure communication between Cloud Pak for Data and the data source.
    • If your data source uses chained certificates, you can specify the contents of multiple certificates.

    Some services can use an SSL certificate that is stored as a secret. To use this option, an administrator must enable the vaults feature. For more information, see Enabling vaults for the Cloud Pak for Data web client. Additionally, if you are using secrets from an external vault, you must have the appropriate permissions to connect to external vaults or an administrator must share the appropriate secrets with you. For more information, see Managing secrets and vaults.
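
The choices in steps 6 and 7 can be summarized as a small validation sketch. The following Python is illustrative only: `ConnectionDraft`, `CredentialMode`, and `validate` are hypothetical names and do not correspond to any Cloud Pak for Data API. The sketch checks that a draft has either a hostname and port number or a URL, and that the chosen credential option has the values it needs. As a simplification, it assumes that manually entered credentials are a username and password, although some connection types use an API key and secret key instead.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Optional


class CredentialMode(Enum):
    """Hypothetical labels for the credential options in step 7."""
    MANUAL = "manual"
    VAULT_SECRETS = "vault_secrets"
    PLATFORM_LOGIN = "platform_login"
    ANONYMOUS = "anonymous"


@dataclass
class ConnectionDraft:
    """Illustrative stand-in for the values entered in the web client."""
    name: str
    host: Optional[str] = None
    port: Optional[int] = None
    url: Optional[str] = None
    credential_mode: CredentialMode = CredentialMode.MANUAL
    username: Optional[str] = None
    password: Optional[str] = None
    secret_names: List[str] = field(default_factory=list)


def validate(draft: ConnectionDraft) -> List[str]:
    """Return a list of problems; an empty list means the draft looks complete."""
    problems = []
    if not draft.name:
        problems.append("a connection name is required")

    # Step 6: either a hostname and port number, or a URL.
    has_host_and_port = bool(draft.host) and draft.port is not None
    if not (has_host_and_port or draft.url):
        problems.append("provide a hostname and port, or a URL")

    # Step 7: what is needed depends on how credentials are provided.
    if draft.credential_mode is CredentialMode.MANUAL:
        # Simplification: assumes username/password rather than an API key.
        if not (draft.username and draft.password):
            problems.append("manual entry needs a username and password")
    elif draft.credential_mode is CredentialMode.VAULT_SECRETS:
        if not draft.secret_names:
            problems.append("select the secrets that hold the credentials")
    # PLATFORM_LOGIN and ANONYMOUS need no extra values in this sketch.
    return problems


draft = ConnectionDraft(name="sales-db", host="db.example.com", port=50000,
                        username="analyst", password="example-password")
print(validate(draft))
```

The host name, port, and credentials in the usage lines are placeholder values. The actual fields that a connection requires depend on the connection type that you select in step 4.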

Connecting to data sources at the service level

Typically, if you create a connection at the service level, the connection is accessible only from the service where it is created.

Cognos Dashboards
You can use CSV files, database connections, connected data assets, and Data Virtualization assets as data sources for a dashboard. All of these data sources must be added to a project before they can be used in a dashboard.

To add data sources to a dashboard, select Add data from the analytics dashboard menu.

For a detailed list of supported data sources, see the Data Format section in Visualizing Data with Cognos Dashboards.

DataStage
DataStage uses connectors on the DataStage canvas to interact with remote data sources. To connect to a data source, you must first create a project connection asset for the associated DataStage connector.

Data Virtualization
You can create connections that can be used to virtualize data from the following locations:
  • The Platform connections page
  • The Data sources page in the Data Virtualization service

For more information, see Adding data sources (Data Virtualization).

Watson Knowledge Catalog
You can create connections that can be used in the catalog and connections that can be used to curate data.

Add connections that can be used in a catalog from the catalog Overview page. You can create new connections or pick from existing platform-level connections.

For more information, see Adding a connection asset to a catalog (Watson Knowledge Catalog).

When you publish a data asset to a catalog, the connection is published along with it, unless the connection already exists in the catalog.

For connections that can be used to curate data, you can create connections as follows:
  • From the Platform connections page. You can pick from those platform-level connections when you set up a discovery job.
  • When you set up a new discovery job from the Governance > Data discovery page.
For more information, see Discovering assets (Watson Knowledge Catalog).

Watson Studio
Ideally, use data that is already in a catalog. Search for the data that you want in a catalog and add it to an analytics project.

Alternatively, you can create connections that can be used in analytics projects from the following locations:

You can also add data from files. To add data from files, go to the Assets page of the analytics project. The initial storage limit for assets is 100 GB across all projects, spaces, and catalogs.

For more information, see Adding data to an analytics project.