FAQs

This topic is a collection of frequently asked questions (FAQs) about the IBM® watsonx.data service.

watsonx.data Developer edition

watsonx.data on IBM Software Hub

General

What is IBM watsonx.data?

IBM watsonx.data is an open, hybrid, and governed fit-for-purpose data store optimized to scale all data, analytics, and AI workloads to get greater value from your analytics ecosystem.

What can I do with IBM watsonx.data?

You can use IBM watsonx.data to collect, store, query, and analyze all your enterprise data with a single unified data platform. You can connect to data in multiple locations and get started in minutes with built-in governance, security, and automation. You can use multiple query engines to run analytics and AI workloads, reducing your data warehouse costs by up to 50%.

What are the key features of IBM watsonx.data?

The key features of IBM watsonx.data are:
  1. An architecture that fully separates compute, metadata, and storage to offer ultimate flexibility.
  2. Multiple engines such as Presto and Spark that provide fast, reliable, and efficient processing of big data at scale.
  3. Open formats for analytic data sets, allowing different engines to access and share the data at the same time.
  4. Data sharing between IBM watsonx.data, Db2® Warehouse, and Netezza Performance Server or any other data management solution through common Iceberg table format support, connectors, and a shareable metadata store.
  5. Built-in governance that is compatible with existing solutions, including IBM Knowledge Catalog.
  6. Cost-effective, simple object storage that is available across hybrid-cloud and multicloud environments.
  7. Integration with a robust ecosystem of IBM’s best-in-class solutions and third-party services to enable easy development and deployment of key use cases.

Which data formats are supported in IBM watsonx.data?

The following data formats are supported in IBM watsonx.data:
  1. Ingestion: Data ingestion supports the CSV and Parquet data file formats.
  2. Create table: Table creation supports the CSV, Parquet, JSON, and TXT data file formats.
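
The format lists above can be expressed as a small lookup. The following Python sketch is illustrative only and is not part of IBM watsonx.data: the names SUPPORTED_FORMATS and is_supported are hypothetical, and the check looks only at the file extension, not at the file contents.

```python
from pathlib import PurePosixPath

# Extensions per operation, copied from the list above.
# The dictionary keys are hypothetical labels, not product identifiers.
SUPPORTED_FORMATS = {
    "ingestion": {".csv", ".parquet"},
    "create_table": {".csv", ".parquet", ".json", ".txt"},
}


def is_supported(operation: str, file_name: str) -> bool:
    """Return True if the file extension is listed for the given operation."""
    suffix = PurePosixPath(file_name).suffix.lower()
    return suffix in SUPPORTED_FORMATS[operation]
```

For example, is_supported("ingestion", "events.json") returns False, while is_supported("create_table", "events.json") returns True.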

What is the maximum size of the default IBM-managed storage?

The default IBM-managed storage provides 10 GB of storage.

Presto

What is Presto?

Presto is a distributed SQL query engine, with the capability to query vast data sets located in different data sources, thus solving data problems at scale.

What are the Presto server types?

A Presto installation includes three server types: Coordinator, Worker, and Resource manager.

What SQL statements are supported in IBM watsonx.data?

For information on supported SQL statements, see Supported SQL statements.

Metastore

What is HMS (Hive Metastore)?

Hive Metastore (HMS) is a service that stores metadata that is related to Presto and other services in a backend Relational Database Management System (RDBMS) or Hadoop Distributed File System (HDFS).

Installation and setup

What version of Cloud Pak for Data do I need to use the latest version of IBM watsonx.data?

For version updates, see What's new in watsonx.data.

How can I create an IBM watsonx.data service instance?

To create an IBM watsonx.data service instance, see Installing watsonx.data.

How can I delete my IBM watsonx.data instance?

To uninstall the IBM watsonx.data service, see Uninstalling watsonx.data.

How can I configure an engine?

From the IBM watsonx.data web console, go to Infrastructure manager to configure an engine. For more information, see Provisioning a Presto engine.

How can I configure a catalog or metastore?

To configure a catalog with an engine, see Associating a catalog with an engine.

How can I configure a storage?

From the IBM watsonx.data web console, go to Infrastructure manager to configure a storage. For more information, see Adding a storage-catalog pair.

Access

How can I manage IAM access for IBM watsonx.data?

Controlling access to the engines and other components is a critical requirement for many enterprises. To ensure that the resource usage is under control, IBM watsonx.data provides the ability to manage access controls on these resources. A user with admin privileges on the resources can grant access to other users. For more information about infrastructure access management, see Infrastructure access.

How can I add and remove users?

To add or remove users, see Managing user access.

How is the access control for users provided?

To provide access control for users to restrict unauthorized access, see Data policy.

What is the process to assign access to a user?

To assign access to a user, see Managing access to the platform.

What is the process to assign access to a group?

To assign access to a group, see Managing user groups.

Presto Engine

How can I create an engine?

To create an engine, see Provisioning a Presto engine.

How can I pause and resume an engine?

To pause an engine, use one of the following methods:
  1. Pausing an engine in list view:
    1. Click the overflow menu icon at the end of the row and click the Pause icon. A pause confirmation dialog appears.
    2. Click Pause.
  2. Pausing an engine in topology view:
    1. Hover over the engine that you want to pause and click the Pause icon. A pause confirmation dialog appears.
    2. Click Pause.

To resume a paused engine, use one of the following methods:
  1. Resuming an engine in list view:
    1. Click the overflow menu icon at the end of the row.
    2. Click the Resume icon.
  2. Resuming an engine in topology view:
    1. Hover over the engine that you want to resume.
    2. Click the Resume icon.

How can I delete an engine?

To delete an engine, see Deleting an engine.

How can I run SQL queries?

You can use the Query workspace interface in IBM watsonx.data to run SQL queries and scripts against your data. For more information, see Running SQL queries.

Databases and Connectors

How can I add a database?

To add a database, see Adding a database-catalog pair.

How can I remove a database?

To remove a database, see Deleting a database-catalog pair.

What data sources does IBM watsonx.data currently support?

IBM watsonx.data currently supports the following data sources:

  1. IBM Db2
  2. IBM Netezza
  3. Apache Kafka
  4. MongoDB
  5. MySQL
  6. PostgreSQL
  7. SQL Server
  8. Custom
  9. Teradata
  10. SAP HANA
  11. Elasticsearch
  12. SingleStore
  13. Snowflake
  14. IBM Data Virtualization Manager for z/OS

How can I load data into IBM watsonx.data?

You can load data into IBM watsonx.data in the following ways:

  1. Web console: You can use the Ingestion jobs tab on the Data manager page to securely and easily load data into IBM watsonx.data. For more information, see Ingesting data by using web console.
  2. Command line interface: You can load data into IBM watsonx.data through the CLI. For more information, see Ingesting data by using command line interface (CLI).
  3. Creating tables: You can load or ingest local data files to create tables by using the CREATE TABLE option. For more information, see Creating tables.

How can I create tables?

You can create tables by using the following methods:
  1. Through the Data manager page by using the web console. For more information, see Creating tables.
  2. Through the command line interface. For more information, see Creating tables through CLI.

How can I create a schema?

You can create a schema by using the following methods:
  1. Through the Data manager page by using the web console. For more information, see Creating schemas.
  2. Through the command line interface. For more information, see Creating schema through CLI.

How can I query the loaded data?

You can use the Query workspace interface in IBM watsonx.data to run SQL queries and scripts against your data. For more information, see Running SQL queries.
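
As a rough illustration of the kind of statement that you might run in the Query workspace, the following Python sketch builds a simple preview query. It assumes the Presto convention of addressing a table as catalog.schema.table. The helper names and the catalog, schema, and table names in the example are hypothetical, and the sketch only produces a SQL string; it does not connect to IBM watsonx.data.

```python
import re

# Conservative pattern for identifiers accepted by this sketch (an assumption
# made for safety here, not a rule taken from the product documentation).
_IDENTIFIER = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


def qualified_name(catalog: str, schema: str, table: str) -> str:
    """Return a double-quoted catalog.schema.table name."""
    parts = (catalog, schema, table)
    for part in parts:
        if not _IDENTIFIER.match(part):
            raise ValueError(f"unsupported identifier: {part!r}")
    return ".".join(f'"{part}"' for part in parts)


def preview_query(catalog: str, schema: str, table: str, limit: int = 10) -> str:
    """Build a SELECT statement that returns at most `limit` rows."""
    if limit < 1:
        raise ValueError("limit must be at least 1")
    return f"SELECT * FROM {qualified_name(catalog, schema, table)} LIMIT {limit}"
```

Calling preview_query("iceberg_data", "sales", "orders", 5) returns a string that can be pasted into the SQL editor.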

Ingestion

What are the storage bucket options available?

The storage bucket options available are IBM Storage Ceph, IBM Cloud Object Storage (COS), AWS S3, and MinIO object storage.

What type of data files can be ingested?

Only Parquet and CSV data files can be ingested.

Can a folder of multiple files be ingested together?

Yes, a folder of multiple data files can be ingested. To ingest a folder, create an S3 folder that contains the data files. The source folder must contain either all Parquet files or all CSV files. For detailed information on S3 folder creation, see Preparing for ingesting data.
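
The single-format rule for a source folder can be checked before you start an ingestion job. The following Python sketch is one possible pre-check, not a feature of IBM watsonx.data: the function name folder_format is hypothetical, and it works on a list of file names that you supply rather than reading from a bucket.

```python
from pathlib import PurePosixPath

# File extensions that the answer above lists as ingestible.
INGESTIBLE = {".csv", ".parquet"}


def folder_format(file_names: list[str]) -> str:
    """Return the one format shared by all files, or raise ValueError."""
    if not file_names:
        raise ValueError("the folder contains no data files")
    suffixes = {PurePosixPath(name).suffix.lower() for name in file_names}
    unsupported = suffixes - INGESTIBLE
    if unsupported:
        raise ValueError(f"unsupported file types: {sorted(unsupported)}")
    if len(suffixes) != 1:
        raise ValueError("the folder mixes CSV and Parquet files")
    return suffixes.pop().lstrip(".")
```

A folder listing such as ["jan.csv", "feb.csv"] passes the check, while a listing that mixes CSV and Parquet files raises ValueError.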

What commands are supported in the command line interface during ingestion?

For commands supported in the command line interface during ingestion, see Options and parameters supported in ibm-lh tool.