Adding storage

You can organize your data in watsonx.data by associating storage.

watsonx.data on IBM Software Hub

watsonx.data Developer edition

About this task

In watsonx.data, data is stored either in internal storage that is created during instance provisioning or in externally managed storage. You can associate a catalog with a storage. A catalog defines the schemas and metadata for a storage.

Note: The out-of-the-box MinIO object storage is provided for exploratory purposes only. It does not include all security features and is not configured for high-speed data access. Register your own S3 bucket that meets your security and performance requirements.

Procedure

To add a storage, complete the following steps:

Log in to the watsonx.data console.
From the navigation menu, select Infrastructure manager.
To define and connect a storage, click Add component.
In the Add component window, select a storage from the Storage section and provide the details to establish the connection.
watsonx.data supports the following storage options:
Note: Hadoop Distributed File System (HDFS) is not supported in Milvus.
Note: You can modify the access key and secret key of a user-registered bucket for a storage. This feature is available only for user-registered buckets and does not apply to default buckets, ADLS, or Google Cloud Storage. The new credentials must pass the test connection before you can use this feature.
Additional information about catalogs
You can optionally associate a catalog with the storage.
Note:
- The catalog name cannot be hive (in any letter case) because it is a reserved keyword.
- The catalog must point to the same table where the data resides for the table creation to work effectively.
  - Catalog credentials: The catalog must have the necessary credentials to connect to the external bucket in order to access the data. Without these credentials, the connection fails before reaching HMS.
  - Presto failure: If the catalog lacks the required credentials, the table creation process fails at the Presto level. The query cannot reach HMS to initiate the table creation.
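The reserved-name rule in the note above can be sketched as a quick pre-check before you submit a catalog name. This is a minimal illustration only: the helper name `is_allowed_catalog_name` and the sample names are hypothetical, and the sketch models only the reserved keyword stated above, not any other naming rule that the console might enforce.

```python
# Reserved keyword taken from the note above; the comparison ignores letter case.
RESERVED_CATALOG_NAMES = {"hive"}


def is_allowed_catalog_name(name: str) -> bool:
    """Return False when the name equals a reserved keyword in any letter case."""
    return name.lower() not in RESERVED_CATALOG_NAMES


# Hypothetical candidate names, used only to exercise the check.
for candidate in ["hive", "Hive", "HIVE", "sales_catalog"]:
    print(candidate, is_allowed_catalog_name(candidate))
```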
For Apache Iceberg catalog:
- You can delete data from tables by using the DELETE FROM statement.
- You can specify the table property delete_mode for new tables by using either copy-on-write mode or merge-on-read mode (default).
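As a rough illustration of the delete_mode table property, the following sketch assembles a CREATE TABLE statement as text. It assumes the Presto-style `WITH (property = 'value')` clause and the literal values `merge-on-read` and `copy-on-write`; the function name `create_table_sql` and the catalog, schema, table, and column names are hypothetical. Check the statement against the SQL reference for your engine version before you run it.

```python
# Assumed literal values for the delete_mode table property.
VALID_DELETE_MODES = ("merge-on-read", "copy-on-write")


def create_table_sql(table, columns, delete_mode="merge-on-read"):
    """Build CREATE TABLE text that sets delete_mode for a new Iceberg table.

    columns maps column names to SQL type names. The statement is only
    assembled here, not executed, and the inputs are not escaped.
    """
    if delete_mode not in VALID_DELETE_MODES:
        raise ValueError(f"unsupported delete_mode: {delete_mode!r}")
    column_list = ", ".join(f"{name} {sql_type}" for name, sql_type in columns.items())
    return f"CREATE TABLE {table} ({column_list}) WITH (delete_mode = '{delete_mode}')"


# Hypothetical three-part table name and columns.
print(create_table_sql(
    "iceberg_data.sales.orders",
    {"order_id": "BIGINT", "region": "VARCHAR"},
    delete_mode="copy-on-write",
))
```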
For the DELETE FROM statement in the Apache Iceberg catalog:
- Filtered columns support comparison operators, such as EQUALS, LESS THAN, or LESS THAN EQUALS.
- Deletes must occur on the latest snapshot.
- For V1 tables, the Apache Iceberg catalog can delete data only in one or more entire partitions. Columns in the filter must all be identity-transformed partition columns of the target table.
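The V1 restriction above can be expressed as a small check on the filter columns of a DELETE FROM statement. This sketch is not how the engine validates a query. The function `v1_delete_filter_ok`, the dictionary layout, and the column names are hypothetical, and the transform names (`identity`, `day`, `bucket`) follow Apache Iceberg partition transform terminology.

```python
def v1_delete_filter_ok(partition_transforms, filter_columns):
    """Check that every filter column is an identity-transformed partition column.

    partition_transforms maps a partition column name to its transform name.
    A column that is absent from the mapping is not a partition column.
    An empty filter passes vacuously; this sketch makes no claim about how
    the engine treats a DELETE without a filter.
    """
    return all(partition_transforms.get(column) == "identity" for column in filter_columns)


# Hypothetical partition spec: region is identity-partitioned, order_ts is partitioned by day.
spec = {"region": "identity", "order_ts": "day"}

print(v1_delete_filter_ok(spec, ["region"]))              # filter on an identity partition column
print(v1_delete_filter_ok(spec, ["region", "order_ts"]))  # order_ts uses a non-identity transform
print(v1_delete_filter_ok(spec, ["customer_id"]))         # customer_id is not a partition column
```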
For the Apache Iceberg catalog, ALTER TABLE operations on a column support the following data type conversions:
- INT to BIGINT
- FLOAT to DOUBLE
- DECIMAL(num1, dec_digits) to DECIMAL(num2, dec_digits), where num2>num1
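The three conversions listed above can be captured in a short validation helper, for example to check a planned ALTER TABLE change before you submit it. This is a sketch under the assumption that type names are written exactly as in the list. The function name `is_supported_conversion` is hypothetical and is not part of any watsonx.data API.

```python
import re

# Matches DECIMAL(precision, scale) and captures both numbers.
_DECIMAL = re.compile(r"^DECIMAL\(\s*(\d+)\s*,\s*(\d+)\s*\)$")
_SIMPLE_WIDENINGS = {("INT", "BIGINT"), ("FLOAT", "DOUBLE")}


def is_supported_conversion(old_type: str, new_type: str) -> bool:
    """Return True only for the column type conversions listed above."""
    old, new = old_type.strip().upper(), new_type.strip().upper()
    if (old, new) in _SIMPLE_WIDENINGS:
        return True
    old_dec, new_dec = _DECIMAL.match(old), _DECIMAL.match(new)
    if old_dec and new_dec:
        old_precision, old_scale = map(int, old_dec.groups())
        new_precision, new_scale = map(int, new_dec.groups())
        # The scale (dec_digits) must stay the same and the precision must grow.
        return new_scale == old_scale and new_precision > old_precision
    return False


print(is_supported_conversion("INT", "BIGINT"))
print(is_supported_conversion("DECIMAL(10, 2)", "DECIMAL(12, 2)"))
print(is_supported_conversion("DECIMAL(10, 2)", "DECIMAL(12, 3)"))
```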
Limitations for SQL statements
- For Apache Iceberg, UPDATE queries that contain a subquery are not supported.
- For Apache Iceberg, the mixed-case feature flag is not supported.
- For the Apache Iceberg, Memory, and Hive connectors, DROP SCHEMA uses RESTRICT behavior by default.
- For the database-based catalogs, CREATE SCHEMA, CREATE TABLE, DROP SCHEMA, DROP TABLE, DELETE, DROP VIEW, ALTER TABLE, and ALTER SCHEMA statements are not available in the Data manager UI.
Limitations for data types
- For Apache Iceberg, the maximum number of digits that a column of data type FLOAT or DOUBLE can hold is 37. Inserting a larger value results in a decimal overflow error.
- When values of data type REAL have 6 or more digits in the decimal part and those digits are predominantly zero, the values are rounded off when queried. The rounding differs depending on the precision of the values. For example, the decimal number 1.654 is unchanged when rounded to 3 digits after the decimal point. As another example, consider 10.890009 and 10.89000: 10.89000 is rounded to 10.89, whereas 10.890009 is not rounded off. This is an inherent issue that results from the representational limitations of binary floating-point formats, and it might have a significant impact when queries involve sorting.
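The representational limit behind the REAL behavior can be reproduced outside the engine. The sketch below assumes that REAL is stored as a 32-bit IEEE 754 single-precision value and uses the Python standard library `struct` module to round a value to that format. The helper name `as_real` is hypothetical, and the sketch does not model how the query engine formats values for display.

```python
import struct


def as_real(value: float) -> float:
    """Round a 64-bit Python float to the nearest 32-bit float, then widen it back."""
    return struct.unpack("<f", struct.pack("<f", value))[0]


# Show what is actually stored for each decimal literal.
for literal in (1.654, 10.89, 10.890009):
    print(literal, "->", repr(as_real(literal)))

# Two distinct decimal inputs can map to the same 32-bit value.
# After storage they compare as equal, which affects sort order and tie-breaking.
print(as_real(10.89) == as_real(10.8900001))
```

Near 10.89, adjacent 32-bit values are about 0.00000095 apart, so differences smaller than that spacing are generally lost when the value is stored.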
For more information on mixed-case feature flag behavior, supported SQL statements, and supported data types matrices, see Support content.

Related API: For information about the related APIs, see:
- Get bucket registrations.
- Register bucket.