You can organize your data in watsonx.data by associating storage.
watsonx.data on Red Hat® OpenShift®
watsonx.data Developer edition
About this task
In watsonx.data, data is stored either in internal storage that is created during instance
provisioning or in externally managed storage. You can associate a catalog with a storage.
A catalog defines the schemas and metadata for a storage.
Note: The out-of-the-box MinIO object storage is provided for exploratory purposes only. It does not
have all the security features and is not configured to provide high-speed data access. Register
your own S3 bucket that meets your security and performance requirements.
Procedure
To add a storage, complete the following steps:
- Log in to the watsonx.data console.
- From the navigation menu, select Infrastructure manager.
- To define and connect a storage, click Add component.
- In the Add component window, select a storage from the Storage section and provide the
details to connect to the existing externally managed storage.
Note: A catalog defines the schemas and metadata for a data source. Depending on the storage type,
Iceberg, Hive, Hudi, and Delta Lake catalogs are supported.
Note: You can modify the access key and secret key of a user-registered bucket for a storage. This
feature is only available for user-registered buckets and is not applicable to default buckets,
ADLS, or Google Cloud Storage. This feature can only be used if the new credentials successfully
pass the test connection.
Features
For the Iceberg connector:
- You can delete data from tables by using the DELETE FROM statement.
- You can specify the table property delete_mode for new tables by using either
copy-on-write mode or merge-on-read mode (default).
For the DELETE FROM statement for the Iceberg connector:
- Filtered columns support only comparison operators, such as EQUALS, LESS THAN, or
LESS THAN EQUALS.
- Deletes must only occur on the latest snapshot.
- For V1 tables, the Iceberg connector can delete data only in one or more entire
partitions. Columns in the filter must all be identity-transformed partition columns of the target
table.
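The V1 restriction above can be pictured as a simple subset check. The following Python sketch is illustrative only: the function name and the column names are hypothetical and are not part of watsonx.data or the Iceberg connector. It assumes that you already know which columns the DELETE filter references and which columns of the table are identity-transformed partition columns.

```python
def is_v1_delete_filter_valid(filter_columns, identity_partition_columns):
    """Hypothetical helper: check the V1 rule that every column used in the
    DELETE filter is an identity-transformed partition column of the table."""
    return set(filter_columns) <= set(identity_partition_columns)


# Hypothetical table partitioned by identity(region) and identity(sale_date).
partition_columns = ["region", "sale_date"]

# Filter uses only partition columns, so the rule is satisfied.
print(is_v1_delete_filter_valid(["region"], partition_columns))

# Filter also uses a non-partition column, so the rule is not satisfied.
print(is_v1_delete_filter_valid(["region", "amount"], partition_columns))
```

The sketch checks only column membership. It does not model the operator restriction or the latest-snapshot requirement.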
For the Iceberg connector, ALTER TABLE operations on a column support the following data type
conversions:
- INT to BIGINT
- FLOAT to DOUBLE
- DECIMAL(num1, dec_digits) to DECIMAL(num2, dec_digits), where num2 > num1
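As a rough illustration of the conversion rules listed above, the following Python sketch checks whether a pair of type names matches one of the three supported conversions. The helper name is hypothetical and the check is not part of any watsonx.data API. It only encodes the rules as stated here, assuming type names are written as plain strings such as DECIMAL(10,2).

```python
import re

# Matches type strings such as "DECIMAL(10,2)" or "decimal(10, 2)".
_DECIMAL = re.compile(r"^DECIMAL\((\d+),\s*(\d+)\)$", re.IGNORECASE)


def is_supported_conversion(old_type: str, new_type: str) -> bool:
    """Hypothetical helper: True if old_type -> new_type is one of the
    conversions listed for ALTER TABLE on the Iceberg connector."""
    old_name, new_name = old_type.strip().upper(), new_type.strip().upper()
    if (old_name, new_name) in {("INT", "BIGINT"), ("FLOAT", "DOUBLE")}:
        return True
    old_match, new_match = _DECIMAL.match(old_name), _DECIMAL.match(new_name)
    if old_match and new_match:
        old_precision, old_scale = map(int, old_match.groups())
        new_precision, new_scale = map(int, new_match.groups())
        # dec_digits must stay the same and the total digits must grow.
        return old_scale == new_scale and new_precision > old_precision
    return False


print(is_supported_conversion("INT", "BIGINT"))
print(is_supported_conversion("DECIMAL(10,2)", "DECIMAL(12,2)"))
print(is_supported_conversion("DECIMAL(10,2)", "DECIMAL(12,3)"))
```

In this sketch, narrowing a type, changing dec_digits, or keeping the same number of digits is reported as unsupported.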
Limitations for SQL statements
- For Iceberg, an UPDATE query with a subquery is not supported.
- For Iceberg, an UPDATE query with a mixed-case column is not supported.
- For Iceberg, Memory, and Hive connectors, DROP SCHEMA applies RESTRICT by default.
- For the database-based catalogs, CREATE SCHEMA, CREATE TABLE, DROP SCHEMA, DROP TABLE,
DELETE, DROP VIEW, ALTER TABLE, and ALTER SCHEMA statements are not available in the
Data manager UI.
Limitations for data types
- For the Iceberg connector, the maximum number of digits that can be accommodated in a
column of data type FLOAT or DOUBLE is 37. Inserting a value with more digits results in a
decimal overflow error.
- When fields of data type REAL have 6 or more digits in the decimal part and those digits
are predominantly zero, the values are rounded off when queried. The rounding differs
depending on the precision of the values. For example, the decimal number 1.654 is unchanged
when rounded to 3 digits after the decimal point. Another example is 10.890009 and 10.89000:
10.89000 is rounded to 10.89, whereas 10.890009 is not rounded off. This is an inherent
issue that results from the representational limitations of binary floating point formats.
It might have a significant impact when a query involves sorting.
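The representational limitation behind the REAL behavior can be reproduced outside watsonx.data. The following Python sketch uses the standard struct module to round-trip values through a 32-bit IEEE 754 float, on the assumption that REAL is stored in that single-precision format. The helper name to_real and the last pair of values are illustrative and do not come from the product.

```python
import struct


def to_real(value: float) -> float:
    """Round-trip a 64-bit Python float through a 32-bit IEEE 754 float."""
    return struct.unpack("<f", struct.pack("<f", value))[0]


# Values from the examples above: show what a 32-bit float actually holds.
for text in ("1.654", "10.89000", "10.890009"):
    stored = to_real(float(text))
    print(f"{text:>10} -> stored as {stored!r}")

# Illustrative values: two distinct inputs that are closer together than the
# spacing of 32-bit floats near 10.89 can collapse to the same stored value,
# which is why sort order can be affected.
first, second = to_real(10.8900001), to_real(10.8900002)
print(first == second)
```

The stored values differ slightly from the decimal literals, so how many digits are displayed determines whether a value appears rounded.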
For more information on mixed-case feature flag behavior, supported SQL statements, and supported
data types matrices, see Support content.