Data intelligence tools settings
Enable or disable the use of generative AI for data intelligence tools in a project.
If generative AI capabilities are enabled in general for watsonx.data intelligence, you can still turn off the features for specific projects.
For some of the capabilities, additional deployment requirements apply. For more information, see Preparing to install IBM watsonx.data intelligence in the IBM Software Hub documentation.
If you turn off these capabilities for a project, the project can't be enabled for natural language queries, and the use of generative AI is disabled for certain metadata enrichment objectives and for data quality rules.
You might want to do that for projects whose use cases don't rely on the gen AI capabilities of the data intelligence tools. Turning off the features avoids unnecessary use of generative AI and limits the cost incurred for working with large language models.
If you want to make the gen AI capabilities available in a project at a later time, you can turn on the features again.
Natural language queries
When you enable natural language queries in a project that allows the use of gen AI, a job with the name Onboard for generative AI_<date-time> is started. This job initializes the project for vectorization of metadata so that the metadata can be used for converting plain text requests to SQL queries or in semantic searches. If the project already contains data assets, the metadata of these assets is directly vectorized and stored. Metadata that is added after the initialization is automatically processed and vectorized.
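To illustrate the idea of vectorizing metadata for semantic search, here is a minimal Python sketch. It is not the product's implementation: the word-count vectors stand in for whatever embedding model the service uses, and the column names, descriptions, and the `vectorize`, `cosine`, and `search` helpers are all hypothetical.

```python
import math
from collections import Counter

def vectorize(text):
    """Toy bag-of-words vector (assumption: a real service would use an embedding model)."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

# Hypothetical column metadata: cryptic column name -> description.
metadata = {
    "CUST.CUST_NM": "customer name",
    "CUST.CUST_DOB": "customer date of birth",
    "ORD.ORD_TOT": "order total amount",
}

# "Onboarding": vectorize the metadata that already exists and store it.
index = {column: vectorize(text) for column, text in metadata.items()}

def search(request, top_n=1):
    """Return the columns whose metadata is most similar to the plain text request."""
    query = vectorize(request)
    ranked = sorted(index, key=lambda column: cosine(query, index[column]), reverse=True)
    return ranked[:top_n]

print(search("total amount per order"))
```

In this sketch, metadata that is added later would be handled by adding another entry to `index`, which mirrors the automatic processing of new metadata described above.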
The vectorized metadata is used for Text-to-SQL conversions, which convert plain text requests into complex and accurate SQL queries that you can run on relational data sources. No access to a data source is required for generating the SQL queries because the queries are generated based on the available metadata.
Optionally, you can run metadata enrichment to attach more metadata to your assets, which improves the accuracy of the generated SQL queries. For example, you can take these steps:
- Generate meaningful names for columns that have cryptic names in the data source, for example, acronyms or abbreviations that only subject matter experts can decipher.
- Generate descriptions for columns that provide additional context to the models.
- Run term assignment to provide business context for your data that helps to map queries in business language to the right columns.
- Profile the data and collect data samples to provide information about the data format to the model.
- Run key and relationship analysis to identify how data assets are linked.
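The following Python sketch suggests why enrichment results such as these can help a Text-to-SQL model. It only shows how much more context an enriched column carries than a bare one. The `ColumnMetadata` class, the `describe_column` helper, and all sample values are hypothetical, and the actual format in which the service passes metadata to the model is not described here.

```python
from dataclasses import dataclass, field

@dataclass
class ColumnMetadata:
    """Hypothetical container for metadata that enrichment can add to a column."""
    name: str
    display_name: str = ""
    description: str = ""
    terms: list = field(default_factory=list)
    sample_values: list = field(default_factory=list)

def describe_column(col: ColumnMetadata) -> str:
    """Build a one-line context string from whatever metadata is available."""
    parts = [col.name]
    if col.display_name:
        parts.append(f"display name: {col.display_name}")
    if col.description:
        parts.append(f"description: {col.description}")
    if col.terms:
        parts.append("terms: " + ", ".join(col.terms))
    if col.sample_values:
        parts.append("samples: " + ", ".join(map(str, col.sample_values)))
    return "; ".join(parts)

# Before enrichment, only the cryptic column name is known.
bare = ColumnMetadata(name="CUST_DOB")

# After enrichment, the same column carries a name, description, term, and sample.
enriched = ColumnMetadata(
    name="CUST_DOB",
    display_name="Customer date of birth",
    description="Birth date of the account holder",
    terms=["Date of birth"],
    sample_values=["1984-03-17"],
)

print(describe_column(bare))
print(describe_column(enriched))
```

A request such as "customers born after 1990" has nothing to match against the bare column name, whereas the enriched string mentions the birth date and shows the date format.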
To provide even more context to the Text-to-SQL service, you can upload CSV files with SQL samples to the project. For more information, see Providing additional context for converting natural language queries to SQL.
If you disable natural language queries, vectorization of new metadata is stopped. However, you can enable that feature again at any time so that the metadata is processed again.
You can generate SQL queries from plain text and run these queries on data from the following data sources:
- Amazon RDS for Oracle
- Amazon RDS for PostgreSQL
- Amazon Redshift
- Apache Cassandra
- Apache Hive
- Apache Impala
- DataStax Enterprise
- Google BigQuery
- Greenplum
- IBM Db2
- IBM Db2 Warehouse
- IBM Netezza Performance Server
- IBM watsonx.data Presto
- Microsoft Azure Databricks
- Microsoft Azure SQL Database
- Microsoft SQL Server
- MongoDB
- MySQL
- Oracle
- PostgreSQL
- SAP ASE
- SingleStoreDB
- Snowflake
- Teradata
Metadata enrichment
In projects where generative AI capabilities are enabled, you can run metadata enrichment to add these types of metadata to your data:
- AI-generated display names
- AI-generated descriptions
- Gen AI-based term assignments
Data quality
In projects where generative AI capabilities are enabled, you can have AI automatically generate plain English descriptions for data quality rules and the rule expressions that they contain. These descriptions help users understand and review complex data quality rules.