Ingesting content through the user interface (UI)

Using the watsonx Assistant for Z Management Console, you can upload content stored in a remote S3-compatible location and ingest it into a dedicated collection source.

About this task

Tenant-based content ingestion helps you organize and manage content in a multi-tenant environment. When you ingest content, you select a specific tenant so that all data is associated with that tenant.

Content ingested for a tenant is isolated and accessible only within that tenant’s scope. This ensures secure, tenant-specific data handling and prevents access across tenants.

You can also use the Service Provider option to ingest common content that must be available to every tenant.

For more information about multitenancy, see Multitenancy in watsonx Assistant for Z.
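
The scoping rules described above can be illustrated with a minimal sketch. It is not the product's implementation; the `IngestedDocument` class, the `visible_documents` function, and the sample tenant and file names are hypothetical and serve only to show that a tenant sees its own content plus anything ingested under Service Provider.

```python
from dataclasses import dataclass

# Assumption: shared content is modeled with a marker named after the
# "Service Provider" option in the console. The real storage model may differ.
SERVICE_PROVIDER = "Service Provider"


@dataclass(frozen=True)
class IngestedDocument:
    name: str
    owner: str  # a tenant name, or SERVICE_PROVIDER for shared content


def visible_documents(documents, tenant):
    """Return the documents a tenant can see: its own plus shared ones."""
    return [d for d in documents if d.owner in (tenant, SERVICE_PROVIDER)]


# Hypothetical sample data
docs = [
    IngestedDocument("racf-guide.pdf", "tenant-a"),
    IngestedDocument("db2-runbook.pdf", "tenant-b"),
    IngestedDocument("zos-basics.pdf", SERVICE_PROVIDER),
]

print([d.name for d in visible_documents(docs, "tenant-a")])
# → ['racf-guide.pdf', 'zos-basics.pdf']
```

In this sketch, content owned by tenant-b never appears in the result for tenant-a, which mirrors the isolation described above.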

Procedure

Navigate to your IBM Software Hub instance and log in.
Click in the top-right corner of the UI and select IBM Cloud Pak for Data.
On the WXA4Z Management Console card, click Launch.
Click Content Ingestion, and click Ingest Data.
Note: Alternatively, you can access the Content Ingestion page by following these steps:
1. Log in to the Red Hat OpenShift console.
2. Click Networking > Routes.
3. Click the URL for wxa4z-contentingestion-ui-route.
  You are redirected to the watsonx Assistant for Z Management Console login page.
From the Tenant Name drop-down, select one of the following options:
- Tenant name: Choose this option to ingest content specific to the selected tenant.
- Service Provider: Choose this option to ingest common content that all tenants can access.
Click Ingest Data to proceed.
The Ingest Content modal appears.
Provide the following information:
Remote source type
1. Select the source type, then click Next.
Credentials
1. URL: URL of the remote source from which the content must be ingested.
2. Access Key ID: Access key of the remote source.
3. Secret Access Key: Secret key of the remote source.
4. Bucket Name: Name of the bucket containing the source files.
  
  Note: Supported file formats include: PDF, HTML, DOCX, CSV, XLS, XLSX, PPTX, and Markdown.
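
Before you start an ingestion, it can help to check which objects in the bucket match the supported formats. The following sketch assumes that a format is recognized by its file extension and that you already have a list of object keys; the `split_by_support` function and the extension set are hypothetical and derived only from the format list in the note above.

```python
from pathlib import PurePosixPath

# Assumption: each supported format maps to one common file extension.
# The console might accept other variants (for example .htm or .markdown).
SUPPORTED_EXTENSIONS = {
    ".pdf", ".html", ".docx", ".csv", ".xls", ".xlsx", ".pptx", ".md",
}


def split_by_support(object_keys):
    """Split bucket object keys into (supported, unsupported) lists."""
    supported, unsupported = [], []
    for key in object_keys:
        # Compare extensions case-insensitively; only the last suffix counts.
        extension = PurePosixPath(key).suffix.lower()
        if extension in SUPPORTED_EXTENSIONS:
            supported.append(key)
        else:
            unsupported.append(key)
    return supported, unsupported


# Hypothetical object keys
keys = ["docs/guide.PDF", "notes.txt", "data/report.xlsx", "README.md", "archive.tar.gz"]
ok, skipped = split_by_support(keys)
print(ok)       # → ['docs/guide.PDF', 'data/report.xlsx', 'README.md']
print(skipped)  # → ['notes.txt', 'archive.tar.gz']
```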
Collection Source
1. Collection Source: Choose a collection source where you want to ingest your content. You can either select one from the available list or create a new collection source.
  To create a collection source:
  1. Select New collection source.
  2. Enter a name in the Collection name field.
2. Tabular Support: Stores data in a structured format, enabling the LLM to retrieve information efficiently (enabled by default).
3. PII Check: Filters sensitive information from the content source (enabled by default).
  The following PII checks are currently supported:
  - Financial identifiers: Bank account numbers and credit card numbers, including Amex, Diners, Discover, JCB, Mastercard, Visa, and other types.
  - Government identifiers: Social Security Numbers (SSN).
  - Contact information: Email addresses, IP addresses, MAC addresses, phone numbers, and credentials such as access keys (restricted to bearer tokens only).
  Note: Use the skip_pii flag to disable the check.
4. OCR: Processes and extracts images from the content source.
5. HAP: Checks for Hate, Abuse, and Profanity content. Files containing such content are not ingested (enabled by default).
  
  Note: Use the skip_hap flag to disable the check.
6. Click Ingest to begin processing.
While the ingestion is in progress, the data from the specified bucket appears under the collection source name that you defined.
After the ingestion completes, the collection source appears on the Content Ingestion page. You can:
- Search for a collection source by name using the search icon.
- Filter collection sources by type or ingestion status by using the filter icon.
- Delete collection sources by selecting them and clicking Delete.
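
To get a rough idea of what the PII check in the procedure looks for, you can scan a text sample locally before ingestion. The following sketch is an assumption-laden illustration, not the detection logic that the product uses: the regular expressions, the `find_pii` function, and the Luhn checksum filter for card numbers are all hypothetical, and the sketch covers only email addresses, Social Security Numbers, and credit card numbers.

```python
import re

# Assumption: simplified patterns for illustration only.
EMAIL_RE = re.compile(r"\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b")
SSN_RE = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")
# 13 to 19 digits, optionally separated by single spaces or hyphens.
CARD_RE = re.compile(r"\b(?:\d[ -]?){12,18}\d\b")


def luhn_valid(number):
    """Return True if the digits in `number` pass the Luhn checksum."""
    digits = [int(c) for c in number if c.isdigit()]
    total = 0
    for i, d in enumerate(reversed(digits)):
        if i % 2 == 1:  # double every second digit, counting from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def find_pii(text):
    """Return a dict of PII category -> matches found in `text`."""
    findings = {
        "email": EMAIL_RE.findall(text),
        "ssn": SSN_RE.findall(text),
        "card": [m for m in CARD_RE.findall(text) if luhn_valid(m)],
    }
    # Drop categories with no matches.
    return {kind: hits for kind, hits in findings.items() if hits}


sample = "Contact jane@example.com, SSN 123-45-6789, card 4111 1111 1111 1111."
print(find_pii(sample))
```

A local scan like this can only hint at which files might be affected; the PII check that runs during ingestion remains the authority on what is filtered.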

What to do next

Use the AI Assistant to submit queries and receive relevant responses from the documents you ingested.