Creating a domain and connecting to a data source

Domains collect, transform, and transfer data from multiple data sources to a destination, enabling efficient data flow for analysis and AI models.

About this task

When you configure a domain, the DocumentProcessorCR is configured automatically. The processor streamlines the flow of data for analysis, reporting, and AI models.

Procedure

  1. Go to Content-aware Storage > Domains.
  2. Click Create domain.
  3. In the Create domain dialog, enter the name of the domain and select the document processor engine.
    Note: NVIDIA multimodal requires NVIDIA NIM to be configured first.
  4. Click Next.
  5. Select the content types to be processed by the domain (only applicable to NVIDIA multimodal). Then click Next.
  6. Select an option to either connect a new data source or use an existing one.
    1. If you choose to use an existing data source, click Finish. The domain is created.
    2. If you choose to connect a new data source, click Next and continue with the remaining steps to enter the required data source details. For more information about the data source fields, see Creating a data source. When you complete these steps, both the data source and the domain are created.

    If the data source status remains in the connecting state, ensure that watch creation is enabled. For more information about the configuration, see Configuring Scale user to enable watch creation.

    Note: You can delete a domain from the Domain details page only if no data source exists for it.

What to do next

You can view file processing metrics and data source status for each domain. The metrics include the total number of files and the total file size within the selected domain, which give you better visibility into data volume and processing activity.

Tip: To customize the metrics polling interval, set the METRICS_POLLING_INTERVAL_MS parameter in the cas-config ConfigMap located in the CAS installation namespace:
kind: ConfigMap
apiVersion: v1
metadata:
  name: cas-config
  namespace: ibm-cas
data:
  ENABLE_FILE_LEVEL_SECURITY: 'true'
  HF_ACCESS_TOKEN_SECRET_NAME: hf-access-token
  METRICS_POLLING_INTERVAL_MS: '60000'  # Metrics polling interval in milliseconds
  NVMM_EMBED_SERVICE: 'http://nv-ingest-embedqa.nv-ingest.svc.cluster.local'
  NVMM_NIM_SERVICE: 'http://nv-ingest.nv-ingest.svc.cluster.local'
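
As a minimal sketch, the polling interval in the ConfigMap above could also be changed from the command line with a JSON merge patch instead of editing the full ConfigMap. The 30000 value is only an illustrative assumption, and the ibm-cas namespace is taken from the sample above, so substitute your own CAS installation namespace. Whether the running services pick up the new value without a restart is not stated here and is worth verifying in your environment.

```shell
#!/bin/sh
# Hypothetical example value: poll metrics every 30 seconds (30000 ms).
INTERVAL_MS=30000
# Namespace from the sample ConfigMap; replace with your CAS installation namespace.
NAMESPACE=ibm-cas

# Build a JSON merge patch that touches only the polling interval key.
# ConfigMap data values must be strings, so the number is quoted.
PATCH=$(printf '{"data":{"METRICS_POLLING_INTERVAL_MS":"%s"}}' "$INTERVAL_MS")
echo "$PATCH"

# To apply the patch, run the following with cluster access
# (oc can be used in place of kubectl on OpenShift):
# kubectl patch configmap cas-config -n "$NAMESPACE" --type merge -p "$PATCH"
```

A merge patch leaves the other keys in cas-config unchanged, which avoids accidentally overwriting settings such as ENABLE_FILE_LEVEL_SECURITY.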