Configuring Content-Aware Storage (CAS) through CLI

You can configure CAS from the command line by applying custom resources (CRs).

Ensure all the prerequisites are met. See Planning and prerequisites. To install CAS from the user interface of IBM Fusion, see Installing Content-Aware Storage (CAS).

Create a Scale datasource

oc apply -f - <<EOF
apiVersion: cas.isf.ibm.com/v1beta1
kind: DataSource
metadata:
  name: <datasource name>
spec:
  scale:
    fileSystemName: <filesystem name>
    path: <fileset path>
EOF

Create an S3 datasource

As a prerequisite, create a key/value secret in the CAS namespace from the OpenShift® console. For other prerequisites, see Prerequisites for S3 type data source.
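The secret can presumably also be created from the command line instead of the console. The following is a minimal sketch under stated assumptions, not a documented CAS procedure: the function name render_s3_secret, the secret name my-s3-secret, and the key names accessKey and secretKey are hypothetical placeholders. Whatever key names you choose would then go into accessKeyRef.key and secretKeyRef.key of the DataSource.

```shell
# Render a generic key/value Secret manifest for the S3 credentials.
# ASSUMPTION: the key names accessKey and secretKey are hypothetical;
# CAS does not prescribe them in this document.
render_s3_secret() {
  secret_name=$1
  access_key=$2
  secret_key=$3
  cat <<EOF
apiVersion: v1
kind: Secret
metadata:
  name: ${secret_name}
  namespace: ibm-cas
type: Opaque
stringData:
  accessKey: '${access_key}'
  secretKey: '${secret_key}'
EOF
}

# Preview the manifest first. Once it looks right, pipe it to the cluster:
#   render_s3_secret my-s3-secret "$ACCESS_KEY" "$SECRET_KEY" | oc apply -f -
render_s3_secret my-s3-secret EXAMPLE_ACCESS_KEY EXAMPLE_SECRET_KEY
```

Rendering to standard output before applying keeps the credentials out of a file on disk and lets you review the manifest first.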

The following example is based on Amazon Web Services (AWS):
apiVersion: cas.isf.ibm.com/v1beta1
kind: DataSource
metadata:
  name: <datasource name>
  namespace: ibm-cas
spec:
  s3:
    bucket: <bucket name>
    endpoint: <endpoint>
    credentials:
      accessKeyRef:
        key: ""
      secretKeyRef:
        key: ""
      secretName: <Secret name>
    fileSystemName: <filesystem name>
    provider: aws
If a group ID (GID) is defined for the underlying fileset, run the following command to add the annotation to the previously created datasource:
oc annotate DataSource <datasource name> group-id='310' --overwrite
Here, <datasource name> and 310 are example values. Replace them with your datasource name and the GID that is set in Scale.

Create a domain and add a datasource to it

kubectl apply -f - <<EOF
apiVersion: cas.isf.ibm.com/v1beta1
kind: Domain
metadata:
  name: <domain name>
spec: 
  dataSources: 
  - <datasource name>
  collections:
  - name: "<domain name>"
EOF
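Because dataSources is a list, a domain can presumably reference more than one datasource. The following sketch renders such a Domain manifest from arguments. The function name render_domain and the example names finance, ds-scale, and ds-s3 are hypothetical, and attaching multiple datasources is an assumption based only on the list syntax shown above.

```shell
# Render a Domain manifest.
# First argument: domain name. Remaining arguments: datasource names.
render_domain() {
  domain=$1
  shift
  printf 'apiVersion: cas.isf.ibm.com/v1beta1\n'
  printf 'kind: Domain\n'
  printf 'metadata:\n'
  printf '  name: %s\n' "$domain"
  printf 'spec:\n'
  printf '  dataSources:\n'
  # One list entry per datasource name passed on the command line.
  for ds in "$@"; do
    printf '  - %s\n' "$ds"
  done
  printf '  collections:\n'
  printf '  - name: "%s"\n' "$domain"
}

# Preview the manifest. To apply it:
#   render_domain finance ds-scale ds-s3 | oc apply -f -
render_domain finance ds-scale ds-s3
```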

Create a document processor

The type field of the document processor specifies the document processing engine that is used to ingest documents. CAS can be configured to use either NVIDIA Multimodal Services or Docling Multimodal Services.

The keys for each type are as follows:
  • NVIDIA Multimodal Processing Engine - nvidia_multimodal
    oc apply -f - <<EOF
    apiVersion: cas.isf.ibm.com/v1beta1
    kind: DocumentProcessor
    metadata:
      name: <domain name>
    spec:
      type: nvidia_multimodal
      domains:
      - <domain name>
    EOF
  • Docling Multimodal Processing Engine - docling_multimodal
    oc apply -f - <<EOF
    apiVersion: cas.isf.ibm.com/v1beta1
    kind: DocumentProcessor
    metadata:
      name: <domain name>
    spec:
      type: docling_multimodal
      domains:
      - <domain name>
    EOF

Advanced configuration for the NVIDIA Multimodal document processor

To extract content from tables, charts, and images in the documents of the dataset, enable additional settings in the document processor custom resource (CR).
Note: Enabling these options increases document processing time.
Supported file types for NVIDIA Multimodal document processor: .pdf, .txt, .html, .json, .md, .docx, .pptx, .bmp, .jpeg, .png, .tiff, .sh

Supported options for nvidia_multimodal extract task:

  • extractCharts: Extracts charts and graphs from documents.
  • extractTables: Extracts tabular data from documents.
  • extractImages: Extracts content from image files and images that are contained within documents.
The following example shows how to configure these options in the nvidia_multimodal extract task.
oc apply -f - <<EOF
apiVersion: cas.isf.ibm.com/v1beta1
kind: DocumentProcessor
metadata:
  name: <domain name>
spec:
  type: nvidia_multimodal
  tasks:
    - name: Extract
      nvidiaExtract:
        extractCharts: true
        extractImages: true
        extractTables: true
        extractText: true
        extractTextDepth: page
    - name: Split
      nvidiaSplit:
        chunkOverlap: 150
        chunkSize: 1024
        tokenizer: meta-llama/Llama-3.2-1B
    - name: Embed
      nvidiaEmbed: {}
  domains:
  - <domain name>
EOF
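To get a feel for how chunkSize and chunkOverlap interact, the following sketch estimates how many chunks a document yields. It assumes a simple sliding window in which each chunk after the first advances by chunkSize minus chunkOverlap tokens. That model is an assumption for illustration only, the actual behavior of the NVIDIA split task might differ, and the function name estimate_chunks is hypothetical.

```shell
# Estimate the chunk count for a document of a given token length.
# ASSUMPTION: sliding window with stride = chunkSize - chunkOverlap.
estimate_chunks() {
  total=$1
  size=$2
  overlap=$3
  stride=$((size - overlap))
  if [ "$total" -le "$size" ]; then
    # The whole document fits into a single chunk.
    echo 1
  else
    # One full chunk, plus the remaining tokens divided by the stride, rounded up.
    echo $(( 1 + (total - size + stride - 1) / stride ))
  fi
}

# A 10000-token document with the values from the example above.
estimate_chunks 10000 1024 150   # prints 12
```

Under this model, a larger overlap shortens the stride and so increases the number of chunks to embed.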
If you want to configure file ingestion filtering by using Data Cataloging, add the additional file types that the NVIDIA Multimodal document processor supports to the Data Cataloging filtering rule:
oc patch rule allowed-filetypes -n ibm-data-cataloging --type merge -p '{"spec": {"metadata": [{"field": "filetype","operator": "in","value": ["doc", "docx", "pdf", "txt", "gif", "html", "json", "md", "pptx", "bmp", "jpeg", "png", "tiff", "sh" ]}]}}'
rule.datacataloging.ibm.com/allowed-filetypes patched
Note: You must update the filtering rule every time you enable Data Cataloging filtering, including when you enable it again after disabling it.
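Because the rule must be patched repeatedly, it can help to generate the patch body from a list of extensions rather than editing the JSON by hand. The following is a minimal sketch: the function name build_filetype_patch is hypothetical, and the JSON layout simply mirrors the patch command shown above.

```shell
# Build the merge-patch body for the allowed-filetypes rule
# from the file extensions that are passed as arguments.
build_filetype_patch() {
  values=""
  for ext in "$@"; do
    # Append each extension as a quoted JSON string, comma-separated.
    values="${values:+$values, }\"$ext\""
  done
  printf '{"spec": {"metadata": [{"field": "filetype", "operator": "in", "value": [%s]}]}}\n' "$values"
}

# Preview the patch body. To apply it:
#   oc patch rule allowed-filetypes -n ibm-data-cataloging --type merge \
#     -p "$(build_filetype_patch pdf txt md)"
build_filetype_patch pdf txt md
```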

Create a CAS Resource Access Control (CRAC) CR

apiVersion: cas.isf.ibm.com/v1beta1
kind: CasResourceAccessControl
metadata:
  name: <CR name, for example "<Domain name>-access">
  namespace: <CAS installation namespace>
spec:
  resourceRef:
    name: <Domain name>
    type: Domain
  subjects:
    groups:
      - name: <Name of the IDP/OpenShift group that needs access to the domain>
        role: READER
    users:
      - name: <Name of the IDP/OpenShift user who needs access to the domain>
        role: READER

The CAS Resource Access Control (CRAC) CR status contains validation details for resources, users, and groups.

  • If the resource is not valid, the condition status indicates the exact error.

  • If users in the specified user list are not valid, the condition status indicates the names of the invalid users.

  • If groups in the specified group list are not valid, the condition status indicates the names of the invalid groups.

  • If the resource, users, and groups are valid, the condition status reflects the validation status message.
    The following example shows how the status message appears in the CRAC CR:
    status:
      conditions:
        - lastTransitionTime: '2025-08-29T02:15:42Z'
          message: 'Resource: gt20 is validated'
          observedGeneration: 1
          reason: AsExpected
          status: 'True'
          type: ValidatedResource
        - lastTransitionTime: '2025-08-29T02:15:42Z'
          message: Users are validated
          observedGeneration: 1
          reason: AsExpected
          status: 'True'
          type: ValidatedUsers
        - lastTransitionTime: '2025-08-29T02:15:42Z'
          message: Groups are validated
          observedGeneration: 1
          reason: AsExpected
          status: 'True'
          type: ValidatedGroups
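
The conditions can also be checked from a script. The following sketch lists every condition whose status is not True. The helper name failed_conditions is hypothetical, and the oc command in the comment assumes that the CR can be addressed by its kind name, CasResourceAccessControl.

```shell
# Read "type=status" lines on standard input and print the
# condition types whose status is anything other than True.
failed_conditions() {
  awk -F= '$2 != "True" { print $1 }'
}

# Against a cluster, the input could come from a jsonpath query
# (the placeholders must be replaced with real values):
#   oc get CasResourceAccessControl <CR name> -n <CAS installation namespace> \
#     -o jsonpath='{range .status.conditions[*]}{.type}={.status}{"\n"}{end}' \
#     | failed_conditions

# Offline demonstration with sample data:
printf 'ValidatedResource=True\nValidatedUsers=False\nValidatedGroups=True\n' | failed_conditions
```

An empty result means that the resource, users, and groups all passed validation.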

To configure CAS to restrict specific users to specific CAS domains through the user interface, see Configuring CAS Resource Access Control (CRAC).

To remove search access for a domain through the CLI, delete the relevant users and groups from the CRAC CR and save the changes. The removed users and groups then lose authorization to access the query search API for the domain.
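
As an alternative to editing the CR interactively, a single entry could be removed with a JSON patch. The following sketch only builds the patch body. The function name build_remove_patch is hypothetical, the path follows the spec.subjects layout of the CRAC CR shown earlier, and the index is the zero-based position of the entry in the users or groups list, which you must look up first.

```shell
# Build a JSON patch that removes one entry from spec.subjects.
# First argument: "users" or "groups". Second argument: zero-based list index.
build_remove_patch() {
  list=$1
  index=$2
  printf '[{"op": "remove", "path": "/spec/subjects/%s/%s"}]\n' "$list" "$index"
}

# Preview the patch body. To apply it (placeholders must be replaced):
#   oc patch CasResourceAccessControl <CR name> -n <CAS installation namespace> \
#     --type json -p "$(build_remove_patch users 0)"
build_remove_patch users 0
```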