Setting size limits for files in external vector stores
You can set custom file size limits for documents in external vector stores that are used to ground foundation model prompts with contextual information.
Before you begin
The IBM watsonx.ai service must be installed.
You must be a cluster administrator.
Procedure
You can change the default file size limits for documents stored in external vector stores such as Elasticsearch and watsonx.data™ Milvus.
- Setting file size limits for the cluster
- Edit the watsonxaiifm custom resource to specify the size limit in megabytes (MB), kilobytes (KB), or gigabytes (GB) for each file type stored in a vector data store. The file size limits apply across the cluster.

The following table describes which attributes you can set in your custom resource to specify size limits for various file types:

| File type | Custom resource attribute |
| --- | --- |
| CSV | csv_file_type_limit |
| DOC | doc_file_type_limit |
| HTML | html_file_type_limit |
| JSON | json_file_type_limit |
| PPTX | pptx_file_type_limit |
| PDF | pdf_file_type_limit |
| TXT | txt_file_type_limit |
| YAML | yaml_file_type_limit |
| XLS | xls_file_type_limit |
| XML | xml_file_type_limit |

For example, run the following command to set size limits for PDF and HTML files in your vector data store:

    oc patch watsonxaiifm watsonxaiifm-cr \
      --namespace=${PROJECT_CPD_INST_OPERANDS} \
      --type=merge \
      --patch='{"spec":{"file_limits": {"pdf_file_type_limit": "50MB", "html_file_type_limit": "20MB"}}}'

Attention: If you override the file size limits for your cluster and then restart the watsonxaiifm operator during a service upgrade, the cluster-level settings are removed and the default file size limits are applied. You must reapply any cluster-level file size limit override configuration.

- Setting file size limits for a project
- Use the asset files API to set the size limit for each file type stored in your vector data store.
For API method details, see Data and AI Common Core Software APIs.

Note: If you specify both the cluster-level and project-level file size limit settings, the project-level settings take precedence and are applied to your installation.

Review the following requirements for project-level file size limit configuration:
- You must provide the file size limit override configuration in JSON format. If the JSON is invalid, the override configuration is not set correctly.
- If you do not set the limit for a specific file type, the cluster-level setting for that file type is used.
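The precedence described above (project-level setting first, then the cluster-level setting, then the default) can be pictured with a minimal shell sketch. The function name effective_limit and the values used in the comments are hypothetical and only illustrate the fallback order; they are not part of the product and do not reflect the actual default limits.

```shell
#!/bin/sh
# Hypothetical helper that prints the limit that takes effect for one file type.
#   $1 = project-level value (empty if not set)
#   $2 = cluster-level value (empty if not set)
#   $3 = default value (placeholder; the real defaults are not listed here)
effective_limit() {
  project_limit=$1
  cluster_limit=$2
  default_limit=$3
  if [ -n "$project_limit" ]; then
    # A project-level override takes precedence over everything else.
    printf '%s\n' "$project_limit"
  elif [ -n "$cluster_limit" ]; then
    # No project-level value: the cluster-level setting is used.
    printf '%s\n' "$cluster_limit"
  else
    # Neither is set: the default limit applies.
    printf '%s\n' "$default_limit"
  fi
}

# Example calls (all values are illustrative):
#   effective_limit "10MB" "50MB" "25MB"   # project-level value wins
#   effective_limit ""     "50MB" "25MB"   # falls back to the cluster-level value
```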
The following table lists the JSON configuration attribute for each file type:

| File type | JSON configuration attribute |
| --- | --- |
| CSV | WX_MIME_TYPE_CSV |
| DOC | WX_MIME_TYPE_DOC |
| HTML | WX_MIME_TYPE_HTML |
| JSON | WX_MIME_TYPE_JSON |
| PPTX | WX_MIME_TYPE_PPTX |
| PDF | WX_MIME_TYPE_PDF |
| TXT | WX_MIME_TYPE_TXT |
| YAML | WX_MIME_TYPE_YAML |
| XLS | WX_MIME_TYPE_XLS |
| XML | WX_MIME_TYPE_XML |

You can set file size limits for multiple projects simultaneously in a single configuration file. Run the following request to override default values and set custom file size limits specified in a JSON configuration file:
    curl --location --request PUT '<cluster_url>/v2/asset_files/config/override_config.json?account_id=999&root=true' \
      --header "Authorization: Bearer ${ACCESS_TOKEN}" \
      --form 'file=@"/Users/<user_system_name>/Documents/override_config.json"'

Important: When you run the asset files API cURL command, specify limits for every project for which you want to change the file size limits. To preserve the override settings for existing projects and update the configuration for new projects, include the configuration for the complete list of affected projects in your workspace in the JSON configuration file. Settings are deleted for any projects that are not included in the configuration file.

The following example override_config.json file sets custom size limits for several file types, including PDF and TXT, for two projects:
    {
      "project_overrides": {
        "<watsonx project ID 1>": {
          "vector_indexes": {
            "WX_MIME_TYPE_PDF": "10MB",
            "WX_MIME_TYPE_TXT": "10MB",
            "WX_MIME_TYPE_CSV": "10MB",
            "WX_MIME_TYPE_HTML": "10MB",
            "WX_MIME_TYPE_JSON": "10MB",
            "WX_MIME_TYPE_XLS": "10MB",
            "WX_MIME_TYPE_PPTX": "10MB",
            "WX_MIME_TYPE_DOC": "10MB"
          }
        },
        "<watsonx project ID 2>": {
          "vector_indexes": {
            "WX_MIME_TYPE_PDF": "10MB",
            "WX_MIME_TYPE_TXT": "10MB",
            "WX_MIME_TYPE_CSV": "10MB",
            "WX_MIME_TYPE_HTML": "10MB",
            "WX_MIME_TYPE_JSON": "10MB",
            "WX_MIME_TYPE_XLS": "10MB",
            "WX_MIME_TYPE_PPTX": "10MB",
            "WX_MIME_TYPE_DOC": "10MB"
          }
        }
      }
    }

For details about how to retrieve the watsonx™ project ID, see Finding the project ID.

- Optional: Run the following command to verify that your file size limit settings are applied correctly:

    curl --location --request GET '<cluster_url>/v2/asset_files/config/override_config.json?account_id=999&root=true' \
      --header "Authorization: Bearer ${ACCESS_TOKEN}"

The settings can take up to 15 minutes to apply.
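Because an invalid JSON file does not set the override configuration correctly, it can help to check override_config.json locally before you send the PUT request. The following sketch is one possible approach, not a documented procedure. The helper names is_valid_json and is_valid_limit are hypothetical, the JSON check assumes that python3 is available on your workstation, and the limit pattern (a positive integer followed by KB, MB, or GB) is inferred from values such as "50MB" in this topic, so the service might accept other spellings.

```shell
#!/bin/sh
# Hypothetical pre-upload checks for override_config.json.
# None of these helper names come from the product documentation.

# Returns 0 if the file parses as JSON.
# Assumption: python3 is on the PATH; json.tool is part of its standard library.
is_valid_json() {
  python3 -m json.tool "$1" >/dev/null 2>&1
}

# Returns 0 if the value looks like "<positive integer><KB|MB|GB>".
# Assumption: the format is inferred from examples such as "10MB".
is_valid_limit() {
  printf '%s\n' "$1" | grep -Eq '^[1-9][0-9]*(KB|MB|GB)$'
}

# Example use before uploading the file:
#   is_valid_json override_config.json || echo "override_config.json is not valid JSON"
#   is_valid_limit "10MB" || echo "unexpected limit format"
```

These checks cover only syntax and value format. They do not confirm that the project IDs exist or that the service accepted the configuration, so the GET request in the optional verification step is still useful.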
What to do next
To get started with indexing your documents by adding the files to vector data stores, see Adding vectorized documents.