Digitize Documents
Converts physical and scanned documents into searchable, editable digital formats, enabling efficient document management, data extraction, and workflow automation at scale.
Detailed description
The Digitize Documents Service is a REST API microservice that converts PDF and DOCX documents into structured formats (e.g., Markdown, plain text, or JSON) and enables their ingestion into vector databases for retrieval-augmented generation (RAG) applications. It runs the full processing pipeline, from document conversion through text and table extraction, chunking, embedding generation, and indexing, as asynchronous jobs, and it manages multiple requests concurrently.
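To make the asynchronous job model concrete, here is a minimal in-process sketch. It is not the service's implementation or API. Submitting a job returns an ID immediately, a semaphore caps how many jobs run at once, and each job chunks, embeds, and indexes its text. Everything here is a hypothetical stand-in: `DigitizeJobs`, `Job`, `chunk_text`, and `toy_embed` are illustrative names. The conversion step is assumed to be already done, so the input is plain text. Fixed-size overlapping chunks are one common strategy, not necessarily the one the service uses. The embedding is a word-hashing placeholder rather than a real model, and the "vector database" is an in-memory list.

```python
from __future__ import annotations

import asyncio
import hashlib
import math
import uuid
from dataclasses import dataclass


def chunk_text(text: str, size: int = 200, overlap: int = 50) -> list[str]:
    """Split text into fixed-size character windows; neighbours share `overlap` chars."""
    if overlap >= size:
        raise ValueError("overlap must be smaller than size")
    if not text:
        return []
    step = size - overlap
    # The last window start is chosen so the final window reaches the end of the text.
    return [text[i:i + size] for i in range(0, max(len(text) - overlap, 1), step)]


def toy_embed(chunk: str, dim: int = 16) -> list[float]:
    """Placeholder embedding (assumption): hash words into `dim` buckets, then L2-normalize."""
    vec = [0.0] * dim
    for word in chunk.lower().split():
        bucket = int(hashlib.sha256(word.encode()).hexdigest(), 16) % dim
        vec[bucket] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


@dataclass
class Job:
    id: str
    status: str = "queued"  # queued -> running -> completed / failed
    chunks_indexed: int = 0
    error: str | None = None


class DigitizeJobs:
    """Hypothetical stand-in for asynchronous job handling with bounded concurrency."""

    def __init__(self, max_concurrent: int = 2) -> None:
        self._sem = asyncio.Semaphore(max_concurrent)
        self.jobs: dict[str, Job] = {}
        self.index: list[tuple[str, str, list[float]]] = []  # (job_id, chunk, vector)
        self._tasks: set[asyncio.Task] = set()

    def submit(self, text: str) -> str:
        """Schedule a job and return its ID at once; must be called inside a running loop."""
        job = Job(id=str(uuid.uuid4()))
        self.jobs[job.id] = job
        task = asyncio.create_task(self._run(job, text))
        self._tasks.add(task)
        task.add_done_callback(self._tasks.discard)
        return job.id

    async def _run(self, job: Job, text: str) -> None:
        async with self._sem:  # at most `max_concurrent` jobs run at the same time
            job.status = "running"
            try:
                await asyncio.sleep(0)  # yield; real conversion/embedding would do I/O here
                for chunk in chunk_text(text):
                    self.index.append((job.id, chunk, toy_embed(chunk)))
                    job.chunks_indexed += 1
                job.status = "completed"
            except Exception as exc:
                job.status = "failed"
                job.error = str(exc)

    async def wait_all(self) -> None:
        await asyncio.gather(*self._tasks)


async def main() -> None:
    service = DigitizeJobs(max_concurrent=2)
    ids = [service.submit("word " * n) for n in (10, 100, 300)]
    await service.wait_all()
    for job_id in ids:
        job = service.jobs[job_id]
        print(job.status, job.chunks_indexed)


if __name__ == "__main__":
    asyncio.run(main())
    # completed 1
    # completed 3
    # completed 10
```

A real deployment would expose `submit` and a status lookup as HTTP endpoints and write to an actual vector store. The pattern stays the same: accept the request, return a job ID, and let a bounded pool of workers do the heavy processing.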
For full endpoint specifications and integration details, refer to the Digitize Documents API documentation.
Deployment & Usage
The service can be deployed standalone on the Podman container runtime.