We’re excited to announce a new partnership between IBM and Unstructured, an IBM Ventures portfolio company. Together, we’re addressing one of the most significant barriers to scaling enterprise AI: the preparation of unstructured data for generative AI.
Approximately 80% of enterprise data is unstructured, residing in PDFs, emails, collaboration platforms and document repositories. Yet less than 1% of this data is in a format that AI can consume directly. This gap represents both a massive opportunity and a critical challenge for organizations scaling AI initiatives.
Traditional approaches to unstructured data preparation are holding enterprises back. Manual pipelines can take 6 to 12 months to build and remain brittle, breaking with each new document format or source system change. Engineering teams spend valuable time on data plumbing rather than AI innovation. Without proper structure and consistency, AI models deliver unreliable results, undermining trust and delaying time-to-value.
IBM watsonx.data addresses this challenge as the industry’s only hybrid, open data lakehouse built for AI and analytics. It simplifies access, preparation and governance across both structured and unstructured data, helping organizations establish a trusted data foundation for generative AI at scale.
Through this partnership, Unstructured extends watsonx.data's ability to access unstructured data and transform it into AI-ready formats that fuel reliable, scalable and trusted generative AI.
Unstructured provides more than 30 pre-built connectors to enterprise data sources including SharePoint, Google Drive, Salesforce, Confluence, Box and Dropbox. With support for over 70 file types—from PDFs with complex layouts to scanned images, emails and Microsoft Office documents—organizations can access and transform their complete data estate.
Unlike basic text extraction tools, Unstructured’s intelligent document understanding preserves critical elements such as tables, hierarchies and semantic structure, ensuring AI models receive contextually rich data rather than just raw text.
A no-code visual workflow builder empowers business and data teams to design and manage data pipelines without requiring specialized engineering resources. For organizations with development teams, a comprehensive API provides programmatic control and customization options.
Automatic incremental synchronization ingests only new and changed documents, reducing compute costs and keeping AI applications current. Multi-source orchestration coordinates data flows across multiple systems simultaneously, eliminating manual coordination overhead.
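The announcement does not say how Unstructured decides which documents are new or changed. One common technique is to store a content hash per document and reprocess only documents whose hash differs from the previous run, while flagging documents that have disappeared from the source. The sketch below illustrates that idea under those assumptions; the function `plan_incremental_sync` and its in-memory state dictionary are hypothetical and are not part of Unstructured's product or API.

```python
import hashlib
from typing import Dict, List, Tuple


def content_hash(data: bytes) -> str:
    """Fingerprint a document's raw bytes so changes can be detected cheaply."""
    return hashlib.sha256(data).hexdigest()


def plan_incremental_sync(
    source_docs: Dict[str, bytes],
    seen_hashes: Dict[str, str],
) -> Tuple[List[str], List[str], Dict[str, str]]:
    """Hypothetical helper: decide what to (re)process since the last run.

    source_docs maps a document ID to its current bytes in the source system.
    seen_hashes maps a document ID to the hash recorded on the previous run.
    Returns (ids_to_process, ids_to_delete, new_state).
    """
    to_process: List[str] = []
    new_state: Dict[str, str] = {}
    for doc_id, data in sorted(source_docs.items()):
        digest = content_hash(data)
        new_state[doc_id] = digest
        # New documents have no stored hash; changed ones have a different hash.
        if seen_hashes.get(doc_id) != digest:
            to_process.append(doc_id)
    # Documents seen before but missing now should be removed downstream.
    to_delete = sorted(set(seen_hashes) - set(source_docs))
    return to_process, to_delete, new_state


if __name__ == "__main__":
    state: Dict[str, str] = {}
    docs = {"a.pdf": b"v1", "b.docx": b"hello"}
    todo, gone, state = plan_incremental_sync(docs, state)
    print(todo, gone)  # first run: every document is new

    docs = {"a.pdf": b"v2", "b.docx": b"hello", "c.eml": b"new"}
    todo, gone, state = plan_incremental_sync(docs, state)
    print(todo, gone)  # only the changed and added documents are listed
```

Unchanged documents (here `b.docx` on the second run) are skipped entirely, which is where the compute savings of incremental processing come from.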
Unstructured is SOC 2 Type II, ISO 27001, HIPAA, and GDPR compliant, meeting the rigorous security and privacy standards enterprise IT organizations require. Together with watsonx.data, the solution provides version control, data lineage tracking and granular access controls that honor source system permissions throughout the data pipeline.
Unstructured delivers semantically enriched, properly chunked and embedded data optimized for modern AI architectures.
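To illustrate what structure-aware chunking can mean, the following sketch groups parsed document elements into chunks that start at section titles, keep each table intact in its own chunk and stay under a character budget. It assumes documents have already been partitioned into typed elements; the `Element` class, its kind labels and `chunk_elements` are hypothetical names chosen for illustration, not Unstructured's implementation. Embedding each chunk would be a separate step and is omitted here.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Element:
    kind: str  # hypothetical labels: "Title", "NarrativeText", "Table"
    text: str


def chunk_elements(elements: List[Element], max_chars: int = 500) -> List[str]:
    """Group elements into chunks without splitting sections or tables.

    The size budget counts element text only (joining newlines are ignored).
    """
    chunks: List[str] = []
    current: List[str] = []
    size = 0

    def flush() -> None:
        # Close the chunk being built, if any, and start an empty one.
        nonlocal current, size
        if current:
            chunks.append("\n".join(current))
        current, size = [], 0

    for el in elements:
        if el.kind == "Title":
            flush()  # a new section always begins a new chunk
        elif el.kind == "Table":
            flush()
            chunks.append(el.text)  # keep the whole table together
            continue
        elif current and size + len(el.text) > max_chars:
            flush()  # adding this element would exceed the budget
        current.append(el.text)
        size += len(el.text)
    flush()
    return chunks


if __name__ == "__main__":
    doc = [
        Element("Title", "Quarterly report"),
        Element("NarrativeText", "Revenue grew in all regions."),
        Element("Table", "| Region | Revenue |\n| EMEA | 10 |"),
        Element("Title", "Outlook"),
        Element("NarrativeText", "We expect continued growth."),
    ]
    for chunk in chunk_elements(doc):
        print(repr(chunk))
```

Starting a chunk at each title keeps a section's text together, and isolating tables avoids cutting rows apart, which reflects the goal of preserving tables and hierarchies rather than emitting raw text.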
With watsonx.data and Unstructured, teams can move quickly with production-ready pipelines combining speed, flexibility and AI-readiness all in one integrated solution.
If watsonx.data is the data engine powering generative AI applications, Unstructured provides the fuel. Together, watsonx.data and Unstructured deliver AI-ready unstructured data and enable advanced retrieval-augmented generation patterns that improve the accuracy and reliability of AI.
Enterprises can accelerate time-to-value by replacing manual document preparation with automated, intelligent processing. Governance policies flow from document source systems all the way to the AI applications, improving trust and transparency at every stage. By removing the bottleneck of unstructured data preparation and providing a data foundation with unified data access, preparation and governance, organizations can finally unlock the full potential of their unstructured content to power reliable, enterprise-grade AI.
To see watsonx.data and Unstructured in action, join our upcoming joint webinar or book a meeting. Together, we’ll help you move from spending time preparing messy, unstructured data to accelerating enterprise-grade AI agents and applications, powered by AI-ready data, at scale.