Date: 2025-04-28

Unstructured data is information that does not have a predefined format. Unstructured datasets are massive (often terabytes or petabytes of data) and account for 90% of all enterprise-generated data.[1]

The proliferation of unstructured data is driven by its diverse and extensive data sources—including text documents, social media, image and audio files, instant messages and smart devices. Almost all new data generated today is unstructured: every message sent, photo uploaded or sensor triggered adds to the growing volume.

Unlike structured data (which has a predefined data model), unstructured data does not easily conform to the fixed schemas of conventional relational databases. Instead, unstructured data is often stored in file systems, in non-relational (or NoSQL) databases or in data lakes.
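The contrast between a fixed schema and schema-free storage can be sketched in a few lines of Python. This is a minimal illustration rather than a recommended architecture: the `orders` table, the sample records and the helper `schema_rejects_extra_value` are all hypothetical, and a list of JSON strings merely stands in for a document store or data lake.

```python
import json
import sqlite3

# Structured: every row must fit the columns declared up front.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT, total REAL)")
conn.execute("INSERT INTO orders VALUES (?, ?, ?)", (1, "Ada", 19.99))
row = conn.execute("SELECT customer, total FROM orders WHERE id = 1").fetchone()


def schema_rejects_extra_value(connection):
    """Return True if the table refuses a row with more values than columns."""
    try:
        connection.execute(
            "INSERT INTO orders VALUES (?, ?, ?, ?)", (2, "Bob", 5.0, "gift note")
        )
    except sqlite3.OperationalError:
        return True
    return False


# Schema-free stand-in: each record carries whatever fields it happens to have.
# (Hypothetical sample records; the free-text body is the unstructured part.)
documents = [
    {"type": "email", "body": "Hi, my order arrived damaged. Can you help?"},
    {"type": "image", "path": "photos/box.jpg", "size_bytes": 204800},
]
stored = [json.dumps(doc) for doc in documents]
field_sets = [set(json.loads(item)) for item in stored]

if __name__ == "__main__":
    print(row)
    print(schema_rejects_extra_value(conn))
    print(field_sets)
```

The relational table refuses a row that does not match its declared columns, while the document list accepts records with different fields without complaint. That flexibility is what makes such stores a natural home for unstructured content, at the cost of pushing interpretation to read time.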

Unstructured data’s complexity and nonuniform structure also call for more sophisticated methods of data analysis. Technologies such as machine learning (ML) and natural language processing (NLP) are commonly used to extract insights from unstructured datasets.
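As a toy illustration of pulling a signal out of free text, the sketch below counts the most frequent meaningful words across a few messages. It is a deliberately simple stand-in for NLP, not a real pipeline: the sample messages, the stop-word list and the function names `tokenize` and `top_terms` are assumptions made for this example, and production systems typically rely on trained models rather than word counts.

```python
import re
from collections import Counter

# Hypothetical free-text support messages (illustrative data only).
MESSAGES = [
    "The app crashes whenever I upload a photo.",
    "Photo upload is slow and the app crashes on login.",
    "Love the new design, but login fails sometimes.",
]

# Small illustrative stop-word list; real pipelines use far richer resources.
STOP_WORDS = {"the", "a", "i", "is", "and", "on", "but", "whenever", "new"}


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def top_terms(messages, n=5):
    """Count non-stop-word tokens across all messages; return the n most common."""
    counts = Counter(
        token
        for message in messages
        for token in tokenize(message)
        if token not in STOP_WORDS
    )
    return counts.most_common(n)


if __name__ == "__main__":
    for term, count in top_terms(MESSAGES):
        print(term, count)
```

Even this crude count surfaces recurring themes (crashes, uploads, login) that no column in a conventional table would have captured, which is the kind of insight ML and NLP techniques extract at much larger scale.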

In the recent past, unstructured data was considered dark data: information that organizations collect and store but rarely put to use. The challenges of unstructured data (that is, its volume and lack of uniformity) rendered it unusable for many business use cases.

Today, however, enterprises with abundant unstructured data possess a significant strategic asset. When combined, structured and unstructured data provide a complete view of data across an enterprise. Unstructured data can also help businesses unlock the full potential of generative AI (gen AI), which makes it especially relevant now.