Dark data may be structured, unstructured or semi-structured, and its type affects how easily it can be discovered for timely and complete data analytics initiatives.
Structured data is information added to clearly defined spreadsheet or database fields before being stored.
Server log files, Internet of Things (IoT) sensor data, customer relationship management (CRM) databases and enterprise resource planning (ERP) systems are examples of dark data created from structured data sources.
Although most forms of sensitive data, such as electronic bank statements, medical records and encrypted customer data, are typically in structured form, they are difficult to view and categorize because of permission issues.
Unlike structured data, unstructured data includes information that can’t be organized in databases or spreadsheets for analysis without conversion, codification, tiering and structuring.
Email correspondence, PDFs, text documents, social media posts, call center recordings, chat logs and surveillance video footage are examples of dark data created from unstructured data sources.
Semi-structured data is unstructured data that contains some information in defined data fields. Although it is not as easy to discover as structured data, it can be searched or catalogued.
Examples include HTML code, invoices, graphs, tables and XML documents.
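To illustrate why semi-structured data can still be searched or catalogued, here is a minimal Python sketch that pulls the defined fields out of an XML document. The invoice layout, its tag names and the `catalogue_invoice` helper are hypothetical, chosen only for this example; real documents will vary.

```python
import xml.etree.ElementTree as ET

# Hypothetical invoice: tagged fields plus a free-text note.
INVOICE_XML = """
<invoice id="INV-001">
  <customer>Acme Corp</customer>
  <total currency="USD">1250.00</total>
  <note>Customer asked for delivery before the end of the month.</note>
</invoice>
"""


def catalogue_invoice(xml_text):
    """Extract the defined fields of one invoice into a flat catalogue record."""
    root = ET.fromstring(xml_text.strip())
    total = root.find("total")
    return {
        "id": root.get("id"),
        "customer": root.findtext("customer"),
        "total": float(total.text),
        "currency": total.get("currency"),
        # The note is free text: it can be stored and keyword-searched,
        # but it has no fixed schema to query against.
        "note": root.findtext("note"),
    }


record = catalogue_invoice(INVOICE_XML)
print(record["customer"], record["total"], record["currency"])
```

The tagged fields map cleanly onto catalogue columns, while the note stays unstructured text inside the same document, which is the mix that makes the data semi-structured.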