Data Cataloging architecture

Data Cataloging is an extensible platform that provides exabyte-scale data ingest, data visualization, data activation, and business-oriented data mapping.

Note: All references to "Spectrum Discover" in the images refer to "Data Cataloging".

Exabyte-scale data ingest

  • Scan billions of files and objects in a day
  • Real-time event notifications
  • Automatic indexing

Data visualization

  • Fast queries of billions of records
  • Multi-faceted search
  • Drill-down dashboard

Data activation

  • Application software development kit (SDK)
  • Extensible architecture
  • Solution blueprints

Business-oriented data mapping

  • System-level data tagging
  • Contextual data tagging
  • Policy-driven workflows
The following figure illustrates a high-level view of the Data Cataloging architecture.
Figure 1. Data Cataloging architecture
Data Cataloging architecture diagram

Data management

Data Cataloging connects to the data sources shown in Figure 1 (Data Cataloging architecture), and automatically harvests and indexes their system metadata. System metadata might include the following information:
  • The names of the files and objects.
  • The bucket or path where the data resides.
  • The size of the data.
  • The time the data was last modified.
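
As an illustration only, the system metadata attributes listed above can be modeled as a simple record. The following Python sketch harvests such records from a local directory tree. The `SystemMetadata` class and the `harvest` function are hypothetical names chosen for this example; they are not part of Data Cataloging and do not reflect how the product connects to its data sources.

```python
import os
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass
class SystemMetadata:
    """Hypothetical record holding the system metadata of one file."""
    name: str           # file or object name
    path: str           # directory (or bucket) where the data resides
    size_bytes: int     # size of the data in bytes
    modified: datetime  # time the data was last modified (UTC)


def harvest(root):
    """Walk a local directory tree and collect one record per file."""
    records = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for filename in filenames:
            info = os.stat(os.path.join(dirpath, filename))
            records.append(SystemMetadata(
                name=filename,
                path=dirpath,
                size_bytes=info.st_size,
                modified=datetime.fromtimestamp(info.st_mtime, tz=timezone.utc),
            ))
    return records
```

Calling `harvest("/some/directory")` returns a list of records that could then be indexed or queried by any of the four attributes.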

After the data is ingested, analytics are automatically applied to classify and group the data according to the different system metadata attributes. The data can be inspected automatically in Data Cataloging by using built-in content search capabilities to identify sensitive and personally identifiable information and perform data classification. The content inspection capabilities can also be used by researchers and data scientists to extract content from their data sets. This easy-to-use extraction ability assists with data discovery.
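
To make the two ideas in this paragraph concrete, the following Python sketch groups records by a system metadata attribute (size) and scans text content for one kind of personally identifiable information (email addresses). It is a simplified illustration under stated assumptions: the size thresholds, the `group_by_size` and `contains_email` functions, and the regular expression are all invented for this example and do not represent the analytics or content search rules that Data Cataloging applies.

```python
import re
from collections import defaultdict

# Assumed, deliberately simple pattern; real content inspection
# would rely on more robust and more varied rules.
EMAIL_PATTERN = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def group_by_size(records):
    """Group (name, size_bytes) pairs into coarse, assumed size classes."""
    groups = defaultdict(list)
    for name, size_bytes in records:
        if size_bytes < 1_000_000:
            size_class = "small"
        elif size_bytes < 1_000_000_000:
            size_class = "medium"
        else:
            size_class = "large"
        groups[size_class].append(name)
    return dict(groups)


def contains_email(text):
    """Return True if the text appears to contain an email address."""
    return EMAIL_PATTERN.search(text) is not None
```

A file whose content makes `contains_email` return `True` could, for example, be classified as potentially sensitive and flagged for review.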

The records that are maintained by Data Cataloging can also be enriched with custom metadata tags that map the data to business constructs and further increase the value of the data.
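
The following Python sketch shows one way that such enrichment could work in principle: rules keyed on a path prefix attach business-oriented tags to matching records. The record layout, the `apply_tags` function, and the example tag names (`department`, `project`) are assumptions made for illustration, not the tagging interface of Data Cataloging.

```python
def apply_tags(record, rules):
    """Return a copy of the record with tags from every matching rule.

    record: dict with at least a "path" key and an optional "tags" dict.
    rules:  list of (path_prefix, tags_dict) pairs (hypothetical format).
    """
    tags = dict(record.get("tags", {}))
    for path_prefix, new_tags in rules:
        if record["path"].startswith(path_prefix):
            tags.update(new_tags)
    # Build a new dict so that the original record is left unchanged.
    return {**record, "tags": tags}


# Example rule: everything under this path belongs to a research project.
rules = [
    ("/projects/genomics", {"department": "research", "project": "genomics"}),
]
record = {"name": "run42.bam", "path": "/projects/genomics/raw"}
tagged = apply_tags(record, rules)
```

After tagging, records can be searched by business terms such as a project or department rather than by path alone.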

You can use the Data Cataloging catalog to gain insight about your data and to find your data easily.

The Data Cataloging architecture also supports a community-supported catalog of open source applications that enhance and customize the capabilities of Data Cataloging with third-party extensions. Users can find and install available applications and can develop and share new applications that use an SDK that contains sample code and a fully published API. For more information, see the topic Creating your own applications to use in the Data Cataloging application catalog in the Administration section.

Figure 2. Application SDK architecture
Application SDK architecture