What is a data lake?

A data lake is a shared data environment that comprises multiple repositories and capitalizes on big data technologies. It provides data to an organization for a variety of analytics uses, including:

  • discovery and exploration of data
  • simple ad hoc analytics
  • complex analysis for business decisions
  • reporting
  • real-time analytics

Organizations are increasingly exploring the data lake approach to address demands for an agile yet secure and well-governed data environment that supports both structured and unstructured data.

 

It helps to be clear about what a data lake is and what it isn't.

A data lake is . . .

  • An environment where users can access vast amounts of raw data
  • An environment for developing and proving an analytics model, and then moving it into production
  • An analytics sandbox for exploring data to gain insight
  • An enterprise-wide catalog that helps users find data and link business terms with technical metadata
  • An environment for enabling reuse of data transformations and queries
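
To make the catalog idea above more concrete, here is a minimal sketch of how business terms might be linked to technical metadata so that users can find data by the vocabulary they already use. It is an in-memory illustration only, not how any particular data lake product implements its catalog. The class names (CatalogEntry, Catalog), the dataset names, and the storage paths are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class CatalogEntry:
    """One dataset in the lake, described by its technical metadata."""
    name: str          # hypothetical dataset name
    location: str      # hypothetical storage path
    data_format: str   # e.g. "parquet", "csv", "json"
    business_terms: set = field(default_factory=set)


class Catalog:
    """Toy catalog: maps business terms to the datasets that carry them."""

    def __init__(self):
        self._entries = {}

    def register(self, entry):
        # Store the entry under its dataset name.
        self._entries[entry.name] = entry

    def link_term(self, dataset_name, term):
        # Terms are stored lowercased so lookups are case-insensitive.
        self._entries[dataset_name].business_terms.add(term.lower())

    def find_by_term(self, term):
        # Return the names of all datasets linked to the term, sorted.
        wanted = term.lower()
        return sorted(
            name for name, entry in self._entries.items()
            if wanted in entry.business_terms
        )


# Hypothetical datasets, for illustration only.
catalog = Catalog()
catalog.register(CatalogEntry("crm_accounts_raw", "/lake/raw/crm/accounts", "parquet"))
catalog.register(CatalogEntry("web_clickstream_raw", "/lake/raw/web/clicks", "json"))
catalog.link_term("crm_accounts_raw", "Customer")
catalog.link_term("web_clickstream_raw", "Customer")
catalog.link_term("web_clickstream_raw", "Web Session")

matches = catalog.find_by_term("customer")
print(matches)  # ['crm_accounts_raw', 'web_clickstream_raw']
```

A real catalog would typically persist this metadata and add governance controls such as ownership and access policies; the sketch only shows the term-to-dataset lookup.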

A data lake is not . . .

  • A data warehouse or data mart for housing all of the data in an enterprise
  • A replacement for an operational data store (ODS)
  • A high-performance production environment
  • A production reporting application
  • A purpose-built system to solve a specific problem (though a purpose-built data mart could be fed from a data lake)
