What is a data lake?

A data lake is a shared data environment that comprises multiple repositories and capitalizes on big data technologies. It provides data to an organization for a variety of analytics processing, including:

  • discovery and exploration of data
  • simple ad hoc analytics
  • complex analysis for business decisions
  • reporting
  • real-time analytics

Organizations are increasingly exploring the data lake approach to address demands for an agile yet secure and well-governed data environment that supports both structured and unstructured data.


Learn more about what a data lake is, and what it isn't.

A data lake is . . .

  • An environment where users can access vast amounts of raw data
  • An environment for developing and proving an analytics model, and then moving it into production
  • An analytics sandbox for exploring data to gain insight
  • An enterprise-wide catalog that helps users find data and link business terms with technical metadata
  • An environment for enabling reuse of data transformations and queries
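
To make the catalog idea above more concrete, here is a minimal sketch of how business terms might be linked to technical metadata. It is an illustration only, not a description of any particular product: the class names (`TechnicalAsset`, `Catalog`), their fields, and the repository and path values are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TechnicalAsset:
    """Technical metadata for one data set (all fields are hypothetical)."""
    repository: str   # which repository in the lake holds the data
    path: str         # location of the data set inside that repository
    data_format: str  # storage format, e.g. "parquet" or "csv"


class Catalog:
    """Toy catalog that links business terms to technical assets."""

    def __init__(self):
        # Maps a lower-cased business term to the assets linked to it.
        self._terms = {}

    def link(self, term, asset):
        """Associate a business term with a technical asset."""
        self._terms.setdefault(term.lower(), []).append(asset)

    def find(self, term):
        """Return the assets linked to a term (case-insensitive lookup)."""
        return list(self._terms.get(term.lower(), []))


# Example: register one asset under a business term, then look it up.
catalog = Catalog()
churn_events = TechnicalAsset("raw-zone", "/crm/churn_events", "parquet")
catalog.link("Customer churn", churn_events)
matches = catalog.find("customer churn")
```

In this sketch a user searches by the business term they know ("customer churn") and gets back the repository, location, and format of the matching data, which is the kind of lookup an enterprise-wide catalog is meant to support. A real catalog would also need governance features such as ownership, lineage, and access control, which are omitted here.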

A data lake is not . . .

  • A data warehouse or data mart for housing all of the data in an enterprise
  • A replacement for an operational data store (ODS)
  • A high-performance production environment
  • A production reporting application
  • A purpose-built system to solve a specific problem (though a purpose-built data mart could be fed from a data lake)


Data Lake: Taming the Data Dragon

Learn how a data lake, as a trusted data asset, benefits IT and data scientists. It allows agility in responding to project needs while keeping costs down, maintaining the resilience of business-critical data provisioning, and preventing ungoverned data environments and usage from springing up.

The Governed Data Lake Approach

See how a governed data lake offers a powerful approach to capitalizing on the massive influx of data available today. By following some key best practices for constructing a governed data lake, organizations can provide ready access to a wide range of data across the enterprise while helping to ensure that data is trustworthy and secure.