Data Lake

Discovery of facts, patterns in data, and ad hoc reporting

IBM + Hortonworks

IBM and Hortonworks have partnered to bring you the future of data science!


What is a data lake?

A data lake is a storage repository that holds an enormous amount of raw or refined data in its native format until it is accessed. The term is usually associated with Hadoop-oriented object storage: an organization’s data is loaded into the Hadoop platform, and business analytics and data-mining tools are then applied to the data where it resides on the Hadoop cluster. However, data lakes can also be used effectively without Hadoop, depending on the needs and goals of the organization. Increasingly, the term describes any large data pool in which the schema and data requirements are not defined until the data is queried.
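
The idea of leaving the schema undefined until query time can be illustrated with a minimal Python sketch. It is not tied to Hadoop or to any IBM product; the sample records, the `read_with_schema` helper, and the `total_by_user` query are all hypothetical names chosen for illustration.

```python
import json

# Hypothetical raw records, kept exactly as they arrived (no upfront schema).
RAW_EVENTS = [
    '{"user": "ana", "amount": "19.90", "channel": "web"}',
    '{"user": "ben", "amount": 5, "coupon": "SPRING"}',
    '{"user": "ana", "amount": "3.10"}',
]


def read_with_schema(raw_lines, schema):
    """Apply a schema at query time: select fields and cast them.

    Fields missing from a record are returned as None.
    """
    for line in raw_lines:
        record = json.loads(line)
        yield {
            name: (cast(record[name]) if name in record else None)
            for name, cast in schema.items()
        }


def total_by_user(raw_lines):
    """Sum amounts per user; the schema is chosen by this query, not by storage."""
    schema = {"user": str, "amount": float}
    totals = {}
    for row in read_with_schema(raw_lines, schema):
        totals[row["user"]] = totals.get(row["user"], 0.0) + row["amount"]
    return totals


if __name__ == "__main__":
    print(total_by_user(RAW_EVENTS))
```

A different query could read the same raw records with a different schema, for example one that includes `channel`, without any change to how the data is stored.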

Features

Easier data access to a broad range of data across the organization

Access structured and unstructured data residing both on premises and in the cloud.

Faster data preparation

Spend less time locating and accessing data, which speeds up data preparation and reuse.

Enhanced agility

Components of the data lake can be employed as a sandbox that enables users to build and test analytics models with greater agility.

More accurate insights, stronger decisions

Track data lineage to help ensure data is trustworthy.

Capabilities

Hadoop

Manage large volumes and different types of data with open source Apache Hadoop systems. Tap into unmatched performance, simplicity and standards compliance to use all data, regardless of where it resides. Visualize, filter and analyze large data sets in consumable, business-specific contexts.

Spark

Build algorithms quickly, iterate faster and put analytics into action with Apache Spark. Easily create models that capture insight from complex data, and apply that insight in time to drive outcomes. Access all data, build analytic models quickly, iterate fast in a unified programming model and deploy those analytics anywhere.

Stream computing

Stream computing enables organizations to process data streams that are always on. This helps them spot opportunities and risks across all data in time to effect change.
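
As a rough illustration of processing a stream that never ends, the Python sketch below consumes readings one at a time and raises an alert as soon as a rolling average crosses a limit. It is a generic sliding-window example, not the API of any streaming product; the `rolling_alerts` function, the window size, and the threshold are assumptions made for the example.

```python
from collections import deque


def rolling_alerts(readings, window=3, threshold=100.0):
    """Yield (index, mean) when the mean of the last `window` readings exceeds `threshold`.

    `readings` can be any iterable, including an unbounded generator,
    because only the most recent `window` values are kept in memory.
    """
    recent = deque(maxlen=window)
    for index, value in enumerate(readings):
        recent.append(value)
        if len(recent) == window:
            mean = sum(recent) / window
            if mean > threshold:
                yield index, mean


if __name__ == "__main__":
    for index, mean in rolling_alerts([90, 95, 100, 120, 130]):
        print(f"alert at reading {index}: rolling mean {mean:.1f}")
```

Because the function is a generator, each alert is produced while the stream is still arriving rather than after a batch has finished.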

Governance and metadata tools

Governance and metadata tools enable you to locate and retrieve information about data objects as well as their meaning, physical location, characteristics, and usage.
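
To make the kind of information involved more concrete, here is a minimal Python sketch of a metadata catalog that records a data object's meaning, physical location, characteristics, and usage, and lets you search by keyword. The `CatalogEntry` and `Catalog` classes, the field names, and the sample path are hypothetical; real governance tools are considerably richer.

```python
from dataclasses import dataclass, field


@dataclass
class CatalogEntry:
    name: str                # identifier of the data object
    meaning: str             # business description
    location: str            # where the data physically lives
    characteristics: dict = field(default_factory=dict)  # e.g. format, size
    used_by: list = field(default_factory=list)          # known consumers


class Catalog:
    """An in-memory registry of data objects, keyed by name."""

    def __init__(self):
        self._entries = {}

    def register(self, entry):
        self._entries[entry.name] = entry

    def find(self, keyword):
        """Return entries whose name or meaning mentions `keyword` (case-insensitive)."""
        keyword = keyword.lower()
        return [
            entry for entry in self._entries.values()
            if keyword in entry.name.lower() or keyword in entry.meaning.lower()
        ]


if __name__ == "__main__":
    catalog = Catalog()
    catalog.register(CatalogEntry(
        name="orders_raw",
        meaning="Unprocessed customer orders",
        location="/lake/raw/orders",  # hypothetical path
        characteristics={"format": "json"},
        used_by=["sales dashboard"],
    ))
    for entry in catalog.find("orders"):
        print(entry.name, "->", entry.location)
```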

Resources