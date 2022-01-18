Unlike a data warehouse, a data lake can store both structured and unstructured data, and it does not require a defined schema to store data, a characteristic known as “schema-on-read.” This flexibility in storage requirements is particularly useful for data scientists, data engineers, and developers, allowing them to access data for data discovery exercises and machine learning projects.
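
To make schema-on-read concrete, here is a minimal Python sketch. It assumes a hypothetical in-memory list standing in for the lake's storage layer and JSON-style records. Records are stored exactly as they arrive. A schema (field names plus the type to cast each one to) is applied only when someone reads the data, so two readers can interpret the same raw data differently. The names `raw_lake`, `ingest`, and `read_with_schema` are illustrative only and do not belong to any real data lake product or API.

```python
import json
from typing import Any, Callable

# Hypothetical in-memory "lake": raw records are kept exactly as they arrive,
# with no parsing or schema validation at write time.
raw_lake: list[str] = []


def ingest(raw_record: str) -> None:
    """Store the raw text as-is (schema-on-read: nothing is checked on write)."""
    raw_lake.append(raw_record)


def read_with_schema(schema: dict[str, Callable[[Any], Any]]) -> list[dict[str, Any]]:
    """Apply a schema at read time: parse each stored record, keep only the
    requested fields, cast them, and use None for missing or uncastable values.
    Records that do not parse as JSON objects are skipped for this reader."""
    rows = []
    for raw in raw_lake:
        try:
            record = json.loads(raw)
        except json.JSONDecodeError:
            continue  # unstructured data: stays in the lake, just not in this view
        if not isinstance(record, dict):
            continue
        row = {}
        for field, cast in schema.items():
            value = record.get(field)
            try:
                row[field] = cast(value) if value is not None else None
            except (TypeError, ValueError):
                row[field] = None  # value present but does not fit the schema
        rows.append(row)
    return rows


# Structured, semi-structured, and unstructured inputs all land in the same store.
ingest('{"user": "a", "amount": "12.5", "country": "DE"}')
ingest('{"user": "b", "amount": 3}')
ingest("free-text support ticket: login fails")

# Two readers project different schemas from the same raw data.
sales = read_with_schema({"user": str, "amount": float})
geo = read_with_schema({"user": str, "country": str})
print(sales)
print(geo)
```

A schema-on-write system, such as a traditional data warehouse, would instead validate the free-text ticket at ingestion time and reject or transform it. In this sketch it is simply retained, and readers whose schema it does not fit ignore it.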

A recent Voice of the Enterprise report from 451 Research found that almost “three quarters (71%) of enterprises are currently using or piloting a data lake environment or plan to do so within the next 12 months, and 53% of respondents are already in deployment or POC.” Respondents highlighted business agility as a key benefit of their deployments, although the specific benefits varied. The report also found that data lakes are typically hosted either in the cloud or on premises in an organization's own data centers.

While adopters are finding value in data lakes, some lakes degrade into data swamps or data pits. A data swamp is the result of a poorly managed data lake: one that lacks the data quality and data governance practices needed to produce useful insights. Without proper oversight, the data in these repositories becomes unusable. Data pits are similar to data swamps in that they provide little business value, but in these cases the source of the data issue is unclear. In both cases, involving data governance and data science teams can help safeguard against these pitfalls.