Date: 2023-09-06

Detection



It’s important that this cycle starts with detection because the DataOps movement is founded on a data quality initiative.



This first stage of the DataOps cycle is validation-focused. It includes the same data quality checks that have been used since the inception of the data warehouse: column schema and row-level validations. Essentially, you are ensuring that all datasets adhere to the business rules in your data system.
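
As a rough illustration of what column schema and row-level validations can look like, here is a minimal sketch in plain Python. The schema, the rule names, and the `validate_rows` function are all hypothetical and chosen for this example; in practice these checks usually live in a dedicated data quality tool or in the warehouse itself.

```python
from typing import Any, Callable

# Hypothetical expected schema: column name -> Python type.
EXPECTED_SCHEMA: dict[str, type] = {"order_id": int, "amount": float, "country": str}

# Hypothetical row-level business rules: rule name -> predicate over one row.
ROW_RULES: dict[str, Callable[[dict[str, Any]], bool]] = {
    "amount_is_non_negative": lambda row: row["amount"] >= 0,
    "country_is_iso2": lambda row: len(row["country"]) == 2,
}


def validate_rows(rows: list[dict[str, Any]]) -> list[str]:
    """Return a list of violations (empty if every row passes)."""
    violations: list[str] = []
    for index, row in enumerate(rows):
        # Column schema check: each expected column exists with the expected type.
        schema_ok = True
        for column, expected_type in EXPECTED_SCHEMA.items():
            if column not in row:
                violations.append(f"row {index}: missing column '{column}'")
                schema_ok = False
            elif not isinstance(row[column], expected_type):
                violations.append(
                    f"row {index}: column '{column}' is not {expected_type.__name__}"
                )
                schema_ok = False
        if not schema_ok:
            continue  # Row-level rules assume a well-formed row.
        # Row-level check: apply each known business rule to the row.
        for rule_name, rule in ROW_RULES.items():
            if not rule(row):
                violations.append(f"row {index}: failed rule '{rule_name}'")
    return violations


sample_rows = [
    {"order_id": 1, "amount": 10.0, "country": "US"},
    {"order_id": 2, "amount": -5.0, "country": "USA"},
    {"order_id": "3", "amount": 1.0},
]
for violation in validate_rows(sample_rows):
    print(violation)
```

Note that the sketch can only flag what the rules already describe, which is the limitation the rest of this section discusses.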



The data quality framework that lives in the detection stage is important, but it is reactive by nature. It tells you whether the data that’s already stored in your data lake or data warehouse (and likely already being used) is in the form you expect.



It’s also important to note that you can only validate datasets against the business rules you already know. If you don’t know the causes of issues, you cannot establish new business rules for your engineers to follow. This realization fuels the demand for a continuous data observability approach that ties directly into all stages of your data lifecycle, starting with your source data.



Awareness



Awareness is the visibility-focused stage of the DataOps cycle. This is where the conversation around data governance comes into the picture and a metadata-first approach is introduced. Centralizing and standardizing pipeline and dataset metadata across your data ecosystem gives teams visibility into issues across the entire organization.



Centralizing metadata is crucial to giving the organization awareness of the end-to-end health of its data. It allows you to move toward a more proactive approach to solving data issues. If bad data is entering your “domain,” you can trace the error to a specific point upstream in your data system. For example, Data Engineering Team A can now look at Data Engineering Team B’s pipelines, understand what’s going on, and collaborate with Team B to fix the issue.



The reverse also applies. Data Engineering Team B can detect an issue and trace its impact on downstream dependencies. This means Data Engineering Team A will know that an issue is coming and can take whatever measures are necessary to contain it.
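
The upstream and downstream tracing described above can be sketched as a walk over centralized lineage metadata. This is only an illustration: the `UPSTREAM` mapping, the dataset names, and the `trace_upstream` and `trace_downstream` functions are hypothetical, and a real metadata store would expose lineage through its own interface.

```python
from collections import deque

# Hypothetical centralized lineage metadata: dataset -> datasets it reads from.
UPSTREAM: dict[str, list[str]] = {
    "team_b.raw_orders": [],
    "team_b.clean_orders": ["team_b.raw_orders"],
    "team_a.revenue": ["team_b.clean_orders"],
    "team_a.dashboard": ["team_a.revenue"],
}


def _walk(start: str, edges: dict[str, list[str]]) -> list[str]:
    """Breadth-first walk that returns every dataset reachable from start."""
    reached: list[str] = []
    visited = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbor in edges.get(node, []):
            if neighbor not in visited:
                visited.add(neighbor)
                reached.append(neighbor)
                queue.append(neighbor)
    return reached


def trace_upstream(dataset: str) -> list[str]:
    """Datasets that feed into `dataset`: where a bad record may have come from."""
    return _walk(dataset, UPSTREAM)


def trace_downstream(dataset: str) -> list[str]:
    """Datasets that depend on `dataset`: what an issue here may affect."""
    # Invert the upstream mapping to get dataset -> datasets that read from it.
    downstream: dict[str, list[str]] = {}
    for child, parents in UPSTREAM.items():
        for parent in parents:
            downstream.setdefault(parent, []).append(child)
    return _walk(dataset, downstream)


# Team A traces bad data in its revenue table back toward the source.
print(trace_upstream("team_a.revenue"))
# Team B checks what an issue in clean_orders will affect downstream.
print(trace_downstream("team_b.clean_orders"))
```

Both directions use the same metadata, which is why centralizing it serves the team that consumes the data as well as the team that produces it.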

Iteration

This stage of the cycle is process-focused. Here, teams focus on data-as-code: they establish repeatable, sustainable standards and apply them to all data development, so that their pipelines consistently produce the same trustworthy data.



The gradual improvement of the data platform’s overall health is now made possible by the detection of issues, awareness of the upstream root causes and efficient processes for iteration.