You hear the sound of that dreaded Slack notification. You’ve received an alert for low completeness in a critical table that lives in your warehouse. You’re responsible for your data platform and for managing data quality across hundreds of pipelines. It’s up to you to fix this. The next few hours are about to be painful.
Now, you’re jumping between lineage tools, dashboards, and your orchestrator UI to try to figure out which pipelines feed into this table. All the while, the tickets from downstream consumers start to roll in. No pressure, right?
An hour later, you’ve narrowed your search from hundreds of pipelines to tens. Which pipeline is the culprit? Which log will give you the answer you need?
Looks like you only have one way of finding out: manually combing through each individual log to find that needle in the haystack. You prepare yourself for an arduous process.
Finally, after looking through dozens of logs, you find your clue: a Type Change error. Great. It's one of the most dreaded errors of all, one that easily slips under the radar of most pipeline monitoring tools and orchestrators.
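To make that concrete: a type change can be caught by comparing the schema a pipeline expects with the schema it actually receives. Here is a minimal Python sketch of that comparison. The column names, type names, and the `diff_schemas` helper are all made up for illustration and don't come from any particular tool.

```python
from typing import Dict, List

# A schema here is just a mapping of column name -> type name (an assumption
# for this sketch; real systems expose schemas in their own formats).
Schema = Dict[str, str]


def diff_schemas(expected: Schema, observed: Schema) -> List[str]:
    """Return human-readable differences between two schema snapshots."""
    issues: List[str] = []
    for column, expected_type in expected.items():
        if column not in observed:
            issues.append(f"dropped column: {column}")
        elif observed[column] != expected_type:
            issues.append(
                f"type change: {column} {expected_type} -> {observed[column]}"
            )
    for column in observed:
        if column not in expected:
            issues.append(f"new column: {column}")
    return issues


# Hypothetical snapshots: what the pipeline expects vs. what just arrived.
expected = {"order_id": "BIGINT", "amount": "DECIMAL", "created_at": "TIMESTAMP"}
observed = {"order_id": "BIGINT", "amount": "VARCHAR", "created_at": "TIMESTAMP"}

issues = diff_schemas(expected, observed)
for issue in issues:
    print(issue)
```

Run against every incoming batch, a check along these lines would flag the change at the pipeline that introduced it, instead of leaving it buried in a log.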
You still have a long road ahead of you. It’s been hours since the first alert fired, and you don’t know why the schema change happened or how to fix it.
If the data source in this example is internal, you might be asking yourself:
- Is there an error in the user code?
- Did someone drop a column by mistake?
- Was it intentional?
If the data source is external, you open up a whole other can of worms:
- Did they change the schema on purpose?
- Is this a bug in their user code?
- Who can I contact to get more information?
- How long will it take for them to respond and implement a fix?
In either scenario, you’re stuck waiting while you track down the right people. Only then can you reconfigure your pipelines to match the new schema and backfill the data. By this point, bad data is already flowing into your automated processes, and data SLAs may already have been missed.
This is the reality for most data engineers. You know there’s a problem in your warehouse, but you don’t know much else. Your data observability starts and ends in the warehouse.
It doesn’t have to be that way.