A few years ago, an innocuous configuration change in a major retailer’s Microsoft Dynamics CRM meant that the number of inventory displayed on each item online ceased to reflect reality. The counter simply stopped updating.
People continued to purchase, but the volume number stayed constant. By the time the data engineering team was alerted, things had gotten bad.
Most items were available for purchase online, but also for in-store pickup. Lots of people chose in-store pickup. The orders were processed, and items that did not exist were nevertheless sold. So consumers visited stores where retail associates scrambled to find substitutes or promise discounts or somehow appease them. Lines formed. Store visitors had to wait to purchase and were turned off by so many people angrily jabbing their phones. And because it took days to discover the problem and for the pipeline to be fixed, it was a few days more before things were resolved.
Factoring in loss of brand reputation, the mistake cost tens of millions, and need not have happened.
Which is all to say, data issues compound. They can be difficult to spot and address, and grow unseen. It’s easy to fall into a pattern of assuming that everything is working just because you’re still drawing some insights, even while you’re accruing an increasing amount of subterranean data debt.
Furthermore, the truest signs of data quality issues also tend to be lagging indicators. For example, consumers telling you. Or as in the previous retail CRM example, thousands of retail managers and regional vice presidents telling you. That’s bad. That means that the data has been in your system for some time and it will take days for a fix to bear results. Talk about missing consumer expectations.
This is the situation the shipping startup Shipper found itself in, and why they invested so heavily in preventing it from ever occurring. Their data engineering team delivers as near to real-time as possible data to an application that helps ecommerce vendors deliver their inventory to a shipping port. It’s not just their consumers’ expectations they have to worry about—it’s their consumers’ consumers. And when their system was sometimes two days out of date, it created cascading ripples of missed expectations. Hence, they invested heavily in data quality management and tools that could give them early warning alerts with automatic checks.
Data quality management is a way to make the data quality checks automatic and pervasive, so you’re combating the forces of entropy on your datasets and pipelines with an equal and opposite amount of force.