The accuracy of these user segments can affect revenue generation, so it’s critical that Kanchwala and his team use the most accurate data, optimized for these campaigns. For example, less accurate models could result in an advertising campaign that under-indexes on the segment the customer aims to reach, or that misses the intended audience segment altogether.
Since they use tools such as Apache Airflow and Amazon SageMaker to run the pipelines that make these model predictions, the pipelines need to be reliable, and the data needs to be accurate.
“From our perspective, a lot of business decisions are being made on the segments and predictions that we make,” says Kanchwala. “As we built these segments, we strive to ensure that the data going into the prediction pipelines are accurate so that the predictions coming out of those pipelines are accurate. Any loss of accuracy here could impact someone’s business decisions or bottom line.”
As with most data and ML engineering teams, Kanchwala’s team found it challenging to track model performance over time and to set up proactive alerting that would notify them when changes occur. If the team is unaware of data issues, a customer could be making decisions using predictions based on outdated or less relevant data.