There are a lot of guides out there for getting started with Apache Airflow: how to build data pipelines, how to schedule processes, how to integrate with various data systems like Snowflake and Spark. But when we started our journey with Airflow, it took us time to find the best way to manage the lifecycle of our Airflow deployments: how to iterate on our DAGs and quickly deploy changes to production.
In this article, I’m going to discuss how we at IBM® Databand® manage our pipeline development lifecycle, including how we deploy iterations across multiple Airflow environments: development, staging, and production. This post will focus more on development culture and less on internal technical details (we need to save something interesting for the next post!).
I hope that some of these practices help you in your everyday data engineering.