4. No versioning in Airflow Scheduler
You’ll find many traditional software development and DevOps practices missing from Airflow, and a big one of those is the ability to maintain versions of your pipelines. There’s no easy way to document all that you’ve built and, if needed, revert to a prior version. If, for example, you delete a Task from your DAG and redeploy it, you’ll lose the associated metadata on the Task Instance.
This makes Airflow somewhat fragile. Unless you’ve written a script to capture this information yourself, debugging becomes much more difficult. It also isn’t possible to backtest candidate fixes against historical data to validate them.
Again, Airflow does provide the formal code representation of your pipelines. The challenge is applying other software development and DevOps tools to fill in the missing functionality.
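As a rough illustration of the kind of capture script mentioned above, here is a minimal sketch in plain Python. It does not use any Airflow API: the function names `snapshot_dag` and `diff_snapshots`, the DAG name `daily_sales`, and the task names are all hypothetical. It assumes you can describe a DAG as a mapping from each task ID to its upstream task IDs (in a real deployment you might build that mapping from a DAG's `task_dict` and each task's `upstream_task_ids`, which is an assumption about your setup rather than something this sketch does).

```python
import hashlib
import json


def snapshot_dag(dag_id, tasks):
    """Build a serializable snapshot of a DAG's structure.

    `tasks` maps each task ID to the list of its upstream task IDs.
    The version is a short hash of the structure, so the same structure
    always yields the same version regardless of input ordering.
    """
    structure = {task_id: sorted(upstream) for task_id, upstream in sorted(tasks.items())}
    payload = json.dumps({"dag_id": dag_id, "tasks": structure}, sort_keys=True)
    version = hashlib.sha256(payload.encode("utf-8")).hexdigest()[:12]
    return {"dag_id": dag_id, "version": version, "tasks": structure}


def diff_snapshots(old, new):
    """Report task IDs that were added, removed, or rewired between two snapshots."""
    old_tasks, new_tasks = old["tasks"], new["tasks"]
    shared = set(old_tasks) & set(new_tasks)
    return {
        "added": sorted(set(new_tasks) - set(old_tasks)),
        "removed": sorted(set(old_tasks) - set(new_tasks)),
        "rewired": sorted(t for t in shared if old_tasks[t] != new_tasks[t]),
    }


if __name__ == "__main__":
    # Hypothetical pipeline: the "clean" task is deleted before redeploying.
    before = snapshot_dag("daily_sales", {"extract": [], "clean": ["extract"], "load": ["clean"]})
    after = snapshot_dag("daily_sales", {"extract": [], "load": ["extract"]})
    print(before["version"], "->", after["version"])
    print(diff_snapshots(before, after))
```

Storing each snapshot alongside your deployment (for example, committed to version control) gives you a record of which tasks existed in which version, so a deleted Task at least leaves a trace you can consult while debugging. It does not restore the Task Instance metadata itself.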
5. Windows users can’t use it locally
Not much else to say here. Unless you use specific Docker Compose files that aren’t part of the main repository, running Airflow locally on Windows isn’t possible.
6. Debugging is time-consuming
Airflow Scheduler not working? Better refill your coffee. You may have some time-consuming debugging ahead of you.
That’s because, in our opinion, Airflow doesn’t sufficiently distinguish between operators that orchestrate and operators that execute. Many operators do both. While that may have helped with the initial coding of the platform, it’s a serious design flaw that makes problems very difficult to debug. If something goes wrong, your developers will have to examine their DataFlow parameters first, then the operator itself, every single time.
For this reason, tools like Databand can be a big help. Databand excels at helping you understand the health of your infrastructure at every level: global Airflow, DAG, task, and user-facing. Instead of spending engineering time learning highly specific features, your data engineers can focus on solving problems for the business.
Apache Airflow—a stellar option despite flaws
Like any open source contributor who takes time to propose new changes, we hope this article is construed as the love note that it is. We here at Databand are active contributors to the Airflow community and eager to see it grow beyond its existing limitations and to better serve more ETL and data science use cases.
As we said before, 86% of users plan to stick with it over other operation engines. Another 86% say they’d highly recommend it. We’re happy to say we belong to both groups: it’s a great tool. And for those of you just getting acquainted with Airflow, know that if you go in with the aforementioned issues in mind, Airflow Scheduler can be well worth the effort. See how Databand brings all your Apache Airflow observability activities together in one place. If you’re ready to take a deeper look, book a demo today.