How I Reduced a $10,000 Monthly AWS Glue Bill to $400 Using Airflow

During my time as a DevOps Engineer at Vance, we were running around 80 ETL pipelines on AWS Glue, but as our workloads scaled, so did our costs, hitting a staggering $10,000 per month. This wasn't sustainable. After analyzing our pipelines, we realized that AWS Glue's serverless pricing model meant we were paying heavily for idle time and unnecessary compute.
To fix this, I migrated our ETL workloads to Apache Airflow, running on EC2 instances with ECS, and orchestrated everything using Terraform. The result? A 96% cost reduction, bringing our bill down to just $400 per month, without compromising on performance.
While Airflow is a great alternative to Glue, there's little documentation on setting it up properly with Terraform and the Celery Executor, especially for cost optimization. This blog walks you through how we did it, the challenges we faced, and how you can do the same to slash your AWS Glue costs.
Needless to say, this was indeed a war.
Introduction
I am Akash Singh, a third-year engineering student and open source contributor from Bangalore.
Here is my LinkedIn, GitHub and Twitter
I go by the name SkySingh04 online.
The three parts of the pain
Migrating from AWS Glue to Apache Airflow involves setting up three core components:
- Webserver – The UI for managing DAGs (Directed Acyclic Graphs) and monitoring job execution.
- Scheduler – Responsible for triggering and scheduling DAG runs.
- Workers – Execute the actual tasks in the DAGs.
Using Terraform, we provisioned ECS to run all three in parallel and enable them to communicate with each other, which we will get to next.
Once Airflow is up and running, the next step is to migrate our ETL workflows. We will convert the Glue jobs into Airflow DAGs and then nuke the Glue jobs, marking the final step in cutting down our AWS costs.
The Magical Dockerfile
You can use the following Dockerfile, push it to ECR, and reference it in the upcoming configs:
FROM apache/airflow:latest-python3.9

USER root
RUN apt-get update && \
    apt-get install -y --no-install-recommends \
        git \
    && apt-get clean \
    && rm -rf /var/lib/apt/lists/*

RUN mkdir -p /opt/airflow/dags /opt/airflow/logs && \
    chown -R airflow:root /opt/airflow && \
    chmod -R 755 /opt/airflow/logs

USER airflow
RUN pip install --no-cache-dir \
    apache-airflow-providers-github \
    apache-airflow-providers-amazon \
    apache-airflow-providers-mysql \
    apache-airflow-providers-mongo \
    apache-airflow[celery,redis] \
    pandas

COPY --chown=airflow:root dags/* /opt/airflow/dags/

ENV AIRFLOW__LOGGING__BASE_LOG_FOLDER=/opt/airflow/logs \
    AIRFLOW__LOGGING__WORKER_LOG_SERVER_PORT=8793 \
    AIRFLOW__LOGGING__LOGGING_LEVEL=INFO \
    AIRFLOW__LOGGING__LOG_FORMAT='[%(asctime)s] {%(filename)s:%(lineno)d} %(levelname)s - %(message)s' \
    AIRFLOW__LOGGING__SIMPLE_LOG_FORMAT='%(asctime)s %(levelname)s - %(message)s' \
    AIRFLOW__LOGGING__DAG_PROCESSOR_LOG_TARGET=file \
    AIRFLOW__LOGGING__TASK_LOG_READER=task \
    AIRFLOW__LOGGING__DAG_FILE_PROCESSOR_LOG_TARGET=/opt/airflow/logs/dag_processor_manager/dag_processor_manager.log \
    AIRFLOW__LOGGING__DAG_PROCESSOR_MANAGER_LOG_LOCATION=/opt/airflow/logs/dag_processor_manager/dag_processor_manager.log

ENV AIRFLOW__CORE__DAGS_FOLDER=/opt/airflow/dags

RUN mkdir -p /opt/airflow/logs/scheduler \
    /opt/airflow/logs/web \
    /opt/airflow/logs/worker \
    /opt/airflow/logs/dag_processor_manager \
    /opt/airflow/logs/task_logs

USER root
RUN chown -R airflow:root /opt/airflow && \
    chmod -R 755 /opt/airflow

USER airflow
This Dockerfile will be used for all three of our components and sets up logging nicely as well. The DAGs are baked directly into the Docker image; we will get to that in a bit. Build the image, tag it, push it to ECR, and move on to the next step!
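For completeness, the ECR repository that the image is pushed to (and that the ECS task definitions later reference as aws_ecr_repository.airflow.repository_url) can be managed in Terraform as well. This is a minimal sketch with assumed naming, not our exact config:

```hcl
# Hypothetical ECR repository holding the Airflow image built from the Dockerfile above.
resource "aws_ecr_repository" "airflow" {
  name                 = "airflow"
  image_tag_mutability = "MUTABLE" # the :latest tag is overwritten on every push

  image_scanning_configuration {
    scan_on_push = true
  }
}
```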
Airflow Web Server
A Terraform script can be written to set up Apache Airflow on AWS using ECS (Elastic Container Service) with the EC2 launch type. We need to make sure to add the following (a minimal Terraform sketch follows the list):
- CloudWatch Logging:
  - Creates a log group (/ecs/airflow) with a retention of 3 days.
- Security Groups:
  - Allows inbound HTTP (port 80) and HTTPS (port 443) traffic for the Application Load Balancer (ALB).
  - Enables unrestricted outbound traffic.
- TLS/SSL with ACM & Route 53:
  - Provisions an ACM (AWS Certificate Manager) certificate for airflow.internal.example.com using DNS validation.
  - Configures Route 53 DNS records to resolve the Airflow URL to the ALB.
- Application Load Balancer (ALB):
  - Creates an internal ALB for the Airflow webserver, supporting IPv4 & IPv6 (dualstack).
  - Configures an HTTP listener (port 80) to redirect traffic to HTTPS (port 443).
  - Sets up an HTTPS listener (port 443) to forward requests to the ECS target group.
- ECS Task Definition for Airflow Webserver:
  - Defines an ECS task for the Airflow webserver running on an EC2-backed ECS cluster.
  - Uses a Docker image stored in AWS ECR (aws_ecr_repository.airflow.repository_url:latest).
  - Allocates 2GB of memory (2048 MB).
  - Maps container port 8080 to the host for web access.
  - Defines a health check (http://localhost:8080/health).
- ECS Service for Airflow:
  - Creates an ECS service named "airflow-webserver" with 1 desired task.
  - Associates the ECS service with the ALB target group for load balancing.
  - Enables execute-command to allow debugging via AWS SSM.
  - Uses a capacity provider strategy for ECS resource management.
- DNS Configuration:
  - Configures a Route 53 A record (airflow.internal.example.com) pointing to the ALB.
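To make this concrete, here is a trimmed-down sketch of the log group, task definition, and service for the webserver. It is not our exact configuration: the IAM role, region, ALB target group, cluster, and capacity provider names are placeholders you would replace with your own resources.

```hcl
resource "aws_cloudwatch_log_group" "airflow" {
  name              = "/ecs/airflow"
  retention_in_days = 3
}

resource "aws_ecs_task_definition" "airflow_webserver" {
  family                   = "airflow-webserver"
  requires_compatibilities = ["EC2"]
  network_mode             = "bridge"
  execution_role_arn       = aws_iam_role.airflow_task_execution.arn # placeholder role

  container_definitions = jsonencode([
    {
      name    = "airflow-webserver"
      image   = "${aws_ecr_repository.airflow.repository_url}:latest"
      memory  = 2048
      command = ["webserver"]

      portMappings = [
        { containerPort = 8080, hostPort = 8080, protocol = "tcp" }
      ]

      # Matches the /health endpoint exposed by the Airflow webserver.
      healthCheck = {
        command  = ["CMD-SHELL", "curl -f http://localhost:8080/health || exit 1"]
        interval = 30
        timeout  = 5
        retries  = 3
      }

      logConfiguration = {
        logDriver = "awslogs"
        options = {
          "awslogs-group"         = aws_cloudwatch_log_group.airflow.name
          "awslogs-region"        = "ap-south-1" # placeholder region
          "awslogs-stream-prefix" = "webserver"
        }
      }
    }
  ])
}

resource "aws_ecs_service" "airflow_webserver" {
  name                   = "airflow-webserver"
  cluster                = aws_ecs_cluster.airflow.id
  task_definition        = aws_ecs_task_definition.airflow_webserver.arn
  desired_count          = 1
  enable_execute_command = true # allows debugging via AWS SSM

  capacity_provider_strategy {
    capacity_provider = "airflow-ec2" # assumed capacity provider name
    weight            = 100
  }

  load_balancer {
    target_group_arn = aws_lb_target_group.airflow.arn
    container_name   = "airflow-webserver"
    container_port   = 8080
  }
}
```

The ALB, ACM certificate, and Route 53 records use the standard aws_lb, aws_acm_certificate, and aws_route53_record resources and are omitted here for brevity.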
The Terraform script includes several environment variables in the ECS task definition (a sketch of how they can be wired in follows the list):
- Database Connection (AIRFLOW__DATABASE__SQL_ALCHEMY_CONN):
  - Specifies the PostgreSQL database connection string for Airflow's metadata database.
  - Uses AWS KMS-encrypted secrets to securely store the database password.
- User Management:
  - _AIRFLOW_WWW_USER_CREATE: Ensures the default Airflow web user is created.
  - _AIRFLOW_WWW_USER_USERNAME: Sets the username (default: airflow).
  - _AIRFLOW_WWW_USER_PASSWORD: Stores the password securely via AWS KMS secrets.
- Security & Web Configuration:
  - AIRFLOW__WEBSERVER__EXPOSE_CONFIG: Enables exposing Airflow configuration via the web UI.
  - AIRFLOW__SCHEDULER__ENABLE_HEALTH_CHECK: Enables a built-in scheduler health check.
- Database Migrations & Initialization:
  - _AIRFLOW_DB_MIGRATE: Ensures Airflow runs necessary database migrations on startup.
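Here is a hedged sketch of how those variables can be fed into the container definition above. The secret ARNs are hypothetical; the point is that sensitive values go through the ECS secrets mechanism (backed by KMS-encrypted secrets) rather than plain environment variables.

```hcl
locals {
  # Non-sensitive configuration passed as plain environment variables.
  airflow_environment = [
    { name = "_AIRFLOW_DB_MIGRATE", value = "true" },
    { name = "_AIRFLOW_WWW_USER_CREATE", value = "true" },
    { name = "_AIRFLOW_WWW_USER_USERNAME", value = "airflow" },
    { name = "AIRFLOW__WEBSERVER__EXPOSE_CONFIG", value = "true" },
    { name = "AIRFLOW__SCHEDULER__ENABLE_HEALTH_CHECK", value = "true" },
  ]

  # Sensitive values resolved at task start from Secrets Manager (placeholder ARNs).
  airflow_secrets = [
    { name = "AIRFLOW__DATABASE__SQL_ALCHEMY_CONN", valueFrom = aws_secretsmanager_secret.airflow_db_conn.arn },
    { name = "_AIRFLOW_WWW_USER_PASSWORD", valueFrom = aws_secretsmanager_secret.airflow_admin_password.arn },
  ]
}
```

These locals plug into the container definition as environment = local.airflow_environment and secrets = local.airflow_secrets. The connection string itself uses the usual SQLAlchemy form, e.g. postgresql+psycopg2://airflow:<password>@<rds-endpoint>:5432/airflow.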
Now go ahead and run terraform plan and terraform apply, and you should see a lot of resources being created. If everything went correctly, you will see the Airflow UI at the URL you specified:
Airflow Scheduler
The Airflow Scheduler is responsible for orchestrating DAG executions and ensuring scheduled tasks run at the correct time. A Terraform script can be written to provision the scheduler as an ECS service, configure CloudWatch logging, and enable auto-scaling to manage resource usage effectively.
While most of this is similar to the webserver, in summary we need to add the following (a minimal sketch follows the list):
- Logs scheduler execution in CloudWatch (/ecs/airflow-scheduler/).
- Monitors performance via StatsD metrics (airflow-metrics namespace).
- Runs in an ECS cluster with auto scaling, ensuring efficient resource allocation.
- Uses the CloudWatch Agent for monitoring, helping analyze task execution times.
- Secured by a restricted security group, blocking unwanted traffic.
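Here is a minimal sketch of the scheduler pieces under the same assumptions as before; the scheduler task definition mirrors the webserver one with command = ["scheduler"], and the Auto Scaling group reference is a placeholder.

```hcl
# Dedicated log group for scheduler output.
resource "aws_cloudwatch_log_group" "airflow_scheduler" {
  name              = "/ecs/airflow-scheduler/"
  retention_in_days = 3
}

# The scheduler runs as a single long-lived ECS task.
resource "aws_ecs_service" "airflow_scheduler" {
  name            = "airflow-scheduler"
  cluster         = aws_ecs_cluster.airflow.id
  task_definition = aws_ecs_task_definition.airflow_scheduler.arn
  desired_count   = 1
}

# Capacity provider tying the cluster to an EC2 Auto Scaling group, so
# instances scale with task demand instead of sitting idle.
resource "aws_ecs_capacity_provider" "airflow" {
  name = "airflow-ec2"

  auto_scaling_group_provider {
    auto_scaling_group_arn = aws_autoscaling_group.airflow.arn

    managed_scaling {
      status          = "ENABLED"
      target_capacity = 90
    }
  }
}
```

For the StatsD metrics, Airflow can emit them by setting AIRFLOW__METRICS__STATSD_ON=True and pointing AIRFLOW__METRICS__STATSD_HOST and AIRFLOW__METRICS__STATSD_PORT at the CloudWatch Agent sidecar.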
Now, go ahead and run terraform plan and terraform apply, and the Airflow Scheduler will be provisioned successfully!