Replacing Lambda Triggers with EventBridge in S3-to-Glue Workflows
In one of our production data platforms, we used Lambda functions to trigger AWS Glue jobs every time a file landed in an S3 folder. That setup worked fine when there were only two or three data sources.
But as the system expanded to support more than 10 folders, it required deploying and maintaining an equal number of nearly identical Lambda functions, each wired to specific prefixes and jobs. The architecture became increasingly brittle and harder to manage.
This post outlines how that structure was replaced using EventBridge, with prefix-based filtering and direct Glue job targets. No Lambda. No maintenance overhead.
Scaling Limits and Operational Gaps
Using S3 events to trigger Lambda comes with several limitations:
Each S3 prefix needs its own event notification configuration, and overlapping prefixes aren't allowed for the same event type
Each one requires separate deployment and IAM permissions
Failures due to cold starts, dependency packaging, or misconfiguration often go undetected
Event tracing is difficult without additional logging
As more sources were added, silent failures became a recurring issue. In some cases, downstream data loads were missed completely.
Objectives for a Scalable Triggering System
A more resilient and maintainable system was needed, one that could:
Support multiple S3 prefixes cleanly
Trigger different Glue jobs based on the prefix
Include native retry behavior
Offer better traceability and alerting
EventBridge addressed these requirements directly.
Event-Driven Architecture Overview
The updated solution routes S3 events through EventBridge:
S3 emits object-created events to EventBridge
EventBridge rules filter events by prefix (e.g., applications/batch_load/)
Each rule targets a specific Glue job
Retries are handled natively
Failures route to an SQS dead-letter queue (DLQ)
An SNS topic forwards DLQ alerts to Slack for visibility
This approach eliminated the need for intermediary Lambda functions.
Step-by-Step Implementation
- Enable EventBridge Notifications on S3
Open the S3 bucket
Go to Properties → Event notifications
Enable "Send events to Amazon EventBridge"
No further configuration is needed on the S3 side.
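If the bucket is managed in code rather than through the console, the same toggle can be set with boto3. A minimal sketch, using the example bucket name from the next step:

import boto3

s3 = boto3.client("s3")

# Enable forwarding of object-level events to the default EventBridge bus.
# Note: this call replaces the bucket's entire notification configuration,
# so any existing SNS/SQS/Lambda notifications must be re-specified here.
s3.put_bucket_notification_configuration(
    Bucket="my-etl-data-bucket",
    NotificationConfiguration={"EventBridgeConfiguration": {}},
)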
- Create EventBridge Rules by Prefix
To map the applications/batch_load/ prefix to a job called batch_job_glue:
Go to EventBridge → Rules → Create rule
Name the rule: trigger-batch-glue-job
Use the following event pattern:
{
  "source": ["aws.s3"],
  "detail-type": ["Object Created"],
  "detail": {
    "bucket": { "name": ["my-etl-data-bucket"] },
    "object": {
      "key": [{ "prefix": "applications/batch_load/" }]
    }
  }
}
Set the target as the corresponding Glue job
Under Input transformer, configure:
Input Path (maps event fields to named variables):
{
  "bucket": "$.detail.bucket.name",
  "key": "$.detail.object.key"
}
Input Template (Glue argument keys need the -- prefix so getResolvedOptions can find them):
{
  "--BUCKET_NAME": "<bucket>",
  "--OBJECT_KEY": "<key>"
}
Repeat this setup for other prefixes and Glue jobs (e.g., applications/batch_job/, applications/daily_runs/)
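The same rule and target can be created programmatically. A minimal boto3 sketch, assuming the Glue job ARN is accepted as a direct rule target (as the console setup above implies) and that an IAM role allowed to start the job already exists; the account ID, region, and role name are placeholders:

import boto3
import json

events = boto3.client("events")

# Rule: match object-created events for one prefix
events.put_rule(
    Name="trigger-batch-glue-job",
    EventPattern=json.dumps({
        "source": ["aws.s3"],
        "detail-type": ["Object Created"],
        "detail": {
            "bucket": {"name": ["my-etl-data-bucket"]},
            "object": {"key": [{"prefix": "applications/batch_load/"}]},
        },
    }),
    State="ENABLED",
)

# Target: the Glue job, with the input transformer mapping event fields
# to job arguments (Glue argument keys carry the "--" prefix)
events.put_targets(
    Rule="trigger-batch-glue-job",
    Targets=[{
        "Id": "batch-job-glue",
        "Arn": "arn:aws:glue:us-east-1:123456789012:job/batch_job_glue",  # placeholder ARN
        "RoleArn": "arn:aws:iam::123456789012:role/eventbridge-glue-invoke",  # hypothetical role
        "InputTransformer": {
            "InputPathsMap": {
                "bucket": "$.detail.bucket.name",
                "key": "$.detail.object.key",
            },
            "InputTemplate": '{"--BUCKET_NAME": "<bucket>", "--OBJECT_KEY": "<key>"}',
        },
    }],
)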
- Receive S3 Context in Glue Jobs
Glue jobs only need minimal logic to accept the input parameters:
import sys
from awsglue.utils import getResolvedOptions

# Resolve the job arguments supplied by the EventBridge input template
args = getResolvedOptions(sys.argv, ['BUCKET_NAME', 'OBJECT_KEY'])
bucket = args['BUCKET_NAME']
key = args['OBJECT_KEY']

print(f"Triggered for file: s3://{bucket}/{key}")
# ... continue with the job's read/transform/load logic
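To verify this parameter contract without waiting for a real upload, the job can be started manually with the same "--"-prefixed argument keys. A quick sketch; the object key is illustrative:

import boto3

glue = boto3.client("glue")

# Start the job with the same arguments EventBridge would pass
response = glue.start_job_run(
    JobName="batch_job_glue",
    Arguments={
        "--BUCKET_NAME": "my-etl-data-bucket",
        "--OBJECT_KEY": "applications/batch_load/sample.csv",  # illustrative key
    },
)
print(response["JobRunId"])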
Example: Failure Detection with DLQ and Alerting
In one case, a Glue job failed because the incoming data had a schema change that wasn't compatible with the job's Spark read logic. The job attempted to deserialize a new column that didn't exist in the table definition, which triggered a runtime AnalysisException. Since this was a data issue, retries didn't help: EventBridge retried the event twice, both attempts failed, and the event was routed to the SQS DLQ. From there, SNS sent an alert email.
This kind of alert-driven feedback loop has helped us catch data contract violations early, something we couldn't reliably do with Lambda triggers.
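Both the retry count and the DLQ routing are properties of the EventBridge target, not the rule itself. A sketch of that configuration in boto3, assuming a queue named glue-trigger-dlq already exists; all ARNs are placeholders, and in practice the InputTransformer from the earlier sketch would be included in the same target definition:

import boto3

events = boto3.client("events")

# Re-register the target with a retry policy and a dead-letter queue.
# Events that still fail delivery after the retries land in the DLQ,
# where an SNS subscription picks them up for alerting.
events.put_targets(
    Rule="trigger-batch-glue-job",
    Targets=[{
        "Id": "batch-job-glue",
        "Arn": "arn:aws:glue:us-east-1:123456789012:job/batch_job_glue",  # placeholder
        "RoleArn": "arn:aws:iam::123456789012:role/eventbridge-glue-invoke",  # placeholder
        "RetryPolicy": {
            "MaximumRetryAttempts": 2,
            "MaximumEventAgeInSeconds": 3600,
        },
        "DeadLetterConfig": {
            "Arn": "arn:aws:sqs:us-east-1:123456789012:glue-trigger-dlq",  # placeholder queue
        },
    }],
)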
Why Lambda Was a Misfit for This Use Case
Lambda wasn't a good fit here. Glue jobs are often long-running, and Lambda has a 15-minute maximum timeout. If the job takes longer, you're forced into asynchronous invocation, meaning Lambda fires the job and exits without waiting for it to complete. That leaves you with no visibility into whether the job failed halfway or succeeded, unless you bolt on custom polling, logging, and error handling.
EventBridge, on the other hand, ensures reliable triggering with built-in retry and optional dead-letter queues. But it's important to note that this setup doesn't solve job-level failures either. If the Glue job fails midway due to bad data or resource issues, EventBridge considers its job done as long as the job was triggered successfully.
To monitor job completion and handle runtime failures, teams still need to subscribe to Glue Job State Change events, wire up CloudWatch alerts, or implement a status feedback loop. But the triggering side becomes fully managed and cost-effective: no code, no maintenance, just clean event routing.
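For the completion side, a minimal sketch of such a rule: it matches Glue Job State Change events for failed runs of one job and forwards them to an SNS topic (rule name and topic ARN are placeholders):

import boto3
import json

events = boto3.client("events")

# Glue emits "Glue Job State Change" events to the default bus;
# this pattern matches terminal failure states for a single job
events.put_rule(
    Name="alert-batch-glue-job-failures",
    EventPattern=json.dumps({
        "source": ["aws.glue"],
        "detail-type": ["Glue Job State Change"],
        "detail": {
            "jobName": ["batch_job_glue"],
            "state": ["FAILED", "TIMEOUT"],
        },
    }),
    State="ENABLED",
)

# Route matched events to an SNS topic for alerting
events.put_targets(
    Rule="alert-batch-glue-job-failures",
    Targets=[{
        "Id": "failure-alerts",
        "Arn": "arn:aws:sns:us-east-1:123456789012:glue-job-alerts",  # placeholder topic
    }],
)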
Replacing a set of Lambda functions with a few well-scoped EventBridge rules significantly reduced operational complexity. The architecture is leaner, traceable, and cheaper to run. For teams managing multi-source S3 ETL pipelines, this approach simplifies the start of the workflow, even though runtime monitoring still requires its own layer.