CSV Imports to DynamoDB at Scale

I recently had to populate a DynamoDB table with over 740,000 items as part of a migration project. I tried three different approaches to see what would give me the best mix of speed, cost, and operational sanity. I’m sharing what I learned here in case you’re facing the same question: “What’s the best way to load a large amount of data into DynamoDB?”

Spoiler: Step Functions with batched Lambda invocations is the winner — but the path to that conclusion was full of interesting surprises.

Before You Go Too Far...

If your data is stored in S3 as a CSV or JSON file, and you're looking for a simple, no-code solution to load it directly into DynamoDB, AWS offers an out-of-the-box option. You can use the DynamoDB Data Import feature from the S3 console to create a table and populate it from your S3 bucket with minimal effort.

This feature is ideal if you don't need custom pipelines or complex data transformations. You simply upload your data, configure the table, and let DynamoDB handle the rest.
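If you'd rather script it than click through the console, the same import can be kicked off through the API. Here's a minimal sketch using boto3's `import_table`; the bucket, key prefix, table name, and key schema are placeholders you'd swap for your own, and keep in mind the import always creates a new table rather than writing into an existing one.

```python
import boto3

dynamodb = boto3.client("dynamodb")

# Kicks off a server-side import from S3 into a *new* table.
# Bucket, prefix, table name, and key schema below are placeholders.
response = dynamodb.import_table(
    S3BucketSource={
        "S3Bucket": "my-migration-bucket",
        "S3KeyPrefix": "exports/items.csv",
    },
    InputFormat="CSV",
    InputCompressionType="NONE",
    TableCreationParameters={
        "TableName": "MigrationTarget",
        "AttributeDefinitions": [
            {"AttributeName": "pk", "AttributeType": "S"},
        ],
        "KeySchema": [
            {"AttributeName": "pk", "KeyType": "HASH"},
        ],
        "BillingMode": "PAY_PER_REQUEST",
    },
)

print(response["ImportTableDescription"]["ImportArn"])
```

The call returns immediately; the import itself runs server-side, and you can poll its progress with `describe_import` using the returned ARN.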

For more details on this feature, check out the official documentation: DynamoDB S3 Data Import.

If that fits your use case, this post might be more than you need — feel free to stop here.

The Setup

I needed to insert 740,000 items into DynamoDB from a CSV file as part of a migration. Each item was under 1 KB, so every standard write consumed a single write capacity unit, which kept write capacity usage predictable and efficient. I tested:

  1. An AWS Glue Job using Apache Spark.
  2. A Step Function with distributed map writing items directly via PutItem.
  3. A Step Function batching 1,000 items per state, passed to a Lambda using BatchWriteItem (roughly sketched right after this list).
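
To make the third option concrete, here's a rough sketch of what such a batch-writing Lambda could look like, assuming the Step Function state passes a list of already-parsed items in the event payload. The table name, event shape, and item attributes are illustrative placeholders, not the exact implementation:

```python
import boto3

# Hypothetical table name; a real handler would read it from an environment variable.
TABLE_NAME = "MigrationTarget"

dynamodb = boto3.resource("dynamodb")
table = dynamodb.Table(TABLE_NAME)


def handler(event, context):
    """Writes one batch of items handed over by the Step Function state.

    Assumes event["items"] is a list of dicts already shaped like DynamoDB
    items, e.g. {"pk": "...", "attr1": "..."}.
    """
    # batch_writer() wraps BatchWriteItem: it buffers writes into 25-item
    # requests and automatically retries any unprocessed items.
    with table.batch_writer() as writer:
        for item in event["items"]:
            writer.put_item(Item=item)

    return {"written": len(event["items"])}
```

Because `batch_writer()` handles the 25-item chunking and retries of unprocessed items for you, the handler itself stays tiny and the batching logic lives in the state machine.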

Here’s what I found.

Benchmark Results

| Method | Time to Load 740K Items | Cost | Notes |
| --- | --- | --- | --- |
| Glue Job | ~12 minutes | ~$2.10 | Good for large files & data transformation |
| Step Function + Direct DynamoDB | ~120 minutes | ~$18.50 | Every item is a state transition — ouch |
| Step Function + Lambda Batches | ~5 minutes | ~$1.78 | Fastest and cheapest, with high configurability |

Option 1: Glue Job

Glue is great when you’re dealing with S3-based batch inputs or doing big ETL-style transformations. I used 10 Data Processing Units (DPUs), and the job finished in about 12 minutes.
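For reference, a Glue job for this kind of load can stay quite short. The sketch below reads a CSV from S3 and writes it to DynamoDB with Glue's built-in connector; the S3 path and table name are placeholders, and your script will look different if you add transformations in between:

```python
import sys

from awsglue.context import GlueContext
from awsglue.job import Job
from awsglue.utils import getResolvedOptions
from pyspark.context import SparkContext

args = getResolvedOptions(sys.argv, ["JOB_NAME"])

sc = SparkContext()
glue_context = GlueContext(sc)
job = Job(glue_context)
job.init(args["JOB_NAME"], args)

# Read the CSV straight from S3 (header row assumed; path is a placeholder).
source = glue_context.create_dynamic_frame.from_options(
    connection_type="s3",
    connection_options={"paths": ["s3://my-migration-bucket/exports/"]},
    format="csv",
    format_options={"withHeader": True},
)

# Write to DynamoDB; the connector issues parallel batch writes under the hood.
glue_context.write_dynamic_frame.from_options(
    frame=source,
    connection_type="dynamodb",
    connection_options={
        "dynamodb.output.tableName": "MigrationTarget",
        "dynamodb.throughput.write.percent": "1.0",
    },
)

job.commit()
```

The `dynamodb.throughput.write.percent` option is the main knob here: it caps how much of the table's write capacity the job is allowed to consume, so you can trade load speed against impact on the table.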

Summary:

  • ⏩ Fast & Scalable
  • ⚙️ Great for data transformation