Designing Time-Series Data In DynamoDB

DynamoDB is excellent at efficiently storing and querying time-series data.
However, that efficiency depends entirely on modeling your data to support the access patterns you need.
Improper data models can lead to higher costs and performance issues.
Let’s explore some best practices and data models for storing and querying time-series data in DynamoDB.
Designing a table for time-series data
Choosing the right primary keys
Choosing the right partition key is essential for time-series data (and for any type of data in fact).
A common design pattern is to build the partition key from an entity identifier plus a coarse-grained time component. Here’s an example:
pk: device#123#2025-03-21
We add the “device” keyword followed by the device ID and then the date (without the time).
The sort key should be a timestamp (in standardized ISO 8601 format) of the event to enable efficient range queries. Here’s an example:
sk: time#2025-03-21T12:30:00Z
So each reading from our device would be stored in this key structure.
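As a minimal sketch, the key structure above could be produced by a small helper. The function name `buildKeys` is illustrative, not part of any SDK:

```javascript
// Build the pk/sk pair described above from a device ID and event time.
function buildKeys(deviceId, eventTime) {
  // toISOString() yields "2025-03-21T12:30:00.000Z"; trim the milliseconds.
  const iso = eventTime.toISOString().replace(/\.\d{3}Z$/, "Z");
  const day = iso.slice(0, 10); // date portion only, e.g. "2025-03-21"
  return {
    pk: `device#${deviceId}#${day}`,
    sk: `time#${iso}`,
  };
}
```

Keeping key construction in one place like this helps ensure writes and queries always agree on the key format.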
Querying time-series data efficiently
To fetch data for a specific time range, we can use a Query with the BETWEEN operator in the key condition, rather than an inefficient Scan.
Say we want to retrieve all readings from device 123 on March 21st 2025. We can run the following query:
{
  TableName: "devices",
  KeyConditionExpression: "pk = :pk AND sk BETWEEN :start AND :end",
  ExpressionAttributeValues: {
    ":pk": "device#123#2025-03-21",
    ":start": "time#2025-03-21T00:00:00Z",
    ":end": "time#2025-03-21T23:59:59Z"
  }
}
By adjusting the :start and :end values we can narrow or widen the window. Note that because the partition key includes the date, a range spanning multiple days requires one Query per day (one per partition key).
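A sketch of a helper that assembles this Query input for a given device and day (the function name is illustrative, and the table name "devices" is taken from the example above):

```javascript
// Assemble the Query input for all readings from one device on one day.
function buildDayRangeQuery(deviceId, day) {
  return {
    TableName: "devices",
    KeyConditionExpression: "pk = :pk AND sk BETWEEN :start AND :end",
    ExpressionAttributeValues: {
      ":pk": `device#${deviceId}#${day}`,
      ":start": `time#${day}T00:00:00Z`,
      ":end": `time#${day}T23:59:59Z`,
    },
  };
}
```

The resulting object can then be passed to the SDK’s query call (for example, a QueryCommand in AWS SDK for JavaScript v3).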
Handling high-traffic time-series workloads
Partition Splitting to avoid hot keys
A single partition can handle up to 3,000 read units and 1,000 write units per second.
If your time-series data has a high ingestion rate you can run into hot partitions.
The solution for hot partitions is typically to shard the keys by more specific time components.
For example, if the daily partitions become hot, you can shard them by hour or minute.
Rather than the partition key being “device#123#2025-03-21”, it can become “device#123#2025-03-21T12:00”.
Now your data for that deviceID is partitioned by hour, reducing the risk of a hot partition.
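The hourly-sharded key can be derived from the event’s ISO 8601 timestamp; this is a sketch with an illustrative function name:

```javascript
// Derive an hourly-sharded partition key from an ISO 8601 timestamp.
function hourlyPartitionKey(deviceId, isoTimestamp) {
  // Keep the date plus the hour: "2025-03-21T12:30:00Z" -> "2025-03-21T12"
  const hour = isoTimestamp.slice(0, 13);
  return `device#${deviceId}#${hour}:00`;
}
```

Queries then target one hour-partition at a time, so a full day becomes 24 smaller Queries, trading a little query fan-out for even write distribution.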
Expiring Old Data
Time-series data often becomes obsolete after a certain amount of time.
This is where TTL (Time to Live) is beneficial. Give each item a TTL attribute holding its expiry time as an epoch timestamp in seconds, and DynamoDB will delete the item automatically after that time passes.
This optimizes storage costs, and TTL deletions do not consume write capacity.
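Computing the TTL attribute is a one-liner; the helper name and 30-day default retention here are illustrative assumptions:

```javascript
// Compute a TTL attribute value: DynamoDB expects an epoch timestamp in seconds.
function ttlFor(eventTime, retentionDays = 30) {
  return Math.floor(eventTime.getTime() / 1000) + retentionDays * 24 * 60 * 60;
}
```

Write this value into the item alongside the reading, and enable TTL on the table pointing at that attribute.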
Storing Aggregated Data
Finally, if you frequently query summaries or totals, you can pre-aggregate data into a single item and query that item instead, which is far cheaper than reading every raw event.
For example, instead of storing every event, you can store hourly or daily aggregation values with minimum, maximum and average values.
To achieve this, you can use DynamoDB Streams: every time a new event is written, a stream consumer (for example, a Lambda function) updates the aggregate item’s count, minimum, maximum, and sum.
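The stream consumer’s core logic is just a fold over incoming readings. A minimal sketch, assuming the aggregate is stored as a plain object and `mergeReading` is an illustrative name:

```javascript
// Fold a new reading into an hourly (or daily) summary object.
// Pass null/undefined for the first reading of the window.
function mergeReading(agg, value) {
  if (!agg) {
    return { count: 1, min: value, max: value, sum: value, avg: value };
  }
  const count = agg.count + 1;
  const sum = agg.sum + value;
  return {
    count,
    sum,
    min: Math.min(agg.min, value),
    max: Math.max(agg.max, value),
    avg: sum / count,
  };
}
```

In practice the consumer would apply this via an UpdateItem on the aggregate item, but the fold itself captures the min/max/average bookkeeping.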
Summary
Time-series data is often a great fit for DynamoDB, but as always efficient data modeling is critical.
By designing your keys to support BETWEEN range queries, you can retrieve time-series data by range very efficiently.
Implementing these best practices will reduce query and storage costs while ensuring high performance, low latency, and scalability.