The simplest way to count 100 billion unique IDs: Part 2

In my last post, I showed how to build a simple, powerful system to count unique viewers, inspired by Reddit's implementation. But what happens when you start hitting real scale (trillions of views)? At this point, the simple solution must evolve.
Let's jump into these scaling challenges and how to address them.
When does the number of views become problematic?
The simple implementation I shared in Part 1 stores view events sorted by `post_id`, and our counter filters by `post_id`, meaning the main scaling challenge comes from the number of views per post. Endpoint performance might degrade due to:
- Too many events to scan per post (billions of rows)
- Concurrent queries on popular posts
For example, let's look at some real numbers for posts with different view counts:
- 10M views = ~57MB of compressed data
- 100M views = ~565MB of compressed data
- 1B views = ~5.5GB of compressed data
Even with compression, great indexing, and filtering on `post_id`, scanning this many views starts adding up:
- 10M views = ~20 ms
- 100M views = ~200-400 ms
- 1B views = ~2-4 seconds
For every 10x increase in views, you can expect query time to increase by 10x. And this is just for a single query. With multiple users checking view counts simultaneously, these times will further increase.
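If you want to sanity-check this on your own data, ClickHouse can estimate how much a query will read before you run it. Here's a minimal sketch against the `post_views` table from Part 1 (the `post_id` value is a placeholder):

```sql
-- Estimate how many parts, rows, and marks a per-post scan will touch,
-- without actually executing the aggregation.
EXPLAIN ESTIMATE
SELECT uniqExact(viewer_id) AS unique_viewers
FROM post_views
WHERE post_id = 'abc123'
```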
When does uniqExact start showing its limitations?
Scanning millions of views per post isn't even your biggest headache. The real bottleneck happens when you're dealing with large numbers of unique viewers. That's when the `uniqExact` function starts to crumble. Yes, it gives perfect accuracy, but boy, does it make you pay for it.
The query time is compounded by two factors beyond just scanning rows:
- Hash set insertions (scales linearly with unique values)
- Memory allocation (also linear, but hits performance cliffs)
As unique viewers increase, the hash set grows and needs to resize more often, causing CPU stalls. But that's just the beginning. The real pain comes when your hash set overflows the L3 cache and spills into RAM. And heaven help you if memory pressure forces swapping to disk. As if that weren't enough, you'll see more hash collisions as your dataset grows, adding yet another tax on performance.
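You don't have to take my word for it. Assuming your ClickHouse deployment has the query log enabled (it usually is by default), you can watch the memory climb yourself. A minimal sketch; the `ILIKE` filter is just a crude way to find the relevant queries:

```sql
-- Peak memory and duration of recent uniqExact queries,
-- pulled from ClickHouse's built-in query log.
SELECT
    event_time,
    query_duration_ms,
    formatReadableSize(memory_usage) AS peak_memory
FROM system.query_log
WHERE type = 'QueryFinish'
  AND query ILIKE '%uniqExact(viewer_id)%'
ORDER BY event_time DESC
LIMIT 5
```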
Some real-world numbers
I've seen this so many times with customers. Here's how unique viewer counting actually scales:
Memory Requirements vs. Performance (64-bit `viewer_id`s)

| Unique Viewers | Memory Usage | Storage Location | Query Time (10% uniqueness) |
|---|---|---|---|
| 1M | ~16MB | CPU L3 Cache | ~10-20ms (zippy performance) |
| 10M | ~160MB | RAM | ~20-60ms (feeling those cache misses) |
| 100M | ~1.6GB | RAM | ~2-5s (heavy memory access) |
| 1B | ~16GB | RAM + Potential Swap | ~15-20s (database crying) |
I think of this as three distinct performance zones:
- The L3 Cache Zone (<1M uniques), where everything's fast as lightning
- The Memory Zone (1M-100M uniques), the "it's fine" zone with gradually declining performance
- The Danger Zone (>100M uniques), where performance falls off a cliff
Even with decent server hardware (32GB RAM), you'll start feeling real pain at around 500M unique viewers: at roughly 16 bytes per entry (an 8-byte ID plus hash set overhead), that's about 8GB of hash set for a single query. Queries will take forever (well, more than a second), memory errors will wake you up at 3AM, and infrastructure costs will make your finance team ask uncomfortable questions. And this is all before considering concurrent queries.
Two paths to optimization
Ok, so how do you count billions of unique viewers per post on a table with trillions of views without breaking the bank?
1. Approximate counting with uniqCombined64
The simplest optimization is switching from `uniqExact` to `uniqCombined64`:

```sql
SELECT
    post_id,
    uniqCombined64(viewer_id) as unique_viewers
FROM post_views
WHERE post_id = {{ String(post_id, required=True) }}
GROUP BY post_id
```
I prefer `uniqCombined64` over `uniqHLL12`. It's not just more modern; it's smarter about how it uses memory. The `uniqCombined64` function actually switches between three different counting methods based on your data scale:
- Array mode (for small cardinalities): Uses a simple array when you have few unique values
- Hash mode (for medium cardinalities): Switches to a sparse hash set as unique values grow
- HyperLogLog mode (for large cardinalities): Finally moves to full HLL when dealing with massive scale (if you recall, this is what Reddit implemented in Redis to count views efficiently)
This adaptive behavior means you get better accuracy at lower cardinalities without sacrificing the ability to scale to billions of unique values. By contrast, `uniqHLL12` relies on HyperLogLog across the board, which costs accuracy on smaller datasets.
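If you want to see the accuracy trade-off concretely, here's a quick comparison on synthetic data using ClickHouse's `numbers` table function. The exact errors will vary by version and dataset, so treat this as a sketch:

```sql
-- Compare exact and approximate counters on 10M known-unique values.
SELECT
    uniqExact(number)      AS exact,       -- 10,000,000 by construction
    uniqCombined64(number) AS combined64,  -- typically within ~1%
    uniqHLL12(number)      AS hll12        -- typically a larger error
FROM numbers(10000000)
```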
Why uniqCombined64 wins in most real-world scenarios
- Higher accuracy at smaller scales without premature approximation
- Relatively small error ~0.8%, which is surprisingly good for analytics use
- Constant memory usage (~80KB per aggregation), keeping RAM consumption predictable
- Scales efficiently up to billions of unique values with little loss of precision
- MUCH faster than `uniqExact` (~250 ms vs ~10 s for 1B unique values), avoiding expensive hash set memory overhead
Many teams default to `uniqExact` for perfect accuracy, only to realize that 99.2% accuracy with `uniqCombined64` is more than enough. And they sleep better at night knowing their queries won't OOM the database.
While this solves the memory problem, and query performance improves as a side effect, we might still have to scan billions of raw views per query.
2. Pre-aggregation with materialized views
When you need exact counts but, more importantly, faster queries, pre-aggregation is your friend:
Materialized view data source:

```
DESCRIPTION >
    'Materialized daily unique viewers per post'

SCHEMA >
    `date` Date,
    `post_id` String,
    `unique_viewers_state` AggregateFunction(uniqExact, Int64)

ENGINE "AggregatingMergeTree"
ENGINE_PARTITION_KEY "toYYYYMM(date)"
ENGINE_SORTING_KEY "date, post_id"
```
Materialized view pipe:

```
DESCRIPTION >
    'Pre-aggregates unique viewers per post'

NODE daily_unique_viewers
SQL >
    SELECT
        post_id,
        toDate(timestamp) as date,
        uniqExactState(viewer_id) as unique_viewers_state
    FROM post_views
    GROUP BY post_id, date

TYPE materialized
DATASOURCE daily_post_viewers_mv
```
Then your API endpoint becomes:
```
DESCRIPTION >
    'Fast exact unique viewers with pre-aggregated data'

NODE optimized_post_analytics
SQL >
    %
    SELECT
        post_id,
        uniqExactMerge(unique_viewers_state) as unique_viewers
    FROM daily_post_viewers_mv
    WHERE post_id = {{ String(post_id, required=True) }}
    {% if defined(start_date) %}
        AND date >= {{ Date(start_date) }}
    {% end %}
    {% if defined(end_date) %}
        AND date <= {{ Date(end_date) }}
    {% end %}
    GROUP BY post_id

TYPE endpoint
```
This approach:
- Maintains exact counting
- Drastically reduces query time
- Uses less memory per query due to only counting daily uniques
- Updates in real-time
- Trades off some flexibility in time ranges (more on this below)
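One nice property of storing states instead of final counts: you can re-aggregate them to any coarser grain at query time. For example, a monthly rollup from the daily states might look like this (the `post_id` value is a placeholder):

```sql
-- Merge daily exact states into monthly unique viewer counts.
SELECT
    toStartOfMonth(date) AS month,
    uniqExactMerge(unique_viewers_state) AS unique_viewers
FROM daily_post_viewers_mv
WHERE post_id = 'abc123'
GROUP BY month
ORDER BY month
```

What you lose is sub-day granularity: you can merge days into months, but you can't split a day in half.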
Combining approaches for maximum scalability
For truly massive scale, you can mix and match these approaches:
- Pre-aggregate `uniqExact` using a materialized view for common time ranges (daily, monthly). You can find out more on how to create rollups with materialized views in this blog post.
- Aggregate viewers with `uniqCombined64` at query time for arbitrary ranges.
- Pre-aggregate with `uniqCombined64` on common time ranges (daily, monthly) and fill in the gaps by aggregating at query time with `uniqCombined64` over the raw views.
Here's an example of the combined approach that still keeps it relatively simple:
Materialized view data source:

```
DESCRIPTION >
    'Materialized daily unique viewers per post'

SCHEMA >
    `date` Date,
    `post_id` String,
    `unique_viewers_state` AggregateFunction(uniqCombined64, Int64)

ENGINE "AggregatingMergeTree"
ENGINE_PARTITION_KEY "toYYYYMM(date)"
ENGINE_SORTING_KEY "date, post_id"
```
Materialized view pipe:

```
DESCRIPTION >
    'Materializes daily unique viewers per post using uniqCombined64'

NODE daily_post_viewers_1
SQL >
    SELECT
        toDate(timestamp) as date,
        post_id,
        uniqCombined64State(viewer_id) as unique_viewers_state
    FROM post_views
    GROUP BY date, post_id

TYPE materialized
DATASOURCE daily_post_viewers_mv
```
Endpoint:

```
DESCRIPTION >
    'API to count unique viewers per post_id with optional date filtering. Uses the materialized view for complete days and post_views for partial days or when no dates are provided.'

NODE full_days
DESCRIPTION >
    'Gets the unique states for full days within the query range'
SQL >
    %
    SELECT
        post_id,
        unique_viewers_state
    FROM daily_post_viewers_mv
    WHERE
    {% if defined(start_date) or defined(end_date) %}
        post_id = {{ String(post_id, required=True) }}
        {% if defined(start_date) %}
        AND date > toDate({{ DateTime(start_date) }})
        {% end %}
        {% if defined(end_date) %}
        AND date < toDate({{ DateTime(end_date) }})
        {% end %}
    {% else %}
        0
    {% end %}

NODE start_day
DESCRIPTION >
    'Builds the unique state for the partial day at the start of the query range from raw views'
SQL >
    %
    SELECT
        post_id,
        uniqCombined64State(viewer_id) as unique_viewers_state
    FROM post_views
    WHERE
    {% if defined(start_date) %}
        post_id = {{ String(post_id, required=True) }}
        AND toDate(timestamp) = toDate({{ DateTime(start_date) }})
        AND timestamp >= {{ DateTime(start_date) }}
    {% else %}
        0
    {% end %}
    GROUP BY post_id

NODE end_day
DESCRIPTION >
    'Builds the unique state for the partial day at the end of the query range from raw views'
SQL >
    %
    SELECT
        post_id,
        uniqCombined64State(viewer_id) as unique_viewers_state
    FROM post_views
    WHERE
    {% if defined(end_date) %}
        post_id = {{ String(post_id, required=True) }}
        AND toDate(timestamp) = toDate({{ DateTime(end_date) }})
        AND timestamp <= {{ DateTime(end_date) }}
    {% else %}
        0
    {% end %}
    GROUP BY post_id

NODE endpoint
DESCRIPTION >
    'Merges the unique states across the entire query range'
SQL >
    %
    {% if defined(start_date) or defined(end_date) %}
    SELECT
        post_id,
        uniqCombined64Merge(unique_viewers_state) as unique_viewers
    FROM
    (
        SELECT post_id, unique_viewers_state FROM full_days
        UNION ALL
        SELECT post_id, unique_viewers_state FROM start_day
        UNION ALL
        SELECT post_id, unique_viewers_state FROM end_day
    )
    GROUP BY post_id
    {% else %}
    SELECT
        post_id,
        uniqCombined64(viewer_id) as unique_viewers
    FROM post_views
    WHERE post_id = {{ String(post_id, required=True) }}
    GROUP BY post_id
    {% end %}

TYPE endpoint
```
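Once published, you'd query this like any other Tinybird endpoint. A hypothetical call, assuming the pipe is named `post_unique_viewers` (a name I've made up for illustration) and you have a read token:

```
curl "https://api.tinybird.co/v0/pipes/post_unique_viewers.json?post_id=abc123&start_date=2025-01-01%2000:00:00&end_date=2025-01-31%2012:00:00&token=<YOUR_TOKEN>"
```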
When to use what
- Start Simple: Begin with `uniqExact` until you hit performance issues
- Quick Fix: Switch to `uniqCombined64` when memory becomes a problem
- Scale Up: Add pre-aggregation when query performance matters
- Go Hybrid: Combine approaches for maximum flexibility
Try it yourself
Want to try these optimizations yourself? Check out the Tinybird documentation to get started.