Song recommendations proof-of-concept memory measurements

An attempt at measurement, and some results.

This is an article in a larger series about functional programming design alternatives, and a direct continuation of the previous article. The question lingering after the Impureim Sandwich proof of concept is: What are the memory requirements of front-loading all users, songs, and scrobbles?

One can guess, as I've already done, but it's safer to measure. In this article, you'll find a description of the experiment, as well as some results.

Test program #

Since I don't measure application memory profiles that often, I searched the web to learn how, and found this answer by Jon Skeet. That's a reputable source, so I'm assuming that the described approach is appropriate.

I added a new command-line executable to the source code and made this the entry point:

const int size = 100_000;
 
static async Task Main()
{
    // Force a full garbage collection and record the baseline.
    var before = GC.GetTotalMemory(true);
 
    var (listeners, scrobbles) = await Populate();
 
    // Force another full collection; the difference is the size of the loaded data.
    var after = GC.GetTotalMemory(true);
 
    var diff = after - before;
 
    Console.WriteLine("Total memory: {0:N0}B.", diff);
 
    // Keep the dictionaries reachable so that they aren't collected before the
    // second measurement.
    GC.KeepAlive(listeners);
    GC.KeepAlive(scrobbles);
}

listeners and scrobbles are two dictionaries of data, as described in the previous article. Together, they contain the data that we measure. Both are populated by this method:

private static async Task<(
    IReadOnlyDictionary<int, IReadOnlyCollection<User>>,
    IReadOnlyDictionary<string, IReadOnlyCollection<Scrobble>>)> Populate()
{
    var service = PopulateService();
    var listeners = await service.CollectAllTopListeners();
    var scrobbles = await service.CollectAllTopScrobbles();
    return (listeners, scrobbles);
}

The service variable is a FakeSongService object populated with randomly generated data. The CollectAllTopListeners and CollectAllTopScrobbles methods are the same as described in the previous article. When the method returns the two dictionaries, the service object goes out of scope and can be garbage-collected. When the program measures the memory load, it measures the size of the two dictionaries, but not service.

I've reused the FsCheck generators for random data generation:

private static SongService PopulateService()
{
    var users = RecommendationsProviderTests.Gen.UserName.Sample(size);
    var songs = RecommendationsProviderTests.Gen.Song.Sample(size);
    var scrobbleGen =
        from user in Gen.Elements(users)
        from song in Gen.Elements(songs)
        from scrobbleCount in Gen.Choose(1, 10)
        select (user, song, scrobbleCount);
 
    var service = new FakeSongService();
    foreach (var (user, song, scrobbleCount) in scrobbleGen.Sample(size))
        service.Scrobble(user, song, scrobbleCount);
 
    return service;
}

A Gen object comes with a Sample method you can use to request a specified number of randomly generated values.

In order to keep the code simple, I used the size value for the number of songs, the number of users, and the number of scrobbles alike. This probably creates too few scrobbles, since on average each user then ends up with only a single scrobble; a topic that requires further discussion later.

Measurements #

I ran the above program with various size values: from 100,000 up to 1,000,000 in 100,000 increments, and from there up to 10,000,000 (ten million) in 500,000 increments. At the higher values, it took a good ten minutes to run the program.

Song recommendations memory cost line chart.

As the chart indicates, I ran the program with various data representations (more on that below). While there are four distinct data series, they overlap pairwise so perfectly that the graph doesn't show the difference. The record and struct record data series are virtually identical; you can't visually tell them apart. The same is true for the bitmasked class and the bitmasked struct data series, which only go to size 500,000.

There are small, but noticeable jumps from 4,500,000 to 5,000,000 and again from 8,500,000 to 9,000,000, but the overall impression is that the relationship is linear. It seems safe to conclude that the solution scales linearly with the data size.

The number of bytes per size unit is almost constant and averages 178 bytes. How does that compare to my previous memory size estimates? There, I estimated a song and a scrobble to require 8 bytes each, and a user less than 32 bytes. The way the above simulation runs, it generates one song, one user, and one scrobble per size unit. Therefore, I'd expect the average memory cost per size unit to be around 8 + 8 + 32 = 48 bytes, plus some overhead from the dictionaries.

Given that the number I measure is 178, that's 130 bytes of overhead. Honestly, that's more than I expected. I expect a dictionary to maintain an array of keys, perhaps hashed with a bucket per hash value. Had I picked another data structure than a plain old Dictionary, the overhead might have been different. Or perhaps I just don't understand .NET's memory model, when push comes to shove.

I then tried to split the single size parameter into three that would control the number of users, songs, and scrobbles independently. Setting both the number of users and songs to ten million, I then ran a series of simulations with increasing scrobble counts.

Scrobble memory cost line chart.

The relationship still looks linear, and at a hundred million scrobbles (and ten million users and ten million songs), the simulation uses 8.3 GB of memory.

I admit that I'm still a bit puzzled by the measurements, compared to my previous estimates. I would have expected those sizes to require about 1.2 GB (100 million scrobbles at 8 bytes, ten million users at 32 bytes, and ten million songs at 8 bytes), plus overhead, so the actual measurements are off by a factor of 7. Not quite an order of magnitude, but close.

Realism #

How useful are these measurements? How realistic are the experiments' parameters? Most streaming audio services report having catalogues with around 100 million songs, which is ten times more than what I've measured here. Such services may also have significantly more users than ten million, but what is going to make or break this architecture option (keeping all data in memory) is how many scrobbles users have, and how many times they listen to each song.

Even if we naively still believe that a scrobble only takes up 8 bytes, it doesn't follow automatically that 100 scrobbles take up 800 bytes. It depends on how many repeats there are. Recall how we may model a scrobble:

public sealed record Scrobble(Song Song, int ScrobbleCount);

If a user listens to the same song ten times, we don't have to create ten Scrobble objects; we can create one and set the ScrobbleCount to 10.
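
To illustrate, here's a minimal sketch of how raw play events might collapse into Scrobble values. It's not part of the proof of concept: the PlayEvent type and the aggregation code are assumptions made up for this example, and it relies on the usual System.Linq and System.Collections.Generic imports.

// Hypothetical input type: one record per individual play.
public sealed record PlayEvent(string UserName, Song Song);
 
// Collapses one user's repeated plays of the same song into single Scrobble
// values, using ScrobbleCount to carry the repetition.
public static IReadOnlyCollection<Scrobble> AggregateScrobbles(
    IEnumerable<PlayEvent> plays)
{
    return plays
        .GroupBy(play => play.Song)
        .Select(group => new Scrobble(group.Key, group.Count()))
        .ToList();
}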

The memory requirement for storing users' scrobbles depends on the average listening pattern. Even with millions of users, we may be able to store scrobbles in memory if users listen to relatively few songs. On the other hand, if they only listen to each song once, it's probably not going to fit in memory.

Still, we're dangerously close to the edge of what we can fit in memory. Shouldn't I just declare bankruptcy on that idea and move on?

The purpose of this overall article series is to demonstrate alternatives to the Impureim Sandwich pattern, so I'm ultimately going to do exactly that: Move on.

But not yet.

Sharding #

Some applications are truly global in nature, and when that's the case, keeping everything in memory may not be 'web scale'.

Still, I've seen more than one international company treat geographic areas as separate entities. This may be for legal reasons, or other business concerns that are unrelated to technology constraints.

As a programmer, you may think that a song recommendations service ought to be truly global. After all, more data produces more accurate results, right?

Your business owners may not think so. They may be concerned that regional music tastes may 'bleed over' market boundaries, and that this could ultimately scare customers away.

Even if you can technically prove that this isn't a relevant concern, because you can write an algorithm that takes this into account, you may get a direct order that, say, Southeast Asian scrobbles may not be used in North America, or vice versa.

It's worth investigating whether such business or legal constraints are already in place, because if they are, this may mean that you can shard the data, and that each shard still fits in memory.

You may still think that I'm trying to salvage a bad idea, but that's not my agenda. I discuss these topics because in my experience, many programmers don't consider them. Understanding the wider context of a problem may suggest solutions that you might otherwise dismiss.

But what if the business constraints change in the future? Shouldn't we be ready for that?

Yes and no. You should consider how such changes would impact the architecture. Then you discuss the advantages and disadvantages with other stakeholders.

Keep in mind that the reason to consider an Impureim Sandwich is because it's simple and easy to implement and maintain. Other alternatives may be more 'scalable', but also riskier to implement. You should involve other stakeholders in such decisions.

Song representations #

The first of the above charts draws graphs for four data series:

  • struct record
  • record
  • bitmasked struct
  • bitmasked class

These measure four different ways to model data; here more specifically a song.

My initial model of a song was a record:

public sealed record Song(int Id, bool IsVerifiedArtist, byte Rating);

Then I thought that perhaps, since the type only contains value types, it might be better to turn the above record into a record struct:

public record struct Song(int Id, bool IsVerifiedArtist, byte Rating);

It turns out that it makes no visible difference. In the chart, the two data series are so close to each other that you can't distinguish them.

Then I thought that instead of an int, a bool, and a byte, I could use a single bitmask to model all three values.

After all, I was only guessing when it came to data types anyway. It's likely that Rating is only a five-point or ten-point scale, but I still used a byte to model it. This suggests that I'm not using 96% of the data type's range. Perhaps I could use one of those redundant bits for IsVerifiedArtist, instead of an entire bool.

Taking this further, modelling the Id as an int suggests that you may have 4,294,967,295 unique songs. That's 4.3 billion songs - at least an order of magnitude more than those 100 million songs that we hear about. In reality though, most systems that use int for IDs only do so because int is CLS-compliant, and uint is not. In other words, most systems that use int for IDs most likely only use the positive half of the range, which leaves bits to spare for other purposes. Enter the bitmasked song:

public readonly struct Song
{
    private const uint idMask =
        0b0000_0111_1111_1111_1111_1111_1111_1111;
    private const uint isVerifiedArtistMask =
        0b1000_0000_0000_0000_0000_0000_0000_0000;
    private const uint ratingMask =
        0b0111_1000_0000_0000_0000_0000_0000_0000;
    private readonly uint bits;
 
    public Song(int id, bool isVerifiedArtist, byte rating)
    {
        var idBits = (uint)id & idMask;
        var isVerifiedArtistBits = isVerifiedArtist ? isVerifiedArtistMask : 0u;
        var ratingBits = ((uint)rating << 27) & ratingMask;
        bits = idBits | isVerifiedArtistBits | ratingBits;
    }
 
    public int Id => (int)(bits & idMask);
 
    public bool IsVerifiedArtist =>
        (bits & isVerifiedArtistMask) == isVerifiedArtistMask;
 
    public byte Rating => (byte)((bits & ratingMask) >> 27);
}

In this representation, I've set aside the lower 27 bits for the ID, enabling IDs to range between 0 and 134,217,727. The top bit is used for IsVerifiedArtist, and the remaining four bits for Rating.
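
As a quick sanity check of that layout, a round trip through the struct might look like this (the values are arbitrary, chosen only for illustration):

var song = new Song(134_217_727, isVerifiedArtist: true, rating: 5);
 
Console.WriteLine(song.Id);               // 134217727
Console.WriteLine(song.IsVerifiedArtist); // True
Console.WriteLine(song.Rating);           // 5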

This data structure only holds a single uint, and since I made it a struct, I thought it'd have minimal overhead.

As you can see in the above chart, that's not the case. When I run the experiment, this representation requires more memory.

Just to make sure I wasn't missing anything obvious, I tried making the bitmasked Song a class instead. No visible difference.

If you're wondering why the bitmasked data series only go to 500,000, it's because this change made the experiments glacial. It took somewhere between 12 and 24 hours to run the experiment with a size of 500,000.

For what it's worth, I don't think the slowdown is directly related to the data representation, but rather to the change I had to make to the FsCheck-based data generator:

let songParams = gen {
    let maxId = 0b0111_1111_1111_1111_1111_1111_1111
    let! songId = Gen.choose (1, maxId)
    let! isVerified = ArbMap.generate ArbMap.defaults
    let! rating = Gen.choose (0, 10) |> Gen.map byte
    return songId, isVerified, rating }
 
[<CompiledName "Song">]
let song = gen {
    let! (id, isVerifiedArtist, rating) = songParams
    return Song (id, isVerifiedArtist, rating) }

I can't explain why the bitmasked representation requires more memory, but I'm increasingly having a nagging feeling that I've made a mistake somewhere. If you can spot a mistake, please let me know by leaving a comment.

Other data representations #

I also considered whether it'd make sense to represent the entire data set as a huge matrix. One could, for example, let rows represent users, and columns songs, and let each element represent the number of times a user has listened to a particular song:

User | Song 1 | Song 2 | Song 3 | ...
---- | ------ | ------ | ------ | ---
123  |      0 |      0 |      4 | ...
456  |      2 |      0 |      4 | ...
...  |    ... |    ... |    ... | ...

Let's say that you may expect some users to listen to a song more than 255 times, but probably not more than 65,535 times. Thus, you could store each play count as a ushort. Still, you would need users x songs values, so if you have 100 million songs and 10 million users, that implies 2 PB of memory. That doesn't sound useful.
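
For what it's worth, the arithmetic behind that 2 PB figure is just a back-of-envelope multiplication, which a few lines of C# can double-check (the figures are the ones assumed above):

const long users = 10_000_000;
const long songs = 100_000_000;
const long matrixBytes = users * songs * sizeof(ushort);
Console.WriteLine(matrixBytes); // 2000000000000000, i.e. 2 PB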

On the other hand, most of those elements are going to be 0, so perhaps one could use an adjacency list instead. That is, however, essentially what an IReadOnlyDictionary<string, IReadOnlyCollection<Scrobble>> is, so we're now back where we started.

Conclusion #

This article reports on some measurements I've made of memory requirements, assuming that we keep all scrobble data in memory. While I suspect that I've made a mistake, it still seems reasonable to conclude that the song recommendations scenario is on the edge of what's possible with the Impureim Sandwich pattern.

That's okay. I'm interested in understanding the limitations of solutions.

I do think, however, that it's worth taking note of just how massive amounts of data are required before the Impureim Sandwich pattern becomes untenable.

When I describe the pattern, the most common reaction is that it doesn't scale. And, as this article has explored, it's true that it doesn't scale. But it scales much, much better than you think. You may have billions of entities in your system, and they may still fit in a few gigabytes. Don't dismiss the Impureim Sandwich before you've made a real effort to understand the memory implications of it. Your intuition is likely to work against you.

I'll round off this part of the article series by showing how the Impureim Sandwich looks in other, more functional languages.

Next: Song recommendations as an F# Impureim Sandwich.

