Song recommendations as an F# Impureim Sandwich

A pure function on potentially massive data. This article is part of a larger article series titled Alternative ways to design with functional programming. In the previous article, you saw an example of applying the Impureim Sandwich pattern to the problem at hand: A song recommendation engine that sifts through much historical data. As already covered in Song recommendations as an Impureim Sandwich, the drawback, if you will, of a Recawr Sandwich is that you need to collect all data from impure sources before you can pass it to a pure function. It may happen that you need so much data that this strategy becomes untenable. This may be the case here, but surprisingly often, what strikes us humans as being vast amounts are peanuts for computers. So even if you don't find this particular example realistic, I'll forge ahead and show how to apply the Recawr Sandwich pattern to this problem. This is essentially a port to F# of the C# code from the previous article. If you rather want to see some more realistic architectures to deal with the overall problem, you can always go back to the table of contents in the first article of the series. In this article, I'm working with the fsharp-pure-function branch of the Git repository. Collecting all the data # Like in the previous article, we may start by adding two more members to the SongService interface, which will enable us to enumerate all songs and all users. The full interface, with all four methods, then looks like this: type SongService =     abstract GetAllSongs : unit -> Task     abstract GetAllUsers : unit -> Task     abstract GetTopListenersAsync : songId : int -> Task     abstract GetTopScrobblesAsync : userName : string -> Task If you compare with the interface definition shown in the article Porting song recommendations to F#, you'll see that GetAllSongs and GetAllUsers are the new methods. They enable you to collect all top listeners, and all top scrobbles, even though it may take some time. To gather all the top listeners, we may add this collectAllTopListeners action: let collectAllTopListeners (songService : SongService) = task {     let d = Dictionary ()     let! songs = songService.GetAllSongs ()     do! songs |> TaskSeq.iter (fun s -> task {         let! topListeners = songService.GetTopListenersAsync s.Id         d.Add (s.Id, topListeners) } )     return d :> IReadOnlyDictionary } Likewise, you can amass all the top scrobbles with a similar action: let collectAllTopScrobbles (songService : SongService) = task {     let d = Dictionary ()     let! users = songService.GetAllUsers ()     do! users |> TaskSeq.iter (fun u -> task {         let! topScrobbles = songService.GetTopScrobblesAsync u.UserName         d.Add (u.UserName, topScrobbles) } )     return d :> IReadOnlyDictionary } As you may have noticed, they're so similar that, had there been more than two, we might consider extracting the similar parts to a reusable operation. In both cases, we start with the action that enables us to enumerate all the resources (songs or scrobbles) that we're interested in. For each of these, we then invoke the action to get the 'top' resources for that song or scrobble. There's a massive n+1 problem here, but you could conceivably parallelize all these queries, as they're independent. Still, it's bound to take much time, possibly hours. All the data is kept in a dictionary per action, so two massive dictionaries in all. Once these two actions return, we're done with the read phase of the Recawr Sandwich. Traversals # You may have wondered about the above TaskSeq.iter action. That's not part of the standard library. What is it, and where does it come from? It's a specialized traversal, designed to make asynchronous Commands more streamlined. let iter f xs = task {     let! units = traverse f xs     Seq.iter id units } If you've ever wondered why the identity function (id) is useful, here's an example. In the first line of code, units is a unit seq value; i.e. a sequence of unit values. To make TaskSeq.iter as easy to use as possible, it should turn that multitude of unit values into a single unit value. There's more than one way to do that, but I found that using Seq.iter was about the most succinct option I could think of. Be that as it may, Seq.iter requires as an argument a function that returns unit, and since we already have unit values, id does the job. The iter action uses the TaskSeq module's traverse function, which is defined like this: let traverse f xs =     let go acc x = task {         let! x' = x         let! acc' = acc         return Seq.append

May 19, 2025 - 11:40
 0
Song recommendations as an F# Impureim Sandwich

A pure function on potentially massive data.

This article is part of a larger article series titled Alternative ways to design with functional programming. In the previous article, you saw an example of applying the Impureim Sandwich pattern to the problem at hand: A song recommendation engine that sifts through much historical data.

As already covered in Song recommendations as an Impureim Sandwich, the drawback, if you will, of a Recawr Sandwich is that you need to collect all data from impure sources before you can pass it to a pure function. It may happen that you need so much data that this strategy becomes untenable. This may be the case here, but surprisingly often, what strikes us humans as being vast amounts are peanuts for computers.

So even if you don't find this particular example realistic, I'll forge ahead and show how to apply the Recawr Sandwich pattern to this problem. This is essentially a port to F# of the C# code from the previous article. If you rather want to see some more realistic architectures to deal with the overall problem, you can always go back to the table of contents in the first article of the series.

In this article, I'm working with the fsharp-pure-function branch of the Git repository.

Collecting all the data #

Like in the previous article, we may start by adding two more members to the SongService interface, which will enable us to enumerate all songs and all users. The full interface, with all four methods, then looks like this:

type SongService =
    abstract GetAllSongs : unit -> Task<IEnumerable<Song>>
    abstract GetAllUsers : unit -> Task<IEnumerable<User>>
    abstract GetTopListenersAsync : songId : int -> Task<IReadOnlyCollection<User>>
    abstract GetTopScrobblesAsync : userName : string -> Task<IReadOnlyCollection<Scrobble>>

If you compare with the interface definition shown in the article Porting song recommendations to F#, you'll see that GetAllSongs and GetAllUsers are the new methods.

They enable you to collect all top listeners, and all top scrobbles, even though it may take some time. To gather all the top listeners, we may add this collectAllTopListeners action:

let collectAllTopListeners (songService : SongService) = task {
    let d = Dictionary<intIReadOnlyCollection<User>> ()
    let! songs = songService.GetAllSongs ()
    do! songs |> TaskSeq.iter (fun s -> task {
        let! topListeners = songService.GetTopListenersAsync s.Id
        d.Add (s.Id, topListeners) } )
    return d :> IReadOnlyDictionary<_, _> }

Likewise, you can amass all the top scrobbles with a similar action:

let collectAllTopScrobbles (songService : SongService) = task {
    let d = Dictionary<stringIReadOnlyCollection<Scrobble>> ()
    let! users = songService.GetAllUsers ()
    do! users |> TaskSeq.iter (fun u -> task {
        let! topScrobbles = songService.GetTopScrobblesAsync u.UserName
        d.Add (u.UserName, topScrobbles) } )
    return d :> IReadOnlyDictionary<_, _> }

As you may have noticed, they're so similar that, had there been more than two, we might consider extracting the similar parts to a reusable operation.

In both cases, we start with the action that enables us to enumerate all the resources (songs or scrobbles) that we're interested in. For each of these, we then invoke the action to get the 'top' resources for that song or scrobble. There's a massive n+1 problem here, but you could conceivably parallelize all these queries, as they're independent. Still, it's bound to take much time, possibly hours.

All the data is kept in a dictionary per action, so two massive dictionaries in all. Once these two actions return, we're done with the read phase of the Recawr Sandwich.

Traversals #

You may have wondered about the above TaskSeq.iter action. That's not part of the standard library. What is it, and where does it come from?

It's a specialized traversal, designed to make asynchronous Commands more streamlined.

let iter f xs = task {
    let! units = traverse f xs
    Seq.iter id units }

If you've ever wondered why the identity function (id) is useful, here's an example. In the first line of code, units is a unit seq value; i.e. a sequence of unit values. To make TaskSeq.iter as easy to use as possible, it should turn that multitude of unit values into a single unit value. There's more than one way to do that, but I found that using Seq.iter was about the most succinct option I could think of. Be that as it may, Seq.iter requires as an argument a function that returns unit, and since we already have unit values, id does the job.

The iter action uses the TaskSeq module's traverse function, which is defined like this:

let traverse f xs =
    let go acc x = task {
        let! x' = x
        let! acc' = acc
        return Seq.append acc' [x'] }
    xs |> Seq.map f |> Seq.fold go (task { return [] })

The type of traverse is ('a -> #Task<'c>) -> 'a seq -> Task<'c seq>; that is, it applies an asynchronous action to each of a sequence of 'a values, and returns an asynchronous workflow that contains a sequence of 'c values.

Dictionary lookups #

In .NET, queries that may fail are idiomatically modelled with methods that take out parameters. This is also true for dictionary lookups. Since that kind of design doesn't compose well, it's useful to add a little helper function that instead may return an empty value. While you'd generally do that by returning an option value, in this case, an empty collection is more appropriate.

let findOrEmpty key (d : IReadOnlyDictionary<_, IReadOnlyCollection<_>>) =
   match d.TryGetValue key with
   | truev -> v
   | _       -> List.empty

You may have noticed that I also added a similar helper function in the C# example, although there I called it GetOrEmpty.

Pure function with local mutation #

As a first step, we may wish to turn the GetRecommendationsAsync method into a pure function. If you look through the commits in the Git repository, you can see that I actually did this through a series of micro-commits, but here I only present a more coarse-grained version of the changes I made.

Instead of a method on a class, we now have a self-contained function that takes, as arguments, two dictionaries, but no SongService dependency.

let getRecommendations topScrobbles topListeners userName =
    // 1. Get user's own top scrobbles
    // 2. Get other users who listened to the same songs
    // 3. Get top scrobbles of those users
    // 4. Aggregate the songs into recommendations
 
    let scrobbles = topScrobbles |> Dict.findOrEmpty userName
    let scrobblesSnapshot =
        scrobbles
        |> Seq.sortByDescending (fun s -> s.ScrobbleCount)
        |> Seq.truncate 100
        |> Seq.toList
 
    let recommendationCandidates = ResizeArray ()
    for scrobble in scrobblesSnapshot do
        let otherListeners =
            topListeners |> Dict.findOrEmpty scrobble.Song.Id
        let otherListenersSnapshot =
            otherListeners
            |> Seq.filter (fun u -> u.TotalScrobbleCount >= 10_000)
            |> Seq.sortByDescending (fun u -> u.TotalScrobbleCount)
            |> Seq.truncate 20
            |> Seq.toList
 
        for otherListener in otherListenersSnapshot do
            let otherScrobbles =
                topScrobbles |> Dict.findOrEmpty otherListener.UserName
            let otherScrobblesSnapshot =
                otherScrobbles
                |> Seq.filter (fun s -> s.Song.IsVerifiedArtist)
                |> Seq.sortByDescending (fun s -> s.Song.Rating)
                |> Seq.truncate 10
                |> Seq.toList
 
            otherScrobblesSnapshot
            |> List.map (fun s -> s.Song)
            |> recommendationCandidates.AddRange
 
    let recommendations =
        recommendationCandidates
        |> Seq.sortByDescending (fun s -> s.Rating)
        |> Seq.truncate 200
        |> Seq.toList
        :> IReadOnlyCollection<_>
 
    recommendations

Since this is now a pure function, there's no need to run as an asynchronous workflow. The function no longer returns a Task, and I've also dispensed with the Async suffix.

The implementation still has imperative remnants. It initializes an empty ResizeArray (AKA List), and loops through nested loops to repeatedly call AddRange.

Even though the function contains local state mutation, none of it escapes the function's scope. The function is referentially transparent because it always returns the same result when given the same input, and it has no side effects.

You might still wish that it was 'more functional', which is certainly possible.

A single expression #

A curious property of expression-based languages is that you can conceivably write functions in 'one line of code'. Granted, it would often be a terribly wide line, not at all readable, a beast to maintain, and often with poor performance, so not something you'd want to alway do.

In this case, however, we can do that, although in order to stay within an 80x24 box, we break the expression over multiple lines.

let getRecommendations topScrobbles topListeners userName =
    // 1. Get user's own top scrobbles
    // 2. Get other users who listened to the same songs
    // 3. Get top scrobbles of those users
    // 4. Aggregate the songs into recommendations
 
    topScrobbles
    |> Dict.findOrEmpty userName
    |> Seq.sortByDescending (fun s -> s.ScrobbleCount)
    |> Seq.truncate 100
    |> Seq.collect (fun scrobble ->
        topListeners
        |> Dict.findOrEmpty scrobble.Song.Id
        |> Seq.filter (fun u -> u.TotalScrobbleCount >= 10_000)
        |> Seq.sortByDescending (fun u -> u.TotalScrobbleCount)
        |> Seq.truncate 20
        |> Seq.collect (fun otherListener ->
            topScrobbles
            |> Dict.findOrEmpty otherListener.UserName
            |> Seq.filter (fun s -> s.Song.IsVerifiedArtist)
            |> Seq.sortByDescending (fun s -> s.Song.Rating)
            |> Seq.truncate 10
            |> Seq.map (fun s -> s.Song)))
    |> Seq.sortByDescending (fun s -> s.Rating)
    |> Seq.truncate 200
    |> Seq.toList
    :> IReadOnlyCollection<_>

To be honest, the four lines of comments push the function definition over the edge of 24 lines of code, but without them, this variation actually does fit an 80x24 box. Even so, I'm not arguing that this is the best possible way to organize and lay out this function.

You may rightly complain that it's too dense. Perhaps you're also concerned about the arrow code tendency.

I'm not disagreeing, but at least this represents a milestone where the function is not only referentially transparent, but also implemented without local mutation. Not that that really should be the most important criterion, but once you have an entirely expression-based implementation, it's usually easier to break it up into smaller building blocks.

Composition from smaller functions #

To improve readability and maintainability, we may now extract helper functions. The first one easily suggests itself.

let private getUsersOwnTopScrobbles topScrobbles userName =
    topScrobbles
    |> Dict.findOrEmpty userName
    |> Seq.sortByDescending (fun s -> s.ScrobbleCount)
    |> Seq.truncate 100

Each of the subexpressions in the above code listing are candidates for the same kind of treatment, like this one:

let private getOtherUsersWhoListenedToTheSameSongs topListeners scrobble =
    topListeners
    |> Dict.findOrEmpty scrobble.Song.Id
    |> Seq.filter (fun u -> u.TotalScrobbleCount >= 10_000)
    |> Seq.sortByDescending (fun u -> u.TotalScrobbleCount)
    |> Seq.truncate 20

Notice that these helper methods are marked private so that they remain implementation details within the module that exports the getRecommendations function.

With a few more helper functions, you can now implement the getRecommendations function by composing the helpers.

let getRecommendations topScrobbles topListeners =
    getUsersOwnTopScrobbles topScrobbles
    >> Seq.collect (
        getOtherUsersWhoListenedToTheSameSongs topListeners
        >> Seq.collect (getTopSongsOfOtherUser topScrobbles))
    >> aggregateSongsIntoRecommendations

Notice that I've named each of the helper functions after the code comments that accompanied the previous incarnations of this function. If we consider code comments apologies for not properly organizing the code, we've now managed to structure it in such a way that those apologies are no longer required.

Conclusion #

If you accept the (perhaps preposterous) assumption that it's possible to fit the required data in persistent data structures, refactoring the recommendation algorithm to a pure function isn't that difficult. That's the pure part of a Recawr Sandwich. While I haven't shown the actual sandwich here, it's identical to the example shown in Song recommendations as a C# Impureim Sandwich.

I find the final incarnation of the code shown here to be quite attractive. While I've kept the helper functions private, it's always an option to promote them to public functions if you find that warranted. This could improve testability of the overall code base, albeit at the risk of increasing the surface area of the API that you have to maintain and secure.

There are always trade-offs to be considered. Even if you, eventually, find that for this particular example, the input data size is just too big to make this alternative viable, there are, in my experience, many other situations when this kind of architecture is a good solution. Even if the input size is a decent amount of megabytes, the simplification offered by an Impureim Sandwich may trump the larger memory footprint. As always, if you're concerned about performance, measure it.

Before we turn to alternative architectures, we'll survey how this variation looks in Haskell. As is generally the case in this article series, if you don't care about Haskell, you can always go back to the table of contents in the first article in the series and instead navigate to the next article that interests you.

Next: Song recommendations as a Haskell Impureim Sandwich.


This blog is totally free, but if you like it, please consider supporting it.