Cleaner web feed aggregation with App::FeedDeduplicator

I write blog posts in a number of different places:
- Davblog has been my general blog for about twenty years
- Perl Hacks is where I write about Perl
- My Substack newsletter is mostly tech stuff but can also wander into entrepreneurship and other topics
And most of those posts get syndicated to other places:
- Tech stuff will usually end up on dev.to
- Non-tech stuff will go to Medium
- Occasionally, stuff about Perl will be republished on perl.com
It’s also possible that I’ll write original posts on one of these syndication sites without posting to one of my sites first.
Recently, when revamping my professional website, I decided that I wanted to display a list of recent posts from all of those sources. But because of the syndication, it was all a bit noisy: multiple copies of the same post, repeated titles, and a poor reading experience.
What I wanted was a single, clean feed — a unified view of everything I’ve written, without repetition.
So I wrote a tool.
The Problem
I wanted to:
- Aggregate several feeds into one
- Remove syndicated duplicates automatically
- Prefer the canonical/original version of each post
- Output the result in Atom (or optionally RSS or JSON)
The Solution: App::FeedDeduplicator
App::FeedDeduplicator is a new CPAN module and CLI tool for aggregating and deduplicating web feeds.
It reads a list of feed URLs from a JSON config file, downloads and parses them, filters out duplicates (based on canonical URLs or titles), sorts the results by date, and emits a clean, modern feed.
How It Works
- A JSON config file provides the list of feeds and the desired output format:
{
"output_format": "json",
"max_entries": 10,
"feeds": [{
"feed": "https://perlhacks.com/feed/",
"web": "https://perlhacks.com/",
"name": "Perl Hacks"
}, {
"feed": "https://davecross.substack.com/feed",
"web": "https://davecross.substack.com/",
"name": "Substack"
}, {
"feed": "https://blog.dave.org.uk/feed/",
"web": "https://blog.dave.org.uk/",
"name": "Davblog"
}, {
"feed": "https://dev.to/feed/davorg",
"web": "https://dev.to/davorg",
"name": "Dev.to"
}, {
"feed": "https://davorg.medium.com/feed",
"web": "https://davorg.medium.com/",
"name": "Medium"
}]
}
- Each feed is fetched and parsed using XML::Feed
- For each entry, the linked page is scanned for a `<link rel="canonical">` tag
- If found, that canonical URL is used to detect duplicates; if not, the entry’s title is used as a fallback
- Duplicates are discarded, keeping only one version (preferably canonical)
- The resulting list is sorted by date and emitted in Atom, RSS, or JSON
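The deduplication rule in the steps above can be sketched as follows. This is an illustrative Python stand-in for the Perl implementation, not the module's actual code, and the entry fields and URLs are hypothetical:

```python
# Each entry carries its own page URL, its title, and the canonical URL
# scraped from the page's <link rel="canonical"> tag (None if absent).
entries = [
    {"url": "https://perlhacks.com/cleaner-feeds/", "title": "Cleaner feeds",
     "canonical": "https://perlhacks.com/cleaner-feeds/"},
    {"url": "https://dev.to/davorg/cleaner-feeds", "title": "Cleaner feeds",
     "canonical": "https://perlhacks.com/cleaner-feeds/"},
]

def deduplicate(entries):
    """Keep one entry per canonical URL (title as a fallback),
    preferring the entry whose own link IS the canonical URL."""
    kept = {}
    for e in entries:
        key = e["canonical"] or e["title"]
        is_original = e["canonical"] is None or e["canonical"] == e["url"]
        # First sighting wins, unless a later entry is the original,
        # in which case it replaces the syndicated copy.
        if key not in kept or is_original:
            kept[key] = e
    return list(kept.values())
```

With the two sample entries above, the dev.to copy is dropped and only the Perl Hacks original survives.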
Installation and Usage
Install via CPAN:
cpanm App::FeedDeduplicator
Then run it with:
feed-deduplicator config.json
If no config file is specified, it will try the FEED_DEDUP_CONFIG environment variable or fall back to ~/.feed-deduplicator/config.json.
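That lookup order can be sketched like this (a Python stand-in for illustration; the function name is mine, not part of the tool):

```python
import os
from pathlib import Path

def resolve_config_path(cli_arg=None):
    """Return the config path using the tool's documented lookup order."""
    # 1. An explicit path given on the command line wins.
    if cli_arg:
        return Path(cli_arg)
    # 2. Otherwise, try the FEED_DEDUP_CONFIG environment variable.
    env = os.environ.get("FEED_DEDUP_CONFIG")
    if env:
        return Path(env)
    # 3. Finally, fall back to the default location.
    return Path.home() / ".feed-deduplicator" / "config.json"
```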
There’s also a Docker image with the latest version installed.
Under the Hood
The tool is written in Perl 5.38+ and uses the new class feature (perlclass) for a cleaner OO structure:
- App::FeedDeduplicator::Aggregator handles feed downloading and parsing
- App::FeedDeduplicator::Deduplicator detects and removes duplicates
- App::FeedDeduplicator::Publisher generates the final output
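The three-stage pipeline can be mirrored in a compact sketch. These Python classes are illustrative stand-ins for the Perl modules above (the method names and entry fields are mine, not the real API):

```python
class Aggregator:
    def collect(self, feeds):
        # The real tool fetches and parses each feed with XML::Feed;
        # here we just flatten pre-parsed entry dicts.
        return [entry for feed in feeds for entry in feed["entries"]]

class Deduplicator:
    def dedupe(self, entries):
        # Key on canonical URL where available, title otherwise.
        seen, kept = set(), []
        for e in entries:
            key = e.get("canonical") or e["title"]
            if key not in seen:
                seen.add(key)
                kept.append(e)
        return kept

class Publisher:
    def publish(self, entries, max_entries=10):
        # Newest first, truncated to the configured max_entries.
        entries = sorted(entries, key=lambda e: e["date"], reverse=True)
        return entries[:max_entries]
```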
What’s Next?
It’s all very much a work in progress at the moment. It works for me, but there are bound to be improvements needed before it works well for more people. A few things I already know I want to improve:
- Add a configuration option for the LWP::UserAgent agent identifier string
- Add configuration options for the fixed elements of the generated web feed (name, link and things like that)
- Add a per-feed limit for the number of entries published (I can see a use case where someone wants to publish a single entry from each feed)
- Some kind of configuration template for the JSON version of the output
Try It Out
If you want a clean, single-source feed that represents your writing without duplication, App::FeedDeduplicator might be just what you need.
I’m using it now to power the aggregated feed on my site. Let me know what you think!
The post Cleaner web feed aggregation with App::FeedDeduplicator first appeared on Perl Hacks.