Wikimedia Drowning in AI Bot Traffic as Crawlers Consume 65% of Resources

Web crawlers collecting training data for AI models are overwhelming Wikipedia's infrastructure, with bot traffic growing exponentially since early 2024, the Wikimedia Foundation says. According to data released April 1, bandwidth for multimedia content has surged 50% since January, driven primarily by automated programs scraping Wikimedia Commons' 144 million openly licensed media files. The traffic is creating operational challenges for the non-profit. When Jimmy Carter died in December 2024, his Wikipedia page received 2.8 million views in a single day, and a 1.5-hour video of his 1980 presidential debate caused network traffic to double, slowing page loads for some users. Analysis shows 65% of the foundation's most resource-intensive traffic comes from bots, even though bots account for only 35% of total pageviews. The foundation's Site Reliability team now routinely blocks overwhelming crawler traffic to prevent service disruptions. "Our content is free, our infrastructure is not," the foundation said, announcing plans to establish sustainable boundaries for automated content consumption.

Apr 5, 2025 - 02:06

Read more of this story at Slashdot.