Professors Staffed a Fake Company Entirely With AI Agents, and You'll Never Guess What Happened

Apr 27, 2025 - 13:57
An experiment by researchers at Carnegie Mellon University staffed a fake software company with AI agents, and the results were dismal.

If you've been worried about the AI singularity taking over every job and leaving you out on the street, you can now breathe a sigh of relief, because AI isn't coming for your career anytime soon. Not because it doesn't want to, but because it literally can't.

In a recent experiment, researchers at Carnegie Mellon University staffed a fake software company entirely with AI agents (AI models designed to perform tasks on their own, basically), and the results were laughably chaotic.

The simulation, dubbed TheAgentCompany, was fully stocked with artificial workers from Google, OpenAI, Anthropic and Meta. They filled roles as financial analysts, software engineers, and project managers, working alongside simulated coworkers like a faux-HR department and a chief technical officer.

To see how the models fared in real-world environments, the researchers set tasks based on the day-to-day work of a real software company. The various AI agents found themselves navigating file directories, virtually touring new office spaces, and writing performance reviews for software engineers based on collected feedback.

As Business Insider first reported, the results were dismal. The best-performing model was Anthropic's Claude 3.5 Sonnet, which managed to finish just 24 percent of the jobs assigned to it. The study's authors note that even this meager performance is prohibitively expensive, averaging nearly 30 steps and a cost of over $6 per task.

Google's Gemini 2.0 Flash, meanwhile, averaged a time-consuming 40 steps per finished task, but only had an 11.4 percent rate of success — the second highest of all the models. The worst AI employee was Amazon's Nova Pro v1, which finished just 1.7 percent of its assignments at an average of almost 20 steps.

Speculating on the results, researchers wrote that agents are plagued with a lack of common sense, weak social skills, and a poor understanding of how to navigate the internet.

The bots also struggled with self-deception, basically creating shortcuts that led them to completely bungle the job. "For example," the Carnegie Mellon team wrote, "during the execution of one task, the agent cannot find the right person to ask questions on [company chat]. As a result, it then decides to create a shortcut solution by renaming another user to the name of the intended user."

While AI agents can reportedly do some smaller tasks well, the results of this and other studies show they're clearly not ready for more complex gigs humans excel at. A big reason for this is that our current "artificial intelligence" is arguably still just an elaborate extension of your phone's predictive text, rather than a sentient intelligence that can solve problems, learn from past experience, and apply that experience to novel situations.

This is all to say: the machines aren't coming for your job anytime soon — despite what the big tech companies claim.

The post Professors Staffed a Fake Company Entirely With AI Agents, and You'll Never Guess What Happened appeared first on Futurism.