[Thread] A new US paper shows the best frontier LLMs achieve 0% on hard real-life programming contest problems, domains where expert humans still excel (Rohan Paul/@rohanpaul_ai)

Rohan Paul / @rohanpaul_ai: This is really bad news for LLMs' coding skills. ☹️ The best frontier LLMs achieve 0% on hard real-life programming contest problems, domains where expert humans still excel. LiveCodeBench Pro, a benchmark composed of problems from Codeforces, ICPC, and IOI ("International

Jun 17, 2025 - 11:40




