AI still can’t count. I built a dataset to prove it: VisQuant

I’ve been experimenting with GPT-4V, Claude, and Gemini and realized something strange:

They can describe art. Solve riddles. Explain GPTs.
But ask: “How many pencils are on the table?”
Or “Which object is left of the cup?”
And they fall apart.

So I built a benchmark to test exactly that:

What is VisQuant?

  • 100 synthetic images
  • 40+ everyday object types
  • Labeled object counts and spatial layout
  • 2 reasoning Q&A pairs per image
  • Grounded annotations in JSON and CSV (a simplified sample record is sketched below)
  • Baseline tested on GPT-4V
  • Entirely open-source
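
Each image ships with a grounded annotation record: per-image object counts, the spatial layout, and two reasoning Q&A pairs. As a simplified sketch (field names here are illustrative, not the canonical schema; the real one lives in the JSON/CSV files), a single record looks roughly like this:

```python
import json

# Simplified sketch of one annotation record (illustrative field names):
# per-image object counts, a grounded spatial layout, and two Q&A pairs.
example_record = {
    "image": "scene_017.png",
    "counts": {"pencil": 3, "cup": 1, "book": 2},
    "layout": [
        {"object": "cup",    "x": 0.62, "y": 0.48},
        {"object": "pencil", "x": 0.21, "y": 0.50},
        {"object": "pencil", "x": 0.33, "y": 0.47},
        {"object": "pencil", "x": 0.70, "y": 0.52},
        {"object": "book",   "x": 0.15, "y": 0.55},
        {"object": "book",   "x": 0.85, "y": 0.44},
    ],
    "qa": [
        {"question": "How many pencils are on the table?",  "answer": "3"},
        {"question": "How many pencils are left of the cup?", "answer": "2"},
    ],
}

# The full annotation file loads the usual way (file name illustrative):
# with open("annotations.json") as f:
#     records = json.load(f)
```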

What It Tests
VisQuant isolates the visual intelligence primitives models often skip over:

  • Counting
  • Spatial relationships
  • Left/right/stacked inference
  • Multi-hop VQA from structured scenes (see the sketch after this list)
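
To make the multi-hop part concrete: every spatial answer can be derived mechanically from the grounded layout, so there is no label ambiguity to hide behind. A minimal sketch, reusing the illustrative record above and assuming x grows to the right, of how a "left of" count resolves:

```python
def count_left_of(layout, target, anchor):
    """How many `target` objects sit left of the (single) `anchor` object?

    Multi-hop: first locate the anchor, then filter the scene by relative
    position, then count. Assumes x increases to the right, as in the
    illustrative layout above.
    """
    anchor_x = next(o["x"] for o in layout if o["object"] == anchor)
    return sum(1 for o in layout if o["object"] == target and o["x"] < anchor_x)

# With the illustrative record above:
# count_left_of(example_record["layout"], "pencil", "cup")  -> 2
```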

Why?
Because current benchmarks like VQAv2 and GQA are messy and noisy, and they hide exactly these weaknesses.
VisQuant is small, clean, and focused: it exposes real gaps in model reasoning.
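
The GPT-4V baseline boils down to asking each question against its image and comparing the reply to the grounded answer. A stripped-down sketch of that loop using the OpenAI Python SDK (prompt wording, scoring, and file names are simplified here, not the exact harness):

```python
import base64
import json
import re

from openai import OpenAI

client = OpenAI()  # expects OPENAI_API_KEY in the environment

def ask(image_path: str, question: str, model: str = "gpt-4-vision-preview") -> str:
    """Send one image plus one question to a vision-capable chat model."""
    with open(image_path, "rb") as f:
        b64 = base64.b64encode(f.read()).decode()
    resp = client.chat.completions.create(
        model=model,  # GPT-4V; swap in any other vision-capable model id
        messages=[{
            "role": "user",
            "content": [
                {"type": "text", "text": question + " Answer with a single word or number."},
                {"type": "image_url", "image_url": {"url": f"data:image/png;base64,{b64}"}},
            ],
        }],
    )
    return resp.choices[0].message.content.strip()

def is_correct(prediction: str, answer: str) -> bool:
    """Crude scoring: lowercase, strip punctuation, exact match on the first token."""
    def norm(s: str) -> str:
        tokens = s.lower().split()
        return re.sub(r"[^a-z0-9]", "", tokens[0]) if tokens else ""
    return norm(prediction) == norm(answer)

# with open("annotations.json") as f:          # file name illustrative
#     records = json.load(f)
# accuracy = sum(is_correct(ask(r["image"], qa["question"]), qa["answer"])
#                for r in records for qa in r["qa"]) / (2 * len(records))
```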

Get It: