Grok 3 vs. DeepSeek R1: A Deep Analysis


Feb 21, 2025 - 15:36

By now, almost everyone has heard of DeepSeek R1, the open-source AI model from China that took the internet by storm.

The main selling point of DeepSeek is that it's completely free and open-source, yet it can rival some of OpenAI's paid models, like o1.

Then, on February 16, Elon Musk announced Grok 3, labelling it the "Smartest AI on Earth."


But how does it compare to the free DeepSeek R1? Curious, I decided to test Grok 3 against DeepSeek R1 in coding, reasoning, math, and creative writing.

Let's find out if all the hype around Grok 3 holds any weight.

I ran both models on a set of prompts I wrote myself, and below I share the results along with my thoughts on how each model handled each task.

TL;DR

If you want to jump straight to the conclusion, here's a quick summary of the findings comparing DeepSeek R1 and Grok 3:

Reasoning and Math: DeepSeek R1 and Grok 3 perform similarly on reasoning questions.

Coding: Grok 3 outperforms DeepSeek R1 and writes much better code.

Creative Writing: Both models perform well in creative writing. DeepSeek is known to be a great model, but I prefer Grok 3.

Brief on Grok 3 AI Model

Grok 3 is the latest language model from xAI, which says it offers 10x the computational power. It includes features like DeepSearch, an agent that researches a query across the web, and Big Brain mode, which spends extra compute on complex tasks.

Grok 3 is currently in beta. Is it really that much better than DeepSeek R1? We'll find out by the end of this article.

According to the official benchmarks xAI shared at the launch event, Grok 3 looks like a game-changer, beating its competitors on almost every benchmark.

DeepSeek R1 vs. Grok 3

I used Chatbot Arena to test both models side by side because, at the time of writing, it is the only trusted third-party site that offers access to the early Grok 3 model.

Reasoning Problems

ℹ️ Here, we will check the reasoning capabilities of both models.

1. Russian Roulette

Let's kick things off with an interesting question:

You are playing Russian roulette with a six-shooter revolver. Your opponent puts in five bullets, spins the chambers and fires at himself, but no bullet comes out. He gives you the choice of whether or not he should spin the chambers again before firing at you. Should he spin again?

Response from Grok 3:

Response from DeepSeek R1:

Final Verdict: Both models answered correctly, with sound reasoning. ✅
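If you want to verify the answer yourself, the odds can be computed by brute force. The Python sketch below is a minimal illustration, not either model's output; `death_probability` is a hypothetical helper, and it assumes every loading pattern and every starting chamber is equally likely, and that without a re-spin the cylinder simply advances to the next chamber.

```python
from fractions import Fraction
from itertools import combinations


def death_probability(bullets: int, chambers: int, spin_again: bool) -> Fraction:
    """P(second pull fires | first pull was empty), found by enumerating every case.

    Assumes every loading pattern and every starting chamber is equally likely,
    and that without a re-spin the cylinder advances to the adjacent chamber.
    """
    if bullets >= chambers:
        raise ValueError("need at least one empty chamber")

    empty_first = 0               # cases where the opponent's pull was a click
    loaded_second = Fraction(0)   # weight of those cases where your pull fires

    for loaded in map(set, combinations(range(chambers), bullets)):
        for start in range(chambers):
            if start in loaded:
                continue  # opponent's shot fired: excluded by the condition
            empty_first += 1
            if spin_again:
                # A fresh spin lands on any chamber with equal probability.
                loaded_second += Fraction(bullets, chambers)
            elif (start + 1) % chambers in loaded:
                # No spin: the hammer falls on the adjacent chamber.
                loaded_second += 1

    return loaded_second / empty_first


print(death_probability(5, 6, spin_again=False))  # 1
print(death_probability(5, 6, spin_again=True))   # 5/6
```

With five bullets there is only one empty chamber, and the opponent has just fired it, so without a spin the next shot is certain to fire. Spinning lowers the risk to 5/6, so you should ask him to spin again.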

2. Olympic Athlete Siblings

I am the sister of two Olympic athletes. But these two athletes are not my sisters. How is this possible?

Response from Grok 3:

Response from DeepSeek R1:

Final Verdict: Again, both models answered correctly, with sound reasoning. ✅

3. Find out the Liar

The first two questions were straightforward. Let's conclude the reasoning test with a slightly trickier question:

You meet three people: Haris, Antony, and Michael. Haris says, "At least one of us is a liar." Antony says, "Haris is lying." Michael says, "Antony is telling the truth." Determine who is lying and who is telling the truth.

Response from Grok 3:

Here, Grok 3 provided thorough reasoning, considering each case and breaking down the problem step by step to support its conclusion.

Response from DeepSeek R1:

Final Verdict: Both models answered correctly ✅, but Grok 3's reasoning stood out to me.
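This puzzle can also be checked mechanically by trying every combination of truth-tellers and liars. The sketch below assumes the usual convention, which the prompt does not state explicitly, that truth-tellers only make true statements and liars only make false ones; `statements` and `solve` are hypothetical names used for illustration.

```python
from itertools import product

NAMES = ("Haris", "Antony", "Michael")


def statements(honest: dict[str, bool]) -> dict[str, bool]:
    """Truth value of each person's claim, given who is honest."""
    return {
        "Haris": not all(honest.values()),  # "At least one of us is a liar."
        "Antony": not honest["Haris"],      # "Haris is lying."
        "Michael": honest["Antony"],        # "Antony is telling the truth."
    }


def solve() -> list[dict[str, bool]]:
    solutions = []
    for values in product((True, False), repeat=len(NAMES)):
        honest = dict(zip(NAMES, values))
        claims = statements(honest)
        # Consistent only if truth-tellers' claims are true and liars' claims are false.
        if all(claims[name] == honest[name] for name in NAMES):
            solutions.append(honest)
    return solutions


for solution in solve():
    print({name: "truth-teller" if h else "liar" for name, h in solution.items()})
# {'Haris': 'truth-teller', 'Antony': 'liar', 'Michael': 'liar'}
```

Only one assignment survives: Haris tells the truth, while Antony and Michael are both lying. If Haris were lying, his claim "at least one of us is a liar" would be false, meaning nobody lies, which contradicts Haris being a liar.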

Summary:

Both models are strong at reasoning: each answered all three of my reasoning questions correctly. Still, Grok 3's reasoning was at times more thorough than R1's, as in the liar puzzle.