My Experience of Letting AI Handle a Whole PR Without Touching the Code

Today, I want to share my experience using Cline (https://github.com/cline/cline) to manage an entire pull request (PR) without touching the code myself.
Rules
I set the following rules for myself to complete the PR:
- Only let the AI write the code. Editing the code it generated was not allowed.
- Use Cline as much as possible to let the AI handle everything. If Cline couldn't do it, fall back to Copilot Chat; as a last resort, copy and paste from the web version of ChatGPT.
- Running commands such as linting myself was allowed. (It's simply faster to run a command myself than to tell the AI the exact command to run.)
The task I chose for the AI was a refactoring job: unifying how mocking was implemented across the codebase, which varied from developer to developer.
Environment
I used the vanilla Cline VS Code extension.
I tried the following two models:
- vscode-lm:copilot/gpt-4
- vscode-lm:copilot/gpt-4o
The codebase was an average Laravel application.
Positives
The PR was completed without my touching the code.
The task was completed entirely through prompts. I wasn't sure it was possible, but Cline pulled it off. This was a different experience from Copilot: once the prompt was written, the AI handled everything on its own.
The AI can handle tasks beyond just writing code
Cline was able to handle various tasks beyond just reading and writing code.
Here are some tasks I asked it to do:
- Use ripgrep to find refactoring points
- Run tests to verify the correctness of the code and fix any errors
- Format the code to a consistent style
- Generate Git commits, comments, and PR descriptions from `git diff`
- Write blog posts from notes I took during experiments (like this article!)
These techniques are useful and something I plan to use in my day-to-day tasks.
Negatives
The biggest negative was that, honestly, it felt faster and easier to write the code myself.
AI is not that smart yet
The task of unifying the mock implementations was relatively simple, but the AI struggled to understand the intent of both the prompts and the existing code. It often made unrelated changes or misunderstood what it was supposed to do.
There were also many instances where it got stuck in an infinite loop, repeatedly attempting to fix the same error.
Additionally, I had to include sample code in the prompt to show the correct way to write mocks. This might be because a lot of PHP code is written without tests, so the AI may not have had enough training data.
Prompt engineering is challenging
Related to the above, prompt engineering is necessary to convey exactly what you want the AI to do.
Here is the prompt I used. It is very long and specific, and it took trial and error to reach this final version. (A before/after example of the transformation it describes follows the prompt.)
I believe I could have finished the task myself in the time it took to write this prompt.
# Prompt
In the tests/ folder, I want to unify how mocking is implemented.
# Rules
- There are instances of using mock(), Mockery::mock(), or $this->mock().
- Don't change anything apart from mock(), Mockery::mock(), or $this->mock() even if you find an error or improvements.
## mock()
- Keep mock() if it only has a first argument.
- If it has an anonymous function to make assertions for the second argument, remove the second argument and add lines to do the equivalent.
- The mock should look like below without comments:
$mock = mock(UserRepository::class); // create mock. Only use the first argument
$mock->shouldReceive('function name') // insert function name to mock
->with('expected input') // insert expected args if applicable
->andReturn('expected output') // insert expected output if applicable
->once();
## Mockery::mock()
- Change it to use mock() and do the same for mock()
## $this->mock()
- Change it to use mock()
- Add app()->instance()/app()->bind() to bind the mock to the mocked class
- Prefer to use app()->instance(). But when instantiation needs to pass arguments, use app()->bind() and pass an anonymous function for the second argument.
# Contexts
- This is a Laravel app and uses Pest as a testing library.
- It uses Mockery for mocking.
- You can verify if the test passes or not by running `make test-unit && make test-feature`
- You can check the use of $this->mock or Mockery::mock by running these commands:
+ rg '\$this->mock' tests/
+ rg Mockery::mock tests/
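To make the transformation concrete, here is a hypothetical before/after sketch of the $this->mock() case. UserRepository, find(), and User are illustrative names, not code from the actual PR, and mock() stands for the project helper the prompt refers to:

```php
<?php

use App\Models\User;
use App\Repositories\UserRepository;

uses(Tests\TestCase::class);

// Before: $this->mock() with an assertion closure as the second argument.
it('finds a user (before)', function () {
    $this->mock(UserRepository::class, function ($mock) {
        $mock->shouldReceive('find')->with(1)->andReturn(new User());
    });

    expect(app(UserRepository::class)->find(1))->toBeInstanceOf(User::class);
});

// After: plain mock() with chained expectations. Laravel's $this->mock()
// binds the mock into the service container automatically; plain mock()
// does not, which is why the explicit app()->instance() call is needed.
it('finds a user (after)', function () {
    $mock = mock(UserRepository::class);
    $mock->shouldReceive('find')
        ->with(1)
        ->andReturn(new User())
        ->once();
    app()->instance(UserRepository::class, $mock);

    expect(app(UserRepository::class)->find(1))->toBeInstanceOf(User::class);
});
```

When resolving the class would need constructor arguments, the prompt's last rule swaps app()->instance() for app()->bind() with a closure, e.g. app()->bind(UserRepository::class, fn () => $mock).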
Code generation is quite slow
When using GPT-4, it took quite a while for the actual code diff to appear. GPT-4o was faster, but it still sometimes stopped generating code partway through.
Also, when a file was large, even a single-line change made the AI regenerate the entire file. Because of this, editing files with 500+ lines failed on my machine.
Therefore, when changing only part of a large file, it was sometimes faster to use Copilot Chat, which can show only the diff.
These points are still frustrating compared to normal coding, but I believe they will be resolved as models improve in the future.
Bonus
Here are some useful prompts I found.
Create a commit
Read the staged changes by running `git --no-pager diff --staged` and commit them with an appropriate message.
Make the first line concise and add details after a newline if necessary.
When letting Cline read git command results, use the `--no-pager` option to avoid having to page through the output manually.
Create PR details
When creating a PR on GitHub, there is usually a project template, and this prompt can automatically fill it out.
Compare the diff between the current branch and master branch and fill this template for PR.
# Title
Title for this PR
# Motivation
The motivation of this PR
# Description
Details of the PR.
Conclusion
Overall, I found it more useful than I expected.
Although there are still some frustrating points, I believe that with faster and smarter models in the future, AI coding might become mainstream.
Happy coding (or prompting)!