AI-Assisted Testing with Browseruse
Why High-Level AI-Powered Testing? Automated UI testing is a critical part of modern QA. However, writing stable UI tests—especially for legacy applications or hard-to-automate interfaces—can be: Time-consuming Fragile due to frequent DOM changes or lack of test IDs Manual and resource-intensive to maintain AI-powered high-level automation offers a compelling alternative: Reduces QA effort for sanity and smoke tests Avoids the need for writing detailed selectors or test data Enables automation of legacy or dynamically rendered applications This blog demonstrates how existing open-source tools can be combined to build flexible and AI-enhanced automation workflows—without relying on expensive commercial platforms. Tools Used This approach integrates: Browseruse: An LLM-driven browser controller that uses Playwright under the hood Playwright: A robust framework for browser automation LLMs (Gemini): Handles interpretation of natural language QA tasks In this example, we use Gemini 2.0 Flash via LangChain due to its free availability. However, you can easily swap in alternatives like GPT-4, Claude 3, or even local models via Ollama. Available Alternatives in the Market Several commercial tools offer similar AI-powered testing capabilities, but What differentiates Browseruse + LLM: Fully open source agent Compatible with any LLM provider Easy to extend with custom component logic Real Example: One Script, No Test Data, No Selectors Consider a test application running locally at http://localhost:5173/, which includes a form submission page. Below is a complete script to test it using Browseruse and Gemini: from browser_use import Agent from dotenv import load_dotenv from langchain_google_genai import ChatGoogleGenerativeAI import asyncio llm = ChatGoogleGenerativeAI(model="gemini-2.0-flash") load_dotenv() async def main(): success_task = """ You are an automated QA assistant performing exploratory UI testing. Open http://localhost:5173/ Fill the form with appropriate values Click Save after filling mandatory values Return task as success if you see success toast message or fail with reasons needed. """ agent = Agent( task=success_task, llm=llm, ) await agent.run() asyncio.run(main()) This script requires: No test-specific data No HTML selectors or test IDs No manual handling of dropdowns or wait conditions The LLM interprets the flow and executes it end-to-end using intelligent heuristics. You can also write a negative test case to validate failed submissions: failure_task = """ You are an automated QA assistant performing exploratory UI testing. Open http://localhost:5173/ Validate if the form works for mandatory values Return task as success if you see error toast message for validation fails. """ Challenges and Considerations Prompt Design Complex UI components like Ant Design’s rc-select or dynamic dropdowns require detailed prompt crafting or custom handlers. Execution Time LLM-based test runs tend to be slower than hand-coded automation scripts, particularly for long forms or multi-step flows. Cost While Gemini Flash is free, using models like GPT-4 or Claude 3 may incur usage-based costs. Benefits Using this approach, you can: Test new UIs without creating test scripts Add smoke checks for legacy flows without deep DOM knowledge Detect form-level bugs early in the development cycle All with the robustness of Playwright and the intelligence of LLMs. What’s Next Implement screenshot comparisons for visual regressions Generate test reports linked to user stories Auto-record successful runs as replayable test cases Fine-tune prompts for app-specific language and behavior

Why High-Level AI-Powered Testing?
Automated UI testing is a critical part of modern QA. However, writing stable UI tests—especially for legacy applications or hard-to-automate interfaces—can be:
- Time-consuming
- Fragile due to frequent DOM changes or lack of test IDs
- Manual and resource-intensive to maintain
AI-powered high-level automation offers a compelling alternative:
- Reduces QA effort for sanity and smoke tests
- Avoids the need for writing detailed selectors or test data
- Enables automation of legacy or dynamically rendered applications
This blog demonstrates how existing open-source tools can be combined to build flexible and AI-enhanced automation workflows—without relying on expensive commercial platforms.
Tools Used
This approach integrates:
- Browseruse: An LLM-driven browser controller that uses Playwright under the hood
- Playwright: A robust framework for browser automation
- LLMs (Gemini): Handles interpretation of natural language QA tasks
In this example, we use Gemini 2.0 Flash via LangChain due to its free availability. However, you can easily swap in alternatives like GPT-4, Claude 3, or even local models via Ollama.
Available Alternatives in the Market
Several commercial tools offer similar AI-powered testing capabilities, but What differentiates Browseruse + LLM:
- Fully open source agent
- Compatible with any LLM provider
- Easy to extend with custom component logic
Real Example: One Script, No Test Data, No Selectors
Consider a test application running locally at http://localhost:5173/
, which includes a form submission page.
Below is a complete script to test it using Browseruse and Gemini:
from browser_use import Agent
from dotenv import load_dotenv
from langchain_google_genai import ChatGoogleGenerativeAI
import asyncio
llm = ChatGoogleGenerativeAI(model="gemini-2.0-flash")
load_dotenv()
async def main():
success_task = """
You are an automated QA assistant performing exploratory UI testing.
Open http://localhost:5173/
Fill the form with appropriate values
Click Save after filling mandatory values
Return task as success if you see success toast message or fail with reasons needed.
"""
agent = Agent(
task=success_task,
llm=llm,
)
await agent.run()
asyncio.run(main())
This script requires:
- No test-specific data
- No HTML selectors or test IDs
- No manual handling of dropdowns or wait conditions
The LLM interprets the flow and executes it end-to-end using intelligent heuristics.
You can also write a negative test case to validate failed submissions:
failure_task = """
You are an automated QA assistant performing exploratory UI testing.
Open http://localhost:5173/
Validate if the form works for mandatory values
Return task as success if you see error toast message for validation fails.
"""
Challenges and Considerations
Prompt Design
- Complex UI components like Ant Design’s
rc-select
or dynamic dropdowns require detailed prompt crafting or custom handlers.
Execution Time
- LLM-based test runs tend to be slower than hand-coded automation scripts, particularly for long forms or multi-step flows.
Cost
- While Gemini Flash is free, using models like GPT-4 or Claude 3 may incur usage-based costs.
Benefits
Using this approach, you can:
- Test new UIs without creating test scripts
- Add smoke checks for legacy flows without deep DOM knowledge
- Detect form-level bugs early in the development cycle
All with the robustness of Playwright and the intelligence of LLMs.
What’s Next
- Implement screenshot comparisons for visual regressions
- Generate test reports linked to user stories
- Auto-record successful runs as replayable test cases
- Fine-tune prompts for app-specific language and behavior