
Benchmark browser agents on a fixed stealth and task suite with browser-use benchmark

Compare browser-agent reliability on a repeatable task and anti-bot suite before choosing a stack or claiming progress.

Browser Automation, Multi-Framework, Security Reviewed
⭐ 71 GitHub stars
INSTALL WITH ANY AGENT
npx skills add agentskillexchange/skills --skill benchmark-browser-agents-on-a-fixed-stealth-and-task-suite-with-browser-use-benchmark
Works best when you want a reusable capability, not another fragile one-off prompt.
At a glance
Tools required
Python, uv, benchmark repository dependencies, required API keys for the judge model and selected browser provider, target browser agent configuration
Install & setup
Clone the benchmark repository, install dependencies with the documented uv workflow, populate the required .env variables, then run the provided evaluation commands for the main browser task suite or the stealth benchmark against the browser provider you want to compare.
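Before running the evaluation commands, it can help to confirm that the .env file is actually populated. The sketch below is a minimal, stdlib-only check under stated assumptions: the variable names JUDGE_API_KEY and BROWSER_PROVIDER_API_KEY are hypothetical placeholders, and parse_env, missing_keys, and load_env_file are illustrative helpers, not part of the benchmark repository. Substitute the names the repository's own documentation lists.

```python
from pathlib import Path

# Hypothetical variable names for illustration only; replace them with the
# names the benchmark repository actually requires for the judge model and
# the browser provider.
REQUIRED_KEYS = ["JUDGE_API_KEY", "BROWSER_PROVIDER_API_KEY"]


def parse_env(text: str) -> dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    values: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop optional surrounding double or single quotes around the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def missing_keys(env: dict[str, str], required: list[str]) -> list[str]:
    """Return required keys that are absent or set to an empty string."""
    return [key for key in required if not env.get(key)]


def load_env_file(path: Path) -> dict[str, str]:
    """Read and parse an .env file, returning an empty dict if it is missing."""
    return parse_env(path.read_text()) if path.exists() else {}
```

For example, missing_keys(load_env_file(Path(".env")), REQUIRED_KEYS) returns the placeholder names that still need values. The parser only handles plain KEY=VALUE lines; a dedicated loader such as python-dotenv covers more of the .env syntax.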
Author
browser-use
Publisher
Organization
Last updated
Apr 21, 2026
Quick brief

Use browser-use benchmark when the job is to measure browser-agent performance on a fixed evaluation suite, not to operate websites in production. It runs benchmark tasks and stealth checks across browser providers and model setups, so you can compare reliability with the same harness each time. Reach for it instead of the regular browser-use library when you need evidence about browser-agent quality before changing models, providers, or orchestration choices. Its scope is benchmark execution and comparison on fixed suites, not the broader browser-use automation library itself.
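The value of a fixed harness is that every configuration is scored on the same tasks. A minimal sketch of that comparison, assuming you have already collected per-task pass/fail outcomes for each model and provider setup: TaskResult, restrict_to_common_tasks, and success_rates are hypothetical names, and the record layout is an assumption, not the benchmark's actual output format.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class TaskResult:
    """One task outcome. This schema is hypothetical, not the benchmark's format."""
    config: str   # label for a model + browser provider combination
    task_id: str
    passed: bool


def restrict_to_common_tasks(results: list[TaskResult]) -> list[TaskResult]:
    """Keep only tasks that every configuration attempted."""
    tasks_by_config: dict[str, set[str]] = defaultdict(set)
    for r in results:
        tasks_by_config[r.config].add(r.task_id)
    if not tasks_by_config:
        return []
    common = set.intersection(*tasks_by_config.values())
    return [r for r in results if r.task_id in common]


def success_rates(results: list[TaskResult]) -> dict[str, float]:
    """Fraction of passed tasks per configuration."""
    totals: dict[str, int] = defaultdict(int)
    passes: dict[str, int] = defaultdict(int)
    for r in results:
        totals[r.config] += 1
        passes[r.config] += int(r.passed)
    return {cfg: passes[cfg] / totals[cfg] for cfg in totals}
```

Restricting to the shared task set before computing rates keeps one configuration from looking better or worse simply because it ran extra or different tasks.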