Skill Detail

Benchmark browser agents on a fixed stealth and task suite with browser-use benchmark

Compare browser-agent reliability on a repeatable task and anti-bot suite before choosing a stack or claiming progress.

Browser Automation Multi-Framework Security Reviewed

⭐ 71 GitHub stars

INSTALL WITH ANY AGENT

npx skills add agentskillexchange/skills --skill benchmark-browser-agents-on-a-fixed-stealth-and-task-suite-with-browser-use-benchmark

Works best when you want a reusable capability, not another fragile one-off prompt.

View source

At a glance

Tools required

Python, uv, benchmark repository dependencies, required API keys for the judge model and selected browser provider, target browser agent configuration

Install & setup

Clone the benchmark repository, install dependencies with the documented uv workflow, and populate the required .env variables. Then run the provided evaluation commands for either the main browser task suite or the stealth benchmark, against the browser provider you want to compare.
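
Before starting a run, it can help to confirm that no variable in .env was left blank. The following is a minimal sketch, assuming a plain KEY=VALUE .env file in the current directory; the helper name find_empty_env_keys is hypothetical and is not part of the benchmark repository, and the sketch does not know which keys the benchmark actually requires.

```python
from pathlib import Path


def find_empty_env_keys(env_text: str) -> list[str]:
    """Return the keys in KEY=VALUE text whose value is empty."""
    empty = []
    for raw_line in env_text.splitlines():
        line = raw_line.strip()
        # Skip blank lines, comments, and lines without an assignment.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Treat a bare pair of quotes ('' or "") as empty too.
        if not value.strip().strip("'\""):
            empty.append(key.strip())
    return empty


if __name__ == "__main__":
    env_path = Path(".env")  # assumed location; adjust to your checkout
    if env_path.exists():
        print("Empty keys:", find_empty_env_keys(env_path.read_text()) or "none")
    else:
        print("No .env file found in the current directory.")
```

This only flags keys that are present but blank; check the repository's own documentation for the list of variables the judge model and browser provider need.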

Author

browser-use

Publisher

Organization

Last updated

Apr 21, 2026

Quick brief

Use browser-use benchmark when the job is to measure browser-agent performance on a fixed evaluation suite, not to operate websites in production. It runs benchmark tasks and stealth checks across browser providers and model setups so you can compare reliability with the same harness each time. Invoke this instead of using browser-use normally when you need evidence about browser-agent quality before changing models, providers, or orchestration choices. The scope boundary is benchmark execution and comparison on fixed suites, not the broader browser-use automation library itself.
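
As a rough illustration of comparing reliability across setups, the sketch below computes a pass rate per configuration from a list of (configuration, passed) records. The record shape, the pass_rates helper, and the sample labels are all assumptions made for illustration; they do not reflect the benchmark's actual output format, and the sample values are made up rather than real results.

```python
from collections import defaultdict


def pass_rates(results):
    """Map each configuration label to its fraction of passed tasks."""
    totals = defaultdict(int)
    passed = defaultdict(int)
    for config, ok in results:
        totals[config] += 1
        passed[config] += int(ok)
    return {config: passed[config] / totals[config] for config in totals}


# Made-up records for illustration only, not real benchmark results.
sample = [
    ("provider_a", True),
    ("provider_a", False),
    ("provider_b", True),
    ("provider_b", True),
]

for config, rate in sorted(pass_rates(sample).items()):
    print(f"{config}: {rate:.0%} of tasks passed")
```

Because every configuration runs the same fixed suite, the rates are directly comparable; differences in pass rate then point to the model, provider, or orchestration change rather than to a change in the tasks.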

Best fit

When to reach for it

Best when the job fits Browser Automation.
Works naturally with Multi-Framework setups.
Requires Python, uv, benchmark repository dependencies, required API keys for the…
Installation is straightforward: Clone the benchmark repository, install dependencies with the documented uv workflow, populate the required .env variables,…

Trust & provenance

Why this listing is credible

Trust status: Security Reviewed.
71 GitHub stars on the linked upstream source.
Last updated Apr 21, 2026.