Skill Detail

Benchmark deep research agents across factual, quality, and process dimensions with MiroEval

Score deep research agents on benchmark tasks using factual verification, report-quality scoring, and process evaluation before model or workflow changes ship.

Code Quality & Review Multi-Framework Published

⭐ 34 GitHub stars

INSTALL WITH ANY AGENT

npx skills add agentskillexchange/skills --skill benchmark-deep-research-agents-across-factual-quality-and-process-dimensions-with-miroeval

Works best when you want a reusable capability, not another fragile one-off prompt.

View source

At a glance

Tools required

Python, uv, a model-results JSON file, and API keys for the judge and retrieval services

Install & setup

Run `uv sync`, copy `.env.template` to `.env` and add the required API keys, prepare a model-results JSON file, then execute `bash run_eval.sh --input data/method_results/my_model.json --model_name my_model`.
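
The setup steps above can be sketched as a small wrapper script. Only the three commands (`uv sync`, copying `.env.template`, and the `run_eval.sh` call) come from this listing; the `run` helper and the `DRY_RUN`, `MODEL_NAME`, and `INPUT` variables are hypothetical names introduced for illustration. By default the sketch only prints what it would execute.

```shell
#!/usr/bin/env bash
# Sketch of the MiroEval setup steps, run from the repository root.
# Assumption: DRY_RUN, MODEL_NAME, INPUT and run() are illustrative names,
# not part of MiroEval itself.
set -eu

DRY_RUN="${DRY_RUN:-1}"                                # 1 = print commands only
MODEL_NAME="${MODEL_NAME:-my_model}"                   # label for this evaluation run
INPUT="${INPUT:-data/method_results/${MODEL_NAME}.json}"  # model-results JSON file

# Print the command in dry-run mode, otherwise execute it.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# 1. Install the Python dependencies.
run uv sync

# 2. Create .env from the template unless one already exists,
#    then add the required API keys to it by hand.
[ -f .env ] || run cp .env.template .env

# 3. Score the prepared model-results file.
run bash run_eval.sh --input "$INPUT" --model_name "$MODEL_NAME"
```

Setting `DRY_RUN=0` executes the commands instead of printing them. The API keys still have to be added to `.env` before the evaluation step will work.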

Author

MiroMindAI

Publisher

Organization

Last updated

Apr 18, 2026

Quick brief

Use MiroEval when you need to benchmark a deep research system against a fixed task set and score not just the final report but also factual correctness and the quality of the research process. Reach for it when the job is comparative evaluation of research-agent outputs before a rollout or model change, not general web research. Its scope is a benchmarked deep-research evaluation workflow with defined input/result schemas and scoring dimensions, which is what makes it a skill rather than a generic platform listing.

Best fit

When to reach for it

Best when the job fits Code Quality & Review.
Works naturally with Multi-Framework setups.
Requires Python, uv, model result JSON, required API keys for judge…
Installation is straightforward: run `uv sync`, copy `.env.template` to `.env` and add the required API keys, prepare a model-results…

Trust & provenance

Why this listing is credible

Trust status: Published.
34 GitHub stars on the linked upstream source.
Last updated Apr 18, 2026.
