Skill Detail

Grade agent trajectories and tool-use decisions with AgentEvals

Score whether an agent took a sensible intermediate path, called tools correctly, and reached the outcome without relying only on final-answer checks.

Code Quality & ReviewCustom Agents
Code Quality & Review Custom Agents Security Reviewed
⭐ 550 GitHub stars ⬇ 251k/wk npm
INSTALL WITH ANY AGENT
npx skills add agentskillexchange/skills --skill grade-agent-trajectories-and-tool-use-decisions-with-agentevals Copy
Works best when you want a reusable capability, not another fragile one-off prompt.
At a glance
Tools required
Python or TypeScript runtime, agent run outputs or trajectories, optional LLM judge provider
Install & setup
pip install agentevals or npm install agentevals @langchain/core, then pass captured agent trajectories into the provided evaluators.
Author
LangChain
Publisher
Open Source Project
Last updated
Apr 19, 2026
Quick brief

Use AgentEvals when you need to judge the path an agent took, not just whether the final answer looked good. The upstream package is specifically about evaluating agent trajectories, including message sequences, tool calls, graph paths, and LLM-as-judge scoring.

How it works

What this skill actually does

Invoke this instead of a general observability stack or broad eval product when the immediate job is trajectory grading inside tests or evaluation suites. The scope boundary is tight: AgentEvals evaluates agent steps and tool-use paths. It is not a general framework, hosted platform, or catch-all agent builder listing.