Train agent policies with rLLM reinforcement learning
Use rLLM to evaluate, trace, reward, and train LLM agents with reinforcement learning across common agent frameworks.
npx skills add agentskillexchange/skills --skill train-agent-policies-with-rllm-reinforcement-learning
Use rLLM when an operator wants to improve an agent through reinforcement learning rather than only prompt edits. The workflow is to wrap or route the existing agent through rLLM, define a reward or evaluator, run CLI benchmarks or custom rollouts, collect traces, and train against the selected backend.
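The "define a reward or evaluator" step in that workflow can be as small as a function that scores a single rollout. Below is a minimal sketch in plain Python, assuming an exact-match task. The names `normalize` and `exact_match_reward` are hypothetical and are not part of the rLLM API; check https://docs.rllm-project.com for the evaluator interface the selected backend actually expects.

```python
def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so formatting noise does not change the score."""
    return " ".join(text.lower().split())


def exact_match_reward(agent_answer: str, reference_answer: str) -> float:
    """Hypothetical reward: 1.0 when the normalized answers match, otherwise 0.0."""
    return 1.0 if normalize(agent_answer) == normalize(reference_answer) else 0.0
```

A binary reward like this is easy to audit, which matters when traces are reviewed before training. Tasks with partial credit would likely need a graded score instead.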
What this skill actually does
Invoke this when an agent has measurable task outcomes and needs a repeatable eval-to-training loop across frameworks such as LangGraph, SmolAgent, Strands, OpenAI Agents SDK, Google ADK, or plain OpenAI clients. The scope is RL training and benchmarking workflows for agents; rLLM is not a generic ML training framework or model library.
Inputs are task definitions, rollout code, model gateway settings, reward functions, and benchmark suites. Outputs are traces, scores, reward summaries, and trained checkpoints that the operator can review before promoting a policy into an agent workflow or continuing another training run.
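To make the "reward summaries" output concrete, here is a small sketch of how per-rollout rewards might be aggregated for review. It assumes each rollout has already been reduced to a task id and a numeric reward. `RolloutRecord` and `summarize_rewards` are hypothetical names for illustration, not rLLM types, and the real trace format should be taken from the upstream documentation.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class RolloutRecord:
    """Hypothetical record: one scored rollout for one task."""
    task_id: str
    reward: float


def summarize_rewards(records, pass_threshold=1.0):
    """Aggregate rewards into a count, a mean reward, and a pass rate."""
    if not records:
        # An empty run has nothing to average, so report zeros explicitly.
        return {"count": 0, "mean_reward": 0.0, "pass_rate": 0.0}
    rewards = [record.reward for record in records]
    passed = sum(1 for reward in rewards if reward >= pass_threshold)
    return {
        "count": len(rewards),
        "mean_reward": mean(rewards),
        "pass_rate": passed / len(rewards),
    }
```

Comparing this summary before and after a training run gives the operator a simple basis for deciding whether a checkpoint is worth promoting.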
Inputs and prerequisites: Python 3.11 or newer, an rLLM installation, the agent code or a benchmark task, a reward or evaluator function, and optionally a Tinker or verl training backend.
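A quick preflight check can confirm the interpreter and package prerequisites before any rollout is attempted. The sketch below uses only the Python standard library. The helper names are hypothetical, and the import name `rllm` is an assumption that should be verified against the upstream installation instructions.

```python
import importlib.util
import sys


def meets_python_requirement(version_info=None, minimum=(3, 11)):
    """Return True when the (major, minor) version is at least `minimum`."""
    if version_info is None:
        version_info = sys.version_info
    return tuple(version_info[:2]) >= minimum


def is_module_available(module_name="rllm"):
    """Return True when a top-level module can be located without importing it.

    The default name "rllm" is an assumed import name, not a confirmed one.
    """
    return importlib.util.find_spec(module_name) is not None
```

Calling `meets_python_requirement()` and `is_module_available()` at the top of a rollout script lets a misconfigured environment fail early instead of partway through a benchmark.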
Setup notes: Install rLLM from its GitHub repository with uv or pip. Then either run a benchmark with rllm eval or wrap an existing agent rollout, define an evaluator, and launch training with the selected backend.
Source and verification boundary: use https://docs.rllm-project.com as the canonical reference before running the workflow; keep commands, API calls, CLI usage, and generated outputs reviewable against that upstream source.
Framework fit: publish this as a Multi-Framework workflow only when the operator can invoke the documented toolchain directly; do not treat the upstream project as a generic product listing.