Skill Detail

Run multimodal computer and browser operator workflows with UI-TARS Desktop

Use UI-TARS Desktop when an operator needs a local or remote GUI agent to drive desktop, browser, and computer-use tasks with multimodal model support.

Browser Automation Multi-Framework Security Reviewed
⭐ 36.3k GitHub stars
INSTALL WITH ANY AGENT
npx skills add agentskillexchange/skills --skill run-multimodal-computer-and-browser-operator-workflows-with-ui-tars-desktop
Works best when you want a reusable capability, not another fragile one-off prompt.
At a glance
Tools required: UI-TARS Desktop or Agent TARS, supported multimodal model credentials or local model setup, browser or desktop environment, optional MCP tools.
Install & setup: Follow the Agent TARS and UI-TARS Desktop quick-start documentation to install the desktop/operator stack, configure either a local or remote operator, connect the intended multimodal model provider, and test with a supervised browser or desktop task before allowing broader automation.
Author: ByteDance
Publisher: Organization
Last updated: Jun 12, 2026
Quick brief

Use UI-TARS Desktop when an agent needs to operate graphical applications, browsers, or a remote computer through a bounded computer-use workflow. The operator configures the local or remote UI-TARS/Agent TARS environment, connects the chosen multimodal model and any MCP tools, and lets the agent observe screens, plan steps, and execute GUI/browser actions under supervision. Invoke this skill, rather than operating the application by hand, when the task requires repeatable computer-use automation, browser or desktop operation, or evaluation of GUI-agent behavior across local and remote operator modes. The scope is the UI-TARS Desktop and Agent TARS operator stack for multimodal GUI/browser automation; it is not a generic model framework or a desktop app listing.
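
The observe, plan, execute cycle described above can be sketched as a small bounded loop. This is an illustrative outline only, not UI-TARS Desktop or Agent TARS code: `run_bounded`, `Action`, `RunResult`, and the four callables are hypothetical names invented here, and the real stack exposes its own interfaces documented upstream. The sketch assumes "bounded" means a hard step limit and "under supervision" means each proposed action needs approval before it runs.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class Action:
    """One proposed GUI step (hypothetical shape), e.g. kind="click"."""
    kind: str
    target: str


@dataclass
class RunResult:
    executed: List[Action] = field(default_factory=list)
    stopped_reason: str = ""


def run_bounded(
    observe: Callable[[], str],
    plan: Callable[[str], Optional[Action]],
    approve: Callable[[Action], bool],
    execute: Callable[[Action], None],
    max_steps: int = 10,
) -> RunResult:
    """Run observe -> plan -> approve -> execute until done, rejected, or out of steps."""
    result = RunResult()
    for _ in range(max_steps):
        screen = observe()          # e.g. a screenshot description
        action = plan(screen)       # the model proposes the next step, or None when finished
        if action is None:
            result.stopped_reason = "done"
            return result
        if not approve(action):     # supervision gate: the operator can veto any step
            result.stopped_reason = "rejected"
            return result
        execute(action)
        result.executed.append(action)
    result.stopped_reason = "step_limit"  # bound reached before the planner finished
    return result
```

In a supervised trial run, `approve` might prompt the operator on every step; once the workflow has been reviewed it could be narrowed to an allow-list check, while `max_steps` keeps the run bounded either way.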

How it works

What this skill actually does

Inputs and prerequisites: UI-TARS Desktop or Agent TARS, supported multimodal model credentials or local model setup, browser or desktop environment, optional MCP tools.

Setup notes: Follow the Agent TARS and UI-TARS Desktop quick-start documentation to install the desktop/operator stack, configure either a local or remote operator, connect the intended multimodal model provider, and test with a supervised browser or desktop task before allowing broader automation.

Source and verification boundary: use https://agent-tars.com as the canonical reference before running the workflow; keep commands, API calls, CLI usage, and generated outputs reviewable against that upstream source.
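
One way to keep commands and outputs reviewable is to write each one to an append-only log that points back to the upstream reference. The sketch below is an assumption about how an operator might do this, not part of the UI-TARS Desktop or Agent TARS tooling: `audit_record`, `append_audit`, and the record fields are hypothetical, and only the reference URL comes from this listing.

```python
import json
from datetime import datetime, timezone

# Canonical reference named in this listing.
UPSTREAM_REFERENCE = "https://agent-tars.com"


def audit_record(kind: str, detail: str, output: str = "") -> str:
    """Build one JSON line describing a command, API call, or generated output."""
    record = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "kind": kind,            # hypothetical labels, e.g. "cli", "api", "output"
        "detail": detail,        # the exact command or call as it was run
        "output": output,        # what it produced, if anything
        "reference": UPSTREAM_REFERENCE,
        "reviewed": False,       # set to True by a human after checking against upstream docs
    }
    return json.dumps(record, sort_keys=True)


def append_audit(path: str, line: str) -> None:
    """Append one record to a JSON Lines file without rewriting earlier entries."""
    with open(path, "a", encoding="utf-8") as fh:
        fh.write(line + "\n")
```

A reviewer can then read the file line by line and compare each `detail` entry with the upstream documentation before marking it reviewed.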

Framework fit: publish this as a Multi-Framework workflow only when the operator can invoke the documented toolchain directly, rather than treating the upstream project as a generic product listing.