
Run multimodal computer and browser operator workflows with UI-TARS Desktop

Use UI-TARS Desktop when an operator needs a local or remote GUI agent to drive desktop, browser, and computer-use tasks with multimodal model support.


Browser Automation · Multi-Framework · Security Reviewed

⭐ 36.3k GitHub stars

Install with any agent

npx skills add agentskillexchange/skills --skill run-multimodal-computer-and-browser-operator-workflows-with-ui-tars-desktop


Works best when you want a reusable capability, not another fragile one-off prompt.


At a glance

Tools required

UI-TARS Desktop or Agent TARS, supported multimodal model credentials or local model setup, browser or desktop environment, optional MCP tools

Install & setup

Follow the Agent TARS and UI-TARS Desktop quick-start documentation to install the desktop/operator stack, configure either a local or remote operator, connect the intended multimodal model provider, and test with a supervised browser or desktop task before allowing broader automation.

Author

ByteDance

Publisher

Organization

Last updated

Jun 12, 2026

Quick brief

Use UI-TARS Desktop when an agent needs to operate graphical applications, browsers, or a remote computer through a bounded computer-use workflow. The operator configures the local or remote UI-TARS/Agent TARS environment, connects the chosen multimodal model and any MCP tools, and lets the agent observe screens, plan steps, and execute GUI/browser actions under supervision. Invoke this skill, rather than using the product interactively, when the task requires repeatable computer-use automation, browser or desktop operation, or evaluation of GUI-agent behavior across local and remote operator modes. The scope is the UI-TARS Desktop and Agent TARS operator stack for multimodal GUI/browser automation; it is not a generic model framework or a desktop app listing.
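The observe, plan, and act cycle described above, with a human approval gate in front of every action, can be sketched in Python. This is a generic illustration under stated assumptions, not the UI-TARS Desktop or Agent TARS API: `Action`, `run_supervised`, `observe_screen`, `plan_next_action`, `execute_action`, and `approve` are all hypothetical names, and the stubs stand in for screenshots, a multimodal model, and real GUI actions.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Action:
    kind: str    # hypothetical action vocabulary, e.g. "click", "type", "done"
    target: str  # element description or text to type


def run_supervised(
    observe_screen: Callable[[], str],
    plan_next_action: Callable[[str, List[Action]], Action],
    execute_action: Callable[[Action], None],
    approve: Callable[[Action], bool],
    max_steps: int = 10,
) -> List[Action]:
    """Loop until the planner says "done", a step is rejected, or max_steps runs out.

    All callables are hypothetical stand-ins, not UI-TARS/Agent TARS APIs.
    """
    history: List[Action] = []
    for _ in range(max_steps):
        screen = observe_screen()                   # observe: capture current GUI state
        action = plan_next_action(screen, history)  # plan: the model proposes one step
        if action.kind == "done":
            break
        if not approve(action):                     # supervise: stop on a rejected step
            break
        execute_action(action)                      # act: perform the GUI/browser step
        history.append(action)
    return history


# Usage with stubs: a scripted "planner" and a list that records executed actions.
script = [Action("click", "search box"), Action("type", "ui-tars"), Action("done", "")]
executed: List[Action] = []
steps = run_supervised(
    observe_screen=lambda: "screenshot-placeholder",
    plan_next_action=lambda screen, hist: script[len(hist)],
    execute_action=executed.append,
    approve=lambda a: a.kind in {"click", "type"},  # simple allowlist policy
)
print([a.kind for a in steps])  # ['click', 'type']
```

Stopping on the first rejected step, rather than skipping it, keeps the operator in control of the run; a real deployment would replace the stubs with the stack's own screen capture, model, and action interfaces as documented at https://agent-tars.com.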

How it works

What this skill actually does



Source and verification boundary: use https://agent-tars.com as the canonical reference before running the workflow; keep commands, API calls, CLI usage, and generated outputs reviewable against that upstream source.

Framework fit: publish this as a Multi-Framework workflow only when the operator can invoke the documented toolchain directly, rather than treating the upstream project as a generic product listing.

Best fit

When to reach for it

Best for jobs in the Browser Automation category.
Works naturally in Multi-Framework setups.
Requires UI-TARS Desktop or Agent TARS, supported multimodal model credentials or…
Setup: follow the Agent TARS and UI-TARS Desktop quick-start documentation to install the desktop/operator stack, configure either…

Trust & provenance

Why this listing is credible

Trust status: Security Reviewed.
36.3k GitHub stars on the linked upstream source.
Last updated Jun 12, 2026.
