Run multimodal computer and browser operator workflows with UI-TARS Desktop
Use UI-TARS Desktop when an operator needs a local or remote GUI agent to drive desktop, browser, and computer-use tasks with multimodal model support.
npx skills add agentskillexchange/skills --skill run-multimodal-computer-and-browser-operator-workflows-with-ui-tars-desktop
Use UI-TARS Desktop when an agent needs to operate graphical applications, browsers, or a remote computer through a bounded computer-use workflow. The operator configures the local or remote UI-TARS/Agent TARS environment, connects the chosen multimodal model and MCP tools, and lets the agent observe screens, plan steps, and execute GUI/browser actions under supervision. Reach for this skill, rather than operating the application by hand, when the task requires repeatable computer-use automation, browser or desktop operation, or evaluation of GUI-agent behavior across local and remote operator modes. The scope is the UI-TARS Desktop and Agent TARS operator stack for multimodal GUI/browser automation; it is not a generic model framework or a desktop app listing.
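The observe, plan, and execute cycle described above can be sketched as a bounded, supervised loop. This is a minimal illustration and not the UI-TARS or Agent TARS API: `Action`, `run_bounded`, and the four callbacks are hypothetical names, and the stubs stand in for a real screenshot source, model call, human reviewer, and input driver.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Action:
    """Hypothetical GUI action; not a UI-TARS type."""
    kind: str          # e.g. "click", "type", or "done"
    target: str = ""   # free-form description of the UI element


def run_bounded(
    observe: Callable[[], str],
    plan: Callable[[str], Action],
    approve: Callable[[Action], bool],
    execute: Callable[[Action], None],
    max_steps: int = 5,
) -> List[Action]:
    """Run at most max_steps observe/plan/execute cycles under supervision."""
    executed: List[Action] = []
    for _ in range(max_steps):
        screen = observe()          # capture the current screen state
        action = plan(screen)       # ask the model for the next step
        if action.kind == "done":   # the planner reports the task is finished
            break
        if not approve(action):     # the supervisor vetoes: stop immediately
            break
        execute(action)             # perform the approved GUI action
        executed.append(action)
    return executed


# Stubbed demo: a scripted planner and a supervisor that allows clicks and typing.
script = iter([Action("click", "Search box"), Action("type", "hello"), Action("done")])
log = run_bounded(
    observe=lambda: "screenshot-placeholder",
    plan=lambda screen: next(script),
    approve=lambda action: action.kind in {"click", "type"},
    execute=lambda action: None,
    max_steps=5,
)
print([action.kind for action in log])
```

The step cap and the approval callback are what make the workflow bounded: the loop ends on completion, on a veto, or when the step budget runs out, whichever comes first.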
What this skill actually does
Inputs and prerequisites: UI-TARS Desktop or Agent TARS, credentials for a supported multimodal model or a local model setup, a browser or desktop environment, and optional MCP tools.
Setup notes: Follow the Agent TARS and UI-TARS Desktop quick-start documentation to install the desktop/operator stack, configure either a local or remote operator, connect the intended multimodal model provider, and test with a supervised browser or desktop task before allowing broader automation.
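A preflight check can make these setup steps explicit before broader automation is allowed. The sketch below is an assumption-laden illustration: `OperatorSetup`, its fields, and `preflight` are hypothetical and do not mirror any UI-TARS Desktop or Agent TARS configuration format, so consult the upstream quick-start documentation for the real settings.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class OperatorSetup:
    """Hypothetical record of the setup state; not an upstream config schema."""
    operator_mode: str                 # "local" or "remote"
    model_provider: str                # name of the chosen multimodal provider
    has_model_access: bool             # credentials present or local model ready
    mcp_tools: List[str] = field(default_factory=list)  # optional MCP tools
    supervised_test_passed: bool = False


def preflight(setup: OperatorSetup) -> List[str]:
    """Return the list of unmet setup steps; an empty list means ready."""
    problems: List[str] = []
    if setup.operator_mode not in ("local", "remote"):
        problems.append("operator_mode must be 'local' or 'remote'")
    if not setup.model_provider:
        problems.append("no multimodal model provider selected")
    if not setup.has_model_access:
        problems.append("model credentials or local model setup missing")
    if not setup.supervised_test_passed:
        problems.append("run a supervised browser or desktop task first")
    return problems


# Example: everything is configured except the supervised trial run.
draft = OperatorSetup(
    operator_mode="local",
    model_provider="example-provider",  # placeholder name, not a real provider
    has_model_access=True,
)
print(preflight(draft))
```

MCP tools are deliberately not validated here because the prerequisites list them as optional.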
Source and verification boundary: Treat https://agent-tars.com as the canonical reference and consult it before running the workflow; keep commands, API calls, CLI usage, and generated outputs reviewable against that upstream source.
Framework fit: Publish this as a Multi-Framework workflow only when the operator can invoke the documented toolchain directly; do not treat the upstream project as a generic product listing.