Drive web and app UIs with vision-grounded steps when selectors are brittle or unavailable
Use Midscene.js when an agent needs screenshot-grounded UI actions and assertions across web, mobile, or desktop surfaces where DOM selectors are fragile, unavailable, or not the right abstraction.
npx skills add agentskillexchange/skills --skill drive-web-and-app-uis-with-vision-grounded-steps-when-selectors-are-brittle-or-unavailable
Use Midscene.js when a workflow depends on visual understanding rather than stable selectors. It lets an agent describe goals in natural language, locate and operate interface elements from screenshots, extract data, assert outcomes, and replay runs across browser, Android, iOS, and other UI surfaces. Scope: this skill covers authoring and debugging vision-driven UI actions when selector-first automation breaks down. It is not a general browser-automation framework listing, and it does not promote a broader product platform.