UI-Vision Suite

A comprehensive collection of datasets, models, and research papers for advancing desktop automation and UI understanding

10K+ Tasks
70K+ Actions
87 Platforms

Featured Projects


StarUI

● In Development · Coming Soon

A compact AI agent designed to achieve state-of-the-art performance on the UI-Vision and OS-World-G benchmarks using precise, task-oriented supervision and the UI-Vision-Ground dataset.

3.5M+ Elements
470K Instruction Pairs
87 Applications
Desktop Agents · Grounding · Instruction Following

UI-Vision Suite Datasets

Comprehensive datasets for building the next generation of intelligent computer automation agents

Coming Soon

UI-Vision-Actions

A large-scale action dataset with 10K+ detailed action trajectories, 70K+ atomic actions (Click, Drag, Press), and rich chain-of-thought traces for every user action.

10K+ Tasks
70K+ Actions
87 Platforms
Chain-of-Thought · Atomic Actions · Action Planning

UI-Vision Suite Ecosystem

An end-to-end ecosystem for building powerful computer-use agents

87 Diverse Platforms

Open-source applications across 6 categories: Education, Browsers, Development, Productivity, Creativity, and Entertainment

Coverage: 6 major software categories
Diversity: From simple calculators to complex IDEs
Real-world: Actual user workflows and scenarios

10K+ User Tasks

Complex multi-step workflows with detailed chain-of-thought reasoning for every action

Actions: 70K+ atomic actions (Click, Drag, Press, etc.)
Reasoning: CoT traces for every user decision
Example: "I need to access settings → Click the 3-line menu → Navigate to privacy..."
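The exact release format has not been published yet; as an illustration, a trajectory record that pairs each atomic action with its chain-of-thought trace might look like the sketch below (all field names are hypothetical):

```python
# Hypothetical trajectory record -- the actual UI-Vision-Actions schema has
# not been published; every field name here is an assumption for illustration.
trajectory = {
    "task": "Change the browser's privacy settings",
    "platform": "Firefox",
    "steps": [
        {
            "thought": "I need to access settings, so I click the 3-line menu.",
            "action": {"type": "Click", "x": 1893, "y": 42},
        },
        {
            "thought": "Now I navigate to the privacy section.",
            "action": {"type": "Click", "x": 1720, "y": 310},
        },
    ],
}

# Example use: count atomic actions by type across a trajectory.
from collections import Counter

action_counts = Counter(step["action"]["type"] for step in trajectory["steps"])
print(action_counts)  # Counter({'Click': 2})
```

A flat list of (thought, action) steps like this maps directly onto supervised action-planning targets, one prediction per step.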

3.5M+ Elements

60K screenshots densely annotated with precise bounding boxes and element metadata

Density: ~58 elements per screenshot on average
Precision: Pixel-perfect bounding boxes
Metadata: Element names, types, and relationships
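To make the annotation structure concrete, here is a minimal sketch of what a densely annotated screenshot record could look like, assuming `[x1, y1, x2, y2]` pixel-coordinate boxes; the real field names and box convention are not confirmed:

```python
# Hypothetical element-annotation record -- field names and the
# [x1, y1, x2, y2] box convention are assumptions, not the published schema.
screenshot = {
    "image": "gimp_0001.png",
    "resolution": [1920, 1080],
    "elements": [
        {"name": "Brush Tool", "type": "button", "bbox": [12, 96, 44, 128]},
        {"name": "Opacity", "type": "slider", "bbox": [1700, 220, 1890, 240]},
    ],
}

def bbox_area(bbox):
    """Area in pixels of an [x1, y1, x2, y2] bounding box."""
    x1, y1, x2, y2 = bbox
    return (x2 - x1) * (y2 - y1)

areas = [bbox_area(e["bbox"]) for e in screenshot["elements"]]
print(areas)  # [1024, 3800]
```

Per-element names and types, alongside the boxes, are what make the data usable for grounding: an agent can be trained to map an instruction like "lower the opacity" to the matching box.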

10K+ Videos

High-quality screen recordings with synchronized action annotations and timing data

Quality: HD recordings with precise timestamps
Annotations: Action coordinates and element interactions
Training: Ready for behavioral cloning and imitation learning
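For behavioral cloning, synchronized recordings are typically converted into (observation, action) pairs by aligning each annotated action with the video frame just before it. A minimal sketch of that alignment, with an assumed annotation layout and frame rate:

```python
# Sketch of turning synchronized action annotations into (frame, action)
# training pairs for behavioral cloning. The annotation fields and 30 fps
# rate are assumptions, not the released format.
annotations = [
    {"t": 0.50, "action": {"type": "Click", "x": 400, "y": 300}},
    {"t": 2.25, "action": {"type": "Press", "key": "Enter"}},
]

def to_training_pairs(annotations, fps=30):
    """Map each timestamped action to the video frame index at or before it."""
    return [(int(a["t"] * fps), a["action"]) for a in annotations]

pairs = to_training_pairs(annotations)
print(pairs)  # [(15, {'type': 'Click', 'x': 400, 'y': 300}), (67, {'type': 'Press', 'key': 'Enter'})]
```

Each pair becomes one imitation-learning sample: the decoded frame is the observation, the annotated action is the target.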

Community Usage & Extensions

Research and projects that use UI-Vision resources or build on top of them

"Large Language Model-brained GUI Agents: A Survey"

Survey

Zhang et al., arXiv 2024

Comprehensive survey of LLM-based GUI agents, discussing the evolution of visual perception and interaction in GUI environments.

"Enhancing Visual Grounding for GUI Agents via Self-Evolutionary Reinforcement Learning"

Research

Yuan et al., arXiv 2025

Advances GUI agent grounding capabilities through self-evolutionary reinforcement learning, building on visual understanding benchmarks.

"OpenCUA: Open Foundations for Computer-Use Agents"

Framework

Wang et al., arXiv 2025

Establishes open foundations for computer-use agents, leveraging vision-language models for diverse computer task automation.

Submit Your Work

Open

Community Contributions

Using UI-Vision resources in your research? Share your work with the community.

Get Involved

Interested in collaborating or contributing to our research? We'd love to hear from you.