Anthropic Expands "Computer Use" Capabilities
Anthropic improves Claude's ability to interact with standard computer interfaces, enhancing visual grounding and multi-app orchestration.
Overview
Anthropic has released significant updates to its "Computer Use" feature, enabling Claude to interact with standard computer interfaces more reliably and with lower latency. With Computer Use, Claude can move a cursor, click buttons, and type text in any application, not just a web browser.
New Enhancements
1. Improved Visual Grounding
The latest updates improve how accurately Claude maps UI elements to screen coordinates. This reduces misclicks and lets the agent work with denser interfaces (e.g., professional IDEs, CAD software).
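One common source of misclicks in screen-driving agents is a mismatch between the resolution of the screenshot the model sees and the resolution of the physical display. The sketch below shows one way to map a point from screenshot pixels to screen pixels. It is an illustration only: the `Size` type and the `to_screen_coords` function are hypothetical names, and the article does not say how Anthropic handles this internally.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Size:
    """Width and height in pixels (hypothetical helper type)."""
    width: int
    height: int


def to_screen_coords(x: int, y: int, screenshot: Size, screen: Size) -> tuple[int, int]:
    """Map a point given in screenshot pixels to physical screen pixels.

    Assumes the screenshot is a uniformly resized copy of the full screen.
    """
    if not (0 <= x < screenshot.width and 0 <= y < screenshot.height):
        raise ValueError(f"point ({x}, {y}) lies outside the screenshot")

    scale_x = screen.width / screenshot.width
    scale_y = screen.height / screenshot.height

    # Aim at the center of the source pixel, then clamp to the screen bounds.
    screen_x = min(screen.width - 1, int((x + 0.5) * scale_x))
    screen_y = min(screen.height - 1, int((y + 0.5) * scale_y))
    return screen_x, screen_y


# A click at the middle of a 1280x800 screenshot on a 2560x1600 display.
print(to_screen_coords(640, 400, Size(1280, 800), Size(2560, 1600)))  # (1281, 801)
```

Rejecting out-of-range points instead of silently clamping them makes a bad prediction visible early, which matters more as interfaces get denser.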
2. Multi-App Orchestration
Claude can now switch between applications within a single task. For example, it can read a requirement from Slack, open a local terminal to run a test, and then update a Jira ticket with the results.
3. Latency Reduction
Optimizations in the vision-action loop have reduced the delay between the model's perception of the screen and the execution of the action, making the agent feel more responsive.
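To see where time goes in a perception-to-action loop like this one, it can help to time each phase separately. The sketch below is a minimal, generic example rather than Anthropic's implementation: `capture`, `decide`, and `act` are hypothetical callables standing in for taking a screenshot, asking the model for an action, and performing that action.

```python
import time
from typing import Any, Callable


def timed_step(
    capture: Callable[[], Any],
    decide: Callable[[Any], Any],
    act: Callable[[Any], None],
) -> dict[str, float]:
    """Run one capture -> decide -> act step and return per-phase durations in seconds."""
    t0 = time.perf_counter()
    frame = capture()        # perception: grab the current screen
    t1 = time.perf_counter()
    action = decide(frame)   # reasoning: choose the next action
    t2 = time.perf_counter()
    act(action)              # execution: perform the action
    t3 = time.perf_counter()
    return {
        "capture_s": t1 - t0,
        "decide_s": t2 - t1,
        "act_s": t3 - t2,
        "total_s": t3 - t0,
    }


# Stub callables so the example runs without a real screen or model.
timings = timed_step(
    capture=lambda: "screenshot-bytes",
    decide=lambda frame: {"type": "click", "x": 10, "y": 20},
    act=lambda action: None,
)
print(sorted(timings))
```

Splitting the total this way shows whether a slow step comes from screen capture, the model call, or input execution.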
Use Cases
- Software Testing: Automatically running E2E tests across a desktop environment.
- Data Entry: Migrating data between legacy desktop apps that lack APIs.
- Developer Productivity: Automating repetitive setup tasks across multiple tools.
Security and Safety
Anthropic emphasizes a "human-gated" approach. Users can monitor the agent's screen in real time and end the session immediately if the agent behaves unexpectedly.
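The article does not describe how the kill switch is implemented. As a rough illustration of the idea, the sketch below checks a stop flag before every action, so a human observer (or a watchdog thread) can halt the session between steps. `GatedSession` and its methods are hypothetical names, not part of any Anthropic API.

```python
import threading
from typing import Callable, Iterable


class GatedSession:
    """Runs agent actions one at a time and stops as soon as kill() is called."""

    def __init__(self) -> None:
        self._killed = threading.Event()

    def kill(self) -> None:
        """Request an immediate stop; safe to call from another thread."""
        self._killed.set()

    def run(self, actions: Iterable[str], execute: Callable[[str], None]) -> int:
        """Execute actions in order and return how many completed before any stop."""
        completed = 0
        for action in actions:
            # Check the flag before every action so no step starts after a kill.
            if self._killed.is_set():
                break
            execute(action)
            completed += 1
        return completed


session = GatedSession()
log: list[str] = []


def execute(action: str) -> None:
    log.append(action)
    # Simulate the observer stopping the session after a suspicious step.
    if action == "delete files":
        session.kill()


print(session.run(["open terminal", "delete files", "push changes"], execute))  # 2
```

A flag checked between actions cannot interrupt a step that is already running, so a real system would likely also need a way to terminate the underlying process.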
