Anthropic Expands "Computer Use" Capabilities

Anthropic improves Claude's ability to interact with standard computer interfaces, enhancing visual grounding and multi-app orchestration.

Overview

Anthropic has released significant updates to its "Computer Use" feature, which lets Claude operate standard computer interfaces more reliably and with lower latency. With Computer Use, Claude can move a cursor, click buttons, and type text in any application, not just a web browser.

New Enhancements

1. Improved Visual Grounding

The latest updates improve how accurately Claude maps UI elements to screen coordinates. This reduces misclicks and lets the agent work with denser interfaces (e.g., professional IDEs, CAD software).
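
To make the coordinate-mapping problem concrete, here is a minimal Python sketch. It assumes the model sees a downscaled screenshot and returns a click point in that image's coordinates, which then has to be scaled to the real display. The names `Box` and `scale_to_screen` and the example sizes are hypothetical and are not part of Anthropic's API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """Axis-aligned bounding box of a UI element, in real screen pixels."""
    left: int
    top: int
    width: int
    height: int

    def contains(self, x: int, y: int) -> bool:
        return (self.left <= x < self.left + self.width
                and self.top <= y < self.top + self.height)


def scale_to_screen(x, y, shot_size, screen_size):
    """Map a point from screenshot coordinates to screen coordinates."""
    shot_w, shot_h = shot_size
    screen_w, screen_h = screen_size
    sx = round(x * screen_w / shot_w)
    sy = round(y * screen_h / shot_h)
    # Clamp so a point on the far edge never lands off-screen.
    return min(max(sx, 0), screen_w - 1), min(max(sy, 0), screen_h - 1)


# Hypothetical example: a 1280x800 screenshot of a 2560x1600 display.
SHOT, SCREEN = (1280, 800), (2560, 1600)
button = Box(left=1000, top=400, width=24, height=24)  # a small toolbar icon

click = scale_to_screen(505, 205, SHOT, SCREEN)
hit = button.contains(*click)

# The same target predicted 8 screenshot pixels to the right.
off_click = scale_to_screen(513, 205, SHOT, SCREEN)
miss = not button.contains(*off_click)
```

In this example an 8-pixel error in the screenshot becomes a 16-pixel error on screen, which is enough to miss a 24-pixel icon. That is why small gains in grounding accuracy matter most in dense interfaces.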

2. Multi-App Orchestration

Claude can now switch between applications within a single task. For example, it can read a requirement from Slack, open a local terminal to run a test, and then update a Jira ticket with the results.
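
The Slack-to-terminal-to-Jira example can be pictured as a sequence of steps that pass results along. The Python sketch below is only an illustration: the step functions are stubs, and every name in it (`read_requirement`, `run_test`, `update_ticket`, `orchestrate`) is hypothetical. A real agent would carry out each step by driving the application's UI rather than by calling functions.

```python
from typing import Callable

# Each step receives the shared context and returns the updated context.
Step = Callable[[dict], dict]


def read_requirement(ctx: dict) -> dict:
    # Stub: stands in for reading a message in a chat app.
    return {**ctx, "requirement": "login form rejects empty passwords"}


def run_test(ctx: dict) -> dict:
    # Stub: stands in for running a test in a terminal.
    return {**ctx, "test_passed": True}


def update_ticket(ctx: dict) -> dict:
    # Stub: stands in for editing a ticket in an issue tracker.
    status = "passed" if ctx["test_passed"] else "failed"
    return {**ctx, "ticket_comment": f"{ctx['requirement']}: test {status}"}


def orchestrate(steps: list[tuple[str, Step]]) -> dict:
    """Run steps in order, recording which app was in focus for each."""
    ctx: dict = {"focus_log": []}
    for app, step in steps:
        ctx["focus_log"].append(app)  # the agent switches to this app
        ctx = step(ctx)
    return ctx


result = orchestrate([
    ("chat", read_requirement),
    ("terminal", run_test),
    ("tracker", update_ticket),
])
```

Keeping one shared context makes the hand-off explicit: the ticket update depends on what the earlier steps read and ran, not just on the last screen the agent saw.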

3. Latency Reduction

Optimizations in the vision-action loop have reduced the delay between the model's perception of the screen and the execution of the action, making the agent feel more responsive.
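
One way to reason about the vision-action loop is to time each stage separately. The Python sketch below uses stub functions for screen capture, action selection, and execution. All names are hypothetical, and the timings it produces say nothing about Claude's actual performance.

```python
import time


def timed(fn, *args):
    """Call fn and return (result, elapsed seconds)."""
    start = time.perf_counter()
    result = fn(*args)
    return result, time.perf_counter() - start


def capture_screen():
    return b"fake-screenshot-bytes"  # stub for a real screen grab


def choose_action(screenshot):
    return {"type": "click", "x": 10, "y": 20}  # stub for a model call


def execute(action):
    return True  # stub for moving the mouse or typing


def one_iteration():
    """Run one perceive-decide-act cycle and report per-stage latency."""
    shot, t_capture = timed(capture_screen)
    action, t_decide = timed(choose_action, shot)
    _, t_act = timed(execute, action)
    return {
        "capture": t_capture,
        "decide": t_decide,
        "act": t_act,
        "total": t_capture + t_decide + t_act,
    }


timings = one_iteration()
```

Breaking the total into stages shows where a delay comes from, since a slow screen grab and a slow model response call for different fixes.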

Use Cases

  • Software Testing: Automatically running E2E tests across a desktop environment.
  • Data Entry: Migrating data between legacy desktop apps that lack APIs.
  • Developer Productivity: Automating repetitive setup tasks across multiple tools.

Security and Safety

Anthropic emphasizes a "human-gated" approach. Users can monitor the agent's screen in real time and immediately end the session if the agent behaves unexpectedly.
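
A kill switch of this kind can be sketched as a flag that is checked before every action. The Python example below is a minimal illustration under that assumption. `GatedSession` and its methods are hypothetical names and do not describe how Anthropic implements session control.

```python
import threading


class GatedSession:
    """Runs actions one at a time and stops as soon as a human says so."""

    def __init__(self):
        self._stop = threading.Event()
        self.executed = []

    def kill(self):
        # Called by the supervising user; safe to call from another thread.
        self._stop.set()

    def run(self, actions, perform):
        for action in actions:
            if self._stop.is_set():
                break  # check the gate before every action
            perform(action)
            self.executed.append(action)
        return self.executed


session = GatedSession()


def perform(action):
    # Simulate the user pressing "stop" while watching the second action.
    if action == "open settings":
        session.kill()


done = session.run(["click menu", "open settings", "delete files"], perform)
```

Because the flag is checked between actions, the action already in progress finishes but nothing after it runs. In this example "delete files" is never executed.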
