Computer Use Skill

Step 1: Environment Setup

Before using Computer Use, set up a properly sandboxed environment:

Docker Container (Recommended):

# Use Anthropic's reference container
docker run -it --rm \
  -e ANTHROPIC_API_KEY=$ANTHROPIC_API_KEY \
  -p 5900:5900 -p 8501:8501 \
  ghcr.io/anthropics/anthropic-quickstarts:computer-use-demo-latest

Virtual Machine: Dedicated VM with minimal privileges and isolated network

Never run Computer Use on a host machine that has access to sensitive data or credentials.

Step 2: Tool Configuration

Configure the computer use tool with display settings:

const computerTool = {
  type: 'computer_20250124', // or 'computer_20251124' for Opus 4.5
  name: 'computer',
  display_width_px: 1024,
  display_height_px: 768,
  display_number: 1,
};

Resolution Guidelines:

XGA (1024x768): Default, works well for most tasks
WXGA (1280x800): Better for wide content
1920x1080: Only if needed, may reduce accuracy
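
One way to apply these guidelines in code is to report the listed resolution whose aspect ratio is closest to the physical display. The sketch below is illustrative only: `RESOLUTIONS` and `choose_tool_resolution` are hypothetical names, not part of any SDK, and the selection rule is an assumption.

```python
# Hypothetical helper: choose which resolution to report in the tool
# configuration. The selection rule is an assumption for illustration.
RESOLUTIONS = {
    "XGA": (1024, 768),   # 4:3
    "WXGA": (1280, 800),  # 16:10
}


def choose_tool_resolution(actual_width: int, actual_height: int) -> tuple[int, int]:
    """Return the (width, height) to use for display_width_px/display_height_px."""
    # Displays that already fit within XGA need no downscaling.
    if actual_width <= 1024 and actual_height <= 768:
        return (actual_width, actual_height)
    actual_ratio = actual_width / actual_height
    # Pick the listed resolution with the nearest aspect ratio.
    name = min(
        RESOLUTIONS,
        key=lambda n: abs(RESOLUTIONS[n][0] / RESOLUTIONS[n][1] - actual_ratio),
    )
    return RESOLUTIONS[name]
```

With this rule a 1920x1080 display maps to WXGA and a 1600x1200 display maps to XGA; screenshots and coordinates then need to be scaled between the two sizes.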

Step 3: Agent Loop Implementation

The core pattern for computer use is an agent loop:

async function computerUseAgentLoop(task, maxIterations = 50) {
  const messages = [{ role: 'user', content: task }];

  for (let i = 0; i < maxIterations; i++) {
    // 1. Call Claude with computer tool
    //    (the betas parameter is accepted by the beta messages endpoint)
    const response = await anthropic.beta.messages.create({
      model: 'claude-sonnet-4-20250514',
      max_tokens: 4096,
      tools: [computerTool],
      messages,
      betas: ['computer-use-2025-01-24'],
    });

    // 2. Check if task complete
    if (response.stop_reason === 'end_turn') {
      return extractFinalResult(response);
    }

    // 3. Process tool use requests
    const toolResults = [];
    for (const block of response.content) {
      if (block.type === 'tool_use' && block.name === 'computer') {
        const result = await executeComputerAction(block.input);
        toolResults.push({
          type: 'tool_result',
          tool_use_id: block.id,
          content: result,
        });
      }
    }

    // 4. Add assistant response and tool results
    messages.push({ role: 'assistant', content: response.content });
    messages.push({ role: 'user', content: toolResults });
  }

  throw new Error('Max iterations reached');
}
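
The loop above calls `extractFinalResult` without defining it. A minimal sketch of such a helper follows, written in Python to match the Python implementation later in this document. The name `extract_final_text` is hypothetical, and the assumption is that content blocks arrive either as SDK objects or as plain dicts with `type` and `text` fields.

```python
from typing import Any


def extract_final_text(content: list[Any]) -> str:
    """Join the text of every text block in a response's content list."""
    parts = []
    for block in content:
        # Blocks may be SDK objects (attribute access) or plain dicts (key access).
        if isinstance(block, dict):
            block_type, text = block.get("type"), block.get("text", "")
        else:
            block_type, text = getattr(block, "type", None), getattr(block, "text", "")
        if block_type == "text":
            parts.append(text)
    return "\n".join(parts)
```

Non-text blocks such as `tool_use` are skipped, so the helper returns an empty string when the final response contains no text.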

Step 4: Action Execution

Execute computer actions based on Claude's requests:

async function executeComputerAction(input) {
  const { action, coordinate, text, scroll_direction, scroll_amount } = input;

  switch (action) {
    case 'screenshot':
      return await captureScreenshot();

    case 'left_click':
      await click(coordinate[0], coordinate[1]);
      return await captureScreenshot();

    case 'type':
      await typeText(text);
      return await captureScreenshot();

    case 'key':
      await pressKey(text);
      return await captureScreenshot();

    case 'mouse_move':
      await moveMouse(coordinate[0], coordinate[1]);
      return await captureScreenshot();

    case 'scroll':
      await scroll(coordinate, scroll_direction, scroll_amount);
      return await captureScreenshot();

    case 'left_click_drag':
      await drag(input.start_coordinate, coordinate);
      return await captureScreenshot();

    case 'wait':
      await sleep(input.duration * 1000);
      return await captureScreenshot();

    default:
      throw new Error(`Unknown action: ${action}`);
  }
}
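
Helpers such as `click`, `typeText`, and `pressKey` are left to the environment. As one possible backing, the sketch below translates a subset of actions into `xdotool` argument lists. It assumes an X11 display inside the sandbox with `xdotool` installed. `build_xdotool_command` is a hypothetical name, and the 12 ms typing delay is an arbitrary choice.

```python
def build_xdotool_command(action_input: dict) -> list[str]:
    """Translate a subset of computer-tool actions into an xdotool argv list."""
    action = action_input["action"]
    if action == "mouse_move":
        x, y = action_input["coordinate"]
        return ["xdotool", "mousemove", "--sync", str(x), str(y)]
    if action == "left_click":
        x, y = action_input["coordinate"]
        # Move first, then press button 1 (left) at the new position.
        return ["xdotool", "mousemove", "--sync", str(x), str(y), "click", "1"]
    if action == "type":
        # "--" ends option parsing so text beginning with "-" is typed literally.
        return ["xdotool", "type", "--delay", "12", "--", action_input["text"]]
    if action == "key":
        return ["xdotool", "key", "--", action_input["text"]]
    raise ValueError(f"Unsupported action: {action}")
```

The returned list can be passed to `subprocess.run(..., check=True)` inside the sandbox. Building the argv separately keeps the translation testable without a display.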

Step 5: Coordinate Scaling

When display resolution differs from tool configuration:

function scaleCoordinates(x, y, fromWidth, fromHeight, toWidth, toHeight) {
  return [Math.round((x * toWidth) / fromWidth), Math.round((y * toHeight) / fromHeight)];
}

// Example: Scale from 1024x768 to actual 1920x1080
const [scaledX, scaledY] = scaleCoordinates(
  500, 400,   // Claude's coordinates
  1024, 768,  // Tool configuration
  1920, 1080  // Actual display
);
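
A Python counterpart of the scaling step is sketched below, with one addition that is an assumption rather than part of the original: the result is clamped to the target display so an out-of-range coordinate cannot land off-screen. `scale_and_clamp` is a hypothetical name.

```python
import math


def scale_and_clamp(x: float, y: float,
                    from_size: tuple[int, int],
                    to_size: tuple[int, int]) -> tuple[int, int]:
    """Scale (x, y) from the tool resolution to the actual display, then clamp."""
    from_w, from_h = from_size
    to_w, to_h = to_size
    # floor(v + 0.5) rounds halves up, matching JavaScript's Math.round;
    # Python's built-in round() would round halves to the nearest even integer.
    sx = math.floor(x * to_w / from_w + 0.5)
    sy = math.floor(y * to_h / from_h + 0.5)
    # Keep the point inside [0, width - 1] x [0, height - 1].
    return (min(max(sx, 0), to_w - 1), min(max(sy, 0), to_h - 1))
```

For the example above, `scale_and_clamp(500, 400, (1024, 768), (1920, 1080))` returns `(938, 563)`.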

</execution_process>

<best_practices>

Sandboxed Execution: ALWAYS run in Docker container or VM with minimal privileges. Never grant access to sensitive data, authentication credentials, or unrestricted internet.

Human Confirmation: Implement human-in-the-loop confirmation for meaningful actions like form submissions, file deletions, or external communications.

Prompt Injection Protection: Be aware that malicious content in screenshots can attempt to manipulate Claude. Validate actions against the original task.

Resolution Consistency: Keep the display resolution consistent throughout a session. XGA (1024x768) provides the best balance of accuracy and visibility.

Screenshot After Actions: Always return a screenshot after each action so Claude can verify the result and determine next steps.

Error Recovery: Implement graceful error handling. If an action fails, capture a screenshot and let Claude decide how to proceed.

Rate Limiting: Add delays between rapid actions to allow the UI to update. Use the wait action when needed.

Beta Headers: Always include the appropriate beta header for your model version.
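
The Error Recovery practice can be sketched as a wrapper that always produces a `tool_result` block, marking failures with `is_error` so Claude can decide how to proceed. `run_action_safely` and the `execute` callback are hypothetical names. The callback is assumed to return a base64-encoded PNG screenshot.

```python
from typing import Callable


def run_action_safely(tool_use_id: str, action_input: dict,
                      execute: Callable[[dict], str]) -> dict:
    """Run one action and return a tool_result block whether or not it fails."""
    try:
        screenshot_b64 = execute(action_input)
    except Exception as exc:
        # Report the failure as text instead of aborting the agent loop.
        return {
            "type": "tool_result",
            "tool_use_id": tool_use_id,
            "is_error": True,
            "content": [{"type": "text", "text": f"Action failed: {exc}"}],
        }
    return {
        "type": "tool_result",
        "tool_use_id": tool_use_id,
        "content": [{
            "type": "image",
            "source": {"type": "base64", "media_type": "image/png", "data": screenshot_b64},
        }],
    }
```

If capturing a screenshot is still possible after a failure, an image block can be appended to the error content so Claude sees the current screen state.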

</best_practices>

<code_example> Complete Agent Loop Example (Node.js)

const Anthropic = require('@anthropic-ai/sdk');

const anthropic = new Anthropic();

// Tool configuration
const computerTool = {
  type: 'computer_20250124',
  name: 'computer',
  display_width_px: 1024,
  display_height_px: 768,
  display_number: 1,
};

// Main agent loop
async function runComputerUseTask(task) {
  console.log(`Starting task: ${task}`);

  const messages = [{ role: 'user', content: task }];
  let iterations = 0;
  const maxIterations = 50;

  while (iterations < maxIterations) {
    iterations++;
    console.log(`Iteration ${iterations}`);

    // The betas parameter is accepted by the beta messages endpoint
    const response = await anthropic.beta.messages.create({
      model: 'claude-sonnet-4-20250514',
      max_tokens: 4096,
      tools: [computerTool],
      messages,
      betas: ['computer-use-2025-01-24'],
    });

    // Check for completion
    if (response.stop_reason === 'end_turn') {
      const textBlocks = response.content.filter(b => b.type === 'text');
      return textBlocks.map(b => b.text).join('\n');
    }

    // Process tool calls
    const toolResults = [];
    for (const block of response.content) {
      if (block.type === 'tool_use' && block.name === 'computer') {
        console.log(`Action: ${block.input.action}`);

        // Execute action and get screenshot.
        // executeAction is not defined here: supply it for your environment
        // and have it return the screenshot as base64-encoded PNG data.
        const screenshot = await executeAction(block.input);

        toolResults.push({
          type: 'tool_result',
          tool_use_id: block.id,
          content: [
            {
              type: 'image',
              source: {
                type: 'base64',
                media_type: 'image/png',
                data: screenshot,
              },
            },
          ],
        });
      }
    }

    messages.push({ role: 'assistant', content: response.content });
    messages.push({ role: 'user', content: toolResults });
  }

  throw new Error('Task did not complete within iteration limit');
}

// Example usage
runComputerUseTask('Open the calculator app and compute 25 * 47')
  .then(result => console.log('Result:', result))
  .catch(err => console.error('Error:', err));

</code_example>

<code_example> Python Implementation

import anthropic
import base64  # for encoding screenshots in your capture_screenshot_base64
from typing import Any

client = anthropic.Anthropic()

COMPUTER_TOOL = {
    "type": "computer_20250124",
    "name": "computer",
    "display_width_px": 1024,
    "display_height_px": 768,
    "display_number": 1,
}


def run_computer_use_task(task: str, max_iterations: int = 50) -> str:
    """Execute a computer use task with agent loop."""
    messages = [{"role": "user", "content": task}]

    for iteration in range(max_iterations):
        print(f"Iteration {iteration + 1}")

        # The betas parameter is accepted by the beta messages endpoint
        response = client.beta.messages.create(
            model="claude-sonnet-4-20250514",
            max_tokens=4096,
            tools=[COMPUTER_TOOL],
            messages=messages,
            betas=["computer-use-2025-01-24"],
        )

        # Check for completion
        if response.stop_reason == "end_turn":
            text_blocks = [b.text for b in response.content if b.type == "text"]
            return "\n".join(text_blocks)

        # Process tool calls
        tool_results = []
        for block in response.content:
            if block.type == "tool_use" and block.name == "computer":
                print(f"Action: {block.input['action']}")

                # Execute action in your environment
                screenshot_b64 = execute_action(block.input)

                tool_results.append({
                    "type": "tool_result",
                    "tool_use_id": block.id,
                    "content": [{
                        "type": "image",
                        "source": {
                            "type": "base64",
                            "media_type": "image/png",
                            "data": screenshot_b64,
                        },
                    }],
                })

        messages.append({"role": "assistant", "content": response.content})
        messages.append({"role": "user", "content": tool_results})

    raise Exception("Max iterations reached")


def execute_action(action_input: dict[str, Any]) -> str:
    """Execute computer action and return screenshot as base64."""
    action = action_input["action"]

    # Implement using your automation framework
    # (pyautogui, pynput, or container-specific tools)

    if action == "screenshot":
        pass  # Just capture
    elif action == "left_click":
        x, y = action_input["coordinate"]
        # click(x, y)
    elif action == "type":
        text = action_input["text"]
        # type_text(text)
    elif action == "key":
        key = action_input["text"]
        # press_key(key)
    # ... handle other actions

    # Capture and return screenshot
    # (capture_screenshot_base64 is a placeholder you must implement)
    return capture_screenshot_base64()

</code_example>

<usage_example> Available Actions Reference

// Basic actions (all versions)
{ "action": "screenshot" }
{ "action": "left_click", "coordinate": [500, 300] }
{ "action": "type", "text": "Hello, world!" }
{ "action": "key", "text": "ctrl+s" }
{ "action": "mouse_move", "coordinate": [500, 300] }

// Enhanced actions (computer_20250124)
{ "action": "scroll", "coordinate": [500, 400], "scroll_direction": "down", "scroll_amount": 3 }
{ "action": "left_click_drag", "start_coordinate": [100, 100], "coordinate": [300, 300] }
{ "action": "right_click", "coordinate": [500, 300] }
{ "action": "middle_click", "coordinate": [500, 300] }
{ "action": "double_click", "coordinate": [500, 300] }
{ "action": "triple_click", "coordinate": [500, 300] }
{ "action": "left_mouse_down", "coordinate": [500, 300] }
{ "action": "left_mouse_up", "coordinate": [500, 300] }
{ "action": "hold_key", "text": "shift", "duration": 1.0 }
{ "action": "wait", "duration": 2.0 }

// Opus 4.5 only (computer_20251124 with enable_zoom: true)
// region is [x1, y1, x2, y2]: top-left and bottom-right corners of the area to inspect
{ "action": "zoom", "region": [400, 200, 600, 400] }

</usage_example>

<usage_example> Docker Reference Container

# Pull and run the Anthropic reference container
docker pull ghcr.io/anthropics/anthropic-quickstarts:computer-use-demo-latest

# Run with API key
docker run -it --rm \
  -e ANTHROPIC_API_KEY=$ANTHROPIC_API_KEY \
  -v $(pwd)/output:/home/computeruse/output \
  -p 5900:5900 \
  -p 8501:8501 \
  -p 6080:6080 \
  -p 8080:8080 \
  ghcr.io/anthropics/anthropic-quickstarts:computer-use-demo-latest

Access points:

- VNC: localhost:5900 (password: "secret")
- noVNC web: http://localhost:6080/vnc.html
- Streamlit UI: http://localhost:8501
- Combined interface (agent chat and desktop view): http://localhost:8080

</usage_example>

Tool Versions

| Version | Beta Header | Model Support | Key Features |
| --- | --- | --- | --- |
| computer_20250124 | computer-use-2025-01-24 | Sonnet, Haiku | Enhanced actions (scroll, drag, wait, etc.) |
| computer_20251124 | computer-use-2025-11-24 | Opus 4.5 only | Zoom action (requires enable_zoom: true) |
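
Because the beta header must match the tool version, it can help to derive one from the other instead of hard-coding both. The sketch below encodes only the two rows of the table above. `BETA_HEADERS` and `beta_header_for` are hypothetical names.

```python
# Mapping taken from the Tool Versions table in this document.
BETA_HEADERS = {
    "computer_20250124": "computer-use-2025-01-24",
    "computer_20251124": "computer-use-2025-11-24",
}


def beta_header_for(tool_type: str) -> str:
    """Return the beta header that pairs with a computer tool version."""
    try:
        return BETA_HEADERS[tool_type]
    except KeyError:
        raise ValueError(f"Unknown computer tool version: {tool_type}") from None
```

A request can then pass `betas=[beta_header_for(COMPUTER_TOOL["type"])]`, so changing the tool version updates the header with it.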

Security Requirements

CRITICAL: Sandboxing is Mandatory

Computer Use provides direct control over a computer environment. NEVER run without proper sandboxing:

Use dedicated containers/VMs - Never on host machines with sensitive data
Minimal privileges - No root access, limited filesystem access
Network isolation - Restrict or block internet access
No credentials - Never expose API keys, passwords, or tokens in the environment
Human oversight - Require confirmation for destructive or external actions
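
The human oversight requirement can be sketched as a gate that runs before each action. The policy here is deliberately simplistic and is an assumption: it flags only key presses that commonly submit or delete. `RISKY_KEYS`, `is_risky`, and `confirm_action` are hypothetical names, and a real deployment would need a policy suited to its applications.

```python
from typing import Callable

# Illustrative policy only: key presses that often submit forms or delete data.
RISKY_KEYS = {"Return", "Delete"}


def is_risky(action_input: dict) -> bool:
    """Flag actions that should not run without a human decision."""
    return action_input.get("action") == "key" and action_input.get("text") in RISKY_KEYS


def confirm_action(action_input: dict, ask: Callable[[str], str] = input) -> bool:
    """Return True if the action may run, asking a human first when it looks risky."""
    if not is_risky(action_input):
        return True
    answer = ask(f"Allow {action_input!r}? [y/N] ")
    # Anything other than an explicit "y" is treated as a refusal.
    return answer.strip().lower() == "y"
```

Passing `ask` as a parameter lets the prompt be replaced by a chat approval, a web form, or a test stub.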

Prompt Injection Risks

Malicious content displayed on screen can attempt to manipulate Claude:

Validate that actions align with the original task
Implement allowlists for permitted applications/websites
Monitor for suspicious instruction patterns in screenshots
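
A website allowlist might look like the following sketch. The host names are placeholders, and `ALLOWED_HOSTS` and `is_allowed_url` are hypothetical. How URLs are obtained is an assumption and depends on the setup, for example from an egress proxy or from text the agent is about to type into an address bar.

```python
from urllib.parse import urlparse

# Placeholder hosts for illustration; replace with the sites the task needs.
ALLOWED_HOSTS = frozenset({"example.com", "intranet.example.org"})


def is_allowed_url(url: str, allowed_hosts: frozenset = ALLOWED_HOSTS) -> bool:
    """Return True only if the URL's host is an allowed host or a subdomain of one."""
    host = (urlparse(url).hostname or "").lower()
    # Compare whole labels so "example.com.evil.test" does not match "example.com".
    return any(host == h or host.endswith("." + h) for h in allowed_hosts)
```

Matching on the parsed host name, not on a substring of the URL, avoids look-alike addresses that merely contain an allowed name.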

Error Handling

| Error | Cause | Resolution |
| --- | --- | --- |
| invalid_request_error | Missing beta header | Add betas: ["computer-use-2025-01-24"] |
| tool_use_error | Invalid coordinates | Ensure coordinates are within display bounds |
| rate_limit_error | Too many requests | Implement exponential backoff |
| Action has no effect | UI not ready | Add a wait action before retrying |
| Wrong element clicked | Coordinate drift | Re-capture screenshot and recalculate |
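
The exponential backoff suggested for rate limits can be sketched as a small retry wrapper. `with_backoff` and its parameters are hypothetical, and the delay schedule (doubling from `base_delay`, plus random jitter) is one common choice rather than a documented requirement.

```python
import random
import time
from typing import Callable


def with_backoff(call: Callable[[], object],
                 is_retryable: Callable[[Exception], bool],
                 max_attempts: int = 5,
                 base_delay: float = 1.0,
                 sleep: Callable[[float], None] = time.sleep):
    """Call `call`, retrying retryable failures with exponentially growing delays."""
    for attempt in range(max_attempts):
        try:
            return call()
        except Exception as exc:
            # Give up on the last attempt or on errors that retrying cannot fix.
            if attempt == max_attempts - 1 or not is_retryable(exc):
                raise
            # Delays of base, 2*base, 4*base, ... plus up to `base` of jitter.
            sleep(base_delay * (2 ** attempt) + random.uniform(0, base_delay))
```

With the Python SDK, usage might look like `with_backoff(lambda: client.beta.messages.create(...), lambda e: isinstance(e, anthropic.RateLimitError))`.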

Integration with Agents

Primary Agents

developer: Automated testing, UI verification
qa: End-to-end testing, visual regression
devops-troubleshooter: System debugging, log inspection

Use Cases

Automated form filling and data entry
Application testing and QA automation
Desktop application interaction
Browser automation (when headless won't work)
Legacy system integration
Visual verification and screenshots

Memory Protocol (MANDATORY)

Before starting:

cat .claude/context/memory/learnings.md

Check for:

Previous computer use configurations
Known automation patterns
Environment-specific settings

After completing:

New pattern discovered -> .claude/context/memory/learnings.md
Security concern found -> .claude/context/memory/issues.md
Architecture decision -> .claude/context/memory/decisions.md

ASSUME INTERRUPTION: If it's not in memory, it didn't happen.

Anthropic Documentation: https://docs.anthropic.com/en/docs/build-with-claude/computer-use
Reference Implementation: https://github.com/anthropics/anthropic-quickstarts/tree/main/computer-use-demo
API Reference: https://docs.anthropic.com/en/api/computer-use

