
Computer Use Guide

Modified Version: This feature is a heavily modified version of the Computer Use (internal codename "Chicago") found in the leaked Claude Code source. The official implementation relies on Anthropic's private native modules (@ant/computer-use-swift, @ant/computer-use-input) that are not publicly available. We replaced the entire underlying operation layer with a Python bridge: macOS uses pyautogui + mss + pyobjc, and Windows uses pyautogui + mss + win32gui + psutil.




Overview

Computer Use allows AI models to directly control your computer — taking screenshots, moving the mouse, clicking buttons, typing text, and managing application windows.

24 MCP tools are available:

| Category | Tools |
| --- | --- |
| Screenshot | screenshot, zoom |
| Mouse | left_click, right_click, middle_click, double_click, triple_click, left_click_drag, mouse_move, left_mouse_down, left_mouse_up, cursor_position, scroll |
| Keyboard | type, key, hold_key |
| Apps | open_application, switch_display |
| Permissions | request_access, list_granted_applications |
| Clipboard | read_clipboard, write_clipboard |
| Other | wait, computer_batch |

Supported Platforms

| Platform | Architecture | Status | Notes |
| --- | --- | --- | --- |
| macOS | Apple Silicon (M1/M2/M3/M4) | ✅ Fully supported | Recommended |
| macOS | Intel x86_64 | ✅ Fully supported | |
| Windows | x64 | ✅ Fully supported | Uses win32gui + psutil + pyperclip + screeninfo instead of macOS APIs |
| Linux | Any | ⚠️ Theoretically possible | Not yet adapted. As on Windows, pyobjc would need to be replaced, here with wmctrl + xdotool |

Requirements

  • Bun >= 1.1.0
  • Python >= 3.8 (venv and dependencies are auto-installed on first use)
  • macOS permissions: Accessibility + Screen Recording
  • Windows: no extra OS permission setup

How It Works

Computer Use operates through a screenshot → analyze → act feedback loop:

┌────────────────────────────────────────────────────┐
│  AI Model (Claude / any Anthropic-protocol model)   │
│                                                     │
│  1. Receives user request: "open Music app"         │
│  2. Calls screenshot tool → receives screen image   │
│  3. Model analyzes pixels, identifies UI elements   │
│     → "search box is at (756, 342)"                 │
│  4. Calls left_click { coordinate: [756, 342] }     │
│  5. Calls type { text: "search query" }             │
│  6. Calls screenshot again → verify → next step...  │
└───────────────┬────────────────────────────────────┘
                │ MCP Tool Call

┌────────────────────────────────────────────────────┐
│  TypeScript Tool Layer (vendor/computer-use-mcp)    │
│  - Security checks (app allowlist, TCC permissions) │
│  - Coordinate transformation                        │
│  - Tool dispatch → executor                         │
└───────────────┬────────────────────────────────────┘
                │ callPythonHelper()

┌────────────────────────────────────────────────────┐
│  Python Bridge                                      │
│  macOS: runtime/mac_helper.py                       │
│  Windows: runtime/win_helper.py                     │
│  pyautogui.click(756, 342)   ← mouse control        │
│  mss.grab(monitor)           ← screenshot            │
│  NSWorkspace / win32gui      ← app management        │
└────────────────────────────────────────────────────┘

Key: Coordinate analysis is performed entirely by the model's vision capabilities — it "sees" the screenshot like a human sees a screen, identifying buttons, text fields, and other UI elements directly from pixels.
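
The loop in the diagram can be sketched in a few lines of Python. This is only an illustration of the control flow, not the project's code: `ToolCall`, `run_loop`, `scripted_model`, and `fake_executor` are hypothetical names, the model is replaced by a scripted stand-in, and the executor returns fake bytes instead of calling the real tool layer.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class ToolCall:
    name: str                                  # e.g. "screenshot", "left_click", "type"
    args: Dict[str, object] = field(default_factory=dict)


def run_loop(
    next_action: Callable[[Optional[bytes]], Optional[ToolCall]],
    execute: Callable[[ToolCall], Optional[bytes]],
    max_steps: int = 20,
) -> List[ToolCall]:
    """Run screenshot -> analyze -> act until the model stops or max_steps is hit.

    next_action stands in for the model: it receives the latest screenshot
    (or None) and returns the next tool call, or None when it is done.
    execute stands in for the tool layer plus Python bridge: it returns image
    bytes for "screenshot" calls and None for everything else.
    """
    history: List[ToolCall] = []
    screenshot: Optional[bytes] = None
    for _ in range(max_steps):
        call = next_action(screenshot)
        if call is None:                       # model considers the task finished
            break
        result = execute(call)
        if call.name == "screenshot":
            screenshot = result                # feed the new image back to the model
        history.append(call)
    return history


# Scripted stand-in for the model, mirroring steps 2-6 of the diagram.
_script = iter([
    ToolCall("screenshot"),
    ToolCall("left_click", {"coordinate": [756, 342]}),
    ToolCall("type", {"text": "search query"}),
    ToolCall("screenshot"),
])


def scripted_model(_screenshot: Optional[bytes]) -> Optional[ToolCall]:
    return next(_script, None)


def fake_executor(call: ToolCall) -> Optional[bytes]:
    return b"fake-png" if call.name == "screenshot" else None


steps = run_loop(scripted_model, fake_executor)
print([c.name for c in steps])
```

The `max_steps` cap is a design choice of this sketch: a vision-driven loop has no natural termination guarantee, so some bound on iterations is a sensible safeguard.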


Quick Start

1. Install dependencies

```bash
bun install
```

2. Ensure Python 3 is available

```bash
python3 --version  # >= 3.8 required
```

Python dependencies are automatically installed into .runtime/venv/ on first Computer Use invocation.

3. Grant macOS permissions

Accessibility:

```bash
open "x-apple.systempreferences:com.apple.preference.security?Privacy_Accessibility"
```

Add your terminal app (iTerm, Terminal, Ghostty, etc.) to the allow list.

Screen Recording:

```bash
open "x-apple.systempreferences:com.apple.preference.security?Privacy_ScreenCapture"
```

Add your terminal app as well. You may need to restart your terminal after granting permission.

4. Start

```bash
./bin/claude-haha
```

5. Use

Just ask in natural language:

> Take a screenshot of my desktop
> Open Safari and search for something
> Type "hello" in the text editor

Disable Computer Use

If you only want the regular Coding Agent and do not want to expose computer-use MCP tools, disable it with either command:

```bash
claude-haha --no-computer-use
CLAUDE_COMPUTER_USE_ENABLED=0 claude-haha
```

You can also write the global config file at ~/.claude/cc-haha/computer-use-config.json:

```json
{
  "enabled": false
}
```

The desktop Settings > Computer Use switch writes the same config. Once disabled, new sessions will not inject the dynamic computer-use MCP server or add its desktop-control tools to allowedTools.
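
How the three switches combine is not spelled out above. The sketch below assumes the simplest reading, that any one of them is enough to disable the feature and that a missing or unreadable config file leaves the default (enabled). The function name `computer_use_enabled` is hypothetical, and the real resolution logic lives in the TypeScript layer, not in Python.

```python
import json
from pathlib import Path
from typing import Mapping, Sequence


def computer_use_enabled(
    argv: Sequence[str],
    env: Mapping[str, str],
    config_path: Path,
) -> bool:
    """Return False if the CLI flag, the env var, or the config file disables it.

    Assumption: the three sources are OR-ed together for "disabled".
    """
    if "--no-computer-use" in argv:
        return False
    if env.get("CLAUDE_COMPUTER_USE_ENABLED", "1").strip() == "0":
        return False
    try:
        config = json.loads(config_path.read_text(encoding="utf-8"))
    except (OSError, ValueError):
        return True                      # no usable config file: keep the default
    if isinstance(config, dict) and config.get("enabled") is False:
        return False
    return True
```

A caller would typically pass `sys.argv[1:]`, `os.environ`, and `Path.home() / ".claude/cc-haha/computer-use-config.json"`.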


Security

| Mechanism | Description |
| --- | --- |
| App allowlist | Each session requires explicit authorization for which apps Claude can interact with |
| Concurrency lock | Only one Claude session can use Computer Use at a time (file lock) |
| Clipboard guard | Original clipboard content is saved and restored when typing via clipboard |
| Sensitive action gates | System keyboard shortcuts require additional authorization |

Note: Because we replaced the native modules with a Python bridge, the global Escape hotkey abort and auto-hide features from the original implementation are not available. Use Ctrl+C to abort instead.
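
The concurrency lock listed above is described only as a file lock. One common way to build such a lock is an exclusive-create lock file, sketched below. The names `session_lock` and `LockHeldError` are hypothetical, and the project may use a different locking scheme (for example one that detects stale locks left by a crashed session, which this sketch does not).

```python
import os
from contextlib import contextmanager
from pathlib import Path
from typing import Iterator


class LockHeldError(RuntimeError):
    """Raised when another session already holds the lock file."""


@contextmanager
def session_lock(lock_path: Path) -> Iterator[None]:
    # O_CREAT | O_EXCL makes creation atomic: it fails if the file already exists.
    try:
        fd = os.open(str(lock_path), os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    except FileExistsError:
        raise LockHeldError(f"lock already held: {lock_path}") from None
    try:
        os.write(fd, str(os.getpid()).encode("ascii"))   # record the owner for debugging
        yield
    finally:
        os.close(fd)            # close before unlinking so this also works on Windows
        os.unlink(str(lock_path))
```

Usage is `with session_lock(path): ...`; a second session that tries to enter while the first is active gets `LockHeldError` instead of blocking.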


Environment Variables

| Variable | Default | Description |
| --- | --- | --- |
| CLAUDE_COMPUTER_USE_ENABLED | 1 | Set to 0 to disable Computer Use |
| CLAUDE_COMPUTER_USE_COORDINATE_MODE | pixels | Coordinate mode: pixels or normalized_0_100 |
| CLAUDE_COMPUTER_USE_CLIPBOARD_PASTE | 1 | Enable clipboard-based text input |
| CLAUDE_COMPUTER_USE_MOUSE_ANIMATION | 1 | Enable mouse animation |
| CLAUDE_COMPUTER_USE_DEBUG | 0 | Debug mode |
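
To make the two coordinate modes concrete, the sketch below converts a model-supplied coordinate into screen pixels. It assumes that normalized_0_100 means a percentage of the screen width and height, which the table does not state explicitly; `coordinate_mode` and `to_pixels` are hypothetical helper names, not functions from the project.

```python
from typing import Mapping, Tuple

MODES = ("pixels", "normalized_0_100")


def coordinate_mode(env: Mapping[str, str]) -> str:
    """Read the mode from the environment, falling back to the documented default."""
    mode = env.get("CLAUDE_COMPUTER_USE_COORDINATE_MODE", "pixels")
    if mode not in MODES:
        raise ValueError(f"unsupported coordinate mode: {mode!r}")
    return mode


def to_pixels(
    coordinate: Tuple[float, float],
    screen_size: Tuple[int, int],
    mode: str,
) -> Tuple[int, int]:
    """Map a coordinate to integer pixels for the given screen size."""
    x, y = coordinate
    width, height = screen_size
    if mode == "pixels":
        return int(round(x)), int(round(y))
    if mode == "normalized_0_100":
        # Assumed meaning: 0..100 is a percentage of each screen dimension.
        # Clamp so that 100 maps to the last valid pixel, not one past it.
        px = min(width - 1, max(0, int(round(x * width / 100))))
        py = min(height - 1, max(0, int(round(y * height / 100))))
        return px, py
    raise ValueError(f"unsupported coordinate mode: {mode!r}")


print(to_pixels((50, 50), (1920, 1080), "normalized_0_100"))  # (960, 540)
```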

Technical Architecture

Gate Bypass

The official Claude Code gates Computer Use behind four layers:

| Layer | Original Mechanism | Our Approach |
| --- | --- | --- |
| Compile-time | feature('CHICAGO_MCP') (Bun macro) | Replaced with true |
| Subscription | hasRequiredSubscription() (Max/Pro only) | getChicagoEnabled() returns true directly |
| Remote config | GrowthBook tengu_malort_pedway | Same as above; no remote dependency |
| Default-disabled | isDefaultDisabledBuiltin('computer-use') | Returns false |

Python Bridge

On first invocation, the bridge automatically:

  1. Creates a Python virtual environment (.runtime/venv/)
  2. Installs pip
  3. Installs dependencies (mss, Pillow, pyautogui, pyobjc-*)
  4. Validates via SHA256 hash (only reinstalls when requirements.txt changes)
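
The four bootstrap steps might look roughly like the sketch below. It is an assumption-laden illustration, not the project's bootstrap code: the stamp file name `.requirements.sha256` and the function names are hypothetical. The standard-library calls (`hashlib.sha256`, `venv.EnvBuilder(with_pip=True)`, `python -m pip install -r`) are real.

```python
import hashlib
import os
import subprocess
import venv
from pathlib import Path


def requirements_digest(requirements: Path) -> str:
    """SHA256 of the requirements file contents, as a hex string."""
    return hashlib.sha256(requirements.read_bytes()).hexdigest()


def needs_install(requirements: Path, stamp: Path) -> bool:
    """True when no stamp exists or the recorded hash no longer matches."""
    if not stamp.exists():
        return True
    return stamp.read_text(encoding="utf-8").strip() != requirements_digest(requirements)


def mark_installed(requirements: Path, stamp: Path) -> None:
    """Record the hash of the requirements file that was just installed."""
    stamp.parent.mkdir(parents=True, exist_ok=True)
    stamp.write_text(requirements_digest(requirements), encoding="utf-8")


def bootstrap(venv_dir: Path, requirements: Path) -> bool:
    """Create the venv and install dependencies only when requirements changed.

    Returns True if an install ran, False if the existing venv was reused.
    """
    stamp = venv_dir / ".requirements.sha256"          # hypothetical stamp file name
    if not needs_install(requirements, stamp):
        return False
    venv.EnvBuilder(with_pip=True).create(str(venv_dir))    # steps 1 and 2
    python = venv_dir / ("Scripts/python.exe" if os.name == "nt" else "bin/python")
    subprocess.run(                                          # step 3
        [str(python), "-m", "pip", "install", "-r", str(requirements)],
        check=True,
    )
    mark_installed(requirements, stamp)                      # step 4
    return True
```

Writing the stamp only after pip succeeds means a failed install is retried on the next invocation rather than being silently treated as complete.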

Approaches We Tried

Approach 1: Extract native .node modules from Claude Code binary ❌

Extracted computer-use-swift.node and computer-use-input.node from the installed Claude Code Mach-O binary. Synchronous methods worked, but async Swift methods (screenshot) hung due to N-API async incompatibility between Bun versions.

Approach 2: Create empty stub packages ❌

Stub packages allowed compilation but provided no actual functionality.

Approach 3: Python Bridge ✅ (current)

Replaced all native module calls with Python subprocess calls via callPythonHelper(). Zero binary dependencies, auto-bootstrapping, full functionality on any macOS.
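
The subprocess round trip behind callPythonHelper() can be illustrated in Python. The real caller is TypeScript, and the helper's actual wire format is not documented here, so everything in this sketch is an assumption: the JSON request and response shape (`command`, `args`, `ok`, `result`, `error`), the name `call_helper`, and the inline echo helper that stands in for runtime/mac_helper.py.

```python
import json
import subprocess
import sys
from typing import Any, Dict

# Stand-in helper script: reads one JSON request on stdin, writes one JSON reply.
HELPER_SOURCE = r'''
import json, sys
request = json.load(sys.stdin)
if request.get("command") == "echo":
    response = {"ok": True, "result": request.get("args", {})}
else:
    response = {"ok": False, "error": "unknown command: %s" % request.get("command")}
json.dump(response, sys.stdout)
'''


def call_helper(command: str, args: Dict[str, Any], timeout: float = 10.0) -> Dict[str, Any]:
    """Spawn a fresh Python process per call and exchange JSON over stdin/stdout."""
    completed = subprocess.run(
        [sys.executable, "-c", HELPER_SOURCE],
        input=json.dumps({"command": command, "args": args}),
        capture_output=True,
        text=True,
        timeout=timeout,
        check=True,          # a non-zero exit raises CalledProcessError
    )
    return json.loads(completed.stdout)


print(call_helper("echo", {"coordinate": [756, 342]}))
```

Spawning one process per call keeps the bridge stateless and easy to bootstrap, at the cost of paying interpreter startup on every tool call.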


Known Limitations

| Limitation | Description |
| --- | --- |
| Linux not adapted | Linux needs wmctrl + xdotool style platform integration |
| No global Escape abort | Original used CGEventTap; use Ctrl+C instead |
| No auto-hide windows | Original's prepareDisplay relied on Swift |
| Slightly higher latency | ~100ms Python process startup overhead per call |
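
The startup overhead in the last row can be measured on a given machine with a short script. The sketch below times a bare interpreter launch only; it does not import pyautogui or mss, so it gives a lower bound rather than the bridge's real per-call cost. `measure_startup_ms` is a hypothetical name.

```python
import statistics
import subprocess
import sys
import time


def measure_startup_ms(runs: int = 5) -> float:
    """Median wall-clock time, in milliseconds, to start and exit a Python process."""
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        subprocess.run([sys.executable, "-c", "pass"], check=True)
        samples.append(time.perf_counter() - start)
    return statistics.median(samples) * 1000.0


print(f"median startup: {measure_startup_ms():.1f} ms")
```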

References and Credits

| Project | License | Contribution |
| --- | --- | --- |
| wimi321/macos-computer-use-skill | MIT | Python bridge architecture, mac_helper.py runtime, executor adaptation |
| domdomegg/computer-use-mcp | MIT | Independent Computer Use MCP server (nut.js based), used as reference |
| paoloanzn/free-code | - | Feature flag system analysis |
| oboard/claude-code-rev | - | Early leaked source restoration, stub package reference |

Underlying Libraries

| Library | Purpose |
| --- | --- |
| pyautogui | Mouse and keyboard control |
| mss | Screenshot capture |
| Pillow | Image processing and compression |
| pyobjc | macOS Cocoa/Quartz framework bindings |

Released under the MIT License.