Skip to content

Computer Use Guide

Modified Version: This feature is a heavily modified version of the Computer Use (internal codename "Chicago") found in the leaked Claude Code source. The official implementation relies on Anthropic's private native modules (@ant/computer-use-swift, @ant/computer-use-input) that are not publicly available. We replaced the entire underlying operation layer with a Python bridge (pyautogui + mss + pyobjc), enabling anyone to run Computer Use on macOS.


Table of Contents


Overview

Computer Use allows AI models to directly control your computer — taking screenshots, moving the mouse, clicking buttons, typing text, and managing application windows.

24 MCP tools are available:

CategoryTools
Screenshotscreenshot, zoom
Mouseleft_click, right_click, middle_click, double_click, triple_click, left_click_drag, mouse_move, left_mouse_down, left_mouse_up, cursor_position, scroll
Keyboardtype, key, hold_key
Appsopen_application, switch_display
Permissionsrequest_access, list_granted_applications
Clipboardread_clipboard, write_clipboard
Otherwait, computer_batch

Supported Platforms

PlatformArchitectureStatusNotes
macOSApple Silicon (M1/M2/M3/M4)✅ Fully supportedRecommended
macOSIntel x86_64✅ Fully supported
WindowsAny⚠️ Theoretically possibleCore libs (pyautogui + mss) are cross-platform, but pyobjc parts (app management) need to be replaced with win32com. Not yet adapted
LinuxAny⚠️ Theoretically possibleSame as above — pyobjc needs to be replaced with wmctrl + xdotool. Not yet adapted

Requirements

  • Bun >= 1.1.0
  • Python >= 3.8 (venv and dependencies are auto-installed on first use)
  • macOS permissions: Accessibility + Screen Recording

How It Works

Computer Use operates through a screenshot → analyze → act feedback loop:

┌────────────────────────────────────────────────────┐
│  AI Model (Claude / any Anthropic-protocol model)   │
│                                                     │
│  1. Receives user request: "open Music app"         │
│  2. Calls screenshot tool → receives screen image   │
│  3. Model analyzes pixels, identifies UI elements   │
│     → "search box is at (756, 342)"                 │
│  4. Calls left_click { coordinate: [756, 342] }     │
│  5. Calls type { text: "search query" }             │
│  6. Calls screenshot again → verify → next step...  │
└───────────────┬────────────────────────────────────┘
                │ MCP Tool Call

┌────────────────────────────────────────────────────┐
│  TypeScript Tool Layer (vendor/computer-use-mcp)    │
│  - Security checks (app allowlist, TCC permissions) │
│  - Coordinate transformation                        │
│  - Tool dispatch → executor                         │
└───────────────┬────────────────────────────────────┘
                │ callPythonHelper()

┌────────────────────────────────────────────────────┐
│  Python Bridge (runtime/mac_helper.py)              │
│  pyautogui.click(756, 342)   ← mouse control        │
│  mss.grab(monitor)           ← screenshot            │
│  NSWorkspace.open(bundleId)  ← app management        │
└────────────────────────────────────────────────────┘

Key: Coordinate analysis is performed entirely by the model's vision capabilities — it "sees" the screenshot like a human sees a screen, identifying buttons, text fields, and other UI elements directly from pixels.


Quick Start

1. Install dependencies

bash
bun install

2. Ensure Python 3 is available

bash
python3 --version  # >= 3.8 required

Python dependencies are automatically installed into .runtime/venv/ on first Computer Use invocation.

3. Grant macOS permissions

Accessibility:

bash
open "x-apple.systempreferences:com.apple.preference.security?Privacy_Accessibility"

Add your terminal app (iTerm, Terminal, Ghostty, etc.) to the allow list.

Screen Recording:

bash
open "x-apple.systempreferences:com.apple.preference.security?Privacy_ScreenCapture"

Add your terminal app as well. You may need to restart your terminal after granting permission.

4. Start

bash
./bin/claude-haha

5. Use

Just ask in natural language:

> Take a screenshot of my desktop
> Open Safari and search for something
> Type "hello" in the text editor

Security

MechanismDescription
App allowlistEach session requires explicit authorization for which apps Claude can interact with
Concurrency lockOnly one Claude session can use Computer Use at a time (file lock)
Clipboard guardOriginal clipboard content is saved and restored when typing via clipboard
Sensitive action gatesSystem keyboard shortcuts require additional authorization

Note: Since we replaced the native modules with Python bridge, the global Escape hotkey abort and auto-hide features from the original implementation are not available. Use Ctrl+C to abort instead.


Environment Variables

VariableDefaultDescription
CLAUDE_COMPUTER_USE_ENABLED1Set to 0 to disable Computer Use
CLAUDE_COMPUTER_USE_COORDINATE_MODEpixelsCoordinate mode: pixels or normalized_0_100
CLAUDE_COMPUTER_USE_CLIPBOARD_PASTE1Enable clipboard-based text input
CLAUDE_COMPUTER_USE_MOUSE_ANIMATION1Enable mouse animation
CLAUDE_COMPUTER_USE_DEBUG0Debug mode

Technical Architecture

Gate Bypass

The official Claude Code gates Computer Use behind three layers:

LayerOriginal MechanismOur Approach
Compile-timefeature('CHICAGO_MCP') (Bun macro)Replaced with true
SubscriptionhasRequiredSubscription() (Max/Pro only)getChicagoEnabled() returns true directly
Remote configGrowthBook tengu_malort_pedwaySame — no remote dependency
Default-disabledisDefaultDisabledBuiltin('computer-use')Returns false

Python Bridge

On first invocation, the bridge automatically:

  1. Creates a Python virtual environment (.runtime/venv/)
  2. Installs pip
  3. Installs dependencies (mss, Pillow, pyautogui, pyobjc-*)
  4. Validates via SHA256 hash (only reinstalls when requirements.txt changes)

Approaches We Tried

Approach 1: Extract native .node modules from Claude Code binary ❌

Extracted computer-use-swift.node and computer-use-input.node from the installed Claude Code Mach-O binary. Synchronous methods worked, but async Swift methods (screenshot) hung due to N-API async incompatibility between Bun versions.

Approach 2: Create empty stub packages ❌

Stub packages allowed compilation but provided no actual functionality.

Approach 3: Python Bridge ✅ (current)

Replaced all native module calls with Python subprocess calls via callPythonHelper(). Zero binary dependencies, auto-bootstrapping, full functionality on any macOS.


Known Limitations

LimitationDescription
macOS onlyWindows/Linux need pyobjc replacements
No global Escape abortOriginal used CGEventTap; use Ctrl+C instead
No auto-hide windowsOriginal's prepareDisplay relied on Swift
Slightly higher latency~100ms Python process startup overhead per call

References and Credits

ProjectLicenseContribution
wimi321/macos-computer-use-skillMITPython bridge architecture, mac_helper.py runtime, executor adaptation
domdomegg/computer-use-mcpMITIndependent Computer Use MCP server (nut.js based), used as reference
paoloanzn/free-code-Feature flag system analysis
oboard/claude-code-rev-Early leaked source restoration, stub package reference

Underlying Libraries

LibraryPurpose
pyautoguiMouse and keyboard control
mssScreenshot capture
PillowImage processing and compression
pyobjcmacOS Cocoa/Quartz framework bindings

Released under the MIT License.