Computer Use Guide
Modified Version: This feature is a heavily modified version of the Computer Use implementation (internal codename "Chicago") found in the leaked Claude Code source. The official implementation relies on Anthropic's private native modules (@ant/computer-use-swift, @ant/computer-use-input) that are not publicly available. We replaced the entire underlying operation layer with a Python bridge (pyautogui + mss + pyobjc), enabling anyone to run Computer Use on macOS.
Table of Contents
- Overview
- Supported Platforms
- How It Works
- Quick Start
- Usage
- Security
- Environment Variables
- Technical Architecture
- Approaches We Tried
- Known Limitations
- References and Credits
Overview
Computer Use allows AI models to directly control your computer — taking screenshots, moving the mouse, clicking buttons, typing text, and managing application windows.
24 MCP tools are available:
| Category | Tools |
|---|---|
| Screenshot | screenshot, zoom |
| Mouse | left_click, right_click, middle_click, double_click, triple_click, left_click_drag, mouse_move, left_mouse_down, left_mouse_up, cursor_position, scroll |
| Keyboard | type, key, hold_key |
| Apps | open_application, switch_display |
| Permissions | request_access, list_granted_applications |
| Clipboard | read_clipboard, write_clipboard |
| Other | wait, computer_batch |
Supported Platforms
| Platform | Architecture | Status | Notes |
|---|---|---|---|
| macOS | Apple Silicon (M1/M2/M3/M4) | ✅ Fully supported | Recommended |
| macOS | Intel x86_64 | ✅ Fully supported | |
| Windows | Any | ⚠️ Theoretically possible | Core libs (pyautogui + mss) are cross-platform, but pyobjc parts (app management) need to be replaced with win32com. Not yet adapted |
| Linux | Any | ⚠️ Theoretically possible | Same as above — pyobjc needs to be replaced with wmctrl + xdotool. Not yet adapted |
Requirements
- Bun >= 1.1.0
- Python >= 3.8 (venv and dependencies are auto-installed on first use)
- macOS permissions: Accessibility + Screen Recording
How It Works
Computer Use operates through a screenshot → analyze → act feedback loop:
┌────────────────────────────────────────────────────┐
│ AI Model (Claude / any Anthropic-protocol model) │
│ │
│ 1. Receives user request: "open Music app" │
│ 2. Calls screenshot tool → receives screen image │
│ 3. Model analyzes pixels, identifies UI elements │
│ → "search box is at (756, 342)" │
│ 4. Calls left_click { coordinate: [756, 342] } │
│ 5. Calls type { text: "search query" } │
│ 6. Calls screenshot again → verify → next step... │
└───────────────┬────────────────────────────────────┘
│ MCP Tool Call
▼
┌────────────────────────────────────────────────────┐
│ TypeScript Tool Layer (vendor/computer-use-mcp) │
│ - Security checks (app allowlist, TCC permissions) │
│ - Coordinate transformation │
│ - Tool dispatch → executor │
└───────────────┬────────────────────────────────────┘
│ callPythonHelper()
▼
┌────────────────────────────────────────────────────┐
│ Python Bridge (runtime/mac_helper.py) │
│ pyautogui.click(756, 342) ← mouse control │
│ mss.grab(monitor) ← screenshot │
│ NSWorkspace.open(bundleId) ← app management │
└────────────────────────────────────────────────────┘
Key: Coordinate analysis is performed entirely by the model's vision capabilities. It "sees" the screenshot the way a human sees a screen, identifying buttons, text fields, and other UI elements directly from pixels.
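The loop above can be sketched in a few lines of Python. This is an illustration only, not the project's code: `FakeModel`, `run_loop`, and `fake_executor` are hypothetical names, and the model is replaced by a scripted stub so the example runs without a screen or an API key.

```python
from typing import Callable, Dict, List, Optional

ToolCall = Dict[str, object]


class FakeModel:
    """Hypothetical stand-in for the model: replays a fixed list of tool calls."""

    def __init__(self, script: List[ToolCall]) -> None:
        self._script = list(script)

    def next_action(self, screenshot: bytes) -> Optional[ToolCall]:
        # A real model would analyze the screenshot pixels here.
        return self._script.pop(0) if self._script else None


def run_loop(model: FakeModel,
             execute_tool: Callable[[ToolCall], bytes],
             max_steps: int = 10) -> List[ToolCall]:
    """Screenshot -> analyze -> act, until the model has nothing left to do."""
    history: List[ToolCall] = []
    for _ in range(max_steps):
        screenshot = execute_tool({"name": "screenshot"})
        action = model.next_action(screenshot)
        if action is None:
            break
        execute_tool(action)
        history.append(action)
    return history


def fake_executor(call: ToolCall) -> bytes:
    # Assumption for this sketch: only the screenshot tool returns image data.
    return b"png-bytes" if call["name"] == "screenshot" else b""


steps = run_loop(
    FakeModel([
        {"name": "left_click", "coordinate": [756, 342]},
        {"name": "type", "text": "search query"},
    ]),
    fake_executor,
)
print([step["name"] for step in steps])
```

The `max_steps` cap is a design choice of this sketch: it keeps a misbehaving model from looping forever.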
Quick Start
1. Install dependencies
    bun install
2. Ensure Python 3 is available
    python3 --version # >= 3.8 required
Python dependencies are automatically installed into .runtime/venv/ on first Computer Use invocation.
3. Grant macOS permissions
Accessibility:
    open "x-apple.systempreferences:com.apple.preference.security?Privacy_Accessibility"
Add your terminal app (iTerm, Terminal, Ghostty, etc.) to the allow list.
Screen Recording:
    open "x-apple.systempreferences:com.apple.preference.security?Privacy_ScreenCapture"
Add your terminal app as well. You may need to restart your terminal after granting permission.
4. Start
    ./bin/claude-haha
5. Use
Just ask in natural language:
> Take a screenshot of my desktop
> Open Safari and search for something
> Type "hello" in the text editor
Security
| Mechanism | Description |
|---|---|
| App allowlist | Each session requires explicit authorization for which apps Claude can interact with |
| Concurrency lock | Only one Claude session can use Computer Use at a time (file lock) |
| Clipboard guard | Original clipboard content is saved and restored when typing via clipboard |
| Sensitive action gates | System keyboard shortcuts require additional authorization |
Note: Since we replaced the native modules with a Python bridge, the global Escape hotkey abort and auto-hide features from the original implementation are not available. Use Ctrl+C to abort instead.
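The concurrency lock in the table above can be illustrated with a POSIX advisory file lock. This is a minimal sketch under assumptions: the lock path, `computer_use_lock`, and `SessionBusyError` are hypothetical, and the project's actual locking code may differ.

```python
import fcntl
import os
import tempfile
from contextlib import contextmanager
from typing import Iterator


class SessionBusyError(RuntimeError):
    """Raised when another session already holds the lock."""


@contextmanager
def computer_use_lock(path: str) -> Iterator[None]:
    """Hold an exclusive, non-blocking advisory lock on `path` (POSIX only)."""
    fd = os.open(path, os.O_CREAT | os.O_RDWR, 0o600)
    try:
        try:
            fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
        except BlockingIOError as exc:
            raise SessionBusyError(f"lock already held: {path}") from exc
        try:
            yield
        finally:
            fcntl.flock(fd, fcntl.LOCK_UN)
    finally:
        os.close(fd)


# Hypothetical lock location, chosen only for this demo.
lock_path = os.path.join(tempfile.mkdtemp(), "computer-use.lock")

with computer_use_lock(lock_path):
    try:
        # Simulates a second session trying to start while the first is active.
        with computer_use_lock(lock_path):
            second_session = "acquired"
    except SessionBusyError:
        second_session = "rejected"

print(second_session)
```

Using `LOCK_NB` makes the second session fail fast instead of waiting, which matches the "only one session at a time" behavior described above.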
Environment Variables
| Variable | Default | Description |
|---|---|---|
| CLAUDE_COMPUTER_USE_ENABLED | 1 | Set to 0 to disable Computer Use |
| CLAUDE_COMPUTER_USE_COORDINATE_MODE | pixels | Coordinate mode: pixels or normalized_0_100 |
| CLAUDE_COMPUTER_USE_CLIPBOARD_PASTE | 1 | Enable clipboard-based text input |
| CLAUDE_COMPUTER_USE_MOUSE_ANIMATION | 1 | Enable mouse animation |
| CLAUDE_COMPUTER_USE_DEBUG | 0 | Debug mode |
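As an illustration of how these variables and their defaults fit together, here is a hedged Python sketch. The variable names and defaults come from the table; `ComputerUseConfig`, `load_config`, and `to_pixels` are hypothetical. Two behaviors are assumptions rather than documented facts: that any value other than `0` enables a flag, and that `normalized_0_100` maps the range 0 to 100 onto the screen width and height.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Tuple


@dataclass(frozen=True)
class ComputerUseConfig:
    enabled: bool
    coordinate_mode: str
    clipboard_paste: bool
    mouse_animation: bool
    debug: bool


def _flag(env: Mapping[str, str], name: str, default: str) -> bool:
    # Assumption: "0" means off, anything else means on.
    return env.get(name, default) != "0"


def load_config(env: Mapping[str, str] = os.environ) -> ComputerUseConfig:
    """Read the documented variables, falling back to the documented defaults."""
    mode = env.get("CLAUDE_COMPUTER_USE_COORDINATE_MODE", "pixels")
    if mode not in ("pixels", "normalized_0_100"):
        raise ValueError(f"unsupported coordinate mode: {mode}")
    return ComputerUseConfig(
        enabled=_flag(env, "CLAUDE_COMPUTER_USE_ENABLED", "1"),
        coordinate_mode=mode,
        clipboard_paste=_flag(env, "CLAUDE_COMPUTER_USE_CLIPBOARD_PASTE", "1"),
        mouse_animation=_flag(env, "CLAUDE_COMPUTER_USE_MOUSE_ANIMATION", "1"),
        debug=_flag(env, "CLAUDE_COMPUTER_USE_DEBUG", "0"),
    )


def to_pixels(x: float, y: float, mode: str,
              screen: Tuple[int, int]) -> Tuple[int, int]:
    """Convert a model coordinate to screen pixels (assumed semantics)."""
    if mode == "normalized_0_100":
        width, height = screen
        return round(x / 100 * width), round(y / 100 * height)
    return round(x), round(y)


config = load_config({"CLAUDE_COMPUTER_USE_COORDINATE_MODE": "normalized_0_100"})
print(config)
print(to_pixels(50, 50, config.coordinate_mode, (1920, 1080)))
```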
Technical Architecture
Gate Bypass
The official Claude Code gates Computer Use behind four layers:
| Layer | Original Mechanism | Our Approach |
|---|---|---|
| Compile-time | feature('CHICAGO_MCP') (Bun macro) | Replaced with true |
| Subscription | hasRequiredSubscription() (Max/Pro only) | getChicagoEnabled() returns true directly |
| Remote config | GrowthBook tengu_malort_pedway | Same — no remote dependency |
| Default-disabled | isDefaultDisabledBuiltin('computer-use') | Returns false |
Python Bridge
On first invocation, the bridge automatically:
- Creates a Python virtual environment (.runtime/venv/)
- Installs pip
- Installs dependencies (mss, Pillow, pyautogui, pyobjc-*)
- Validates via SHA256 hash (only reinstalls when requirements.txt changes)
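The hash check in the last step can be sketched as follows. This is one plausible way to do it, not the bridge's actual code: the stamp file name and the helper functions are hypothetical, and the real bootstrap would create the venv and run pip between the check and the stamp write.

```python
import hashlib
import tempfile
from pathlib import Path


def requirements_digest(requirements: Path) -> str:
    """SHA256 of the requirements file contents."""
    return hashlib.sha256(requirements.read_bytes()).hexdigest()


def needs_install(requirements: Path, stamp: Path) -> bool:
    """True when no stamp exists or the recorded hash differs from the current one."""
    if not stamp.exists():
        return True
    return stamp.read_text().strip() != requirements_digest(requirements)


def mark_installed(requirements: Path, stamp: Path) -> None:
    """Record the hash of the requirements that were just installed."""
    stamp.write_text(requirements_digest(requirements))


workdir = Path(tempfile.mkdtemp())
req = workdir / "requirements.txt"
stamp = workdir / ".requirements.sha256"  # hypothetical stamp file name

req.write_text("mss\nPillow\npyautogui\n")
first = needs_install(req, stamp)    # no stamp yet
mark_installed(req, stamp)           # a real bootstrap would run pip before this
second = needs_install(req, stamp)   # file unchanged since the stamp was written
req.write_text("mss\nPillow\npyautogui\npyobjc-core\n")
third = needs_install(req, stamp)    # file changed after the stamp was written
print(first, second, third)
```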
Approaches We Tried
Approach 1: Extract native .node modules from Claude Code binary ❌
Extracted computer-use-swift.node and computer-use-input.node from the installed Claude Code Mach-O binary. Synchronous methods worked, but async Swift methods (screenshot) hung due to N-API async incompatibility between Bun versions.
Approach 2: Create empty stub packages ❌
Stub packages allowed compilation but provided no actual functionality.
Approach 3: Python Bridge ✅ (current)
Replaced all native module calls with Python subprocess calls via callPythonHelper(). Zero binary dependencies, auto-bootstrapping, full functionality on any macOS.
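To make the subprocess pattern concrete, here is a hedged sketch written entirely in Python. The real `callPythonHelper()` lives in the TypeScript layer, and the actual wire format of `mac_helper.py` is not documented here. The JSON-over-stdin protocol, `call_python_helper`, and the stub helper below are assumptions for illustration.

```python
import json
import subprocess
import sys
import tempfile
from pathlib import Path

# Hypothetical helper: reads one JSON command from stdin, writes one JSON reply.
HELPER_SOURCE = '''
import json, sys

def handle(command):
    # A real helper would call pyautogui / mss here; this stub returns fixed data.
    if command["action"] == "cursor_position":
        return {"ok": True, "x": 0, "y": 0}
    return {"ok": False, "error": "unknown action: " + command["action"]}

print(json.dumps(handle(json.load(sys.stdin))))
'''


def call_python_helper(helper: Path, command: dict, timeout: float = 10.0) -> dict:
    """Spawn the helper once per call and exchange JSON over stdin/stdout."""
    proc = subprocess.run(
        [sys.executable, str(helper)],
        input=json.dumps(command),
        capture_output=True,
        text=True,
        timeout=timeout,
        check=True,
    )
    return json.loads(proc.stdout)


helper_path = Path(tempfile.mkdtemp()) / "helper_stub.py"
helper_path.write_text(HELPER_SOURCE)

reply = call_python_helper(helper_path, {"action": "cursor_position"})
print(reply)
```

Spawning one process per call keeps the bridge stateless and easy to bootstrap, at the cost of interpreter startup time on every tool call.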
Known Limitations
| Limitation | Description |
|---|---|
| macOS only | Windows/Linux need pyobjc replacements |
| No global Escape abort | Original used CGEventTap; use Ctrl+C instead |
| No auto-hide windows | Original's prepareDisplay relied on Swift |
| Slightly higher latency | ~100ms Python process startup overhead per call |
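The per-call overhead depends on the machine, so it is worth measuring rather than assuming. The sketch below times a bare interpreter launch; `interpreter_startup_ms` is a hypothetical helper, and it measures only process startup, not the import cost of pyautogui, mss, or pyobjc.

```python
import statistics
import subprocess
import sys
import time


def interpreter_startup_ms(runs: int = 5) -> float:
    """Median wall-clock time to start and exit a bare Python interpreter."""
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        subprocess.run([sys.executable, "-c", "pass"], check=True)
        samples.append((time.perf_counter() - start) * 1000)
    return statistics.median(samples)


print(f"median startup: {interpreter_startup_ms():.1f} ms")
```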
References and Credits
| Project | License | Contribution |
|---|---|---|
| wimi321/macos-computer-use-skill | MIT | Python bridge architecture, mac_helper.py runtime, executor adaptation |
| domdomegg/computer-use-mcp | MIT | Independent Computer Use MCP server (nut.js based), used as reference |
| paoloanzn/free-code | - | Feature flag system analysis |
| oboard/claude-code-rev | - | Early leaked source restoration, stub package reference |
Underlying Libraries
| Library | Purpose |
|---|---|
| pyautogui | Mouse and keyboard control |
| mss | Screenshot capture |
| Pillow | Image processing and compression |
| pyobjc | macOS Cocoa/Quartz framework bindings |