Midscene Browser Automation
Automate browser interactions using Midscene with Claude. This skill provides natural language control over a Chrome browser through command-line tools for navigation, interaction, data extraction, and screenshots.
Overview
This skill uses a CLI-based approach where Claude calls browser automation commands via bash. The browser stays open between commands for faster sequential operations and preserves browser state (cookies, sessions, etc.).
Key Features:
-
🧠 Natural language understanding of page elements
-
🎯 Intelligent element identification without CSS selectors
-
👁️ Visual and semantic understanding of web pages
-
🤖 AI-powered interactions and data extraction
Setup Verification
IMPORTANT: Before using any browser commands, you MUST check setup.json in this directory.
First-Time Setup Check
-
Read setup.json (located in skills/midscene-automation/setup.json )
-
Check setupComplete field:
-
If true : All prerequisites are met, proceed with browser commands
-
If false : Setup required - follow the steps below
If Setup is Required (setupComplete: false )
Set API key (required):
export MIDSCENE_MODEL_API_KEY="your-api-key"
Set model (optional):
export MIDSCENE_MODEL_NAME="gpt-4o" export MIDSCENE_MODEL_BASE_URL="https://api.openai.com/v1"
Ensure Google Chrome is installed on your system.
See Model Configuration for more options.
Prerequisites Summary
-
✅ Google Chrome installed on your system
-
✅ AI Model API key configured via environment variable or .env file
DO NOT attempt to use browser commands if setupComplete: false in setup.json. Guide the user through setup first.
Available Commands
Navigate to URLs
node dist/src/cli.js navigate <url>
When to use: Opening any website, loading a specific URL, going to a web page.
Example usage:
-
node dist/src/cli.js navigate https://example.com
-
node dist/src/cli.js navigate https://news.ycombinator.com
Output: JSON with success status, message, and screenshot path
Interact with Pages
node dist/src/cli.js act "<action>"
When to use: Clicking buttons, filling forms, scrolling, selecting options, typing text.
Example usage:
-
node dist/src/cli.js act "click the Sign In button"
-
node dist/src/cli.js act "type 'test@example.com' into the email field"
-
node dist/src/cli.js act "scroll down to the footer"
-
node dist/src/cli.js act "select 'United States' from the country dropdown"
Important:
-
Be as specific as possible - details make a world of difference
-
Midscene uses AI to understand page structure and locate elements semantically
-
No CSS selectors needed - just describe what you want to interact with
Output: JSON with success status, message, and screenshot path
Extract Data
node dist/src/cli.js query "<query>"
When to use: Scraping data, getting specific information, extracting structured content from the page.
Example usage:
-
node dist/src/cli.js query "What is the page title?"
-
node dist/src/cli.js query "List all product names and prices"
-
node dist/src/cli.js query "Extract all article headlines in JSON format"
-
node dist/src/cli.js query "What is the price of the first item?"
Output: JSON with success status, extracted data (result field), and screenshot path
Verify Conditions
node dist/src/cli.js assert "<condition>"
When to use: Verifying page state, checking if elements exist, validating content.
Example usage:
-
node dist/src/cli.js assert "the login button is visible"
-
node dist/src/cli.js assert "the user is logged in"
-
node dist/src/cli.js assert "there are more than 5 items in the list"
-
node dist/src/cli.js assert "the page contains 'Welcome' text"
Output: JSON with success status (true if assertion passes) and message
Take Screenshots
node dist/src/cli.js screenshot
When to use: Visual verification, documenting page state, debugging, creating records.
Notes:
-
Screenshots are saved to the plugin directory's agent/browser_screenshots/ folder
-
Filename includes timestamp for uniqueness
-
Full page screenshots by default
Output: JSON with success status and screenshot path
Clean Up
node dist/src/cli.js close
When to use: After completing all browser interactions, to free up resources.
Output: JSON with success status and message
Browser Behavior
Persistent Browser: The browser stays open between commands for faster sequential operations and to preserve browser state (cookies, sessions, etc.).
Reuse Existing: If Chrome is already running on port 9222, it will reuse that instance.
Safe Cleanup: The browser only closes when you explicitly call the close command.
AI-Powered: Midscene uses AI vision models to understand page structure and content, enabling intelligent interactions without brittle selectors.
Best Practices
-
Always navigate first: Before interacting with a page, navigate to the URL
-
📸 Always view screenshots: After each command (navigate, act, query, assert), use the Read tool to view the screenshot and verify the command worked correctly
-
Use natural language: Describe actions and queries as you would instruct a human
-
Be specific: More context helps Midscene understand what you want ("the blue Submit button in the login form" vs "the button")
-
Handle errors gracefully: Check the success field in JSON output; if false, read the error message and screenshot
-
Close when done: Always clean up browser resources after completing tasks
-
Chain commands: Run multiple commands sequentially without reopening the browser
-
Use query for extraction: query command is optimized for data extraction and understanding page content
-
Use assert for verification: assert command is perfect for validation and testing scenarios
Common Patterns
Simple browsing task
node dist/src/cli.js navigate https://example.com node dist/src/cli.js act "click the login button" node dist/src/cli.js screenshot node dist/src/cli.js close
Data extraction task
node dist/src/cli.js navigate https://example.com/products node dist/src/cli.js query "Extract all product names and prices in JSON format" node dist/src/cli.js close
Multi-step interaction
node dist/src/cli.js navigate https://example.com/login node dist/src/cli.js act "type 'user@example.com' into the email field" node dist/src/cli.js act "type 'mypassword' into the password field" node dist/src/cli.js act "click the submit button" node dist/src/cli.js assert "the user is logged in" node dist/src/cli.js screenshot node dist/src/cli.js close
Search and extract workflow
node dist/src/cli.js navigate https://www.google.com node dist/src/cli.js act "type 'Midscene AI' into the search box" node dist/src/cli.js act "press Enter" node dist/src/cli.js query "What are the titles of the first 3 search results?" node dist/src/cli.js close
Validation workflow
node dist/src/cli.js navigate https://example.com node dist/src/cli.js assert "the page title contains 'Example'" node dist/src/cli.js assert "there is a login button visible" node dist/src/cli.js assert "the navigation menu has 5 items" node dist/src/cli.js close
Frontend Verification
This skill is optimized for frontend verification. When the user asks to verify, check, or validate a page, follow this workflow.
Verification Workflow
-
Navigate to the target URL
-
Identify verification points based on the user's request
-
Execute each check using assert or query commands
-
Take screenshots as evidence for each step
-
Summarize results in a structured format
Result Summary Format
After completing verification, always present results in this format:
Verification Results
| # | Check | Status | Details |
|---|---|---|---|
| 1 | Page title is correct | PASS | - |
| 2 | Login form is visible | PASS | - |
| 3 | Error message on invalid input | FAIL | No error shown |
Result: 2/3 passed
Common Verification Scenarios
Form Validation
node dist/src/cli.js navigate http://localhost:3000/login node dist/src/cli.js act "click the submit button without filling any fields" node dist/src/cli.js assert "error messages are shown for required fields" node dist/src/cli.js act "type 'invalid-email' into the email field" node dist/src/cli.js act "click submit" node dist/src/cli.js assert "email format validation error is displayed"
Page Content Verification
node dist/src/cli.js navigate http://localhost:3000 node dist/src/cli.js assert "the page title is visible" node dist/src/cli.js assert "the navigation menu contains expected items" node dist/src/cli.js assert "the footer contains copyright information"
User Flow Verification
node dist/src/cli.js navigate http://localhost:3000/login node dist/src/cli.js act "type 'user@test.com' into email field" node dist/src/cli.js act "type 'password123' into password field" node dist/src/cli.js act "click login button" node dist/src/cli.js assert "redirected to dashboard page" node dist/src/cli.js assert "welcome message is displayed"
Interactive Feedback Verification
node dist/src/cli.js navigate http://localhost:3000/settings node dist/src/cli.js act "click the save button" node dist/src/cli.js assert "success toast or notification appears" node dist/src/cli.js act "click the delete button" node dist/src/cli.js assert "confirmation dialog is shown"
Verification Best Practices
-
Be specific about what to verify: "the error message says 'Email is required'" is better than "there is an error"
-
Verify one thing at a time: Each assert should check a single condition
-
Always take screenshots: Visual evidence helps when verification fails
-
Check both positive and negative cases: Verify what should appear AND what should not
-
Use query for data verification: When you need to extract and compare values, use query instead of assert
Troubleshooting
Page not loading: Wait a few seconds after navigation. Midscene usually handles this automatically.
Element not found: Be more specific in your natural language description. Instead of "the button", try "the blue Submit button at the bottom of the form"
Action fails: Check the screenshot to see the current page state. The element you're looking for might be hidden, in a different location, or have a different appearance.
Screenshots missing: Check the plugin directory's agent/browser_screenshots/ folder for saved files
Chrome not found: Install Google Chrome or check if it's installed at the standard location
Port 9222 in use: Another Chrome debugging session is running. Close it or the CLI will reuse it.
API Key issues: Make sure your .env file is properly configured with valid API credentials. Check Midscene Model Configuration Guide
Query returns unexpected results: Be more specific in your query. Include context about what data format you want (e.g., "in JSON format", "as a comma-separated list")
For detailed examples, see EXAMPLES.md. For API reference and technical details, see REFERENCE.md.
Dependencies
This skill requires:
pnpm install
Key dependencies:
-
@midscene/web
-
Midscene web automation library
-
@midscene/core
-
Midscene core functionality
-
puppeteer-core
-
Browser control
-
AI Model API (OpenAI, Anthropic, or compatible providers)