โ† Back to Browser & Automation

stealthy-auto-browse

Control a stealthy browser that evades bot detection

0
Source Code

stealthy-auto-browse

A stealth browser running in Docker. It uses Camoufox (a custom Firefox fork) instead of Chromium, so there are zero Chrome DevTools Protocol (CDP) signals for bot detectors to find. Mouse and keyboard input happens at the OS level via PyAutoGUI โ€” the browser itself doesn't know it's being automated, which means behavioral analysis can't detect it either.

Why This Exists

Standard browser automation (Playwright + Chromium, Puppeteer, Selenium) exposes CDP signals that bot detection services (Cloudflare, DataDome, PerimeterX, Akamai) catch instantly. Even with stealth plugins, the CDP protocol is still there and detectable. This skill eliminates that entirely by using Firefox (no CDP at all) and generating input events at the OS level rather than through the browser's automation API.

When To Use This Skill

  • Site has bot detection (Cloudflare challenge pages, DataDome, PerimeterX, Akamai)
  • Site blocks headless browsers or serves CAPTCHAs
  • You need a logged-in session that doesn't get banned
  • Another browser skill is getting 403s or empty/blocked responses
  • You're scraping a site that actively fights automation

When NOT To Use This Skill

  • Simple fetches with no bot protection โ€” use curl or WebFetch
  • Sites that don't care about automation โ€” use a regular browser skill, it's faster to set up
  • You only need static HTML โ€” use curl

Setup

1. Start the container:

docker run -d -p 8080:8080 -p 5900:5900 psyb0t/stealthy-auto-browse

Port 8080 is the HTTP API. Port 5900 is a noVNC web viewer where you can watch the browser in real time.

2. Set the environment variable:

export STEALTHY_AUTO_BROWSE_URL=http://localhost:8080

Or via OpenClaw config (~/.openclaw/openclaw.json):

{
  "skills": {
    "entries": {
      "stealthy-auto-browse": {
        "env": {
          "STEALTHY_AUTO_BROWSE_URL": "http://localhost:8080"
        }
      }
    }
  }
}

3. Verify: curl $STEALTHY_AUTO_BROWSE_URL/health returns ok when the browser is ready.

How It Works

The container runs a virtual X display (Xvfb at 1920x1080), the Camoufox browser, and an HTTP API server. You send JSON commands to the API and get JSON responses back. All commands go to POST $STEALTHY_AUTO_BROWSE_URL/ with {"action": "<name>", ...params}.

Every response has this shape:

{
  "success": true,
  "timestamp": 1234567890.123,
  "data": { ... },
  "error": "only present when success is false"
}

The data field contents vary by action โ€” documented below for each one.

Understanding the Two Input Modes

This is the most important concept. There are two ways to interact with pages:

System Input (Undetectable)

Actions: system_click, mouse_move, mouse_click, system_type, send_key, scroll

These use PyAutoGUI to generate real OS-level mouse movements and keystrokes. The browser receives these as genuine user input โ€” there is no way for any website JavaScript to distinguish these from a real human. Use these for stealth.

System input works with viewport coordinates (x, y pixel positions within the browser content area). Get these coordinates from get_interactive_elements.

Playwright Input (Detectable)

Actions: click, fill, type

These use Playwright's DOM automation to interact with elements by CSS selector or XPath. They're faster and more reliable (no coordinate math), but they inject events through the browser's automation layer. Sophisticated behavioral analysis can potentially detect the timing patterns. Use these when speed matters more than stealth, or when you have a selector but no coordinates.

When to Use Which

  • Stealth-critical sites (Cloudflare, login forms, anything with bot detection): Always use system input.
  • Simple scraping where the site isn't actively fighting you: Playwright input is fine and easier.
  • Form filling: Use system_click to focus the field, then system_type to enter text. This is undetectable. Using fill is faster but detectable.
  • Clicking buttons: If you have coordinates from get_interactive_elements, use system_click. If you only have a CSS selector, use click.

Workflow

This is the typical sequence for interacting with a page:

  1. Navigate: goto to load the URL
  2. Read the page: get_text returns all visible text โ€” usually enough to understand the page
  3. If text isn't clear: get_html gives you the full DOM structure
  4. If still confused: Take a screenshot (GET /screenshot/browser?whLargest=512)
  5. Find interactive elements: get_interactive_elements returns all buttons, links, inputs with their x,y coordinates
  6. Interact: system_click to click, system_type to type, send_key for Enter/Tab/Escape
  7. Wait for results: wait_for_element or wait_for_text instead of sleeping
  8. Verify: get_text again to confirm the page changed as expected

Actions Reference

Navigation

goto

Navigates to a URL. This is how you load pages.

{"action": "goto", "url": "https://example.com"}
{"action": "goto", "url": "https://example.com", "wait_until": "networkidle"}

Parameters:

  • url (required): The URL to navigate to.
  • wait_until (optional, default "domcontentloaded"): When to consider the page loaded. Options: "domcontentloaded" (DOM parsed, fast), "load" (all resources loaded), "networkidle" (no network activity for 500ms, slowest but most complete).

Response data: {"url": "https://example.com/", "title": "Example Domain"}

Note: If a page loader matches the URL (see Page Loaders section), the loader's steps execute instead of the default navigation. The response will include "loader": "loader name" when this happens.

refresh

Reloads the current page.

{"action": "refresh"}
{"action": "refresh", "wait_until": "networkidle"}

Parameters:

  • wait_until (optional, default "domcontentloaded"): Same options as goto.

Response data: {"url": "https://example.com/current-page", "title": "Current Page"}

System Input (Undetectable)

system_click

Moves the mouse to viewport coordinates with a human-like curve (random jitter, eased acceleration), then clicks. This is the primary way to click things stealthily.

{"action": "system_click", "x": 500, "y": 300}
{"action": "system_click", "x": 500, "y": 300, "duration": 0.5}

Parameters:

  • x, y (required): Viewport coordinates โ€” get these from get_interactive_elements.
  • duration (optional): How long the mouse movement takes in seconds. If omitted, a random duration between 0.2-0.6s is used for realism.

Response data: {"system_clicked": {"x": 500, "y": 300}}

How it differs from mouse_click: system_click always moves the mouse first (smooth human-like path), then clicks. mouse_click can click at a position instantly without the smooth movement, or click wherever the mouse currently is.

mouse_move

Moves the mouse to viewport coordinates with human-like movement (jitter, eased curve) but does NOT click. Use this to hover over elements (to trigger hover menus, tooltips) or to simulate natural mouse behavior between actions.

{"action": "mouse_move", "x": 500, "y": 300}
{"action": "mouse_move", "x": 500, "y": 300, "duration": 0.4}

Parameters:

  • x, y (required): Viewport coordinates.
  • duration (optional): Movement time in seconds. Random 0.2-0.6s if omitted.

Response data: {"moved_to": {"x": 500, "y": 300}}

mouse_click

Clicks at a position or at the current mouse location. Unlike system_click, this does NOT do a smooth mouse movement first โ€” it's a direct click via PyAutoGUI.

{"action": "mouse_click"}
{"action": "mouse_click", "x": 500, "y": 300}

Parameters:

  • x, y (optional): If provided, clicks at that viewport position directly. If omitted, clicks wherever the mouse currently is.

Response data: {"clicked_at": {"x": 500, "y": 300}} or {"clicked_at": "current"}

When to use: After a mouse_move when you want to separate the movement and click into two steps. Or when the mouse is already positioned and you just need to click.

system_type

Types text character-by-character via real OS keystrokes. Each keystroke has a randomized delay (jittered around the interval) to mimic human typing speed. Completely undetectable.

{"action": "system_type", "text": "hello world"}
{"action": "system_type", "text": "hello world", "interval": 0.12}

Parameters:

  • text (required): The text to type. Must click/focus an input field first.
  • interval (optional, default 0.08): Base delay between keystrokes in seconds. Actual delay is randomized +-30ms around this value.

Response data: {"typed_len": 11}

Important: You must click on the input field first (using system_click or click) before calling system_type. This action types into whatever is currently focused.

send_key

Sends a single keyboard key or key combination via OS-level input. Use this for pressing Enter to submit forms, Tab to move between fields, Escape to close dialogs, or any key combos like Ctrl+A, Ctrl+C, etc.

{"action": "send_key", "key": "enter"}
{"action": "send_key", "key": "tab"}
{"action": "send_key", "key": "escape"}
{"action": "send_key", "key": "ctrl+a"}
{"action": "send_key", "key": "ctrl+shift+t"}

Parameters:

  • key (required): Key name or combo with + separator. Key names follow PyAutoGUI naming: enter, tab, escape, backspace, delete, up, down, left, right, home, end, pageup, pagedown, f1-f12, ctrl, alt, shift, space, etc.

Response data: {"send_key": "enter"}

scroll

Scrolls the page using the mouse scroll wheel. Generates real OS-level scroll events.

{"action": "scroll", "amount": -3}
{"action": "scroll", "amount": 5, "x": 500, "y": 300}

Parameters:

  • amount (optional, default -3): Scroll amount. Negative = scroll down, positive = scroll up. Each unit is roughly one "click" of a mouse wheel.
  • x, y (optional): If provided, moves the mouse to these viewport coordinates first, then scrolls. Useful for scrolling inside a specific scrollable element rather than the whole page.

Response data: {"scrolled": -3}

Playwright Input (Detectable)

These are faster and more convenient but use Playwright's DOM event injection, which is detectable by sophisticated behavioral analysis.

click

Clicks an element by CSS selector or XPath. Playwright finds the element in the DOM, scrolls it into view if needed, and dispatches click events.

{"action": "click", "selector": "#submit-btn"}
{"action": "click", "selector": "button.primary"}
{"action": "click", "selector": "xpath=//button[@id='submit-btn']"}

Parameters:

  • selector (required): CSS selector or XPath (prefix with xpath=).

Response data: {"clicked": "#submit-btn"}

When to use over system_click: When you have a selector but don't want to bother getting coordinates. When the element might move around and coordinates aren't reliable. When stealth isn't critical.

fill

Fills an input field by selector. Clears any existing content first, then sets the value. This is the fastest way to fill forms but is detectable because it doesn't generate individual keystroke events.

{"action": "fill", "selector": "input[name='email']", "value": "user@example.com"}

Parameters:

  • selector (required): CSS selector or XPath of the input element.
  • value (required): Text to fill in.

Response data: {"filled": "input[name='email']"}

type

Types text into an element character-by-character via Playwright (NOT the OS). Each keystroke has a configurable delay. This is a middle ground between fill (instant but obviously automated) and system_type (OS-level, undetectable). The typing pattern is more realistic than fill but still comes through Playwright's event system.

{"action": "type", "selector": "#search", "text": "query", "delay": 0.05}

Parameters:

  • selector (required): CSS selector or XPath of the element.
  • text (required): Text to type.
  • delay (optional, default 0.05): Delay between keystrokes in seconds.

Response data: {"typed": "#search"}

Screenshots

Screenshots are GET requests (not POST actions).

GET /screenshot/browser

Captures the browser viewport as a PNG image. This is what the page looks like to a user.

curl -s "$STEALTHY_AUTO_BROWSE_URL/screenshot/browser?whLargest=512" -o screenshot.png

Always resize screenshots to avoid huge images. Resize query parameters (all optional):

Parameter What it does
whLargest=512 Scales so the largest dimension is 512px, keeps aspect ratio. Use this by default.
width=800 Scales to 800px wide, keeps aspect ratio
height=300 Scales to 300px tall, keeps aspect ratio
width=400&height=400 Forces exact 400x400 dimensions

GET /screenshot/desktop

Captures the entire virtual desktop (including window chrome, taskbar, etc.) using scrot. Same resize parameters as above. Useful when you need to see things outside the browser viewport.

curl -s "$STEALTHY_AUTO_BROWSE_URL/screenshot/desktop?whLargest=512" -o desktop.png

Page Inspection

get_interactive_elements

Scans the page and returns every interactive element (buttons, links, inputs, selects, textareas, etc.) with their viewport coordinates. This is how you find what to click and where.

{"action": "get_interactive_elements"}
{"action": "get_interactive_elements", "visible_only": true}

Parameters:

  • visible_only (optional, default true): Only return elements that are currently visible on screen.

Response data:

{
  "count": 5,
  "elements": [
    {
      "i": 0,
      "tag": "button",
      "id": "submit-btn",
      "text": "Submit",
      "selector": "#submit-btn",
      "x": 400,
      "y": 250,
      "w": 120,
      "h": 40,
      "visible": true
    },
    {
      "i": 1,
      "tag": "input",
      "id": null,
      "text": "",
      "selector": "input[name='email']",
      "x": 300,
      "y": 180,
      "w": 250,
      "h": 35,
      "visible": true
    }
  ]
}

The x, y are the center of the element โ€” pass these directly to system_click. The selector can be used with Playwright actions like click or fill. The w, h give you the element dimensions.

This is your primary tool for understanding what you can interact with on a page. Call this before clicking anything.

get_text

Returns all visible text content of the page body. Text is truncated to 10,000 characters.

{"action": "get_text"}

Response data: {"text": "Page title\nSome content here...", "length": 1234}

This is usually the first thing to call after navigating โ€” it tells you what's on the page without needing a screenshot.

get_html

Returns the full HTML source of the current page.

{"action": "get_html"}

Response data: {"html": "<!DOCTYPE html>...", "length": 45678}

Use when get_text doesn't give enough structure to understand the page layout, or when you need to find specific elements in the DOM.

eval

Executes arbitrary JavaScript in the page context and returns the result. The expression is evaluated via page.evaluate().

{"action": "eval", "expression": "document.title"}
{"action": "eval", "expression": "document.querySelectorAll('a').length"}
{"action": "eval", "expression": "JSON.stringify(performance.timing)"}

Parameters:

  • expression (required): JavaScript expression to evaluate. Must return a JSON-serializable value.

Response data: {"result": "Example Domain"} โ€” the result is whatever the expression returns.

Wait Conditions

Use these instead of sleep to wait for page content. They're more reliable because they wait for the exact condition rather than an arbitrary time.

wait_for_element

Waits for an element matching a CSS selector or XPath to reach a certain state (visible, hidden, attached to DOM, detached).

{"action": "wait_for_element", "selector": "#results", "timeout": 10}
{"action": "wait_for_element", "selector": "xpath=//div[@class='loaded']", "timeout": 15}
{"action": "wait_for_element", "selector": ".spinner", "state": "hidden", "timeout": 10}

Parameters:

  • selector (required): CSS selector or XPath (prefix with xpath=).
  • state (optional, default "visible"): What state to wait for. Options: "visible" (rendered and not hidden), "hidden" (not visible), "attached" (in DOM regardless of visibility), "detached" (removed from DOM).
  • timeout (optional, default 30): Max wait time in seconds. Throws error if exceeded.

Response data: {"selector": "#results", "state": "visible"}

wait_for_text

Waits for specific text to appear anywhere in the page body.

{"action": "wait_for_text", "text": "Search results", "timeout": 10}

Parameters:

  • text (required): Exact text to look for (substring match on document.body.innerText).
  • timeout (optional, default 30): Max wait time in seconds.

Response data: {"text": "Search results", "found": true}

wait_for_url

Waits for the page URL to match a pattern. Useful after form submissions or redirects.

{"action": "wait_for_url", "url": "**/dashboard", "timeout": 10}
{"action": "wait_for_url", "url": "https://example.com/success*", "timeout": 15}

Parameters:

  • url (required): URL pattern to match. Supports * (any chars except /) and ** (any chars including /) glob patterns. Can also be a full URL for exact match.
  • timeout (optional, default 30): Max wait time in seconds.

Response data: {"url": "https://example.com/dashboard"}

wait_for_network_idle

Waits until there are no network requests in flight for 500ms. Useful for pages that load content dynamically after the initial page load.

{"action": "wait_for_network_idle", "timeout": 30}

Parameters:

  • timeout (optional, default 30): Max wait time in seconds.

Response data: {"idle": true}

Tab Management

The browser can have multiple tabs open. One tab is "active" at a time โ€” all actions operate on the active tab.

list_tabs

Returns all open tabs with their URLs and which one is active.

{"action": "list_tabs"}

Response data:

{
  "count": 2,
  "tabs": [
    {"index": 0, "url": "https://example.com/", "active": false},
    {"index": 1, "url": "https://other.com/", "active": true}
  ]
}

new_tab

Opens a new browser tab. Optionally navigates it to a URL. The new tab becomes the active tab.

{"action": "new_tab"}
{"action": "new_tab", "url": "https://example.com"}

Parameters:

  • url (optional): URL to navigate to in the new tab.
  • wait_until (optional, default "domcontentloaded"): Same as goto.

Response data: {"index": 1, "url": "https://example.com/"}

switch_tab

Switches the active tab by index (0-based). All subsequent actions will operate on this tab.

{"action": "switch_tab", "index": 0}

Parameters:

  • index (required): Tab index from list_tabs.

Response data: {"index": 0, "url": "https://example.com/"}

close_tab

Closes a tab. After closing, the last remaining tab becomes active.

{"action": "close_tab"}
{"action": "close_tab", "index": 1}

Parameters:

  • index (optional): Tab index to close. If omitted, closes the currently active tab.

Response data: {"closed": true, "remaining": 1}

Dialog Handling

Browsers have modal dialogs (alert, confirm, prompt). By default, dialogs are auto-accepted (clicks OK). Use handle_dialog if you need to dismiss a dialog or provide text for a prompt.

handle_dialog

Call BEFORE the action that triggers the dialog if you want to dismiss it or provide prompt text. If you don't call this, the dialog is auto-accepted (clicks OK).

{"action": "handle_dialog", "accept": true}
{"action": "handle_dialog", "accept": false}
{"action": "handle_dialog", "accept": true, "text": "my response"}

Parameters:

  • accept (optional, default true): true clicks OK/Accept, false clicks Cancel/Dismiss.
  • text (optional): Response text for prompt dialogs. Ignored for alert/confirm.

Response data: {"configured": {"accept": true, "text": null}}

Example โ€” handling a confirm dialog:

# Step 1: Tell the browser to accept the next dialog
curl -X POST $API -H 'Content-Type: application/json' -d '{"action": "handle_dialog", "accept": true}'
# Step 2: Now click the button that triggers the confirm
curl -X POST $API -H 'Content-Type: application/json' -d '{"action": "system_click", "x": 300, "y": 200}'

get_last_dialog

Returns information about the most recent dialog that appeared.

{"action": "get_last_dialog"}

Response data:

{
  "dialog": {
    "type": "confirm",
    "message": "Are you sure you want to delete this?",
    "default_value": "",
    "buttons": ["ok", "cancel"]
  }
}

Returns {"dialog": null} if no dialog has appeared yet. The type field is one of: "alert", "confirm", "prompt", "beforeunload".

Cookies

get_cookies

Returns all cookies for the browser context, or cookies for specific URLs.

{"action": "get_cookies"}
{"action": "get_cookies", "urls": ["https://example.com"]}

Parameters:

  • urls (optional): Array of URLs to filter cookies by. If omitted, returns all cookies.

Response data:

{
  "count": 3,
  "cookies": [
    {"name": "session", "value": "abc123", "domain": ".example.com", "path": "/", "httpOnly": true, "secure": true, ...}
  ]
}

set_cookie

Sets a cookie in the browser context.

{"action": "set_cookie", "name": "session", "value": "abc123", "url": "https://example.com"}
{"action": "set_cookie", "name": "pref", "value": "dark", "domain": ".example.com", "path": "/", "httpOnly": false, "secure": true}

Parameters: Any standard cookie fields โ€” name, value, url, domain, path, httpOnly, secure, sameSite, expires. At minimum you need name, value, and either url or domain.

Response data: {"set": "session"}

delete_cookies

Clears all cookies from the browser context.

{"action": "delete_cookies"}

Response data: {"cleared": true}

Storage

Access the page's localStorage and sessionStorage. These are per-origin โ€” you must be on the right page for the storage to be accessible.

get_storage

Returns all items from localStorage or sessionStorage as a key-value object.

{"action": "get_storage", "type": "local"}
{"action": "get_storage", "type": "session"}

Parameters:

  • type (optional, default "local"): "local" for localStorage, "session" for sessionStorage.

Response data: {"items": {"theme": "dark", "lang": "en"}, "type": "local"}

set_storage

Sets a single key-value pair in localStorage or sessionStorage.

{"action": "set_storage", "type": "local", "key": "theme", "value": "dark"}

Parameters:

  • type (optional, default "local"): "local" or "session".
  • key (required): Storage key.
  • value (required): Storage value (string).

Response data: {"set": "theme", "type": "local"}

clear_storage

Clears all items from localStorage or sessionStorage.

{"action": "clear_storage", "type": "local"}
{"action": "clear_storage", "type": "session"}

Response data: {"cleared": "local"}

Downloads

The browser automatically tracks file downloads triggered by page interactions (clicking download links, form submissions that return files, etc.).

get_last_download

Returns information about the most recently downloaded file.

{"action": "get_last_download"}

Response data:

{
  "download": {
    "url": "https://example.com/file.pdf",
    "filename": "file.pdf",
    "path": "/tmp/playwright-downloads/abc123/file.pdf"
  }
}

Returns {"download": null} if nothing has been downloaded yet. The path is the local path inside the container where the file was saved. The filename is what the server suggested as the download name.

Uploads

upload_file

Programmatically sets a file on an <input type="file"> element without opening the OS file picker. The file must exist inside the container โ€” use docker cp to copy files in if needed.

{"action": "upload_file", "selector": "#file-input", "file_path": "/tmp/document.pdf"}

Parameters:

  • selector (required): CSS selector of the file input element.
  • file_path (required): Absolute path to the file inside the container.

Response data: {"selector": "#file-input", "file": "document.pdf", "size": 12345}

Note: After setting the file, you still need to submit the form (click the submit button) for the upload to actually happen.

Network Logging

Capture all HTTP requests and responses the page makes. Useful for debugging, finding API endpoints the page calls, or verifying that certain resources loaded.

enable_network_log

Starts recording all HTTP requests and responses from the active page.

{"action": "enable_network_log"}

Response data: {"enabled": true}

disable_network_log

Stops recording network activity. Already-captured entries remain.

{"action": "disable_network_log"}

Response data: {"enabled": false}

get_network_log

Returns all captured network entries since logging was enabled (or last cleared).

{"action": "get_network_log"}

Response data:

{
  "count": 4,
  "log": [
    {"type": "request", "url": "https://api.example.com/data", "method": "GET", "resource_type": "fetch", "timestamp": 1234567890.123},
    {"type": "response", "url": "https://api.example.com/data", "status": 200, "timestamp": 1234567890.456},
    {"type": "request", "url": "https://cdn.example.com/style.css", "method": "GET", "resource_type": "stylesheet", "timestamp": 1234567890.789},
    {"type": "response", "url": "https://cdn.example.com/style.css", "status": 200, "timestamp": 1234567890.999}
  ]
}

Each entry is either a "request" or "response". Requests include method and resource_type (fetch, document, stylesheet, script, image, etc.). Responses include status code.

clear_network_log

Deletes all captured network entries but keeps logging enabled if it was on.

{"action": "clear_network_log"}

Response data: {"cleared": true}

Scrolling

scroll_to_bottom

Scrolls the entire page from top to bottom using JavaScript window.scrollBy(). Scrolls one viewport height at a time with a fixed delay between scrolls. When it reaches the bottom (scroll position stops changing), it scrolls back to the top. Useful for triggering lazy-loaded content.

{"action": "scroll_to_bottom"}
{"action": "scroll_to_bottom", "delay": 0.6}

Parameters:

  • delay (optional, default 0.4): Seconds to wait between each scroll step.

Response data: {"scrolled": "bottom"}

scroll_to_bottom_humanized

Same as scroll_to_bottom but uses real OS-level mouse wheel scrolling (via PyAutoGUI) with randomized scroll amounts and jittered delays to look like a human scrolling. Undetectable by behavioral analysis.

{"action": "scroll_to_bottom_humanized"}
{"action": "scroll_to_bottom_humanized", "min_clicks": 3, "max_clicks": 8, "delay": 0.7}

Parameters:

  • min_clicks (optional, default 2): Minimum mouse wheel clicks per scroll step.
  • max_clicks (optional, default 6): Maximum mouse wheel clicks per scroll step. A random value between min and max is chosen each time.
  • delay (optional, default 0.5): Base delay between scroll steps. Actual delay is jittered +-30%.

Response data: {"scrolled": "bottom_humanized"}

Display

calibrate

Recalculates the mapping between viewport coordinates (what get_interactive_elements returns) and screen coordinates (what PyAutoGUI uses). The browser has window chrome (title bar, address bar) that offsets the viewport from the screen origin.

{"action": "calibrate"}

Response data: {"window_offset": {"x": 0, "y": 74}}

When to call this: After entering/exiting fullscreen, after the browser window is resized, or if system_click coordinates seem off. The offset is auto-calculated at startup, so you rarely need this.

get_resolution

Returns the virtual display resolution (from the XVFB_RESOLUTION environment variable).

{"action": "get_resolution"}

Response data: {"width": 1920, "height": 1080}

enter_fullscreen / exit_fullscreen

Toggles browser fullscreen mode (hides address bar and window chrome). In fullscreen, the viewport takes up the entire screen, so coordinates map differently.

{"action": "enter_fullscreen"}
{"action": "exit_fullscreen"}

Response data: {"fullscreen": true, "changed": true} โ€” changed is false if already in the requested state.

Important: Call calibrate after entering/exiting fullscreen to update the coordinate mapping.

Utility

ping

Health check that returns the current page URL. Use to verify the API is responding and the browser is alive.

{"action": "ping"}

Response data: {"message": "pong", "url": "https://example.com/"}

sleep

Pauses execution for a specified duration. Prefer wait_for_element or wait_for_text when waiting for page content โ€” use sleep only for fixed timing needs.

{"action": "sleep", "duration": 2}

Parameters:

  • duration (optional, default 1): Seconds to sleep.

Response data: {"slept": 2}

close

Shuts down the browser. The container will stop after this.

{"action": "close"}

Response data: {"message": "closing"}

State Endpoints (GET)

GET /state

Returns the current browser state.

curl -s "$STEALTHY_AUTO_BROWSE_URL/state"

Response:

{
  "status": "ready",
  "url": "https://example.com/",
  "title": "Example Domain",
  "window_offset": {"x": 0, "y": 74}
}

GET /health

Simple health check. Returns ok as plain text when the API is ready.

curl -s "$STEALTHY_AUTO_BROWSE_URL/health"

Container Options

# Custom display resolution
docker run -d -p 8080:8080 -e XVFB_RESOLUTION=1280x720 psyb0t/stealthy-auto-browse

# Match timezone to your IP's geographic location (important for stealth โ€” mismatched
# timezone is a common bot detection signal)
docker run -d -p 8080:8080 -e TZ=Europe/Bucharest psyb0t/stealthy-auto-browse

# Route browser traffic through an HTTP proxy
docker run -d -p 8080:8080 -e PROXY_URL=http://user:pass@proxy:8888 psyb0t/stealthy-auto-browse

# Persistent browser profile โ€” cookies, sessions, and fingerprint survive container restarts
docker run -d -p 8080:8080 -v ./profile:/userdata psyb0t/stealthy-auto-browse

# Open a URL automatically on startup
docker run -d -p 8080:8080 psyb0t/stealthy-auto-browse https://example.com

Page Loaders (URL-Triggered Automation)

Page loaders are like Greasemonkey/Tampermonkey userscripts but for the HTTP API. You define a set of actions that automatically run whenever the browser navigates to a matching URL. Instead of manually sending a sequence of commands every time you visit a site, you write it once as a YAML file and the container handles it.

This is useful for things like: removing cookie popups, dismissing overlays, waiting for dynamic content, cleaning up pages before scraping, or any repetitive setup you'd otherwise do manually every time.

How They Work

  1. You create YAML files that define URL patterns and a list of steps
  2. Mount those files into the container at /loaders
  3. Whenever goto navigates to a URL that matches a loader's pattern, the loader's steps run automatically instead of the default navigation

The steps are the exact same actions as the HTTP API. Every action you can send via POST / (goto, eval, click, system_click, sleep, scroll, wait_for_element, etc.) works as a loader step. Same names, same parameters.

Setup

docker run -d -p 8080:8080 -p 5900:5900 \
  -v ./my-loaders:/loaders \
  psyb0t/stealthy-auto-browse

Loader Format

name: Human-readable name for this loader
match:
  domain: example.com         # Exact hostname match (www. is stripped automatically)
  path_prefix: /articles      # URL path must start with this
  regex: "article/\\d+"       # Full URL must match this regex
steps:
  - action: goto              # Same actions as the HTTP API
    url: "${url}"             # ${url} is replaced with the original URL
    wait_until: networkidle
  - action: eval
    expression: "document.querySelector('.cookie-banner')?.remove()"
  - action: wait_for_element
    selector: "#main-content"
    timeout: 10

Match Rules

All match fields are optional, but at least one is required. If you specify multiple fields, all of them must match for the loader to trigger:

  • domain: Exact hostname. www. is stripped from both sides before comparing, so domain: example.com matches www.example.com too.
  • path_prefix: The URL path must start with this string. path_prefix: /blog matches /blog, /blog/post-1, /blog/archive, etc.
  • regex: The full URL is tested against this regular expression.

The ${url} Placeholder

In any string value within a step, ${url} is replaced with the original URL that was passed to goto. This lets you navigate to the URL with custom wait settings, or pass it to JavaScript:

steps:
  - action: goto
    url: "${url}"
    wait_until: networkidle
  - action: eval
    expression: "console.log('Loaded:', '${url}')"

Practical Example: Clean Scraping

Say you're scraping a news site that has cookie popups, newsletter modals, and lazy-loaded content. Without a loader, you'd send 5+ commands after every goto. With a loader:

# loaders/news_site.yaml
name: News Site Cleanup
match:
  domain: news-site.com
steps:
  # Navigate with full network wait so everything loads
  - action: goto
    url: "${url}"
    wait_until: networkidle

  # Wait for the main content to be there
  - action: wait_for_element
    selector: "article"
    timeout: 10

  # Kill the cookie popup
  - action: eval
    expression: "document.querySelector('.cookie-consent')?.remove()"

  # Kill the newsletter modal
  - action: eval
    expression: "document.querySelector('.newsletter-overlay')?.remove()"

  # Scroll to trigger lazy-loaded images
  - action: scroll_to_bottom
    delay: 0.3

  # Small pause for everything to settle
  - action: sleep
    duration: 1

Now when you goto any URL on news-site.com, all of this happens automatically. Your response includes "loader": "News Site Cleanup" so you know it triggered.

Response When a Loader Triggers

{
  "success": true,
  "data": {
    "loader": "News Site Cleanup",
    "steps_executed": 6,
    "last_result": { "success": true, "timestamp": 1234567890.456, "data": { "slept": 1 } }
  }
}

Pre-installed Extensions

The browser comes with these extensions pre-installed:

  • uBlock Origin: Ad and tracker blocking
  • LocalCDN: Serves common CDN resources locally to prevent tracking
  • ClearURLs: Strips tracking parameters from URLs
  • Consent-O-Matic: Automatically handles cookie consent popups (clicks "reject all" or minimal consent)

Example: Full Login Flow (Undetectable)

API=$STEALTHY_AUTO_BROWSE_URL

# Navigate to login page
curl -s -X POST $API -H 'Content-Type: application/json' \
  -d '{"action": "goto", "url": "https://example.com/login"}'

# See what's on the page
curl -s -X POST $API -H 'Content-Type: application/json' \
  -d '{"action": "get_text"}'

# Find all interactive elements and their coordinates
curl -s -X POST $API -H 'Content-Type: application/json' \
  -d '{"action": "get_interactive_elements"}'

# Click the email field (coordinates from get_interactive_elements)
curl -s -X POST $API -H 'Content-Type: application/json' \
  -d '{"action": "system_click", "x": 400, "y": 200}'

# Type email with human-like keystrokes
curl -s -X POST $API -H 'Content-Type: application/json' \
  -d '{"action": "system_type", "text": "user@example.com"}'

# Tab to password field
curl -s -X POST $API -H 'Content-Type: application/json' \
  -d '{"action": "send_key", "key": "tab"}'

# Type password
curl -s -X POST $API -H 'Content-Type: application/json' \
  -d '{"action": "system_type", "text": "secretpassword"}'

# Press Enter to submit
curl -s -X POST $API -H 'Content-Type: application/json' \
  -d '{"action": "send_key", "key": "enter"}'

# Wait for redirect to dashboard
curl -s -X POST $API -H 'Content-Type: application/json' \
  -d '{"action": "wait_for_url", "url": "**/dashboard", "timeout": 15}'

# Verify we're logged in
curl -s -X POST $API -H 'Content-Type: application/json' \
  -d '{"action": "get_text"}'

Tips

  1. Always call get_interactive_elements before clicking โ€” don't guess coordinates
  2. Use system methods for stealth โ€” system_click, system_type, send_key are undetectable
  3. Use get_text first, screenshots second โ€” text is faster and smaller
  4. Match TZ to your IP location โ€” timezone mismatch is a common bot detection signal
  5. Resize screenshots with ?whLargest=512 โ€” full resolution is unnecessarily large
  6. Mount /userdata for persistent sessions โ€” cookies, fingerprint, and profile survive restarts
  7. Use wait conditions instead of sleep โ€” wait_for_element, wait_for_text, wait_for_url
  8. Call handle_dialog BEFORE the action that triggers it โ€” if you need to dismiss or provide prompt text (dialogs are auto-accepted otherwise)
  9. Call calibrate after fullscreen changes โ€” coordinate mapping shifts
  10. Add slight delays between actions for realism โ€” sleep with 0.5-1.5s between clicks looks more human