stealthy-auto-browse
A stealth browser running in Docker. It uses Camoufox (a custom Firefox fork) instead of Chromium, so there are zero Chrome DevTools Protocol (CDP) signals for bot detectors to find. Mouse and keyboard input happens at the OS level via PyAutoGUI: the browser itself doesn't know it's being automated, which means behavioral analysis can't detect it either.
Why This Exists
Standard browser automation (Playwright + Chromium, Puppeteer, Selenium) exposes CDP signals that bot detection services (Cloudflare, DataDome, PerimeterX, Akamai) catch instantly. Even with stealth plugins, the CDP protocol is still there and detectable. This skill eliminates that entirely by using Firefox (no CDP at all) and generating input events at the OS level rather than through the browser's automation API.
When To Use This Skill
- Site has bot detection (Cloudflare challenge pages, DataDome, PerimeterX, Akamai)
- Site blocks headless browsers or serves CAPTCHAs
- You need a logged-in session that doesn't get banned
- Another browser skill is getting 403s or empty/blocked responses
- You're scraping a site that actively fights automation
When NOT To Use This Skill
- Simple fetches with no bot protection: use curl or WebFetch
- Sites that don't care about automation: use a regular browser skill; it's faster to set up
- You only need static HTML: use curl
Setup
1. Start the container:
docker run -d -p 8080:8080 -p 5900:5900 psyb0t/stealthy-auto-browse
Port 8080 is the HTTP API. Port 5900 is a noVNC web viewer where you can watch the browser in real time.
2. Set the environment variable:
export STEALTHY_AUTO_BROWSE_URL=http://localhost:8080
Or via OpenClaw config (~/.openclaw/openclaw.json):
{
  "skills": {
    "entries": {
      "stealthy-auto-browse": {
        "env": {
          "STEALTHY_AUTO_BROWSE_URL": "http://localhost:8080"
        }
      }
    }
  }
}
3. Verify: curl $STEALTHY_AUTO_BROWSE_URL/health returns ok when the browser is ready.
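If a script starts the container and immediately sends actions, it should wait for step 3 to pass first. Below is a minimal Python sketch that polls /health until the body is ok. It assumes the body is exactly that text, possibly with surrounding whitespace. The helper names fetch_health and wait_until_ready are hypothetical. The fetch callable is injected so the retry logic can be exercised without a live container.

```python
import time
import urllib.request
from typing import Callable


def fetch_health(base_url: str) -> str:
    """GET {base_url}/health and return the response body as text."""
    with urllib.request.urlopen(base_url.rstrip("/") + "/health", timeout=5) as resp:
        return resp.read().decode("utf-8")


def wait_until_ready(
    fetch: Callable[[], str],
    timeout: float = 60.0,
    interval: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Poll fetch() until it returns "ok"; return how many attempts it took."""
    attempts = 0
    deadline = time.monotonic() + timeout
    while True:
        attempts += 1
        try:
            if fetch().strip() == "ok":
                return attempts
        except OSError:
            pass  # connection refused / HTTP error while the container boots; retry
        if time.monotonic() >= deadline:
            raise TimeoutError(f"browser not ready after {attempts} attempts")
        sleep(interval)
```

Typical use, with os imported: wait_until_ready(lambda: fetch_health(os.environ["STEALTHY_AUTO_BROWSE_URL"])).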
How It Works
The container runs a virtual X display (Xvfb at 1920x1080), the Camoufox browser, and an HTTP API server. You send JSON commands to the API and get JSON responses back. All commands go to POST $STEALTHY_AUTO_BROWSE_URL/ with {"action": "<name>", ...params}.
Every response has this shape:
{
  "success": true,
  "timestamp": 1234567890.123,
  "data": { ... },
  "error": "only present when success is false"
}
The data field contents vary by action; each one is documented below.
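Every action shares this request/response envelope, so a thin client needs only one function. Here is a Python sketch using the standard library. It builds the POST body, sends it, and raises when success is false. The names build_request, unwrap, call and BrowserError are hypothetical helpers, not part of the container.

```python
import json
import os
import urllib.request
from typing import Any, Optional


class BrowserError(RuntimeError):
    """Raised when the API answers with "success": false."""


def build_request(base_url: str, action: str, **params: Any) -> urllib.request.Request:
    """Build POST / with {"action": action, **params} as the JSON body."""
    body = json.dumps({"action": action, **params}).encode("utf-8")
    return urllib.request.Request(
        base_url.rstrip("/") + "/",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def unwrap(response: dict) -> dict:
    """Return the data field, or raise BrowserError with the API's error message."""
    if not response.get("success"):
        raise BrowserError(response.get("error", "unknown error"))
    return response.get("data", {})


def call(action: str, base_url: Optional[str] = None,
         http_timeout: float = 120.0, **params: Any) -> dict:
    """Send one action and return its data payload.

    http_timeout is the socket timeout. An action's own "timeout" parameter
    (e.g. for wait_for_element) travels untouched in **params.
    """
    base = base_url or os.environ["STEALTHY_AUTO_BROWSE_URL"]
    req = build_request(base, action, **params)
    with urllib.request.urlopen(req, timeout=http_timeout) as resp:
        return unwrap(json.loads(resp.read()))
```

For example, call("goto", url="https://example.com", wait_until="networkidle") would return the {"url": ..., "title": ...} data shown for goto.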
Understanding the Two Input Modes
This is the most important concept. There are two ways to interact with pages:
System Input (Undetectable)
Actions: system_click, mouse_move, mouse_click, system_type, send_key, scroll
These use PyAutoGUI to generate real OS-level mouse movements and keystrokes. The browser receives these as genuine user input: there is no way for any website JavaScript to distinguish these from a real human. Use these for stealth.
System input works with viewport coordinates (x, y pixel positions within the browser content area). Get these coordinates from get_interactive_elements.
Playwright Input (Detectable)
Actions: click, fill, type
These use Playwright's DOM automation to interact with elements by CSS selector or XPath. They're faster and more reliable (no coordinate math), but they inject events through the browser's automation layer. Sophisticated behavioral analysis can potentially detect the timing patterns. Use these when speed matters more than stealth, or when you have a selector but no coordinates.
When to Use Which
- Stealth-critical sites (Cloudflare, login forms, anything with bot detection): Always use system input.
- Simple scraping where the site isn't actively fighting you: Playwright input is fine and easier.
- Form filling: Use system_click to focus the field, then system_type to enter text. This is undetectable. Using fill is faster but detectable.
- Clicking buttons: If you have coordinates from get_interactive_elements, use system_click. If you only have a CSS selector, use click.
Workflow
This is the typical sequence for interacting with a page:
- Navigate: goto to load the URL
- Read the page: get_text returns all visible text, which is usually enough to understand the page
- If text isn't clear: get_html gives you the full DOM structure
- If still confused: take a screenshot (GET /screenshot/browser?whLargest=512)
- Find interactive elements: get_interactive_elements returns all buttons, links and inputs with their x,y coordinates
- Interact: system_click to click, system_type to type, send_key for Enter/Tab/Escape
- Wait for results: wait_for_element or wait_for_text instead of sleeping
- Verify: get_text again to confirm the page changed as expected
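The read-only half of that workflow can be sketched in Python as follows. The call parameter is any function that sends one action and returns its data field, for example a thin wrapper around POST /. The function name read_page is hypothetical. So is the heuristic of fetching HTML only when the visible text is shorter than min_text characters.

```python
from typing import Any, Callable, Dict

Call = Callable[..., dict]


def read_page(call: Call, url: str, min_text: int = 200) -> Dict[str, Any]:
    """Navigate, read visible text, pull HTML only if the text looks thin,
    then list interactive elements with their viewport coordinates."""
    nav = call("goto", url=url)
    page: Dict[str, Any] = {"url": nav.get("url"), "title": nav.get("title")}
    page["text"] = call("get_text")["text"]
    if len(page["text"]) < min_text:
        # Text alone may not explain the layout; fetch the DOM as well.
        page["html"] = call("get_html")["html"]
    page["elements"] = call("get_interactive_elements")["elements"]
    return page
```

The screenshot step is left out because it is a separate GET endpoint rather than a POST action.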
Actions Reference
Navigation
goto
Navigates to a URL. This is how you load pages.
{"action": "goto", "url": "https://example.com"}
{"action": "goto", "url": "https://example.com", "wait_until": "networkidle"}
Parameters:
- url (required): The URL to navigate to.
- wait_until (optional, default "domcontentloaded"): When to consider the page loaded. Options: "domcontentloaded" (DOM parsed; fast), "load" (all resources loaded), "networkidle" (no network activity for 500 ms; slowest but most complete).
Response data: {"url": "https://example.com/", "title": "Example Domain"}
Note: If a page loader matches the URL (see Page Loaders section), the loader's steps execute instead of the default navigation. The response will include "loader": "loader name" when this happens.
refresh
Reloads the current page.
{"action": "refresh"}
{"action": "refresh", "wait_until": "networkidle"}
Parameters:
- wait_until (optional, default "domcontentloaded"): Same options as goto.
Response data: {"url": "https://example.com/current-page", "title": "Current Page"}
System Input (Undetectable)
system_click
Moves the mouse to viewport coordinates with a human-like curve (random jitter, eased acceleration), then clicks. This is the primary way to click things stealthily.
{"action": "system_click", "x": 500, "y": 300}
{"action": "system_click", "x": 500, "y": 300, "duration": 0.5}
Parameters:
- x, y (required): Viewport coordinates; get these from get_interactive_elements.
- duration (optional): How long the mouse movement takes, in seconds. If omitted, a random duration between 0.2 and 0.6 s is used for realism.
Response data: {"system_clicked": {"x": 500, "y": 300}}
How it differs from mouse_click: system_click always moves the mouse first (smooth human-like path), then clicks. mouse_click can click at a position instantly without the smooth movement, or click wherever the mouse currently is.
mouse_move
Moves the mouse to viewport coordinates with human-like movement (jitter, eased curve) but does NOT click. Use this to hover over elements (to trigger hover menus, tooltips) or to simulate natural mouse behavior between actions.
{"action": "mouse_move", "x": 500, "y": 300}
{"action": "mouse_move", "x": 500, "y": 300, "duration": 0.4}
Parameters:
- x, y (required): Viewport coordinates.
- duration (optional): Movement time in seconds. Random 0.2-0.6 s if omitted.
Response data: {"moved_to": {"x": 500, "y": 300}}
mouse_click
Clicks at a position or at the current mouse location. Unlike system_click, this does NOT do a smooth mouse movement first; it's a direct click via PyAutoGUI.
{"action": "mouse_click"}
{"action": "mouse_click", "x": 500, "y": 300}
Parameters:
- x, y (optional): If provided, clicks at that viewport position directly. If omitted, clicks wherever the mouse currently is.
Response data: {"clicked_at": {"x": 500, "y": 300}} or {"clicked_at": "current"}
When to use: After a mouse_move when you want to separate the movement and click into two steps. Or when the mouse is already positioned and you just need to click.
system_type
Types text character-by-character via real OS keystrokes. Each keystroke has a randomized delay (jittered around the interval) to mimic human typing speed. Completely undetectable.
{"action": "system_type", "text": "hello world"}
{"action": "system_type", "text": "hello world", "interval": 0.12}
Parameters:
- text (required): The text to type. You must click/focus an input field first.
- interval (optional, default 0.08): Base delay between keystrokes in seconds. The actual delay is randomized by up to ±30 ms around this value.
Response data: {"typed_len": 11}
Important: You must click on the input field first (using system_click or click) before calling system_type. This action types into whatever is currently focused.
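The Python sketch below makes the interval parameter concrete. It generates a per-keystroke delay schedule as the description reads: a base interval with uniform jitter of up to 30 ms either way. This only illustrates the documented behavior; it is not the container's code. The function name keystroke_delays and the floor at zero (for very small intervals) are assumptions.

```python
import random
from typing import List, Optional


def keystroke_delays(text: str, interval: float = 0.08, jitter: float = 0.03,
                     rng: Optional[random.Random] = None) -> List[float]:
    """Return one delay in seconds per character: interval +/- jitter, never negative."""
    rng = rng or random.Random()
    return [max(0.0, interval + rng.uniform(-jitter, jitter)) for _ in text]
```

Under this model, typing "hello world" at the default interval takes between 0.55 s and 1.21 s in total (11 keystrokes at 0.05 to 0.11 s each).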
send_key
Sends a single keyboard key or key combination via OS-level input. Use this for pressing Enter to submit forms, Tab to move between fields, Escape to close dialogs, or any key combos like Ctrl+A, Ctrl+C, etc.
{"action": "send_key", "key": "enter"}
{"action": "send_key", "key": "tab"}
{"action": "send_key", "key": "escape"}
{"action": "send_key", "key": "ctrl+a"}
{"action": "send_key", "key": "ctrl+shift+t"}
Parameters:
- key (required): Key name, or a combo joined with +. Key names follow PyAutoGUI naming: enter, tab, escape, backspace, delete, up, down, left, right, home, end, pageup, pagedown, f1-f12, ctrl, alt, shift, space, etc.
Response data: {"send_key": "enter"}
scroll
Scrolls the page using the mouse scroll wheel. Generates real OS-level scroll events.
{"action": "scroll", "amount": -3}
{"action": "scroll", "amount": 5, "x": 500, "y": 300}
Parameters:
- amount (optional, default -3): Scroll amount. Negative scrolls down, positive scrolls up. Each unit is roughly one "click" of a mouse wheel.
- x, y (optional): If provided, moves the mouse to these viewport coordinates first, then scrolls. Useful for scrolling inside a specific scrollable element rather than the whole page.
Response data: {"scrolled": -3}
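The sign convention (negative scrolls down) is easy to invert by accident. A small Python helper can translate a direction into the correct payload. scroll_payload is a hypothetical name; it only builds the JSON body documented above.

```python
from typing import Optional, Tuple


def scroll_payload(direction: str, clicks: int = 3,
                   at: Optional[Tuple[int, int]] = None) -> dict:
    """Build a scroll action: "down" -> negative amount, "up" -> positive."""
    signs = {"down": -1, "up": 1}
    if direction not in signs:
        raise ValueError("direction must be 'up' or 'down'")
    if clicks < 1:
        raise ValueError("clicks must be >= 1")
    payload = {"action": "scroll", "amount": signs[direction] * clicks}
    if at is not None:
        # Park the pointer over a specific scrollable element first.
        payload["x"], payload["y"] = at
    return payload
```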
Playwright Input (Detectable)
These are faster and more convenient but use Playwright's DOM event injection, which is detectable by sophisticated behavioral analysis.
click
Clicks an element by CSS selector or XPath. Playwright finds the element in the DOM, scrolls it into view if needed, and dispatches click events.
{"action": "click", "selector": "#submit-btn"}
{"action": "click", "selector": "button.primary"}
{"action": "click", "selector": "xpath=//button[@id='submit-btn']"}
Parameters:
- selector (required): CSS selector or XPath (prefix with xpath=).
Response data: {"clicked": "#submit-btn"}
When to use over system_click: When you have a selector but don't want to bother getting coordinates. When the element might move around and coordinates aren't reliable. When stealth isn't critical.
fill
Fills an input field by selector. Clears any existing content first, then sets the value. This is the fastest way to fill forms but is detectable because it doesn't generate individual keystroke events.
{"action": "fill", "selector": "input[name='email']", "value": "user@example.com"}
Parameters:
- selector (required): CSS selector or XPath of the input element.
- value (required): Text to fill in.
Response data: {"filled": "input[name='email']"}
type
Types text into an element character-by-character via Playwright (NOT the OS). Each keystroke has a configurable delay. This is a middle ground between fill (instant but obviously automated) and system_type (OS-level, undetectable). The typing pattern is more realistic than fill but still comes through Playwright's event system.
{"action": "type", "selector": "#search", "text": "query", "delay": 0.05}
Parameters:
- selector (required): CSS selector or XPath of the element.
- text (required): Text to type.
- delay (optional, default 0.05): Delay between keystrokes in seconds.
Response data: {"typed": "#search"}
Screenshots
Screenshots are GET requests (not POST actions).
GET /screenshot/browser
Captures the browser viewport as a PNG image. This is what the page looks like to a user.
curl -s "$STEALTHY_AUTO_BROWSE_URL/screenshot/browser?whLargest=512" -o screenshot.png
Always resize screenshots to avoid huge images. Resize query parameters (all optional):
| Parameter | What it does |
|---|---|
| whLargest=512 | Scales so the largest dimension is 512px, keeping aspect ratio. Use this by default. |
| width=800 | Scales to 800px wide, keeping aspect ratio |
| height=300 | Scales to 300px tall, keeping aspect ratio |
| width=400&height=400 | Forces exact 400x400 dimensions |
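The Python sketch below builds a screenshot URL with the resize parameters above and predicts the resulting image size. Only the query parameter names come from the table. The scaling arithmetic (scale factor, then round to the nearest pixel) is an assumption about how the server resizes. The helper names screenshot_url and expected_size are hypothetical.

```python
from urllib.parse import urlencode


def screenshot_url(base_url: str, target: str = "browser", **resize: int) -> str:
    """Return e.g. http://host:8080/screenshot/browser?whLargest=512."""
    if target not in ("browser", "desktop"):
        raise ValueError("target must be 'browser' or 'desktop'")
    url = f"{base_url.rstrip('/')}/screenshot/{target}"
    return f"{url}?{urlencode(resize)}" if resize else url


def expected_size(w: int, h: int, whLargest=None, width=None, height=None) -> tuple:
    """Predict output dimensions for each resize mode in the table."""
    if width and height:
        return width, height                # forced exact size, aspect ratio ignored
    if whLargest:
        scale = whLargest / max(w, h)       # largest side becomes whLargest
    elif width:
        scale = width / w
    elif height:
        scale = height / h
    else:
        return w, h                         # no resize requested
    return round(w * scale), round(h * scale)
```

On the default 1920x1080 display, whLargest=512 should yield a 512x288 image, a small fraction of the full-resolution pixel count.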
GET /screenshot/desktop
Captures the entire virtual desktop (including window chrome, taskbar, etc.) using scrot. Same resize parameters as above. Useful when you need to see things outside the browser viewport.
curl -s "$STEALTHY_AUTO_BROWSE_URL/screenshot/desktop?whLargest=512" -o desktop.png
Page Inspection
get_interactive_elements
Scans the page and returns every interactive element (buttons, links, inputs, selects, textareas, etc.) with their viewport coordinates. This is how you find what to click and where.
{"action": "get_interactive_elements"}
{"action": "get_interactive_elements", "visible_only": true}
Parameters:
- visible_only (optional, default true): Only return elements that are currently visible on screen.
Response data:
{
  "count": 5,
  "elements": [
    {
      "i": 0,
      "tag": "button",
      "id": "submit-btn",
      "text": "Submit",
      "selector": "#submit-btn",
      "x": 400,
      "y": 250,
      "w": 120,
      "h": 40,
      "visible": true
    },
    {
      "i": 1,
      "tag": "input",
      "id": null,
      "text": "",
      "selector": "input[name='email']",
      "x": 300,
      "y": 180,
      "w": 250,
      "h": 35,
      "visible": true
    }
  ]
}
The x, y values are the center of the element; pass these directly to system_click. The selector can be used with Playwright actions like click or fill. The w, h values give the element dimensions.
This is your primary tool for understanding what you can interact with on a page. Call this before clicking anything.
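Picking a target out of the get_interactive_elements response is mostly filtering. The Python sketch below finds the first visible element whose text contains a given string, optionally restricted to one tag. It then builds either a system_click payload at the element's center (stealthy) or a Playwright click by selector (detectable). find_element and click_payload are hypothetical helper names.

```python
from typing import Optional


def find_element(data: dict, text: Optional[str] = None,
                 tag: Optional[str] = None) -> Optional[dict]:
    """First visible element whose text contains `text` (case-insensitive) and whose tag matches."""
    for el in data.get("elements", []):
        if not el.get("visible", True):
            continue
        if tag and el.get("tag") != tag:
            continue
        if text and text.lower() not in (el.get("text") or "").lower():
            continue
        return el
    return None


def click_payload(el: dict, stealth: bool = True) -> dict:
    """system_click at the element center (OS-level) or click by selector (Playwright)."""
    if stealth:
        return {"action": "system_click", "x": el["x"], "y": el["y"]}
    return {"action": "click", "selector": el["selector"]}
```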
get_text
Returns all visible text content of the page body. Text is truncated to 10,000 characters.
{"action": "get_text"}
Response data: {"text": "Page title\nSome content here...", "length": 1234}
This is usually the first thing to call after navigating โ it tells you what's on the page without needing a screenshot.
get_html
Returns the full HTML source of the current page.
{"action": "get_html"}
Response data: {"html": "<!DOCTYPE html>...", "length": 45678}
Use when get_text doesn't give enough structure to understand the page layout, or when you need to find specific elements in the DOM.
eval
Executes arbitrary JavaScript in the page context and returns the result. The expression is evaluated via page.evaluate().
{"action": "eval", "expression": "document.title"}
{"action": "eval", "expression": "document.querySelectorAll('a').length"}
{"action": "eval", "expression": "JSON.stringify(performance.timing)"}
Parameters:
- expression (required): JavaScript expression to evaluate. Must return a JSON-serializable value.
Response data: {"result": "Example Domain"}; the result is whatever the expression returns.
Wait Conditions
Use these instead of sleep to wait for page content. They're more reliable because they wait for the exact condition rather than an arbitrary time.
wait_for_element
Waits for an element matching a CSS selector or XPath to reach a certain state (visible, hidden, attached to DOM, detached).
{"action": "wait_for_element", "selector": "#results", "timeout": 10}
{"action": "wait_for_element", "selector": "xpath=//div[@class='loaded']", "timeout": 15}
{"action": "wait_for_element", "selector": ".spinner", "state": "hidden", "timeout": 10}
Parameters:
- selector (required): CSS selector or XPath (prefix with xpath=).
- state (optional, default "visible"): What state to wait for. Options: "visible" (rendered and not hidden), "hidden" (not visible), "attached" (in DOM regardless of visibility), "detached" (removed from DOM).
- timeout (optional, default 30): Max wait time in seconds. Throws an error if exceeded.
Response data: {"selector": "#results", "state": "visible"}
wait_for_text
Waits for specific text to appear anywhere in the page body.
{"action": "wait_for_text", "text": "Search results", "timeout": 10}
Parameters:
- text (required): Exact text to look for (substring match on document.body.innerText).
- timeout (optional, default 30): Max wait time in seconds.
Response data: {"text": "Search results", "found": true}
wait_for_url
Waits for the page URL to match a pattern. Useful after form submissions or redirects.
{"action": "wait_for_url", "url": "**/dashboard", "timeout": 10}
{"action": "wait_for_url", "url": "https://example.com/success*", "timeout": 15}
Parameters:
- url (required): URL pattern to match. Supports * (any characters except /) and ** (any characters including /) glob patterns. Can also be a full URL for an exact match.
- timeout (optional, default 30): Max wait time in seconds.
Response data: {"url": "https://example.com/dashboard"}
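The two wildcards can be modeled as a regular expression: * becomes [^/]* and ** becomes .*. The Python sketch below models only the rules stated here; Playwright's own glob matcher may accept additional syntax. It is useful for checking a pattern offline before sending it. The names glob_to_regex and url_matches are hypothetical.

```python
import re


def glob_to_regex(pattern: str) -> "re.Pattern[str]":
    """Translate ** (any chars incl. /) and * (any chars except /); escape the rest."""
    out, i = [], 0
    while i < len(pattern):
        if pattern.startswith("**", i):
            out.append(".*")
            i += 2
        elif pattern[i] == "*":
            out.append("[^/]*")
            i += 1
        else:
            out.append(re.escape(pattern[i]))
            i += 1
    return re.compile("".join(out))


def url_matches(pattern: str, url: str) -> bool:
    """Whole-URL match; a pattern without wildcards is an exact comparison."""
    return glob_to_regex(pattern).fullmatch(url) is not None
```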
wait_for_network_idle
Waits until there are no network requests in flight for 500ms. Useful for pages that load content dynamically after the initial page load.
{"action": "wait_for_network_idle", "timeout": 30}
Parameters:
- timeout (optional, default 30): Max wait time in seconds.
Response data: {"idle": true}
Tab Management
The browser can have multiple tabs open. One tab is "active" at a time, and all actions operate on the active tab.
list_tabs
Returns all open tabs with their URLs and which one is active.
{"action": "list_tabs"}
Response data:
{
  "count": 2,
  "tabs": [
    {"index": 0, "url": "https://example.com/", "active": false},
    {"index": 1, "url": "https://other.com/", "active": true}
  ]
}
new_tab
Opens a new browser tab. Optionally navigates it to a URL. The new tab becomes the active tab.
{"action": "new_tab"}
{"action": "new_tab", "url": "https://example.com"}
Parameters:
- url (optional): URL to navigate to in the new tab.
- wait_until (optional, default "domcontentloaded"): Same as goto.
Response data: {"index": 1, "url": "https://example.com/"}
switch_tab
Switches the active tab by index (0-based). All subsequent actions will operate on this tab.
{"action": "switch_tab", "index": 0}
Parameters:
- index (required): Tab index from list_tabs.
Response data: {"index": 0, "url": "https://example.com/"}
close_tab
Closes a tab. After closing, the last remaining tab becomes active.
{"action": "close_tab"}
{"action": "close_tab", "index": 1}
Parameters:
- index (optional): Tab index to close. If omitted, closes the currently active tab.
Response data: {"closed": true, "remaining": 1}
Dialog Handling
Browsers have modal dialogs (alert, confirm, prompt). By default, dialogs are auto-accepted (clicks OK). Use handle_dialog if you need to dismiss a dialog or provide text for a prompt.
handle_dialog
Call BEFORE the action that triggers the dialog if you want to dismiss it or provide prompt text. If you don't call this, the dialog is auto-accepted (clicks OK).
{"action": "handle_dialog", "accept": true}
{"action": "handle_dialog", "accept": false}
{"action": "handle_dialog", "accept": true, "text": "my response"}
Parameters:
- accept (optional, default true): true clicks OK/Accept, false clicks Cancel/Dismiss.
- text (optional): Response text for prompt dialogs. Ignored for alert/confirm.
Response data: {"configured": {"accept": true, "text": null}}
Example (handling a confirm dialog):
API=$STEALTHY_AUTO_BROWSE_URL
# Step 1: Tell the browser to accept the next dialog
curl -X POST $API -H 'Content-Type: application/json' -d '{"action": "handle_dialog", "accept": true}'
# Step 2: Now click the button that triggers the confirm
curl -X POST $API -H 'Content-Type: application/json' -d '{"action": "system_click", "x": 300, "y": 200}'
get_last_dialog
Returns information about the most recent dialog that appeared.
{"action": "get_last_dialog"}
Response data:
{
  "dialog": {
    "type": "confirm",
    "message": "Are you sure you want to delete this?",
    "default_value": "",
    "buttons": ["ok", "cancel"]
  }
}
Returns {"dialog": null} if no dialog has appeared yet. The type field is one of: "alert", "confirm", "prompt", "beforeunload".
Cookies
get_cookies
Returns all cookies for the browser context, or cookies for specific URLs.
{"action": "get_cookies"}
{"action": "get_cookies", "urls": ["https://example.com"]}
Parameters:
- urls (optional): Array of URLs to filter cookies by. If omitted, returns all cookies.
Response data:
{
  "count": 3,
  "cookies": [
    {"name": "session", "value": "abc123", "domain": ".example.com", "path": "/", "httpOnly": true, "secure": true, ...}
  ]
}
set_cookie
Sets a cookie in the browser context.
{"action": "set_cookie", "name": "session", "value": "abc123", "url": "https://example.com"}
{"action": "set_cookie", "name": "pref", "value": "dark", "domain": ".example.com", "path": "/", "httpOnly": false, "secure": true}
Parameters: Any standard cookie fields: name, value, url, domain, path, httpOnly, secure, sameSite, expires. At minimum you need name, value, and either url or domain.
Response data: {"set": "session"}
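Together, get_cookies and set_cookie can carry a session from one container to another without mounting /userdata. The Python sketch below saves cookies to a JSON file and replays them one set_cookie at a time. call is any function that sends one action and returns its data field. save_cookies, restore_cookies and COOKIE_FIELDS are hypothetical names. The field whitelist mirrors the fields listed above. Dropping a negative expires (conventionally -1 for session cookies) is a precaution, not documented behavior.

```python
import json
from typing import Callable, Iterable, List

# Fields accepted by set_cookie according to the action reference.
COOKIE_FIELDS = ("name", "value", "url", "domain", "path",
                 "httpOnly", "secure", "sameSite", "expires")


def save_cookies(call: Callable[..., dict], path: str) -> int:
    """Write every cookie in the browser context to a JSON file; return the count."""
    cookies = call("get_cookies")["cookies"]
    with open(path, "w", encoding="utf-8") as fh:
        json.dump(cookies, fh)
    return len(cookies)


def restore_cookies(call: Callable[..., dict], cookies: Iterable[dict]) -> List[str]:
    """Replay saved cookies through set_cookie; return the names that were set."""
    restored = []
    for c in cookies:
        fields = {k: c[k] for k in COOKIE_FIELDS if k in c}
        if fields.get("expires", 0) < 0:
            fields.pop("expires")  # treat -1 as "session cookie": send no expiry
        restored.append(call("set_cookie", **fields)["set"])
    return restored
```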
delete_cookies
Clears all cookies from the browser context.
{"action": "delete_cookies"}
Response data: {"cleared": true}
Storage
Access the page's localStorage and sessionStorage. These are per-origin, so you must be on a page of the right origin for its storage to be accessible.
get_storage
Returns all items from localStorage or sessionStorage as a key-value object.
{"action": "get_storage", "type": "local"}
{"action": "get_storage", "type": "session"}
Parameters:
- type (optional, default "local"): "local" for localStorage, "session" for sessionStorage.
Response data: {"items": {"theme": "dark", "lang": "en"}, "type": "local"}
set_storage
Sets a single key-value pair in localStorage or sessionStorage.
{"action": "set_storage", "type": "local", "key": "theme", "value": "dark"}
Parameters:
- type (optional, default "local"): "local" or "session".
- key (required): Storage key.
- value (required): Storage value (string).
Response data: {"set": "theme", "type": "local"}
clear_storage
Clears all items from localStorage or sessionStorage.
{"action": "clear_storage", "type": "local"}
{"action": "clear_storage", "type": "session"}
Response data: {"cleared": "local"}
Downloads
The browser automatically tracks file downloads triggered by page interactions (clicking download links, form submissions that return files, etc.).
get_last_download
Returns information about the most recently downloaded file.
{"action": "get_last_download"}
Response data:
{
  "download": {
    "url": "https://example.com/file.pdf",
    "filename": "file.pdf",
    "path": "/tmp/playwright-downloads/abc123/file.pdf"
  }
}
Returns {"download": null} if nothing has been downloaded yet. The path is the local path inside the container where the file was saved. The filename is what the server suggested as the download name.
Uploads
upload_file
Programmatically sets a file on an <input type="file"> element without opening the OS file picker. The file must exist inside the container โ use docker cp to copy files in if needed.
{"action": "upload_file", "selector": "#file-input", "file_path": "/tmp/document.pdf"}
Parameters:
- selector (required): CSS selector of the file input element.
- file_path (required): Absolute path to the file inside the container.
Response data: {"selector": "#file-input", "file": "document.pdf", "size": 12345}
Note: After setting the file, you still need to submit the form (click the submit button) for the upload to actually happen.
Network Logging
Capture all HTTP requests and responses the page makes. Useful for debugging, finding API endpoints the page calls, or verifying that certain resources loaded.
enable_network_log
Starts recording all HTTP requests and responses from the active page.
{"action": "enable_network_log"}
Response data: {"enabled": true}
disable_network_log
Stops recording network activity. Already-captured entries remain.
{"action": "disable_network_log"}
Response data: {"enabled": false}
get_network_log
Returns all captured network entries since logging was enabled (or last cleared).
{"action": "get_network_log"}
Response data:
{
  "count": 4,
  "log": [
    {"type": "request", "url": "https://api.example.com/data", "method": "GET", "resource_type": "fetch", "timestamp": 1234567890.123},
    {"type": "response", "url": "https://api.example.com/data", "status": 200, "timestamp": 1234567890.456},
    {"type": "request", "url": "https://cdn.example.com/style.css", "method": "GET", "resource_type": "stylesheet", "timestamp": 1234567890.789},
    {"type": "response", "url": "https://cdn.example.com/style.css", "status": 200, "timestamp": 1234567890.999}
  ]
}
Each entry is either a "request" or "response". Requests include method and resource_type (fetch, document, stylesheet, script, image, etc.). Responses include status code.
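A common use of the log is discovering the JSON endpoints a page calls. The Python sketch below pairs each fetch or xhr request with the next logged response for the same URL and returns (method, url, status). api_calls is a hypothetical helper. Pairing by URL in arrival order is an assumption, and it can mislabel concurrent requests to the same URL.

```python
from typing import Dict, Iterable, List, Tuple


def api_calls(log: Iterable[dict], types=("fetch", "xhr")) -> List[Tuple]:
    """(method, url, status) per fetch/xhr request; status is None if no response was logged."""
    pending: Dict[str, List[int]] = {}   # url -> indexes into `calls` awaiting a response
    calls: List[list] = []
    for entry in log:
        if entry["type"] == "request" and entry.get("resource_type") in types:
            pending.setdefault(entry["url"], []).append(len(calls))
            calls.append([entry["method"], entry["url"], None])
        elif entry["type"] == "response" and pending.get(entry["url"]):
            calls[pending[entry["url"]].pop(0)][2] = entry["status"]
    return [tuple(c) for c in calls]
```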
clear_network_log
Deletes all captured network entries but keeps logging enabled if it was on.
{"action": "clear_network_log"}
Response data: {"cleared": true}
Scrolling
scroll_to_bottom
Scrolls the entire page from top to bottom using JavaScript window.scrollBy(). Scrolls one viewport height at a time with a fixed delay between scrolls. When it reaches the bottom (scroll position stops changing), it scrolls back to the top. Useful for triggering lazy-loaded content.
{"action": "scroll_to_bottom"}
{"action": "scroll_to_bottom", "delay": 0.6}
Parameters:
- delay (optional, default 0.4): Seconds to wait between each scroll step.
Response data: {"scrolled": "bottom"}
scroll_to_bottom_humanized
Same as scroll_to_bottom but uses real OS-level mouse wheel scrolling (via PyAutoGUI) with randomized scroll amounts and jittered delays to look like a human scrolling. Undetectable by behavioral analysis.
{"action": "scroll_to_bottom_humanized"}
{"action": "scroll_to_bottom_humanized", "min_clicks": 3, "max_clicks": 8, "delay": 0.7}
Parameters:
- min_clicks (optional, default 2): Minimum mouse wheel clicks per scroll step.
- max_clicks (optional, default 6): Maximum mouse wheel clicks per scroll step. A random value between min and max is chosen each time.
- delay (optional, default 0.5): Base delay between scroll steps in seconds. The actual delay is jittered by ±30%.
Response data: {"scrolled": "bottom_humanized"}
Display
calibrate
Recalculates the mapping between viewport coordinates (what get_interactive_elements returns) and screen coordinates (what PyAutoGUI uses). The browser has window chrome (title bar, address bar) that offsets the viewport from the screen origin.
{"action": "calibrate"}
Response data: {"window_offset": {"x": 0, "y": 74}}
When to call this: After entering/exiting fullscreen, after the browser window is resized, or if system_click coordinates seem off. The offset is auto-calculated at startup, so you rarely need this.
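The mapping that calibrate maintains can be pictured as a plain translation: screen position = viewport position + window_offset. The Python sketch below assumes exactly that additive model with no scaling. This is consistent with the offsets shown in the responses here but not confirmed by the source. It also adds a bounds check against the get_resolution output. viewport_to_screen is a hypothetical helper.

```python
from typing import Tuple


def viewport_to_screen(x: int, y: int, offset: dict, resolution: dict) -> Tuple[int, int]:
    """Translate viewport coordinates to screen coordinates (pure additive offset assumed)."""
    sx, sy = x + offset["x"], y + offset["y"]
    if not (0 <= sx < resolution["width"] and 0 <= sy < resolution["height"]):
        raise ValueError(f"({sx}, {sy}) is off-screen; scroll or recalibrate first")
    return sx, sy
```

Under this model, a stale offset explains why clicks can miss after toggling fullscreen. The browser chrome disappears, so the true y offset shrinks, but the old offset is still added until calibrate runs.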
get_resolution
Returns the virtual display resolution (from the XVFB_RESOLUTION environment variable).
{"action": "get_resolution"}
Response data: {"width": 1920, "height": 1080}
enter_fullscreen / exit_fullscreen
Toggles browser fullscreen mode (hides address bar and window chrome). In fullscreen, the viewport takes up the entire screen, so coordinates map differently.
{"action": "enter_fullscreen"}
{"action": "exit_fullscreen"}
Response data: {"fullscreen": true, "changed": true}. changed is false if the browser was already in the requested state.
Important: Call calibrate after entering/exiting fullscreen to update the coordinate mapping.
Utility
ping
Health check that returns the current page URL. Use to verify the API is responding and the browser is alive.
{"action": "ping"}
Response data: {"message": "pong", "url": "https://example.com/"}
sleep
Pauses execution for a specified duration. Prefer wait_for_element or wait_for_text when waiting for page content; use sleep only for fixed timing needs.
{"action": "sleep", "duration": 2}
Parameters:
- duration (optional, default 1): Seconds to sleep.
Response data: {"slept": 2}
close
Shuts down the browser. The container will stop after this.
{"action": "close"}
Response data: {"message": "closing"}
State Endpoints (GET)
GET /state
Returns the current browser state.
curl -s "$STEALTHY_AUTO_BROWSE_URL/state"
Response:
{
  "status": "ready",
  "url": "https://example.com/",
  "title": "Example Domain",
  "window_offset": {"x": 0, "y": 74}
}
GET /health
Simple health check. Returns ok as plain text when the API is ready.
curl -s "$STEALTHY_AUTO_BROWSE_URL/health"
Container Options
# Custom display resolution
docker run -d -p 8080:8080 -e XVFB_RESOLUTION=1280x720 psyb0t/stealthy-auto-browse
# Match timezone to your IP's geographic location (important for stealth: a mismatched
# timezone is a common bot detection signal)
docker run -d -p 8080:8080 -e TZ=Europe/Bucharest psyb0t/stealthy-auto-browse
# Route browser traffic through an HTTP proxy
docker run -d -p 8080:8080 -e PROXY_URL=http://user:pass@proxy:8888 psyb0t/stealthy-auto-browse
# Persistent browser profile โ cookies, sessions, and fingerprint survive container restarts
docker run -d -p 8080:8080 -v ./profile:/userdata psyb0t/stealthy-auto-browse
# Open a URL automatically on startup
docker run -d -p 8080:8080 psyb0t/stealthy-auto-browse https://example.com
Page Loaders (URL-Triggered Automation)
Page loaders are like Greasemonkey/Tampermonkey userscripts but for the HTTP API. You define a set of actions that automatically run whenever the browser navigates to a matching URL. Instead of manually sending a sequence of commands every time you visit a site, you write it once as a YAML file and the container handles it.
This is useful for things like: removing cookie popups, dismissing overlays, waiting for dynamic content, cleaning up pages before scraping, or any repetitive setup you'd otherwise do manually every time.
How They Work
- You create YAML files that define URL patterns and a list of steps
- Mount those files into the container at /loaders
- Whenever goto navigates to a URL that matches a loader's pattern, the loader's steps run automatically instead of the default navigation
The steps are the exact same actions as the HTTP API. Every action you can send via POST / (goto, eval, click, system_click, sleep, scroll, wait_for_element, etc.) works as a loader step. Same names, same parameters.
Setup
docker run -d -p 8080:8080 -p 5900:5900 \
-v ./my-loaders:/loaders \
psyb0t/stealthy-auto-browse
Loader Format
name: Human-readable name for this loader
match:
  domain: example.com        # Exact hostname match (www. is stripped automatically)
  path_prefix: /articles     # URL path must start with this
  regex: "article/\\d+"      # Full URL must match this regex
steps:
  - action: goto             # Same actions as the HTTP API
    url: "${url}"            # ${url} is replaced with the original URL
    wait_until: networkidle
  - action: eval
    expression: "document.querySelector('.cookie-banner')?.remove()"
  - action: wait_for_element
    selector: "#main-content"
    timeout: 10
Match Rules
All match fields are optional, but at least one is required. If you specify multiple fields, all of them must match for the loader to trigger:
- domain: Exact hostname. www. is stripped from both sides before comparing, so domain: example.com matches www.example.com too.
- path_prefix: The URL path must start with this string. path_prefix: /blog matches /blog, /blog/post-1, /blog/archive, etc.
- regex: The full URL is tested against this regular expression.
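These rules can be modeled in a few lines of Python. This sketch follows the rules as documented and is not the container's code. It assumes regex uses a search anywhere in the URL; otherwise the example pattern article/\d+ could never match a URL beginning with https://. It also relies on urlparse lowercasing the hostname. loader_matches is a hypothetical helper.

```python
import re
from urllib.parse import urlparse


def _strip_www(host: str) -> str:
    return host[4:] if host.startswith("www.") else host


def loader_matches(match: dict, url: str) -> bool:
    """True only if every match field that is present agrees with `url`."""
    if not any(k in match for k in ("domain", "path_prefix", "regex")):
        raise ValueError("a loader needs at least one match field")
    parts = urlparse(url)
    if "domain" in match:
        if _strip_www(parts.hostname or "") != _strip_www(match["domain"].lower()):
            return False
    if "path_prefix" in match and not (parts.path or "/").startswith(match["path_prefix"]):
        return False
    if "regex" in match and re.search(match["regex"], url) is None:
        return False
    return True
```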
The ${url} Placeholder
In any string value within a step, ${url} is replaced with the original URL that was passed to goto. This lets you navigate to the URL with custom wait settings, or pass it to JavaScript:
steps:
  - action: goto
    url: "${url}"
    wait_until: networkidle
  - action: eval
    expression: "console.log('Loaded:', '${url}')"
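Substitution reaches every string value in a step, including nested ones. Below is a Python sketch of that rule. It assumes a plain text replacement. Under that assumption the URL is not escaped, so a quote character inside the URL would break a JavaScript string literal like the one in the eval step above. expand is a hypothetical helper.

```python
from typing import Any


def expand(value: Any, url: str) -> Any:
    """Recursively replace ${url} in every string inside a step definition."""
    if isinstance(value, str):
        return value.replace("${url}", url)
    if isinstance(value, list):
        return [expand(v, url) for v in value]
    if isinstance(value, dict):
        return {k: expand(v, url) for k, v in value.items()}
    return value  # numbers, booleans and None pass through unchanged
```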
Practical Example: Clean Scraping
Say you're scraping a news site that has cookie popups, newsletter modals, and lazy-loaded content. Without a loader, you'd send 5+ commands after every goto. With a loader:
# loaders/news_site.yaml
name: News Site Cleanup
match:
  domain: news-site.com
steps:
  # Navigate with full network wait so everything loads
  - action: goto
    url: "${url}"
    wait_until: networkidle
  # Wait for the main content to be there
  - action: wait_for_element
    selector: "article"
    timeout: 10
  # Kill the cookie popup
  - action: eval
    expression: "document.querySelector('.cookie-consent')?.remove()"
  # Kill the newsletter modal
  - action: eval
    expression: "document.querySelector('.newsletter-overlay')?.remove()"
  # Scroll to trigger lazy-loaded images
  - action: scroll_to_bottom
    delay: 0.3
  # Small pause for everything to settle
  - action: sleep
    duration: 1
Now when you goto any URL on news-site.com, all of this happens automatically. Your response includes "loader": "News Site Cleanup" so you know it triggered.
Response When a Loader Triggers
{
  "success": true,
  "data": {
    "loader": "News Site Cleanup",
    "steps_executed": 6,
    "last_result": { "success": true, "timestamp": 1234567890.456, "data": { "slept": 1 } }
  }
}
Pre-installed Extensions
The browser comes with these extensions pre-installed:
- uBlock Origin: Ad and tracker blocking
- LocalCDN: Serves common CDN resources locally to prevent tracking
- ClearURLs: Strips tracking parameters from URLs
- Consent-O-Matic: Automatically handles cookie consent popups (clicks "reject all" or minimal consent)
Example: Full Login Flow (Undetectable)
API=$STEALTHY_AUTO_BROWSE_URL
# Navigate to login page
curl -s -X POST $API -H 'Content-Type: application/json' \
-d '{"action": "goto", "url": "https://example.com/login"}'
# See what's on the page
curl -s -X POST $API -H 'Content-Type: application/json' \
-d '{"action": "get_text"}'
# Find all interactive elements and their coordinates
curl -s -X POST $API -H 'Content-Type: application/json' \
-d '{"action": "get_interactive_elements"}'
# Click the email field (coordinates from get_interactive_elements)
curl -s -X POST $API -H 'Content-Type: application/json' \
-d '{"action": "system_click", "x": 400, "y": 200}'
# Type email with human-like keystrokes
curl -s -X POST $API -H 'Content-Type: application/json' \
-d '{"action": "system_type", "text": "user@example.com"}'
# Tab to password field
curl -s -X POST $API -H 'Content-Type: application/json' \
-d '{"action": "send_key", "key": "tab"}'
# Type password
curl -s -X POST $API -H 'Content-Type: application/json' \
-d '{"action": "system_type", "text": "secretpassword"}'
# Press Enter to submit
curl -s -X POST $API -H 'Content-Type: application/json' \
-d '{"action": "send_key", "key": "enter"}'
# Wait for redirect to dashboard
curl -s -X POST $API -H 'Content-Type: application/json' \
-d '{"action": "wait_for_url", "url": "**/dashboard", "timeout": 15}'
# Verify we're logged in
curl -s -X POST $API -H 'Content-Type: application/json' \
-d '{"action": "get_text"}'
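Here is the same flow in Python, as a sketch. call is any function that posts one action and returns its data field. The randomized 0.5-1.5 s pauses follow the realism tip in Tips. The function name login, and the assumption that the first input element is the email field, are hypothetical; adapt both to the page in front of you.

```python
import random
import time
from typing import Callable


def login(call: Callable[..., dict], url: str, email: str, password: str,
          pause: Callable[[float], None] = time.sleep) -> str:
    """Log in using only OS-level input; return the page text after the redirect."""
    call("goto", url=url)
    elements = call("get_interactive_elements")["elements"]
    # Assumption: the first <input> on the page is the email field.
    email_field = next((e for e in elements if e["tag"] == "input"), None)
    if email_field is None:
        raise LookupError("no <input> found on the login page")
    call("system_click", x=email_field["x"], y=email_field["y"])
    pause(random.uniform(0.5, 1.5))
    call("system_type", text=email)
    call("send_key", key="tab")
    pause(random.uniform(0.5, 1.5))
    call("system_type", text=password)
    call("send_key", key="enter")
    call("wait_for_url", url="**/dashboard", timeout=15)
    return call("get_text")["text"]
```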
Tips
- Always call get_interactive_elements before clicking: don't guess coordinates
- Use system methods for stealth: system_click, system_type and send_key are undetectable
- Use get_text first, screenshots second: text is faster and smaller
- Match TZ to your IP location: timezone mismatch is a common bot detection signal
- Resize screenshots with ?whLargest=512: full resolution is unnecessarily large
- Mount /userdata for persistent sessions: cookies, fingerprint and profile survive restarts
- Use wait conditions instead of sleep: wait_for_element, wait_for_text, wait_for_url
- Call handle_dialog BEFORE the action that triggers the dialog if you need to dismiss it or provide prompt text (dialogs are auto-accepted otherwise)
- Call calibrate after fullscreen changes: the coordinate mapping shifts
- Add slight delays between actions for realism: sleep with 0.5-1.5 s between clicks looks more human