Browser Agents
RadarOS supports autonomous browser automation through the BrowserAgent class in @radaros/browser. The agent uses a vision-capable LLM (such as GPT-4o or Gemini) to interpret screenshots of a browser and decide what actions to take (clicking, typing, scrolling, navigating) until the task is complete.
Browser agents use Playwright under the hood. After installing the package, run npx playwright install chromium to download the browser binary.
Installation
Quick Start
How It Works
Launch browser
Playwright opens a Chromium browser (headless by default) and navigates to the start URL.
Take screenshot
A PNG screenshot of the viewport is captured. If useDOM is enabled, a simplified accessibility tree is also extracted.
Send to vision model
The screenshot (and DOM tree if enabled) and task description are sent to a vision-capable LLM.
Receive action
The model returns a structured JSON action: click at coordinates, type text, scroll, navigate, etc.
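The steps above amount to an observe, decide, act loop. The following is a minimal sketch of that loop with stubbed dependencies instead of a real browser and model. Every name in it (runLoop, LoopDeps, Observation, maxSteps) is hypothetical and chosen for illustration; it is not the @radaros/browser API, and the action set is reduced to three cases to keep the example short.

```typescript
// Hypothetical types for illustration only; not the @radaros/browser API.
type Action =
  | { type: "click"; x: number; y: number }
  | { type: "done"; result: string }
  | { type: "fail"; reason: string };

interface Observation {
  screenshot: Uint8Array; // PNG bytes of the viewport
  url: string;
}

interface LoopDeps {
  observe: () => Promise<Observation>; // take a screenshot
  decide: (task: string, obs: Observation) => Promise<Action>; // ask the vision model
  act: (action: Action) => Promise<void>; // execute the action in the browser
}

interface LoopResult {
  success: boolean;
  result: string;
  steps: Action[];
}

async function runLoop(task: string, deps: LoopDeps, maxSteps: number): Promise<LoopResult> {
  const steps: Action[] = [];
  for (let i = 0; i < maxSteps; i++) {
    const obs = await deps.observe();
    const action = await deps.decide(task, obs);
    steps.push(action);
    // Terminal actions end the loop without touching the browser.
    if (action.type === "done") return { success: true, result: action.result, steps };
    if (action.type === "fail") return { success: false, result: action.reason, steps };
    await deps.act(action);
  }
  return { success: false, result: `gave up after ${maxSteps} steps`, steps };
}

// Stub dependencies: the "model" clicks twice, then reports completion.
let clicks = 0;
const stubDeps: LoopDeps = {
  observe: async () => ({ screenshot: new Uint8Array(), url: "https://example.com" }),
  decide: async (): Promise<Action> =>
    clicks < 2 ? { type: "click", x: 10, y: 20 } : { type: "done", result: "finished" },
  act: async () => {
    clicks++;
  },
};
```

Passing the three dependencies in as functions keeps the loop testable without launching a browser or calling a model.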
BrowserAgentConfig
Name of the browser agent.
Vision-capable model. Must support image inputs (e.g., openai("gpt-4o"), google("gemini-2.5-flash")).
Extra instructions appended to the system prompt. Use for task-specific guidance.
Maximum number of vision loop iterations before the agent gives up.
Run browser without a visible window. Set to false for debugging and demos.
Browser viewport size in pixels. The model sees screenshots at this resolution.
Initial URL to navigate to before starting the task.
Milliseconds to wait after each action for the page to settle.
Max consecutive identical actions before the agent auto-fails (loop detection).
Include a simplified DOM/accessibility tree alongside the screenshot. This hybrid approach gives the model both visual context and precise element coordinates for better targeting.
Path to a Playwright storageState JSON file. Restores cookies, localStorage, and sessionStorage from a previous session. Use this to maintain login state across runs.
Enable video recording of the browser session. Pass true for the default directory (./browser-videos) or { dir: "/path" } for a custom location.
Enable anti-bot-detection mode. Patches navigator.webdriver, spoofs plugins, languages, WebGL renderer, and more. Pass true for sensible defaults or a StealthConfig object for fine control (custom user-agent, locale, timezone, geolocation, proxy).
Simulate human-like behavior: variable typing speed, jittered click coordinates, Bézier mouse movement curves, random micro-pauses. Pass true for defaults or a HumanizeConfig for fine control.
Secure credential store. The LLM only sees named placeholders; real values are injected at execution time and scrubbed from all output.
Logging level: "debug", "info", "warn", "error", "silent".
run()
Natural language description of what the agent should do in the browser.
Override the config’s startUrl for this run.
Per-run API key override for the vision model.
Path to save cookies/auth state after the run completes. Load it back on the next run via storageState in config.
BrowserRunOutput
| Field | Type | Description |
|---|---|---|
| result | string | Final text result or failure reason |
| success | boolean | Whether the task completed successfully |
| steps | BrowserStep[] | Full action history with screenshots |
| finalUrl | string | URL at completion |
| finalScreenshot | Buffer | Last screenshot (PNG) |
| durationMs | number | Total time taken |
| videoPath | string? | Video file path (if recordVideo was enabled) |
Available Actions
The vision model can choose from these actions at each step:
| Action | Parameters | Description |
|---|---|---|
| click | x, y, description | Click at viewport coordinates |
| type | text, x?, y? | Type text (optionally click a position first) |
| scroll | direction, amount? | Scroll up or down |
| navigate | url | Go to a specific URL |
| back | — | Go back to the previous page |
| wait | ms | Wait for page to load |
| done | result | Task is complete |
| fail | reason | Task cannot be completed |
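The action table maps naturally onto a discriminated union. The sketch below shows one way to type and validate a model response before executing it. The parameter names come from the table, but the JSON shape is an assumption: the discriminator key (action here) and the names BrowserAction and parseAction are hypothetical, since the exact wire format is not documented in this section.

```typescript
// Parameter names mirror the table above. The "action" discriminator key and
// all type/function names are assumptions made for this sketch.
type BrowserAction =
  | { action: "click"; x: number; y: number; description: string }
  | { action: "type"; text: string; x?: number; y?: number }
  | { action: "scroll"; direction: "up" | "down"; amount?: number }
  | { action: "navigate"; url: string }
  | { action: "back" }
  | { action: "wait"; ms: number }
  | { action: "done"; result: string }
  | { action: "fail"; reason: string };

// Parse a JSON string and reject anything that does not match a known action shape.
function parseAction(raw: string): BrowserAction {
  const parsed: unknown = JSON.parse(raw);
  if (typeof parsed !== "object" || parsed === null) {
    throw new Error(`invalid action: ${raw}`);
  }
  const v = parsed as Record<string, unknown>;
  const num = (k: string) => typeof v[k] === "number";
  const str = (k: string) => typeof v[k] === "string";
  const optNum = (k: string) => v[k] === undefined || num(k);

  let ok = false;
  switch (v["action"]) {
    case "click": ok = num("x") && num("y") && str("description"); break;
    case "type": ok = str("text") && optNum("x") && optNum("y"); break;
    case "scroll":
      ok = (v["direction"] === "up" || v["direction"] === "down") && optNum("amount");
      break;
    case "navigate": ok = str("url"); break;
    case "back": ok = true; break;
    case "wait": ok = num("ms"); break;
    case "done": ok = str("result"); break;
    case "fail": ok = str("reason"); break;
  }
  if (!ok) throw new Error(`invalid action: ${raw}`);
  return v as unknown as BrowserAction;
}
```

Validating before execution means a malformed model response surfaces as an explicit error rather than as an undefined coordinate passed to the browser.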
DOM Extraction (Hybrid Mode)
By default, the agent relies purely on vision: the model interprets screenshots to locate elements. Enabling useDOM: true adds a hybrid mode where a simplified accessibility tree is also extracted and sent alongside the screenshot.
extractDOM() can also be called directly on a BrowserProvider.
Cookie & Auth Persistence
Maintain login sessions across agent runs using Playwright’s storage state.
Stealth Mode (Anti-Detection)
Many websites detect and block headless browsers. Stealth mode patches common detection vectors so the browser appears as a normal user session.
What stealth patches
| Vector | What it does |
|---|---|
| navigator.webdriver | Removed (normally true in automation) |
| navigator.plugins | Spoofed with realistic Chrome plugins |
| navigator.languages | Set to ["en-US", "en"] |
| navigator.permissions | Notifications return "prompt" instead of "denied" |
| window.chrome.runtime | Stubbed to appear like a real Chrome extension API |
| WebGL renderer | Reports “Intel Iris OpenGL Engine” instead of “SwiftShader” |
| DOM markers | Removes cdc_ and __playwright attributes |
| Chrome launch flags | --disable-blink-features=AutomationControlled |
| User-Agent | Rotated from a pool of realistic Chrome/Safari strings |
Fine-grained StealthConfig
HumanizeConfig
Makes the browser behave like a real person: variable timing, imprecise clicks, curved mouse paths.
| Option | Default | Description |
|---|---|---|
| typingDelay | [40, 120] | Min/max ms delay between keystrokes |
| clickJitter | 3 | Random pixel offset added to click coordinates |
| actionDelay | [200, 800] | Random pause after each interaction |
| mouseMovement | true | Simulate smoothstep mouse curves to target |
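To make the options above concrete, here is a small sketch of how such values could be sampled. It is not the library's implementation: the helper names (sampleDelay, jitterClick, smoothstep, mousePath) are hypothetical, and uniform sampling and a straight-line path with smoothstep easing are assumptions made for illustration.

```typescript
// Hypothetical helpers illustrating the options in the table above.
type Range = [number, number];
type Rng = () => number; // returns a value in [0, 1), like Math.random
interface Point {
  x: number;
  y: number;
}

// Pick a delay in milliseconds from [min, max] (uniform sampling is an assumption).
function sampleDelay([min, max]: Range, rng: Rng = Math.random): number {
  return Math.round(min + rng() * (max - min));
}

// Offset a click target by at most `jitter` pixels on each axis.
function jitterClick(x: number, y: number, jitter: number, rng: Rng = Math.random): Point {
  const offset = () => Math.round((rng() * 2 - 1) * jitter);
  return { x: x + offset(), y: y + offset() };
}

// Smoothstep easing: 0 at t = 0, 1 at t = 1, with zero slope at both ends.
function smoothstep(t: number): number {
  return t * t * (3 - 2 * t);
}

// Points from `from` to `to` whose spacing eases in and out.
function mousePath(from: Point, to: Point, steps: number): Point[] {
  const points: Point[] = [];
  for (let i = 0; i <= steps; i++) {
    const s = smoothstep(i / steps);
    points.push({ x: from.x + (to.x - from.x) * s, y: from.y + (to.y - from.y) * s });
  }
  return points;
}
```

With the defaults from the table, sampleDelay([40, 120]) would give a per-keystroke delay and jitterClick(x, y, 3) a slightly offset click position. Accepting the random source as a parameter makes the helpers deterministic under test.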
Video Recording
Record the agent’s entire browser session as a video for debugging, auditing, or demos. The video file path is available as result.videoPath when the run completes.
Parallel Browsing (Multi-Tab)
BrowserProvider supports multiple tabs for advanced workflows:
Tab API
| Method | Returns | Description |
|---|---|---|
| newTab(url?) | string | Open a new tab, optionally navigate |
| switchTab(tabId) | void | Make a tab active |
| closeTab(tabId) | void | Close a tab (can’t close the last one) |
| listTabs() | TabInfo[] | List all open tabs with URL and active status |
| currentTabId | string | Get the active tab’s ID |
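The semantics in the table can be illustrated with a small in-memory model. This is a sketch, not BrowserProvider itself: the class name TabSet and the TabInfo field names (id, url, active) are hypothetical, and two behaviors are assumptions, namely that a newly opened tab becomes active and that closing the active tab falls back to the first remaining one.

```typescript
// In-memory model of the tab methods above; not the real BrowserProvider.
interface TabInfo {
  id: string;
  url: string;
  active: boolean;
}

class TabSet {
  private tabs = new Map<string, string>(); // tabId -> URL
  private nextId = 1;
  currentTabId: string;

  constructor(startUrl = "about:blank") {
    this.currentTabId = this.newTab(startUrl);
  }

  newTab(url = "about:blank"): string {
    const id = `tab-${this.nextId++}`;
    this.tabs.set(id, url);
    this.currentTabId = id; // assumption: a new tab becomes the active tab
    return id;
  }

  switchTab(tabId: string): void {
    if (!this.tabs.has(tabId)) throw new Error(`unknown tab: ${tabId}`);
    this.currentTabId = tabId;
  }

  closeTab(tabId: string): void {
    if (!this.tabs.has(tabId)) throw new Error(`unknown tab: ${tabId}`);
    if (this.tabs.size === 1) throw new Error("cannot close the last tab");
    this.tabs.delete(tabId);
    if (this.currentTabId === tabId) {
      // assumption: fall back to the first remaining tab
      this.currentTabId = Array.from(this.tabs.keys())[0];
    }
  }

  listTabs(): TabInfo[] {
    return Array.from(this.tabs.entries()).map(([id, url]) => ({
      id,
      url,
      active: id === this.currentTabId,
    }));
  }
}
```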
Browser Gateway (Socket.IO)
Stream browser agent execution over Socket.IO for live observation UIs, dashboards, or remote monitoring.
Client Usage
Gateway Events
| Direction | Event | Payload |
|---|---|---|
| Client → Server | browser.start | { agentName, task, startUrl?, apiKey? } |
| Client → Server | browser.stop | — |
| Server → Client | browser.started | { agentName, task } |
| Server → Client | browser.screenshot | { data: base64, mimeType } |
| Server → Client | browser.action | { action } |
| Server → Client | browser.step | { index, action, pageUrl, screenshot? } |
| Server → Client | browser.done | { result, success, finalUrl, durationMs, totalSteps, videoPath? } |
| Server → Client | browser.error | { error: string } |
| Server → Client | browser.stopped | — |
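On the client side, the server-to-client events in the table can be folded into a single view state. The sketch below covers a subset of those events and is independent of Socket.IO itself. The event names and payload fields come from the table; the names ServerEvent, RunView, and applyEvent, the wrapper shape { event, payload }, and typing action as unknown are assumptions made for this example.

```typescript
// Payload fields follow the Gateway Events table; the wrapper shape is hypothetical.
type ServerEvent =
  | { event: "browser.started"; payload: { agentName: string; task: string } }
  | { event: "browser.step"; payload: { index: number; action: unknown; pageUrl: string } }
  | {
      event: "browser.done";
      payload: {
        result: string;
        success: boolean;
        finalUrl: string;
        durationMs: number;
        totalSteps: number;
        videoPath?: string;
      };
    }
  | { event: "browser.error"; payload: { error: string } }
  | { event: "browser.stopped" };

interface RunView {
  status: "idle" | "running" | "done" | "failed" | "stopped";
  steps: number;
  lastUrl?: string;
  message?: string;
}

// Pure reducer: returns the next view state for one incoming event.
function applyEvent(view: RunView, e: ServerEvent): RunView {
  switch (e.event) {
    case "browser.started":
      return { status: "running", steps: 0 };
    case "browser.step":
      return { ...view, steps: view.steps + 1, lastUrl: e.payload.pageUrl };
    case "browser.done":
      return {
        status: e.payload.success ? "done" : "failed",
        steps: e.payload.totalSteps,
        lastUrl: e.payload.finalUrl,
        message: e.payload.result,
      };
    case "browser.error":
      return { ...view, status: "failed", message: e.payload.error };
    case "browser.stopped":
      return { ...view, status: "stopped" };
  }
}
```

In a real client, each socket event handler would call applyEvent with the received payload and re-render from the returned state.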
BrowserGatewayOptions
Named BrowserAgent instances. Clients pick one via agentName.
Socket.IO server instance.
Socket.IO namespace for the gateway.
Stream live screenshots to clients. Disable for bandwidth-constrained connections.
Optional authentication middleware applied to the namespace.
Loop Detection
The agent detects when it is stuck repeating the same action. If the same action repeats maxRepeats times in a row, it stops and returns success: false with a descriptive error. This prevents infinite loops caused by popups, consent banners, or ambiguous page states.
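A check of this kind can be sketched in a few lines. The helper below is hypothetical (isStuck is not a documented function), and comparing actions by their JSON serialization is an assumption; the library may compare actions differently.

```typescript
// Hypothetical helper: true when the last `maxRepeats` actions are identical.
// Equality is decided by JSON serialization, which is sensitive to key order.
function isStuck(history: object[], maxRepeats: number): boolean {
  if (maxRepeats < 1 || history.length < maxRepeats) return false;
  const tail = history.slice(-maxRepeats).map((a) => JSON.stringify(a));
  return tail.every((s) => s === tail[0]);
}
```

For example, three identical click actions in a row with maxRepeats set to 3 would trip the check, while a click, a scroll, and another click would not.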
asTool() — Browser as an Agent Tool
The most powerful pattern: give a regular text agent the ability to browse the web.
Events
Browser agents emit events via EventBus:
| Event | Payload | When |
|---|---|---|
| browser.screenshot | { data: Buffer } | Screenshot captured |
| browser.action | { action } | Action decided by model |
| browser.step | { index, action, pageUrl, screenshot } | Each loop iteration |
| browser.done | { result, success, steps } | Task completed |
| browser.error | { error: Error } | Error occurred |
Tips
Use headless: false
Set headless: false during development to watch the agent navigate in real time.
Enable useDOM
Turn on useDOM: true for pages with many small or overlapping interactive elements.
Be specific
Clear, specific task descriptions produce better results than vague ones.
Set a start URL
Always provide a startUrl when possible. Starting from a blank page wastes steps.
Record videos
Use recordVideo: true during development to replay agent sessions.
Persist auth
Use storageState + saveStorageState to avoid logging in again on every run.
Go stealth
Use stealth: true + humanize: true to bypass bot detection on protected sites.
Secure credentials
Use CredentialVault so the LLM never sees passwords, only placeholders.
Examples
| Example | Description |
|---|---|
| examples/browser/30-browser-agent.ts | Standalone browser agent: Hacker News search |
| examples/browser/31-browser-as-tool.ts | Browser as a tool inside a research agent |