Browser Agents

RadarOS supports autonomous browser automation through the BrowserAgent class in @radaros/browser. The agent uses a vision-capable LLM (GPT-4o, Gemini) to interpret screenshots of a browser and decide what actions to take — clicking, typing, scrolling, navigating — until the task is complete.

Browser agents use Playwright under the hood. After installing the package, run npx playwright install chromium to download the browser binary.

Installation

npm install @radaros/browser playwright
npx playwright install chromium

Quick Start

import { BrowserAgent } from "@radaros/browser";
import { openai } from "@radaros/core";

const browser = new BrowserAgent({
  name: "web-navigator",
  model: openai("gpt-4o"),
  startUrl: "https://www.google.com",
  maxSteps: 20,
});

const result = await browser.run(
  "Search for 'TypeScript agent framework' and tell me the first 3 results"
);

console.log(result.success); // true
console.log(result.result);  // "1. LangChain.js — ..."
console.log(result.steps.length); // number of actions taken

How It Works

1. Launch browser

Playwright opens a Chromium browser (headless by default) and navigates to the start URL.

2. Take screenshot

A PNG screenshot of the viewport is captured. If useDOM is enabled, a simplified accessibility tree is also extracted.

3. Send to vision model

The screenshot (and DOM tree if enabled) and task description are sent to a vision-capable LLM.

4. Receive action

The model returns a structured JSON action: click at coordinates, type text, scroll, navigate, etc.

5. Execute action

The action is executed via Playwright’s browser API.

6. Repeat or finish

Steps 2-5 repeat until the model returns “done” (task complete) or “fail” (task impossible), or the maxSteps limit is reached.
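The loop described above can be sketched with in-memory stand-ins. This is a minimal illustration, not the BrowserAgent implementation: the Action, Page, Decide, and runVisionLoop names are hypothetical, and the "model" and "page" are stubs so the control flow can be followed without a browser or an API key.

```typescript
// Hypothetical shapes for illustration; not the @radaros/browser types.
type Action =
  | { action: "click"; x: number; y: number }
  | { action: "done"; result: string }
  | { action: "fail"; reason: string };

interface Page {
  screenshot(): Promise<string>; // stand-in for PNG bytes
  execute(action: Action): Promise<void>;
}

type Decide = (task: string, screenshot: string) => Promise<Action>;

interface LoopResult {
  success: boolean;
  result: string;
  steps: Action[];
}

async function runVisionLoop(
  task: string,
  page: Page,
  decide: Decide,
  maxSteps: number,
): Promise<LoopResult> {
  const steps: Action[] = [];
  for (let i = 0; i < maxSteps; i++) {
    const shot = await page.screenshot();    // take screenshot
    const action = await decide(task, shot); // send to model, receive action
    steps.push(action);
    if (action.action === "done") {
      return { success: true, result: action.result, steps };
    }
    if (action.action === "fail") {
      return { success: false, result: action.reason, steps };
    }
    await page.execute(action);              // execute action, then repeat
  }
  return { success: false, result: `Stopped after ${maxSteps} steps`, steps };
}

// Stubs: the fake model clicks until two clicks have run, then finishes.
let clicks = 0;
const fakePage: Page = {
  screenshot: async () => `frame-${clicks}`,
  execute: async () => {
    clicks += 1;
  },
};
const fakeModel: Decide = async () => {
  const next: Action =
    clicks < 2 ? { action: "click", x: 10, y: 20 } : { action: "done", result: "ok" };
  return next;
};
```

Calling runVisionLoop("demo", fakePage, fakeModel, 10) resolves with success set to true after three recorded steps (two clicks and the final done action). Lowering the step limit below three makes the same call return success: false.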

BrowserAgentConfig

const agent = new BrowserAgent(config: BrowserAgentConfig);

name

string

required

Name of the browser agent.

model

ModelProvider

required

Vision-capable model. Must support image inputs (e.g., openai("gpt-4o"), google("gemini-2.5-flash")).

instructions

string

Extra instructions appended to the system prompt. Use for task-specific guidance.

maxSteps

number

default:"30"

Maximum number of vision loop iterations before the agent gives up.

headless

boolean

default:"true"

Run browser without a visible window. Set to false for debugging and demos.

viewport

{ width: number; height: number }

default:"1280x720"

Browser viewport size in pixels. The model sees screenshots at this resolution.

startUrl

string

Initial URL to navigate to before starting the task.

waitAfterAction

number

default:"1500"

Milliseconds to wait after each action for the page to settle.

maxRepeats

number

default:"3"

Max consecutive identical actions before the agent auto-fails (loop detection).

useDOM

boolean

default:"false"

Include a simplified DOM/accessibility tree alongside the screenshot. This hybrid approach gives the model both visual context and precise element coordinates for better targeting.

storageState

string

Path to a Playwright storageState JSON file. Restores cookies, localStorage, and sessionStorage from a previous session. Use this to maintain login state across runs.

recordVideo

boolean | { dir: string }

default:"false"

Enable video recording of the browser session. Pass true for the default directory (./browser-videos) or { dir: "/path" } for a custom location.

stealth

boolean | StealthConfig

default:"false"

Enable anti-bot-detection mode. Patches navigator.webdriver, spoofs plugins, languages, WebGL renderer, and more. Pass true for sensible defaults or a StealthConfig object for fine control (custom user-agent, locale, timezone, geolocation, proxy).

humanize

boolean | HumanizeConfig

default:"false"

Simulate human-like behavior — variable typing speed, jittered click coordinates, Bézier mouse movement curves, random micro-pauses. Pass true for defaults or a HumanizeConfig for fine control.

credentials

CredentialVault

Secure credential store. The LLM only sees named placeholders — real values are injected at execution time and scrubbed from all output.

costTracker

CostTracker

Track vision model token usage and enforce budgets across browser runs. Each vision loop step (screenshot → LLM → action) records its token usage. The same tracker can be shared with text and voice agents for unified cost monitoring.

logLevel

string

default:"silent"

Logging level: "debug", "info", "warn", "error", "silent".
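The credentials option above describes a placeholder scheme: the model works with names, and real values appear only at execution time. The sketch below illustrates that idea in isolation. It is not the CredentialVault API; the {{name}} placeholder syntax and the inject and scrub helpers are assumptions made for this example.

```typescript
// Hypothetical helpers; the real CredentialVault interface is not shown in this page.
type Secrets = Record<string, string>;

// Replace {{name}} placeholders with real values just before an action runs.
function inject(text: string, secrets: Secrets): string {
  return text.replace(/\{\{(\w+)\}\}/g, (match: string, name: string) =>
    Object.prototype.hasOwnProperty.call(secrets, name) ? secrets[name] : match,
  );
}

// Replace any real value that leaked into output with its placeholder.
function scrub(text: string, secrets: Secrets): string {
  let out = text;
  for (const name of Object.keys(secrets)) {
    const value = secrets[name];
    if (value.length > 0) {
      out = out.split(value).join(`{{${name}}}`);
    }
  }
  return out;
}

const secrets: Secrets = { password: "s3cret" };
const typed = inject("type {{password}}", secrets);   // what the browser receives
const logged = scrub(`typed ${typed}`, secrets);      // what logs and the model see
```

Unknown placeholders are left untouched by inject, so a typo in a placeholder name does not silently type an empty string.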

run()

const result = await agent.run(task: string, opts?: BrowserRunOpts);

task

string

required

Natural language description of what the agent should do in the browser.

opts.startUrl

string

Override the config’s startUrl for this run.

opts.apiKey

string

Per-run API key override for the vision model.

opts.saveStorageState

string

Path to save cookies/auth state after the run completes. Load it back on the next run via storageState in config.

BrowserRunOutput

Field	Type	Description
`result`	`string`	Final text result or failure reason
`success`	`boolean`	Whether the task completed successfully
`steps`	`BrowserStep[]`	Full action history with screenshots
`finalUrl`	`string`	URL at completion
`finalScreenshot`	`Buffer`	Last screenshot (PNG)
`durationMs`	`number`	Total time taken
`videoPath`	`string?`	Video file path (if `recordVideo` was enabled)

Available Actions

The vision model can choose from these actions at each step:

Action	Parameters	Description
`click`	`x`, `y`, `description`	Click at viewport coordinates
`type`	`text`, `x?`, `y?`	Type text (optionally click a position first)
`scroll`	`direction`, `amount?`	Scroll up or down
`navigate`	`url`	Go to a specific URL
`back`	—	Go back to the previous page
`wait`	`ms`	Wait for page to load
`done`	`result`	Task is complete
`fail`	`reason`	Task cannot be completed
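Since the model returns each action as JSON, it can help to validate the payload before executing it. The sketch below models the table above as a discriminated union and checks required fields. The BrowserAction type and parseAction function are illustrative names, and treating direction as exactly "up" or "down" is an assumption based on the table's description.

```typescript
// Illustrative union derived from the actions table; not exported by @radaros/browser.
type BrowserAction =
  | { action: "click"; x: number; y: number; description: string }
  | { action: "type"; text: string; x?: number; y?: number }
  | { action: "scroll"; direction: "up" | "down"; amount?: number }
  | { action: "navigate"; url: string }
  | { action: "back" }
  | { action: "wait"; ms: number }
  | { action: "done"; result: string }
  | { action: "fail"; reason: string };

const isNum = (v: unknown): v is number => typeof v === "number" && isFinite(v);
const isStr = (v: unknown): v is string => typeof v === "string";
const optNum = (v: unknown): boolean => v === undefined || isNum(v);

// Returns the parsed action, or null if the JSON is malformed or fields are missing.
function parseAction(raw: string): BrowserAction | null {
  let value: unknown;
  try {
    value = JSON.parse(raw);
  } catch {
    return null;
  }
  if (typeof value !== "object" || value === null) return null;
  const a = value as Record<string, unknown>;
  let ok = false;
  switch (a.action) {
    case "click":
      ok = isNum(a.x) && isNum(a.y) && isStr(a.description);
      break;
    case "type":
      ok = isStr(a.text) && optNum(a.x) && optNum(a.y);
      break;
    case "scroll":
      ok = (a.direction === "up" || a.direction === "down") && optNum(a.amount);
      break;
    case "navigate":
      ok = isStr(a.url);
      break;
    case "back":
      ok = true;
      break;
    case "wait":
      ok = isNum(a.ms);
      break;
    case "done":
      ok = isStr(a.result);
      break;
    case "fail":
      ok = isStr(a.reason);
      break;
  }
  return ok ? (a as unknown as BrowserAction) : null;
}
```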

DOM Extraction (Hybrid Mode)

By default, the agent relies purely on vision — the model interprets screenshots to locate elements. Enabling useDOM: true adds a hybrid mode where a simplified accessibility tree is also extracted and sent alongside the screenshot.

const agent = new BrowserAgent({
  name: "hybrid-navigator",
  model: openai("gpt-4o"),
  useDOM: true, // enables hybrid vision + DOM mode
});

The DOM snapshot lists interactive elements with their center coordinates:

[640,300] input(text): "Search..."
[960,45] button: "Sign In"
[120,680] a: "Contact Us"

This helps the model target elements more precisely, especially when text is small or overlapping. You can also call extractDOM() directly on a BrowserProvider:

import { BrowserProvider } from "@radaros/browser";

const browser = new BrowserProvider();
await browser.launch();
await browser.navigate("https://example.com");

const dom = await browser.extractDOM({ maxElements: 50 });
console.log(dom);
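If the snapshot is needed as data rather than text, the line format shown earlier is simple to parse. The sketch below assumes the snapshot is a string with one element per line in exactly the `[x,y] tag(kind): "label"` shape of the three sample lines; the real output of extractDOM() may differ. DomElement, parseDomLine, and parseDomSnapshot are hypothetical names.

```typescript
interface DomElement {
  x: number;
  y: number;
  tag: string;
  kind?: string; // e.g. "text" in input(text)
  label: string;
}

// [640,300] input(text): "Search..."
const DOM_LINE = /^\[(\d+),(\d+)\]\s+([A-Za-z0-9]+)(?:\(([^)]*)\))?:\s+"(.*)"$/;

function parseDomLine(line: string): DomElement | null {
  const m = DOM_LINE.exec(line.trim());
  if (!m) return null;
  const el: DomElement = { x: Number(m[1]), y: Number(m[2]), tag: m[3], label: m[5] };
  if (m[4] !== undefined) el.kind = m[4];
  return el;
}

// Lines that do not match the assumed format are skipped.
function parseDomSnapshot(snapshot: string): DomElement[] {
  const out: DomElement[] = [];
  for (const line of snapshot.split("\n")) {
    const el = parseDomLine(line);
    if (el) out.push(el);
  }
  return out;
}

const sample = [
  '[640,300] input(text): "Search..."',
  '[960,45] button: "Sign In"',
  '[120,680] a: "Contact Us"',
].join("\n");
const elements = parseDomSnapshot(sample);
```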

Cookie & Auth Persistence

Maintain login sessions across agent runs using Playwright’s storage state.

const agent = new BrowserAgent({
  name: "auth-agent",
  model: openai("gpt-4o"),
  storageState: "./auth-state.json", // load saved cookies
});

// First run: log in and save the state
const result = await agent.run("Log in with test@example.com", {
  saveStorageState: "./auth-state.json",
});

// Second run: starts already logged in
const result2 = await agent.run("Go to dashboard and get stats");

The storage state file includes cookies, localStorage, and sessionStorage — everything needed to resume an authenticated session.
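Before reusing a saved file, it can be useful to check whether its cookies are still valid. The sketch below models the fields that Playwright itself writes to a storage state file (a cookies array and per-origin localStorage entries, with expires given in Unix seconds and -1 for session cookies). Anything RadarOS adds on top is not modeled, and liveCookies is a hypothetical helper.

```typescript
// Shape of a Playwright storage state file (only the fields used here matter).
interface StoredCookie {
  name: string;
  value: string;
  domain: string;
  path: string;
  expires: number; // Unix time in seconds; -1 means a session cookie
  httpOnly: boolean;
  secure: boolean;
  sameSite: "Strict" | "Lax" | "None";
}

interface StoredOrigin {
  origin: string;
  localStorage: { name: string; value: string }[];
}

interface StorageState {
  cookies: StoredCookie[];
  origins: StoredOrigin[];
}

// Keep session cookies and cookies whose expiry is still in the future.
function liveCookies(state: StorageState, nowSeconds: number): StoredCookie[] {
  return state.cookies.filter((c) => c.expires === -1 || c.expires > nowSeconds);
}

// Example state as it might be parsed from ./auth-state.json.
const exampleState: StorageState = JSON.parse(`{
  "cookies": [
    { "name": "sid", "value": "abc", "domain": "example.com", "path": "/",
      "expires": 2000000000, "httpOnly": true, "secure": true, "sameSite": "Lax" },
    { "name": "old", "value": "xyz", "domain": "example.com", "path": "/",
      "expires": 1000000000, "httpOnly": false, "secure": true, "sameSite": "Lax" },
    { "name": "tmp", "value": "1", "domain": "example.com", "path": "/",
      "expires": -1, "httpOnly": false, "secure": false, "sameSite": "None" }
  ],
  "origins": [
    { "origin": "https://example.com",
      "localStorage": [{ "name": "theme", "value": "dark" }] }
  ]
}`);

const stillValid = liveCookies(exampleState, 1700000000).map((c) => c.name);
```

If no authentication cookie survives the check, running the login task again with saveStorageState refreshes the file.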

Stealth Mode (Anti-Detection)

Many websites detect and block headless browsers. Stealth mode patches common detection vectors so the browser appears as a normal user session.

const agent = new BrowserAgent({
  name: "stealth-agent",
  model: openai("gpt-4o"),
  stealth: true,  // sensible defaults
  humanize: true,  // human-like behavior
});

What stealth patches

Vector	What it does
`navigator.webdriver`	Removed (normally `true` in automation)
`navigator.plugins`	Spoofed with realistic Chrome plugins
`navigator.languages`	Set to `["en-US", "en"]`
`navigator.permissions`	Notifications return `"prompt"` instead of `"denied"`
`window.chrome.runtime`	Stubbed to appear like a real Chrome extension API
WebGL renderer	Reports “Intel Iris OpenGL Engine” instead of “SwiftShader”
DOM markers	Removes `cdc_` and `__playwright` attributes
Chrome launch flags	`--disable-blink-features=AutomationControlled`
User-Agent	Rotated from a pool of realistic Chrome/Safari strings

Fine-grained StealthConfig

const agent = new BrowserAgent({
  name: "stealth-agent",
  model: openai("gpt-4o"),
  stealth: {
    userAgent: "Mozilla/5.0 (Windows NT 10.0; Win64; x64) ...",
    locale: "de-DE",
    timezone: "Europe/Berlin",
    geolocation: { latitude: 52.52, longitude: 13.405 },
    proxy: {
      server: "http://proxy.example.com:8080",
      username: "user",
      password: "pass",
    },
  },
});

HumanizeConfig

Makes the browser behave like a real person — variable timing, imprecise clicks, curved mouse paths.

const agent = new BrowserAgent({
  name: "human-agent",
  model: openai("gpt-4o"),
  humanize: {
    typingDelay: [50, 150],   // ms per character (random in range)
    clickJitter: 4,           // ±4px random offset on clicks
    actionDelay: [300, 1000], // random pause between actions
    mouseMovement: true,      // Bézier curve mouse movement
  },
});

Option	Default	Description
`typingDelay`	`[40, 120]`	Min/max ms delay between keystrokes
`clickJitter`	`3`	Random pixel offset added to click coordinates
`actionDelay`	`[200, 800]`	Random pause after each interaction
`mouseMovement`	`true`	Move the mouse along a curved path to the target
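To make the options in the table concrete, the sketch below shows one way such behavior could be computed: a delay drawn from a [min, max] range, a click offset within ±clickJitter pixels, and points along a quadratic Bézier curve for mouse movement. These helpers are illustrative assumptions, not the library's internals; the random source is injected so the output is reproducible.

```typescript
type Rng = () => number; // returns a value in [0, 1), like Math.random
type Point = { x: number; y: number };

// Pick a value inside a [min, max] range, e.g. typingDelay or actionDelay.
function randomInRange(range: [number, number], rng: Rng): number {
  const [min, max] = range;
  return min + rng() * (max - min);
}

// Offset a click target by up to ±jitterPx on each axis.
function jitterClick(p: Point, jitterPx: number, rng: Rng): Point {
  const offset = (): number => (rng() * 2 - 1) * jitterPx;
  return { x: p.x + offset(), y: p.y + offset() };
}

// Sample steps + 1 points along a quadratic Bézier curve from `from` to `to`.
function bezierPath(from: Point, control: Point, to: Point, steps: number): Point[] {
  const points: Point[] = [];
  for (let i = 0; i <= steps; i++) {
    const t = i / steps;
    const u = 1 - t;
    points.push({
      x: u * u * from.x + 2 * u * t * control.x + t * t * to.x,
      y: u * u * from.y + 2 * u * t * control.y + t * t * to.y,
    });
  }
  return points;
}

// With a fixed rng of 0.5 the results are the midpoint of each range.
const fixed: Rng = () => 0.5;
const delay = randomInRange([40, 120], fixed);
const clickAt = jitterClick({ x: 640, y: 300 }, 3, fixed);
const path = bezierPath({ x: 0, y: 0 }, { x: 50, y: 100 }, { x: 100, y: 0 }, 4);
```

In real use, Math.random would be passed as the rng argument and each point in the path would be fed to the browser's mouse-move call with a short pause between points.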

Video Recording

Record the agent’s entire browser session as a video for debugging, auditing, or demos.

const agent = new BrowserAgent({
  name: "recorded-agent",
  model: openai("gpt-4o"),
  recordVideo: true, // saves to ./browser-videos/
  // or: recordVideo: { dir: "./my-recordings" }
});

const result = await agent.run("Navigate to example.com and take notes");

if (result.videoPath) {
  console.log("Video saved at:", result.videoPath);
}

Playwright generates one video file per browser page. The path is returned in result.videoPath when the run completes.

Parallel Browsing (Multi-Tab)

BrowserProvider supports multiple tabs for advanced workflows:

import { BrowserProvider } from "@radaros/browser";

const browser = new BrowserProvider();
await browser.launch();

// Navigate first tab
await browser.navigate("https://site-a.com");

// Open a second tab
const tab2 = await browser.newTab("https://site-b.com");

// Switch between tabs
await browser.switchTab(tab2);
const screenshot2 = await browser.screenshot();

await browser.switchTab("tab-0"); // back to first tab
const screenshot1 = await browser.screenshot();

// List all open tabs
const tabs = browser.listTabs();
// [{ id: "tab-0", url: "https://site-a.com", active: true },
//  { id: "tab-1", url: "https://site-b.com", active: false }]

// Close a tab
await browser.closeTab(tab2);

await browser.close();

Tab API

Method	Returns	Description
`newTab(url?)`	`string`	Open a new tab, optionally navigate
`switchTab(tabId)`	`void`	Make a tab active
`closeTab(tabId)`	`void`	Close a tab (can’t close the last one)
`listTabs()`	`TabInfo[]`	List all open tabs with URL and active status
`currentTabId`	`string`	Get the active tab’s ID

Browser Gateway (Socket.IO)

Stream browser agent execution over Socket.IO for live observation UIs, dashboards, or remote monitoring.

import express from "express";
import { createServer } from "http";
import { Server } from "socket.io";
import { BrowserAgent } from "@radaros/browser";
import { createBrowserGateway } from "@radaros/transport";
import { openai } from "@radaros/core";

const app = express();
const server = createServer(app);
const io = new Server(server, { cors: { origin: "*" } });

const browserAgent = new BrowserAgent({
  name: "web-scraper",
  model: openai("gpt-4o"),
  headless: true,
  logLevel: "info",
});

createBrowserGateway({
  agents: { scraper: browserAgent },
  io,
  // namespace: "/radaros-browser",     // default
  // streamScreenshots: true,           // default
});

server.listen(3002, () => console.log("Browser gateway on :3002"));

Client Usage

import { io } from "socket.io-client";

const socket = io("http://localhost:3002/radaros-browser");

// Start a browser task
socket.emit("browser.start", {
  agentName: "scraper",
  task: "Go to Hacker News and list the top 5 stories",
  startUrl: "https://news.ycombinator.com",
});

// Live screenshots (base64 PNG)
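socket.on("browser.screenshot", ({ data, mimeType }) => {
  // getElementById may return null, so guard before assigning the image source
  const img = document.getElementById("live-view") as HTMLImageElement | null;
  if (img) img.src = `data:${mimeType};base64,${data}`;
});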

// Each action decided by the model
socket.on("browser.action", ({ action }) => {
  console.log("Agent decided:", action);
});

// Step-by-step progress
socket.on("browser.step", ({ index, action, pageUrl }) => {
  console.log(`Step ${index}: ${action.action} at ${pageUrl}`);
});

// Task complete
socket.on("browser.done", ({ result, success, durationMs, totalSteps }) => {
  console.log(success ? "Done!" : "Failed", result);
  console.log(`Took ${totalSteps} steps in ${durationMs}ms`);
});

// Cancel a running task
socket.emit("browser.stop");

Gateway Events

Direction	Event	Payload
Client → Server	`browser.start`	`{ agentName, task, startUrl?, apiKey? }`
Client → Server	`browser.stop`	—
Server → Client	`browser.started`	`{ agentName, task }`
Server → Client	`browser.screenshot`	`{ data: base64, mimeType }`
Server → Client	`browser.action`	`{ action }`
Server → Client	`browser.step`	`{ index, action, pageUrl, screenshot? }`
Server → Client	`browser.done`	`{ result, success, finalUrl, durationMs, totalSteps, videoPath? }`
Server → Client	`browser.error`	`{ error: string }`
Server → Client	`browser.stopped`	—
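For a typed client, the server-to-client rows of the table can be written as a discriminated union. The sketch below is an assumption-laden illustration: the ServerMessage type and describeMessage function are hypothetical, action is left as unknown because its exact type is not given in the table, and the optional screenshot field is assumed to be a base64 string.

```typescript
// Payload shapes copied from the Gateway Events table.
type ServerMessage =
  | { event: "browser.started"; payload: { agentName: string; task: string } }
  | { event: "browser.screenshot"; payload: { data: string; mimeType: string } }
  | { event: "browser.action"; payload: { action: unknown } }
  | {
      event: "browser.step";
      payload: { index: number; action: unknown; pageUrl: string; screenshot?: string };
    }
  | {
      event: "browser.done";
      payload: {
        result: string;
        success: boolean;
        finalUrl: string;
        durationMs: number;
        totalSteps: number;
        videoPath?: string;
      };
    }
  | { event: "browser.error"; payload: { error: string } }
  | { event: "browser.stopped" };

// Turn any server message into a one-line log entry.
function describeMessage(msg: ServerMessage): string {
  switch (msg.event) {
    case "browser.started":
      return `started ${msg.payload.agentName}: ${msg.payload.task}`;
    case "browser.screenshot":
      return `screenshot (${msg.payload.mimeType}, ${msg.payload.data.length} base64 chars)`;
    case "browser.action":
      return `action ${JSON.stringify(msg.payload.action)}`;
    case "browser.step":
      return `step ${msg.payload.index} at ${msg.payload.pageUrl}`;
    case "browser.done":
      return `${msg.payload.success ? "done" : "failed"} in ${msg.payload.totalSteps} steps (${msg.payload.durationMs}ms)`;
    case "browser.error":
      return `error: ${msg.payload.error}`;
    case "browser.stopped":
      return "stopped";
  }
}
```

Switching on msg.event lets the compiler narrow each payload, so a misspelled field such as totalStep is caught at build time rather than at runtime.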

BrowserGatewayOptions

agents

Record<string, BrowserAgent>

required

Named BrowserAgent instances. Clients pick one via agentName.

io

Server

required

Socket.IO server instance.

namespace

string

default:"/radaros-browser"

Socket.IO namespace for the gateway.

streamScreenshots

boolean

default:"true"

Stream live screenshots to clients. Disable for bandwidth-constrained connections.

authMiddleware

(socket, next) => void

Optional authentication middleware applied to the namespace.

Security

URL Validation

The BrowserProvider validates URLs before navigation. Only http:// and https:// schemes are allowed — file://, javascript:, and data: URLs are rejected to prevent local file access and code injection.
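The same allow-list idea can be applied to URLs before they are handed to an agent, for example when startUrl comes from user input. The sketch below uses the standard URL class; isAllowedUrl is a hypothetical helper and is not claimed to match BrowserProvider's internal check.

```typescript
const ALLOWED_PROTOCOLS = ["http:", "https:"];

// True only for absolute URLs whose scheme is http or https.
function isAllowedUrl(raw: string): boolean {
  try {
    const { protocol } = new URL(raw); // throws on relative or malformed input
    return ALLOWED_PROTOCOLS.indexOf(protocol) !== -1;
  } catch {
    return false;
  }
}

const candidates = [
  "https://example.com",
  "file:///etc/passwd",
  "javascript:alert(1)",
  "data:text/html,<h1>hi</h1>",
  "example.com",
];
const accepted = candidates.filter(isAllowedUrl);
```

Parsing with URL rather than matching a string prefix means the scheme is normalized first, so mixed-case input such as HTTP://example.com is still recognized as http.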

TLS Defaults

Stealth mode defaults ignoreHTTPSErrors to false, so TLS certificate errors are not silently bypassed unless you explicitly set ignoreHTTPSErrors: true in the StealthConfig. Keeping certificate validation on reduces exposure to man-in-the-middle attacks in production deployments.

Memory Safety

Background memory operations (memoryManager.afterRun) include .catch() handlers to prevent unhandled promise rejections from crashing the process.

Loop Detection

The agent detects when it’s stuck repeating the same action:

const agent = new BrowserAgent({
  name: "safe-agent",
  model: openai("gpt-4o"),
  maxRepeats: 3, // auto-fail after 3 identical consecutive actions
});

When the agent repeats the same action more than maxRepeats times, it stops and returns success: false with a descriptive error. This prevents infinite loops caused by popups, consent banners, or ambiguous page states.
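A check of this kind can be sketched as counting the trailing run of identical actions. The helpers below are hypothetical and compare actions by their JSON text, which is sensitive to key order. Following the sentence above, the run is treated as stuck once it is longer than maxRepeats; the exact boundary used by BrowserAgent is an assumption here.

```typescript
// Length of the run of identical actions at the end of the history.
function trailingRepeats(actions: object[]): number {
  if (actions.length === 0) return 0;
  const last = JSON.stringify(actions[actions.length - 1]);
  let count = 0;
  for (let i = actions.length - 1; i >= 0; i--) {
    if (JSON.stringify(actions[i]) !== last) break;
    count++;
  }
  return count;
}

// Assumed boundary: stuck when the run is strictly longer than maxRepeats.
function isStuck(actions: object[], maxRepeats: number): boolean {
  return trailingRepeats(actions) > maxRepeats;
}

const click = { action: "click", x: 100, y: 200 };
const scroll = { action: "scroll", direction: "down" };

const mixed = [click, scroll, click, click];
const looping = [scroll, click, click, click, click];
```

Here trailingRepeats(mixed) is 2 and trailingRepeats(looping) is 4, so with maxRepeats set to 3 only the second history is reported as stuck.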

Cost Tracking

Browser agents make repeated vision model calls (one per step), which can accumulate significant cost. Use CostTracker to monitor and limit spending:

import { BrowserAgent } from "@radaros/browser";
import { openai, CostTracker } from "@radaros/core";

const tracker = new CostTracker({
  budget: { maxCostPerRun: 2.0 },  // $2 max per browser run
});

const agent = new BrowserAgent({
  name: "web-scraper",
  model: openai("gpt-4o"),
  costTracker: tracker,
  maxSteps: 30,
});

const result = await agent.run("Search for flights from NYC to London");

const summary = tracker.getSummary();
console.log(`Browser run cost: $${summary.totalCost.toFixed(4)}`);
console.log(`Total tokens: ${summary.totalTokens.totalTokens}`);
console.log(`Steps taken: ${result.steps.length}`);

Each step’s vision model call is tracked individually, so you get per-step token granularity in the cost entries.
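The arithmetic behind a per-run budget is straightforward, as the sketch below shows. It is not the CostTracker implementation: RunBudget, StepUsage, and Pricing are hypothetical names, and the per-million-token prices are made-up numbers chosen for easy arithmetic rather than real model pricing.

```typescript
interface StepUsage {
  promptTokens: number;
  completionTokens: number;
}

interface Pricing {
  inputPerMillion: number;  // cost per 1,000,000 prompt tokens
  outputPerMillion: number; // cost per 1,000,000 completion tokens
}

class RunBudget {
  private entries: StepUsage[] = [];

  constructor(private pricing: Pricing, private maxCostPerRun: number) {}

  // Record the token usage of one vision loop step.
  record(usage: StepUsage): void {
    this.entries.push(usage);
  }

  get totalCost(): number {
    let total = 0;
    for (const e of this.entries) {
      total +=
        (e.promptTokens * this.pricing.inputPerMillion +
          e.completionTokens * this.pricing.outputPerMillion) /
        1_000_000;
    }
    return total;
  }

  get exceeded(): boolean {
    return this.totalCost > this.maxCostPerRun;
  }
}

// Hypothetical prices: 2 per million input tokens, 8 per million output tokens.
const budget = new RunBudget({ inputPerMillion: 2, outputPerMillion: 8 }, 0.01);
const step: StepUsage = { promptTokens: 1000, completionTokens: 250 };
budget.record(step); // 0.004 so far
budget.record(step); // 0.008 so far
```

Each step in this example costs 0.004, so a third identical step would push the total to 0.012 and past the 0.01 limit. Screenshots usually dominate prompt tokens, which is why lowering maxSteps or the viewport size tends to reduce cost.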

asTool() — Browser as an Agent Tool

A BrowserAgent can also be exposed as a tool, which gives a regular text agent the ability to browse the web.

import { Agent, openai } from "@radaros/core";
import { BrowserAgent } from "@radaros/browser";

const browser = new BrowserAgent({
  name: "browser",
  model: openai("gpt-4o"),
  headless: true,
});

const agent = new Agent({
  name: "research-assistant",
  model: openai("gpt-4o"),
  instructions: "You help with research. Use the browser tool to look things up.",
  tools: [browser.asTool()],
});

const result = await agent.run(
  "Go to Hacker News and summarize the top 5 stories"
);

The text agent decides when to use the browser and what task to give it. The BrowserAgent handles all the visual navigation autonomously and returns a text result.

browser.asTool({
  name: "browse_web",        // tool name (default)
  description: "...",        // custom description
});

Events

Browser agents emit events via EventBus:

Event	Payload	When
`browser.screenshot`	`{ data: Buffer }`	Screenshot captured
`browser.action`	`{ action }`	Action decided by model
`browser.step`	`{ index, action, pageUrl, screenshot }`	Each loop iteration
`browser.done`	`{ result, success, steps }`	Task completed
`browser.error`	`{ error: Error }`	Error occurred

browser.eventBus.on("browser.action", ({ action }) => {
  console.log("Action:", JSON.stringify(action));
});

browser.eventBus.on("browser.done", ({ result, success }) => {
  console.log(success ? "Completed" : "Failed", result);
});

Tips

Use headless: false

Set headless: false during development to watch the agent navigate in real time.

Enable useDOM

Turn on useDOM: true for pages with many small or overlapping interactive elements.

Be specific

Clear, specific task descriptions produce better results than vague ones.

Set a start URL

Always provide a startUrl when possible. Starting from a blank page wastes steps.

Record videos

Use recordVideo: true during development to replay agent sessions.

Persist auth

Use storageState + saveStorageState to avoid re-logging-in every run.

Go stealth

Use stealth: true + humanize: true to bypass bot detection on protected sites.

Secure credentials

Use CredentialVault so the LLM never sees passwords — only placeholders.

Examples

Example	Description
`examples/browser/30-browser-agent.ts`	Standalone browser agent — Hacker News search
`examples/browser/31-browser-as-tool.ts`	Browser as a tool inside a research agent
