Web Scraper

Extract text content and links from any web page. Uses native fetch and lightweight HTML stripping, so no browser or heavy dependencies are needed.

Quick Start

import { Agent, openai, ScraperToolkit } from "@radaros/core";

const scraper = new ScraperToolkit({ maxLength: 10_000 });

const agent = new Agent({
  name: "reader",
  model: openai("gpt-4o"),
  instructions: "Read web pages and summarize their content.",
  tools: [...scraper.getTools()],
});

const result = await agent.run("Summarize the content of https://radaros.dev");

Config

maxLength (number, default: 15000): Max characters of extracted text to return.
userAgent (string): Custom User-Agent header for requests.
timeout (number, default: 15000): Request timeout in milliseconds.
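
To make the options above concrete, here is a minimal sketch of how settings like these could map onto a native fetch request. It is an assumption about the general technique, not the toolkit's actual internals: `ScraperConfigSketch`, `buildRequestInit`, and `truncate` are hypothetical names introduced only for illustration.

```typescript
// Hypothetical shape mirroring the documented options (not the library's own type).
interface ScraperConfigSketch {
  maxLength?: number;
  userAgent?: string;
  timeout?: number;
}

// Build the options object that could be passed as the second argument to fetch().
function buildRequestInit(config: ScraperConfigSketch = {}): {
  headers: Record<string, string>;
  signal: AbortSignal;
} {
  const timeout = config.timeout ?? 15_000; // documented default, in milliseconds
  const headers: Record<string, string> = {};
  if (config.userAgent) {
    headers["User-Agent"] = config.userAgent;
  }
  // AbortSignal.timeout() aborts the request once the timeout elapses.
  return { headers, signal: AbortSignal.timeout(timeout) };
}

// Cap extracted text at maxLength characters (documented default: 15000).
function truncate(text: string, maxLength: number = 15_000): string {
  return text.length > maxLength ? text.slice(0, maxLength) : text;
}
```

With these helpers, a request would look like `fetch(url, buildRequestInit({ timeout: 5000 }))`, followed by `truncate(text, maxLength)` on the extracted text.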

Tools

| Tool | Description |
| --- | --- |
| scrape_url | Fetch a URL and extract text content. Scripts, styles, nav, and footer are stripped. |
| scrape_links | Extract all links from a page. Returns link text and absolute URLs. |
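
The behavior described for the two tools can be approximated with regular expressions and the standard `URL` class. The sketch below only illustrates that idea and is not the toolkit's implementation. `stripHtml`, `extractLinks`, and `LinkSketch` are hypothetical names, and regex-based stripping will not handle every malformed page.

```typescript
// Elements whose entire content is dropped before text extraction.
const DROPPED_ELEMENTS = ["script", "style", "nav", "footer"];

function stripHtml(html: string): string {
  let out = html;
  for (const tag of DROPPED_ELEMENTS) {
    // Non-greedy match from the opening tag to its closing tag, across newlines.
    const pattern = new RegExp(`<${tag}\\b[^>]*>[\\s\\S]*?</${tag}>`, "gi");
    out = out.replace(pattern, " ");
  }
  return out
    .replace(/<[^>]+>/g, " ") // drop any remaining tags
    .replace(/\s+/g, " ") // collapse runs of whitespace
    .trim();
}

// Hypothetical result shape: link text plus an absolute URL.
interface LinkSketch {
  text: string;
  url: string;
}

function extractLinks(html: string, baseUrl: string): LinkSketch[] {
  const links: LinkSketch[] = [];
  const anchor = /<a\b[^>]*\bhref\s*=\s*["']([^"']+)["'][^>]*>([\s\S]*?)<\/a>/gi;
  let match: RegExpExecArray | null;
  while ((match = anchor.exec(html)) !== null) {
    try {
      // Resolve relative hrefs against the page URL to get an absolute URL.
      const url = new URL(match[1], baseUrl).href;
      links.push({ text: stripHtml(match[2]), url });
    } catch {
      // Skip hrefs that cannot be resolved to a valid URL.
    }
  }
  return links;
}
```

Resolving each `href` with `new URL(href, baseUrl)` is what turns relative links such as `/docs/agents` into absolute ones.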

Tool Usage Examples

Scrape and Summarize

const result = await agent.run(
  "Read https://radaros.dev/docs/getting-started and give me a quick summary"
);

// The agent calls scrape_url with:
// { url: "https://radaros.dev/docs/getting-started" }
//
// Returns extracted text (HTML stripped, truncated to the configured maxLength):
// "Getting Started\n\nRadarOS is an open-source framework for building AI agents..."
//
// The agent then summarizes the content for the user.

Extract Links

const result = await agent.run(
  "What documentation pages are linked from https://radaros.dev/docs?"
);

// The agent calls scrape_links with:
// { url: "https://radaros.dev/docs" }
//
// Returns:
// [
//   { text: "Getting Started", url: "https://radaros.dev/docs/getting-started" },
//   { text: "Agents", url: "https://radaros.dev/docs/agents/overview" },
//   { text: "Tools", url: "https://radaros.dev/docs/agents/tools" },
//   ...
// ]
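
If you post-process a link list of this shape yourself, a small filter keeps the result manageable. The sketch below assumes the `{ text, url }` shape shown in the example output. `ScrapedLink` and `sameOriginLinks` are hypothetical names and not part of the toolkit.

```typescript
// Assumed shape of one entry in the link list shown above.
interface ScrapedLink {
  text: string;
  url: string;
}

// Keep only links on the same origin as the page, dropping duplicates.
function sameOriginLinks(links: ScrapedLink[], pageUrl: string): ScrapedLink[] {
  const origin = new URL(pageUrl).origin;
  const seen = new Set<string>();
  return links.filter((link) => {
    let parsed: URL;
    try {
      parsed = new URL(link.url);
    } catch {
      return false; // ignore entries that are not valid absolute URLs
    }
    if (parsed.origin !== origin || seen.has(parsed.href)) {
      return false;
    }
    seen.add(parsed.href);
    return true;
  });
}
```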

Research Agent

Combine the scraper with other toolkits to build a research agent:

import { Agent, openai, ScraperToolkit, HttpToolkit } from "@radaros/core";

const agent = new Agent({
  name: "researcher",
  model: openai("gpt-4o"),
  tools: [
    ...new ScraperToolkit({ maxLength: 15_000 }).getTools(),
    ...new HttpToolkit({ baseUrl: "https://api.example.com" }).getTools(),
  ],
  instructions: `You are a research assistant. Use the scraper to read web pages
and the HTTP toolkit to query APIs. Synthesize information from multiple sources.`,
  toolResultLimit: { maxChars: 20_000, strategy: "summarize", model: openai("gpt-4o-mini") },
});

const result = await agent.run(
  "Research the latest trends in AI agents and summarize the key findings"
);