Multi-Modal Input

RadarOS agents accept not only text but also images, audio, and files. Use the MessageContent type and ContentPart[] to send multi-modal input to vision- and audio-capable models.

MessageContent Type

Input to agent.run() or agent.stream() can be:
type MessageContent = string | ContentPart[];
  • string — Plain text (most common)
  • ContentPart[] — Array of text, image, audio, or file parts

ContentPart Types

TextPart

{ type: "text", text: string }

ImagePart

{ type: "image", data: string, mimeType? }

AudioPart

{ type: "audio", data: string, mimeType? }

FilePart

{ type: "file", data: string, mimeType, filename? }
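Taken together, the four shapes form a discriminated union on the `type` field, and a plain string is just shorthand for a single text part. A minimal normalization sketch (the types are restated locally for illustration; in practice, use the exports from @radaros/core):

```typescript
// Restated locally for illustration; mirrors the shapes documented above.
type ContentPart =
  | { type: "text"; text: string }
  | { type: "image"; data: string; mimeType?: string }
  | { type: "audio"; data: string; mimeType?: string }
  | { type: "file"; data: string; mimeType: string; filename?: string };

type MessageContent = string | ContentPart[];

// A plain string input is equivalent to a single text part.
function toParts(input: MessageContent): ContentPart[] {
  return typeof input === "string" ? [{ type: "text", text: input }] : input;
}
```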

Image Input

Images can be provided as a base64-encoded string or a URL:
import { Agent, openai, type ContentPart } from "@radaros/core";

const agent = new Agent({
  name: "VisionAgent",
  model: openai("gpt-4o"),
  instructions: "Describe and analyze images in detail.",
});

// Image via URL
const input: ContentPart[] = [
  { type: "text", text: "What's in this image?" },
  {
    type: "image",
    data: "https://example.com/image.png",
    mimeType: "image/png",
  },
];

// Image via base64
const base64Image = "data:image/png;base64,iVBORw0KGgo...";
const inputBase64: ContentPart[] = [
  { type: "text", text: "Analyze this." },
  { type: "image", data: base64Image, mimeType: "image/png" },
];

const result = await agent.run(input);
Supported mimeType values: image/png, image/jpeg, image/gif, image/webp.

Audio Input

Audio is provided as base64-encoded data:
import { Agent, google, type ContentPart } from "@radaros/core";
import { readFileSync } from "node:fs";

const agent = new Agent({
  name: "AudioAnalyzer",
  model: google("gemini-2.5-flash"),
  instructions: "Transcribe and analyze audio content.",
});

const audioData = readFileSync("sample.mp3");
const base64Audio = audioData.toString("base64");

const result = await agent.run([
  { type: "text", text: "Transcribe and summarize this audio." },
  { type: "audio", data: base64Audio, mimeType: "audio/mp3" },
] as ContentPart[]);
Supported mimeType values: audio/mp3, audio/wav, audio/ogg, audio/webm.

File Input

Generic files (PDFs, documents, etc.) use FilePart:
const input: ContentPart[] = [
  { type: "text", text: "Summarize this document." },
  {
    type: "file",
    data: "https://example.com/doc.pdf",
    mimeType: "application/pdf",
    filename: "document.pdf",
  },
];
data can be a URL or base64-encoded content.

Example: Vision Agent Analyzing an Image

import { Agent, openai, type ContentPart } from "@radaros/core";
import { z } from "zod";

const ImageAnalysis = z.object({
  description: z.string().describe("Detailed description of the image"),
  objects: z.array(z.string()).describe("Objects detected"),
  dominantColors: z.array(z.string()).describe("Dominant colors"),
  mood: z.string().describe("Overall mood"),
});

const analyzer = new Agent({
  name: "ImageAnalyzer",
  model: openai("gpt-4o"),
  instructions: "Analyze images and return structured JSON.",
  structuredOutput: ImageAnalysis,
});

const multiModalInput: ContentPart[] = [
  { type: "text", text: "Analyze this image in detail." },
  {
    type: "image",
    data: "https://upload.wikimedia.org/wikipedia/commons/4/47/PNG_transparency_demonstration_1.png",
    mimeType: "image/png",
  },
];

const result = await analyzer.run(multiModalInput);
console.log(result.structured);

Example: Audio Analysis with Gemini

import { Agent, google, type ContentPart } from "@radaros/core";
import { readFileSync } from "node:fs";
import { z } from "zod";

const AudioAnalysis = z.object({
  transcription: z.string(),
  language: z.string(),
  speakerCount: z.number(),
  summary: z.string(),
  mood: z.string(),
  topics: z.array(z.string()),
});

const agent = new Agent({
  name: "AudioAnalyzer",
  model: google("gemini-2.5-flash"),
  instructions: "Analyze audio: transcribe, detect language, summarize.",
  structuredOutput: AudioAnalysis,
});

const audioData = readFileSync("audio/sample.mp3");
const base64Audio = audioData.toString("base64");

const result = await agent.run([
  { type: "text", text: "Analyze this audio clip in detail." },
  { type: "audio", data: base64Audio, mimeType: "audio/mp3" },
] as ContentPart[]);

console.log(result.structured);

Provider Support Matrix

Not all providers support all content types. When an unsupported type is passed, the provider logs a warning and either skips the content or substitutes a placeholder.
Content Type   | OpenAI | Anthropic | Google/Vertex | AWS Claude | AWS Bedrock | Azure OpenAI | Azure Foundry   | Ollama
Image (URL)    | Yes    | Yes       | Yes           | Yes        | No          | Yes          | Model-dependent | No
Image (base64) | Yes    | Yes       | Yes           | Yes        | Yes*        | Yes          | Model-dependent | Yes
Audio (base64) | Yes    | No        | Yes           | No         | No          | Yes          | No              | No
File (URL)     | Yes    | Yes       | Yes           | Yes        | No          | Yes          | No              | No
File (base64)  | Yes    | Yes       | Yes           | Yes        | Yes*        | Yes          | No              | No
  • Ollama image support requires a vision-capable model (e.g., llava, bakllava, llama3.2-vision).
  • AWS Bedrock multi-modal support (*) depends on the specific model. Amazon Nova supports images; document support varies by model.
  • AWS Claude supports the same multi-modal features as the direct Anthropic provider.
  • Azure OpenAI supports the same multi-modal features as the direct OpenAI provider.
  • Azure AI Foundry vision support depends on the model (e.g., Phi-3.5-vision-instruct supports images).
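The skip-on-unsupported behavior described above can also be approximated client-side by filtering parts against a copy of the matrix before sending. A simplified sketch, assuming a hard-coded table transcribed from the base64 columns above (provider keys and types restated locally for illustration):

```typescript
type ContentPart =
  | { type: "text"; text: string }
  | { type: "image"; data: string; mimeType?: string }
  | { type: "audio"; data: string; mimeType?: string }
  | { type: "file"; data: string; mimeType: string; filename?: string };

// Base64 support per provider, transcribed from the matrix above (illustrative subset).
const SUPPORTED: Record<string, Set<ContentPart["type"]>> = {
  openai: new Set(["text", "image", "audio", "file"]),
  anthropic: new Set(["text", "image", "file"]),
  google: new Set(["text", "image", "audio", "file"]),
  ollama: new Set(["text", "image"]),
};

// Drop unsupported parts with a warning, mirroring the provider-side fallback.
function filterParts(provider: string, parts: ContentPart[]): ContentPart[] {
  const supported = SUPPORTED[provider] ?? new Set<ContentPart["type"]>(["text"]);
  return parts.filter((part) => {
    if (supported.has(part.type)) return true;
    console.warn(`${provider} does not support ${part.type} parts; skipping.`);
    return false;
  });
}
```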

Reading CSV Data

CSV files can be sent to Anthropic and OpenAI as file input. The model reads and analyzes the data directly:
import { Agent, anthropic, type ContentPart } from "@radaros/core";
import { readFileSync } from "node:fs";

const agent = new Agent({
  name: "DataAnalyst",
  model: anthropic("claude-sonnet-4-6"),
  instructions: "Analyze data files. Provide insights with specific numbers.",
});

// From a local CSV file
const csvData = readFileSync("sales-data.csv").toString("base64");
const result = await agent.run([
  { type: "text", text: "Analyze this sales data. What are the top 3 products by revenue?" },
  { type: "file", data: csvData, mimeType: "text/csv", filename: "sales-data.csv" },
] as ContentPart[]);

console.log(result.text);
// "Based on the sales data, the top 3 products by revenue are:
//  1. Widget Pro - $142,500 (1,425 units)
//  2. Gadget Plus - $98,200 (982 units)
//  3. Tool Basic - $67,800 (2,260 units)"

Analyzing PDFs

PDF documents can be sent via URL (no download needed) or base64:
import { Agent, anthropic, type ContentPart } from "@radaros/core";

const agent = new Agent({
  name: "DocumentReader",
  model: anthropic("claude-sonnet-4-6"),
  instructions: "Extract key information from documents. Be thorough but concise.",
});

// PDF via URL — Anthropic fetches it directly
const result = await agent.run([
  { type: "text", text: "Summarize the key findings in this research paper." },
  {
    type: "file",
    data: "https://example.com/research-paper.pdf",
    mimeType: "application/pdf",
    filename: "paper.pdf",
  },
] as ContentPart[]);

// PDF via base64
import { readFileSync } from "node:fs";
const pdfData = readFileSync("contract.pdf").toString("base64");

const contractResult = await agent.run([
  { type: "text", text: "What are the payment terms and termination clauses?" },
  { type: "file", data: pdfData, mimeType: "application/pdf", filename: "contract.pdf" },
] as ContentPart[]);

XLSX and Binary Formats

Most providers cannot process Excel (.xlsx) files directly. Google Gemini is the exception — it handles XLSX natively via inlineData. For other providers, convert to CSV first:
import * as XLSX from "xlsx"; // npm install xlsx
import { readFileSync } from "node:fs";

function xlsxToCsv(filePath: string): string {
  const workbook = XLSX.read(readFileSync(filePath), { type: "buffer" });
  const sheet = workbook.Sheets[workbook.SheetNames[0]];
  return XLSX.utils.sheet_to_csv(sheet);
}

const csvContent = xlsxToCsv("report.xlsx");
const csvBase64 = Buffer.from(csvContent).toString("base64");

const result = await agent.run([
  { type: "text", text: "Analyze this spreadsheet data." },
  { type: "file", data: csvBase64, mimeType: "text/csv", filename: "report.csv" },
] as ContentPart[]);

Multi-Modal via HTTP File Upload

When exposing agents via Express, you can accept file uploads and convert them to ContentPart[]; the transport layer provides buildMultiModalInput for this. See File Upload for how to handle multipart/form-data and build multi-modal input from uploaded files.
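As a rough sketch of what such a conversion involves (the UploadedFile shape below mirrors a multer-style in-memory upload; buildMultiModalInput's actual signature may differ, so treat this as an illustration):

```typescript
type ContentPart =
  | { type: "text"; text: string }
  | { type: "image"; data: string; mimeType?: string }
  | { type: "audio"; data: string; mimeType?: string }
  | { type: "file"; data: string; mimeType: string; filename?: string };

// Shape of a multer-style upload: in-memory buffer plus client-reported metadata.
interface UploadedFile {
  buffer: Buffer;
  mimetype: string;
  originalname: string;
}

// Hypothetical helper: map each upload to a typed part, prefixed by the user's prompt.
function buildInput(prompt: string, files: UploadedFile[]): ContentPart[] {
  const parts: ContentPart[] = [{ type: "text", text: prompt }];
  for (const file of files) {
    const data = file.buffer.toString("base64");
    if (file.mimetype.startsWith("image/")) {
      parts.push({ type: "image", data, mimeType: file.mimetype });
    } else if (file.mimetype.startsWith("audio/")) {
      parts.push({ type: "audio", data, mimeType: file.mimetype });
    } else {
      parts.push({ type: "file", data, mimeType: file.mimetype, filename: file.originalname });
    }
  }
  return parts;
}
```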