Multi-Modal Input
RadarOS agents accept not only text but also images, audio, and files. Use the `MessageContent` type and `ContentPart[]` to send multi-modal input to vision- and audio-capable models.
MessageContent Type
Input to `agent.run()` or `agent.stream()` can be:
- `string` — Plain text (most common)
- `ContentPart[]` — Array of text, image, audio, or file parts
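The accepted shapes can be sketched as TypeScript types. This is an assumed reconstruction for illustration, not the actual RadarOS source:

```typescript
// Assumed reconstruction of the RadarOS input types (illustrative only).
type TextPart = { type: "text"; text: string };
type ImagePart = { type: "image"; data: string; mimeType?: string };
type AudioPart = { type: "audio"; data: string; mimeType?: string };
type FilePart = { type: "file"; data: string; mimeType: string; filename?: string };

type ContentPart = TextPart | ImagePart | AudioPart | FilePart;

// What agent.run() / agent.stream() accept:
type MessageContent = string | ContentPart[];

// Both forms are valid MessageContent values:
const plain: MessageContent = "Summarize this document.";
const parts: ContentPart[] = [
  { type: "text", text: "Describe this image." },
  { type: "image", data: "https://example.com/cat.png", mimeType: "image/png" },
];
const alsoValid: MessageContent = parts; // part arrays are accepted too
```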
ContentPart Types
TextPart
`{ type: "text", text: string }`
ImagePart
`{ type: "image", data: string, mimeType? }`
AudioPart
`{ type: "audio", data: string, mimeType? }`
FilePart
`{ type: "file", data: string, mimeType, filename? }`
Image Input
Images can be provided as base64-encoded data or as a URL. Supported `mimeType` values: `image/png`, `image/jpeg`, `image/gif`, `image/webp`.
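A hedged sketch of both forms, following the `ImagePart` shape above (the URL is a placeholder, and the base64 payload here is synthetic; in practice you would read a real image file):

```typescript
// Image by URL (placeholder address):
const byUrl = {
  type: "image",
  data: "https://example.com/diagram.png",
  mimeType: "image/png",
};

// Image as base64. The bytes here are just the PNG magic number so the
// snippet is self-contained; normally you would base64-encode the file:
const pngBytes = Buffer.from([0x89, 0x50, 0x4e, 0x47]);
const byBase64 = {
  type: "image",
  data: pngBytes.toString("base64"),
  mimeType: "image/png",
};
```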
Audio Input
Audio is provided as base64-encoded data. Supported `mimeType` values: `audio/mp3`, `audio/wav`, `audio/ogg`, `audio/webm`.
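A minimal sketch of building an `AudioPart`; the bytes below are a synthetic stand-in so the snippet is self-contained (normally you would read the audio file from disk, as the commented line shows):

```typescript
// In practice: const wavBytes = readFileSync("speech.wav"); (from node:fs)
const wavBytes = Buffer.from("RIFF....WAVE", "ascii"); // synthetic stand-in

const audioPart = {
  type: "audio",
  data: wavBytes.toString("base64"),
  mimeType: "audio/wav",
};
```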
File Input
Generic files (PDFs, documents, etc.) use `FilePart`. The `data` field can be a URL or base64-encoded content.
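A hedged sketch of both `FilePart` forms (the URL and filenames are placeholders):

```typescript
// A PDF referenced by URL (placeholder address):
const pdfPart = {
  type: "file",
  data: "https://example.com/report.pdf",
  mimeType: "application/pdf",
  filename: "report.pdf",
};

// A small text document embedded as base64 content:
const notes = "Q3 revenue grew 12% quarter over quarter.";
const notesPart = {
  type: "file",
  data: Buffer.from(notes, "utf8").toString("base64"),
  mimeType: "text/plain",
  filename: "notes.txt",
};
```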
Example: Vision Agent Analyzing an Image
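The original code sample is not preserved here; the following is a hedged sketch of what such a call might look like. Only the `ContentPart[]` shape comes from the definitions above — the agent constructor, model name, and result fields in the comments are assumptions:

```typescript
// Build the multi-modal input (shape from the ContentPart definitions above):
const input = [
  { type: "text", text: "What landmarks are visible in this photo?" },
  { type: "image", data: "https://example.com/skyline.jpg", mimeType: "image/jpeg" },
];

// Pass it to a vision-capable agent. The constructor, model name, and
// result shape below are assumptions, not confirmed RadarOS API:
// const agent = new Agent({ model: "gpt-4o" });
// const result = await agent.run(input);
// console.log(result.text);
```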
Example: Audio Analysis with Gemini
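The original sample is likewise not preserved; a hedged sketch, with the file contents replaced by a synthetic stand-in and the Gemini agent call (constructor and model name are assumptions) shown in comments:

```typescript
// In practice, read and encode the real file:
// import { readFileSync } from "node:fs";
// const data = readFileSync("interview.mp3").toString("base64");
const data = Buffer.from("fake-mp3-bytes").toString("base64"); // stand-in

const input = [
  { type: "text", text: "Transcribe this recording and summarize it." },
  { type: "audio", data, mimeType: "audio/mp3" },
];

// Send to a Gemini-backed agent (names below are assumptions):
// const agent = new Agent({ model: "gemini-2.0-flash" });
// const result = await agent.run(input);
```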
Provider Support Matrix
Not all providers support all content types. When an unsupported type is passed, the provider logs a warning and either skips the content or substitutes a placeholder.

| Content Type | OpenAI | Anthropic | Google/Vertex | AWS Claude | AWS Bedrock | Azure OpenAI | Azure Foundry | Ollama |
|---|---|---|---|---|---|---|---|---|
| Image (URL) | Yes | Yes | Yes | Yes | No | Yes | Model-dependent | No |
| Image (base64) | Yes | Yes | Yes | Yes | Yes* | Yes | Model-dependent | Yes |
| Audio (base64) | Yes | No | Yes | No | No | Yes | No | No |
| File (URL) | Yes | Yes | Yes | Yes | No | Yes | No | No |
| File (base64) | Yes | Yes | Yes | Yes | Yes* | Yes | No | No |
- Ollama image support requires a vision-capable model (e.g., `llava`, `bakllava`, `llama3.2-vision`).
- AWS Bedrock multi-modal support (*) depends on the specific model. Amazon Nova supports images; document support varies by model.
- AWS Claude supports the same multi-modal features as the direct Anthropic provider.
- Azure OpenAI supports the same multi-modal features as the direct OpenAI provider.
- Azure AI Foundry vision support depends on the model (e.g., `Phi-3.5-vision-instruct` supports images).
Reading CSV Data
CSV files can be sent to Anthropic and OpenAI as file input. The model reads and analyzes the data directly.
Analyzing PDFs
PDF documents can be sent via URL (no download needed) or as base64-encoded data.
XLSX and Binary Formats
Most providers cannot process Excel (.xlsx) files directly. Google Gemini is the exception: it handles XLSX natively via `inlineData`.
For other providers, convert to CSV first:
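As an illustration, the conversion might use the SheetJS `xlsx` package (a common third-party choice, not part of RadarOS); the conversion calls are shown only as comments, and the resulting CSV is wrapped as a `text/csv` FilePart:

```typescript
// Hypothetical conversion with SheetJS (requires `npm i xlsx`):
// import * as XLSX from "xlsx";
// const workbook = XLSX.read(xlsxBuffer, { type: "buffer" });
// const csv = XLSX.utils.sheet_to_csv(workbook.Sheets[workbook.SheetNames[0]]);

// Stand-in CSV so the snippet is self-contained:
const csv = "name,score\nada,97\ngrace,95";

// Wrap the converted CSV as a text/csv FilePart:
const csvPart = {
  type: "file",
  data: Buffer.from(csv, "utf8").toString("base64"),
  mimeType: "text/csv",
  filename: "converted.csv",
};
```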
Multi-Modal via HTTP File Upload
When exposing agents via Express, you can accept file uploads and convert them to `ContentPart[]`. The transport layer provides `buildMultiModalInput` for this purpose.
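The exact signature of `buildMultiModalInput` isn't shown here; to illustrate the idea, a hand-rolled equivalent might map uploaded files to parts by MIME type. The `UploadedFile` shape and function name below are assumptions (modeled loosely on what middleware like multer provides), not RadarOS API:

```typescript
// Hypothetical shape of an uploaded file (e.g., from multer):
interface UploadedFile {
  buffer: Buffer;
  mimetype: string;
  originalname: string;
}

// Map uploads to ContentPart-shaped objects by MIME type:
function toContentParts(files: UploadedFile[]) {
  return files.map((f) => {
    const data = f.buffer.toString("base64");
    if (f.mimetype.startsWith("image/")) {
      return { type: "image", data, mimeType: f.mimetype };
    }
    if (f.mimetype.startsWith("audio/")) {
      return { type: "audio", data, mimeType: f.mimetype };
    }
    // Everything else falls through to a generic file part:
    return { type: "file", data, mimeType: f.mimetype, filename: f.originalname };
  });
}
```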
See File Upload for how to handle multipart/form-data and build multi-modal input from uploaded files.