Multi-Modal Input
RadarOS agents accept not only text but also images, audio, and files. Use the MessageContent type and ContentPart[] to send multi-modal input to vision- and audio-capable models.
MessageContent Type
Input to agent.run() or agent.stream() can be:
- string — Plain text (most common)
- ContentPart[] — Array of text, image, audio, or file parts
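The accepted input shapes can be sketched as a TypeScript union. This is an illustration built from the part shapes documented below, not the library's exact declarations:

```typescript
// Sketch of the input union, based on the documented part shapes.
type TextPart = { type: "text"; text: string };
type ImagePart = { type: "image"; data: string; mimeType?: string };
type AudioPart = { type: "audio"; data: string; mimeType?: string };
type FilePart = { type: "file"; data: string; mimeType: string; filename?: string };

type ContentPart = TextPart | ImagePart | AudioPart | FilePart;
type MessageContent = string | ContentPart[];

// Both forms are valid MessageContent:
const plain: MessageContent = "Describe this image";
const parts: ContentPart[] = [
  { type: "text", text: "Describe this image" },
  { type: "image", data: "https://example.com/cat.png", mimeType: "image/png" },
];
```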
ContentPart Types
TextPart
{ type: "text", text: string }

ImagePart
{ type: "image", data: string, mimeType? }

AudioPart
{ type: "audio", data: string, mimeType? }

FilePart
{ type: "file", data: string, mimeType, filename? }

Image Input
Images can be provided as base64 or a URL. Supported mimeType values: image/png, image/jpeg, image/gif, image/webp.
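For illustration, the two forms side by side. The URL is a placeholder, and the base64 variant here encodes an in-memory buffer; in practice you would read the image file from disk:

```typescript
// Image referenced by URL:
const imageByUrl = {
  type: "image",
  data: "https://example.com/diagram.png",
  mimeType: "image/png",
};

// Image as base64 — here a tiny in-memory buffer stands in for real image
// bytes (in practice: fs.readFileSync("diagram.png")):
const pngMagic = Buffer.from([0x89, 0x50, 0x4e, 0x47]); // first bytes of a PNG
const imageByBase64 = {
  type: "image",
  data: pngMagic.toString("base64"),
  mimeType: "image/png",
};
```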
Audio Input
Audio is provided as base64-encoded data. Supported mimeType values: audio/mp3, audio/wav, audio/ogg, audio/webm.
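A minimal sketch of building an AudioPart. The buffer here is a stand-in for real audio bytes, which you would normally read from a file:

```typescript
// Audio must be base64-encoded; an in-memory buffer stands in for real
// audio bytes here (in practice: fs.readFileSync("clip.mp3")).
const audioBytes = Buffer.from("fake-audio-bytes");
const audioPart = {
  type: "audio",
  data: audioBytes.toString("base64"),
  mimeType: "audio/mp3",
};
```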
File Input
Generic files (PDFs, documents, etc.) use FilePart:
data can be a URL or base64-encoded content.
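For example, a PDF referenced by URL (the URL is a placeholder; filename is optional but helps the model label the document):

```typescript
const pdfPart = {
  type: "file",
  data: "https://example.com/report.pdf", // or base64-encoded file content
  mimeType: "application/pdf",
  filename: "report.pdf",
};
```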
Example: Vision Agent Analyzing an Image
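A sketch of what a vision call might look like. The agent construction is not shown here because the exact RadarOS factory API is not documented in this section; a stub with the run() shape described above stands in so the snippet is self-contained:

```typescript
// Hypothetical agent interface; a real RadarOS agent would expose run()
// accepting MessageContent as described above.
interface Agent {
  run(input: string | Array<Record<string, unknown>>): Promise<string>;
}

// Stub standing in for a vision-capable agent — placeholder only.
const visionAgent: Agent = {
  async run(input) {
    const parts = Array.isArray(input) ? input : [{ type: "text", text: input }];
    const hasImage = parts.some((p) => p.type === "image");
    return hasImage ? "The image shows a bar chart." : "No image provided.";
  },
};

async function main() {
  // Mixed text + image input, as a vision model would receive it:
  const answer = await visionAgent.run([
    { type: "text", text: "What does this chart show?" },
    { type: "image", data: "https://example.com/chart.png", mimeType: "image/png" },
  ]);
  console.log(answer);
}
main();
```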
Example: Audio Analysis with Gemini
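A sketch of the audio case. The Gemini-backed agent itself is stubbed out (the real construction and model name are not documented in this section); the point is the shape of the multi-modal input:

```typescript
interface Agent {
  run(input: string | Array<Record<string, unknown>>): Promise<string>;
}

// Stub standing in for an audio-capable Gemini agent — placeholder only.
const audioAgent: Agent = {
  async run(input) {
    const parts = Array.isArray(input) ? input : [];
    const hasAudio = parts.some((p) => p.type === "audio");
    return hasAudio ? "Transcript: hello world" : "No audio provided.";
  },
};

async function main() {
  // Audio must be base64; a fake buffer stands in for real recording bytes.
  const audioBase64 = Buffer.from("fake-audio").toString("base64");
  const result = await audioAgent.run([
    { type: "text", text: "Transcribe this recording." },
    { type: "audio", data: audioBase64, mimeType: "audio/wav" },
  ]);
  console.log(result);
}
main();
```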
Multi-Modal via HTTP File Upload
When exposing agents via Express, you can accept file uploads and convert them to ContentPart[]. The transport layer provides buildMultiModalInput for this purpose.
See File Upload for how to handle multipart/form-data and build multi-modal input from uploaded files.
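As an illustration of the conversion step, a helper along these lines turns uploaded file buffers (as delivered by e.g. multer) into parts. This sketch is hypothetical — the real buildMultiModalInput lives in the transport layer and may differ:

```typescript
// Hypothetical sketch of what buildMultiModalInput might do: combine a text
// prompt with uploaded files into a ContentPart[] input.
type ContentPart =
  | { type: "text"; text: string }
  | { type: "image" | "audio" | "file"; data: string; mimeType?: string; filename?: string };

// Minimal shape of an uploaded file (multer exposes buffer/mimetype/originalname).
interface Upload {
  buffer: Buffer;
  mimetype: string;
  originalname: string;
}

function buildParts(prompt: string, files: Upload[]): ContentPart[] {
  const parts: ContentPart[] = [{ type: "text", text: prompt }];
  for (const f of files) {
    // Route by MIME type so vision/audio models receive the right part type.
    const kind =
      f.mimetype.startsWith("image/") ? "image"
      : f.mimetype.startsWith("audio/") ? "audio"
      : "file";
    parts.push({
      type: kind,
      data: f.buffer.toString("base64"),
      mimeType: f.mimetype,
      filename: kind === "file" ? f.originalname : undefined,
    });
  }
  return parts;
}
```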