PageIndex
Reasoning-based RAG for complex, long-form documents. Unlike vector search, PageIndex builds a hierarchical tree index and uses LLM reasoning to navigate it, delivering significantly better accuracy on financial reports, legal filings, technical manuals, and research papers.
Uses the PageIndex cloud API, so no vector database or embedding pipeline is needed.
Quick Start
Config
PageIndex API key. Falls back to the PAGEINDEX_API_KEY env var. Get yours at dash.pageindex.ai.
API base URL. Override for self-hosted PageIndex deployments.
Request timeout in milliseconds. PDF processing can take time — the default is 2 minutes.
Max response characters returned per tool call.
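The fallback and default behaviour described above can be sketched as follows. This is only an illustration, not the plugin's actual code: the option names (`api_key`, `timeout_ms`, `max_response_chars`) and the character limit are assumptions, since this page does not name them. Only the `PAGEINDEX_API_KEY` variable and the 2-minute timeout default come from the text.

```python
import os
from dataclasses import dataclass
from typing import Any, Mapping, Optional

DEFAULT_TIMEOUT_MS = 120_000      # 2 minutes, the default stated above
HYPOTHETICAL_MAX_CHARS = 20_000   # illustrative value; the real default is not stated here


@dataclass(frozen=True)
class ResolvedConfig:
    api_key: str
    timeout_ms: int
    max_response_chars: int


def resolve_config(
    options: Mapping[str, Any],
    env: Optional[Mapping[str, str]] = None,
) -> ResolvedConfig:
    """Resolve settings; option names here are hypothetical."""
    env = os.environ if env is None else env
    # An explicit key wins; otherwise fall back to the environment variable.
    api_key = options.get("api_key") or env.get("PAGEINDEX_API_KEY")
    if not api_key:
        raise ValueError("No API key: set it in config or export PAGEINDEX_API_KEY")
    return ResolvedConfig(
        api_key=str(api_key),
        timeout_ms=int(options.get("timeout_ms", DEFAULT_TIMEOUT_MS)),
        max_response_chars=int(options.get("max_response_chars", HYPOTHETICAL_MAX_CHARS)),
    )


def truncate(text: str, limit: int) -> str:
    """Cap a tool response at `limit` characters."""
    return text if len(text) <= limit else text[:limit]
```

Passing `env` explicitly keeps the function easy to test without touching the real process environment.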
Tools
| Tool | Description |
|---|---|
| `pageindex_submit` | Submit a PDF document for tree indexing. Returns a `doc_id` for subsequent operations. |
| `pageindex_status` | Check document processing status. Returns the tree structure when complete. |
| `pageindex_tree` | Get the hierarchical tree structure of a processed document (semantic table of contents). |
| `pageindex_list` | List all documents with IDs, names, statuses, and page counts. |
| `pageindex_chat` | Ask questions about documents using reasoning-based RAG with optional citations. |
| `pageindex_retrieve` | Retrieve specific sections from a document using tree-based search. |
| `pageindex_delete` | Delete a document and all associated data. |
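The submit and status tools suggest a submit-then-poll workflow, which might look roughly like the sketch below. The `call_tool` callable is a hypothetical stand-in for however your agent runtime invokes tools. The `path` argument, the `status` field, and the `"completed"` value are also assumptions. Only the tool names and the returned `doc_id` come from the table.

```python
import time
from typing import Any, Callable, Dict

# Hypothetical tool invoker: (tool_name, arguments) -> response dict.
ToolCaller = Callable[[str, Dict[str, Any]], Dict[str, Any]]


def submit_and_wait(
    call_tool: ToolCaller,
    pdf_path: str,
    poll_seconds: float = 2.0,
    max_polls: int = 60,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, Any]:
    """Submit a PDF, then poll until processing finishes or polls run out."""
    # "path" is an assumed argument name; doc_id is the documented return value.
    doc_id = call_tool("pageindex_submit", {"path": pdf_path})["doc_id"]
    for _ in range(max_polls):
        status = call_tool("pageindex_status", {"doc_id": doc_id})
        # "status" == "completed" is an assumed response shape.
        if status.get("status") == "completed":
            return status
        sleep(poll_seconds)
    raise TimeoutError(f"Document {doc_id} was not ready after {max_polls} polls")
```

Bounding the number of polls keeps a stuck document from blocking the caller indefinitely.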
How It Works
PageIndex takes a fundamentally different approach from traditional vector RAG:
- Tree Indexing: Documents are parsed into a hierarchical tree of sections, subsections, and paragraphs, with summaries at each level.
- LLM Tree Search: At query time, an LLM navigates the tree from root to relevant leaves, using reasoning instead of embedding similarity.
- No Vectors Needed: No embedding model, no vector database, and no chunking strategy to tune.
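To make the idea concrete, here is a toy sketch of root-to-leaf navigation over a summarised tree. It is not PageIndex's implementation: the `Node` type is hypothetical, and a simple keyword-overlap scorer stands in for the LLM that would normally decide which branch to follow.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Node:
    title: str
    summary: str
    text: str = ""
    children: List["Node"] = field(default_factory=list)


# Stand-in for the LLM: rates how relevant a node looks for the query.
Scorer = Callable[[str, Node], float]


def keyword_scorer(query: str, node: Node) -> float:
    """Count query words that appear in the node's title or summary."""
    words = set(query.lower().split())
    haystack = set(f"{node.title} {node.summary}".lower().split())
    return float(len(words & haystack))


def tree_search(root: Node, query: str, score: Scorer = keyword_scorer) -> List[Node]:
    """Walk from the root to a leaf, picking the best-scoring child at each level."""
    path = [root]
    node = root
    while node.children:
        node = max(node.children, key=lambda child: score(query, child))
        path.append(node)
    return path


report = Node("Annual Report", "Full-year results", children=[
    Node("Financial Statements", "revenue income balance sheet", children=[
        Node("Revenue", "revenue by segment", text="Segment revenue table"),
        Node("Cash Flow", "operating cash flow", text="Cash flow statement"),
    ]),
    Node("Risk Factors", "legal and market risk", text="Risk discussion"),
])

path = tree_search(report, "revenue by segment")
print(" > ".join(n.title for n in path))
# prints: Annual Report > Financial Statements > Revenue
```

This greedy walk follows one branch per level. A real system could follow several branches or backtrack, but the core point is the same: the search follows the document's structure rather than comparing embeddings of fixed-size chunks.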
Use Cases
Document Q&A
Multi-Document Analysis
Structured Extraction
Environment Variables
| Variable | Description |
|---|---|
| `PAGEINDEX_API_KEY` | PageIndex API key from dash.pageindex.ai |