AI RAG Plugin
Retrieval-Augmented Generation — upload documents, ask questions in natural language.
The AI RAG (Retrieval-Augmented Generation) plugin lets you upload documents, automatically chunk and embed them, then ask questions in natural language. The LLM answers are grounded in your actual data, which greatly reduces hallucinations.
Requires: AI Core plugin + AI Vectors plugin
Installation
```bash
cp -r ai_rag/ plugins/ai_rag/
cp -r ai_vectors/ plugins/ai_vectors/
cp -r ai_core/ plugins/ai_core/
```
How It Works
Ingestion
- You send document text to `/ingest`, or upload a file to `/upload`
- For file uploads, text is automatically extracted (supports .txt, .md, .html, .csv, .json, .pdf)
- The text is split into overlapping chunks (sentence-aware)
- Each chunk is embedded via the AI provider
- Chunks and embeddings are stored in the vector database
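The ingestion flow above can be sketched end to end. The `embed` function and the in-memory `store` dict below are stand-ins for the AI provider and the vector database (the plugin's actual internals differ); the fixed-size chunker is likewise a simplification of the sentence-aware one described later:

```python
import uuid

def embed(text: str) -> list[float]:
    """Stand-in embedder: a real deployment calls the AI provider here."""
    # Toy 2-dim "embedding" derived from simple character statistics.
    return [len(text) / 100.0, sum(map(ord, text)) % 97 / 97.0]

def chunk(text: str, size: int = 500, overlap: int = 50) -> list[str]:
    """Naive fixed-size chunking with overlap (the real chunker is sentence-aware)."""
    chunks, start = [], 0
    while start < len(text):
        chunks.append(text[start:start + size])
        start += size - overlap
    return chunks

def ingest(store: dict, collection: str, text: str, source: str = "") -> dict:
    """Chunk the text, embed each piece, and store chunk + embedding + metadata."""
    doc_id = str(uuid.uuid4())
    pieces = chunk(text)
    for i, piece in enumerate(pieces):
        store.setdefault(collection, {})[f"{doc_id}:{i}"] = {
            "text": piece,
            "embedding": embed(piece),
            "metadata": {"source": source, "chunk_index": i},
        }
    return {"document_id": doc_id, "chunks": len(pieces),
            "collection": collection, "source": source}

store = {}
result = ingest(store, "knowledge-base",
                "FastCMS is a Backend-as-a-Service." * 40,
                source="readme.md")
print(result["chunks"], result["collection"])
```

Each chunk gets a record id of the form `document_id:chunk_index`, matching the `record_id` values returned by `/ask`.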
Querying
- You ask a question via the `/ask` endpoint
- The question is embedded
- The most relevant chunks are found via cosine similarity
- The chunks are sent to the LLM as context
- The LLM generates an answer grounded in your documents
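The retrieval step boils down to ranking stored chunks by cosine similarity against the question embedding and keeping the top `limit` as LLM context. A minimal sketch (the embeddings here are hand-made toy vectors, not real model output):

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity: dot product divided by the product of magnitudes."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

# Toy corpus: chunk text paired with a pretend embedding.
chunks = [
    ("FastCMS provides JWT authentication with bcrypt...", [0.9, 0.1, 0.3]),
    ("The plugin system allows extending functionality...", [0.2, 0.8, 0.5]),
    ("File storage supports local and S3 backends...",      [0.1, 0.3, 0.9]),
]

question_embedding = [0.85, 0.15, 0.25]  # pretend embedding of the question

# Rank chunks by similarity; the top `limit` become the LLM's context.
limit = 2
ranked = sorted(chunks, key=lambda c: cosine(c[1], question_embedding), reverse=True)
context = [text for text, _ in ranked[:limit]]
print(context[0])
```

The selected chunks are then placed into the prompt so the model answers from them rather than from its training data alone.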
API Endpoints
All endpoints are mounted at /api/v1/plugins/ai/rag/.
POST /ai/rag/ingest
Ingest a document — chunk it, embed it, store it.
```bash
curl -X POST http://localhost:8000/api/v1/plugins/ai/rag/ingest \
  -H "Content-Type: application/json" \
  -d '{
    "collection": "knowledge-base",
    "text": "FastCMS is a Backend-as-a-Service built with FastAPI. It provides authentication, collections, file storage, real-time features, and more. The plugin system allows extending functionality...",
    "source": "readme.md",
    "chunk_size": 500,
    "chunk_overlap": 50
  }'
```
Response:
```json
{
  "document_id": "doc-uuid",
  "chunks": 12,
  "collection": "knowledge-base",
  "source": "readme.md"
}
```
Parameters:
| Parameter | Default | Description |
|---|---|---|
| `collection` | required | Collection name for organizing documents |
| `text` | required | The document text to ingest |
| `source` | `""` | Source identifier (filename, URL, etc.) |
| `chunk_size` | 500 | Target characters per chunk (100–5000) |
| `chunk_overlap` | 50 | Characters of overlap between chunks (0–500) |
POST /ai/rag/upload
Upload a file and ingest it into the RAG pipeline. The file content is automatically extracted based on format.
```bash
curl -X POST http://localhost:8000/api/v1/plugins/ai/rag/upload \
  -F "file=@readme.md" \
  -F "collection=knowledge-base" \
  -F "chunk_size=500" \
  -F "chunk_overlap=50"
```
Response (same as /ingest):
```json
{
  "document_id": "doc-uuid",
  "chunks": 8,
  "collection": "knowledge-base",
  "source": "readme.md"
}
```
Parameters (multipart form):
| Parameter | Default | Description |
|---|---|---|
| `file` | required | File to upload (max 10MB) |
| `collection` | `"knowledge-base"` | Collection name |
| `chunk_size` | 500 | Target characters per chunk (100–5000) |
| `chunk_overlap` | 50 | Characters of overlap (0–500) |
Supported file formats:
| Extension | Extraction Method |
|---|---|
| `.txt`, `.text`, `.log` | Plain text (as-is) |
| `.md`, `.markdown` | Markdown → plain text (strips headers, bold, links, code blocks) |
| `.html`, `.htm` | HTML → plain text (strips tags, ignores script/style) |
| `.csv` | Rows converted to "header: value" readable text |
| `.json` | Pretty-printed JSON |
| `.pdf` | Text extraction via PyPDF2 (requires `pip install PyPDF2`) |
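The non-PDF extractors can all be built on the Python standard library. Here is a rough sketch of extension-based dispatch covering a subset of the formats (function and class names are illustrative, not the plugin's actual code):

```python
import csv
import io
import json
from html.parser import HTMLParser

class _TextExtractor(HTMLParser):
    """Collects text content, skipping <script> and <style> bodies."""
    def __init__(self):
        super().__init__()
        self.parts, self._skip = [], 0
    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1
    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1
    def handle_data(self, data):
        if not self._skip:
            self.parts.append(data)

def extract_text(filename: str, raw: bytes) -> str:
    """Dispatch on file extension and return plain text for ingestion."""
    ext = filename.rsplit(".", 1)[-1].lower()
    text = raw.decode("utf-8", errors="replace")
    if ext in ("txt", "text", "log"):
        return text
    if ext in ("html", "htm"):
        parser = _TextExtractor()
        parser.feed(text)
        return " ".join(p.strip() for p in parser.parts if p.strip())
    if ext == "csv":
        rows = list(csv.reader(io.StringIO(text)))
        header, lines = rows[0], []
        for row in rows[1:]:
            lines.append(", ".join(f"{h}: {v}" for h, v in zip(header, row)))
        return "\n".join(lines)
    if ext == "json":
        return json.dumps(json.loads(text), indent=2)
    return text  # fallback: treat unknown extensions as plain text

print(extract_text("users.csv", b"name,role\nada,admin\n"))
```

The CSV branch shows why "header: value" conversion helps: a row like `ada,admin` becomes `name: ada, role: admin`, which embeds and retrieves far better than bare comma-separated values.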
POST /ai/rag/ask
Ask a question against ingested documents.
```bash
curl -X POST http://localhost:8000/api/v1/plugins/ai/rag/ask \
  -H "Content-Type: application/json" \
  -d '{
    "collection": "knowledge-base",
    "question": "How does authentication work in FastCMS?",
    "limit": 5
  }'
```
Response:
```json
{
  "answer": "FastCMS uses JWT-based authentication with bcrypt password hashing. It supports passwordless OTP login, email change flows, session management, and account lockout after failed attempts.",
  "sources": [
    {
      "text": "FastCMS provides JWT authentication with bcrypt...",
      "score": 0.9231,
      "record_id": "doc-uuid:3",
      "metadata": {"source": "readme.md", "chunk_index": 3}
    }
  ],
  "model": "gpt-4o-mini"
}
```
DELETE /ai/rag/collection/{name}
Delete all ingested documents for a collection.
Chunking Strategy
The chunker splits text intelligently:
- Sentence-aware: Breaks at sentence boundaries (`.`, `!`, `?`), not mid-word
- Overlapping: Configurable overlap ensures context isn't lost at chunk boundaries
- Long sentences: Sentences exceeding `chunk_size` are split at word boundaries
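A simplified version of such a chunker, splitting on sentence boundaries and carrying a tail of each chunk forward as overlap. This is an approximation of the behavior described above, not the plugin's source, and it omits the word-boundary split for oversized sentences:

```python
import re

def chunk_text(text: str, chunk_size: int = 500, chunk_overlap: int = 50) -> list[str]:
    """Sentence-aware chunking with overlap between consecutive chunks."""
    # Split after ., ! or ? followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks, current = [], ""
    for sentence in sentences:
        if current and len(current) + 1 + len(sentence) > chunk_size:
            chunks.append(current)
            # Seed the next chunk with the tail of this one for continuity.
            current = current[-chunk_overlap:] + " " + sentence if chunk_overlap else sentence
        else:
            current = f"{current} {sentence}".strip() if current else sentence
    if current:
        chunks.append(current)
    return chunks

doc = "One sentence here. Another sentence follows. " * 20
pieces = chunk_text(doc, chunk_size=120, chunk_overlap=20)
print(len(pieces), max(len(p) for p in pieces))
```

Because chunks are only cut at sentence boundaries, every chunk ends on a complete sentence, which keeps retrieved context readable for the LLM.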
Tuning Chunk Size
| Use Case | chunk_size | chunk_overlap | Why |
|---|---|---|---|
| Short FAQ docs | 200 | 20 | Small, precise answers |
| General docs | 500 | 50 | Good balance (default) |
| Technical docs | 1000 | 100 | More context per chunk |
| Legal/dense text | 300 | 50 | Precise retrieval needed |
Example: Build a Knowledge Base
```python
import httpx
from pathlib import Path

client = httpx.Client(base_url="http://localhost:8000")

# Option A: Upload files directly
for path in Path("docs/").glob("*.md"):
    with open(path, "rb") as f:
        client.post("/api/v1/plugins/ai/rag/upload", files={
            "file": (path.name, f, "text/markdown"),
        }, data={"collection": "docs"})

# Option B: Ingest text programmatically
client.post("/api/v1/plugins/ai/rag/ingest", json={
    "collection": "docs",
    "text": "FastCMS supports JWT auth, OAuth, and OTP login...",
    "source": "auth-notes.txt",
})

# Ask questions
response = client.post("/api/v1/plugins/ai/rag/ask", json={
    "collection": "docs",
    "question": "How do I configure webhooks?",
})
print(response.json()["answer"])
```
Admin UI
The AI Playground page (/admin/ai → RAG tab) provides a visual interface for ingestion and querying:
- Paste Text mode — paste document text directly
- Upload File mode — upload a file (.txt, .md, .html, .csv, .json, .pdf) via the file picker
- Ask a Question — query your ingested documents
- Vector Store — see collection stats
Dependencies
- Required: AI Core plugin, AI Vectors plugin
- Optional:
pip install PyPDF2— for PDF file upload support - No other additional pip packages — all text extractors use Python stdlib