AI RAG Plugin
Retrieval-Augmented Generation — upload documents, ask questions in natural language.
The AI RAG (Retrieval-Augmented Generation) plugin lets you upload documents, automatically chunk and embed them, then ask questions in natural language. The LLM answers are grounded in your actual data, which greatly reduces hallucinations.
Requires: AI Core plugin + AI Vectors plugin
Installation
```bash
cp -r ai_rag/ plugins/ai_rag/
cp -r ai_vectors/ plugins/ai_vectors/
cp -r ai_core/ plugins/ai_core/
```
How It Works
Ingestion
- You send document text to `/ingest`, or upload a file to `/upload`
- For file uploads, text is automatically extracted (supports .txt, .md, .html, .csv, .json, .pdf)
- The text is split into overlapping chunks (sentence-aware)
- Each chunk is embedded via the AI provider
- Chunks and embeddings are stored in the vector database
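The ingestion flow above can be sketched end to end. The `embed` function and the in-memory `store` dict below are stand-ins for the AI provider and the vector database (the plugin's actual internals differ); the fixed-size chunker is likewise a simplification of the sentence-aware one described later:

```python
import uuid

def embed(text: str) -> list[float]:
    """Stand-in embedder: a real deployment calls the AI provider here."""
    # Toy 2-dim "embedding" derived from simple character statistics.
    return [len(text) / 100.0, sum(map(ord, text)) % 97 / 97.0]

def chunk(text: str, size: int = 500, overlap: int = 50) -> list[str]:
    """Naive fixed-size chunking with overlap (the real chunker is sentence-aware)."""
    chunks, start = [], 0
    while start < len(text):
        chunks.append(text[start:start + size])
        start += size - overlap
    return chunks

def ingest(store: dict, collection: str, text: str, source: str = "") -> dict:
    """Chunk the text, embed each piece, and store chunk + embedding + metadata."""
    doc_id = str(uuid.uuid4())
    pieces = chunk(text)
    for i, piece in enumerate(pieces):
        store.setdefault(collection, {})[f"{doc_id}:{i}"] = {
            "text": piece,
            "embedding": embed(piece),
            "metadata": {"source": source, "chunk_index": i},
        }
    return {"document_id": doc_id, "chunks": len(pieces),
            "collection": collection, "source": source}

store = {}
result = ingest(store, "knowledge-base",
                "FastCMS is a Backend-as-a-Service." * 40,
                source="readme.md")
print(result["chunks"], result["collection"])
```

Each chunk gets a record id of the form `document_id:chunk_index`, matching the `record_id` values returned by `/ask`.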
Querying
- You ask a question via the `/ask` endpoint
- The question is embedded
- The most relevant chunks are found via cosine similarity
- The chunks are sent to the LLM as context
- The LLM generates an answer grounded in your documents
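The retrieval step boils down to ranking stored chunks by cosine similarity against the question embedding and keeping the top `limit` as LLM context. A minimal sketch (the embeddings here are hand-made toy vectors, not real model output):

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity: dot product divided by the product of magnitudes."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

# Toy corpus: chunk text paired with a pretend embedding.
chunks = [
    ("FastCMS provides JWT authentication with bcrypt...", [0.9, 0.1, 0.3]),
    ("The plugin system allows extending functionality...", [0.2, 0.8, 0.5]),
    ("File storage supports local and S3 backends...",      [0.1, 0.3, 0.9]),
]

question_embedding = [0.85, 0.15, 0.25]  # pretend embedding of the question

# Rank chunks by similarity; the top `limit` become the LLM's context.
limit = 2
ranked = sorted(chunks, key=lambda c: cosine(c[1], question_embedding), reverse=True)
context = [text for text, _ in ranked[:limit]]
print(context[0])
```

The selected chunks are then placed into the prompt so the model answers from them rather than from its training data alone.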
API Endpoints
All endpoints are mounted at /api/v1/plugins/ai/rag/.
POST /ai/rag/ingest
Ingest a document — chunk it, embed it, store it.
```bash
curl -X POST http://localhost:8000/api/v1/plugins/ai/rag/ingest \
  -H "Content-Type: application/json" \
  -d '{
    "collection": "knowledge-base",
    "text": "FastCMS is a Backend-as-a-Service built with FastAPI. It provides authentication, collections, file storage, real-time features, and more. The plugin system allows extending functionality...",
    "source": "readme.md",
    "chunk_size": 500,
    "chunk_overlap": 50
  }'
```
Response:
```json
{
  "document_id": "doc-uuid",
  "chunks": 12,
  "collection": "knowledge-base",
  "source": "readme.md"
}
```
Parameters:
| Parameter | Default | Description |
|---|---|---|
| `collection` | required | Collection name for organizing documents |
| `text` | required | The document text to ingest |
| `source` | `""` | Source identifier (filename, URL, etc.) |
| `chunk_size` | 500 | Target characters per chunk (100–5000) |
| `chunk_overlap` | 50 | Characters of overlap between chunks (0–500) |
POST /ai/rag/upload
Upload a file and ingest it into the RAG pipeline. The file content is automatically extracted based on format.
```bash
curl -X POST http://localhost:8000/api/v1/plugins/ai/rag/upload \
  -F "file=@readme.md" \
  -F "collection=knowledge-base" \
  -F "chunk_size=500" \
  -F "chunk_overlap=50"
```
Response (same as /ingest):
```json
{
  "document_id": "doc-uuid",
  "chunks": 8,
  "collection": "knowledge-base",
  "source": "readme.md"
}
```
Parameters (multipart form):
| Parameter | Default | Description |
|---|---|---|
| `file` | required | File to upload (max 10MB) |
| `collection` | `"knowledge-base"` | Collection name |
| `chunk_size` | 500 | Target characters per chunk (100–5000) |
| `chunk_overlap` | 50 | Characters of overlap (0–500) |
Supported file formats:
| Extension | Extraction Method |
|---|---|
| `.txt`, `.text`, `.log` | Plain text (as-is) |
| `.md`, `.markdown` | Markdown → plain text (strips headers, bold, links, code blocks) |
| `.html`, `.htm` | HTML → plain text (strips tags, ignores script/style) |
| `.csv` | Rows converted to "header: value" readable text |
| `.json` | Pretty-printed JSON |
| `.pdf` | Text extraction via PyPDF2 (requires `pip install PyPDF2`) |
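The non-PDF extractors can all be built on the Python standard library. Here is a rough sketch of extension-based dispatch covering a subset of the formats (function and class names are illustrative, not the plugin's actual code):

```python
import csv
import io
import json
from html.parser import HTMLParser

class _TextExtractor(HTMLParser):
    """Collects text content, skipping <script> and <style> bodies."""
    def __init__(self):
        super().__init__()
        self.parts, self._skip = [], 0
    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1
    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1
    def handle_data(self, data):
        if not self._skip:
            self.parts.append(data)

def extract_text(filename: str, raw: bytes) -> str:
    """Dispatch on file extension and return plain text for ingestion."""
    ext = filename.rsplit(".", 1)[-1].lower()
    text = raw.decode("utf-8", errors="replace")
    if ext in ("txt", "text", "log"):
        return text
    if ext in ("html", "htm"):
        parser = _TextExtractor()
        parser.feed(text)
        return " ".join(p.strip() for p in parser.parts if p.strip())
    if ext == "csv":
        rows = list(csv.reader(io.StringIO(text)))
        header, lines = rows[0], []
        for row in rows[1:]:
            lines.append(", ".join(f"{h}: {v}" for h, v in zip(header, row)))
        return "\n".join(lines)
    if ext == "json":
        return json.dumps(json.loads(text), indent=2)
    return text  # fallback: treat unknown extensions as plain text

print(extract_text("users.csv", b"name,role\nada,admin\n"))
```

The CSV branch shows why "header: value" conversion helps: a row like `ada,admin` becomes `name: ada, role: admin`, which embeds and retrieves far better than bare comma-separated values.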
POST /ai/rag/ask
Ask a question against ingested documents.
```bash
curl -X POST http://localhost:8000/api/v1/plugins/ai/rag/ask \
  -H "Content-Type: application/json" \
  -d '{
    "collection": "knowledge-base",
    "question": "How does authentication work in FastCMS?",
    "limit": 5
  }'
```
Response:
```json
{
  "answer": "FastCMS uses JWT-based authentication with bcrypt password hashing. It supports passwordless OTP login, email change flows, session management, and account lockout after failed attempts.",
  "sources": [
    {
      "text": "FastCMS provides JWT authentication with bcrypt...",
      "score": 0.9231,
      "record_id": "doc-uuid:3",
      "metadata": {"source": "readme.md", "chunk_index": 3}
    }
  ],
  "model": "gpt-4o-mini"
}
```
DELETE /ai/rag/collection/{name}
Delete all ingested documents for a collection.
Chunking Strategy
The chunker splits text intelligently:
- Sentence-aware: Breaks at sentence boundaries (`.`, `!`, `?`), not mid-word
- Overlapping: Configurable overlap ensures context isn't lost at chunk boundaries
- Long sentences: Sentences exceeding `chunk_size` are split at word boundaries
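A simplified version of such a chunker, splitting on sentence boundaries and carrying a tail of each chunk forward as overlap. This is an approximation of the behavior described above, not the plugin's source, and it omits the word-boundary split for oversized sentences:

```python
import re

def chunk_text(text: str, chunk_size: int = 500, chunk_overlap: int = 50) -> list[str]:
    """Sentence-aware chunking with overlap between consecutive chunks."""
    # Split after ., ! or ? followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks, current = [], ""
    for sentence in sentences:
        if current and len(current) + 1 + len(sentence) > chunk_size:
            chunks.append(current)
            # Seed the next chunk with the tail of this one for continuity.
            current = current[-chunk_overlap:] + " " + sentence if chunk_overlap else sentence
        else:
            current = f"{current} {sentence}".strip() if current else sentence
    if current:
        chunks.append(current)
    return chunks

doc = "One sentence here. Another sentence follows. " * 20
pieces = chunk_text(doc, chunk_size=120, chunk_overlap=20)
print(len(pieces), max(len(p) for p in pieces))
```

Because chunks are only cut at sentence boundaries, every chunk ends on a complete sentence, which keeps retrieved context readable for the LLM.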
Tuning Chunk Size
| Use Case | chunk_size | chunk_overlap | Why |
|---|---|---|---|
| Short FAQ docs | 200 | 20 | Small, precise answers |
| General docs | 500 | 50 | Good balance (default) |
| Technical docs | 1000 | 100 | More context per chunk |
| Legal/dense text | 300 | 50 | Precise retrieval needed |
Example: Build a Knowledge Base
```python
import httpx
from pathlib import Path

client = httpx.Client(base_url="http://localhost:8000")

# Option A: Upload files directly
for path in Path("docs/").glob("*.md"):
    with open(path, "rb") as f:
        client.post("/api/v1/plugins/ai/rag/upload", files={
            "file": (path.name, f, "text/markdown"),
        }, data={"collection": "docs"})

# Option B: Ingest text programmatically
client.post("/api/v1/plugins/ai/rag/ingest", json={
    "collection": "docs",
    "text": "FastCMS supports JWT auth, OAuth, and OTP login...",
    "source": "auth-notes.txt",
})

# Ask questions
response = client.post("/api/v1/plugins/ai/rag/ask", json={
    "collection": "docs",
    "question": "How do I configure webhooks?",
})
print(response.json()["answer"])
```
Admin UI
The AI Playground page (/admin/ai → RAG tab) provides a visual interface for ingestion and querying:
- Paste Text mode — paste document text directly
- Upload File mode — upload a file (.txt, .md, .html, .csv, .json, .pdf) via the file picker
- Ask a Question — query your ingested documents
- Vector Store — see collection stats
Dependencies
- Required: AI Core plugin, AI Vectors plugin
- Optional:
pip install PyPDF2— for PDF file upload support - No other additional pip packages — all text extractors use Python stdlib