Knowledge Base & RAG Guide

Give your voice agents domain-specific knowledge without retraining the underlying model.

1. Overview — What Is RAG?

RAG stands for Retrieval-Augmented Generation. It is a technique that lets an AI model answer questions using external documents you provide, rather than relying solely on its built-in training data.

For voice agents this means you can give your agent accurate, up-to-date information about your company’s products, pricing, policies, and procedures. The agent will reference this material during live calls and provide correct, specific answers to caller questions — without you needing to fine-tune or retrain any model.

How It Works at a High Level

Caller asks a question
        |
        v
+-----------+       +------------------+       +-----------+
| Caller's  | ----> | Embed query +    | ----> |  Top 3    |
| question  |       | search KB chunks |       |  chunks   |
+-----------+       +------------------+       +-----------+
                                                     |
                                                     v
                                          +------------------+
                                          | System prompt +  |
                                          | KB context +     |
                                          | conversation     |
                                          +------------------+
                                                     |
                                                     v
                                          +------------------+
                                          | LLM generates    |
                                          | informed answer  |
                                          +------------------+
  1. You upload documents into a Knowledge Base.
  2. Each document is split into small chunks and converted into numerical vectors (embeddings).
  3. When a caller asks a question, the system converts the question into a vector and finds the most relevant chunks using cosine similarity.
  4. Those chunks are injected into the LLM system prompt as reference material.
  5. The LLM generates a response informed by both the conversation and your knowledge base.
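
The five steps above can be sketched end to end. This is an illustrative toy, not platform code: embed here builds bag-of-words vectors instead of calling text-embedding-3-small, and the function names (embed, cosine, retrieve) are invented for the example.

```python
import math

def embed(text: str) -> list[float]:
    # Toy stand-in for a real embedding model: counts a few keywords.
    # Real embeddings from text-embedding-3-small are 1536-dimensional.
    vocab = ["return", "policy", "price", "plan", "hours"]
    return [float(text.lower().count(w)) for w in vocab]

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query: str, chunks: list[str], top_k: int = 3) -> list[str]:
    # Embed the query, rank chunks by similarity, keep the best top_k.
    q = embed(query)
    ranked = sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)
    return ranked[:top_k]

chunks = [
    "Our return policy allows returns within 30 days.",
    "The Pro plan price is $49/month.",
    "We are open 9 AM to 6 PM Eastern.",
]
# Steps 3-4: retrieve relevant chunks, then inject them into the prompt.
context = retrieve("What is your return policy?", chunks, top_k=1)
prompt = "REFERENCE MATERIAL:\n" + "\n".join(context) + "\n\nAnswer the caller."
```

The real pipeline works the same way, only with learned embeddings and a database-backed vector search instead of an in-memory list.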

2. Creating a Knowledge Base

Via the API

curl -X POST https://your-domain.com/api/knowledge \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "Product FAQ",
    "description": "Frequently asked questions about our products"
  }'

Response:

{
  "knowledge_base": {
    "id": "a1b2c3d4-...",
    "tenant_id": "...",
    "name": "Product FAQ",
    "description": "Frequently asked questions about our products",
    "created_at": "2026-02-22T12:00:00.000Z"
  }
}

Via the Dashboard

  1. Navigate to Knowledge Bases in the sidebar.
  2. Click Create Knowledge Base.
  3. Enter a name and optional description.
  4. Click Save.

Naming Best Practices

Use short, descriptive names ("Product FAQ", "Return Policies") plus a one-line description of what the documents cover. Clear names make it obvious which knowledge base to link when you manage multiple agents.

3. Uploading Documents

Supported Formats

| Format     | Content Type     | Notes                         |
|------------|------------------|-------------------------------|
| Plain text | text/plain       | .txt files                    |
| Markdown   | text/markdown    | .md files                     |
| PDF        | application/pdf  | Text extracted via pdf-parse  |
| HTML       | text/html        | Tags stripped, text preserved |
| JSON       | application/json | Pretty-printed then chunked   |
| CSV        | text/csv         | Treated as raw text           |

Maximum file size: 10 MB per upload.
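
Before uploading, you can map a file's extension to the content type in the table above and enforce the size cap client-side. A minimal sketch (the helper name and the client-side check are our own; the server performs its own validation):

```python
# Content types from the supported-formats table; the cap matches
# the documented 10 MB per-upload maximum.
CONTENT_TYPES = {
    ".txt": "text/plain",
    ".md": "text/markdown",
    ".pdf": "application/pdf",
    ".html": "text/html",
    ".json": "application/json",
    ".csv": "text/csv",
}
MAX_BYTES = 10 * 1024 * 1024  # 10 MB

def validate_upload(filename: str, size_bytes: int) -> str:
    """Return the Content-Type to send, or raise if the file is unsupported."""
    ext = filename[filename.rfind("."):].lower()
    if ext not in CONTENT_TYPES:
        raise ValueError(f"Unsupported format: {ext}")
    if size_bytes > MAX_BYTES:
        raise ValueError("File exceeds the 10 MB upload limit")
    return CONTENT_TYPES[ext]
```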

Four Upload Methods

1. File upload (multipart form) — ideal for the dashboard file picker:

curl -X POST https://your-domain.com/api/knowledge/KB_ID/upload-file \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -F "file=@product-faq.pdf"

2. Raw body upload — pass file bytes directly with a filename header:

curl -X POST https://your-domain.com/api/knowledge/KB_ID/upload \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: text/plain" \
  -H "X-Filename: faq.txt" \
  --data-binary @faq.txt

3. Paste text directly — no file needed:

curl -X POST https://your-domain.com/api/knowledge/KB_ID/text \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "title": "Return Policy",
    "content": "Our return policy allows returns within 30 days..."
  }'

4. Scrape a URL — fetch a web page and index its text:

curl -X POST https://your-domain.com/api/knowledge/KB_ID/url \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{ "url": "https://example.com/pricing" }'

What Happens During Processing

  1. Extract — Raw text is extracted from the uploaded file.
  2. Chunk — The text is split into overlapping chunks of ~500 tokens each.
  3. Embed — Each chunk is sent to the OpenAI text-embedding-3-small model to generate a 1536-dimension vector.
  4. Store — Chunks, their text search vectors, and embedding vectors are saved to the knowledge_chunks table.

Processing happens in the background. The upload endpoint returns immediately with a document record whose status is "processing". Once complete, the status changes to "ready".

Tip: You can check a document's processing status by fetching the knowledge base detail endpoint: GET /api/knowledge/KB_ID. Each document in the response includes its status and chunk_count.
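
Building on that tip, a small helper can group the documents in a GET /api/knowledge/KB_ID response by status. The exact response shape below is an assumption based on the fields the docs mention (status, chunk_count), not a guaranteed schema:

```python
def processing_summary(kb_detail: dict) -> dict:
    """Group documents from a knowledge base detail response by status."""
    summary: dict[str, list[str]] = {}
    for doc in kb_detail.get("documents", []):
        summary.setdefault(doc["status"], []).append(doc["filename"])
    return summary

# Sample response body (illustrative field names):
detail = {
    "documents": [
        {"filename": "faq.txt", "status": "ready", "chunk_count": 12},
        {"filename": "pricing.pdf", "status": "processing", "chunk_count": 0},
    ]
}
```

Poll until every document lands in "ready" (or "failed") before test-calling the agent.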

4. How Chunking Works

Documents are split into chunks using a sliding window approach with the following parameters:

| Parameter     | Value                          | Description                              |
|---------------|--------------------------------|------------------------------------------|
| Chunk size    | 500 tokens (~2,000 characters) | Maximum number of tokens per chunk       |
| Chunk overlap | 50 tokens                      | Tokens shared between consecutive chunks |

Why Overlap Matters

Without overlap, a sentence that falls on a chunk boundary gets split across two chunks. Neither chunk has the full context, which hurts retrieval accuracy. The 50-token overlap ensures that content near boundaries appears in both chunks.

Document: [-------- Chunk 1 --------][-- overlap --][-------- Chunk 2 --------]
                                     ^            ^
                                     These 50 tokens appear in BOTH chunks
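
The sliding window can be sketched as follows, treating tokens as whitespace-delimited words (the same approximation the platform uses); chunk_text is an illustrative name, not a platform API:

```python
def chunk_text(text: str, size: int = 500, overlap: int = 50) -> list[str]:
    """Split text into overlapping word windows of `size`, stepping by size - overlap."""
    words = text.split()
    step = size - overlap  # 450-word stride between chunk starts
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + size]))
        if start + size >= len(words):
            break  # the last window already reached the end of the document
    return chunks
```

Each chunk shares its last 50 words with the start of the next chunk, so sentences near a boundary survive intact in at least one of the two.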

How Many Chunks Will My Document Produce?

chunks = ceil( total_words / (500 - 50) )
       = ceil( total_words / 450 )

| Document Size             | Approximate Words | Approximate Chunks |
|---------------------------|-------------------|--------------------|
| 1 page (~500 words)       | 500               | 2                  |
| 5 pages (~2,500 words)    | 2,500             | 6                  |
| 20 pages (~10,000 words)  | 10,000            | 23                 |
| 100 pages (~50,000 words) | 50,000            | 112                |

Tip: The chunking algorithm treats tokens as whitespace-delimited words. This is an approximation — actual OpenAI token counts may differ slightly, but it works well in practice.
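
The formula in code, with the table's values as a sanity check (estimated_chunks is an illustrative helper, not a platform API):

```python
import math

def estimated_chunks(total_words: int, size: int = 500, overlap: int = 50) -> int:
    # Each new chunk advances the window by (size - overlap) words.
    return math.ceil(total_words / (size - overlap))
```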

5. How Retrieval Works

When a caller speaks during a call, the system searches attached knowledge bases before every LLM call to find relevant context. The search uses a hybrid approach combining two strategies.

Hybrid Search (Default)

| Component           | Weight | How It Works                                                                                                                                        |
|---------------------|--------|-----------------------------------------------------------------------------------------------------------------------------------------------------|
| Semantic similarity | 70%    | The caller's utterance is embedded using text-embedding-3-small. The resulting vector is compared against all chunk embeddings using cosine distance. |
| Keyword matching    | 30%    | PostgreSQL full-text search ranks chunks by keyword relevance using ts_rank.                                                                          |

The combined score determines ranking. The top 5 chunks (configurable via top_k) are returned.
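
The 70/30 blend reduces to a weighted sum. A sketch, assuming both signals have been normalized to the [0, 1] range (the real ranker operates on cosine-distance and ts_rank scores):

```python
def hybrid_score(semantic: float, keyword: float,
                 w_semantic: float = 0.7, w_keyword: float = 0.3) -> float:
    """Blend the two normalized relevance signals (70/30 by default)."""
    return w_semantic * semantic + w_keyword * keyword

# A chunk with strong semantic similarity but no keyword hits still outranks
# one with only a modest keyword match:
strong_semantic = hybrid_score(0.9, 0.0)   # weighted 0.63
modest_keyword = hybrid_score(0.1, 0.5)    # weighted 0.22
```

The semantic weight dominates by design: callers rarely phrase questions with the exact keywords in your documents.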

Keyword-Only Fallback

If pgvector is not available or no embedding API key is configured, the system falls back to keyword-only search. If that also fails, a final ILIKE fallback performs a simple substring match.

Injection into the System Prompt

REFERENCE MATERIAL (from your knowledge base -- use this to answer questions):

[Source: product-faq.txt]
Q: What is your return policy?
A: We accept returns within 30 days of purchase...

[Source: pricing.txt]
Our Pro plan costs $49/month and includes...

Use the reference material above to inform your responses when relevant.
If the material doesn't cover what was asked, say you're not sure.

The system retrieves the top 3 chunks during calls. Each chunk includes its source filename for traceability.

Important: Knowledge search runs on every caller utterance. Chunks are not cached across turns — each turn searches fresh so the most relevant context is always used.
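
The reference-material block above can be produced by a small formatter. The chunk field names here (source, text) are illustrative, not the platform's internal schema:

```python
def format_reference_material(chunks: list[dict]) -> str:
    """Render retrieved chunks in the reference-material layout shown above."""
    lines = ["REFERENCE MATERIAL (from your knowledge base -- "
             "use this to answer questions):", ""]
    for chunk in chunks:
        lines.append(f"[Source: {chunk['source']}]")
        lines.append(chunk["text"])
        lines.append("")
    lines.append("Use the reference material above to inform your responses "
                 "when relevant.")
    lines.append("If the material doesn't cover what was asked, say you're "
                 "not sure.")
    return "\n".join(lines)
```

The closing instruction matters: it tells the model to admit uncertainty rather than hallucinate an answer outside the retrieved material.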

6. Attaching Knowledge Bases to Agents

Link a Knowledge Base to an Agent

curl -X POST https://your-domain.com/api/knowledge/KB_ID/link \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{ "agent_id": "AGENT_UUID" }'

Unlink a Knowledge Base from an Agent

curl -X POST https://your-domain.com/api/knowledge/KB_ID/unlink \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{ "agent_id": "AGENT_UUID" }'

Multiple Knowledge Bases per Agent

You can attach multiple knowledge bases to a single agent. When the agent searches for context, it queries all attached KBs simultaneously and returns the top-scoring chunks across all of them.

The link is stored in the agent_knowledge_bases join table with a composite primary key of (agent_id, knowledge_base_id). Duplicate links are silently ignored.
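
The composite primary key's duplicate-ignoring behavior can be modeled with a set of (agent_id, knowledge_base_id) pairs:

```python
# A set of pairs behaves like the join table's composite primary key:
# inserting an existing (agent_id, knowledge_base_id) pair is a no-op.
links: set[tuple[str, str]] = set()

def link(agent_id: str, kb_id: str) -> None:
    links.add((agent_id, kb_id))  # re-adding an existing pair is ignored

link("agent-1", "kb-a")
link("agent-1", "kb-a")  # duplicate: silently ignored
link("agent-1", "kb-b")
```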

How Chunks Are Selected During Calls

  1. When a call starts, the system loads the agent's linked knowledge base IDs.
  2. On each conversational turn, the caller's utterance is used as the search query.
  3. The search runs across all linked KBs and returns the top 3 chunks by relevance.
  4. Those chunks are formatted and appended to the system prompt for that single LLM call.
Tip: If you hot-swap agents mid-call (e.g., via a squad transfer), the new agent's knowledge bases are loaded automatically.

7. Best Practices for Documents

Write in Q&A Format

Documents structured as question-and-answer pairs produce the best retrieval accuracy.

Q: What are your business hours?
A: We are open Monday through Friday, 9 AM to 6 PM Eastern Time.
We are closed on weekends and federal holidays.

Q: How do I reset my password?
A: Visit account.example.com/reset, enter your email address,
and click "Send Reset Link."

Keep Documents Focused

One topic per document works better than a monolithic catch-all.

Include Common Customer Questions

Seed your knowledge base with the questions your support team hears most often.

Use Clear Headings

Headings help the chunking algorithm produce cleaner chunks.

Avoid Huge Monolithic Documents

Very large documents (50+ pages) produce hundreds of chunks. Breaking large documents into smaller, topic-specific files yields better results.

Watch out: Chunk input text is capped at 8,000 characters per embedding call. If a single chunk exceeds this, the text is silently truncated before embedding.
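
A defensive sketch of that cap (the constant matches the documented limit; the helper itself is ours, not a platform API):

```python
EMBED_INPUT_CAP = 8_000  # characters accepted per embedding call

def safe_embed_input(chunk_text: str) -> str:
    """Truncate oversized chunk text the same way the embedder would."""
    if len(chunk_text) > EMBED_INPUT_CAP:
        # Anything past this point never reaches the embedding model;
        # splitting the document into smaller files avoids losing the tail.
        return chunk_text[:EMBED_INPUT_CAP]
    return chunk_text
```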

8. Cost Considerations

| Operation          | Cost                  | Frequency                    |
|--------------------|-----------------------|------------------------------|
| Embedding (upload) | $0.02 per 1M tokens   | One-time per document upload |
| Embedding (query)  | ~$0.000002 per query  | Once per conversational turn |
| Vector storage     | Free (PostgreSQL)     | Ongoing                      |
| Retrieval search   | Free (pgvector query) | Once per conversational turn |

Practical Cost Estimate

A 10-page document (~5,000 words / ~6,500 tokens) costs roughly $0.00013 to embed. You would need to upload approximately 77 such documents to spend one cent.
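
The arithmetic behind that estimate (embed_cost is an illustrative helper using the upload pricing from the table above):

```python
PRICE_PER_MILLION_TOKENS = 0.02  # text-embedding-3-small, upload embedding

def embed_cost(tokens: int) -> float:
    """Dollar cost of embedding the given number of tokens."""
    return tokens * PRICE_PER_MILLION_TOKENS / 1_000_000

# A 10-page document at ~6,500 tokens:
cost = embed_cost(6_500)  # about $0.00013
```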

Tip: The platform uses OPENAI_API_KEY for embeddings when available (direct to OpenAI), falling back to the tenant's OpenRouter key. OpenAI direct is both cheaper and faster.

9. Examples

Example 1: Product FAQ Knowledge Base

Step 1 — Create the knowledge base:

curl -X POST https://your-domain.com/api/knowledge \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{ "name": "Product FAQ", "description": "Common questions about our SaaS product" }'

Step 2 — Upload the FAQ document:

curl -X POST https://your-domain.com/api/knowledge/$KB_ID/text \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "title": "product-faq.txt",
    "content": "Q: What is Acme Cloud?\nA: Acme Cloud is a cloud-based project management platform designed for teams of 5 to 500.\n\nQ: Is there a free trial?\nA: Yes. Every new account gets a 14-day free trial of the Business plan with no credit card required.\n\nQ: How do I cancel my subscription?\nA: Go to Settings, then Billing, then click Cancel Subscription."
  }'

Step 3 — Attach to your agent:

curl -X POST https://your-domain.com/api/knowledge/$KB_ID/link \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{ "agent_id": "YOUR_AGENT_ID" }'

Example 2: Pricing & Plans Knowledge Base

# Create KB
curl -X POST https://your-domain.com/api/knowledge \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{ "name": "Pricing Plans", "description": "Current plan details and pricing" }'

# Upload pricing document
curl -X POST https://your-domain.com/api/knowledge/$KB_ID/text \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "title": "pricing-plans.txt",
    "content": "# Acme Cloud Pricing Plans\n\n## Free Plan - $0/month\n- Up to 3 users\n- 5 projects\n\n## Starter Plan - $12/user/month\n- Unlimited projects\n- Email support\n\n## Business Plan - $29/user/month\n- Phone and email support\n- Advanced integrations\n\n## Enterprise Plan - Custom pricing\n- Dedicated account manager\n- 99.99% uptime SLA"
  }'

# Link to agent
curl -X POST https://your-domain.com/api/knowledge/$KB_ID/link \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{ "agent_id": "YOUR_AGENT_ID" }'

Example 3: Company Policy Document

# Create KB
curl -X POST https://your-domain.com/api/knowledge \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{ "name": "Company Policies", "description": "Shipping, returns, and warranty info" }'

# Upload policy document
curl -X POST https://your-domain.com/api/knowledge/$KB_ID/text \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "title": "policies.txt",
    "content": "# Shipping Policy\n\nQ: How long does shipping take?\nA: Standard shipping takes 5-7 business days. Express (2-day) is $12.99.\n\n# Return Policy\n\nQ: What is your return policy?\nA: We accept returns within 30 days of delivery. Items must be unused.\n\n# Warranty\n\nQ: What does the warranty cover?\nA: All hardware products include a 1-year limited warranty."
  }'

# Link to agent
curl -X POST https://your-domain.com/api/knowledge/$KB_ID/link \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{ "agent_id": "YOUR_AGENT_ID" }'

10. Troubleshooting

Agent does not use knowledge base information

| Check                     | How to Verify                                     | Fix                                                      |
|---------------------------|---------------------------------------------------|----------------------------------------------------------|
| KB is linked to the agent | GET /api/knowledge/KB_ID — check the agents array | Call POST /api/knowledge/KB_ID/link                      |
| Documents are processed   | Check that document status is "ready"             | Wait for processing or re-upload if "failed"             |
| Documents have chunks     | Check chunk_count > 0                             | Re-upload; empty files produce zero chunks               |
| Embedding API key is set  | Ensure OPENAI_API_KEY is in your environment      | Add the key; without it, only keyword search is available |

Wrong or irrelevant information retrieved

Document processing is stuck

Note on reindexing: If you enabled pgvector after uploading documents, existing chunks will not have embeddings. Contact support to trigger a reindex.

Search returns no results