Knowledge Base & RAG Guide
Give your voice agents domain-specific knowledge without retraining the underlying model.
1. Overview — What Is RAG?
RAG stands for Retrieval-Augmented Generation. It is a technique that lets an AI model answer questions using external documents you provide, rather than relying solely on its built-in training data.
For voice agents this means you can give your agent accurate, up-to-date information about your company’s products, pricing, policies, and procedures. The agent will reference this material during live calls and provide correct, specific answers to caller questions — without you needing to fine-tune or retrain any model.
How It Works at a High Level
- You upload documents into a Knowledge Base.
- Each document is split into small chunks and converted into numerical vectors (embeddings).
- When a caller asks a question, the system converts the question into a vector and finds the most relevant chunks using cosine similarity.
- Those chunks are injected into the LLM system prompt as reference material.
- The LLM generates a response informed by both the conversation and your knowledge base.
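The pipeline above can be sketched in miniature. The three-dimensional vectors below are toy stand-ins for the real 1536-dimension embeddings, and the index holds only two hand-written chunks; this is an illustration of the retrieval step, not the actual implementation:

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy index of (chunk text, embedding) pairs. In the real system the
# embeddings come from an embedding model at upload time.
index = [
    ("Returns accepted within 30 days.", [0.9, 0.1, 0.0]),
    ("The Pro plan costs $49/month.",    [0.1, 0.9, 0.2]),
]

def retrieve(query_vector, top_k=5):
    """Rank chunks by cosine similarity to the query vector."""
    scored = [(cosine_similarity(query_vector, vec), text) for text, vec in index]
    scored.sort(reverse=True)
    return [text for _, text in scored[:top_k]]

# A query vector close to the first chunk's embedding retrieves it first.
print(retrieve([1.0, 0.0, 0.1], top_k=1))
```

The retrieved texts are then placed into the system prompt, which is what grounds the LLM's answer.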
2. Creating a Knowledge Base
Via the API
curl -X POST https://your-domain.com/api/knowledge \
-H "Authorization: Bearer YOUR_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"name": "Product FAQ",
"description": "Frequently asked questions about our products"
}'
Response:
{
"knowledge_base": {
"id": "a1b2c3d4-...",
"tenant_id": "...",
"name": "Product FAQ",
"description": "Frequently asked questions about our products",
"created_at": "2026-02-22T12:00:00.000Z"
}
}
Via the Dashboard
- Navigate to Knowledge Bases in the sidebar.
- Click Create Knowledge Base.
- Enter a name and optional description.
- Click Save.
Naming Best Practices
- Use descriptive, specific names: Pricing Plans Q3 2026 rather than Info.
- Group related documents under one knowledge base.
- Avoid stuffing everything into a single knowledge base — smaller, focused KBs produce better retrieval accuracy.
3. Uploading Documents
Supported Formats
| Format | Content Type | Notes |
|---|---|---|
| Plain text | text/plain | .txt files |
| Markdown | text/markdown | .md files |
| PDF | application/pdf | Text extracted via pdf-parse |
| HTML | text/html | Tags stripped, text preserved |
| JSON | application/json | Pretty-printed then chunked |
| CSV | text/csv | Treated as raw text |
Maximum file size: 10 MB per upload.
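The formats table can be expressed as a small lookup, useful when building upload requests client-side. This helper is hypothetical (not part of the API); the extension-to-type mapping and the 10 MB limit come from the table above:

```python
# Content types supported by the knowledge base, keyed by file extension.
CONTENT_TYPES = {
    ".txt": "text/plain",
    ".md": "text/markdown",
    ".pdf": "application/pdf",
    ".html": "text/html",
    ".json": "application/json",
    ".csv": "text/csv",
}

MAX_UPLOAD_BYTES = 10 * 1024 * 1024  # 10 MB per upload

def content_type_for(filename):
    """Return the Content-Type header value for a supported file, or None."""
    for ext, ctype in CONTENT_TYPES.items():
        if filename.lower().endswith(ext):
            return ctype
    return None

print(content_type_for("product-faq.pdf"))  # application/pdf
```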
Four Upload Methods
1. File upload (multipart form) — ideal for the dashboard file picker:
curl -X POST https://your-domain.com/api/knowledge/KB_ID/upload-file \
-H "Authorization: Bearer YOUR_API_KEY" \
-F "file=@product-faq.pdf"
2. Raw body upload — pass file bytes directly with a filename header:
curl -X POST https://your-domain.com/api/knowledge/KB_ID/upload \
-H "Authorization: Bearer YOUR_API_KEY" \
-H "Content-Type: text/plain" \
-H "X-Filename: faq.txt" \
--data-binary @faq.txt
3. Paste text directly — no file needed:
curl -X POST https://your-domain.com/api/knowledge/KB_ID/text \
-H "Authorization: Bearer YOUR_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"title": "Return Policy",
"content": "Our return policy allows returns within 30 days..."
}'
4. Scrape a URL — fetch a web page and index its text:
curl -X POST https://your-domain.com/api/knowledge/KB_ID/url \
-H "Authorization: Bearer YOUR_API_KEY" \
-H "Content-Type: application/json" \
-d '{ "url": "https://example.com/pricing" }'
What Happens During Processing
- Extract — Raw text is extracted from the uploaded file.
- Chunk — The text is split into overlapping chunks of ~500 tokens each.
- Embed — Each chunk is sent to the OpenAI text-embedding-3-small model to generate a 1536-dimension vector.
- Store — Chunks, their text search vectors, and embedding vectors are saved to the knowledge_chunks table.
Processing happens in the background. The upload endpoint returns immediately with a document record whose status is "processing". Once complete, the status changes to "ready".
To check progress, poll GET /api/knowledge/KB_ID. Each document in the response includes its status and chunk_count.
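Client code can filter the polled response for documents that still need attention. The exact response shape is an assumption here; only the status and chunk_count fields are documented above, and the sample payload is illustrative:

```python
def unready_documents(kb_response):
    """Return documents that are not yet ready (still processing, or failed)."""
    return [
        doc for doc in kb_response.get("documents", [])
        if doc.get("status") != "ready"
    ]

# Illustrative payload shaped like the fields described above.
sample = {
    "documents": [
        {"title": "faq.txt", "status": "ready", "chunk_count": 12},
        {"title": "pricing.pdf", "status": "processing", "chunk_count": 0},
    ]
}
print([d["title"] for d in unready_documents(sample)])  # ['pricing.pdf']
```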
4. How Chunking Works
Documents are split into chunks using a sliding window approach with the following parameters:
| Parameter | Value | Description |
|---|---|---|
| Chunk size | 500 tokens (~2,000 characters) | Maximum number of tokens per chunk |
| Chunk overlap | 50 tokens | Tokens shared between consecutive chunks |
Why Overlap Matters
Without overlap, a sentence that falls on a chunk boundary gets split across two chunks. Neither chunk has the full context, which hurts retrieval accuracy. The 50-token overlap ensures that content near boundaries appears in both chunks.
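A minimal sketch of the sliding-window split described above, operating on a pre-tokenized list. The real implementation's tokenizer and boundary handling may differ; the 500/50 parameters match the table:

```python
def chunk_tokens(tokens, size=500, overlap=50):
    """Split a token list into overlapping chunks with a sliding window."""
    chunks = []
    step = size - overlap  # advance 450 tokens per chunk
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + size])
        if start + size >= len(tokens):
            break  # the final chunk already covers the tail
    return chunks

tokens = [f"tok{i}" for i in range(1200)]
chunks = chunk_tokens(tokens)
print(len(chunks))  # 3

# Consecutive chunks share 50 tokens, so boundary content appears in both.
print(chunks[0][-50:] == chunks[1][:50])  # True
```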
How Many Chunks Will My Document Produce?
chunks ≈ ceil( total_words / (500 - 50) )
       ≈ ceil( total_words / 450 )
This treats one word as roughly one token, which is a reasonable approximation for plain English text.
| Document Size | Approximate Words | Approximate Chunks |
|---|---|---|
| 1 page (~500 words) | 500 | 2 |
| 5 pages (~2,500 words) | 2,500 | 6 |
| 20 pages (~10,000 words) | 10,000 | 23 |
| 100 pages (~50,000 words) | 50,000 | 112 |
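The estimate in the formula above can be checked directly against the table:

```python
import math

def estimated_chunks(total_words, chunk_size=500, overlap=50):
    """Estimate chunk count, treating word count as a proxy for token count."""
    return math.ceil(total_words / (chunk_size - overlap))

# Reproduces the table: 2, 6, 23, 112 chunks.
for words in (500, 2_500, 10_000, 50_000):
    print(words, estimated_chunks(words))
```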
5. How Retrieval Works
When a caller speaks during a call, the system searches attached knowledge bases before every LLM call to find relevant context. The search uses a hybrid approach combining two strategies.
Hybrid Search (Default)
| Component | Weight | How It Works |
|---|---|---|
| Semantic similarity | 70% | The caller's utterance is embedded using text-embedding-3-small. The resulting vector is compared against all chunk embeddings using cosine distance. |
| Keyword matching | 30% | PostgreSQL full-text search ranks chunks by keyword relevance using ts_rank. |
The combined score determines ranking. The top 5 chunks (configurable via top_k) are returned.
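The 70/30 combination can be sketched as follows. The scores here are illustrative and assumed to be already normalized to the 0 to 1 range; the real system computes them from cosine distance and ts_rank respectively:

```python
def hybrid_score(semantic, keyword, semantic_weight=0.7):
    """Weighted blend of normalized semantic and keyword scores."""
    return semantic_weight * semantic + (1 - semantic_weight) * keyword

def rank_chunks(candidates, top_k=5):
    """candidates: list of (chunk_text, semantic_score, keyword_score)."""
    scored = [(hybrid_score(s, k), text) for text, s, k in candidates]
    scored.sort(reverse=True)
    return [text for _, text in scored[:top_k]]

candidates = [
    ("return policy chunk", 0.92, 0.40),
    ("pricing chunk",       0.55, 0.90),
    ("shipping chunk",      0.30, 0.10),
]
# Semantic similarity dominates at 70%, so the return-policy chunk wins
# despite the pricing chunk's stronger keyword match.
print(rank_chunks(candidates, top_k=2))
```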
Keyword-Only Fallback
If pgvector is not available or no embedding API key is configured, the system falls back to keyword-only search. If that also fails, a final ILIKE fallback performs a simple substring match.
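The cascade can be sketched with stub search strategies standing in for the real database queries; the function names and signatures here are hypothetical, illustrating the fallback order only:

```python
def search_with_fallback(query, strategies):
    """Try each search strategy in order; return the first non-empty result."""
    for strategy in strategies:
        try:
            results = strategy(query)
            if results:
                return results
        except Exception:
            continue  # strategy unavailable (e.g. pgvector missing); try the next
    return []

# Stub strategies illustrating the cascade order: hybrid, keyword, ILIKE.
def hybrid(q):
    raise RuntimeError("pgvector unavailable")

def keyword(q):
    return []  # no full-text matches

def ilike(q):
    return ["Returns accepted within 30 days."]

print(search_with_fallback("return policy", [hybrid, keyword, ilike]))
```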
Injection into the System Prompt
The retrieved chunks are formatted and appended to the LLM system prompt as a reference block:
REFERENCE MATERIAL (from your knowledge base -- use this to answer questions):
[Source: product-faq.txt]
Q: What is your return policy?
A: We accept returns within 30 days of purchase...
[Source: pricing.txt]
Our Pro plan costs $49/month and includes...
Use the reference material above to inform your responses when relevant.
If the material doesn't cover what was asked, say you're not sure.
The system retrieves the top 3 chunks during calls. Each chunk includes its source filename for traceability.
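The injected block shown above can be assembled from (source, text) pairs. This formatting helper is an illustrative reconstruction of the template, not the exact implementation:

```python
def format_reference_material(chunks):
    """Build the reference block injected into the system prompt.

    chunks: list of (source_filename, chunk_text) pairs, highest-ranked first.
    """
    lines = [
        "REFERENCE MATERIAL (from your knowledge base -- use this to answer questions):",
        "",
    ]
    for source, text in chunks:
        lines.append(f"[Source: {source}]")
        lines.append(text)
        lines.append("")
    lines.append("Use the reference material above to inform your responses when relevant.")
    lines.append("If the material doesn't cover what was asked, say you're not sure.")
    return "\n".join(lines)

block = format_reference_material([
    ("product-faq.txt", "Q: What is your return policy?\nA: We accept returns within 30 days..."),
])
print(block.splitlines()[0])
```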
6. Attaching Knowledge Bases to Agents
Link a Knowledge Base to an Agent
curl -X POST https://your-domain.com/api/knowledge/KB_ID/link \
-H "Authorization: Bearer YOUR_API_KEY" \
-H "Content-Type: application/json" \
-d '{ "agent_id": "AGENT_UUID" }'
Unlink a Knowledge Base from an Agent
curl -X POST https://your-domain.com/api/knowledge/KB_ID/unlink \
-H "Authorization: Bearer YOUR_API_KEY" \
-H "Content-Type: application/json" \
-d '{ "agent_id": "AGENT_UUID" }'
Multiple Knowledge Bases per Agent
You can attach multiple knowledge bases to a single agent. When the agent searches for context, it queries all attached KBs simultaneously and returns the top-scoring chunks across all of them.
The link is stored in the agent_knowledge_bases join table with a composite primary key of (agent_id, knowledge_base_id). Duplicate links are silently ignored.
How Chunks Are Selected During Calls
- When a call starts, the system loads the agent's linked knowledge base IDs.
- On each conversational turn, the caller's utterance is used as the search query.
- The search runs across all linked KBs and returns the top 3 chunks by relevance.
- Those chunks are formatted and appended to the system prompt for that single LLM call.
7. Best Practices for Documents
Write in Q&A Format
Documents structured as question-and-answer pairs produce the best retrieval accuracy.
Q: What are your business hours?
A: We are open Monday through Friday, 9 AM to 6 PM Eastern Time.
We are closed on weekends and federal holidays.
Q: How do I reset my password?
A: Visit account.example.com/reset, enter your email address,
and click "Send Reset Link."
Keep Documents Focused
One topic per document works better than a monolithic catch-all.
- Good: shipping-policy.txt, return-policy.txt, product-specs.txt
- Bad: everything-about-our-company.txt
Include Common Customer Questions
Seed your knowledge base with the questions your support team hears most often.
Use Clear Headings
Headings help the chunking algorithm produce cleaner chunks.
Avoid Huge Monolithic Documents
Very large documents (50+ pages) produce hundreds of chunks. Breaking large documents into smaller, topic-specific files yields better results.
8. Cost Considerations
| Operation | Cost | Frequency |
|---|---|---|
| Embedding (upload) | $0.02 per 1M tokens | One-time per document upload |
| Embedding (query) | ~$0.000002 per query | Once per conversational turn |
| Vector storage | Free (PostgreSQL) | Ongoing |
| Retrieval search | Free (pgvector query) | Once per conversational turn |
Practical Cost Estimate
A 10-page document (~5,000 words / ~6,500 tokens) costs roughly $0.00013 to embed. You would need to upload approximately 77 such documents to spend one cent.
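The arithmetic behind that estimate, using the $0.02 per million tokens rate from the table:

```python
EMBED_COST_PER_MILLION_TOKENS = 0.02  # USD, per the cost table above

def embedding_cost(tokens):
    """One-time embedding cost in USD for a document of the given token count."""
    return tokens / 1_000_000 * EMBED_COST_PER_MILLION_TOKENS

cost = embedding_cost(6_500)   # a ~10-page document
print(f"${cost:.5f}")          # $0.00013
print(round(0.01 / cost))      # ~77 such documents per cent
```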
The system uses your OPENAI_API_KEY for embeddings when available (direct to OpenAI), falling back to the tenant's OpenRouter key. OpenAI direct is both cheaper and faster.
9. Examples
Example 1: Product FAQ Knowledge Base
Step 1 — Create the knowledge base:
curl -X POST https://your-domain.com/api/knowledge \
-H "Authorization: Bearer YOUR_API_KEY" \
-H "Content-Type: application/json" \
-d '{ "name": "Product FAQ", "description": "Common questions about our SaaS product" }'
Step 2 — Upload the FAQ document:
curl -X POST https://your-domain.com/api/knowledge/$KB_ID/text \
-H "Authorization: Bearer YOUR_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"title": "product-faq.txt",
"content": "Q: What is Acme Cloud?\nA: Acme Cloud is a cloud-based project management platform designed for teams of 5 to 500.\n\nQ: Is there a free trial?\nA: Yes. Every new account gets a 14-day free trial of the Business plan with no credit card required.\n\nQ: How do I cancel my subscription?\nA: Go to Settings, then Billing, then click Cancel Subscription."
}'
Step 3 — Attach to your agent:
curl -X POST https://your-domain.com/api/knowledge/$KB_ID/link \
-H "Authorization: Bearer YOUR_API_KEY" \
-H "Content-Type: application/json" \
-d '{ "agent_id": "YOUR_AGENT_ID" }'
Example 2: Pricing & Plans Knowledge Base
# Create KB
curl -X POST https://your-domain.com/api/knowledge \
-H "Authorization: Bearer YOUR_API_KEY" \
-H "Content-Type: application/json" \
-d '{ "name": "Pricing Plans", "description": "Current plan details and pricing" }'
# Upload pricing document
curl -X POST https://your-domain.com/api/knowledge/$KB_ID/text \
-H "Authorization: Bearer YOUR_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"title": "pricing-plans.txt",
"content": "# Acme Cloud Pricing Plans\n\n## Free Plan - $0/month\n- Up to 3 users\n- 5 projects\n\n## Starter Plan - $12/user/month\n- Unlimited projects\n- Email support\n\n## Business Plan - $29/user/month\n- Phone and email support\n- Advanced integrations\n\n## Enterprise Plan - Custom pricing\n- Dedicated account manager\n- 99.99% uptime SLA"
}'
# Link to agent
curl -X POST https://your-domain.com/api/knowledge/$KB_ID/link \
-H "Authorization: Bearer YOUR_API_KEY" \
-H "Content-Type: application/json" \
-d '{ "agent_id": "YOUR_AGENT_ID" }'
Example 3: Company Policy Document
# Create KB
curl -X POST https://your-domain.com/api/knowledge \
-H "Authorization: Bearer YOUR_API_KEY" \
-H "Content-Type: application/json" \
-d '{ "name": "Company Policies", "description": "Shipping, returns, and warranty info" }'
# Upload policy document
curl -X POST https://your-domain.com/api/knowledge/$KB_ID/text \
-H "Authorization: Bearer YOUR_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"title": "policies.txt",
"content": "# Shipping Policy\n\nQ: How long does shipping take?\nA: Standard shipping takes 5-7 business days. Express (2-day) is $12.99.\n\n# Return Policy\n\nQ: What is your return policy?\nA: We accept returns within 30 days of delivery. Items must be unused.\n\n# Warranty\n\nQ: What does the warranty cover?\nA: All hardware products include a 1-year limited warranty."
}'
# Link to agent
curl -X POST https://your-domain.com/api/knowledge/$KB_ID/link \
-H "Authorization: Bearer YOUR_API_KEY" \
-H "Content-Type: application/json" \
-d '{ "agent_id": "YOUR_AGENT_ID" }'
10. Troubleshooting
Agent does not use knowledge base information
| Check | How to Verify | Fix |
|---|---|---|
| KB is linked to the agent | GET /api/knowledge/KB_ID — check the agents array | Call POST /api/knowledge/KB_ID/link |
| Documents are processed | Check that document status is "ready" | Wait for processing or re-upload if "failed" |
| Documents have chunks | Check chunk_count > 0 | Re-upload; empty files produce zero chunks |
| Embedding API key is set | Ensure OPENAI_API_KEY is in your environment | Add the key; without it, only keyword search is available |
Wrong or irrelevant information retrieved
- Improve document structure. Use Q&A format instead of long narrative paragraphs.
- Split large documents. Break 50-page documents into topic-specific files.
- Test with the search endpoint:
curl -X POST https://your-domain.com/api/knowledge/KB_ID/search \
-H "Authorization: Bearer YOUR_API_KEY" \
-H "Content-Type: application/json" \
-d '{ "query": "what is the return policy", "top_k": 5 }'
- Check hybrid scoring. If semantic results are poor, embeddings may need regeneration.
Document processing is stuck
- File too large. Dense text near the 10 MB limit can produce thousands of chunks.
- Embedding API errors. If the OpenAI API key is invalid, chunks are stored without embeddings.
- PDF extraction failure. Some PDFs (scanned images, encrypted) cannot be text-extracted. Use OCR first.
Search returns no results
- Verify the knowledge base ID is correct and belongs to your tenant.
- Ensure at least one document has status: "ready" with chunk_count > 0.
- Try a broader query. Very short or highly specific queries may not match.
- As a last resort, the system falls back to a simple ILIKE substring match.