Knowledge Base & RAG Guide

Give your voice agents domain-specific knowledge without retraining the underlying model.

1. Overview — What Is RAG?

RAG stands for Retrieval-Augmented Generation. It is a technique that lets an AI model answer questions using external documents you provide, rather than relying solely on its built-in training data.

For voice agents this means you can give your agent accurate, up-to-date information about your company’s products, pricing, policies, and procedures. The agent will reference this material during live calls and provide correct, specific answers to caller questions — without you needing to fine-tune or retrain any model.

How It Works at a High Level

Caller asks a question
        |
        v
+-----------+       +------------------+       +-----------+
| Caller's  | ----> | Embed query +    | ----> |  Top 3    |
| question  |       | search KB chunks |       |  chunks   |
+-----------+       +------------------+       +-----------+
                                                     |
                                                     v
                                          +------------------+
                                          | System prompt +  |
                                          | KB context +     |
                                          | conversation     |
                                          +------------------+
                                                     |
                                                     v
                                          +------------------+
                                          | LLM generates    |
                                          | informed answer  |
                                          +------------------+
  1. You upload documents into a Knowledge Base.
  2. Each document is split into small chunks and converted into numerical vectors (embeddings).
  3. When a caller asks a question, the system converts the question into a vector and finds the most relevant chunks using cosine similarity.
  4. Those chunks are injected into the LLM system prompt as reference material.
  5. The LLM generates a response informed by both the conversation and your knowledge base.
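
The five steps above can be sketched end to end. This is an illustrative toy, not platform code: embed here builds bag-of-words vectors instead of calling text-embedding-3-small, and the function names (embed, cosine, retrieve) are invented for the example.

```python
import math

def embed(text: str) -> list[float]:
    # Toy stand-in for a real embedding model: counts a few keywords.
    # Real embeddings from text-embedding-3-small are 1536-dimensional.
    vocab = ["return", "policy", "price", "plan", "hours"]
    return [float(text.lower().count(w)) for w in vocab]

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query: str, chunks: list[str], top_k: int = 3) -> list[str]:
    # Embed the query, rank chunks by similarity, keep the best top_k.
    q = embed(query)
    ranked = sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)
    return ranked[:top_k]

chunks = [
    "Our return policy allows returns within 30 days.",
    "The Pro plan price is $49/month.",
    "We are open 9 AM to 6 PM Eastern.",
]
# Steps 3-4: retrieve relevant chunks, then inject them into the prompt.
context = retrieve("What is your return policy?", chunks, top_k=1)
prompt = "REFERENCE MATERIAL:\n" + "\n".join(context) + "\n\nAnswer the caller."
```

The real pipeline works the same way, only with learned embeddings and a database-backed vector search instead of an in-memory list.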

2. Creating a Knowledge Base

Via the API

curl -X POST https://your-domain.com/api/knowledge \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "Product FAQ",
    "description": "Frequently asked questions about our products"
  }'

Response:

{
  "knowledge_base": {
    "id": "a1b2c3d4-...",
    "tenant_id": "...",
    "name": "Product FAQ",
    "description": "Frequently asked questions about our products",
    "created_at": "2026-02-22T12:00:00.000Z"
  }
}

Via the Dashboard

  1. Navigate to Knowledge Bases in the sidebar.
  2. Click Create Knowledge Base.
  3. Enter a name and optional description.
  4. Click Save.

Naming Best Practices

Use short, descriptive names ("Product FAQ", "Return Policies") plus a one-line description of what the documents cover. Clear names make it obvious which knowledge base to link when you manage multiple agents.

3. Uploading Documents

Supported Formats

| Format     | Content Type     | Notes                         |
|------------|------------------|-------------------------------|
| Plain text | text/plain       | .txt files                    |
| Markdown   | text/markdown    | .md files                     |
| PDF        | application/pdf  | Text extracted via pdf-parse  |
| HTML       | text/html        | Tags stripped, text preserved |
| JSON       | application/json | Pretty-printed then chunked   |
| CSV        | text/csv         | Treated as raw text           |

Maximum file size: 10 MB per upload.
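
Before uploading, you can map a file's extension to the content type in the table above and enforce the size cap client-side. A minimal sketch (the helper name and the client-side check are our own; the server performs its own validation):

```python
# Content types from the supported-formats table; the cap matches
# the documented 10 MB per-upload maximum.
CONTENT_TYPES = {
    ".txt": "text/plain",
    ".md": "text/markdown",
    ".pdf": "application/pdf",
    ".html": "text/html",
    ".json": "application/json",
    ".csv": "text/csv",
}
MAX_BYTES = 10 * 1024 * 1024  # 10 MB

def validate_upload(filename: str, size_bytes: int) -> str:
    """Return the Content-Type to send, or raise if the file is unsupported."""
    ext = filename[filename.rfind("."):].lower()
    if ext not in CONTENT_TYPES:
        raise ValueError(f"Unsupported format: {ext}")
    if size_bytes > MAX_BYTES:
        raise ValueError("File exceeds the 10 MB upload limit")
    return CONTENT_TYPES[ext]
```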

Four Upload Methods

1. File upload (multipart form) — ideal for the dashboard file picker:

curl -X POST https://your-domain.com/api/knowledge/KB_ID/upload-file \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -F "file=@product-faq.pdf"

2. Raw body upload — pass file bytes directly with a filename header:

curl -X POST https://your-domain.com/api/knowledge/KB_ID/upload \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: text/plain" \
  -H "X-Filename: faq.txt" \
  --data-binary @faq.txt

3. Paste text directly — no file needed:

curl -X POST https://your-domain.com/api/knowledge/KB_ID/text \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "title": "Return Policy",
    "content": "Our return policy allows returns within 30 days..."
  }'

4. Scrape a URL — fetch a web page and index its text:

curl -X POST https://your-domain.com/api/knowledge/KB_ID/url \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{ "url": "https://example.com/pricing" }'

What Happens During Processing

  1. Extract — Raw text is extracted from the uploaded file.
  2. Chunk — The text is split into overlapping chunks of ~500 tokens each.
  3. Embed — Each chunk is sent to the OpenAI text-embedding-3-small model to generate a 1536-dimension vector.
  4. Store — Chunks, their text search vectors, and embedding vectors are saved to the knowledge_chunks table.

Processing happens in the background. The upload endpoint returns immediately with a document record whose status is "processing". Once complete, the status changes to "ready".

Tip: You can check a document's processing status by fetching the knowledge base detail endpoint: GET /api/knowledge/KB_ID. Each document in the response includes its status and chunk_count.
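
Building on that tip, a small helper can group the documents in a GET /api/knowledge/KB_ID response by status. The exact response shape below is an assumption based on the fields the docs mention (status, chunk_count), not a guaranteed schema:

```python
def processing_summary(kb_detail: dict) -> dict:
    """Group documents from a knowledge base detail response by status."""
    summary: dict[str, list[str]] = {}
    for doc in kb_detail.get("documents", []):
        summary.setdefault(doc["status"], []).append(doc["filename"])
    return summary

# Sample response body (illustrative field names):
detail = {
    "documents": [
        {"filename": "faq.txt", "status": "ready", "chunk_count": 12},
        {"filename": "pricing.pdf", "status": "processing", "chunk_count": 0},
    ]
}
```

Poll until every document lands in "ready" (or "failed") before test-calling the agent.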

4. How Chunking Works

Documents are split into chunks using a sliding window approach with the following parameters:

| Parameter     | Value                          | Description                              |
|---------------|--------------------------------|------------------------------------------|
| Chunk size    | 500 tokens (~2,000 characters) | Maximum number of tokens per chunk       |
| Chunk overlap | 50 tokens                      | Tokens shared between consecutive chunks |

Why Overlap Matters

Without overlap, a sentence that falls on a chunk boundary gets split across two chunks. Neither chunk has the full context, which hurts retrieval accuracy. The 50-token overlap ensures that content near boundaries appears in both chunks.

Document: [-------- Chunk 1 --------][-- overlap --][-------- Chunk 2 --------]
                                     ^            ^
                                     These 50 tokens appear in BOTH chunks
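
The sliding window can be sketched as follows, treating tokens as whitespace-delimited words (the same approximation the platform uses); chunk_text is an illustrative name, not a platform API:

```python
def chunk_text(text: str, size: int = 500, overlap: int = 50) -> list[str]:
    """Split text into overlapping word windows of `size`, stepping by size - overlap."""
    words = text.split()
    step = size - overlap  # 450-word stride between chunk starts
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + size]))
        if start + size >= len(words):
            break  # the last window already reached the end of the document
    return chunks
```

Each chunk shares its last 50 words with the start of the next chunk, so sentences near a boundary survive intact in at least one of the two.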

How Many Chunks Will My Document Produce?

chunks = ceil( total_words / (500 - 50) )
       = ceil( total_words / 450 )

| Document Size             | Approximate Words | Approximate Chunks |
|---------------------------|-------------------|--------------------|
| 1 page (~500 words)       | 500               | 2                  |
| 5 pages (~2,500 words)    | 2,500             | 6                  |
| 20 pages (~10,000 words)  | 10,000            | 23                 |
| 100 pages (~50,000 words) | 50,000            | 112                |

Tip: The chunking algorithm treats tokens as whitespace-delimited words. This is an approximation — actual OpenAI token counts may differ slightly, but it works well in practice.
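
The formula in code, with the table's values as a sanity check (estimated_chunks is an illustrative helper, not a platform API):

```python
import math

def estimated_chunks(total_words: int, size: int = 500, overlap: int = 50) -> int:
    # Each new chunk advances the window by (size - overlap) words.
    return math.ceil(total_words / (size - overlap))
```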

5. How Retrieval Works

When a caller speaks during a call, the system searches attached knowledge bases before every LLM call to find relevant context. The search uses a hybrid approach combining two strategies.

Hybrid Search (Default)

| Component           | Weight | How It Works                                                                                                                                        |
|---------------------|--------|-----------------------------------------------------------------------------------------------------------------------------------------------------|
| Semantic similarity | 70%    | The caller's utterance is embedded using text-embedding-3-small. The resulting vector is compared against all chunk embeddings using cosine distance. |
| Keyword matching    | 30%    | PostgreSQL full-text search ranks chunks by keyword relevance using ts_rank.                                                                          |

The combined score determines ranking. The top 5 chunks (configurable via top_k) are returned.
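
The 70/30 blend reduces to a weighted sum. A sketch, assuming both signals have been normalized to the [0, 1] range (the real ranker operates on cosine-distance and ts_rank scores):

```python
def hybrid_score(semantic: float, keyword: float,
                 w_semantic: float = 0.7, w_keyword: float = 0.3) -> float:
    """Blend the two normalized relevance signals (70/30 by default)."""
    return w_semantic * semantic + w_keyword * keyword

# A chunk with strong semantic similarity but no keyword hits still outranks
# one with only a modest keyword match:
strong_semantic = hybrid_score(0.9, 0.0)   # weighted 0.63
modest_keyword = hybrid_score(0.1, 0.5)    # weighted 0.22
```

The semantic weight dominates by design: callers rarely phrase questions with the exact keywords in your documents.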

Keyword-Only Fallback

If pgvector is not available or no embedding API key is configured, the system falls back to keyword-only search. If that also fails, a final ILIKE fallback performs a simple substring match.

Injection into the System Prompt

REFERENCE MATERIAL (from your knowledge base -- use this to answer questions):

[Source: product-faq.txt]
Q: What is your return policy?
A: We accept returns within 30 days of purchase...

[Source: pricing.txt]
Our Pro plan costs $49/month and includes...

Use the reference material above to inform your responses when relevant.
If the material doesn't cover what was asked, say you're not sure.

The system retrieves the top 3 chunks during calls. Each chunk includes its source filename for traceability.

Important: Knowledge search runs on every caller utterance. Chunks are not cached across turns — each turn searches fresh so the most relevant context is always used.
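
The reference-material block above can be produced by a small formatter. The chunk field names here (source, text) are illustrative, not the platform's internal schema:

```python
def format_reference_material(chunks: list[dict]) -> str:
    """Render retrieved chunks in the reference-material layout shown above."""
    lines = ["REFERENCE MATERIAL (from your knowledge base -- "
             "use this to answer questions):", ""]
    for chunk in chunks:
        lines.append(f"[Source: {chunk['source']}]")
        lines.append(chunk["text"])
        lines.append("")
    lines.append("Use the reference material above to inform your responses "
                 "when relevant.")
    lines.append("If the material doesn't cover what was asked, say you're "
                 "not sure.")
    return "\n".join(lines)
```

The closing instruction matters: it tells the model to admit uncertainty rather than hallucinate an answer outside the retrieved material.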

6. Attaching Knowledge Bases to Agents

Link a Knowledge Base to an Agent

curl -X POST https://your-domain.com/api/knowledge/KB_ID/link \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{ "agent_id": "AGENT_UUID" }'

Unlink a Knowledge Base from an Agent

curl -X POST https://your-domain.com/api/knowledge/KB_ID/unlink \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{ "agent_id": "AGENT_UUID" }'

Multiple Knowledge Bases per Agent

You can attach multiple knowledge bases to a single agent. When the agent searches for context, it queries all attached KBs simultaneously and returns the top-scoring chunks across all of them.

The link is stored in the agent_knowledge_bases join table with a composite primary key of (agent_id, knowledge_base_id). Duplicate links are silently ignored.
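
The composite primary key's duplicate-ignoring behavior can be modeled with a set of (agent_id, knowledge_base_id) pairs:

```python
# A set of pairs behaves like the join table's composite primary key:
# inserting an existing (agent_id, knowledge_base_id) pair is a no-op.
links: set[tuple[str, str]] = set()

def link(agent_id: str, kb_id: str) -> None:
    links.add((agent_id, kb_id))  # re-adding an existing pair is ignored

link("agent-1", "kb-a")
link("agent-1", "kb-a")  # duplicate: silently ignored
link("agent-1", "kb-b")
```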

How Chunks Are Selected During Calls

  1. When a call starts, the system loads the agent's linked knowledge base IDs.
  2. On each conversational turn, the caller's utterance is used as the search query.
  3. The search runs across all linked KBs and returns the top 3 chunks by relevance.
  4. Those chunks are formatted and appended to the system prompt for that single LLM call.
Tip: If you hot-swap agents mid-call (e.g., via a squad transfer), the new agent's knowledge bases are loaded automatically.

7. Best Practices for Documents

Write in Q&A Format

Documents structured as question-and-answer pairs produce the best retrieval accuracy.

Q: What are your business hours?
A: We are open Monday through Friday, 9 AM to 6 PM Eastern Time.
We are closed on weekends and federal holidays.

Q: How do I reset my password?
A: Visit account.example.com/reset, enter your email address,
and click "Send Reset Link."

Keep Documents Focused

One topic per document works better than a monolithic catch-all.

Include Common Customer Questions

Seed your knowledge base with the questions your support team hears most often.

Use Clear Headings

Headings help the chunking algorithm produce cleaner chunks.

Avoid Huge Monolithic Documents

Very large documents (50+ pages) produce hundreds of chunks. Breaking large documents into smaller, topic-specific files yields better results.

Watch out: Chunk input text is capped at 8,000 characters per embedding call. If a single chunk exceeds this, the text is silently truncated before embedding.
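
A defensive sketch of that cap (the constant matches the documented limit; the helper itself is ours, not a platform API):

```python
EMBED_INPUT_CAP = 8_000  # characters accepted per embedding call

def safe_embed_input(chunk_text: str) -> str:
    """Truncate oversized chunk text the same way the embedder would."""
    if len(chunk_text) > EMBED_INPUT_CAP:
        # Anything past this point never reaches the embedding model;
        # splitting the document into smaller files avoids losing the tail.
        return chunk_text[:EMBED_INPUT_CAP]
    return chunk_text
```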

8. Cost Considerations

| Operation          | Cost                  | Frequency                    |
|--------------------|-----------------------|------------------------------|
| Embedding (upload) | $0.02 per 1M tokens   | One-time per document upload |
| Embedding (query)  | ~$0.000002 per query  | Once per conversational turn |
| Vector storage     | Free (PostgreSQL)     | Ongoing                      |
| Retrieval search   | Free (pgvector query) | Once per conversational turn |

Practical Cost Estimate

A 10-page document (~5,000 words / ~6,500 tokens) costs roughly $0.00013 to embed. You would need to upload approximately 77 such documents to spend one cent.
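
The arithmetic behind that estimate (embed_cost is an illustrative helper using the upload pricing from the table above):

```python
PRICE_PER_MILLION_TOKENS = 0.02  # text-embedding-3-small, upload embedding

def embed_cost(tokens: int) -> float:
    """Dollar cost of embedding the given number of tokens."""
    return tokens * PRICE_PER_MILLION_TOKENS / 1_000_000

# A 10-page document at ~6,500 tokens:
cost = embed_cost(6_500)  # about $0.00013
```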

Tip: The platform uses OPENAI_API_KEY for embeddings when available (direct to OpenAI), falling back to the tenant's OpenRouter key. OpenAI direct is both cheaper and faster.

9. Examples

Example 1: Product FAQ Knowledge Base

Step 1 — Create the knowledge base:

curl -X POST https://your-domain.com/api/knowledge \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{ "name": "Product FAQ", "description": "Common questions about our SaaS product" }'

Step 2 — Upload the FAQ document:

curl -X POST https://your-domain.com/api/knowledge/$KB_ID/text \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "title": "product-faq.txt",
    "content": "Q: What is Acme Cloud?\nA: Acme Cloud is a cloud-based project management platform designed for teams of 5 to 500.\n\nQ: Is there a free trial?\nA: Yes. Every new account gets a 14-day free trial of the Business plan with no credit card required.\n\nQ: How do I cancel my subscription?\nA: Go to Settings, then Billing, then click Cancel Subscription."
  }'

Step 3 — Attach to your agent:

curl -X POST https://your-domain.com/api/knowledge/$KB_ID/link \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{ "agent_id": "YOUR_AGENT_ID" }'

Example 2: Pricing & Plans Knowledge Base

# Create KB
curl -X POST https://your-domain.com/api/knowledge \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{ "name": "Pricing Plans", "description": "Current plan details and pricing" }'

# Upload pricing document
curl -X POST https://your-domain.com/api/knowledge/$KB_ID/text \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "title": "pricing-plans.txt",
    "content": "# Acme Cloud Pricing Plans\n\n## Free Plan - $0/month\n- Up to 3 users\n- 5 projects\n\n## Starter Plan - $12/user/month\n- Unlimited projects\n- Email support\n\n## Business Plan - $29/user/month\n- Phone and email support\n- Advanced integrations\n\n## Enterprise Plan - Custom pricing\n- Dedicated account manager\n- 99.99% uptime SLA"
  }'

# Link to agent
curl -X POST https://your-domain.com/api/knowledge/$KB_ID/link \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{ "agent_id": "YOUR_AGENT_ID" }'

Example 3: Company Policy Document

# Create KB
curl -X POST https://your-domain.com/api/knowledge \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{ "name": "Company Policies", "description": "Shipping, returns, and warranty info" }'

# Upload policy document
curl -X POST https://your-domain.com/api/knowledge/$KB_ID/text \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "title": "policies.txt",
    "content": "# Shipping Policy\n\nQ: How long does shipping take?\nA: Standard shipping takes 5-7 business days. Express (2-day) is $12.99.\n\n# Return Policy\n\nQ: What is your return policy?\nA: We accept returns within 30 days of delivery. Items must be unused.\n\n# Warranty\n\nQ: What does the warranty cover?\nA: All hardware products include a 1-year limited warranty."
  }'

# Link to agent
curl -X POST https://your-domain.com/api/knowledge/$KB_ID/link \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{ "agent_id": "YOUR_AGENT_ID" }'

10. Troubleshooting

Agent does not use knowledge base information

| Check                     | How to Verify                                     | Fix                                                      |
|---------------------------|---------------------------------------------------|----------------------------------------------------------|
| KB is linked to the agent | GET /api/knowledge/KB_ID — check the agents array | Call POST /api/knowledge/KB_ID/link                      |
| Documents are processed   | Check that document status is "ready"             | Wait for processing or re-upload if "failed"             |
| Documents have chunks     | Check chunk_count > 0                             | Re-upload; empty files produce zero chunks               |
| Embedding API key is set  | Ensure OPENAI_API_KEY is in your environment      | Add the key; without it, only keyword search is available |

Wrong or irrelevant information retrieved

Document processing is stuck

Note on reindexing: If you enabled pgvector after uploading documents, existing chunks will not have embeddings. Contact support to trigger a reindex.

Search returns no results