Applicability
When to Use
✓ When AI needs access to private or current data
✓ When reducing hallucinations in AI responses
✓ When building knowledge-base chatbots
Overview
How It Works
Retrieval-augmented generation (RAG) combines retrieval from MCP servers with AI generation. When a user asks a question, the agent first searches for relevant documents using vector search (via a Pinecone, Qdrant, or ChromaDB MCP server), retrieves the most relevant chunks, and includes them as context for the AI model.
This pattern is fundamental to building AI agents that can answer questions about your specific data. The MCP architecture makes it natural: the vector database MCP server handles retrieval, while the LLM MCP server handles generation. The agent orchestrates the flow.
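A minimal sketch of the retrieval half of that orchestration, using the MCP TypeScript SDK (@modelcontextprotocol/sdk). The server launch command and the "search" tool name are assumptions; substitute whatever tools your vector-store MCP server actually advertises:
typescript
import { Client } from "@modelcontextprotocol/sdk/client/index.js";
import { StdioClientTransport } from "@modelcontextprotocol/sdk/client/stdio.js";

// Agent-side client for a vector-store MCP server. The launch command and the
// "search" tool name are placeholders -- use whatever your server exposes.
const vectorStore = new Client({ name: "rag-agent", version: "1.0.0" });
await vectorStore.connect(
  new StdioClientTransport({ command: "your-vector-store-mcp-server", args: [] })
);

// Retrieval step: the agent delegates the vector search to the MCP server.
const result = await vectorStore.callTool({
  name: "search", // hypothetical tool name
  arguments: { query: "How do refunds work?", limit: 5 },
});

// Collect the returned text chunks; they become the grounding context for generation.
const context = (result.content as Array<{ type: string; text?: string }>)
  .filter((c) => c.type === "text")
  .map((c) => c.text)
  .join("\n\n");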
Implementation
Code Example
typescript
import OpenAI from "openai";
import { Pinecone } from "@pinecone-database/pinecone";

const openai = new OpenAI();
const index = new Pinecone().index("docs"); // an existing index populated with document chunks

async function ragAnswer(question: string) {
  // Retrieve relevant context: embed the question, then vector-search the index
  const embedding = await openai.embeddings.create({ model: "text-embedding-3-small", input: question });
  const { matches } = await index.query({ vector: embedding.data[0].embedding, topK: 5, includeMetadata: true });
  const context = matches.map((m) => m.metadata?.text).join("\n\n");

  // Generate an answer grounded in the retrieved context
  const completion = await openai.chat.completions.create({
    model: "gpt-4o-mini",
    messages: [
      { role: "system", content: `Answer based on the following context. If the context doesn't contain the answer, say so.\n\nContext:\n${context}` },
      { role: "user", content: question }
    ]
  });
  return { answer: completion.choices[0].message.content, sources: matches.map((m) => m.metadata?.source) };
}
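Example call, assuming the OpenAI and Pinecone clients above are configured via environment variables (OPENAI_API_KEY, PINECONE_API_KEY) and the index's chunk metadata includes text and source fields:
typescript
const { answer, sources } = await ragAnswer("What is our refund policy?");
console.log(answer);  // grounded answer, or a statement that the context lacks one
console.log(sources); // e.g. document IDs or URLs stored in chunk metadata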
Quick Info
Category: ai-agent
Complexity: Medium