
What is RAG (Retrieval-Augmented Generation)?

RAG is a technique that lets an LLM answer questions using your private documents by retrieving relevant chunks before generating the response.

Definition

Retrieval-Augmented Generation (RAG) is the standard production pattern for letting an LLM answer questions using your data — wikis, PDFs, support tickets, code — without retraining the model. The flow: documents are split into chunks, each chunk is converted into a numerical vector (an embedding), and the vectors are stored in a vector database (Pinecone, pgvector, Weaviate). When a user asks something, the query is embedded the same way, the system retrieves the chunks whose vectors are most similar to the query's, and those chunks are passed to the LLM as context. The LLM grounds its response in the retrieved content, which sharply reduces hallucinations and enables source citations. RAG is preferred over fine-tuning when knowledge changes weekly, citations are required, or hallucination control matters.
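The retrieve-then-prompt flow above can be sketched in a few lines. This is a toy illustration: the bag-of-words `embed` function stands in for a real embedding model, and the in-memory list stands in for a vector database; the function names, the example chunks, and the prompt template are all invented for the demo, not part of any library.

```python
import math
import re
from collections import Counter

def embed(text: str) -> Counter:
    # Toy stand-in for a real embedding model: a bag-of-words count
    # vector. A production system would call an embedding API here.
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a: Counter, b: Counter) -> float:
    # Cosine similarity between two sparse count vectors.
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query: str, chunks: list[str], k: int = 2) -> list[str]:
    # Rank chunks by similarity to the query and keep the top k —
    # the job a vector database does at scale.
    q = embed(query)
    return sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)[:k]

def build_prompt(query: str, chunks: list[str]) -> str:
    # Ground the LLM: retrieved chunks become the context it must use.
    context = "\n".join(f"- {c}" for c in retrieve(query, chunks))
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"

chunks = [
    "Refunds are processed within 5 business days.",
    "Our API rate limit is 100 requests per minute.",
    "Support is available Monday through Friday.",
]
prompt = build_prompt("How fast are refunds?", chunks)
print(prompt)
```

The resulting prompt — retrieved context plus the user's question — is what gets sent to the LLM, which is also where citations come from: each context line can carry its source document's ID.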
