Introduction
Retrieval-Augmented Generation (RAG) has revolutionized how we build AI applications that need to answer questions based on specific knowledge bases. However, traditional RAG systems often suffer from a critical limitation: they blindly trust retrieved documents and generate answers without verifying relevance or accuracy. This is where Agentic RAG comes in: a self-reflective, adaptive approach that makes intelligent decisions about information retrieval, validates document relevance, detects hallucinations, and dynamically switches between local knowledge and web search.
In this blog post, we'll explore a production-ready implementation of an Agentic RAG system built with LangGraph, demonstrating how to create a sophisticated AI agent that doesn't just retrieve and generate: it thinks, evaluates, and adapts.
What is Agentic RAG?
Agentic RAG is an advanced form of RAG that incorporates autonomous decision-making capabilities into the retrieval and generation pipeline. Unlike traditional RAG systems that follow a linear path (retrieve → generate), Agentic RAG implements:
- Intelligent Routing: Determines whether to use local knowledge or web search based on the query type
- Document Relevance Grading: Evaluates whether retrieved documents actually answer the question
- Hallucination Detection: Verifies that generated answers are grounded in retrieved facts
- Adaptive Behavior: Dynamically adjusts its strategy based on quality checks
- Multi-Source Retrieval: Seamlessly combines local vectorstore and web search results
System Architecture
Our Agentic RAG system is built on LangGraph, a framework for creating stateful, multi-actor applications with LLMs. The architecture consists of several key components:
Core Components
1. State Management
The system maintains a typed state throughout the workflow:
```python
from typing import List

from typing_extensions import TypedDict


class GraphState(TypedDict):
    question: str         # User's query
    generation: str       # LLM-generated answer
    web_search: bool      # Flag indicating if web search is needed
    documents: List[str]  # Retrieved or searched documents
```
This state flows through the entire pipeline, ensuring each node has access to the information it needs.
2. Graph Nodes
The system implements four primary nodes, each with a specific responsibility:
Retrieve Node
Fetches relevant documents from a local vector database (Chroma) using OpenAI embeddings.
```python
def retrieve(state: GraphState) -> Dict[str, Any]:
    question = state["question"]
    documents = retriever.invoke(question)
    return {"documents": documents, "question": question}
```
Grade Documents Node
Evaluates each retrieved document for relevance using an LLM-based grader. If any document is irrelevant, it sets a flag to trigger web search.
```python
def grade_documents(state: GraphState) -> Dict[str, Any]:
    question = state["question"]
    documents = state["documents"]

    filtered_docs = []
    web_search = False
    for doc in documents:
        score = retrieval_grader.invoke(
            {"question": question, "document": doc.page_content}
        )
        if score.binary_score.lower() == "yes":
            filtered_docs.append(doc)
        else:
            web_search = True

    return {
        "documents": filtered_docs,
        "question": question,
        "web_search": web_search,
    }
```
Generate Node
Creates an answer based on the context (documents) and question using a RAG prompt template.
```python
def generate(state: GraphState) -> Dict[str, Any]:
    question = state["question"]
    documents = state["documents"]
    generation = generation_chain.invoke(
        {"context": documents, "question": question}
    )
    return {
        "documents": documents,
        "question": question,
        "generation": generation,
    }
```
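The generation_chain invoked here isn't defined inside the node itself. Below is a minimal sketch of one way to assemble it, assuming the widely used rlm/rag-prompt from LangChain Hub and a ChatOpenAI model (both assumptions, not necessarily what the original code uses):

```python
from langchain import hub
from langchain_core.output_parsers import StrOutputParser
from langchain_openai import ChatOpenAI

llm = ChatOpenAI(temperature=0)

# A standard RAG prompt that expects "context" and "question" inputs,
# piped through the model and a plain-string output parser.
prompt = hub.pull("rlm/rag-prompt")
generation_chain = prompt | llm | StrOutputParser()
```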
Web Search Node
Performs web search using Tavily API when local knowledge is insufficient.
```python
def web_search(state: GraphState) -> Dict[str, Any]:
    question = state["question"]
    documents = state.get("documents", [])

    tavily_results = web_search_tool.invoke({"query": question})["results"]
    joined_result = "\n".join([r["content"] for r in tavily_results])
    web_results = Document(page_content=joined_result)

    documents.append(web_results)
    return {"documents": documents, "question": question}
```
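The web_search_tool is assumed to be the Tavily search tool. A minimal sketch of its setup follows; the exact class and response shape depend on which Tavily integration and version you install, so treat this as an assumption:

```python
from langchain_tavily import TavilySearch

# Invoking the tool with {"query": ...} returns a dict whose "results" key
# holds a list of entries with a "content" field, matching the node above.
web_search_tool = TavilySearch(max_results=3)
```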
3. LLM-Powered Chains
The system uses specialized LLM chains for various evaluation tasks:
Question Router
Intelligently routes questions to either the vectorstore or web search based on content.
```python
class RouteQuery(BaseModel):
    datasource: Literal["vectorstore", "websearch"] = Field(
        description="Route to web search or vectorstore"
    )


system_prompt = """You are an expert at routing a user question to a vectorstore or web search.
The vectorstore contains documents related to agents, prompt engineering, and adversarial attacks.
Use the vectorstore for questions on these topics. For all else, use web-search."""
```
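The RouteQuery schema and prompt become a runnable router by binding the schema to the model with structured output. A minimal sketch, assuming ChatOpenAI as the model:

```python
from langchain_core.prompts import ChatPromptTemplate
from langchain_openai import ChatOpenAI

llm = ChatOpenAI(temperature=0)

route_prompt = ChatPromptTemplate.from_messages(
    [("system", system_prompt), ("human", "{question}")]
)

# The router returns a RouteQuery instance, so downstream code can read .datasource.
question_router = route_prompt | llm.with_structured_output(RouteQuery)
```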
Retrieval Grader
Assesses whether a retrieved document is relevant to the question.
```python
class GradeDocuments(BaseModel):
    binary_score: str = Field(
        description="Documents are relevant to the question, 'yes' or 'no'"
    )


system_prompt = """You are a grader assessing relevance of a retrieved document to a user question.
If the document contains keywords or semantic meaning related to the question, grade it as relevant."""
```
Hallucination Grader
Verifies that the generated answer is grounded in the retrieved facts.
```python
class GradeHallucinations(BaseModel):
    binary_score: bool = Field(
        description="Answer is grounded in the facts, 'yes' or 'no'"
    )


system_prompt = """You are a grader assessing whether an LLM generation is grounded in / supported by a set of retrieved facts.
Give a binary score 'yes' or 'no'."""
```
Answer Grader
Checks whether the answer actually addresses the user's question.
```python
class GradeAnswer(BaseModel):
    binary_score: bool = Field(
        description="Answer addresses the question, 'yes' or 'no'"
    )


system_prompt = """You are a grader assessing whether an answer addresses / resolves a question.
Give a binary score 'yes' or 'no'."""
```
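Each grader is assembled the same way as the router: a prompt piped into the model with its schema bound via with_structured_output. Here is a minimal sketch for the retrieval grader; the hallucination and answer graders follow the identical pattern with their own schemas and input variables:

```python
from langchain_core.prompts import ChatPromptTemplate
from langchain_openai import ChatOpenAI

llm = ChatOpenAI(temperature=0)

grade_prompt = ChatPromptTemplate.from_messages(
    [
        ("system", system_prompt),
        ("human", "Retrieved document:\n\n{document}\n\nUser question: {question}"),
    ]
)

# Returns a GradeDocuments instance with a "yes"/"no" binary_score.
retrieval_grader = grade_prompt | llm.with_structured_output(GradeDocuments)
```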
The Workflow: How It All Comes Together
The beauty of this system lies in its conditional workflow. Here's how the graph executes:
1. Entry Point: Intelligent Routing
```python
def route_question(state: GraphState) -> str:
    question = state["question"]
    source = question_router.invoke({"question": question})
    if source.datasource == "websearch":
        return WEBSEARCH
    else:
        return RETRIEVE
```
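The RETRIEVE, WEBSEARCH, GRADE_DOCUMENTS, and GENERATE names used here and in the wiring code later on are plain string constants. A minimal sketch of how they might be defined (the exact values are an assumption; any unique strings work as long as they are used consistently):

```python
# Node names used to register and reference nodes in the graph (values assumed).
RETRIEVE = "retrieve"
GRADE_DOCUMENTS = "grade_documents"
GENERATE = "generate"
WEBSEARCH = "websearch"
```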
When a question arrives, the router analyzes it and decides:
- Vectorstore: For domain-specific questions (agents, prompt engineering, adversarial attacks)
- Web Search: For general knowledge or current events
2. Retrieval & Grading Path
If routed to the vectorstore:
- Retrieve documents from the vector database
- Grade Documents for relevance
- Decision Point:
- If all documents are relevant → proceed to Generate
- If any document is irrelevant → trigger Web Search
 
```python
def decide_to_generate(state):
    if state["web_search"]:
        return WEBSEARCH  # Need more information
    else:
        return GENERATE   # Have enough relevant docs
```
3. Generation & Quality Control
After generating an answer, the system performs rigorous quality checks:
```python
def grade_generation_grounded_in_documents_and_question(state: GraphState) -> str:
    question = state["question"]
    documents = state["documents"]
    generation = state["generation"]

    # Check for hallucinations
    hallucination_score = hallucination_grader.invoke(
        {"documents": documents, "generation": generation}
    )

    if hallucination_score.binary_score:
        # Answer is grounded; now check if it addresses the question
        answer_score = answer_grader.invoke(
            {"question": question, "generation": generation}
        )
        if answer_score.binary_score:
            return "useful"      # Grounded and answers the question
        else:
            return "not useful"  # Grounded but doesn't answer
    else:
        return "not supported"   # Hallucination detected
```
Based on the quality checks:
- "useful": Answer is perfect β END
- "not useful": Need better context β Web Search
- "not supported": Hallucination detected β Regenerate
4. Building the Workflow Graph
LangGraph's StateGraph provides a powerful API for constructing the workflow. Here's how we wire everything together:
```python
from langgraph.graph import StateGraph, END

workflow = StateGraph(GraphState)

# Add all nodes to the graph
workflow.add_node(RETRIEVE, retrieve)
workflow.add_node(GRADE_DOCUMENTS, grade_documents)
workflow.add_node(GENERATE, generate)
workflow.add_node(WEBSEARCH, web_search)

# Set conditional entry point - routes to either retrieve or websearch
workflow.set_conditional_entry_point(
    route_question,
    {
        WEBSEARCH: WEBSEARCH,
        RETRIEVE: RETRIEVE,
    },
)

# Retrieve always flows to grade_documents
workflow.add_edge(RETRIEVE, GRADE_DOCUMENTS)

# Grade documents conditionally routes to generate or websearch
workflow.add_conditional_edges(
    GRADE_DOCUMENTS,
    decide_to_generate,
    {
        WEBSEARCH: WEBSEARCH,
        GENERATE: GENERATE,
    },
)

# Generate has three possible outcomes
workflow.add_conditional_edges(
    GENERATE,
    grade_generation_grounded_in_documents_and_question,
    {
        "not supported": GENERATE,  # Regenerate if hallucination
        "useful": END,              # End if answer is good
        "not useful": WEBSEARCH,    # Search web if not useful
    },
)

# Websearch always flows to generate
workflow.add_edge(WEBSEARCH, GENERATE)

# Compile the graph into an executable application
app = workflow.compile()
```
Key Concepts:
- add_node(): Registers a function as a node in the graph
- set_conditional_entry_point(): Defines how the graph starts based on a routing function
- add_edge(): Creates a deterministic transition between nodes
- add_conditional_edges(): Creates dynamic transitions based on a decision function
- compile(): Converts the graph definition into an executable application
Note the elegant handling of the self-correcting loop: when GENERATE detects a hallucination ("not supported"), it routes back to itself, triggering regeneration with the same context but potentially different LLM sampling.
5. Visual Workflow
The complete workflow can be visualized as a single graph.
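If you want to reproduce the diagram yourself, LangGraph can export the compiled graph as a Mermaid definition; a minimal sketch:

```python
# Print a Mermaid definition of the compiled workflow; paste it into any
# Mermaid renderer (or use draw_mermaid_png) to produce the diagram.
print(app.get_graph().draw_mermaid())
```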
The diagram shows the complete flow of the Agentic RAG system:
- Start → Routes to either retrieve (vectorstore) or websearch based on question type
- Retrieve → Fetches documents from the vectorstore
- Grade Documents → Evaluates relevance and decides whether to generate or search the web
- Web Search → Gathers additional context when needed
- Generate → Creates an answer from the available documents
- Quality Checks → Validates the answer with three possible outcomes:
  - useful: Perfect answer → End
  - not useful: Needs more context → Web Search
  - not supported: Hallucination detected → Regenerate
 
Key Features & Benefits
1. Self-Correcting Architecture
The system doesn't just fail when it encounters poor results; it adapts:
- Irrelevant documents? → Trigger web search
- Hallucinated answer? → Regenerate the answer
- Answer doesn't address question? → Search for more context
2. Multi-Layered Quality Assurance
Three independent grading mechanisms ensure high-quality outputs:
- Document relevance grading before generation
- Hallucination detection after generation
- Answer usefulness evaluation before returning to user
3. Hybrid Knowledge Access
Seamlessly combines:
- Local vectorstore: Fast, domain-specific knowledge
- Web search (Tavily): Current information and broader coverage
4. Production-Ready Design
- Typed state management with TypedDict
- Modular architecture with clear separation of concerns
- Structured outputs using Pydantic models
- Comprehensive logging for debugging and monitoring
- Error handling at each node
5. Observable & Debuggable
Each step prints its decision-making process:
```
---ROUTE QUESTION---
---ROUTE QUESTION TO RAG---
---RETRIEVE---
---CHECK DOCUMENT RELEVANCE TO QUESTION---
---GRADE: DOCUMENT RELEVANT---
---GENERATE---
---CHECK HALLUCINATIONS---
---DECISION: GENERATION IS GROUNDED IN DOCUMENTS---
---GRADE GENERATION vs QUESTION---
---DECISION: GENERATION ADDRESSES QUESTION---
```
Data Ingestion & Vector Store
The system uses a carefully curated knowledge base:
```python
from langchain_chroma import Chroma
from langchain_community.document_loaders import WebBaseLoader
from langchain_openai import OpenAIEmbeddings
from langchain_text_splitters import RecursiveCharacterTextSplitter

urls = [
    "https://lilianweng.github.io/posts/2023-06-23-agent/",
    "https://lilianweng.github.io/posts/2023-03-15-prompt-engineering/",
    "https://lilianweng.github.io/posts/2023-10-25-adv-attack-llm/",
]

# Load and split documents
docs = [WebBaseLoader(url).load() for url in urls]
docs_list = [item for sublist in docs for item in sublist]

text_splitter = RecursiveCharacterTextSplitter.from_tiktoken_encoder(
    chunk_size=250, chunk_overlap=0
)
doc_splits = text_splitter.split_documents(docs_list)

# Create vectorstore with Chroma and OpenAI embeddings
vectorstore = Chroma.from_documents(
    documents=doc_splits,
    collection_name="rag-chroma",
    embedding=OpenAIEmbeddings(),
    persist_directory="./.chroma",
)
```
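The retriever consumed by the retrieve node isn't shown above. A minimal sketch of deriving it, assuming the application process reloads the persisted Chroma collection:

```python
# Reload the persisted collection and expose it as a retriever for the graph.
retriever = Chroma(
    collection_name="rag-chroma",
    persist_directory="./.chroma",
    embedding_function=OpenAIEmbeddings(),
).as_retriever()
```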
Implementation Stack
The system leverages modern LLM tooling:
- LangGraph: State management and workflow orchestration
- LangChain: LLM chains, prompts, and integrations
- OpenAI: LLM for generation and structured outputs
- Chroma: Vector database for document storage
- Tavily: Web search API for external knowledge
- Pydantic: Data validation and structured outputs
Real-World Applications
This Agentic RAG architecture is ideal for:
- Customer Support Systems: Answer queries using documentation while falling back to web search for unknown topics
- Research Assistants: Verify claims against source documents and detect hallucinations
- Knowledge Management: Route questions to the right knowledge source (internal docs vs. public web)
- Educational Platforms: Ensure factual accuracy in tutoring applications
- Enterprise Search: Combine private knowledge bases with public information
Performance Considerations
Optimization Strategies
- Parallel Grading: Document relevance can be assessed in parallel (see the sketch after this list)
- Caching: Cache embeddings and frequent queries
- Early Termination: Stop processing irrelevant documents early
- Batch Processing: Process multiple questions efficiently
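As a concrete example of the first point, LangChain runnables expose a batch() method that runs calls concurrently, so document grading doesn't have to happen one document at a time. A minimal sketch of a parallel variant of grade_documents (an assumption about how you might restructure it, not the post's original code):

```python
def grade_documents_parallel(state: GraphState) -> Dict[str, Any]:
    question = state["question"]
    documents = state["documents"]

    # Grade all documents concurrently instead of looping over them one by one.
    scores = retrieval_grader.batch(
        [{"question": question, "document": d.page_content} for d in documents]
    )
    filtered_docs = [
        d for d, s in zip(documents, scores) if s.binary_score.lower() == "yes"
    ]

    return {
        "documents": filtered_docs,
        "question": question,
        "web_search": len(filtered_docs) < len(documents),
    }
```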
Trade-offs
- Latency vs. Quality: Multiple LLM calls increase latency but ensure quality
- Cost vs. Accuracy: Each grading step adds cost but reduces errors
- Local vs. Web: Balancing fast local retrieval with comprehensive web search
Getting Started
Running the system is straightforward:
```bash
# Install dependencies
poetry install

# Set up environment variables
export OPENAI_API_KEY="your_key"
export TAVILY_API_KEY="your_key"

# Run ingestion (first time only)
poetry run python ingestion.py

# Run the agentic RAG
poetry run python main.py
```
Example usage:
```python
from graph.workflow import app

result = app.invoke(input={"question": "What is agent memory?"})
print(result["generation"])
```
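To watch the graph's decisions as they happen (which pairs nicely with the log output shown earlier), you can also stream per-node state updates instead of waiting for the final result; a minimal sketch:

```python
# Stream the state update emitted by each node as the graph executes.
final_state = {}
for chunk in app.stream({"question": "What is agent memory?"}, stream_mode="updates"):
    for node_name, update in chunk.items():
        print(f"Finished node: {node_name}")
        final_state.update(update)

print(final_state["generation"])
```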
Lessons Learned
Building this system taught us several valuable lessons:
- Quality over Speed: The multiple grading steps significantly improve output quality
- Structured Outputs: Using Pydantic for LLM outputs ensures reliable parsing
- Conditional Logic: LangGraph's conditional edges enable sophisticated control flows
- Observability Matters: Logging each decision helps debug complex workflows
- Hybrid Approaches Win: Combining local and web search provides best of both worlds
Future Enhancements
Potential improvements to explore:
- Memory: Add conversation memory for multi-turn interactions (see the sketch after this list)
- Streaming: Implement streaming responses for better UX
- Multi-Modal: Support image and video content in retrieval
- Personalization: Adapt routing and grading to user preferences
- Fine-Tuning: Train custom models for domain-specific grading
- A/B Testing: Compare different routing strategies
- Feedback Loops: Learn from user corrections and ratings
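For the first of these, LangGraph already ships a checkpointer mechanism that persists graph state between invocations, keyed by a thread ID. A minimal sketch of enabling in-memory conversation state (an assumption, not part of the current implementation):

```python
from langgraph.checkpoint.memory import MemorySaver

# Compile the same workflow with a checkpointer so state persists per thread_id.
memory_app = workflow.compile(checkpointer=MemorySaver())

result = memory_app.invoke(
    {"question": "What is agent memory?"},
    config={"configurable": {"thread_id": "user-42"}},
)
```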
Conclusion
Agentic RAG represents a significant evolution in how we build AI systems. By incorporating self-reflection, quality assurance, and adaptive behavior, we can create RAG applications that are not just powerful, but reliable and trustworthy.
The system demonstrated here showcases the power of LangGraph for building production-ready agentic applications. The modular architecture, clear separation of concerns, and rigorous quality checks make it an excellent foundation for real-world applications.
Whether you're building customer support bots, research assistants, or knowledge management systems, the principles and patterns shown here provide a solid starting point. The key is to think of your RAG system not as a simple retrieve-and-generate pipeline, but as an intelligent agent that reasons about information quality and adapts its strategy accordingly.
Resources
- Code Repository: langgraph-course
- LangGraph Documentation: LangGraph Docs
- Original Tutorial: Advance RAG control flow with Mistral and LangChain
- LangChain Cookbook: Mistral Cookbook
Built with ❤️ using LangGraph, LangChain, and OpenAI

