Introduction
Retrieval-Augmented Generation (RAG) has revolutionized how we build AI applications that need to answer questions based on specific knowledge bases. However, traditional RAG systems often suffer from a critical limitation: they blindly trust retrieved documents and generate answers without verifying relevance or accuracy. This is where Agentic RAG comes in: a self-reflective, adaptive approach that makes intelligent decisions about information retrieval, validates document relevance, detects hallucinations, and dynamically switches between local knowledge and web search.
In this blog post, we'll explore a production-ready implementation of an Agentic RAG system built with LangGraph, demonstrating how to create a sophisticated AI agent that doesn't just retrieve and generate: it thinks, evaluates, and adapts.
What is Agentic RAG?
Agentic RAG is an advanced form of RAG that incorporates autonomous decision-making capabilities into the retrieval and generation pipeline. Unlike traditional RAG systems that follow a linear path (retrieve → generate), Agentic RAG implements:
- Intelligent Routing: Determines whether to use local knowledge or web search based on the query type
- Document Relevance Grading: Evaluates whether retrieved documents actually answer the question
- Hallucination Detection: Verifies that generated answers are grounded in retrieved facts
- Adaptive Behavior: Dynamically adjusts its strategy based on quality checks
- Multi-Source Retrieval: Seamlessly combines local vectorstore and web search results
System Architecture
Our Agentic RAG system is built on LangGraph, a framework for creating stateful, multi-actor applications with LLMs. The architecture consists of several key components:
Core Components
1. State Management
The system maintains a typed state throughout the workflow:
```python
from typing import List

from typing_extensions import TypedDict


class GraphState(TypedDict):
    question: str         # User's query
    generation: str       # LLM-generated answer
    web_search: bool      # Flag indicating if web search is needed
    documents: List[str]  # Retrieved or searched documents
```
This state flows through the entire pipeline, ensuring each node has access to the information it needs.
2. Graph Nodes
The system implements four primary nodes, each with a specific responsibility:
Retrieve Node
Fetches relevant documents from a local vector database (Chroma) using OpenAI embeddings.
```python
def retrieve(state: GraphState) -> Dict[str, Any]:
    question = state["question"]
    documents = retriever.invoke(question)
    return {"documents": documents, "question": question}
```
Grade Documents Node
Evaluates each retrieved document for relevance using an LLM-based grader. If any document is irrelevant, it sets a flag to trigger web search.
```python
def grade_documents(state: GraphState) -> Dict[str, Any]:
    question = state["question"]
    documents = state["documents"]

    filtered_docs = []
    web_search = False
    for doc in documents:
        score = retrieval_grader.invoke(
            {"question": question, "document": doc.page_content}
        )
        if score.binary_score.lower() == "yes":
            filtered_docs.append(doc)
        else:
            web_search = True

    return {
        "documents": filtered_docs,
        "question": question,
        "web_search": web_search,
    }
```
Generate Node
Creates an answer based on the context (documents) and question using a RAG prompt template.
```python
def generate(state: GraphState) -> Dict[str, Any]:
    question = state["question"]
    documents = state["documents"]
    generation = generation_chain.invoke(
        {"context": documents, "question": question}
    )
    return {
        "documents": documents,
        "question": question,
        "generation": generation,
    }
```
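The generation_chain invoked here isn't defined inside the node itself. Below is a minimal sketch of one way to assemble it, assuming the widely used rlm/rag-prompt from LangChain Hub and a ChatOpenAI model (both assumptions, not necessarily what the original code uses):

```python
from langchain import hub
from langchain_core.output_parsers import StrOutputParser
from langchain_openai import ChatOpenAI

llm = ChatOpenAI(temperature=0)

# A standard RAG prompt that expects "context" and "question" inputs,
# piped through the model and a plain-string output parser.
prompt = hub.pull("rlm/rag-prompt")
generation_chain = prompt | llm | StrOutputParser()
```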
Web Search Node
Performs web search using Tavily API when local knowledge is insufficient.
```python
def web_search(state: GraphState) -> Dict[str, Any]:
    question = state["question"]
    documents = state.get("documents", [])

    tavily_results = web_search_tool.invoke({"query": question})["results"]
    joined_result = "\n".join([r["content"] for r in tavily_results])
    web_results = Document(page_content=joined_result)

    documents.append(web_results)
    return {"documents": documents, "question": question}
```
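The web_search_tool is assumed to be the Tavily search tool. A minimal sketch of its setup follows; the exact class and response shape depend on which Tavily integration and version you install, so treat this as an assumption:

```python
from langchain_tavily import TavilySearch

# Invoking the tool with {"query": ...} returns a dict whose "results" key
# holds a list of entries with a "content" field, matching the node above.
web_search_tool = TavilySearch(max_results=3)
```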
3. LLM-Powered Chains
The system uses specialized LLM chains for various evaluation tasks:
Question Router
Intelligently routes questions to either the vectorstore or web search based on content.
```python
class RouteQuery(BaseModel):
    datasource: Literal["vectorstore", "websearch"] = Field(
        description="Route to web search or vectorstore"
    )


system_prompt = """You are an expert at routing a user question to a vectorstore or web search.
The vectorstore contains documents related to agents, prompt engineering, and adversarial attacks.
Use the vectorstore for questions on these topics. For all else, use web-search."""
```
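The RouteQuery schema and prompt become a runnable router by binding the schema to the model with structured output. A minimal sketch, assuming ChatOpenAI as the model:

```python
from langchain_core.prompts import ChatPromptTemplate
from langchain_openai import ChatOpenAI

llm = ChatOpenAI(temperature=0)

route_prompt = ChatPromptTemplate.from_messages(
    [("system", system_prompt), ("human", "{question}")]
)

# The router returns a RouteQuery instance, so downstream code can read .datasource.
question_router = route_prompt | llm.with_structured_output(RouteQuery)
```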
Retrieval Grader
Assesses whether a retrieved document is relevant to the question.
```python
class GradeDocuments(BaseModel):
    binary_score: str = Field(
        description="Documents are relevant to the question, 'yes' or 'no'"
    )


system_prompt = """You are a grader assessing relevance of a retrieved document to a user question.
If the document contains keywords or semantic meaning related to the question, grade it as relevant."""
```
Hallucination Grader
Verifies that the generated answer is grounded in the retrieved facts.
```python
class GradeHallucinations(BaseModel):
    binary_score: bool = Field(
        description="Answer is grounded in the facts, 'yes' or 'no'"
    )


system_prompt = """You are a grader assessing whether an LLM generation is grounded in / supported by a set of retrieved facts.
Give a binary score 'yes' or 'no'."""
```
Answer Grader
Checks whether the answer actually addresses the user's question.
```python
class GradeAnswer(BaseModel):
    binary_score: bool = Field(
        description="Answer addresses the question, 'yes' or 'no'"
    )


system_prompt = """You are a grader assessing whether an answer addresses / resolves a question.
Give a binary score 'yes' or 'no'."""
```
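Each grader is assembled the same way as the router: a prompt piped into the model with its schema bound via with_structured_output. Here is a minimal sketch for the retrieval grader; the hallucination and answer graders follow the identical pattern with their own schemas and input variables:

```python
from langchain_core.prompts import ChatPromptTemplate
from langchain_openai import ChatOpenAI

llm = ChatOpenAI(temperature=0)

grade_prompt = ChatPromptTemplate.from_messages(
    [
        ("system", system_prompt),
        ("human", "Retrieved document:\n\n{document}\n\nUser question: {question}"),
    ]
)

# Returns a GradeDocuments instance with a "yes"/"no" binary_score.
retrieval_grader = grade_prompt | llm.with_structured_output(GradeDocuments)
```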
The Workflow: How It All Comes Together
The beauty of this system lies in its conditional workflow. Here's how the graph executes:
1. Entry Point: Intelligent Routing
```python
def route_question(state: GraphState) -> str:
    question = state["question"]
    source = question_router.invoke({"question": question})
    if source.datasource == "websearch":
        return WEBSEARCH
    else:
        return RETRIEVE
```
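The RETRIEVE, WEBSEARCH, GRADE_DOCUMENTS, and GENERATE names used here and in the wiring code later on are plain string constants. A minimal sketch of how they might be defined (the exact values are an assumption; any unique strings work as long as they are used consistently):

```python
# Node names used to register and reference nodes in the graph (values assumed).
RETRIEVE = "retrieve"
GRADE_DOCUMENTS = "grade_documents"
GENERATE = "generate"
WEBSEARCH = "websearch"
```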
When a question arrives, the router analyzes it and decides:
- Vectorstore: For domain-specific questions (agents, prompt engineering, adversarial attacks)
- Web Search: For general knowledge or current events
2. Retrieval & Grading Path
If routed to the vectorstore:
- Retrieve documents from the vector database
- Grade Documents for relevance
- Decision Point:
- If all documents are relevant → proceed to Generate
- If any document is irrelevant → trigger Web Search
 
```python
def decide_to_generate(state):
    if state["web_search"]:
        return WEBSEARCH  # Need more information
    else:
        return GENERATE   # Have enough relevant docs
```
3. Generation & Quality Control
After generating an answer, the system performs rigorous quality checks:
```python
def grade_generation_grounded_in_documents_and_question(state: GraphState) -> str:
    question = state["question"]
    documents = state["documents"]
    generation = state["generation"]

    # Check for hallucinations
    hallucination_score = hallucination_grader.invoke(
        {"documents": documents, "generation": generation}
    )

    if hallucination_score.binary_score:
        # Answer is grounded; now check if it addresses the question
        answer_score = answer_grader.invoke(
            {"question": question, "generation": generation}
        )
        if answer_score.binary_score:
            return "useful"      # Grounded and answers the question
        else:
            return "not useful"  # Grounded but doesn't answer
    else:
        return "not supported"   # Hallucination detected
```
Based on the quality checks:
- "useful": Answer is perfect β END
- "not useful": Need better context β Web Search
- "not supported": Hallucination detected β Regenerate
4. Building the Workflow Graph
LangGraph's StateGraph provides a powerful API for constructing the workflow. Here's how we wire everything together:
```python
from langgraph.graph import StateGraph, END

workflow = StateGraph(GraphState)

# Add all nodes to the graph
workflow.add_node(RETRIEVE, retrieve)
workflow.add_node(GRADE_DOCUMENTS, grade_documents)
workflow.add_node(GENERATE, generate)
workflow.add_node(WEBSEARCH, web_search)

# Set conditional entry point - routes to either retrieve or websearch
workflow.set_conditional_entry_point(
    route_question,
    {
        WEBSEARCH: WEBSEARCH,
        RETRIEVE: RETRIEVE,
    },
)

# Retrieve always flows to grade_documents
workflow.add_edge(RETRIEVE, GRADE_DOCUMENTS)

# Grade documents conditionally routes to generate or websearch
workflow.add_conditional_edges(
    GRADE_DOCUMENTS,
    decide_to_generate,
    {
        WEBSEARCH: WEBSEARCH,
        GENERATE: GENERATE,
    },
)

# Generate has three possible outcomes
workflow.add_conditional_edges(
    GENERATE,
    grade_generation_grounded_in_documents_and_question,
    {
        "not supported": GENERATE,  # Regenerate if hallucination
        "useful": END,              # End if answer is good
        "not useful": WEBSEARCH,    # Search web if not useful
    },
)

# Websearch always flows to generate
workflow.add_edge(WEBSEARCH, GENERATE)

# Compile the graph into an executable application
app = workflow.compile()
```
Key Concepts:
- add_node(): Registers a function as a node in the graph
- set_conditional_entry_point(): Defines how the graph starts based on a routing function
- add_edge(): Creates a deterministic transition between nodes
- add_conditional_edges(): Creates dynamic transitions based on a decision function
- compile(): Converts the graph definition into an executable application
Note the elegant handling of the self-correcting loop: when GENERATE detects a hallucination ("not supported"), it routes back to itself, triggering regeneration with the same context but potentially different LLM sampling.
5. Visual Workflow
The complete workflow can be visualized as a single graph.
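If you want to reproduce the diagram yourself, LangGraph can export the compiled graph as a Mermaid definition; a minimal sketch:

```python
# Print a Mermaid definition of the compiled workflow; paste it into any
# Mermaid renderer (or use draw_mermaid_png) to produce the diagram.
print(app.get_graph().draw_mermaid())
```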
The diagram shows the complete flow of the Agentic RAG system:
- Start → Routes to either retrieve (vectorstore) or websearch based on question type
- Retrieve → Fetches documents from the vectorstore
- Grade Documents → Evaluates relevance and decides whether to generate or search the web
- Web Search → Gathers additional context when needed
- Generate → Creates an answer from the available documents
- Quality Checks → Validates the answer with three possible outcomes:
  - useful: Perfect answer → End
  - not useful: Needs more context → Web Search
  - not supported: Hallucination detected → Regenerate
 
Key Features & Benefits
1. Self-Correcting Architecture
The system doesn't just fail when it encounters poor results; it adapts:
- Irrelevant documents? → Trigger web search
- Hallucinated answer? → Regenerate the answer
- Answer doesn't address question? → Search for more context
2. Multi-Layered Quality Assurance
Three independent grading mechanisms ensure high-quality outputs:
- Document relevance grading before generation
- Hallucination detection after generation
- Answer usefulness evaluation before returning to user
3. Hybrid Knowledge Access
Seamlessly combines:
- Local vectorstore: Fast, domain-specific knowledge
- Web search (Tavily): Current information and broader coverage
4. Production-Ready Design
- Typed state management with TypedDict
- Modular architecture with clear separation of concerns
- Structured outputs using Pydantic models
- Comprehensive logging for debugging and monitoring
- Error handling at each node
5. Observable & Debuggable
Each step prints its decision-making process:
```
---ROUTE QUESTION---
---ROUTE QUESTION TO RAG---
---RETRIEVE---
---CHECK DOCUMENT RELEVANCE TO QUESTION---
---GRADE: DOCUMENT RELEVANT---
---GENERATE---
---CHECK HALLUCINATIONS---
---DECISION: GENERATION IS GROUNDED IN DOCUMENTS---
---GRADE GENERATION vs QUESTION---
---DECISION: GENERATION ADDRESSES QUESTION---
```
Data Ingestion & Vector Store
The system uses a carefully curated knowledge base:
```python
from langchain_chroma import Chroma
from langchain_community.document_loaders import WebBaseLoader
from langchain_openai import OpenAIEmbeddings
from langchain_text_splitters import RecursiveCharacterTextSplitter

urls = [
    "https://lilianweng.github.io/posts/2023-06-23-agent/",
    "https://lilianweng.github.io/posts/2023-03-15-prompt-engineering/",
    "https://lilianweng.github.io/posts/2023-10-25-adv-attack-llm/",
]

# Load and split documents
docs = [WebBaseLoader(url).load() for url in urls]
docs_list = [item for sublist in docs for item in sublist]

text_splitter = RecursiveCharacterTextSplitter.from_tiktoken_encoder(
    chunk_size=250, chunk_overlap=0
)
doc_splits = text_splitter.split_documents(docs_list)

# Create vectorstore with Chroma and OpenAI embeddings
vectorstore = Chroma.from_documents(
    documents=doc_splits,
    collection_name="rag-chroma",
    embedding=OpenAIEmbeddings(),
    persist_directory="./.chroma",
)
```
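The retriever consumed by the retrieve node isn't shown above. A minimal sketch of deriving it, assuming the application process reloads the persisted Chroma collection:

```python
# Reload the persisted collection and expose it as a retriever for the graph.
retriever = Chroma(
    collection_name="rag-chroma",
    persist_directory="./.chroma",
    embedding_function=OpenAIEmbeddings(),
).as_retriever()
```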
Implementation Stack
The system leverages modern LLM tooling:
- LangGraph: State management and workflow orchestration
- LangChain: LLM chains, prompts, and integrations
- OpenAI: LLM for generation and structured outputs
- Chroma: Vector database for document storage
- Tavily: Web search API for external knowledge
- Pydantic: Data validation and structured outputs
Real-World Applications
This Agentic RAG architecture is ideal for:
- Customer Support Systems: Answer queries using documentation while falling back to web search for unknown topics
- Research Assistants: Verify claims against source documents and detect hallucinations
- Knowledge Management: Route questions to the right knowledge source (internal docs vs. public web)
- Educational Platforms: Ensure factual accuracy in tutoring applications
- Enterprise Search: Combine private knowledge bases with public information
Performance Considerations
Optimization Strategies
- Parallel Grading: Document relevance can be assessed in parallel (see the sketch after this list)
- Caching: Cache embeddings and frequent queries
- Early Termination: Stop processing irrelevant documents early
- Batch Processing: Process multiple questions efficiently
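As a concrete example of the first point, LangChain runnables expose a batch() method that runs calls concurrently, so document grading doesn't have to happen one document at a time. A minimal sketch of a parallel variant of grade_documents (an assumption about how you might restructure it, not the post's original code):

```python
def grade_documents_parallel(state: GraphState) -> Dict[str, Any]:
    question = state["question"]
    documents = state["documents"]

    # Grade all documents concurrently instead of looping over them one by one.
    scores = retrieval_grader.batch(
        [{"question": question, "document": d.page_content} for d in documents]
    )
    filtered_docs = [
        d for d, s in zip(documents, scores) if s.binary_score.lower() == "yes"
    ]

    return {
        "documents": filtered_docs,
        "question": question,
        "web_search": len(filtered_docs) < len(documents),
    }
```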
Trade-offs
- Latency vs. Quality: Multiple LLM calls increase latency but ensure quality
- Cost vs. Accuracy: Each grading step adds cost but reduces errors
- Local vs. Web: Balancing fast local retrieval with comprehensive web search
Getting Started
Running the system is straightforward:
```bash
# Install dependencies
poetry install

# Set up environment variables
export OPENAI_API_KEY="your_key"
export TAVILY_API_KEY="your_key"

# Run ingestion (first time only)
poetry run python ingestion.py

# Run the agentic RAG
poetry run python main.py
```
Example usage:
```python
from graph.workflow import app

result = app.invoke(input={"question": "What is agent memory?"})
print(result["generation"])
```
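To watch the graph's decisions as they happen (which pairs nicely with the log output shown earlier), you can also stream per-node state updates instead of waiting for the final result; a minimal sketch:

```python
# Stream the state update emitted by each node as the graph executes.
final_state = {}
for chunk in app.stream({"question": "What is agent memory?"}, stream_mode="updates"):
    for node_name, update in chunk.items():
        print(f"Finished node: {node_name}")
        final_state.update(update)

print(final_state["generation"])
```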
Lessons Learned
Building this system taught us several valuable lessons:
- Quality over Speed: The multiple grading steps significantly improve output quality
- Structured Outputs: Using Pydantic for LLM outputs ensures reliable parsing
- Conditional Logic: LangGraph's conditional edges enable sophisticated control flows
- Observability Matters: Logging each decision helps debug complex workflows
- Hybrid Approaches Win: Combining local and web search provides best of both worlds
Future Enhancements
Potential improvements to explore:
- Memory: Add conversation memory for multi-turn interactions (see the sketch after this list)
- Streaming: Implement streaming responses for better UX
- Multi-Modal: Support image and video content in retrieval
- Personalization: Adapt routing and grading to user preferences
- Fine-Tuning: Train custom models for domain-specific grading
- A/B Testing: Compare different routing strategies
- Feedback Loops: Learn from user corrections and ratings
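For the first of these, LangGraph already ships a checkpointer mechanism that persists graph state between invocations, keyed by a thread ID. A minimal sketch of enabling in-memory conversation state (an assumption, not part of the current implementation):

```python
from langgraph.checkpoint.memory import MemorySaver

# Compile the same workflow with a checkpointer so state persists per thread_id.
memory_app = workflow.compile(checkpointer=MemorySaver())

result = memory_app.invoke(
    {"question": "What is agent memory?"},
    config={"configurable": {"thread_id": "user-42"}},
)
```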
Conclusion
Agentic RAG represents a significant evolution in how we build AI systems. By incorporating self-reflection, quality assurance, and adaptive behavior, we can create RAG applications that are not just powerful, but reliable and trustworthy.
The system demonstrated here showcases the power of LangGraph for building production-ready agentic applications. The modular architecture, clear separation of concerns, and rigorous quality checks make it an excellent foundation for real-world applications.
Whether you're building customer support bots, research assistants, or knowledge management systems, the principles and patterns shown here provide a solid starting point. The key is to think of your RAG system not as a simple retrieve-and-generate pipeline, but as an intelligent agent that reasons about information quality and adapts its strategy accordingly.
Resources
- Code Repository: langgraph-course
- LangGraph Documentation: LangGraph Docs
- Original Tutorial: Advance RAG control flow with Mistral and LangChain
- LangChain Cookbook: Mistral Cookbook
Built with ❤️ using LangGraph, LangChain, and OpenAI

