🧩 Agentic RAG - Self-Reflective Retrieval and Generation with LangGraph

πŸ” Teaching LLMs to Think, Act, and Verify

Introduction

Retrieval-Augmented Generation (RAG) has revolutionized how we build AI applications that need to answer questions based on specific knowledge bases. However, traditional RAG systems often suffer from a critical limitation: they blindly trust retrieved documents and generate answers without verifying relevance or accuracy. This is where Agentic RAG comes in: a self-reflective, adaptive approach that makes intelligent decisions about information retrieval, validates document relevance, detects hallucinations, and dynamically switches between local knowledge and web search.

In this blog post, we'll explore a production-ready implementation of an Agentic RAG system built with LangGraph, demonstrating how to create a sophisticated AI agent that doesn't just retrieve and generate: it thinks, evaluates, and adapts.

What is Agentic RAG?

Agentic RAG is an advanced form of RAG that incorporates autonomous decision-making capabilities into the retrieval and generation pipeline. Unlike traditional RAG systems that follow a linear path (retrieve → generate), Agentic RAG implements:

  • 🎯 Intelligent Routing: Determines whether to use local knowledge or web search based on the query type
  • πŸ“Š Document Relevance Grading: Evaluates whether retrieved documents actually answer the question
  • πŸ” Hallucination Detection: Verifies that generated answers are grounded in retrieved facts
  • πŸ”„ Adaptive Behavior: Dynamically adjusts its strategy based on quality checks
  • 🌐 Multi-Source Retrieval: Seamlessly combines local vectorstore and web search results

System Architecture

Our Agentic RAG system is built on LangGraph, a framework for creating stateful, multi-actor applications with LLMs. The architecture consists of several key components:

Core Components

1. State Management

The system maintains a typed state throughout the workflow:

from typing import List, TypedDict


class GraphState(TypedDict):
    question: str        # User's query
    generation: str      # LLM-generated answer
    web_search: bool     # Flag indicating if web search is needed
    documents: List[str] # Retrieved or searched documents

This state flows through the entire pipeline, ensuring each node has access to the information it needs.

2. Graph Nodes

The system implements four primary nodes, each with a specific responsibility:

Retrieve Node

Fetches relevant documents from a local vector database (Chroma) using OpenAI embeddings.

def retrieve(state: GraphState) -> Dict[str, Any]:
    question = state["question"]
    documents = retriever.invoke(question)
    return {"documents": documents, "question": question}

Grade Documents Node

Evaluates each retrieved document for relevance using an LLM-based grader. If any document is irrelevant, it sets a flag to trigger web search.

def grade_documents(state: GraphState) -> Dict[str, Any]:
    question = state["question"]
    documents = state["documents"]

    filtered_docs = []
    web_search = False

    for doc in documents:
        score = retrieval_grader.invoke(
            {"question": question, "document": doc.page_content}
        )
        if score.binary_score.lower() == "yes":
            filtered_docs.append(doc)
        else:
            web_search = True

    return {
        "documents": filtered_docs,
        "question": question,
        "web_search": web_search
    }

Generate Node

Creates an answer based on the context (documents) and question using a RAG prompt template.

def generate(state: GraphState) -> Dict[str, Any]:
    question = state["question"]
    documents = state["documents"]

    generation = generation_chain.invoke(
        {"context": documents, "question": question}
    )
    return {
        "documents": documents,
        "question": question,
        "generation": generation
    }

Web Search Node

Performs web search using Tavily API when local knowledge is insufficient.

def web_search(state: GraphState) -> Dict[str, Any]:
    question = state["question"]
    documents = state.get("documents", [])

    tavily_results = web_search_tool.invoke({"query": question})["results"]
    joined_result = "\n".join([r["content"] for r in tavily_results])
    web_results = Document(page_content=joined_result)

    documents.append(web_results)
    return {"documents": documents, "question": question}

3. LLM-Powered Chains

The system uses specialized LLM chains for various evaluation tasks:

Question Router

Intelligently routes questions to either the vectorstore or web search based on content.

class RouteQuery(BaseModel):
    datasource: Literal["vectorstore", "websearch"] = Field(
        description="Route to web search or vectorstore"
    )

system_prompt = """You are an expert at routing a user question to a
vectorstore or web search. The vectorstore contains documents related to
agents, prompt engineering, and adversarial attacks. Use the vectorstore
for questions on these topics. For all else, use web-search."""
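
These schemas and prompts are turned into runnable chains with LangChain's structured-output support; the graders below are assembled the same way. A minimal sketch for the router, reusing the RouteQuery model and system_prompt above and assuming ChatOpenAI as the underlying LLM:

from langchain_core.prompts import ChatPromptTemplate
from langchain_openai import ChatOpenAI

llm = ChatOpenAI(temperature=0)

# Constrain the LLM to reply with a RouteQuery object instead of free text
structured_llm_router = llm.with_structured_output(RouteQuery)

route_prompt = ChatPromptTemplate.from_messages([
    ("system", system_prompt),
    ("human", "{question}"),
])

# question_router.invoke({"question": ...}) now returns a RouteQuery instance
question_router = route_prompt | structured_llm_router
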
Retrieval Grader

Assesses whether a retrieved document is relevant to the question.

class GradeDocuments(BaseModel):
    binary_score: str = Field(
        description="Documents are relevant to the question, 'yes' or 'no'"
    )

system_prompt = """You are a grader assessing relevance of a retrieved
document to a user question. If the document contains keywords or semantic
meaning related to the question, grade it as relevant."""

Hallucination Grader

Verifies that the generated answer is grounded in the retrieved facts.

class GradeHallucinations(BaseModel):
    binary_score: bool = Field(
        description="Answer is grounded in the facts, 'yes' or 'no'"
    )

system_prompt = """You are a grader assessing whether an LLM generation
is grounded in / supported by a set of retrieved facts. Give a binary
score 'yes' or 'no'."""

Answer Grader

Checks whether the answer actually addresses the user's question.

class GradeAnswer(BaseModel):
    binary_score: bool = Field(
        description="Answer addresses the question, 'yes' or 'no'"
    )

system_prompt = """You are a grader assessing whether an answer addresses /
resolves a question. Give a binary score 'yes' or 'no'."""

The Workflow: How It All Comes Together

The beauty of this system lies in its conditional workflow. Here's how the graph executes:

1. Entry Point: Intelligent Routing

def route_question(state: GraphState) -> str:
    question = state["question"]
    source = question_router.invoke({"question": question})

    if source.datasource == "websearch":
        return WEBSEARCH
    else:
        return RETRIEVE

When a question arrives, the router analyzes it and decides:

  • Vectorstore: For domain-specific questions (agents, prompt engineering, adversarial attacks)
  • Web Search: For general knowledge or current events

2. Retrieval & Grading Path

If routed to the vectorstore:

  1. Retrieve documents from the vector database
  2. Grade Documents for relevance
  3. Decision Point:
    • If all documents are relevant → proceed to Generate
    • If any document is irrelevant → trigger Web Search

def decide_to_generate(state: GraphState) -> str:
    if state["web_search"]:
        return WEBSEARCH  # Need more information
    else:
        return GENERATE   # Have enough relevant docs

3. Generation & Quality Control

After generating an answer, the system performs rigorous quality checks:

def grade_generation_grounded_in_documents_and_question(state: GraphState) -> str:
    question = state["question"]
    documents = state["documents"]
    generation = state["generation"]

    # Check for hallucinations
    hallucination_score = hallucination_grader.invoke({
        "documents": documents,
        "generation": generation
    })

    if hallucination_score.binary_score:
        # Answer is grounded, now check if it addresses the question
        answer_score = answer_grader.invoke({
            "question": question,
            "generation": generation
        })

        if answer_score.binary_score:
            return "useful"          # Perfect answer
        else:
            return "not useful"      # Grounded but doesn't answer
    else:
        return "not supported"       # Hallucination detected

Based on the quality checks:

  • "useful": Answer is perfect β†’ END
  • "not useful": Need better context β†’ Web Search
  • "not supported": Hallucination detected β†’ Regenerate

4. Building the Workflow Graph

LangGraph's StateGraph provides a powerful API for constructing the workflow. Here's how we wire everything together:

from langgraph.graph import StateGraph, END

# RETRIEVE, GRADE_DOCUMENTS, GENERATE, and WEBSEARCH are plain string constants
# used as node names (e.g. RETRIEVE = "retrieve"), defined elsewhere in the project
workflow = StateGraph(GraphState)

# Add all nodes to the graph
workflow.add_node(RETRIEVE, retrieve)
workflow.add_node(GRADE_DOCUMENTS, grade_documents)
workflow.add_node(GENERATE, generate)
workflow.add_node(WEBSEARCH, web_search)

# Set conditional entry point - routes to either retrieve or websearch
workflow.set_conditional_entry_point(
    route_question,
    {
        WEBSEARCH: WEBSEARCH,
        RETRIEVE: RETRIEVE,
    },
)

# Retrieve always flows to grade_documents
workflow.add_edge(RETRIEVE, GRADE_DOCUMENTS)

# Grade documents conditionally routes to generate or websearch
workflow.add_conditional_edges(
    GRADE_DOCUMENTS,
    decide_to_generate,
    {
        WEBSEARCH: WEBSEARCH,
        GENERATE: GENERATE,
    },
)

# Generate has three possible outcomes
workflow.add_conditional_edges(
    GENERATE,
    grade_generation_grounded_in_documents_and_question,
    {
        "not supported": GENERATE,  # Regenerate if hallucination
        "useful": END,              # End if answer is good
        "not useful": WEBSEARCH,    # Search web if not useful
    },
)

# Websearch always flows to generate
workflow.add_edge(WEBSEARCH, GENERATE)

# Compile the graph into an executable application
app = workflow.compile()

Key Concepts:

  • add_node(): Registers a function as a node in the graph
  • set_conditional_entry_point(): Defines how the graph starts based on a routing function
  • add_edge(): Creates a deterministic transition between nodes
  • add_conditional_edges(): Creates dynamic transitions based on a decision function
  • compile(): Converts the graph definition into an executable application

Note the elegant handling of the self-correcting loop: when the grading step after GENERATE returns "not supported" (a hallucination), the graph routes back to GENERATE, triggering regeneration with the same context but potentially different LLM sampling.

5. Visual Workflow

The complete workflow can be visualized as a graph (a snippet after the list below shows how to render the diagram with LangGraph itself). Step by step, the flow of the Agentic RAG system is:

  1. Start → Routes to either retrieve (vectorstore) or websearch based on question type
  2. Retrieve → Fetches documents from vectorstore
  3. Grade Documents → Evaluates relevance and decides whether to generate or search web
  4. Web Search → Gathers additional context when needed
  5. Generate → Creates answer from available documents
  6. Quality Checks → Validates answer with three possible outcomes:
    • useful: Perfect answer → End
    • not useful: Needs more context → Web Search
    • not supported: Hallucination detected → Regenerate
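
LangGraph can produce this diagram directly from the compiled graph; a small sketch, assuming the app object compiled in the previous section:

# Print a Mermaid definition of the workflow; paste it into any Mermaid
# renderer to reproduce the diagram described above
print(app.get_graph().draw_mermaid())

# A PNG can also be written (rendering may require extra dependencies):
# app.get_graph().draw_mermaid_png(output_file_path="graph.png")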

Key Features & Benefits

1. Self-Correcting Architecture

The system doesn't just fail when it encounters poor results; it adapts:

  • Irrelevant documents? → Trigger web search
  • Hallucinated answer? → Regenerate the answer from the same context
  • Answer doesn't address the question? → Search for more context

2. Multi-Layered Quality Assurance

Three independent grading mechanisms ensure high-quality outputs:

  1. Document relevance grading before generation
  2. Hallucination detection after generation
  3. Answer usefulness evaluation before returning to user

3. Hybrid Knowledge Access

Seamlessly combines:

  • Local vectorstore: Fast, domain-specific knowledge
  • Web search (Tavily): Current information and broader coverage

4. Production-Ready Design

  • Typed state management with TypedDict
  • Modular architecture with clear separation of concerns
  • Structured outputs using Pydantic models
  • Comprehensive logging for debugging and monitoring
  • Error handling at each node (one possible approach is sketched below)
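
As an illustration of the last point (a hypothetical sketch, not the project's exact code), a node can be wrapped so that a failure is logged and the graph still receives a usable state update:

def safe_retrieve(state: GraphState) -> Dict[str, Any]:
    # Wrap the retrieve node so a vectorstore failure doesn't crash the graph
    try:
        return retrieve(state)
    except Exception as exc:
        print(f"---RETRIEVE FAILED: {exc}---")
        # Return an empty document list so downstream grading can still run
        return {"documents": [], "question": state["question"]}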

5. Observable & Debuggable

Each step prints its decision-making process:

---ROUTE QUESTION---
---ROUTE QUESTION TO RAG---
---RETRIEVE---
---CHECK DOCUMENT RELEVANCE TO QUESTION---
---GRADE: DOCUMENT RELEVANT---
---GENERATE---
---CHECK HALLUCINATIONS---
---DECISION: GENERATION IS GROUNDED IN DOCUMENTS---
---GRADE GENERATION vs QUESTION---
---DECISION: GENERATION ADDRESSES QUESTION---

Data Ingestion & Vector Store

The system uses a carefully curated knowledge base:

# Import paths may vary slightly across LangChain versions
from langchain_community.document_loaders import WebBaseLoader
from langchain_community.vectorstores import Chroma
from langchain_openai import OpenAIEmbeddings
from langchain_text_splitters import RecursiveCharacterTextSplitter

urls = [
    "https://lilianweng.github.io/posts/2023-06-23-agent/",
    "https://lilianweng.github.io/posts/2023-03-15-prompt-engineering/",
    "https://lilianweng.github.io/posts/2023-10-25-adv-attack-llm/",
]

# Load and split documents
docs = [WebBaseLoader(url).load() for url in urls]
docs_list = [item for sublist in docs for item in sublist]

text_splitter = RecursiveCharacterTextSplitter.from_tiktoken_encoder(
    chunk_size=250,
    chunk_overlap=0
)
doc_splits = text_splitter.split_documents(docs_list)

# Create vectorstore with Chroma and OpenAI embeddings
vectorstore = Chroma.from_documents(
    documents=doc_splits,
    collection_name="rag-chroma",
    embedding=OpenAIEmbeddings(),
    persist_directory="./.chroma"
)
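
The retrieve node shown earlier expects a retriever object; a minimal way to expose one from this vectorstore is:

# Wrap the vectorstore in the retriever interface used by the retrieve node
retriever = vectorstore.as_retriever()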

Implementation Stack

The system leverages modern LLM tooling:

  • LangGraph: State management and workflow orchestration
  • LangChain: LLM chains, prompts, and integrations
  • OpenAI: LLM for generation and structured outputs
  • Chroma: Vector database for document storage
  • Tavily: Web search API for external knowledge
  • Pydantic: Data validation and structured outputs

Real-World Applications

This Agentic RAG architecture is ideal for:

  1. Customer Support Systems: Answer queries using documentation while falling back to web search for unknown topics
  2. Research Assistants: Verify claims against source documents and detect hallucinations
  3. Knowledge Management: Route questions to the right knowledge source (internal docs vs. public web)
  4. Educational Platforms: Ensure factual accuracy in tutoring applications
  5. Enterprise Search: Combine private knowledge bases with public information

Performance Considerations

Optimization Strategies

  1. Parallel Grading: Document relevance can be assessed in parallel (see the sketch after this list)
  2. Caching: Cache embeddings and frequent queries
  3. Early Termination: Stop processing irrelevant documents early
  4. Batch Processing: Process multiple questions efficiently
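
For the first point, LangChain runnables support batched execution out of the box, which fans the grading calls out over a thread pool instead of looping sequentially. A sketch, assuming question, documents, and retrieval_grader as defined in grade_documents:

# Grade every retrieved document concurrently with a single batched call
scores = retrieval_grader.batch(
    [{"question": question, "document": doc.page_content} for doc in documents]
)

filtered_docs = [
    doc for doc, score in zip(documents, scores)
    if score.binary_score.lower() == "yes"
]
web_search = len(filtered_docs) < len(documents)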

Trade-offs

  • Latency vs. Quality: Multiple LLM calls increase latency but ensure quality
  • Cost vs. Accuracy: Each grading step adds cost but reduces errors
  • Local vs. Web: Balancing fast local retrieval with comprehensive web search

Getting Started

Running the system is straightforward:

# Install dependencies
poetry install

# Set up environment variables
export OPENAI_API_KEY="your_key"
export TAVILY_API_KEY="your_key"

# Run ingestion (first time only)
poetry run python ingestion.py

# Run the agentic RAG
poetry run python main.py

Example usage:

from graph.workflow import app

result = app.invoke(input={"question": "What is agent memory?"})
print(result["generation"])

Lessons Learned

Building this system taught us several valuable lessons:

  1. Quality over Speed: The multiple grading steps significantly improve output quality
  2. Structured Outputs: Using Pydantic for LLM outputs ensures reliable parsing
  3. Conditional Logic: LangGraph's conditional edges enable sophisticated control flows
  4. Observability Matters: Logging each decision helps debug complex workflows
  5. Hybrid Approaches Win: Combining local and web search provides best of both worlds

Future Enhancements

Potential improvements to explore:

  • Memory: Add conversation memory for multi-turn interactions (a starting point is sketched below)
  • Streaming: Implement streaming responses for better UX
  • Multi-Modal: Support image and video content in retrieval
  • Personalization: Adapt routing and grading to user preferences
  • Fine-Tuning: Train custom models for domain-specific grading
  • A/B Testing: Compare different routing strategies
  • Feedback Loops: Learn from user corrections and ratings
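
For the memory item, LangGraph already ships with checkpointers that persist graph state between invocations; a minimal starting point, assuming the workflow object built earlier (true conversational memory would also require adding a message history to GraphState):

from langgraph.checkpoint.memory import MemorySaver

# Persist state per conversation thread so repeated calls can build on earlier turns
app = workflow.compile(checkpointer=MemorySaver())

result = app.invoke(
    {"question": "What is agent memory?"},
    config={"configurable": {"thread_id": "user-42"}},
)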

Conclusion

Agentic RAG represents a significant evolution in how we build AI systems. By incorporating self-reflection, quality assurance, and adaptive behavior, we can create RAG applications that are not just powerful, but reliable and trustworthy.

The system demonstrated here showcases the power of LangGraph for building production-ready agentic applications. The modular architecture, clear separation of concerns, and rigorous quality checks make it an excellent foundation for real-world applications.

Whether you're building customer support bots, research assistants, or knowledge management systems, the principles and patterns shown here provide a solid starting point. The key is to think of your RAG system not as a simple retrieve-and-generate pipeline, but as an intelligent agent that reasons about information quality and adapts its strategy accordingly.


Built with ❤️ using LangGraph, LangChain, and OpenAI
