LLMs · AI Agents · RAG · 2026-05-28

ReAct and RAG: Giving LLMs Access to the External World

An LLM's knowledge stops at its training cutoff, and it can't access your private data. ReAct and RAG are two prompt engineering frameworks that address these problems: ReAct lets the model act through external tools, and RAG lets it retrieve from your own documents. Together they turn a plain LLM into an agent.

This is Part 10 of the AI Agents series and the final part of the prompt engineering sub-series. Parts 8–9 covered Zero-Shot, Few-Shot, Chain-of-Thought, Self-Consistency, and Tree of Thoughts — techniques for improving how an LLM reasons over the information you give it.

This post covers the next level: giving the LLM access to information it doesn’t have at all — live data from the internet, real-time APIs, and your own private documents.


1. The two gaps prompting alone can’t fix

Standard prompting — even with advanced techniques — is still limited to what the model learned during training:

  • Temporal gap: Training data has a cutoff. Ask an LLM for last weekend’s box office results or the current temperature in Hyderabad and it cannot answer accurately. It will guess.
  • Private knowledge gap: Your company’s internal documents, policies, and data were never in the training set. Ask about “Nerchuko’s remote work policy” and it will hallucinate a plausible-sounding but wrong answer.

Two frameworks address these gaps: ReAct for live external data, RAG for private internal knowledge.


2. ReAct: Reason + Act

ReAct (Reason + Act) gives an LLM access to external tools — web search, APIs, databases, calculators — and lets it decide when and how to use them.

The model doesn’t just generate text. It operates in a loop:

Reason  →  Act (call a tool)  →  Observe (read the result)  →  Reason again

It keeps looping until it has enough information to answer confidently.

Example — current weather:

User: What's the current temperature in Hyderabad?

Reason: I don't have real-time weather data. I need to call a weather API.
Act: call weather_api(city="Hyderabad")
Observe: {"temperature": 32, "condition": "partly cloudy"}
Reason: I now have the data. I can answer.
Answer: It's currently 32°C and partly cloudy in Hyderabad.

Without ReAct, the model guesses a temperature based on seasonal patterns in its training data. With ReAct, it fetches the actual current reading.
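The loop itself is small enough to sketch without any LLM at all. In the minimal sketch below, `scripted_model` is a hypothetical stand-in for the model (it returns either a tool request or a final answer), `weather_api` is a hardcoded stub, and `max_steps` is an assumed safety limit so the loop cannot run forever. None of these names come from a real library.

```python
from typing import Callable

def weather_api(city: str) -> dict:
    # Stub standing in for a real weather service.
    return {"temperature": 32, "condition": "partly cloudy"}

# Hypothetical tool registry: tool name -> Python function.
TOOLS: dict[str, Callable[..., dict]] = {"weather_api": weather_api}

def scripted_model(question: str, observations: list[dict]) -> dict:
    """Stand-in for the LLM: returns either an action or a final answer."""
    if not observations:
        # Reason: no live data yet, so request a tool call.
        return {"action": "weather_api", "args": {"city": "Hyderabad"}}
    latest = observations[-1]
    # Reason: the observation has what we need, so answer.
    return {
        "answer": f"It's currently {latest['temperature']}°C "
                  f"and {latest['condition']} in Hyderabad."
    }

def react_loop(question: str,
               model: Callable[[str, list[dict]], dict],
               max_steps: int = 5) -> str:
    observations: list[dict] = []
    for _ in range(max_steps):
        decision = model(question, observations)        # Reason
        if "answer" in decision:
            return decision["answer"]
        tool = TOOLS[decision["action"]]                # Act
        observations.append(tool(**decision["args"]))   # Observe
    return "Stopped: step limit reached without a final answer."

print(react_loop("What's the current temperature in Hyderabad?", scripted_model))
```

Swapping `scripted_model` for a real LLM call changes who makes the decisions, not the shape of the loop.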


Implementing ReAct with tool calling

Most LLM providers support tool calling natively. You define the tools available, and the model decides when to call them.

import os
import json
from groq import Groq

client = Groq(api_key=os.environ["GROQ_API_KEY"])

# Define the tools the model can use
tools = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get the current weather for a city",
            "parameters": {
                "type": "object",
                "properties": {
                    "city": {
                        "type": "string",
                        "description": "The city name"
                    }
                },
                "required": ["city"]
            }
        }
    }
]

# Simulate the tool's actual implementation
def get_weather(city: str) -> dict:
    # In production: call a real weather API here
    return {"city": city, "temperature": 32, "condition": "partly cloudy"}

messages = [{"role": "user", "content": "What's the current temperature in Hyderabad?"}]

# First call: model reasons and decides to call the tool
response = client.chat.completions.create(
    model="llama-3.3-70b-versatile",
    messages=messages,
    tools=tools
)

# Check if the model wants to call one or more tools
assistant_message = response.choices[0].message
if assistant_message.tool_calls:
    # Keep the assistant's tool request in the conversation history
    messages.append(assistant_message)

    # Execute every requested call; each one needs its own "tool" reply
    for tool_call in assistant_message.tool_calls:
        args = json.loads(tool_call.function.arguments)
        result = get_weather(**args)  # only one tool is defined here
        messages.append({
            "role": "tool",
            "tool_call_id": tool_call.id,
            "content": json.dumps(result)
        })

    # Second call: model observes the results and generates the final answer
    final_response = client.chat.completions.create(
        model="llama-3.3-70b-versatile",
        messages=messages
    )
    print(final_response.choices[0].message.content)
else:
    # No tool needed: the model answered directly
    print(assistant_message.content)

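The model handles the Reason and Observe steps; your code performs the Act step by running the tool function. This example makes a single pass through the loop, so a task that needs several tool calls in sequence would repeat the request until the model stops asking for tools. For a real application, get_weather would call a live weather API instead of returning hardcoded data.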
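Once there is more than one tool, the Act step usually needs a lookup from tool name to function, plus some protection against arguments the model got wrong. Here is one possible sketch of that dispatch step. `TOOL_REGISTRY` and `run_tool` are hypothetical names for illustration, and returning errors as JSON (so the model can read them as an observation) is a design assumption rather than a requirement of any provider.

```python
import json

def get_weather(city: str) -> dict:
    # Hardcoded stub; a real tool would call a weather API.
    return {"city": city, "temperature": 32, "condition": "partly cloudy"}

# Hypothetical registry mapping tool names to implementations.
TOOL_REGISTRY = {"get_weather": get_weather}

def run_tool(name: str, raw_arguments: str) -> str:
    """Run one requested tool call and return a JSON string for the 'tool' message."""
    func = TOOL_REGISTRY.get(name)
    if func is None:
        return json.dumps({"error": f"unknown tool: {name}"})
    try:
        args = json.loads(raw_arguments)
        return json.dumps(func(**args))
    except (json.JSONDecodeError, TypeError) as exc:
        # Malformed JSON or wrong argument names: report it instead of crashing.
        return json.dumps({"error": str(exc)})

print(run_tool("get_weather", '{"city": "Hyderabad"}'))
print(run_tool("get_stock_price", '{"ticker": "XYZ"}'))
```

Feeding the error text back as the tool result gives the model a chance to correct its call on the next pass.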


When to use ReAct

ReAct is the right choice when answers require:

  • Real-time data: weather, stock prices, sports scores, news
  • Computation: a calculator tool for math that needs guaranteed accuracy
  • External actions: sending emails, querying databases, running code
  • Information beyond the training cutoff: anything recent
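To make the computation case concrete, here is a minimal sketch of a calculator tool. The function name `calculate` and the choice to support only +, -, * and / are assumptions for illustration. It parses the expression with Python's `ast` module instead of calling `eval()`, so the model cannot use it to run arbitrary code.

```python
import ast
import operator

# Supported operators; anything else is rejected.
_BINARY_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}
_UNARY_OPS = {ast.USub: operator.neg, ast.UAdd: operator.pos}

def calculate(expression: str) -> float:
    """Evaluate a plain arithmetic expression without using eval()."""
    def _eval(node: ast.AST) -> float:
        if (isinstance(node, ast.Constant)
                and isinstance(node.value, (int, float))
                and not isinstance(node.value, bool)):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _BINARY_OPS:
            return _BINARY_OPS[type(node.op)](_eval(node.left), _eval(node.right))
        if isinstance(node, ast.UnaryOp) and type(node.op) in _UNARY_OPS:
            return _UNARY_OPS[type(node.op)](_eval(node.operand))
        raise ValueError("unsupported expression")

    return _eval(ast.parse(expression, mode="eval").body)

print(calculate("1234 * 5678 + 9"))  # prints 7006661
```

Registered as a tool, this lets the model delegate arithmetic rather than predict digits token by token.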

3. RAG: Retrieval-Augmented Generation

RAG solves the private knowledge problem. Instead of fine-tuning a model on your internal documents (expensive, slow, and stale the moment documents update), RAG retrieves the relevant documents at query time and includes them in the prompt.

Think of it as an open-book test. The LLM doesn’t need to have memorized your company handbook — it just needs to be handed the right page before it answers.


How RAG works

User query
    ↓
Convert query to vector embedding
    ↓
Search vector database for similar document chunks
    ↓
Retrieve top-k matching chunks
    ↓
Inject chunks into prompt as context
    ↓
LLM answers based only on the provided context

The vector database stores your documents as numerical embeddings — representations of meaning rather than exact text. When a question comes in, it’s embedded the same way and matched against stored chunks by similarity. Only the most relevant chunks are retrieved.

This keeps the context window manageable. You’re not dumping an entire 500-page handbook into every prompt — you’re retrieving the 2–3 sections that actually answer the question.
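The retrieval step can be sketched in a few lines. The example below is a toy under stated assumptions: `embed` uses plain word counts as a stand-in for a real embedding model, a Python list stands in for the vector database, and the expense-report chunk is invented filler so there is something irrelevant to rank last. Only the ranking logic (cosine similarity, then top-k) mirrors what a real pipeline does.

```python
import math
import re
from collections import Counter

def embed(text: str) -> Counter:
    # Toy stand-in for an embedding model: lowercase word counts.
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine_similarity(a: Counter, b: Counter) -> float:
    dot = sum(a[token] * b[token] for token in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def retrieve_top_k(query: str, chunks: list[str], k: int = 2) -> list[str]:
    query_vec = embed(query)
    ranked = sorted(
        chunks,
        key=lambda chunk: cosine_similarity(query_vec, embed(chunk)),
        reverse=True,
    )
    return ranked[:k]

chunks = [
    "Nerchuko employees are entitled to 22 paid holidays per year.",
    "Expense reports must be submitted within 30 days.",  # invented filler chunk
    "Holidays include national public holidays and company-specific days as announced annually.",
]

for chunk in retrieve_top_k("How many paid holidays do employees get?", chunks, k=2):
    print(chunk)
```

With this query the two holiday chunks rank above the expense chunk. Word counts only match exact words, whereas a real embedding model would also match paraphrases such as "days off".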


RAG prompt pattern

Once you’ve retrieved the relevant document chunks, the prompt structure is straightforward:

import os
from groq import Groq

client = Groq(api_key=os.environ["GROQ_API_KEY"])

def answer_from_context(question: str, retrieved_chunks: list[str]) -> str:
    context = "\n\n".join(retrieved_chunks)

    prompt = f"""Answer the question using only the information provided in the context below.
If the answer is not in the context, say "I don't have that information."
Do not use any outside knowledge.

Context:
{context}

Question: {question}"""

    response = client.chat.completions.create(
        model="llama-3.3-70b-versatile",
        messages=[{"role": "user", "content": prompt}]
    )

    return response.choices[0].message.content


# Example: company policy documents retrieved from vector DB
retrieved = [
    "Nerchuko employees are entitled to 22 paid holidays per year.",
    "Holidays include national public holidays and company-specific days as announced annually."
]

print(answer_from_context("How many holidays do Nerchuko employees get?", retrieved))

The instruction "Do not use any outside knowledge" is critical. Without it, the model is more likely to blend retrieved content with its training data and hallucinate. The constraint does not guarantee grounded answers, but it makes them much easier to trace back to source documents.
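One way to make that traceability explicit is to label each chunk and ask the model to cite the labels it used. The sketch below only builds the prompt string. `build_cited_prompt` and the source label `hr_handbook.txt` are hypothetical, and whether a given model follows the citation instruction reliably is something to test rather than assume.

```python
def build_cited_prompt(question: str, chunks: list[tuple[str, str]]) -> str:
    """Build a RAG prompt from (source_label, text) pairs with numbered chunks."""
    context = "\n\n".join(
        f"[{i}] ({source}) {text}"
        for i, (source, text) in enumerate(chunks, start=1)
    )
    return (
        "Answer the question using only the information provided in the context below.\n"
        "Cite the bracketed number of each chunk you rely on, for example [1].\n"
        'If the answer is not in the context, say "I don\'t have that information."\n\n'
        f"Context:\n{context}\n\n"
        f"Question: {question}"
    )

prompt = build_cited_prompt(
    "How many holidays do Nerchuko employees get?",
    [
        ("hr_handbook.txt", "Nerchuko employees are entitled to 22 paid holidays per year."),
        ("hr_handbook.txt", "Holidays include national public holidays and company-specific days as announced annually."),
    ],
)
print(prompt)
```

The returned string can be sent as the user message in the same way as the prompt in answer_from_context.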


Why not just fine-tune instead?

Fine-tuning bakes knowledge into the model weights. RAG keeps it in documents.

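                    | Fine-tuning            | RAG
--------------------|------------------------|----------------------------------
Update documents    | Retrain the model      | Update the database
Cost                | High                   | Low
Answer traceability | Hard                   | Easy (you know the source chunk)
Good for            | Behavior/style changes | Knowledge/facts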

For knowledge that changes (policies, product docs, FAQs), RAG is almost always the better choice. Fine-tuning is for changing how the model behaves, not what it knows.


4. ReAct vs RAG: which one to use

                   | ReAct                          | RAG
-------------------|--------------------------------|----------------------------------
Problem            | Real-time or external data     | Private or internal knowledge
Data source        | APIs, web, tools               | Your own documents
Latency            | Depends on tool response time  | Depends on vector search speed
Hallucination risk | Low (grounded in tool results) | Low (grounded in retrieved docs)
Use case           | “What’s the weather?”          | “What’s our refund policy?”

They’re not mutually exclusive. A real AI agent often uses both: RAG for internal knowledge and ReAct for anything requiring live external data.
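One common way to combine them is to expose retrieval as just another tool, so the model chooses between live data and internal documents on each turn. The sketch below assumes that approach. `search_policy_docs`, `function_tool`, and the keyword matching (a stand-in for a vector-database lookup) are all hypothetical, and the `requested` dictionary simulates the tool call a model might emit.

```python
import json

POLICY_CHUNKS = [
    "Nerchuko employees are entitled to 22 paid holidays per year.",
    "Holidays include national public holidays and company-specific days as announced annually.",
]

def get_weather(city: str) -> dict:
    # Hardcoded stub, as in the ReAct example.
    return {"city": city, "temperature": 32, "condition": "partly cloudy"}

def search_policy_docs(query: str) -> dict:
    # Naive keyword match standing in for a vector-database lookup.
    words = set(query.lower().split())
    hits = [c for c in POLICY_CHUNKS if words & set(c.lower().split())]
    return {"chunks": hits}

def function_tool(name: str, description: str, param: str) -> dict:
    # Builds a schema in the same shape as the tools list shown earlier.
    return {
        "type": "function",
        "function": {
            "name": name,
            "description": description,
            "parameters": {
                "type": "object",
                "properties": {param: {"type": "string"}},
                "required": [param],
            },
        },
    }

tools = [
    function_tool("get_weather", "Get the current weather for a city", "city"),
    function_tool("search_policy_docs", "Search internal policy documents", "query"),
]
implementations = {"get_weather": get_weather, "search_policy_docs": search_policy_docs}

# Simulate the model choosing the retrieval tool for an internal question.
requested = {"name": "search_policy_docs", "arguments": json.dumps({"query": "paid holidays"})}
result = implementations[requested["name"]](**json.loads(requested["arguments"]))
print(result)
```

The retrieved chunks come back as the observation, and the model answers from them just as it would from a weather reading.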


5. The full picture: plain LLM → AI Agent

Looking back at the full series:

  • Parts 1–5: What LLMs are, how to use APIs, how to control output
  • Parts 6–7: Open-source models via Groq and locally via Ollama
  • Parts 8–10: Prompt engineering — from basic prompting to tool use and retrieval

Parts 8–10 specifically trace how prompts evolve:

  1. Zero/Few-Shot — tell the model what format you want
  2. Chain-of-Thought — tell the model how to reason
  3. ReAct + RAG — give the model external capabilities

At this point, you have a complete foundation. An LLM with a well-structured prompt, tool access via ReAct, and a document retrieval layer via RAG is, functionally, an AI agent: it can reason, act, retrieve, and ground its responses in live and private information.


What’s next

Part 11 goes deep on RAG specifically — how the retrieval layer actually works: chunking strategies, vector embeddings, cosine similarity vs Euclidean distance, and a full implementation using ChromaDB.

