ReWOO vs. ReAct: Which Agent Pattern Should Power Your AI Stack in 2025?

AI teams love ReWOO for speed, ReAct for control. See which wins in 2025—plus get production-hardened code and benchmark truths.

A field guide for engineering leaders: architecture trade-offs, production-ready code, and hard-won tips to ship faster (and cheaper)

We’ve all been there: your agent keeps thrashing tools, latency spikes, the token bill looks like a phone number, and the exec ask is still “Can we ship this quarter?” In this guide we go deep on ReAct (reason→act→observe) and ReWOO (plan→parallel work→solve), compare when each pattern shines, and give you drop-in, production-savvy code you can adapt right now. We also fact-check the common claims (token efficiency, accuracy bumps), link the originals, and add the safety rails you’ll wish you’d had last week.

TL;DR (for busy VPs & staff engineers)

  • ReAct interleaves thinking with tool use. It’s simple, adaptable, and great for interactive tasks. But it can balloon tokens and latency because every tool call pauses the model mid-thought.
  • ReWOO plans first, runs tools in parallel, then synthesizes an answer from evidence you control. It commonly reduces tokens and wall-clock on multi-tool tasks—but planning quality and validation matter. Paper reports ~5× token efficiency and ~+4% HotpotQA accuracy vs baselines (benchmark-specific; measure on your data).
  • Rule of thumb: If your task requires multiple tools or long chains, start ReWOO. If it’s highly interactive or step-wise with human feedback, start ReAct. Validate with evals.

1) Concepts in one minute

ReAct (Reason + Act)

The model alternates: Think → Act (tool) → Observe → Think → …, until it decides to finish. It’s intuitive, easy to implement, and works well when the next step depends tightly on the last observation. Downsides: repeated re-prompting and tool latency in the loop.

ReWOO (Reasoning WithOut Observation)

The model first plans the needed steps (and which tools), then you execute steps in parallel, collect evidence, and a final solver composes the answer using only that evidence. Benefits: fewer LLM turns, parallelism, better controllability/citations. Costs: you must validate plans/evidence or risk “confident nonsense.”

2) Production-ready skeletons (drop-in)

Below we show portable patterns. We avoid provider-specific SDK code by injecting a call_llm() adapter (so you can swap OpenAI/Anthropic/Vertex/etc. without rewriting the agent). We enforce JSON I/O, tool allow-lists, timeouts, and budget caps.

Minimal assumptions: each tool has a signature tool(input: str) -> str.
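
For example, a registry might look like the sketch below. Both tools are hypothetical stand-ins; wire in your real search/HTTP/database clients.

# A minimal tool registry matching that signature. Both tools are illustrative
# placeholders, not real integrations.
from typing import Dict, Callable

def search_web(query: str) -> str:
    # e.g., call your search API and return a text summary
    raise NotImplementedError

def calculator(expression: str) -> str:
    import ast
    # literal_eval accepts numeric literals and +/- only; it will not execute
    # arbitrary code (never use eval() on model output)
    return str(ast.literal_eval(expression))

TOOLS: Dict[str, Callable[[str], str]] = {
    "search_web": search_web,
    "calculator": calculator,
}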

2.1 ReAct loop (safe & sturdy)

import json
from typing import Dict, Callable

class ToolError(Exception): pass

def react_agent(
    question: str,
    tools: Dict[str, Callable[[str], str]],
    call_llm: Callable[[str], str],
    max_turns: int = 6,
    max_tokens_budget: int = 120_000,
) -> str:
    """
    Portable ReAct agent.
    - Enforces JSON output: {"thought": "...", "action": {"tool": "<name>|finish", "input": "<text>"}}
    - Uses an allow-list of tools.
    - Guards against budget blowups and unknown tools.
    """
    transcript: list[dict] = []
    spent_tokens = 0

    for _ in range(max_turns):
        prompt = (
            "You are a ReAct agent. OUTPUT STRICT JSON ONLY:\n"
            '{"thought":"...", "action":{"tool":"<name>|finish","input":"<text>"},'
            '"final_answer":""}\n'
            "Tools: " + ", ".join(sorted(tools.keys())) + "\n"
            "Rules: Prefer minimal steps. If answerable now, use tool='finish'.\n"
            f"Transcript: {json.dumps(transcript)[:4000]}\n"
            f"Question: {question}\n"
        )
        step = call_llm(prompt)                          # provider-specific under the hood
        spent_tokens += len(step)                        # rough proxy; replace with provider usage
        if spent_tokens > max_tokens_budget:
            return "Budget exceeded; partial transcript suppressed."

        try:
            data = json.loads(step)
            action = data["action"]
        except Exception as e:
            return f"Malformed model output: {e}"

        if action["tool"] == "finish":
            return data.get("final_answer") or action.get("input", "")

        tool_name = action["tool"]
        tool = tools.get(tool_name)
        if not tool:
            raise ToolError(f"Unknown tool: {tool_name}")

        try:
            observation = tool(action["input"])
        except Exception as e:
            observation = f"TOOL_ERROR: {e}"

        transcript.append(
            {"thought": data.get("thought", ""), "tool": tool_name, "input": action["input"], "observation": observation}
        )

    return "No answer after max_turns."

Why this holds up in prod: JSON schema (less regex sadness), explicit allow-list, simple “token budget” fuse, and resilience to tool failures. Matches the ReAct idea from the paper; you supply better prompts/tools as needed.
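
To smoke-test the loop without a provider, you can wire a scripted call_llm. The canned JSON steps and the echo tool below are illustrative, not a real model trace.

# Each call to the fake LLM pops the next canned JSON step.
canned = iter([
    '{"thought":"Use the tool","action":{"tool":"echo","input":"hello"},"final_answer":""}',
    '{"thought":"Done","action":{"tool":"finish","input":""},"final_answer":"echoed: hello"}',
])

def fake_llm(prompt: str) -> str:
    return next(canned)

print(react_agent("Say hello", tools={"echo": lambda s: f"echoed: {s}"}, call_llm=fake_llm))
# -> echoed: hello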

2.2 ReWOO (plan → parallel work → solve), with validation

from concurrent.futures import ThreadPoolExecutor, as_completed, TimeoutError as FuturesTimeout
from typing import Dict, Callable, Any
import json

def plan(question: str, call_llm: Callable[[str], str], allowed_tools: set[str]) -> dict:
    schema = (
        "Return JSON ONLY:\n"
        '{"steps":[{"id":"s1","tool":"<ALLOWED_TOOL>","input":"..."}],'
        '"final":"Describe how to combine step IDs, e.g. [s1][s3]."}'
    )
    raw = call_llm(
        "You are a planner. Create the MINIMAL set of steps to answer.\n"
        f"Allowed tools: {sorted(allowed_tools)}\n{schema}\nQuestion: {question}"
    )
    p = json.loads(raw)
    # Basic validation (explicit raises; assert statements vanish under python -O)
    if not isinstance(p.get("steps"), list) or not p["steps"]:
        raise ValueError("Empty plan")
    for s in p["steps"]:
        if s.get("tool") not in allowed_tools:
            raise ValueError(f"Unknown tool: {s.get('tool')}")
        if not str(s.get("id", "")).startswith("s"):
            raise ValueError("Each step needs an id like 's1'")
    return p

def run_workers(p: dict, tools: Dict[str, Callable[[str], str]], timeout_s: float = 20.0) -> dict[str, Any]:
    results: dict[str, Any] = {}
    with ThreadPoolExecutor(max_workers=min(8, len(p["steps"]))) as ex:
        futs = {ex.submit(tools[s["tool"]], s["input"]): s["id"] for s in p["steps"]}
        try:
            for f in as_completed(futs, timeout=timeout_s):
                sid = futs[f]
                try:
                    results[sid] = f.result()  # future is already done here
                except Exception as e:
                    results[sid] = f"ERROR: {e}"
        except FuturesTimeout:
            # as_completed raises once the deadline passes; record the stragglers
            # instead of crashing the run. Note: executor shutdown still waits for
            # threads that already started; use process isolation for hung tools.
            for f, sid in futs.items():
                if sid not in results:
                    f.cancel()
                    results[sid] = "ERROR: timed out"
    return results

def solve(question: str, p: dict, evidence: dict[str, Any], call_llm: Callable[[str], str]) -> str:
    # Enforce evidence-only answering; require step-id citations.
    ctx = {"question": question, "plan": p, "evidence": evidence}
    answer = call_llm(
        "You are a solver. Use ONLY the evidence below. Cite step IDs like [s1]. "
        "If evidence is missing, say so and stop.\n" + json.dumps(ctx)
    )
    # Optional: verify cited IDs exist.
    cited = {sid for sid in evidence.keys() if f"[{sid}]" in answer}
    if not cited and evidence:
        answer = "No valid citations found; refusing."
    return answer

def rewoo_agent(question: str, tools: Dict[str, Callable[[str], str]], call_llm: Callable[[str], str]) -> str:
    allowed = set(tools.keys())
    p = plan(question, call_llm, allowed_tools=allowed)
    ev = run_workers(p, tools)
    return solve(question, p, ev, call_llm)

Why this works in practice: single LLM turn to plan, parallel tool fan-out, one LLM turn to synthesize with explicit citations. This reflects the ReWOO paper’s decoupling and tends to cut both tokens and wall-clock on multi-step tasks. Your mileage varies with planning quality and tool latency.
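
Wiring it up looks like this. The scripted planner/solver responses are illustrative only.

def fake_llm(prompt: str) -> str:
    # Return a canned plan for the planner prompt, a cited answer otherwise.
    if "You are a planner" in prompt:
        return ('{"steps":[{"id":"s1","tool":"echo","input":"alpha"},'
                '{"id":"s2","tool":"echo","input":"beta"}],'
                '"final":"Combine [s1] and [s2]."}')
    return "Combined: echoed alpha [s1] and echoed beta [s2]."

print(rewoo_agent("Combine alpha and beta",
                  tools={"echo": lambda s: f"echoed {s}"},
                  call_llm=fake_llm))
# -> Combined: echoed alpha [s1] and echoed beta [s2]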

3) When to choose which (and why)

When to choose ReAct vs ReWOO (summary)

  • Conversational agents where each step depends on the last observation → ReAct. Fine-grained adaptivity; human-in-the-loop flows work well. (ReAct paper)
  • Multi-tool pipelines, long chains, strict provenance → ReWOO. Parallel tool calls, fewer LLM turns, evidence-bounded answers. (ReWOO paper)
  • Tight latency or token budget → ReWOO (often). Planning compresses tokens; parallel I/O reduces wall-clock time. (Benchmark-dependent.) (ReWOO paper)
  • Messy tasks where the next step is unknown → ReAct. Interleaving reasoning and acting lets the model adapt using the latest observation. (ReAct blog)
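
If you route at runtime, the table above reduces to a small heuristic. The signals and thresholds below are assumptions; tune them against your own evals.

def choose_pattern(interactive: bool, expected_tool_calls: int, needs_citations: bool) -> str:
    # Encodes the summary table as a first-pass router; refine with eval data.
    if interactive:
        return "react"   # next step depends on the latest observation or a human
    if needs_citations or expected_tool_calls >= 3:
        return "rewoo"   # plan once, fan out tools, compose from evidence
    return "react"       # simple tasks: the plain loop is easier to debug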

About the famous numbers: The ReWOO paper reports ~5× token efficiency and ~+4% HotpotQA accuracy vs baselines. Treat these as illustrative, benchmark-specific—you must validate on your workloads.

4) Implementation patterns we actually recommend

4.1 Safety rails (both patterns)

  • Tool allow-list & per-tool validators. Never allow arbitrary function names; validate inputs (URLs, SQL, shell).
  • Timeouts, retries, and budgets. Guard the loop with token/time caps and circuit-breakers; a wrapper sketch follows this list.
  • Evidence-only composing. In ReWOO, require step-ID citations and verify them; refuse if missing.
  • Prompt-injection hardening. Don’t pass raw webpage/tool output into the model; sanitize and/or extract structured fields first.
  • Observability. Log plan, tool calls (with durations), usage, and final citations for every run (PII-safe).
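
Here is a sketch of the first two rails: wrap each registered tool with an input validator, a timeout, and bounded retries. guarded() and its defaults are illustrative; ToolError comes from the ReAct skeleton in section 2.1.

from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Optional

def guarded(
    tool: Callable[[str], str],
    validate: Optional[Callable[[str], None]] = None,  # raise to reject bad input
    timeout_s: float = 10.0,
    retries: int = 1,
) -> Callable[[str], str]:
    def wrapped(arg: str) -> str:
        if validate:
            validate(arg)  # e.g., URL allow-list, read-only SQL check
        last_err: Optional[Exception] = None
        for _ in range(retries + 1):
            ex = ThreadPoolExecutor(max_workers=1)
            try:
                return ex.submit(tool, arg).result(timeout=timeout_s)  # raises on timeout
            except Exception as e:
                last_err = e
            finally:
                ex.shutdown(wait=False)  # don't block the agent on a hung thread
        raise ToolError(f"tool failed after {retries + 1} attempts: {last_err}")
    return wrapped

Registration then looks like tools = {"fetch": guarded(fetch, validate=check_url, timeout_s=8.0)}, where check_url is your own validator.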

4.2 LangGraph & “plan-and-execute”

If you prefer graph primitives, LangGraph ships a plan-and-execute tutorial and examples that mirror this guide. The ergonomics are good for state machines, retries, and eventing.
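
For orientation, the section 2.2 flow maps onto a LangGraph state machine roughly as below. This is a sketch; check API names against your installed langgraph version. plan/run_workers/solve are the functions above, and TOOLS and call_llm are your own wiring.

from typing import Any, Dict, TypedDict
from langgraph.graph import StateGraph, END

class AgentState(TypedDict, total=False):
    question: str
    plan: dict
    evidence: Dict[str, Any]
    answer: str

def plan_node(state: AgentState) -> dict:
    return {"plan": plan(state["question"], call_llm, allowed_tools=set(TOOLS))}

def work_node(state: AgentState) -> dict:
    return {"evidence": run_workers(state["plan"], TOOLS)}

def solve_node(state: AgentState) -> dict:
    return {"answer": solve(state["question"], state["plan"], state["evidence"], call_llm)}

g = StateGraph(AgentState)
g.add_node("plan", plan_node)
g.add_node("work", work_node)
g.add_node("solve", solve_node)
g.set_entry_point("plan")
g.add_edge("plan", "work")
g.add_edge("work", "solve")
g.add_edge("solve", END)
app = g.compile()
# app.invoke({"question": "..."}) returns the final state, including "answer".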

5) Practical use cases (with patterns)

A. Web research + code analysis (3 tools)

  • Tools: web_search(q), fetch(url), python_sandbox(code)
  • Pattern: ReWOO. Plan first, run fetch for the top K URLs in parallel, then summarize with citations like [s2].
  • Why: Parallel I/O dominates; single synth pass is cheaper and more controllable.
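
For concreteness, here is the shape of plan the section 2.2 planner might emit for this use case; the query and URLs are illustrative placeholders.

example_plan = {
    "steps": [
        {"id": "s1", "tool": "web_search", "input": "ReWOO plan-work-solve agents"},
        {"id": "s2", "tool": "fetch", "input": "https://example.com/top-hit-1"},
        {"id": "s3", "tool": "fetch", "input": "https://example.com/top-hit-2"},
    ],
    "final": "Summarize [s2] and [s3]; use [s1] only for context.",
}

Note that this simple schema has no inter-step references, so fetch targets must be known at plan time. If they depend on s1's search results, either run a second mini-plan after the search or add ReWOO-style #E placeholders that workers substitute at runtime, as in the paper.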

B. Interactive data cleaning with a human

  • Tools: df_preview(), apply_transform(expr)
  • Pattern: ReAct. Human feedback changes the next action (“Undo that,” “try regex capture”), so interleaving wins.

C. Ops runbooks (incident management)

  • Tools: pagerduty.search, k8s.logs, k8s.rollout
  • Pattern: Hybrid. Use ReWOO for the fan-out evidence collection step, then hand off to a ReAct loop with a human for remediation.
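
A sketch of that hybrid, reusing react_agent (2.1) plus plan and run_workers (2.2). The read/write tool split and the evidence-seeding prompt are illustrative.

import json

def incident_agent(incident: str, read_tools, write_tools, call_llm) -> str:
    # Phase 1 (ReWOO): one planning turn, then parallel read-only evidence collection.
    p = plan(f"Collect evidence for incident: {incident}", call_llm, allowed_tools=set(read_tools))
    evidence = run_workers(p, read_tools)
    # Phase 2 (ReAct): interactive remediation seeded with the evidence; keep a
    # human approving any write_tools call.
    seeded = (
        f"Incident: {incident}\n"
        f"Evidence: {json.dumps(evidence)[:2000]}\n"
        "Propose a remediation and apply it step by step."
    )
    return react_agent(seeded, tools=write_tools, call_llm=call_llm)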

6) Benchmarks, claims, and reality

  • ReAct is established; great for QA (HotpotQA/FEVER) and decision-making tasks (ALFWorld/WebShop).
  • ReWOO introduced the decoupled plan/work/solve idea and reports efficiency/accuracy gains, plus robustness to tool failure. Treat as promising—but verify end-to-end on your infra and tools.
  • LangGraph “planning agents.” Official guidance suggests plan-and-execute can be faster/cheaper for many tasks; again, data- and tool-dependent.

7) Copy-paste adapters (wire any provider)

These helpers make the above agent code portable:

# Example adapters you can implement once per provider

def call_llm_openai(prompt: str) -> str:
    # from openai import OpenAI
    # client = OpenAI()
    # resp = client.chat.completions.create(
    #     model="gpt-4.1-mini",
    #     messages=[{"role":"user","content":prompt}],
    # )
    # return resp.choices[0].message.content
    raise NotImplementedError

def call_llm_anthropic(prompt: str) -> str:
    # from anthropic import Anthropic
    # client = Anthropic()
    # resp = client.messages.create(model="claude-3-5-sonnet-20241022",
    #     messages=[{"role":"user","content":prompt}],
    #     max_tokens=800)
    # return resp.content[0].text
    raise NotImplementedError

(API method names evolve—keep this adapter layer so agent code stays stable.)

8) Dev checklist

  • Choose pattern (ReAct vs ReWOO) based on tool fan-out and interactivity.
  • Enforce JSON contracts for model output.
  • Create a tool registry (name → function; validators; timeouts).
  • Add token & time budgets; log every step.
  • For ReWOO, require citations and verify them before returning.
  • Ship with evals (golden tasks + ops metrics: p95 latency, cost).
  • Reassess regularly; some workflows migrate from ReAct → ReWOO as they stabilize.

Key takeaways

  • Use ReAct when the next step depends on the last observation (and/or humans are in the loop).
  • Use ReWOO when you can plan, fan-out tools in parallel, and compose from evidence.
  • Demand JSON, allow-lists, timeouts, budgets, and verified citations.
  • Treat benchmark gains as signals, not guarantees. Measure on your stack.

Both ReAct and ReWOO are must-know patterns. ReAct gives you agility; ReWOO gives you scale. Great agent stacks in 2025 will use both, with sensible routing, ruthless observability, and strong guardrails.

— Cohorte Team
October 27, 2025