ReWOO vs. ReAct: Which Agent Pattern Should Power Your AI Stack in 2025?

A field guide for engineering leaders: architecture trade-offs, production-ready code, and hard-won tips to ship faster (and cheaper)
We’ve all been there: your agent keeps thrashing tools, latency spikes, the token bill looks like a phone number, and the exec ask is still “Can we ship this quarter?” In this guide we go deep on ReAct (reason→act→observe) and ReWOO (plan→parallel work→solve), compare when each pattern shines, and give you drop-in, production-savvy code you can adapt right now. We also fact-check the common claims (token efficiency, accuracy bumps), link the originals, and add the safety rails you’ll wish you’d had last week.
TL;DR (for busy VPs & staff engineers)
- ReAct interleaves thinking with tool use. It’s simple, adaptable, and great for interactive tasks. But it can balloon tokens and latency because every tool call pauses the model mid-thought.
- ReWOO plans first, runs tools in parallel, then synthesizes an answer from evidence you control. It commonly reduces tokens and wall-clock time on multi-tool tasks, but planning quality and validation matter. The ReWOO paper reports ~5× token efficiency and a ~4% HotpotQA accuracy gain vs. baselines (benchmark-specific; measure on your data).
- Rule of thumb: If your task requires multiple tools or long chains, start with ReWOO. If it’s highly interactive or step-wise with human feedback, start with ReAct. Validate with evals.
1) Concepts in one minute
ReAct (Reason + Act)
The model alternates: Think → Act (tool) → Observe → Think → …, until it decides to finish. It’s intuitive, easy to implement, and works well when the next step depends tightly on the last observation. Downsides: repeated re-prompting and tool latency in the loop.
ReWOO (Reasoning WithOut Observation)
The model first plans the needed steps (and which tools), then you execute steps in parallel, collect evidence, and a final solver composes the answer using only that evidence. Benefits: fewer LLM turns, parallelism, better controllability/citations. Costs: you must validate plans/evidence or risk “confident nonsense.”
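For intuition, here is the shape of a plan a ReWOO planner might emit, matching the schema used in section 2.2 below (the tool names and URL are illustrative):
example_plan = {
    "steps": [
        {"id": "s1", "tool": "web_search", "input": "ReWOO plan-work-solve pattern"},
        {"id": "s2", "tool": "fetch", "input": "https://example.com/rewoo-notes"},
    ],
    "final": "Answer by combining [s1][s2].",
}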
2) Production-ready skeletons (drop-in)
Below we show portable patterns. We avoid provider-specific SDK code by injecting a call_llm() adapter (so you can swap OpenAI/Anthropic/Vertex/etc. without rewriting the agent). We enforce JSON I/O, tool allow-lists, timeouts, and budget caps.
Minimal assumptions: each tool has a signature tool(input: str) -> str.
2.1 ReAct loop (safe & sturdy)
import json
from typing import Dict, Callable


class ToolError(Exception):
    pass


def react_agent(
    question: str,
    tools: Dict[str, Callable[[str], str]],
    call_llm: Callable[[str], str],
    max_turns: int = 6,
    max_tokens_budget: int = 120_000,
) -> str:
    """
    Portable ReAct agent.
    - Enforces JSON output: {"thought": "...", "action": {"tool": "<name>|finish", "input": "<text>"}}
    - Uses an allow-list of tools.
    - Guards against budget blowups and unknown tools.
    """
    transcript: list[dict] = []
    spent_tokens = 0
    for _ in range(max_turns):
        prompt = (
            "You are a ReAct agent. OUTPUT STRICT JSON ONLY:\n"
            '{"thought":"...", "action":{"tool":"<name>|finish","input":"<text>"},'
            '"final_answer":""}\n'
            "Tools: " + ", ".join(sorted(tools.keys())) + "\n"
            "Rules: Prefer minimal steps. If answerable now, use tool='finish'.\n"
            f"Transcript: {json.dumps(transcript)[:4000]}\n"
            f"Question: {question}\n"
        )
        step = call_llm(prompt)  # provider-specific under the hood
        spent_tokens += len(step)  # rough proxy; replace with provider usage
        if spent_tokens > max_tokens_budget:
            return "Budget exceeded; partial transcript suppressed."
        try:
            data = json.loads(step)
            action = data["action"]
        except Exception as e:
            return f"Malformed model output: {e}"
        if action["tool"] == "finish":
            return data.get("final_answer") or action.get("input", "")
        tool_name = action["tool"]
        tool = tools.get(tool_name)
        if not tool:
            raise ToolError(f"Unknown tool: {tool_name}")
        try:
            observation = tool(action["input"])
        except Exception as e:
            observation = f"TOOL_ERROR: {e}"
        transcript.append(
            {"thought": data.get("thought", ""), "tool": tool_name, "input": action["input"], "observation": observation}
        )
    return "No answer after max_turns."
Why this holds up in prod: JSON output (less regex sadness), an explicit allow-list, a simple token-budget fuse, and resilience to tool failures. It matches the ReAct idea from the paper; you supply better prompts and tools as needed.
2.2 ReWOO (plan → parallel work → solve), with validation
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Dict, Callable, Any
import json


def plan(question: str, call_llm: Callable[[str], str], allowed_tools: set[str]) -> dict:
    schema = (
        "Return JSON ONLY:\n"
        '{"steps":[{"id":"s1","tool":"<ALLOWED_TOOL>","input":"..."}],'
        '"final":"Describe how to combine step IDs, e.g. [s1][s3]."}'
    )
    raw = call_llm(
        "You are a planner. Create the MINIMAL set of steps to answer.\n"
        f"Allowed tools: {sorted(allowed_tools)}\n{schema}\nQuestion: {question}"
    )
    p = json.loads(raw)
    # Basic validation
    assert isinstance(p.get("steps"), list) and p["steps"], "Empty plan"
    for s in p["steps"]:
        assert s["tool"] in allowed_tools, f"Unknown tool: {s['tool']}"
        assert s["id"].startswith("s"), "Each step needs an id like 's1'"
    return p


def run_workers(p: dict, tools: Dict[str, Callable[[str], str]], timeout_s: float = 20.0) -> dict[str, Any]:
    results: dict[str, Any] = {}
    with ThreadPoolExecutor(max_workers=min(8, len(p["steps"]))) as ex:
        futs = {ex.submit(tools[s["tool"]], s["input"]): s["id"] for s in p["steps"]}
        for f in as_completed(futs, timeout=timeout_s):
            sid = futs[f]
            try:
                results[sid] = f.result(timeout=timeout_s)
            except Exception as e:
                results[sid] = f"ERROR: {e}"
    return results


def solve(question: str, p: dict, evidence: dict[str, Any], call_llm: Callable[[str], str]) -> str:
    # Enforce evidence-only answering; require step-id citations.
    ctx = {"question": question, "plan": p, "evidence": evidence}
    answer = call_llm(
        "You are a solver. Use ONLY the evidence below. Cite step IDs like [s1]. "
        "If evidence is missing, say so and stop.\n" + json.dumps(ctx)
    )
    # Optional: verify cited IDs exist.
    cited = {sid for sid in evidence.keys() if f"[{sid}]" in answer}
    if not cited and evidence:
        answer = "No valid citations found; refusing."
    return answer


def rewoo_agent(question: str, tools: Dict[str, Callable[[str], str]], call_llm: Callable[[str], str]) -> str:
    allowed = set(tools.keys())
    p = plan(question, call_llm, allowed_tools=allowed)
    ev = run_workers(p, tools)
    return solve(question, p, ev, call_llm)
Why this works in practice: a single LLM turn to plan, a parallel tool fan-out, then one LLM turn to synthesize with explicit citations. This reflects the ReWOO paper’s decoupling and tends to cut both tokens and wall-clock time on multi-step tasks. Your mileage varies with planning quality and tool latency.
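A minimal end-to-end sketch with stubbed tools and a canned planner/solver (the tool names and canned responses are illustrative stand-ins for real tools and a real adapter from section 7):
import json

def web_search(q: str) -> str:
    return f"Top result for: {q}"

def fetch(url: str) -> str:
    return f"Contents of {url}"

def canned_llm(prompt: str) -> str:
    if prompt.startswith("You are a planner"):
        # Planner turn: emit a two-step plan in the required schema.
        return json.dumps({
            "steps": [
                {"id": "s1", "tool": "web_search", "input": "ReWOO agent pattern"},
                {"id": "s2", "tool": "fetch", "input": "https://example.com/rewoo"},
            ],
            "final": "Combine [s1][s2].",
        })
    # Solver turn: cite the step IDs so the citation check passes.
    return "ReWOO plans first, then fans out tools in parallel [s1][s2]."

print(rewoo_agent("What is ReWOO?", {"web_search": web_search, "fetch": fetch}, canned_llm))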
3) When to choose which (and why)
About the famous numbers: The ReWOO paper reports ~5× token efficiency and a ~4% HotpotQA accuracy gain vs. baselines. Treat these as illustrative and benchmark-specific; validate them on your own workloads.
4) Implementation patterns we actually recommend
4.1 Safety rails (both patterns)
- Tool allow-list & per-tool validators. Never allow arbitrary function names; validate inputs (URLs, SQL, shell). A minimal wrapper sketch follows this list.
- Timeouts, retries, and budgets. Guard the loop with token/time caps and circuit-breakers.
- Evidence-only composing. In ReWOO, require step-ID citations and verify them; refuse if missing.
- Prompt-injection hardening. Don’t pass raw webpage/tool output into the model; sanitize and/or extract structured fields first.
- Observability. Log plan, tool calls (with durations), usage, and final citations for every run (PII-safe).
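Most of these rails reduce to one small wrapper per tool. Below is a minimal sketch assuming the tool(input: str) -> str contract from section 2; guarded_tool, its validator argument, and the regex-based sanitizing are our own illustrative choices, not from either paper:
import re
from typing import Callable

def guarded_tool(
    fn: Callable[[str], str],
    validate: Callable[[str], bool],
    max_output_chars: int = 4_000,
) -> Callable[[str], str]:
    """Wrap a raw tool with input validation and output sanitizing."""
    def wrapped(inp: str) -> str:
        if not validate(inp):
            return "TOOL_ERROR: input rejected by validator"
        try:
            out = fn(inp)
        except Exception as e:
            return f"TOOL_ERROR: {e}"
        # Sanitize before the output ever reaches the model: drop control
        # characters, blunt obvious injection phrases, and cap the length.
        out = re.sub(r"[\x00-\x08\x0b\x0c\x0e-\x1f]", "", out)
        out = re.sub(r"(?i)ignore (all )?previous instructions", "[redacted]", out)
        return out[:max_output_chars]
    return wrapped

# Example: only https URLs may reach a (hypothetical) fetch tool.
# tools["fetch"] = guarded_tool(fetch, validate=lambda u: u.startswith("https://"))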
4.2 LangGraph & “plan-and-execute”
If you prefer graph primitives, LangGraph ships a plan-and-execute tutorial and examples that mirror this guide. The ergonomics are good for state machines, retries, and eventing.
5) Practical use cases (with patterns)
A. Web research + code analysis (3 tools)
- Tools: web_search(q), fetch(url), python_sandbox(code)
- Pattern: ReWOO. Plan first, run fetch for the top K URLs in parallel, then summarize with citations like [s2].
- Why: Parallel I/O dominates; a single synthesis pass is cheaper and more controllable.
B. Interactive data cleaning with a human
- Tools: df_preview(), apply_transform(expr)
- Pattern: ReAct. Human feedback changes the next action (“Undo that,” “try regex capture”), so interleaving wins.
C. Ops runbooks (incident management)
- Tools: pagerduty.search, k8s.logs, k8s.rollout
- Pattern: Hybrid. Use ReWOO for the fan-out evidence-collection step, then hand off to a ReAct loop with a human for remediation (a minimal sketch follows).
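For pattern C, here is a minimal hybrid sketch that reuses the section 2 skeletons (hybrid_incident_agent and the evidence-seeding convention are our own):
import json
from typing import Dict, Callable

def hybrid_incident_agent(
    question: str,
    tools: Dict[str, Callable[[str], str]],
    call_llm: Callable[[str], str],
) -> str:
    # Phase 1 (ReWOO): plan once, fan out evidence collection in parallel.
    p = plan(question, call_llm, allowed_tools=set(tools.keys()))
    evidence = run_workers(p, tools)
    # Phase 2 (ReAct): hand the evidence to an interactive loop, ideally with
    # a human approving each remediation step.
    enriched = question + "\nEvidence gathered (by step ID): " + json.dumps(evidence)[:4000]
    return react_agent(enriched, tools, call_llm)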
6) Benchmarks, claims, and reality
- ReAct is established; great for QA (HotpotQA/FEVER) and decision-making tasks (ALFWorld/WebShop).
- ReWOO introduced the decoupled plan/work/solve idea and reports efficiency/accuracy gains, plus robustness to tool failure. Treat as promising—but verify end-to-end on your infra and tools.
- LangGraph “planning agents.” Official guidance suggests plan-and-execute can be faster/cheaper for many tasks; again, data- and tool-dependent.
7) Copy-paste adapters (wire any provider)
These helpers make the above agent code portable:
# Example adapters you can implement once per provider
def call_llm_openai(prompt: str) -> str:
    # from openai import OpenAI
    # client = OpenAI()
    # resp = client.chat.completions.create(
    #     model="gpt-4.1-mini",
    #     messages=[{"role": "user", "content": prompt}],
    # )
    # return resp.choices[0].message.content
    raise NotImplementedError


def call_llm_anthropic(prompt: str) -> str:
    # from anthropic import Anthropic
    # client = Anthropic()
    # resp = client.messages.create(
    #     model="claude-3-5-sonnet-20241022",
    #     messages=[{"role": "user", "content": prompt}],
    #     max_tokens=800,
    # )
    # return resp.content[0].text
    raise NotImplementedError
API method names evolve; keep this adapter layer so the agent code stays stable.
8) Dev checklist
- Choose pattern (ReAct vs ReWOO) based on tool fan-out and interactivity.
- Enforce JSON contracts for model output (a parse-and-retry sketch follows this checklist).
- Create a tool registry (name → function; validators; timeouts).
- Add token & time budgets; log every step.
- For ReWOO, require citations and verify them before returning.
- Ship with evals (golden tasks + ops metrics: p95 latency, cost).
- Reassess regularly; some workflows migrate from ReAct → ReWOO as they stabilize.
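One way to enforce the JSON contract is a small parse-and-retry wrapper around any call_llm adapter; call_llm_json, the retry count, and the re-prompt wording below are our own choices:
import json
from typing import Callable

def call_llm_json(call_llm: Callable[[str], str], prompt: str, retries: int = 2) -> dict:
    """Call the model and insist on parseable JSON, re-prompting on failure."""
    reminder = ""
    for _ in range(retries + 1):
        raw = call_llm(prompt + reminder)
        try:
            return json.loads(raw)
        except json.JSONDecodeError as e:
            reminder = f"\nYour previous output was invalid JSON ({e}). Return JSON ONLY."
    raise ValueError("Model never returned valid JSON within the retry budget.")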
Key takeaways
- Use ReAct when the next step depends on the last observation (and/or humans are in the loop).
- Use ReWOO when you can plan, fan-out tools in parallel, and compose from evidence.
- Demand JSON, allow-lists, timeouts, budgets, and verified citations.
- Treat benchmark gains as signals, not guarantees. Measure on your stack.
Both ReAct and ReWOO are must-know patterns. ReAct gives you agility; ReWOO gives you scale. Great agent stacks in 2025 will use both, with sensible routing, ruthless observability, and strong guardrails.
— Cohorte Team
October 27, 2025