Paperclip AI: Open-Source Platform for Managing AI Agent Teams.

Most teams do not have an “AI capability” problem anymore.
They have an AI management problem.
One agent can write code. Another can review a PR. A third can triage tickets. A fourth can draft launch copy. By the time a team has five or ten of them running, the hard questions are no longer about prompts. They are about ownership, cost, escalation, approvals, auditability, and who exactly is allowed to do what when nobody is watching. That is the gap Paperclip is explicitly going after. The project describes itself as open-source orchestration for “zero-human companies”: a Node.js server and React UI for running teams of AI agents with goals, budgets, org charts, and governance from one dashboard.
That framing is the breakthrough.
Paperclip is not trying to be one more magical agent runtime. It positions itself as the control plane above the runtimes. Its docs draw that distinction clearly: Paperclip manages companies, employees, goals, tasks, budgets, tracing, and heartbeats, while the actual execution happens through external adapters such as Claude CLI, Codex CLI, shell-process adapters, HTTP webhooks, and other integrations.
That is why this project matters to both developers and AI VPs.
It does not ask, “How do we make one agent more impressive?”
It asks the more durable question:
How do we make a fleet of agents run like an actual organization?
The old mental model is breaking
A lot of agent tooling is very good at one of four things:
- making a model reason
- letting a model use tools
- chaining tasks together
- coordinating a set of agents in a workflow
But the moment those agents start doing meaningful work in parallel, teams run into a different class of problem:
Who owns this task?
Who can delegate?
Who approves strategy?
Which agent is wasting budget?
What wakes an idle worker up?
What happens after a crash?
How do we pause one worker without taking down the entire system?
Paperclip’s docs answer those with company primitives, not chatbot primitives. A company has a goal, employees, an org structure, a budget, and a task hierarchy. Employees are AI agents. Issues are the unit of work. Reporting lines are explicit. The board can intervene. Tasks roll up through parent issues back to company goals.
That sounds small on paper.
Operationally, it is a big shift.
It is the difference between “we have some bots doing things” and “we have a system we can actually operate.”
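Those company primitives can be sketched as plain types. The shapes below are illustrative only — field names like `parentId` and `reportsTo` are our assumptions, not Paperclip's actual data model:

```typescript
// Illustrative shapes for Paperclip-style company primitives.
// Field names here are assumptions, not Paperclip's real schema.
interface Employee {
  name: string;
  role: string;
  reportsTo: string | null; // null only for the CEO
}

interface Issue {
  id: string;
  title: string;
  assignee: string;        // an Employee name
  parentId: string | null; // parent issue; roots map to the company goal
}

interface Company {
  name: string;
  goal: string; // the top-level objective all work rolls up to
  budgetUsd: number;
  employees: Employee[];
  issues: Issue[];
}

// Walking parentId links answers "which goal does this task serve?"
function rootIssue(issues: Issue[], id: string): Issue {
  const byId = new Map<string, Issue>(issues.map((i): [string, Issue] => [i.id, i]));
  let current = byId.get(id);
  if (!current) throw new Error(`Unknown issue ${id}`);
  while (current.parentId !== null) {
    const parent = byId.get(current.parentId);
    if (!parent) throw new Error(`Missing parent ${current.parentId}`);
    current = parent;
  }
  return current;
}
```

The point of the sketch is the roll-up: any issue can be traced upward to the goal it serves, which is exactly what chatbot primitives cannot do.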
Why Paperclip feels different
1) It builds an org chart, not just a workflow
Paperclip treats hierarchy as a first-class primitive. The docs describe a strict tree structure in which each agent reports to exactly one manager, except the CEO. That hierarchy is not decorative; it shapes delegation, escalation, and ownership.
This is a bigger deal than it sounds.
Most teams already have an implicit org chart for AI work. It just lives in Slack, ad hoc conventions, and one over-caffeinated operator’s head.
Paperclip makes that structure explicit.
That means the system can answer, clearly:
- who breaks strategy into execution
- who is allowed to create sub-work
- who reviews
- who escalates
- who should never be freelancing changes to billing, auth, or customer-facing messaging
Or, to put it more bluntly: Paperclip helps prevent your agent stack from turning into a group project with root access.
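The "strict tree" rule is also easy to enforce mechanically. A minimal sketch of an application-side check (our helper, not a Paperclip API) that validates an org chart before any agent runs:

```typescript
// Each agent reports to exactly one manager; only the CEO reports to nobody.
// This is an application-side validation sketch, not part of Paperclip itself.
interface OrgAgent {
  name: string;
  reportsTo: string | null;
}

function validateOrgChart(agents: OrgAgent[]): void {
  const names = new Set(agents.map((a) => a.name));
  const roots = agents.filter((a) => a.reportsTo === null);
  if (roots.length !== 1) {
    throw new Error(`Expected exactly one CEO, found ${roots.length}`);
  }
  for (const agent of agents) {
    // Walk upward; a strict tree must reach the CEO without revisiting anyone.
    const seen = new Set<string>([agent.name]);
    let manager = agent.reportsTo;
    while (manager !== null) {
      if (!names.has(manager)) {
        throw new Error(`${agent.name}'s chain hits unknown manager ${manager}`);
      }
      if (seen.has(manager)) throw new Error(`Cycle through ${manager}`);
      seen.add(manager);
      const next = agents.find((a) => a.name === manager);
      manager = next ? next.reportsTo : null;
    }
  }
}
```

Running a check like this at org-design time means delegation and escalation always have exactly one answer.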
2) It treats budgets like architecture, not analytics
Paperclip’s public materials emphasize work and cost tracking from one dashboard, and its implementation work ties agent execution to cost and token reporting, with budget limits intended to stop work when ceilings are reached.
That matters because most teams still handle agent cost the way people handle cloud overspend right before the finance meeting: by promising to “look into it.”
Paperclip bakes cost discipline into the mental model. Every worker can have a budget. Spend is part of the operating system, not a spreadsheet someone opens after the damage is done.
That is not glamorous.
It is better than glamorous.
It is how real systems survive contact with finance.
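A minimal sketch of that discipline (our wrapper pattern, not Paperclip's budget implementation): check remaining budget before each unit of work and refuse work at the ceiling, rather than reconciling a spreadsheet afterward.

```typescript
// Application-side budget guard: refuse work once a worker's ceiling is hit.
// The ledger shape is an assumption for illustration.
interface BudgetLedger {
  ceilingUsd: number;
  spentUsd: number;
}

function recordSpend(ledger: BudgetLedger, costUsd: number): void {
  ledger.spentUsd += costUsd;
}

function canStartWork(ledger: BudgetLedger, estimatedCostUsd: number): boolean {
  // Budget is a precondition for work, not a report after the fact.
  return ledger.spentUsd + estimatedCostUsd <= ledger.ceilingUsd;
}
```

The design choice matters more than the arithmetic: spend is checked on the way in, so overruns are impossible by construction rather than merely visible in retrospect.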
3) It treats human oversight as governance
Paperclip’s docs describe the human operator as the board. The board can approve hires, approve the CEO’s initial strategy, inspect work, intervene, and use override controls such as pausing, resuming, or terminating agents. The platform also keeps an activity and audit trail around work and decisions.
That is a much stronger version of “human in the loop” than the usual vague promise that someone can probably click a button if things go sideways.
Paperclip’s framing is better:
- agents are workers
- humans are the board
- the board sets policy
- the board approves high-consequence moves
- the board can step in whenever needed
That is not babysitting.
That is governance.
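The override controls imply a small lifecycle state machine. A sketch of the transitions we would enforce (the state names are ours, not Paperclip's):

```typescript
type AgentState = "running" | "paused" | "terminated";
type BoardAction = "pause" | "resume" | "terminate";

// Board overrides as explicit transitions; anything else is rejected.
// An illustrative policy, not Paperclip's actual lifecycle model.
function applyBoardAction(state: AgentState, action: BoardAction): AgentState {
  if (state === "terminated") {
    throw new Error("Terminated agents cannot be revived");
  }
  switch (action) {
    case "pause":
      return "paused";
    case "resume":
      if (state !== "paused") throw new Error("Only paused agents can resume");
      return "running";
    case "terminate":
      return "terminated";
  }
}
```

Making the transitions explicit is what turns "someone can probably click a button" into governance: every intervention is a named, auditable state change.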
The heart of the idea: Paperclip is a control plane
This is the sentence that makes the rest of the platform click:
Paperclip is not trying to replace your agent runtimes. It is trying to manage them.
That is straight from the project’s own architecture framing. Agents run in heartbeats, using configured adapters. Paperclip starts the adapter, passes current context, lets it run until exit/timeout/cancel, captures outputs like status, token usage, errors, and logs, and updates the UI.
That creates a very practical separation of concerns:
- use the runtime best suited to the role
- keep management, tasking, goals, tracing, and spend in one place
This is a strong design choice because runtimes will keep changing. Teams will swap models, providers, CLIs, and execution environments.
Management problems are not going away.
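That adapter loop — start, pass context, wait for exit/timeout/cancel, capture outputs — can be captured in a small contract. This is our sketch of the surface a control plane would see, not Paperclip's actual adapter API:

```typescript
// Illustrative adapter contract: the management layer only sees this surface,
// regardless of which runtime (CLI, shell process, webhook) sits behind it.
interface AdapterResult {
  status: "success" | "error" | "timeout" | "cancelled";
  tokensUsed: number;
  costUsd: number;
  logs: string[];
}

interface RuntimeAdapter {
  // Run one bounded execution window with the current task context.
  run(context: { task: string; deadlineMs: number }): Promise<AdapterResult>;
}

// The management layer stays runtime-agnostic: swap adapters, keep the loop.
async function heartbeat(adapter: RuntimeAdapter, task: string): Promise<AdapterResult> {
  const result = await adapter.run({ task, deadlineMs: 60_000 });
  // In a real deployment: persist tokens, cost, and logs to the audit trail here.
  return result;
}
```

Because every runtime reduces to the same result shape, budgets, tracing, and governance can be written once against the contract instead of once per vendor.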
How Paperclip actually works
At a high level, the documented setup flow is straightforward:
- create a company
- define the company goal
- create the CEO agent and configure its adapter
- build the rest of the org chart
- assign work and budgets
- let the company run via heartbeats and intervene through the board UI as needed
That flow is revealing.
Paperclip is not asking teams to begin with prompt syntax.
It is asking them to begin with operating design:
- What are we trying to achieve?
- Who works here?
- Who reports to whom?
- What is each worker allowed to spend?
- Which actions require human approval?
That is exactly the right place to start once AI work stops being a novelty and starts becoming an operating function.
The most important technical lesson: idle autonomy is not free
Paperclip’s heartbeat model is elegant. Agents do not run continuously; they wake, work in a bounded execution window, then report back. That keeps execution legible and makes intervention easier.
But there is a critical engineering lesson hiding in the project’s issue tracker: scheduled heartbeats can still spend tokens even when there is no useful work to do, because waking a full runtime can itself be expensive. Paperclip maintainers and contributors have discussed the need for cheaper triage before full adapter invocation precisely because timer-based wakeups can waste spend.
That is a valuable insight far beyond Paperclip.
A badly designed multi-agent system does not only cost money when it is productive.
It can cost money when it wakes up, thinks hard, and concludes that absolutely nothing interesting is happening.
We have all worked with someone like that.
The difference here is that the invoice arrives in tokens.
A deployment pattern we would actually recommend
The best thing about Paperclip is also the easiest thing to misuse: it gives teams a lot of structure. That does not mean teams should start by building a ten-agent AI conglomerate before lunch.
We would not.
We would start smaller.
Phase 1: choose one operating lane
Pick one contained function:
- QA triage
- release documentation
- growth experiments
- support escalation
- internal tooling maintenance
- bug backlog grooming
The goal of the first deployment is not to look futuristic.
It is to learn where autonomy helps, where it leaks money, and where governance actually needs to sit.
Phase 2: use one manager and two workers
A clean first org looks like this:
- one manager agent
- one execution worker
- one review worker
- one human board
That gives the system a simple shape:
plan, execute, review, govern.
If the system struggles, the source of failure is easier to diagnose.
If you start with seven peer agents all coordinating laterally, you may build something impressive-looking, but it will be much harder to reason about ownership, cost, and failure modes.
Phase 3: budget before scale
Paperclip makes budget part of the model. Good. Keep it there.
A practical pattern is:
- cheap scout agents
- mid-cost manager agents
- expensive specialist agents
The exact numbers will vary by workload, but the architectural principle holds:
not every worker needs the biggest model, the widest context, or the highest spend ceiling.
That is not optimization theater. It is operational sanity.
Phase 4: reserve approvals for strategic moves
Paperclip’s board model is strongest when used around high-consequence actions, not every tiny state transition.
Good candidates for approval:
- new agent creation
- company strategy changes
- production deployments
- budget increases
- destructive operations
- customer-facing messaging changes
Bad candidates:
- every draft revision
- every routine task update
- every internal note
Too few approvals and you get chaos.
Too many approvals and you get a very expensive checklist.
Illustrative org design
The examples below are illustrative deployment patterns, not official Paperclip config syntax.
Here is a simple way we would model a first engineering org around Paperclip:
```yaml
company:
  name: "Acme AI Studio"
  goal: "Ship a developer-facing feature to GA in 60 days"

agents:
  - name: "CEO"
    role: "strategy and prioritization"
    reports_to: null
    monthly_budget_usd: 400
    runtime: "claude-code"
  - name: "CTO"
    role: "technical planning and delegation"
    reports_to: "CEO"
    monthly_budget_usd: 300
    runtime: "codex"
  - name: "Engineer"
    role: "implementation"
    reports_to: "CTO"
    monthly_budget_usd: 500
    runtime: "cursor"
  - name: "QA"
    role: "review and regression checks"
    reports_to: "CTO"
    monthly_budget_usd: 150
    runtime: "shell-process"

board_policies:
  approvals_required_for:
    - "new_agent"
    - "prod_deploy"
    - "budget_increase"
    - "customer_facing_change"
```

Why this shape works:
- the CEO does not write code
- the CTO does not do every task
- the Engineer does not invent strategy
- QA is not optional
- the board governs the risky edges
That is not just cleaner for humans. It is easier for the system to enforce.
Illustrative policy logic: route work by consequence
This is not a Paperclip API. It is the kind of application-side policy logic we would put around a Paperclip deployment.
```typescript
type RiskLevel = "low" | "medium" | "high";

interface TaskPolicy {
  risk: RiskLevel;
  requiresBoardApproval: boolean;
  defaultAssignee: string;
}

const policyTable: Record<string, TaskPolicy> = {
  "docs-update": { risk: "low", requiresBoardApproval: false, defaultAssignee: "Docs" },
  "bug-fix": { risk: "medium", requiresBoardApproval: false, defaultAssignee: "Engineer" },
  "schema-migration": { risk: "high", requiresBoardApproval: true, defaultAssignee: "CTO" },
  "billing-change": { risk: "high", requiresBoardApproval: true, defaultAssignee: "CEO" },
};

function routeTask(taskType: string): TaskPolicy {
  const policy = policyTable[taskType];
  if (!policy) throw new Error(`No policy registered for ${taskType}`);
  return policy;
}
```

This is the kind of small rule engine that keeps autonomy legible.
Not every task deserves the same freedom.
Not every task deserves the same worker.
And absolutely not every task deserves the same budget.
Illustrative budgeting: give agents salaries, not vibes
Again, not official Paperclip schema. Just a useful deployment pattern.
```python
AGENT_SALARY_PLAN = {
    "Scout": {"monthly_budget_usd": 50, "job": "triage and routing"},
    "Manager": {"monthly_budget_usd": 200, "job": "planning and escalation"},
    "Specialist": {"monthly_budget_usd": 600, "job": "deep execution"},
}
```

If an agent only needs to classify, route, or decide whether work exists, do not hand it the most expensive runtime you have.
That is the AI equivalent of hiring a senior staff engineer to check whether the meeting room projector is plugged in.
Illustrative heartbeat control: triage before expensive work
Paperclip’s heartbeat model is real. The policy below is ours. It is the sort of cheap decision layer we would want around heartbeat-heavy deployments to keep idle cost under control.
```typescript
type TriageDecision =
  | { action: "sleep"; reason: string }
  | { action: "work"; worker: string; priority: "low" | "medium" | "high" }
  | { action: "escalate"; manager: string; reason: string };

function triageHeartbeat(input: {
  hasAssignedIssue: boolean;
  waitingOnApproval: boolean;
  blocked: boolean;
  remainingBudgetUsd: number;
}): TriageDecision {
  if (input.remainingBudgetUsd < 10) {
    return { action: "sleep", reason: "Budget nearly exhausted" };
  }
  if (input.waitingOnApproval) {
    return { action: "sleep", reason: "Awaiting board decision" };
  }
  if (input.blocked) {
    return { action: "escalate", manager: "CTO", reason: "Blocked on dependency or policy" };
  }
  if (input.hasAssignedIssue) {
    return { action: "work", worker: "Specialist", priority: "medium" };
  }
  return { action: "sleep", reason: "No actionable work" };
}
```

This may look simple.
That is the point.
The best way to reduce waste in agent systems is often not more reasoning. It is better gating.
How Paperclip compares to adjacent tools
This part matters, because Paperclip is not trying to solve the same layer as every other framework.
Paperclip vs LangGraph
LangGraph is built for long-running, stateful workflows and agents with persistence and durable execution. Its docs explicitly describe durability modes and show how state can be persisted to resume after failure or human interruption.
Paperclip solves a different layer. It gives teams a company-level operating model: org charts, budgets, issues, goals, board governance, and heartbeat-based management of external runtimes.
The cleanest framing is:
- LangGraph helps build the workflow brain.
- Paperclip helps run a workforce of agents like an organization.
Those are not necessarily competing roles.
In many stacks, they complement each other.
Paperclip vs CrewAI
CrewAI focuses on orchestrating autonomous AI agents through agents, crews, and flows. Its official docs emphasize collaborative crews and structured, event-driven flows.
Paperclip’s center of gravity is more managerial than collaborative.
- CrewAI asks: how should agents work together?
- Paperclip asks: who works here, who reports to whom, who owns the task, what is the budget, and who can step in?
That is not a subtle distinction once a system becomes operational.
It is the difference between designing a workflow and designing an organization.
Paperclip vs OpenHands
OpenHands is explicitly focused on AI-driven software development. Its current docs describe the Software Agent SDK as a composable Python library for building software agents, and position it as purpose-built for software engineering.
Paperclip is broader. It is designed to manage mixed-role AI companies rather than only coding agents. That makes it especially interesting for teams spanning engineering, product, operations, and go-to-market work.
So the simplest stack-level summary is:
- LangGraph = workflow engine
- CrewAI = collaborative orchestration
- OpenHands = software-agent stack
- Paperclip = management layer for an AI workforce
That is the sentence many teams have been missing.
What developers should pay attention to
The most interesting thing about Paperclip is not the dashboard.
It is the shift from prompt engineering to organizational engineering.
The hard problems become:
- role design
- escalation design
- budget design
- heartbeat design
- approval design
- failure recovery design
- auditability design
Those are healthier problems.
Because these are the decisions that still matter when the models change.
A prompt can go stale quickly. A clean operating model ages much better.
What AI VPs should pay attention to
For leadership, the biggest promise is not “more autonomy.”
It is managed autonomy.
Paperclip creates a bridge between experiments and operations:
- work can be attached to goals
- spend can be attached to workers
- approvals can be attached to policy
- interventions can be attached to governance
- runtime activity can be attached to a visible audit trail
That is the path from a clever demo to a credible operating system.
Without that layer, most AI orgs end up in the same awkward middle:
- lots of promise
- some real wins
- weak visibility
- unclear accountability
- fragile trust
Paperclip is compelling because it attacks that exact stage.
Real caveats teams should not ignore
This is where the adult conversation starts.
Paperclip is promising, but it does not magically solve operational design.
A bad org chart is still bad.
A vague mission is still vague.
A noisy heartbeat policy is still noisy.
A sloppy approval model is still sloppy.
And there are real operational concerns teams should plan for.
Open project discussions have highlighted that timer-based heartbeats can create unnecessary spend. Other issues show that local autonomous runtimes can get stuck on approval/sandbox prompts, or need smoother recovery when server restarts interrupt running processes. Those are not reasons to dismiss the platform. They are reminders that once agents are treated like workers, runtime resilience and policy design become part of the job.
That should shape how teams deploy:
- favor event-driven wakeups where possible
- isolate risky workers
- keep logs and heartbeat metrics visible
- design pause and intervention paths up front
- avoid interactive approval traps inside autonomous runtimes
In other words: do not just install a management layer.
Operate like you mean it.
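The first recommendation above — event-driven wakeups — is worth one concrete sketch. Instead of waking every worker on a timer, wake a worker only when something relevant happens. This is our deployment pattern, not a Paperclip feature:

```typescript
// Wake workers on events (issue assigned, approval granted, dependency cleared)
// instead of on a timer, so idle heartbeats never reach an expensive runtime.
type WakeEvent =
  | { kind: "issue_assigned"; worker: string }
  | { kind: "approval_granted"; worker: string }
  | { kind: "dependency_cleared"; worker: string };

class WakeupQueue {
  private pending = new Map<string, WakeEvent[]>();

  publish(event: WakeEvent): void {
    const events = this.pending.get(event.worker) ?? [];
    events.push(event);
    this.pending.set(event.worker, events);
  }

  // The scheduler checks this instead of firing blind timer heartbeats.
  shouldWake(worker: string): boolean {
    return (this.pending.get(worker) ?? []).length > 0;
  }

  // Hand the worker its reasons for waking, then clear them.
  drain(worker: string): WakeEvent[] {
    const events = this.pending.get(worker) ?? [];
    this.pending.delete(worker);
    return events;
  }
}
```

A worker with an empty queue simply stays asleep — no adapter launch, no tokens, no expensive conclusion that nothing interesting is happening.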
The key takeaway
Paperclip’s core insight is simple and important:
multi-agent systems do not fail only because the agents are weak; they fail because nobody designed management around them.
That is why this project feels fresh.
Not because it promises infinite autonomy.
Because it starts from the more believable premise that autonomy needs:
- structure
- budgets
- reporting lines
- bounded execution
- approvals
- intervention
- traceability
That is a much more mature foundation.
And probably a much more durable one.
The next era of AI will not be defined by who built the most impressive standalone agent demo.
It will be defined by who learned how to manage a fleet of AI workers without losing cost discipline, visibility, and control.
That is why Paperclip matters.
It is not asking, “How do we make the bot more magical?”
It is asking the better question:
How do we make AI work run like an organization?
That is a question with a long shelf life.
— Cohorte Team
March 16, 2026.