Paperclip AI: Open-Source Platform for Managing AI Agent Teams.

Most teams do not have an “AI capability” problem anymore.
They have an AI management problem.
One agent can write code. Another can review a PR. A third can triage tickets. A fourth can draft launch copy. By the time a team has five or ten of them running, the hard questions are no longer about prompts. They are about ownership, cost, escalation, approvals, auditability, and who exactly is allowed to do what when nobody is watching. That is the gap Paperclip is explicitly going after. The project describes itself as open-source orchestration for “zero-human companies”: a Node.js server and React UI for running teams of AI agents with goals, budgets, org charts, and governance from one dashboard.
That framing is the breakthrough.
Paperclip is not trying to be one more magical agent runtime. It positions itself as the control plane above the runtimes. Its docs draw that distinction clearly: Paperclip manages companies, employees, goals, tasks, budgets, tracing, and heartbeats, while the actual execution happens through external adapters such as Claude CLI, Codex CLI, shell-process adapters, HTTP webhooks, and other integrations.
That is why this project matters to both developers and AI VPs.
It does not ask, “How do we make one agent more impressive?”
It asks the more durable question:
How do we make a fleet of agents run like an actual organization?
The old mental model is breaking
A lot of agent tooling is very good at one of four things:
- making a model reason
- letting a model use tools
- chaining tasks together
- coordinating a set of agents in a workflow
But the moment those agents start doing meaningful work in parallel, teams run into a different class of problem:
Who owns this task?
Who can delegate?
Who approves strategy?
Which agent is wasting budget?
What wakes an idle worker up?
What happens after a crash?
How do we pause one worker without taking down the entire system?
Paperclip’s docs answer those with company primitives, not chatbot primitives. A company has a goal, employees, an org structure, a budget, and a task hierarchy. Employees are AI agents. Issues are the unit of work. Reporting lines are explicit. The board can intervene. Tasks roll up through parent issues back to company goals.
That sounds small on paper.
Operationally, it is a big shift.
It is the difference between “we have some bots doing things” and “we have a system we can actually operate.”
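Those company primitives can be sketched as plain types. The shapes below are illustrative only — field names like `parentId` and `reportsTo` are our assumptions, not Paperclip's actual data model:

```typescript
// Illustrative shapes for Paperclip-style company primitives.
// Field names here are assumptions, not Paperclip's real schema.
interface Employee {
  name: string;
  role: string;
  reportsTo: string | null; // null only for the CEO
}

interface Issue {
  id: string;
  title: string;
  assignee: string;        // an Employee name
  parentId: string | null; // parent issue; roots map to the company goal
}

interface Company {
  name: string;
  goal: string; // the top-level objective all work rolls up to
  budgetUsd: number;
  employees: Employee[];
  issues: Issue[];
}

// Walking parentId links answers "which goal does this task serve?"
function rootIssue(issues: Issue[], id: string): Issue {
  const byId = new Map<string, Issue>(issues.map((i): [string, Issue] => [i.id, i]));
  let current = byId.get(id);
  if (!current) throw new Error(`Unknown issue ${id}`);
  while (current.parentId !== null) {
    const parent = byId.get(current.parentId);
    if (!parent) throw new Error(`Missing parent ${current.parentId}`);
    current = parent;
  }
  return current;
}
```

The point of the sketch is the roll-up: any issue can be traced upward to the goal it serves, which is exactly what chatbot primitives cannot do.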
Why Paperclip feels different
1) It builds an org chart, not just a workflow
Paperclip treats hierarchy as a first-class primitive. The docs describe a strict tree structure in which each agent reports to exactly one manager, except the CEO. That hierarchy is not decorative; it shapes delegation, escalation, and ownership.
This is a bigger deal than it sounds.
Most teams already have an implicit org chart for AI work. It just lives in Slack, ad hoc conventions, and one over-caffeinated operator’s head.
Paperclip makes that structure explicit.
That means the system can answer, clearly:
- who breaks strategy into execution
- who is allowed to create sub-work
- who reviews
- who escalates
- who should never be freelancing changes to billing, auth, or customer-facing messaging
Or, to put it more bluntly: Paperclip helps prevent your agent stack from turning into a group project with root access.
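The "strict tree" rule is also easy to enforce mechanically. A minimal sketch of an application-side check (our helper, not a Paperclip API) that validates an org chart before any agent runs:

```typescript
// Each agent reports to exactly one manager; only the CEO reports to nobody.
// This is an application-side validation sketch, not part of Paperclip itself.
interface OrgAgent {
  name: string;
  reportsTo: string | null;
}

function validateOrgChart(agents: OrgAgent[]): void {
  const names = new Set(agents.map((a) => a.name));
  const roots = agents.filter((a) => a.reportsTo === null);
  if (roots.length !== 1) {
    throw new Error(`Expected exactly one CEO, found ${roots.length}`);
  }
  for (const agent of agents) {
    // Walk upward; a strict tree must reach the CEO without revisiting anyone.
    const seen = new Set<string>([agent.name]);
    let manager = agent.reportsTo;
    while (manager !== null) {
      if (!names.has(manager)) {
        throw new Error(`${agent.name}'s chain hits unknown manager ${manager}`);
      }
      if (seen.has(manager)) throw new Error(`Cycle through ${manager}`);
      seen.add(manager);
      const next = agents.find((a) => a.name === manager);
      manager = next ? next.reportsTo : null;
    }
  }
}
```

Running a check like this at org-design time means delegation and escalation always have exactly one answer.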
2) It treats budgets like architecture, not analytics
Paperclip’s public materials emphasize work and cost tracking from one dashboard, and its implementation work ties agent execution to cost and token reporting, with budget limits intended to stop work when ceilings are reached.
That matters because most teams still handle agent cost the way people handle cloud overspend right before the finance meeting: by promising to “look into it.”
Paperclip bakes cost discipline into the mental model. Every worker can have a budget. Spend is part of the operating system, not a spreadsheet someone opens after the damage is done.
That is not glamorous.
It is better than glamorous.
It is how real systems survive contact with finance.
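A minimal sketch of that discipline (our wrapper pattern, not Paperclip's budget implementation): check remaining budget before each unit of work and refuse work at the ceiling, rather than reconciling a spreadsheet afterward.

```typescript
// Application-side budget guard: refuse work once a worker's ceiling is hit.
// The ledger shape is an assumption for illustration.
interface BudgetLedger {
  ceilingUsd: number;
  spentUsd: number;
}

function recordSpend(ledger: BudgetLedger, costUsd: number): void {
  ledger.spentUsd += costUsd;
}

function canStartWork(ledger: BudgetLedger, estimatedCostUsd: number): boolean {
  // Budget is a precondition for work, not a report after the fact.
  return ledger.spentUsd + estimatedCostUsd <= ledger.ceilingUsd;
}
```

The design choice matters more than the arithmetic: spend is checked on the way in, so overruns are impossible by construction rather than merely visible in retrospect.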
3) It treats human oversight as governance
Paperclip’s docs describe the human operator as the board. The board can approve hires, approve the CEO’s initial strategy, inspect work, intervene, and use override controls such as pausing, resuming, or terminating agents. The platform also keeps an activity and audit trail around work and decisions.
That is a much stronger version of “human in the loop” than the usual vague promise that someone can probably click a button if things go sideways.
Paperclip’s framing is better:
- agents are workers
- humans are the board
- the board sets policy
- the board approves high-consequence moves
- the board can step in whenever needed
That is not babysitting.
That is governance.
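The override controls imply a small lifecycle state machine. A sketch of the transitions we would enforce (the state names are ours, not Paperclip's):

```typescript
type AgentState = "running" | "paused" | "terminated";
type BoardAction = "pause" | "resume" | "terminate";

// Board overrides as explicit transitions; anything else is rejected.
// An illustrative policy, not Paperclip's actual lifecycle model.
function applyBoardAction(state: AgentState, action: BoardAction): AgentState {
  if (state === "terminated") {
    throw new Error("Terminated agents cannot be revived");
  }
  switch (action) {
    case "pause":
      return "paused";
    case "resume":
      if (state !== "paused") throw new Error("Only paused agents can resume");
      return "running";
    case "terminate":
      return "terminated";
  }
}
```

Making the transitions explicit is what turns "someone can probably click a button" into governance: every intervention is a named, auditable state change.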
The heart of the idea: Paperclip is a control plane
This is the sentence that makes the rest of the platform click:
Paperclip is not trying to replace your agent runtimes. It is trying to manage them.
That is straight from the project’s own architecture framing. Agents run in heartbeats, using configured adapters. Paperclip starts the adapter, passes current context, lets it run until exit/timeout/cancel, captures outputs like status, token usage, errors, and logs, and updates the UI.
That creates a very practical separation of concerns:
- use the runtime best suited to the role
- keep management, tasking, goals, tracing, and spend in one place
This is a strong design choice because runtimes will keep changing. Teams will swap models, providers, CLIs, and execution environments.
Management problems are not going away.
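That adapter loop — start, pass context, wait for exit/timeout/cancel, capture outputs — can be captured in a small contract. This is our sketch of the surface a control plane would see, not Paperclip's actual adapter API:

```typescript
// Illustrative adapter contract: the management layer only sees this surface,
// regardless of which runtime (CLI, shell process, webhook) sits behind it.
interface AdapterResult {
  status: "success" | "error" | "timeout" | "cancelled";
  tokensUsed: number;
  costUsd: number;
  logs: string[];
}

interface RuntimeAdapter {
  // Run one bounded execution window with the current task context.
  run(context: { task: string; deadlineMs: number }): Promise<AdapterResult>;
}

// The management layer stays runtime-agnostic: swap adapters, keep the loop.
async function heartbeat(adapter: RuntimeAdapter, task: string): Promise<AdapterResult> {
  const result = await adapter.run({ task, deadlineMs: 60_000 });
  // In a real deployment: persist tokens, cost, and logs to the audit trail here.
  return result;
}
```

Because every runtime reduces to the same result shape, budgets, tracing, and governance can be written once against the contract instead of once per vendor.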
How Paperclip actually works
At a high level, the documented setup flow is straightforward:
- create a company
- define the company goal
- create the CEO agent and configure its adapter
- build the rest of the org chart
- assign work and budgets
- let the company run via heartbeats and intervene through the board UI as needed
That flow is revealing.
Paperclip is not asking teams to begin with prompt syntax.
It is asking them to begin with operating design:
- What are we trying to achieve?
- Who works here?
- Who reports to whom?
- What is each worker allowed to spend?
- Which actions require human approval?
That is exactly the right place to start once AI work stops being a novelty and starts becoming an operating function.
The most important technical lesson: idle autonomy is not free
Paperclip’s heartbeat model is elegant. Agents do not run continuously; they wake, work in a bounded execution window, then report back. That keeps execution legible and makes intervention easier.
But there is a critical engineering lesson hiding in the project’s issue tracker: scheduled heartbeats can still spend tokens even when there is no useful work to do, because waking a full runtime can itself be expensive. Paperclip maintainers and contributors have discussed the need for cheaper triage before full adapter invocation precisely because timer-based wakeups can waste spend.
That is a valuable insight far beyond Paperclip.
A badly designed multi-agent system does not only cost money when it is productive.
It can cost money when it wakes up, thinks hard, and concludes that absolutely nothing interesting is happening.
We have all worked with someone like that.
The difference here is that the invoice arrives in tokens.
A deployment pattern we would actually recommend
The best thing about Paperclip is also the easiest thing to misuse: it gives teams a lot of structure. That does not mean teams should start by building a ten-agent AI conglomerate before lunch.
We would not.
We would start smaller.
Phase 1: choose one operating lane
Pick one contained function:
- QA triage
- release documentation
- growth experiments
- support escalation
- internal tooling maintenance
- bug backlog grooming
The goal of the first deployment is not to look futuristic.
It is to learn where autonomy helps, where it leaks money, and where governance actually needs to sit.
Phase 2: use one manager and two workers
A clean first org looks like this:
- one manager agent
- one execution worker
- one review worker
- one human board
That gives the system a simple shape:
plan, execute, review, govern.
If the system struggles, the source of failure is easier to diagnose.
If you start with seven peer agents all coordinating laterally, you may build something impressive-looking, but it will be much harder to reason about ownership, cost, and failure modes.
Phase 3: budget before scale
Paperclip makes budget part of the model. Good. Keep it there.
A practical pattern is:
- cheap scout agents
- mid-cost manager agents
- expensive specialist agents
The exact numbers will vary by workload, but the architectural principle holds:
not every worker needs the biggest model, the widest context, or the highest spend ceiling.
That is not optimization theater. It is operational sanity.
Phase 4: reserve approvals for strategic moves
Paperclip’s board model is strongest when used around high-consequence actions, not every tiny state transition.
Good candidates for approval:
- new agent creation
- company strategy changes
- production deployments
- budget increases
- destructive operations
- customer-facing messaging changes
Bad candidates:
- every draft revision
- every routine task update
- every internal note
Too few approvals and you get chaos.
Too many approvals and you get a very expensive checklist.
Illustrative org design
The examples below are illustrative deployment patterns, not official Paperclip config syntax.
Here is a simple way we would model a first engineering org around Paperclip:
```yaml
company:
  name: "Acme AI Studio"
  goal: "Ship a developer-facing feature to GA in 60 days"

agents:
  - name: "CEO"
    role: "strategy and prioritization"
    reports_to: null
    monthly_budget_usd: 400
    runtime: "claude-code"
  - name: "CTO"
    role: "technical planning and delegation"
    reports_to: "CEO"
    monthly_budget_usd: 300
    runtime: "codex"
  - name: "Engineer"
    role: "implementation"
    reports_to: "CTO"
    monthly_budget_usd: 500
    runtime: "cursor"
  - name: "QA"
    role: "review and regression checks"
    reports_to: "CTO"
    monthly_budget_usd: 150
    runtime: "shell-process"

board_policies:
  approvals_required_for:
    - "new_agent"
    - "prod_deploy"
    - "budget_increase"
    - "customer_facing_change"
```

Why this shape works:
- the CEO does not write code
- the CTO does not do every task
- the Engineer does not invent strategy
- QA is not optional
- the board governs the risky edges
That is not just cleaner for humans. It is easier for the system to enforce.
Illustrative policy logic: route work by consequence
This is not a Paperclip API. It is the kind of application-side policy logic we would put around a Paperclip deployment.
```typescript
type RiskLevel = "low" | "medium" | "high";

interface TaskPolicy {
  risk: RiskLevel;
  requiresBoardApproval: boolean;
  defaultAssignee: string;
}

const policyTable: Record<string, TaskPolicy> = {
  "docs-update": { risk: "low", requiresBoardApproval: false, defaultAssignee: "Docs" },
  "bug-fix": { risk: "medium", requiresBoardApproval: false, defaultAssignee: "Engineer" },
  "schema-migration": { risk: "high", requiresBoardApproval: true, defaultAssignee: "CTO" },
  "billing-change": { risk: "high", requiresBoardApproval: true, defaultAssignee: "CEO" },
};

function routeTask(taskType: string): TaskPolicy {
  const policy = policyTable[taskType];
  if (!policy) throw new Error(`No policy registered for ${taskType}`);
  return policy;
}
```

This is the kind of small rule engine that keeps autonomy legible.
Not every task deserves the same freedom.
Not every task deserves the same worker.
And absolutely not every task deserves the same budget.
Illustrative budgeting: give agents salaries, not vibes
Again, not official Paperclip schema. Just a useful deployment pattern.
```python
AGENT_SALARY_PLAN = {
    "Scout": {"monthly_budget_usd": 50, "job": "triage and routing"},
    "Manager": {"monthly_budget_usd": 200, "job": "planning and escalation"},
    "Specialist": {"monthly_budget_usd": 600, "job": "deep execution"},
}
```

If an agent only needs to classify, route, or decide whether work exists, do not hand it the most expensive runtime you have.
That is the AI equivalent of hiring a senior staff engineer to check whether the meeting room projector is plugged in.
Illustrative heartbeat control: triage before expensive work
Paperclip’s heartbeat model is real. The policy below is ours. It is the sort of cheap decision layer we would want around heartbeat-heavy deployments to keep idle cost under control.
```typescript
type TriageDecision =
  | { action: "sleep"; reason: string }
  | { action: "work"; worker: string; priority: "low" | "medium" | "high" }
  | { action: "escalate"; manager: string; reason: string };

function triageHeartbeat(input: {
  hasAssignedIssue: boolean;
  waitingOnApproval: boolean;
  blocked: boolean;
  remainingBudgetUsd: number;
}): TriageDecision {
  if (input.remainingBudgetUsd < 10) {
    return { action: "sleep", reason: "Budget nearly exhausted" };
  }
  if (input.waitingOnApproval) {
    return { action: "sleep", reason: "Awaiting board decision" };
  }
  if (input.blocked) {
    return { action: "escalate", manager: "CTO", reason: "Blocked on dependency or policy" };
  }
  if (input.hasAssignedIssue) {
    return { action: "work", worker: "Specialist", priority: "medium" };
  }
  return { action: "sleep", reason: "No actionable work" };
}
```

This may look simple.
That is the point.
The best way to reduce waste in agent systems is often not more reasoning. It is better gating.
How Paperclip compares to adjacent tools
This part matters, because Paperclip is not trying to solve the same layer as every other framework.
Paperclip vs LangGraph
LangGraph is built for long-running, stateful workflows and agents with persistence and durable execution. Its docs explicitly describe durability modes and show how state can be persisted to resume after failure or human interruption.
Paperclip solves a different layer. It gives teams a company-level operating model: org charts, budgets, issues, goals, board governance, and heartbeat-based management of external runtimes.
The cleanest framing is:
- LangGraph helps build the workflow brain.
- Paperclip helps run a workforce of agents like an organization.
Those are not necessarily competing roles.
In many stacks, they complement each other.
Paperclip vs CrewAI
CrewAI focuses on orchestrating autonomous AI agents through agents, crews, and flows. Its official docs emphasize collaborative crews and structured, event-driven flows.
Paperclip’s center of gravity is more managerial than collaborative.
- CrewAI asks: how should agents work together?
- Paperclip asks: who works here, who reports to whom, who owns the task, what is the budget, and who can step in?
That is not a subtle distinction once a system becomes operational.
It is the difference between designing a workflow and designing an organization.
Paperclip vs OpenHands
OpenHands is explicitly focused on AI-driven software development. Its current docs describe the Software Agent SDK as a composable Python library for building software agents, and position it as purpose-built for software engineering.
Paperclip is broader. It is designed to manage mixed-role AI companies rather than only coding agents. That makes it especially interesting for teams spanning engineering, product, operations, and go-to-market work.
So the simplest stack-level summary is:
- LangGraph = workflow engine
- CrewAI = collaborative orchestration
- OpenHands = software-agent stack
- Paperclip = management layer for an AI workforce
That is the sentence many teams have been missing.
What developers should pay attention to
The most interesting thing about Paperclip is not the dashboard.
It is the shift from prompt engineering to organizational engineering.
The hard problems become:
- role design
- escalation design
- budget design
- heartbeat design
- approval design
- failure recovery design
- auditability design
Those are healthier problems.
Because these are the decisions that still matter when the models change.
A prompt can go stale quickly. A clean operating model ages much better.
What AI VPs should pay attention to
For leadership, the biggest promise is not “more autonomy.”
It is managed autonomy.
Paperclip creates a bridge between experiments and operations:
- work can be attached to goals
- spend can be attached to workers
- approvals can be attached to policy
- interventions can be attached to governance
- runtime activity can be attached to a visible audit trail
That is the path from a clever demo to a credible operating system.
Without that layer, most AI orgs end up in the same awkward middle:
- lots of promise
- some real wins
- weak visibility
- unclear accountability
- fragile trust
Paperclip is compelling because it attacks that exact stage.
Real caveats teams should not ignore
This is where the adult conversation starts.
Paperclip is promising, but it does not magically solve operational design.
A bad org chart is still bad.
A vague mission is still vague.
A noisy heartbeat policy is still noisy.
A sloppy approval model is still sloppy.
And there are real operational concerns teams should plan for.
Open project discussions have highlighted that timer-based heartbeats can create unnecessary spend. Other issues show that local autonomous runtimes can get stuck on approval/sandbox prompts, or need smoother recovery when server restarts interrupt running processes. Those are not reasons to dismiss the platform. They are reminders that once agents are treated like workers, runtime resilience and policy design become part of the job.
That should shape how teams deploy:
- favor event-driven wakeups where possible
- isolate risky workers
- keep logs and heartbeat metrics visible
- design pause and intervention paths up front
- avoid interactive approval traps inside autonomous runtimes
In other words: do not just install a management layer.
Operate like you mean it.
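The first recommendation above — event-driven wakeups — is worth one concrete sketch. Instead of waking every worker on a timer, wake a worker only when something relevant happens. This is our deployment pattern, not a Paperclip feature:

```typescript
// Wake workers on events (issue assigned, approval granted, dependency cleared)
// instead of on a timer, so idle heartbeats never reach an expensive runtime.
type WakeEvent =
  | { kind: "issue_assigned"; worker: string }
  | { kind: "approval_granted"; worker: string }
  | { kind: "dependency_cleared"; worker: string };

class WakeupQueue {
  private pending = new Map<string, WakeEvent[]>();

  publish(event: WakeEvent): void {
    const events = this.pending.get(event.worker) ?? [];
    events.push(event);
    this.pending.set(event.worker, events);
  }

  // The scheduler checks this instead of firing blind timer heartbeats.
  shouldWake(worker: string): boolean {
    return (this.pending.get(worker) ?? []).length > 0;
  }

  // Hand the worker its reasons for waking, then clear them.
  drain(worker: string): WakeEvent[] {
    const events = this.pending.get(worker) ?? [];
    this.pending.delete(worker);
    return events;
  }
}
```

A worker with an empty queue simply stays asleep — no adapter launch, no tokens, no expensive conclusion that nothing interesting is happening.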
The key takeaway
Paperclip’s core insight is simple and important:
multi-agent systems do not fail only because the agents are weak; they fail because nobody designed management around them.
That is why this project feels fresh.
Not because it promises infinite autonomy.
Because it starts from the more believable premise that autonomy needs:
- structure
- budgets
- reporting lines
- bounded execution
- approvals
- intervention
- traceability
That is a much more mature foundation.
And probably a much more durable one.
The next era of AI will not be defined by who built the most impressive standalone agent demo.
It will be defined by who learned how to manage a fleet of AI workers without losing cost discipline, visibility, and control.
That is why Paperclip matters.
It is not asking, “How do we make the bot more magical?”
It is asking the better question:
How do we make AI work run like an organization?
That is a question with a long shelf life.
— Cohorte Team
March 16, 2026.