PI and Hermes: Building AI Agents That Actually Ship
What separates toy demos from production-ready agents? We break down the architecture behind PI (a coding agent harness) and Hermes (an open-weights agent framework) — and what we learned wiring them into real client work.

Every week a new “agent framework” launches. Most die in a GitHub repo with three stars and a founder’s Substack. The ones that survive — the ones we actually wire into client systems — share a handful of unglamorous traits. PI and Hermes are two projects that, in different ways, get these traits right. Here’s what we’ve learned deploying both.
The pattern that separates demos from products
Demos answer the question “can the model call a tool?” Products answer a much harder one: “what does this agent do when the tool returns garbage, the user is wrong, or the budget hits zero in the middle of step 14?” The difference is not the model. It’s the harness.
PI is a coding-agent harness: the layer that wraps a model in a loop of tool calls, file edits, and checks. Claude Code, Cursor, and Aider are each built around a layer of this kind. What PI gets right is that it’s small and composable, and it treats the model as a fallible collaborator rather than a god.
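To make "fallible collaborator" concrete, here is a minimal Python sketch of a harness step that checks what the model proposes before anything runs. Everything in it is an assumption for illustration: `StepResult`, `run_step`, and the `{"tool": ..., "args": ...}` JSON shape are hypothetical names, not PI's actual API. The point is only that bad output from the model or a tool becomes a message the loop can react to, never a crash.

```python
import json
from dataclasses import dataclass
from typing import Callable


@dataclass
class StepResult:
    ok: bool
    value: str  # tool output on success, an error message on failure


def run_step(raw_call: str, tools: dict[str, Callable[..., str]]) -> StepResult:
    """Validate a model-proposed tool call, then execute it defensively."""
    try:
        call = json.loads(raw_call)
    except json.JSONDecodeError as exc:
        return StepResult(False, f"invalid JSON: {exc.msg}")
    if not isinstance(call, dict):
        return StepResult(False, "tool call must be a JSON object")

    name = call.get("tool")
    args = call.get("args", {})
    if not isinstance(name, str) or name not in tools:
        return StepResult(False, f"unknown tool: {name!r}")
    if not isinstance(args, dict):
        return StepResult(False, "args must be a JSON object")

    try:
        return StepResult(True, tools[name](**args))
    except Exception as exc:  # a failing tool must not take the harness down
        return StepResult(False, f"{name} failed: {exc}")


# Hypothetical tool registry for the example.
tools = {"add": lambda a, b: str(a + b)}
good = run_step('{"tool": "add", "args": {"a": 2, "b": 3}}', tools)
bad = run_step("sure, I'll call add for you!", tools)
print(good.ok, bad.ok, bad.value)
```

In a real loop, the failure message would typically be fed back to the model as the result of the step, so it gets a chance to correct itself instead of the run dying.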
Three production-grade principles
- Models fail. Tools fail. Networks fail. The harness must survive all three without corrupting state.
- Every action should be inspectable, replayable, and reversible. If you can't undo a tool call, you can't run the agent unsupervised.
- Budgets are not optional. Token, time, and cost ceilings must be enforced by the harness, not the model.
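The second and third principles can be sketched together in a few lines of Python. This is a simplified illustration under stated assumptions, not how PI or Hermes implement it: `Budget`, `Action`, `run`, and the fixed per-step charge are all hypothetical. A real harness would charge the usage reported by the model API, and rolling back on a blown budget is one policy among several (checkpointing and pausing is another).

```python
import time
from dataclasses import dataclass, field
from typing import Callable


class BudgetExceeded(Exception):
    """Raised by the harness, never by the model."""


@dataclass
class Budget:
    max_tokens: int
    max_seconds: float
    max_cost_usd: float
    tokens: int = 0
    cost_usd: float = 0.0
    started: float = field(default_factory=time.monotonic)

    def charge(self, tokens: int, cost_usd: float) -> None:
        """Record usage and fail fast if any ceiling is crossed."""
        self.tokens += tokens
        self.cost_usd += cost_usd
        if self.tokens > self.max_tokens:
            raise BudgetExceeded("token ceiling reached")
        if self.cost_usd > self.max_cost_usd:
            raise BudgetExceeded("cost ceiling reached")
        if time.monotonic() - self.started > self.max_seconds:
            raise BudgetExceeded("time ceiling reached")


@dataclass
class Action:
    name: str
    do: Callable[[], None]
    undo: Callable[[], None]  # every action ships with its own inverse


def run(actions: list[Action], budget: Budget,
        tokens_per_step: int, cost_per_step: float) -> list[str]:
    """Run actions in order; on any failure, undo completed ones in reverse."""
    journal: list[Action] = []
    try:
        for action in actions:
            budget.charge(tokens_per_step, cost_per_step)  # charge before acting
            action.do()  # assumed atomic: a failed do() leaves nothing to undo
            journal.append(action)
    except Exception:
        for done in reversed(journal):
            done.undo()
        raise
    return [a.name for a in journal]
```

The design choice worth noting is that the ceiling check lives in `charge`, which only the harness calls. The model can ask for a fifteenth step as politely as it likes; it never gets to decide whether the step is affordable.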
Where Hermes fits
Hermes is the open-weights agent framework from Nous Research. Where PI is the harness, Hermes is the runtime — a smaller, more inspectable alternative to LangChain/LlamaIndex that doesn’t try to own your entire stack. We use it when clients need an agent on-device, on-prem, or under a strict license.
“The best agent frameworks feel like they were written by people who have been paged at 3am. Hermes has that energy.”
What this means for your business
If you’re paying an agency to “build an AI agent,” ask them: what’s your fallback when the model hallucinates? Can you replay a failed run? What enforces the cost ceiling? If the answers are vague, you’re buying a demo, not a product.
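For the replay question, a credible answer usually looks something like the sketch below: every tool call is written to a log during the live run, and a replayer later serves those recorded results back to the same agent logic without touching any live system. The names `Recorder`, `Replayer`, `ToolError`, and the toy `agent` function are hypothetical, and the sketch assumes tool arguments and results are JSON-serializable.

```python
import json
from typing import Any, Callable


class ToolError(Exception):
    """Raised when a tool fails, whether live or replayed."""


class Recorder:
    """Runs live tools and appends one JSON line per call to the log."""

    def __init__(self, tools: dict[str, Callable[..., Any]]) -> None:
        self.tools = tools
        self.log: list[str] = []

    def call(self, name: str, **args: Any) -> Any:
        entry = {"tool": name, "args": args, "ok": True, "result": None}
        try:
            entry["result"] = self.tools[name](**args)
        except Exception as exc:
            entry["ok"], entry["result"] = False, str(exc)
        self.log.append(json.dumps(entry))
        if not entry["ok"]:
            raise ToolError(entry["result"])
        return entry["result"]


class Replayer:
    """Serves recorded results in order; no live tool is ever called."""

    def __init__(self, log: list[str]) -> None:
        self.entries = [json.loads(line) for line in log]
        self.pos = 0

    def call(self, name: str, **args: Any) -> Any:
        if self.pos >= len(self.entries):
            raise ToolError("replay ran past the end of the log")
        entry = self.entries[self.pos]
        if (entry["tool"], entry["args"]) != (name, args):
            raise ToolError(f"replay diverged at step {self.pos}")
        self.pos += 1
        if not entry["ok"]:
            raise ToolError(entry["result"])
        return entry["result"]


def agent(call: Callable[..., Any]) -> Any:
    """Toy agent logic: the second step fails on purpose."""
    total = call("add", a=2, b=3)
    return call("divide", a=total, b=0)


tools = {"add": lambda a, b: a + b, "divide": lambda a, b: a / b}
```

Running `agent(Recorder(tools).call)` fails at the second step and leaves a two-line log; running `agent(Replayer(log).call)` against that log fails at the same step with the same message, which is what lets an engineer debug the failure the next morning without re-running anything against production.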
We build on harnesses like PI and runtimes like Hermes because they survive the messy parts of real deployments. That’s the difference between a tweet thread and a feature your team uses every day.
Want this applied to your business?
We deploy AI agents and frontier models into real workflows every week. Book a free 30-minute call and we'll show you what's possible.
