ferrfleet by ferrlabs
Public beta · ships Q3 2026

Run AI agents in production.
Without losing money.

An infra layer for shipping LLM agents that don't fall over at 3am. Queues, retries, fallback models, cost ledger, eval harness — opinionated defaults so you can stop reinventing them.

Read the docs → See a live fleet
Live · req_8x9k2j · invoice-extract
$ ferrfleet dispatch
→ queue: high · age 0s
→ model: haiku-4-5 (fallback chain ready)
→ tools: pdf-parse, llm-extract
→ tokens in: 418
✓ checkpoint: parse-pdf (180ms)
✓ checkpoint: extract-fields (640ms)
→ tokens out: 1.1k
✓ done in 940ms · cost $0.018 · margin $0.121
P&L · today
CALLS 24,979 · MARGIN $776 ↑ +22%
invoice-extract · +$0.121
support-triage · +$0.033
pr-reviewer · +$0.088
sql-writer · +$0.071
Eight things, done right.

Most agent platforms are demos in disguise.

LangChain is a toolkit, not a runtime. Make.com is a no-code toy. We're the boring layer between your prompt and your bill — durable, observable, replayable.

01

Durable execution

Every step is checkpointed. Process crashes, network blips, model timeouts — pick up where you left off, no double-charging the user.
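To make the idea concrete, here is a minimal sketch of step-level checkpointing in plain Python. It is not ferrfleet's API: `CheckpointedRun`, the JSON file layout, and the demo steps are all assumptions made for illustration.

```python
import json
import os
import tempfile
from typing import Any, Callable


class CheckpointedRun:
    """Hypothetical helper: persist each step's result so a restart skips finished steps."""

    def __init__(self, path: str) -> None:
        self.path = path
        self.done: dict[str, Any] = {}
        if os.path.exists(path):
            with open(path) as f:
                self.done = json.load(f)

    def step(self, name: str, fn: Callable[[], Any]) -> Any:
        # A step that already completed is returned from the checkpoint, not re-run.
        if name in self.done:
            return self.done[name]
        result = fn()
        self.done[name] = result
        # Write to a temp file and rename, so a crash cannot leave a half-written checkpoint.
        tmp = self.path + ".tmp"
        with open(tmp, "w") as f:
            json.dump(self.done, f)
        os.replace(tmp, self.path)
        return result


def demo_resume() -> list[str]:
    """Simulate a timeout in step two, then resume in a 'new process'."""
    executed: list[str] = []
    path = os.path.join(tempfile.mkdtemp(), "run.json")

    def parse_pdf() -> dict:
        executed.append("parse-pdf")
        return {"pages": 2}

    def extract_fields() -> dict:
        executed.append("extract-fields")
        if executed.count("extract-fields") == 1:
            raise TimeoutError("simulated model timeout")
        return {"total": "42.00"}

    first = CheckpointedRun(path)
    first.step("parse-pdf", parse_pdf)
    try:
        first.step("extract-fields", extract_fields)
    except TimeoutError:
        pass  # the first process "dies" here

    resumed = CheckpointedRun(path)  # reloads the checkpoint from disk
    resumed.step("parse-pdf", parse_pdf)  # skipped: already checkpointed
    resumed.step("extract-fields", extract_fields)
    return executed
```

In this sketch `parse-pdf` runs once and `extract-fields` runs twice (the failed attempt plus the retry). A crash between `fn()` returning and the checkpoint write would still re-run that step, so steps with side effects such as charging a user need to be idempotent as well.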

02

Model fallback chains

Sonnet down? Slide to Haiku. Haiku rate-limited? Try GPT-4o. Set the cascade, pin the budget, get a single quality score across providers.
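A cascade like this can be sketched in a few lines of Python. Everything here is an assumption for illustration: the `Model` type, the `ProviderError` exception, the per-call cost estimates, and the stand-in call functions are hypothetical, not ferrfleet's API or any provider's SDK.

```python
from dataclasses import dataclass
from typing import Callable


class ProviderError(Exception):
    """Hypothetical error for a model that is down or rate-limited."""


@dataclass
class Model:
    name: str
    est_cost_usd: float          # assumed cost estimate for one call
    call: Callable[[str], str]   # stand-in for a real provider client


def run_with_fallback(prompt: str, chain: list[Model], budget_usd: float) -> tuple[str, str]:
    """Try each model in order, skipping any whose estimated cost exceeds the budget."""
    errors: list[str] = []
    for model in chain:
        if model.est_cost_usd > budget_usd:
            errors.append(f"{model.name}: over budget")
            continue
        try:
            return model.name, model.call(prompt)
        except ProviderError as exc:
            errors.append(f"{model.name}: {exc}")
    raise RuntimeError("all models failed: " + "; ".join(errors))


def _down(prompt: str) -> str:
    raise ProviderError("503 unavailable")


def _rate_limited(prompt: str) -> str:
    raise ProviderError("429 rate limited")


def _ok(prompt: str) -> str:
    return "ok: " + prompt


# Placeholder costs; real prices depend on the provider and token counts.
CHAIN = [
    Model("sonnet", 0.030, _down),
    Model("haiku", 0.005, _rate_limited),
    Model("gpt-4o", 0.020, _ok),
]
```

With a budget of $0.05 the call falls through to the third model. With a budget of $0.01 only the second model is affordable, it is rate-limited, and the function raises with the full list of reasons.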

03

Cost ledger, by request

Per-request cost in tokens AND dollars, attributed to user / tenant / agent. Pipe it to Stripe, your data warehouse, or a CSV.
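As a rough sketch of what a per-request ledger might look like, the Python below records tokens and dollars for each request and rolls them up by any attribution field. The `LedgerEntry` fields, the helper names, and the per-million-token prices passed in are assumptions for illustration, not ferrfleet's schema or any provider's price list.

```python
import csv
import io
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class LedgerEntry:
    request_id: str
    tenant: str
    user: str
    agent: str
    tokens_in: int
    tokens_out: int
    cost_usd: float


def cost_usd(tokens_in: int, tokens_out: int, in_per_mtok: float, out_per_mtok: float) -> float:
    """Dollar cost of one request, given prices per million tokens (caller-supplied)."""
    return (tokens_in * in_per_mtok + tokens_out * out_per_mtok) / 1_000_000


def total_by(entries: list[LedgerEntry], field: str) -> dict[str, float]:
    """Sum cost by 'tenant', 'user', or 'agent'."""
    totals: dict[str, float] = defaultdict(float)
    for entry in entries:
        totals[getattr(entry, field)] += entry.cost_usd
    return dict(totals)


def to_csv(entries: list[LedgerEntry]) -> str:
    """Export the ledger as CSV text, one row per request."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(["request_id", "tenant", "user", "agent", "tokens_in", "tokens_out", "cost_usd"])
    for e in entries:
        writer.writerow([e.request_id, e.tenant, e.user, e.agent,
                         e.tokens_in, e.tokens_out, f"{e.cost_usd:.6f}"])
    return buf.getvalue()
```

Keeping one immutable row per request means the same data can be grouped by user, tenant, or agent after the fact, instead of deciding the attribution at write time.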

04

Eval harness in CI

Run your eval set on every PR. Block merges that regress on accuracy, latency, or cost. Diff outputs side-by-side in the PR comment.
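The merge gate can be sketched as a comparison between the base branch's eval results and the PR's. The `EvalResult` fields and the default thresholds below are assumptions chosen for illustration; a real gate would use whatever metrics and tolerances the team agrees on.

```python
from dataclasses import dataclass


@dataclass
class EvalResult:
    accuracy: float        # fraction of eval cases passed, 0..1
    p95_latency_ms: float
    cost_usd: float        # total cost of running the eval set


def regressions(
    base: EvalResult,
    head: EvalResult,
    max_accuracy_drop: float = 0.01,      # absolute drop allowed
    max_latency_increase: float = 0.10,   # relative increase allowed
    max_cost_increase: float = 0.10,      # relative increase allowed
) -> list[str]:
    """Return one message per regressed metric; an empty list means the PR may merge."""
    problems: list[str] = []
    if head.accuracy < base.accuracy - max_accuracy_drop:
        problems.append(f"accuracy {base.accuracy:.3f} -> {head.accuracy:.3f}")
    if head.p95_latency_ms > base.p95_latency_ms * (1 + max_latency_increase):
        problems.append(f"latency p95 {base.p95_latency_ms:.0f}ms -> {head.p95_latency_ms:.0f}ms")
    if head.cost_usd > base.cost_usd * (1 + max_cost_increase):
        problems.append(f"cost ${base.cost_usd:.2f} -> ${head.cost_usd:.2f}")
    return problems
```

A CI job would exit non-zero when the list is non-empty and could post the messages as the PR comment.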

05

Sandboxed tool calls

Every tool runs in an ephemeral container. Network policy, file system, secrets — scoped per-agent. The model can't exfiltrate what it can't reach.

06

Replay any run

A user complains? Pull the run by ID, see the full trace — prompts, tool calls, tokens, model choices. Replay with a different model in one click.

07

Queues with backpressure

Per-agent concurrency limits. Per-tenant rate caps. Priority lanes for paying customers. We push back when downstream is slow — your DB will thank us.
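Here is one way backpressure, priority lanes, and per-agent concurrency limits could fit together, sketched with Python's `asyncio`. The `Dispatcher` class and its parameters are hypothetical, not ferrfleet's dispatcher, and per-tenant rate caps are left out to keep the sketch short.

```python
import asyncio
from typing import Awaitable, Callable


class Dispatcher:
    """Hypothetical dispatcher: bounded priority queue plus per-agent concurrency slots."""

    def __init__(self, max_queued: int, per_agent_limit: int) -> None:
        self.queue: asyncio.PriorityQueue = asyncio.PriorityQueue(maxsize=max_queued)
        self.per_agent_limit = per_agent_limit
        self._slots: dict[str, asyncio.Semaphore] = {}
        self._seq = 0

    def submit(self, agent: str, job: str, paying: bool) -> bool:
        """Queue a job. Returns False (push back) when the queue is full."""
        priority = 0 if paying else 1  # lower number is served first
        self._seq += 1                 # tie-breaker keeps FIFO order within a lane
        try:
            self.queue.put_nowait((priority, self._seq, agent, job))
            return True
        except asyncio.QueueFull:
            return False

    async def run_next(self, handler: Callable[[str, str], Awaitable[str]]) -> str:
        """Take the highest-priority job and run it within its agent's concurrency limit."""
        _, _, agent, job = await self.queue.get()
        slot = self._slots.setdefault(agent, asyncio.Semaphore(self.per_agent_limit))
        async with slot:
            return await handler(agent, job)


async def demo() -> tuple[list[bool], list[str]]:
    dispatcher = Dispatcher(max_queued=2, per_agent_limit=1)
    accepted = [
        dispatcher.submit("support-triage", "free-1", paying=False),
        dispatcher.submit("invoice-extract", "paid-1", paying=True),
        dispatcher.submit("support-triage", "free-2", paying=False),  # queue is full
    ]

    async def handler(agent: str, job: str) -> str:
        return job

    order = [await dispatcher.run_next(handler), await dispatcher.run_next(handler)]
    return accepted, order
```

In the demo the third submission is rejected rather than queued without bound, and the paying customer's job is served first even though it arrived second. Rejecting at `submit` gives the caller the choice to retry later or shed load, which is what protects slow downstream systems.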

08

On-prem or managed

Helm chart for your cluster, or our cloud. Same API, same dashboard, same dispatcher. Move between them without rewriting agent code.

P&L for every agent

Each agent is a small business. Treat it like one.

Margin per request, retries per hour, cost per token. We compute all three, surface them, and pause an agent when its margin goes red.

Margin per call · last 24 h · 5 agents tracked
invoice-extract · cost $0.018 · rev $0.18 · +$0.121 OK
support-triage · cost $0.002 · rev $0.04 · +$0.033 OK
pr-reviewer · cost $0.124 · rev $0.50 · +$0.088 OK
sql-writer · cost $0.041 · rev $0.22 · +$0.071 OK
release-notes · cost $0.082 · rev $0.00 · -$0.296 PAUSED
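The pause-on-red rule can be sketched as follows. This assumes margin per call is total revenue minus total cost over a window, divided by the number of calls; the page does not spell out the exact formula behind the dashboard. `AgentStats`, `enforce_margin`, and the agent names and figures used with them are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class AgentStats:
    name: str
    calls: int
    revenue_usd: float   # total billed in the window
    cost_usd: float      # total spend in the window (assumed to include retries)
    paused: bool = False

    @property
    def margin_per_call(self) -> float:
        if self.calls == 0:
            return 0.0
        return (self.revenue_usd - self.cost_usd) / self.calls


def enforce_margin(agents: list[AgentStats], floor_usd: float = 0.0) -> list[str]:
    """Pause every agent whose margin per call is below the floor; return their names."""
    paused: list[str] = []
    for agent in agents:
        if agent.calls and agent.margin_per_call < floor_usd:
            agent.paused = True
            paused.append(agent.name)
    return paused
```

An agent with no calls in the window is left alone, since there is no margin to judge it on.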
Bring your own

Stack-agnostic by design.

Models
  • Anthropic
  • OpenAI
  • Google
  • Cohere
  • Mistral
  • Local (vLLM)
Vector
  • Pinecone
  • pgvector
  • Weaviate
  • Turbopuffer
  • Qdrant
Observability
  • Datadog
  • Honeycomb
  • OTel
  • Sentry
  • Grafana
Runtime
  • Kubernetes
  • AWS Lambda
  • Cloud Run
  • Fly.io
  • Modal

Agents that survive Monday morning. Dispatch, retries, observability, billing: one platform.