Jetty
One API for AI workflows, evaluation, and agentic execution.
Jetty gives your team a single /v1/chat/completions endpoint that does three things: proxies 100+ LLM providers, orchestrates multi-step evaluation pipelines, and runs autonomous agents in isolated sandboxes. Drop it into any OpenAI-compatible integration — existing SDKs work out of the box.
Start here
Chat Completions API
One endpoint, two modes. Without the jetty block it's a standard LLM proxy with automatic trajectory recording. With the jetty block it provisions a sandbox, runs an agent, and returns structured results.
Works with any OpenAI SDK. Switch providers by changing the model field.
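As a rough illustration of the two modes, the sketch below builds the request for each one using only the Python standard library. The helper name `build_chat_request` is hypothetical. The endpoint URL, the `JETTY_API_TOKEN` variable, and the `jetty` fields (`collection`, `runbook_url`) are copied from the curl examples under "How it works". The JSON content type is an assumption. Nothing is sent over the network.

```python
import json
import os
import urllib.request

# Endpoint as shown in the curl examples on this page.
JETTY_URL = "https://flows-api.jetty.io/v1/chat/completions"


def build_chat_request(model, prompt, jetty=None, token=None):
    """Build (but do not send) a chat completions request.

    Hypothetical helper for illustration. With jetty=None the body is a
    plain OpenAI-style request (passthrough mode); passing a dict adds
    the jetty block (runbook mode).
    """
    body = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    if jetty is not None:
        body["jetty"] = jetty
    token = token or os.environ.get("JETTY_API_TOKEN", "")
    return urllib.request.Request(
        JETTY_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            # Assumption: the API expects a JSON content type.
            "Content-Type": "application/json",
        },
        method="POST",
    )


# Passthrough mode: switching providers only changes the model string.
passthrough = build_chat_request(
    "claude-sonnet-4-6", "Summarize this quarter's metrics"
)

# Runbook mode: same endpoint, plus a jetty block.
runbook = build_chat_request(
    "claude-sonnet-4-6",
    "Run the evaluation suite",
    jetty={
        "collection": "my-evals",
        "runbook_url": "https://raw.githubusercontent.com/org/repo/main/RUNBOOK.md",
    },
)
```

Either object can be sent by passing it to `urllib.request.urlopen`. The only difference between the two bodies is the presence of the `jetty` key.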
Architecture
Understand how Jetty's three engines — Passthrough, Workflow, and Runbook — connect through a single API layer to persistence, tracing, and object storage. See how Collections, Tasks, and Trajectories organize your work.
Build with Jetty
Writing Runbooks
A runbook is a structured markdown file that tells a coding agent how to accomplish a complex task end-to-end, with evaluation loops, iteration, and quality gates. For tasks where the first attempt is rarely sufficient, a runbook encodes your domain expertise into a repeatable process.
Covers the canonical structure, frontmatter schema, evaluation patterns, common pitfalls, and the /create-runbook wizard.
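The control flow a runbook describes can be pictured as an attempt, evaluate, retry loop. The sketch below is a minimal Python illustration of that pattern, not Jetty's implementation: `run_with_quality_gate`, `attempt`, `evaluate`, the 0 to 1 score scale, and the default threshold are all hypothetical.

```python
from typing import Callable, List, Optional, Tuple


def run_with_quality_gate(
    attempt: Callable[[Optional[str]], str],
    evaluate: Callable[[str], Tuple[float, str]],
    threshold: float = 0.8,
    max_iterations: int = 3,
) -> Tuple[Optional[str], List[float]]:
    """Retry until a result passes the gate or the budget runs out.

    attempt(feedback) produces a candidate result; evaluate(result)
    returns (score, feedback). Both are stand-ins for whatever the agent
    and the evaluator actually do.
    """
    scores: List[float] = []
    feedback: Optional[str] = None
    for _ in range(max_iterations):
        result = attempt(feedback)
        score, feedback = evaluate(result)
        scores.append(score)
        if score >= threshold:  # quality gate passed
            return result, scores
    return None, scores  # gate never passed within the budget


# Toy stand-ins: each retry appends a revision marker, and the score
# grows with the number of revisions.
def toy_attempt(feedback: Optional[str]) -> str:
    return "draft" if feedback is None else feedback + "+rev"


def toy_evaluate(result: str) -> Tuple[float, str]:
    score = 0.5 + 0.2 * result.count("+rev")
    return score, result


result, scores = run_with_quality_gate(toy_attempt, toy_evaluate)
```

With the toy functions, the first two attempts score below the threshold and the third passes, so the loop stops after three iterations.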
Agent Skill & MCP
Use Jetty directly from Claude Code, Cursor, Windsurf, VS Code Copilot, Zed, or Gemini CLI. Your agent can create workflows, kick off runs, inspect trajectories, run evaluations, and browse 40+ step templates — all without leaving your editor.
Three connection methods: Claude Code plugin (/jetty commands), MCP server, or raw REST API.
Guides
Hands-on tutorials for real workflows:
- Agentic quickstart — Upload a file, run an agent in a sandbox, retrieve artifacts. 5 minutes.
- Evaluating LLMs — Build evaluation pipelines with LLM-as-Judge scoring.
- Custom benchmarks — Upload agents and datasets to TerminalBench at runtime.
- CI integration — Trigger workflows from GitHub Actions with quality gates.
- Brand compliance — Automated content review against your guidelines.
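For the CI integration guide's quality gates, one plausible shape is a small script that turns evaluation scores into a process exit code, since a GitHub Actions step fails when its command exits non-zero. The sketch below assumes scores are floats between 0 and 1. The function name `gate_exit_code` and both thresholds are hypothetical, and how scores are actually fetched from Jetty is not shown.

```python
import statistics
from typing import Sequence


def gate_exit_code(
    scores: Sequence[float],
    min_mean: float = 0.8,
    min_each: float = 0.5,
) -> int:
    """Return 0 if the scores pass the gate, 1 otherwise.

    Hypothetical policy: the mean must reach min_mean and no single
    score may fall below min_each. An empty score list fails the gate.
    """
    if not scores:
        return 1
    if statistics.mean(scores) < min_mean:
        return 1
    if min(scores) < min_each:
        return 1
    return 0


# Mean is about 0.82 and every score is at least 0.5, so the gate passes.
passing = gate_exit_code([0.9, 0.85, 0.7])

# Mean is 0.85, but one score is below min_each, so the gate fails.
failing = gate_exit_code([1.0, 1.0, 1.0, 0.4])
```

In a workflow step, the return value would be handed to `sys.exit` so the job status reflects the gate.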
Quick Start
Get running fast with progressive tutorials:
- 60 seconds — Generate an image in one API call.
- Setup — Get your API token and configure model keys.
- First flow — Create and run your first workflow.
- Model comparison — Compare GPT-4, Claude, and Gemini side-by-side (5 min).
- Agent benchmarking — Test coding agents with TerminalBench (10 min).
How it works
# Passthrough mode: standard LLM proxy, every call recorded as a trajectory
curl https://flows-api.jetty.io/v1/chat/completions \
  -H "Authorization: Bearer $JETTY_API_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "claude-sonnet-4-6",
    "messages": [{"role": "user", "content": "Summarize this quarter'\''s metrics"}],
    "stream": true
  }'
# Runbook mode: add a jetty block to run an agent in an isolated sandbox
curl https://flows-api.jetty.io/v1/chat/completions \
  -H "Authorization: Bearer $JETTY_API_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "claude-sonnet-4-6",
    "messages": [{"role": "user", "content": "Run the evaluation suite"}],
    "jetty": {
      "collection": "my-evals",
      "runbook_url": "https://raw.githubusercontent.com/org/repo/main/RUNBOOK.md"
    }
  }'
Both modes return trajectories — full execution traces with inputs, outputs, and metadata — so you always have observability into what happened and why.
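The passthrough example sets "stream": true. OpenAI-compatible APIs conventionally stream Server-Sent Events, where each `data:` line carries a JSON chunk and `data: [DONE]` marks the end. Assuming Jetty follows that convention (this page does not confirm it), the streamed text could be reassembled roughly as follows. `collect_stream_text` is a hypothetical helper and the sample lines are made up for illustration.

```python
import json
from typing import Iterable


def collect_stream_text(lines: Iterable[str]) -> str:
    """Join the content deltas from OpenAI-style SSE lines."""
    parts = []
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank separator lines and other SSE fields
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break  # end-of-stream sentinel
        chunk = json.loads(payload)
        for choice in chunk.get("choices", []):
            delta = choice.get("delta", {})
            # Some chunks carry only a role or no content at all.
            parts.append(delta.get("content") or "")
    return "".join(parts)


# Made-up sample stream in the conventional chunk format.
sample = [
    'data: {"choices": [{"delta": {"role": "assistant"}}]}',
    "",
    'data: {"choices": [{"delta": {"content": "Revenue "}}]}',
    'data: {"choices": [{"delta": {"content": "grew."}}]}',
    "data: [DONE]",
]
text = collect_stream_text(sample)
```

In practice the lines would come from the HTTP response body, decoded and read one line at a time.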
More resources
- Step Library: 47+ pre-built activities for AI models, control flow, data processing, and evaluation.
- Examples: Copy-paste workflow JSON for common tasks such as chat, image generation, batch processing, and translation.
- API Reference: Chat completions, webhooks, GitHub PR integration, and authentication.