Agentic Workflows
Describe the outcome in English. Jetty handles execution.
The Idea
A runbook is a markdown document that becomes an AI agent's mission. You describe what you want — in plain language, with as much or as little structure as you need — and Jetty provisions an isolated sandbox, installs the agent, uploads your files, and lets the agent work autonomously until the job is done.
This is for tasks that are too complex for a single LLM call but too valuable to run without persistence and observability.
Sandbox Execution Lifecycle
What Happens
- Provision a sandbox — Isolated container from a custom image, pre-built snapshot, or Dockerfile build
- Install the agent — Claude Code, Codex, or Gemini CLI
- Upload files — PDFs, CSVs, images, code — anything the agent needs
- Inject the runbook — Your system prompt becomes the agent's mission plan
- Agent executes freely — Shell, Python, network, file I/O — like a developer on their own machine
- Collect artifacts — Everything written to `/app/results/` is persisted to cloud storage
- Record the trajectory — Full execution history for replay and evaluation
- Return results — Structured response with file URLs, streamed in real time via SSE
Each sandbox is ephemeral. When execution completes, the sandbox is destroyed — but every artifact and log survives in persistent storage.
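To make the lifecycle concrete, here is a local analogy in Python. It is a sketch, not Jetty's implementation: a temporary directory plays the sandbox, a `results/` subdirectory plays `/app/results/`, a plain directory plays cloud storage, and `run_ephemeral` and `toy_agent` are hypothetical names chosen for illustration.

```python
import shutil
import tempfile
from pathlib import Path
from typing import Callable


def run_ephemeral(
    agent: Callable[[Path], None],
    inputs: dict[str, bytes],
    storage: Path,
) -> list[Path]:
    """Run `agent` in a throwaway directory and keep only what it writes to results/."""
    # The TemporaryDirectory is the "sandbox": it is deleted when the with-block exits.
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        # Upload files into the sandbox.
        uploads = root / "uploads"
        uploads.mkdir()
        for name, data in inputs.items():
            (uploads / name).write_bytes(data)
        # Stand-in for /app/results/.
        results = root / "results"
        results.mkdir()
        # Let the agent work freely inside the sandbox root.
        agent(root)
        # Collect artifacts: copy everything under results/ to persistent storage.
        saved = []
        for src in sorted(results.rglob("*")):
            if src.is_file():
                dest = storage / src.relative_to(results)
                dest.parent.mkdir(parents=True, exist_ok=True)
                shutil.copy2(src, dest)
                saved.append(dest)
    # The sandbox directory is gone here; the copies in `storage` remain.
    return saved


def toy_agent(root: Path) -> None:
    """Hypothetical agent: counts CSV lines and writes a one-line report."""
    rows = (root / "uploads" / "data.csv").read_text().splitlines()
    (root / "results" / "report.txt").write_text(f"{len(rows)} rows")
```

The point of the shape is the separation: nothing the agent leaves outside `results/` survives, which is why runbooks should name their output paths explicitly.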
API Call
Add a `jetty` block to a standard `/v1/chat/completions` request:

```json
{
  "model": "claude-sonnet-4-6",
  "messages": [
    {"role": "system", "content": "You are a data analyst. Read the uploaded CSV..."},
    {"role": "user", "content": "Analyze this dataset and produce a report"}
  ],
  "stream": true,
  "jetty": {
    "runbook": true,
    "collection": "my-org",
    "task": "analyze-data",
    "agent": "claude-code",
    "file_paths": ["uploads/dataset.csv"]
  }
}
```
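The same request from Python, as a sketch using only the standard library. The base URL, the bearer-token `Authorization` header, and the `build_runbook_request` helper are assumptions for illustration; check the Chat Completions Reference for the actual host and auth scheme. The payload fields mirror the JSON above, and the allowed `agent` values come from the Agents table on this page.

```python
import json
import urllib.request
from typing import Optional

# Agent values from the Agents table.
AGENTS = {"claude-code", "codex", "gemini-cli"}


def build_runbook_request(
    base_url: str,  # hypothetical, e.g. "https://api.example.com"
    api_key: str,
    system_prompt: str,
    user_message: str,
    *,
    collection: str,
    task: str,
    agent: str = "claude-code",
    file_paths: Optional[list[str]] = None,
    model: str = "claude-sonnet-4-6",
) -> urllib.request.Request:
    """Build (but do not send) a chat completions request with a jetty block."""
    if agent not in AGENTS:
        raise ValueError(f"unknown agent {agent!r}")
    jetty = {"runbook": True, "collection": collection, "task": task, "agent": agent}
    if file_paths:
        jetty["file_paths"] = list(file_paths)
    body = {
        "model": model,
        "messages": [
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": user_message},
        ],
        "stream": True,
        # Without this key the call is a plain LLM passthrough.
        "jetty": jetty,
    }
    return urllib.request.Request(
        f"{base_url.rstrip('/')}/v1/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            # Assumption: bearer-token auth.
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )
```

Send it with `urllib.request.urlopen(req)`; because `stream` is true, read the response incrementally rather than all at once.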
Without the jetty block, it's a standard LLM passthrough — streaming tokens from any of 100+ providers. With it, Jetty launches a full agent execution.
See Chat Completions Reference for the full endpoint spec.
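Streamed results arrive as server-sent events (SSE). Below is a minimal parser, assuming the OpenAI-compatible convention of one JSON object per `data:` event and a closing `data: [DONE]` sentinel. Agent runs may emit other event shapes, so treat the payload structure as an assumption; `iter_sse_json` is a hypothetical helper name.

```python
import json
from typing import Iterable, Iterator


def iter_sse_json(lines: Iterable[str]) -> Iterator[dict]:
    """Yield each SSE event's data payload parsed as JSON.

    Fields other than `data` (event, id, retry) are ignored. Per the SSE spec,
    an event is dispatched on a blank line; a trailing event with no blank
    line after it is discarded.
    """
    buf: list[str] = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            # Blank line: dispatch the buffered event, if any.
            if buf:
                data = "\n".join(buf)
                buf = []
                if data == "[DONE]":
                    return
                yield json.loads(data)
            continue
        if line.startswith(":"):
            continue  # comment or keep-alive
        field, _, value = line.partition(":")
        if value.startswith(" "):
            value = value[1:]  # strip a single leading space
        if field == "data":
            buf.append(value)
```

With `urlopen`, the response yields bytes, so feed the parser `(line.decode("utf-8") for line in resp)`.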
Writing Runbooks
A runbook is not a prompt template. It's a full specification: what to do, in what order, with what tools, and how to validate output.
Example: pdf2mlcroissant — turning academic papers into machine-readable dataset metadata:
```markdown
# Mission: Extract Dataset Metadata
1. Install the `mlcroissant` Python package
2. Read the uploaded PDF
3. Extract dataset metadata — name, creators, structure, distributions
4. Cross-reference HuggingFace APIs if a dataset URL is provided
5. Build a Croissant JSON-LD metadata file
6. Validate against the schema — fix errors up to 3 times
7. Write a summary report
8. Verify all output files exist and are non-empty
```
The agent handles the entire pipeline autonomously. The runbook is your contract; the agent figures out the implementation.
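Step 8 is the kind of check an agent can script for itself. A minimal Python version, assuming outputs land in a results directory; the file names used here (`croissant.json`, `summary.md`) and the `missing_or_empty` helper are hypothetical, since the runbook does not name them.

```python
import tempfile
from pathlib import Path


def missing_or_empty(results_dir: Path, expected: list[str]) -> list[str]:
    """Return the expected output files that are absent or zero bytes long."""
    problems = []
    for name in expected:
        path = results_dir / name
        # is_file() is False both for missing paths and for directories.
        if not path.is_file() or path.stat().st_size == 0:
            problems.append(name)
    return problems


if __name__ == "__main__":
    # Demo against a throwaway directory; in a real run this would be /app/results/.
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / "croissant.json").write_text("{}")
        (root / "summary.md").write_text("")  # empty, so it gets flagged
        print(missing_or_empty(root, ["croissant.json", "summary.md"]))
```

A non-empty return value is a concrete failure signal the agent can act on before declaring success.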
Runbook Best Practices
- Be specific about outputs — Tell the agent exactly what files to produce and where to write them (`/app/results/`)
- Include validation steps — Have the agent check its own work before declaring success
- Set iteration limits — "Fix errors up to 3 times" prevents infinite loops
- Specify tools — If you need a specific package or CLI, say so explicitly
- Define success criteria — What does "done" look like?
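In a runbook you state an iteration limit in English ("fix errors up to 3 times"). Written as code, the pattern is a bounded validate-and-fix loop that ends by reporting any remaining errors instead of spinning forever. This is an illustrative sketch; `validate` and `fix` are hypothetical placeholders (for example, a schema validator and an agent repair step).

```python
from typing import Callable, TypeVar

T = TypeVar("T")


def validate_with_retries(
    artifact: T,
    validate: Callable[[T], list[str]],
    fix: Callable[[T, list[str]], T],
    max_fixes: int = 3,
) -> tuple[T, list[str]]:
    """Validate, then attempt at most `max_fixes` repairs.

    Returns the final artifact and whatever errors remain, so the caller can
    report failure explicitly (an empty list means success).
    """
    errors = validate(artifact)
    attempts = 0
    while errors and attempts < max_fixes:
        artifact = fix(artifact, errors)
        errors = validate(artifact)
        attempts += 1
    return artifact, errors
```

Returning the leftover errors doubles as a success criterion: "done" means the list is empty.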
Trajectories
Every run produces a trajectory — a complete record of what happened:
- Inputs and outputs
- Intermediate files
- Step timings
- Agent logs
- Token usage and cost
Trajectories are persisted and can be queried, labeled, compared, and replayed. This is how you build evaluation datasets, debug failures, and track quality over time.
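This page does not specify the trajectory schema or query API, so the following is a hypothetical record shape covering a subset of the fields listed above (step timings, token usage, cost), plus a labels set and one aggregation of the kind used to track quality over time. All class, field, and function names are illustrative only.

```python
from dataclasses import dataclass, field


@dataclass
class Step:
    name: str
    seconds: float


@dataclass
class Trajectory:
    run_id: str
    task: str
    succeeded: bool
    steps: list[Step] = field(default_factory=list)
    input_tokens: int = 0
    output_tokens: int = 0
    cost_usd: float = 0.0
    labels: set[str] = field(default_factory=set)  # e.g. human review tags

    def slowest_step(self) -> Step:
        """Useful when debugging where a run spent its time."""
        return max(self.steps, key=lambda s: s.seconds)


def summarize(runs: list[Trajectory], task: str) -> dict:
    """Aggregate all trajectories for one task: volume, pass rate, spend."""
    matching = [r for r in runs if r.task == task]
    passed = sum(r.succeeded for r in matching)
    return {
        "runs": len(matching),
        "success_rate": passed / len(matching) if matching else 0.0,
        "total_cost_usd": sum(r.cost_usd for r in matching),
        "failed_run_ids": [r.run_id for r in matching if not r.succeeded],
    }
```

Failed run IDs are the natural starting point for replay; labeled successes are candidates for an evaluation dataset.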
Agents
The sandbox runs whatever agent CLI you specify:
| Agent | Value |
|---|---|
| Claude Code | claude-code |
| OpenAI Codex | codex |
| Gemini CLI | gemini-cli |
The pattern is extensible — any tool that takes an instruction and produces files can be a Jetty agent.
What You Can Build
- Document processing — Extract, transform, validate, and report on uploaded files
- Code generation & testing — Generate code, run tests in a real environment, iterate on failures
- Data analysis — Upload datasets, run Python, produce visualizations and reports
- Research automation — Multi-step literature analysis, data extraction, cross-referencing
- Browser automation — Web scraping and end-to-end testing from inside the sandbox
Next Steps
- Quickstart Guide — Run your first runbook
- Writing Runbooks — Full runbook structure, evaluation patterns, and the `/create-runbook` wizard
- Workflow Orchestration — Chain multiple steps into pipelines
- Chat Completions Reference — Full API spec