Agentic Workflows

Describe the outcome in English. Jetty handles execution.

The Idea

A runbook is a markdown document that becomes an AI agent's mission. You describe what you want — in plain language, with as much or as little structure as you need — and Jetty provisions an isolated sandbox, installs the agent, uploads your files, and lets the agent work autonomously until the job is done.

This is for tasks that are too complex for a single LLM call but too valuable to run without persistence and observability.

Sandbox Execution Lifecycle

What Happens

1. Provision a sandbox: an isolated container from a custom image, pre-built snapshot, or Dockerfile build
2. Install the agent: Claude Code, Codex, or Gemini CLI
3. Upload files: PDFs, CSVs, images, code, or anything else the agent needs
4. Inject the runbook: your system prompt becomes the agent's mission plan
5. Agent executes freely: shell, Python, network, and file I/O, like a developer on their own machine
6. Collect artifacts: everything written to /app/results/ is persisted to cloud storage
7. Record the trajectory: full execution history for replay and evaluation
8. Return results: a structured response with file URLs, streamed in real time via SSE

Each sandbox is ephemeral. When execution completes, the sandbox is destroyed — but every artifact and log survives in persistent storage.
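The last lifecycle step streams results over SSE. The sketch below shows one way a client might read such a stream, assuming standard Server-Sent Events framing (`data:` lines, with a blank line ending each event). The `[DONE]` sentinel and the `type`, `text`, and `url` fields in the sample are illustrative assumptions, not Jetty's documented event schema.

```python
import json
from typing import Any, Dict, Iterable, Iterator, List


def iter_sse_data(lines: Iterable[str]) -> Iterator[str]:
    """Yield the data payload of each SSE event from an iterable of text lines."""
    buffer: List[str] = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            # A blank line terminates the current event.
            if buffer:
                yield "\n".join(buffer)
                buffer = []
            continue
        if line.startswith(":"):
            continue  # comment line, often used as a keep-alive
        if line.startswith("data:"):
            value = line[len("data:"):]
            if value.startswith(" "):
                value = value[1:]  # SSE strips one leading space
            buffer.append(value)
    if buffer:
        # Lenient: flush a trailing event that lacks a closing blank line.
        yield "\n".join(buffer)


def collect_events(lines: Iterable[str]) -> List[Dict[str, Any]]:
    """Decode JSON events until the (assumed) end-of-stream sentinel."""
    events = []
    for payload in iter_sse_data(lines):
        if payload == "[DONE]":  # assumption: OpenAI-style end sentinel
            break
        events.append(json.loads(payload))
    return events


# Hypothetical stream contents, for illustration only.
sample = [
    'data: {"type": "log", "text": "installing agent"}\n',
    "\n",
    ": keep-alive\n",
    'data: {"type": "artifact", "url": "https://example.invalid/report.md"}\n',
    "\n",
    "data: [DONE]\n",
    "\n",
]
events = collect_events(sample)
print(events)
```

In a real client, `lines` would come from the HTTP response body rather than a list; the parsing logic stays the same.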

API Call

Add a jetty block to a standard /v1/chat/completions request:

{
  "model": "claude-sonnet-4-6",
  "messages": [
    {"role": "system", "content": "You are a data analyst. Read the uploaded CSV..."},
    {"role": "user", "content": "Analyze this dataset and produce a report"}
  ],
  "stream": true,
  "jetty": {
    "runbook": true,
    "collection": "my-org",
    "task": "analyze-data",
    "agent": "claude-code",
    "file_paths": ["uploads/dataset.csv"]
  }
}

Without the jetty block, it's a standard LLM passthrough — streaming tokens from any of 100+ providers. With it, Jetty launches a full agent execution.
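To make the passthrough-versus-agent distinction concrete, here is a minimal sketch that builds both request bodies in Python. The helper `build_request` is hypothetical, not part of any Jetty SDK. The base URL and authentication scheme are not covered here, so the sketch stops at constructing the JSON rather than sending it.

```python
import json
from typing import Any, Dict, Optional


def build_request(
    system_prompt: str,
    user_message: str,
    model: str = "claude-sonnet-4-6",
    jetty: Optional[Dict[str, Any]] = None,
) -> Dict[str, Any]:
    """Build a /v1/chat/completions body; add the jetty block only if given."""
    body: Dict[str, Any] = {
        "model": model,
        "messages": [
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": user_message},
        ],
        "stream": True,
    }
    if jetty is not None:
        # With the jetty block present, the request becomes a runbook execution.
        body["jetty"] = {"runbook": True, **jetty}
    return body


# Plain LLM passthrough: no jetty block.
passthrough = build_request("You are a data analyst.", "Summarize the CSV format.")

# Agent execution: same endpoint, plus the jetty block from the example above.
agent_run = build_request(
    "You are a data analyst. Read the uploaded CSV...",
    "Analyze this dataset and produce a report",
    jetty={
        "collection": "my-org",
        "task": "analyze-data",
        "agent": "claude-code",
        "file_paths": ["uploads/dataset.csv"],
    },
)
print(json.dumps(agent_run, indent=2))
```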

See Chat Completions Reference for the full endpoint spec.

Writing Runbooks

A runbook is not a prompt template. It's a full specification: what to do, in what order, with what tools, and how to validate output.

Example: pdf2mlcroissant — turning academic papers into machine-readable dataset metadata:

# Mission: Extract Dataset Metadata

1. Install the `mlcroissant` Python package
2. Read the uploaded PDF
3. Extract dataset metadata: name, creators, structure, distributions
4. Cross-reference HuggingFace APIs if a dataset URL is provided
5. Build a Croissant JSON-LD metadata file
6. Validate against the schema; fix errors up to 3 times
7. Write a summary report
8. Verify all output files exist and are non-empty

The agent handles the entire pipeline autonomously. The runbook is your contract; the agent figures out the implementation.
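The "fix errors up to 3 times" step describes a bounded validate-and-fix loop. How the agent implements it is its own choice; the sketch below is only one plausible shape. The names `validate_with_retries`, `validate`, and `fix` are hypothetical, and the toy metadata check stands in for real schema validation.

```python
from typing import Callable, List


def validate_with_retries(
    validate: Callable[[], List[str]],
    fix: Callable[[List[str]], None],
    max_fixes: int = 3,
) -> bool:
    """Return True once validate() reports no errors, applying at most max_fixes fixes."""
    for attempt in range(max_fixes + 1):
        errors = validate()
        if not errors:
            return True
        if attempt == max_fixes:
            break  # fix budget exhausted; give up instead of looping forever
        fix(errors)
    return False


# Toy stand-in for schema validation: required keys must be present.
metadata = {"name": "demo"}
required = ["name", "creators", "distribution"]


def validate() -> List[str]:
    return [f"missing {key}" for key in required if key not in metadata]


def fix(errors: List[str]) -> None:
    # Repair only the first reported error per pass.
    key = errors[0].split(" ", 1)[1]
    metadata[key] = "TODO"


ok = validate_with_retries(validate, fix)
print(ok, metadata)
```

The loop validates up to `max_fixes + 1` times but never applies more than `max_fixes` fixes, so a run that cannot converge fails explicitly.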

Runbook Best Practices

Be specific about outputs — Tell the agent exactly what files to produce and where to write them (/app/results/)
Include validation steps — Have the agent check its own work before declaring success
Set iteration limits — "Fix errors up to 3 times" prevents infinite loops
Specify tools — If you need a specific package or CLI, say so explicitly
Define success criteria — What does "done" look like?
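A validation step can be as small as a script the runbook asks the agent to run before it declares success. The following sketch checks that expected files exist and are non-empty. The function name `check_outputs` and the file names are hypothetical. The demo uses a temporary directory so it runs anywhere, whereas a real runbook would point it at /app/results/.

```python
import tempfile
from pathlib import Path
from typing import Iterable, List


def check_outputs(results_dir: str, expected: Iterable[str]) -> List[str]:
    """Return a list of problems; an empty list means every file is present and non-empty."""
    problems = []
    root = Path(results_dir)
    for name in expected:
        path = root / name
        if not path.is_file():
            problems.append(f"{name}: missing")
        elif path.stat().st_size == 0:
            problems.append(f"{name}: empty")
    return problems


# Demo against a throwaway directory with one good, one empty, one missing file.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "report.md").write_text("# Summary\n")
    (Path(tmp) / "metadata.json").write_text("")
    problems = check_outputs(tmp, ["report.md", "metadata.json", "chart.png"])

print(problems)
```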

Trajectories

Every run produces a trajectory — a complete record of what happened:

Inputs and outputs
Intermediate files
Step timings
Agent logs
Token usage and cost

Trajectories are persisted and can be queried, labeled, compared, and replayed. This is how you build evaluation datasets, debug failures, and track quality over time.
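As a rough illustration of comparing two runs, the sketch below models a trajectory as a small record and computes per-metric deltas. The `Trajectory` class and all of its field names are assumptions that mirror the list above; they are not Jetty's actual trajectory schema.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Trajectory:
    """Hypothetical trajectory record; field names are illustrative only."""
    run_id: str
    inputs: Dict[str, str]
    outputs: Dict[str, str]
    step_timings_s: Dict[str, float]
    total_tokens: int
    cost_usd: float
    labels: List[str] = field(default_factory=list)

    @property
    def duration_s(self) -> float:
        return sum(self.step_timings_s.values())


def compare(baseline: Trajectory, candidate: Trajectory) -> Dict[str, float]:
    """Return candidate minus baseline for each tracked metric."""
    return {
        "duration_s": candidate.duration_s - baseline.duration_s,
        "total_tokens": candidate.total_tokens - baseline.total_tokens,
        "cost_usd": round(candidate.cost_usd - baseline.cost_usd, 6),
    }


baseline = Trajectory(
    run_id="run-a",
    inputs={"file": "uploads/dataset.csv"},
    outputs={"report": "report.md"},
    step_timings_s={"provision": 4.0, "execute": 30.5},
    total_tokens=18000,
    cost_usd=0.42,
)
candidate = Trajectory(
    run_id="run-b",
    inputs={"file": "uploads/dataset.csv"},
    outputs={"report": "report.md"},
    step_timings_s={"provision": 4.0, "execute": 22.0},
    total_tokens=15000,
    cost_usd=0.30,
    labels=["shorter-runbook"],
)
delta = compare(baseline, candidate)
print(delta)
```

Negative deltas mean the candidate run was faster or cheaper than the baseline.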

Agents

The sandbox runs whatever agent CLI you specify:

Agent	`agent` value
Claude Code	`claude-code`
OpenAI Codex	`codex`
Gemini CLI	`gemini-cli`

The pattern is extensible — any tool that takes an instruction and produces files can be a Jetty agent.

What You Can Build

Document processing — Extract, transform, validate, and report on uploaded files
Code generation & testing — Generate code, run tests in a real environment, iterate on failures
Data analysis — Upload datasets, run Python, produce visualizations and reports
Research automation — Multi-step literature analysis, data extraction, cross-referencing
Browser automation — Sandboxes with web scraping and end-to-end testing

Scheduling Runbooks

Anything you can run once via POST /run/{collection}/{task} can also run on a recurring schedule as a routine. A routine invokes the same FlowWorkflow.run that the run endpoint does, with the same workflow input, trajectory recording, and artifact persistence. It adds two things: an init_params_overrides blob, merged on top of the task's defaults at fire time, and a triggered_by_routine_id tag on the resulting trajectory, so runs are queryable per routine.

Cadences cover the common cases (manual, hourly, daily, weekdays, weekly); cron is deferred to v2. Because the run pipeline is shared, anything you observe about a one-shot run (sandbox lifecycle, artifact paths, trajectory shape) applies identically to a scheduled fire.
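The override behavior can be pictured as a dictionary merge at fire time. The sketch below assumes a shallow merge in which override keys win; whether Jetty merges nested values deeply is not stated here. The function `fire_routine`, the shape of the `routine` dict, and the parameter names inside it are hypothetical.

```python
from typing import Any, Dict, Tuple

# Cadences listed in the text; cron is not among them.
CADENCES = {"manual", "hourly", "daily", "weekdays", "weekly"}


def fire_routine(
    task_defaults: Dict[str, Any],
    routine: Dict[str, Any],
) -> Tuple[Dict[str, Any], Dict[str, Any]]:
    """Return (init_params, trajectory_tags) for one scheduled fire."""
    if routine["cadence"] not in CADENCES:
        raise ValueError(f"unsupported cadence: {routine['cadence']}")
    # Assumption: shallow merge, with overrides taking precedence over defaults.
    init_params = {**task_defaults, **routine.get("init_params_overrides", {})}
    # The tag makes the resulting trajectory queryable per routine.
    trajectory_tags = {"triggered_by_routine_id": routine["id"]}
    return init_params, trajectory_tags


task_defaults = {"agent": "claude-code", "file_paths": ["uploads/dataset.csv"]}
routine = {
    "id": "routine-123",
    "cadence": "daily",
    "init_params_overrides": {"file_paths": ["uploads/latest.csv"]},
}
init_params, tags = fire_routine(task_defaults, routine)
print(init_params, tags)
```

Building a new dictionary instead of mutating `task_defaults` keeps the task's defaults intact between fires.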

See Scheduling Routines for the walkthrough and Routines API Reference for the endpoint spec.

Next Steps

Quickstart Guide — Run your first runbook
Writing Runbooks — Full runbook structure, evaluation patterns, and the /create-runbook wizard
Scheduling Routines — Put a task on a recurring cadence
Workflow Orchestration — Chain multiple steps into pipelines
Chat Completions Reference — Full API spec
