
Workflow Orchestration

Build AI data pipelines with built-in evaluation.

The Idea

Single-agent runs are powerful, but production workloads often need multiple steps: extract data, process it with an LLM, evaluate quality, branch on results. Jetty's workflow engine composes these into DAGs with 47+ step types, path expressions to wire data between steps, and durable execution that survives crashes.

[Diagram: multi-step workflow DAG]

Step Types

| Category | Step Types | Purpose |
| --- | --- | --- |
| AI Models | `litellm_chat`, image generation, vision, embeddings | LLM calls with any provider |
| Evaluation | `simple_judge`, structured evals, scoring | LLM-as-judge, criteria-based assessment |
| Control Flow | Branching, loops, iteration over collections | Conditional logic and repetition |
| Data Processing | Text templates, tool execution, file transforms | Transform and reshape data |
| Agent Execution | `runbook_agent` | Full sandboxed agent runs (see Agentic Workflows) |
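
A workflow built from these step types can be thought of as a plain data structure whose steps reference each other's outputs. A minimal sketch — the field names and schema here are illustrative assumptions, not Jetty's actual workflow format:

```python
# Hypothetical workflow definition: extract text, then judge it.
# Step IDs, field names, and the overall schema are illustrative.
workflow = {
    "name": "extract-and-evaluate",
    "steps": [
        {
            "id": "extract",
            "type": "litellm_chat",
            "inputs": {"prompt": "Extract the key facts from: {document}"},
        },
        {
            "id": "evaluate",
            "type": "simple_judge",
            # A path expression wires extract's output into the judge
            "inputs": {
                "content": "extract.outputs.content",
                "criteria": "All facts are grounded in the source.",
            },
        },
    ],
}

# A DAG edge exists wherever an input references another step's outputs
edges = [
    (ref.split(".")[0], step["id"])
    for step in workflow["steps"]
    for ref in step["inputs"].values()
    if ".outputs." in ref
]
print(edges)  # [('extract', 'evaluate')]
```

The engine can derive the execution order from these references alone, which is what lets it sequence, retry, and resume steps without extra configuration.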

Path Expressions

Steps wire together through path expressions. One step's output becomes the next step's input:

step_a.outputs.files[0].path → becomes input for step_b
extract.outputs.content → text extracted by the LLM
evaluate.outputs.results[0] → first evaluation result

The workflow engine handles sequencing, error recovery, retry logic, and artifact management automatically.
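
Resolving a path expression amounts to walking nested outputs by name and index. A sketch of how such a resolver could work — this is an illustration, not Jetty's actual engine:

```python
import re

def resolve(expr: str, root: dict):
    """Resolve a path expression such as 'step_a.outputs.files[0].path'
    against nested step outputs. Illustrative sketch only."""
    value = root
    # Tokens are either dotted names or [N] list indices
    for name, index in re.findall(r"(\w+)|\[(\d+)\]", expr):
        value = value[name] if name else value[int(index)]
    return value

outputs = {
    "step_a": {"outputs": {"files": [{"path": "/tmp/report.txt"}]}},
}
print(resolve("step_a.outputs.files[0].path", outputs))  # /tmp/report.txt
```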

Eval-Driven Quality Gates

The real power of orchestration is closing the loop between execution and evaluation.

The Pattern

  1. Runbook Agent — Executes the task: generate code, analyze data, process documents. Outputs files + structured data.
  2. LLM-as-Judge — Evaluates output against your criteria. Returns a score, explanation, pass/fail, and criteria breakdown.
  3. Quality Gate — Branches on pass/fail. Pass = proceed. Fail = block + explain. Post status to PR.
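
The gate step reduces to a branch on the judge's verdict. A minimal sketch, assuming a judge result shaped like the docs describe (score, explanation, pass/fail) — the function and field names are assumptions, not Jetty's API:

```python
def quality_gate(judge_result: dict, threshold: int = 3) -> dict:
    """Branch on the judge's verdict: pass -> proceed, fail -> block.
    Result shape (score, explanation) is an assumed schema."""
    passed = judge_result["score"] >= threshold
    return {
        "status": "proceed" if passed else "block",
        "explanation": judge_result["explanation"],
    }

result = quality_gate({"score": 2, "explanation": "Unhandled SQL injection risk."})
print(result["status"])  # block
```

In a real pipeline the `explanation` would be posted back to the PR or upload record so the failure is actionable, not just a red X.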

Example Use Cases

| Use Case | Agent Does | Judge Checks | Gate Action |
| --- | --- | --- | --- |
| AI Code Review | Diffs PR for bugs, security, style | Scores quality 1–5 | Block merge if score < 3 |
| Document Validation | Extracts info from uploads | Checks accuracy, completeness | Fail if critical fields missing |
| Test Generation | Generates tests for new code, runs in sandbox | Evaluates correctness, coverage | Gate on pass rate |

Building Evaluation Datasets

Every workflow run produces a trajectory with full execution history. Over time, you accumulate a labeled dataset of quality evaluations:

  • Compare runs across models
  • Track quality trends
  • Replay failures
  • Train on outcomes (which recommendations were accepted vs. rejected)
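
Once runs accumulate, cross-model comparison is a simple aggregation over trajectory records. A sketch with fabricated example records — the record shape and model names are purely illustrative:

```python
from collections import defaultdict

# Hypothetical trajectory records accumulated from workflow runs
trajectories = [
    {"model": "model-a", "passed": True},
    {"model": "model-a", "passed": False},
    {"model": "model-b", "passed": True},
    {"model": "model-b", "passed": True},
]

# Pass rate per model: the kind of comparison the dataset enables
totals = defaultdict(lambda: [0, 0])  # model -> [passes, runs]
for t in trajectories:
    totals[t["model"]][0] += t["passed"]
    totals[t["model"]][1] += 1

rates = {model: passes / runs for model, (passes, runs) in totals.items()}
print(rates)  # {'model-a': 0.5, 'model-b': 1.0}
```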

CI for AI

Workflows integrate directly into your CI pipeline via GitHub Actions.

Two Endpoint Modes

| Mode | Endpoint | Behavior |
| --- | --- | --- |
| Sync | `/run-github-action/{collection}/{task}` | Blocks until complete. Best for fast checks within the CI timeout. |
| Async | `/run-github-action-async/{collection}/{task}` | Returns immediately with a poll URL; webhook callback when done. For long-running tasks. |
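
From a CI job, choosing a mode is just choosing a URL. A sketch of building the two endpoint URLs — the host is a placeholder and only the endpoint paths come from the docs above:

```python
BASE = "https://example-jetty-host"  # hypothetical host, replace with yours

def endpoint(collection: str, task: str, async_mode: bool = False) -> str:
    """Build the sync or async GitHub Action endpoint URL."""
    mode = "run-github-action-async" if async_mode else "run-github-action"
    return f"{BASE}/{mode}/{collection}/{task}"

print(endpoint("reviews", "pr-check"))
# https://example-jetty-host/run-github-action/reviews/pr-check
print(endpoint("reviews", "pr-check", async_mode=True))
# https://example-jetty-host/run-github-action-async/reviews/pr-check
```

Use sync for anything that reliably finishes inside your job's timeout; switch to async plus the webhook callback for long agent runs so the CI job is not held open.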

See CI Integration Guide for a step-by-step setup walkthrough.

Next Steps