Workflow Orchestration
Build AI data pipelines with built-in evaluation.
The Idea
Single-agent runs are powerful, but production workloads often need multiple steps: extract data, process it with an LLM, evaluate quality, branch on results. Jetty's workflow engine composes these into DAGs with 47+ step types, path expressions to wire data between steps, and durable execution that survives crashes.
*Multi-step workflow DAG (diagram)*
Step Types
| Category | Step Types | Purpose |
|---|---|---|
| AI Models | litellm_chat, image generation, vision, embeddings | LLM calls with any provider |
| Evaluation | simple_judge, structured evals, scoring | LLM-as-judge, criteria-based assessment |
| Control Flow | Branching, loops, iteration over collections | Conditional logic and repetition |
| Data Processing | Text templates, tool execution, file transforms | Transform and reshape data |
| Agent Execution | runbook_agent | Full sandboxed agent runs (see Agentic Workflows) |
Path Expressions
Steps wire together through path expressions. One step's output becomes the next step's input:
- `step_a.outputs.files[0].path` → becomes input for `step_b`
- `extract.outputs.content` → text extracted by the LLM
- `evaluate.outputs.results[0]` → first evaluation result
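Jetty resolves these expressions inside the engine. As a rough illustration of the dot-and-index syntax (not Jetty's actual resolver), a minimal lookup might work like this:

```python
import re

def resolve_path(expr: str, steps: dict):
    """Resolve a path expression like 'step_a.outputs.files[0].path'
    against a dict of step outputs. Illustrative sketch only."""
    value = steps
    for part in expr.split("."):
        # Separate the name from any trailing [N] index accessors.
        m = re.match(r"(\w+)((?:\[\d+\])*)$", part)
        value = value[m.group(1)]
        for idx in re.findall(r"\[(\d+)\]", m.group(2)):
            value = value[int(idx)]
    return value

steps = {
    "step_a": {"outputs": {"files": [{"path": "report.md"}]}},
    "extract": {"outputs": {"content": "hello"}},
}
resolve_path("step_a.outputs.files[0].path", steps)  # -> "report.md"
```

The same traversal handles nested lists and objects, which is why one expression can reach arbitrarily deep into an upstream step's output.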
The workflow engine handles sequencing, error recovery, retry logic, and artifact management automatically.
Eval-Driven Quality Gates
The real power of orchestration is closing the loop between execution and evaluation.
The Pattern
- Runbook Agent — Executes the task: generate code, analyze data, process documents. Outputs files + structured data.
- LLM-as-Judge — Evaluates output against your criteria. Returns a score, explanation, pass/fail, and criteria breakdown.
- Quality Gate — Branches on the result: pass proceeds; fail blocks and posts the judge's explanation (for example, as a PR status).
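The gate itself is simple branching logic. A hedged sketch, assuming the judge returns a numeric score and an explanation (field names here are illustrative, not Jetty's schema):

```python
def quality_gate(judge_result: dict, threshold: int = 3) -> dict:
    """Hypothetical gate step: branch on the judge's score."""
    passed = judge_result["score"] >= threshold
    return {
        "status": "pass" if passed else "fail",
        # On failure, surface the judge's explanation to the caller
        # (e.g. posted as a PR status check message).
        "message": "ok" if passed else judge_result.get("explanation", ""),
    }

quality_gate({"score": 2, "explanation": "missing tests"})
# -> {"status": "fail", "message": "missing tests"}
```

Keeping the gate as its own step (rather than folding it into the judge) means the threshold can change per workflow without re-prompting the evaluator.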
Example Use Cases
| Use Case | Agent Does | Judge Checks | Gate Action |
|---|---|---|---|
| AI Code Review | Reviews the PR diff for bugs, security issues, and style | Scores quality 1-5 | Block merge if score < 3 |
| Document Validation | Extracts info from uploads | Checks accuracy, completeness | Fail if critical fields missing |
| Test Generation | Generates tests for new code, runs in sandbox | Evaluates correctness, coverage | Gate on pass rate |
Building Evaluation Datasets
Every workflow run produces a trajectory with full execution history. Over time, you accumulate a labeled dataset of quality evaluations:
- Compare runs across models
- Track quality trends
- Replay failures
- Train on outcomes (which recommendations were accepted vs. rejected)
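Once trajectories accumulate, cross-run analysis is ordinary data work. A minimal sketch of comparing pass rates across models, assuming illustrative trajectory fields (`model`, `eval.passed`) rather than Jetty's stored schema:

```python
from collections import defaultdict

def pass_rate_by_model(trajectories: list[dict]) -> dict:
    """Aggregate run trajectories into a per-model pass rate."""
    totals = defaultdict(lambda: [0, 0])  # model -> [passed, total]
    for t in trajectories:
        stats = totals[t["model"]]
        stats[0] += t["eval"]["passed"]  # True counts as 1
        stats[1] += 1
    return {model: p / n for model, (p, n) in totals.items()}

runs = [
    {"model": "gpt-4o", "eval": {"passed": True}},
    {"model": "gpt-4o", "eval": {"passed": False}},
    {"model": "claude", "eval": {"passed": True}},
]
pass_rate_by_model(runs)  # -> {"gpt-4o": 0.5, "claude": 1.0}
```

The same shape of query supports trend tracking (group by date instead of model) and replay (filter to failed runs).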
CI for AI
Workflows integrate directly into your CI pipeline via GitHub Actions.
Two Endpoint Modes
| Mode | Endpoint | Behavior |
|---|---|---|
| Sync | /run-github-action/{collection}/{task} | Blocks until complete. Best for fast checks within CI timeout. |
| Async | /run-github-action-async/{collection}/{task} | Returns immediately with poll URL. Webhook callback when done. For long-running tasks. |
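The async flow is submit-then-poll. A minimal sketch of that control flow with the transport injected so it runs without a live server; the response field names (`poll_url`, `status`, `result`) are assumptions, not Jetty's documented schema:

```python
import time

def run_async(collection: str, task: str, post, poll, interval: float = 0.0):
    """Submit via the async endpoint, then poll until a terminal state.
    `post` and `poll` are injected HTTP callables (e.g. wrapping requests)."""
    resp = post(f"/run-github-action-async/{collection}/{task}")
    poll_url = resp["poll_url"]
    while True:
        status = poll(poll_url)
        if status["status"] in ("completed", "failed"):
            return status
        time.sleep(interval)

# Stub transport to demonstrate the loop without network access.
states = iter([{"status": "running"}, {"status": "completed", "result": 7}])
out = run_async(
    "ci", "review",
    post=lambda url: {"poll_url": url + "/poll"},
    poll=lambda url: next(states),
)
# out["status"] == "completed"
```

In practice the webhook callback makes polling optional: register a callback URL and react when the run finishes, which is the better fit for long-running tasks.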
See CI Integration Guide for a step-by-step setup walkthrough.
Next Steps
- CI Integration Guide — Set up GitHub Actions with Jetty
- Jetty Agent — Close the loop with telemetry-driven improvements
- Webhook Reference — Webhook callback schema and verification