Generate & Evaluate Images in 5 Minutes

Generate an image and automatically check for quality issues using LLM-as-Judge.

What You'll Build

Text Prompt → Generate Image → Quality Check → Score + Explanation

The Workflow

{
  "init_params": {
    "prompt": "A professional headshot of a business person",
    "model": "black-forest-labs/flux-schnell"
  },
  "step_configs": {
    "generate": {
      "activity": "replicate_text2image",
      "model_path": "init_params.model",
      "prompt_path": "init_params.prompt",
      "aspect_ratio": "1:1",
      "output_format": "jpg"
    },
    "quality_check": {
      "model": "gpt-4o",
      "activity": "simple_judge",
      "items_path": "generate.outputs.images[0].path",
      "judge_type": "scale",
      "instruction": "Rate the professionalism and quality of this headshot for business use.",
      "scale_range": [1, 5],
      "model_provider": "openai"
    }
  },
  "steps": ["generate", "quality_check"]
}

Try It

1. Copy the workflow above
2. Paste into Jetty UI or run via API
3. Change the prompt to generate different images
4. See the quality score and explanation
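The prompt change in the list above can also be scripted. The following Python sketch holds the workflow as a dictionary, swaps the prompt, and prints JSON that is ready to paste. The helper name `with_prompt` and the replacement prompt are hypothetical, and submitting the result to Jetty is left out because this page does not specify the API call.

```python
import copy
import json

# The workflow from "The Workflow" section, as a Python dictionary.
WORKFLOW = {
    "init_params": {
        "prompt": "A professional headshot of a business person",
        "model": "black-forest-labs/flux-schnell",
    },
    "step_configs": {
        "generate": {
            "activity": "replicate_text2image",
            "model_path": "init_params.model",
            "prompt_path": "init_params.prompt",
            "aspect_ratio": "1:1",
            "output_format": "jpg",
        },
        "quality_check": {
            "model": "gpt-4o",
            "activity": "simple_judge",
            "items_path": "generate.outputs.images[0].path",
            "judge_type": "scale",
            "instruction": "Rate the professionalism and quality of this headshot for business use.",
            "scale_range": [1, 5],
            "model_provider": "openai",
        },
    },
    "steps": ["generate", "quality_check"],
}


def with_prompt(workflow: dict, prompt: str) -> dict:
    """Return a copy of the workflow with a different prompt (hypothetical helper)."""
    updated = copy.deepcopy(workflow)  # leave the original workflow untouched
    updated["init_params"]["prompt"] = prompt
    return updated


# Example prompt chosen for illustration only.
variant = with_prompt(WORKFLOW, "A watercolor painting of a lighthouse at dusk")
print(json.dumps(variant, indent=2))
```

Because both steps read the prompt and model through `init_params`, changing that one field is enough; the step configurations stay the same.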

What You'll Learn

1. `replicate_text2image` - Generate images

{
  "activity": "replicate_text2image",
  "model": "black-forest-labs/flux-schnell",
  "prompt_path": "init_params.prompt",
  "aspect_ratio": "16:9",
  "output_format": "jpg"
}

Popular models:

Model	Best For
`black-forest-labs/flux-schnell`	Fast iterations
`black-forest-labs/flux-kontext-pro`	Production quality
`ideogram-ai/ideogram-v2-turbo`	Text in images

2. `simple_judge` - Evaluate with GPT-4o

{
  "activity": "simple_judge",
  "model": "gpt-4o",
  "items_path": "generate.outputs.images[0].path",
  "judge_type": "scale",
  "scale_range": [1, 5],
  "instruction": "Your evaluation criteria here"
}

3. Chaining steps with path expressions

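The path expression `generate.outputs.images[0].path` does the chaining: it points at the path of the first image in the `generate` step's outputs, and the judge receives that image as its input.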

generate step → outputs.images[0].path → quality_check step
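To make the path syntax concrete, here is a minimal Python sketch of how a dotted path with a list index could be resolved against nested data. It illustrates the idea only and is not Jetty's implementation; the function `resolve_path`, the `run_state` structure, and the file name `headshot.jpg` are all assumptions made for this example.

```python
import re
from typing import Any

# One path segment: a key name, optionally followed by a list index, e.g. "images[0]".
_SEGMENT = re.compile(r"^(?P<key>[^\[\]]+)(?:\[(?P<index>\d+)\])?$")


def resolve_path(context: dict, path: str) -> Any:
    """Walk a dotted path such as 'generate.outputs.images[0].path' through nested data."""
    current: Any = context
    for segment in path.split("."):
        match = _SEGMENT.match(segment)
        if match is None:
            raise ValueError(f"Unsupported path segment: {segment!r}")
        current = current[match.group("key")]  # descend into the mapping
        if match.group("index") is not None:
            current = current[int(match.group("index"))]  # then into the list
    return current


# Hypothetical run state; the image file name is made up for illustration.
run_state = {
    "init_params": {"prompt": "A professional headshot of a business person"},
    "generate": {"outputs": {"images": [{"path": "headshot.jpg"}]}},
}

print(resolve_path(run_state, "generate.outputs.images[0].path"))  # headshot.jpg
print(resolve_path(run_state, "init_params.prompt"))
```

The same lookup covers both kinds of reference used on this page: `init_params.*` for inputs and `<step name>.outputs.*` for results of an earlier step.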

The Output

{
  "quality_check": {
    "outputs": {
      "rating": "4",
      "explanation": "Professional lighting and composition. Good eye contact. Minor improvement possible in background simplicity.",
      "average_score": 4.0
    }
  }
}
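A result like this is easy to act on in code. The Python sketch below parses the example output above and applies a pass threshold; the helper `passes` and the threshold of 4 are assumptions chosen for illustration, not part of Jetty.

```python
import json

# Example result copied from the output shown above.
raw_result = """
{
  "quality_check": {
    "outputs": {
      "rating": "4",
      "explanation": "Professional lighting and composition. Good eye contact. Minor improvement possible in background simplicity.",
      "average_score": 4.0
    }
  }
}
"""


def passes(result: dict, step: str, minimum: float) -> bool:
    """Return True when the step's average score meets the minimum (hypothetical helper)."""
    outputs = result[step]["outputs"]
    return float(outputs["average_score"]) >= minimum


result = json.loads(raw_result)
outputs = result["quality_check"]["outputs"]
rating = int(outputs["rating"])  # "rating" is a string in the example output

print(rating, passes(result, "quality_check", minimum=4))  # prints: 4 True
print(outputs["explanation"])
```

Note that `rating` arrives as a string while `average_score` is a number, so convert before comparing.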

Common Evaluation Criteria

Swap out the instruction for different use cases:

Use Case	Instruction
Brand compliance	"Does this image match corporate brand guidelines? Check colors, style, and professionalism."
IP risk	"Could this image infringe on any intellectual property? Look for recognizable logos, characters, or copyrighted elements."
Content safety	"Does this image contain inappropriate, offensive, or potentially harmful content?"
Accessibility	"Is any text in this image readable and accessible? Check contrast and font size."
Prompt adherence	"How accurately does this image depict the original prompt?"
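Since only the instruction changes between use cases, the judge steps can be generated from a small lookup. The Python sketch below builds one `simple_judge` step per criterion using instructions from the table above. The step names and the `judge_step` helper are hypothetical, and reusing the 1 to 5 scale for every criterion is an assumption carried over from the earlier example.

```python
import json

# Instructions taken from the table above; the step names (keys) are hypothetical.
CRITERIA = {
    "brand_check": "Does this image match corporate brand guidelines? Check colors, style, and professionalism.",
    "safety_check": "Does this image contain inappropriate, offensive, or potentially harmful content?",
    "prompt_check": "How accurately does this image depict the original prompt?",
}


def judge_step(instruction: str) -> dict:
    """Build a simple_judge step config that differs only in its instruction."""
    return {
        "activity": "simple_judge",
        "model": "gpt-4o",
        "model_provider": "openai",
        "items_path": "generate.outputs.images[0].path",
        "judge_type": "scale",
        "scale_range": [1, 5],  # assumed: same scale as the headshot example
        "instruction": instruction,
    }


step_configs = {name: judge_step(text) for name, text in CRITERIA.items()}
print(json.dumps(step_configs, indent=2))
```

The printed entries can be merged into the `step_configs` of a workflow, with their names appended to `steps` after `generate`.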

Add Multiple Evaluation Criteria

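Run several judges in parallel against the same generated image. Each judge reads `generate.outputs.images[0].path` and applies its own instruction and scale: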

{
  "init_params": {
    "prompt": "A logo design for a tech startup"
  },
  "step_configs": {
    "generate": {
      "activity": "replicate_text2image",
      "model": "black-forest-labs/flux-schnell",
      "prompt_path": "init_params.prompt"
    },
    "brand_check": {
      "activity": "simple_judge",
      "model": "gpt-4o",
      "items_path": "generate.outputs.images[0].path",
      "judge_type": "scale",
      "scale_range": [1, 5],
      "instruction": "Rate the brand professionalism and memorability.",
      "model_provider": "openai"
    },
    "ip_check": {
      "activity": "simple_judge",
      "model": "gpt-4o",
      "items_path": "generate.outputs.images[0].path",
      "judge_type": "scale",
      "scale_range": [0, 1],
      "instruction": "Is there potential IP infringement risk?",
      "model_provider": "openai"
    }
  },
  "steps": ["generate", "brand_check", "ip_check"]
}
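As workflows grow, a quick reference check catches typos in step names and paths before a run. The Python sketch below is a hypothetical lint, not a Jetty feature. It assumes that every key ending in `_path` holds a path expression, and that a path may only start with `init_params` or with a step listed earlier in `steps`.

```python
from typing import List

# Trimmed copy of the multi-judge workflow above (only fields the check needs).
workflow = {
    "init_params": {"prompt": "A logo design for a tech startup"},
    "step_configs": {
        "generate": {
            "activity": "replicate_text2image",
            "prompt_path": "init_params.prompt",
        },
        "brand_check": {
            "activity": "simple_judge",
            "items_path": "generate.outputs.images[0].path",
        },
        "ip_check": {
            "activity": "simple_judge",
            "items_path": "generate.outputs.images[0].path",
        },
    },
    "steps": ["generate", "brand_check", "ip_check"],
}


def check_references(wf: dict) -> List[str]:
    """Return a list of problems; an empty list means every reference resolves."""
    problems = []
    available = {"init_params"}  # roots a path expression may start with
    for name in wf["steps"]:
        config = wf["step_configs"].get(name)
        if config is None:
            problems.append(f"step '{name}' has no entry in step_configs")
            continue
        for key, value in config.items():
            if key.endswith("_path"):
                root = value.split(".", 1)[0]  # e.g. "generate" or "init_params"
                if root not in available:
                    problems.append(f"{name}.{key} refers to '{root}', which is not available yet")
        available.add(name)  # later steps may now read this step's outputs
    return problems


print(check_references(workflow))  # prints: []
```

Listing a judge before `generate`, or misspelling a step name in `steps`, makes the function return a non-empty list describing the problem.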

Next Steps

Model Comparison - Compare outputs from different image models
Batch Processing - Generate and evaluate hundreds of images
LLM Evaluation - Evaluate text outputs instead of images
