Generate & Evaluate Images in 5 Minutes
Generate an image and automatically check it for quality issues using an LLM-as-Judge step.
What You'll Build
Text Prompt → Generate Image → Quality Check → Score + Explanation
The Workflow
{
  "init_params": {
    "prompt": "A professional headshot of a business person",
    "model": "black-forest-labs/flux-schnell"
  },
  "step_configs": {
    "generate": {
      "activity": "replicate_text2image",
      "model_path": "init_params.model",
      "prompt_path": "init_params.prompt",
      "aspect_ratio": "1:1",
      "output_format": "jpg"
    },
    "quality_check": {
      "model": "gpt-4o",
      "activity": "simple_judge",
      "items_path": "generate.outputs.images[0].path",
      "judge_type": "scale",
      "instruction": "Rate the professionalism and quality of this headshot for business use.",
      "scale_range": [1, 5],
      "model_provider": "openai"
    }
  },
  "steps": ["generate", "quality_check"]
}
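Before submitting a workflow, it can help to check that it is internally consistent. The following is a minimal Python sketch and not part of Jetty: `validate_workflow` is a hypothetical helper, and it assumes that any key ending in `_path` holds a path expression whose first segment is either `init_params` or the name of an earlier step.

```python
import re


def validate_workflow(workflow: dict) -> list[str]:
    """Return a list of problems found in a workflow dict (empty if none)."""
    problems = []
    steps = workflow.get("steps", [])
    configs = workflow.get("step_configs", {})
    available = {"init_params"}  # roots a path expression may start from

    for name in steps:
        config = configs.get(name)
        if config is None:
            problems.append(f"step '{name}' has no entry in step_configs")
            continue
        if "activity" not in config:
            problems.append(f"step '{name}' is missing 'activity'")
        for key, value in config.items():
            # Assumption: keys ending in "_path" hold path expressions.
            if key.endswith("_path") and isinstance(value, str):
                root = re.split(r"[.\[]", value, maxsplit=1)[0]
                if root not in available:
                    problems.append(
                        f"step '{name}': {key} refers to '{root}', "
                        "which is not init_params or an earlier step"
                    )
        available.add(name)
    return problems


workflow = {
    "init_params": {
        "prompt": "A professional headshot of a business person",
        "model": "black-forest-labs/flux-schnell",
    },
    "step_configs": {
        "generate": {
            "activity": "replicate_text2image",
            "model_path": "init_params.model",
            "prompt_path": "init_params.prompt",
        },
        "quality_check": {
            "activity": "simple_judge",
            "items_path": "generate.outputs.images[0].path",
        },
    },
    "steps": ["generate", "quality_check"],
}

print(validate_workflow(workflow))  # → []
```

A check like this catches a misspelled step name or a judge that runs before the step it reads from, without making any API calls.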
Try It
- Copy the workflow above
- Paste it into the Jetty UI or run it via the API
- Change the prompt to generate different images
- See the quality score and explanation
What You'll Learn
1. replicate_text2image - Generate images
{
  "activity": "replicate_text2image",
  "model": "black-forest-labs/flux-schnell",
  "prompt_path": "init_params.prompt",
  "aspect_ratio": "16:9",
  "output_format": "jpg"
}
Popular models:
| Model | Best For |
|---|---|
| black-forest-labs/flux-schnell | Fast iterations |
| black-forest-labs/flux-kontext-pro | Production quality |
| ideogram-ai/ideogram-v2-turbo | Text in images |
2. simple_judge - Evaluate images with GPT-4o
{
  "activity": "simple_judge",
  "model": "gpt-4o",
  "items_path": "generate.outputs.images[0].path",
  "judge_type": "scale",
  "scale_range": [1, 5],
  "instruction": "Your evaluation criteria here"
}
3. Chaining steps with path expressions
The path expression generate.outputs.images[0].path passes the first image produced by the generate step to the judge:
generate step → outputs.images[0].path → quality_check step
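To make the lookup concrete, here is a minimal Python sketch of how a dotted path with `[n]` indexes could be resolved against step outputs. This illustrates the idea only and is not Jetty's implementation: `resolve_path` is a hypothetical helper, and the `context` dict and the file path in it are made-up sample data.

```python
import re

# Matches either a plain key (e.g. "outputs") or a list index (e.g. "[0]").
_TOKEN = re.compile(r"([^.\[\]]+)|\[(\d+)\]")


def resolve_path(context: dict, path: str):
    """Walk nested dicts and lists following a path like 'a.b[0].c'."""
    current = context
    for key, index in _TOKEN.findall(path):
        if key:
            current = current[key]         # dict lookup
        else:
            current = current[int(index)]  # list index
    return current


# Hypothetical run state: init_params plus the outputs of the generate step.
context = {
    "init_params": {"prompt": "A professional headshot of a business person"},
    "generate": {"outputs": {"images": [{"path": "images/example.jpg"}]}},
}

print(resolve_path(context, "generate.outputs.images[0].path"))  # → images/example.jpg
```

The same function resolves `init_params.prompt`, which is why a single path syntax can cover both workflow inputs and earlier step outputs.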
The Output
{
  "quality_check": {
    "outputs": {
      "rating": "4",
      "explanation": "Professional lighting and composition. Good eye contact. Minor improvement possible in background simplicity.",
      "average_score": 4.0
    }
  }
}
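If you consume this output in your own code, a small gate on the score is often enough. The sketch below assumes the output shape shown above, where `rating` arrives as a string and `average_score` as a number. `meets_threshold` is a hypothetical helper, and the minimum of 4 is an arbitrary example value.

```python
import json

# Sample output copied from above.
sample_output = json.loads("""
{
  "quality_check": {
    "outputs": {
      "rating": "4",
      "explanation": "Professional lighting and composition. Good eye contact. Minor improvement possible in background simplicity.",
      "average_score": 4.0
    }
  }
}
""")


def meets_threshold(result: dict, step: str, minimum: float) -> bool:
    """True if the judge step's average_score is at least `minimum`."""
    outputs = result[step]["outputs"]
    return float(outputs["average_score"]) >= minimum


outputs = sample_output["quality_check"]["outputs"]
rating = int(outputs["rating"])  # the sample shows rating as a string
print(rating, meets_threshold(sample_output, "quality_check", 4))  # → 4 True
```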
Common Evaluation Criteria
Swap out the instruction for different use cases:
| Use Case | Instruction |
|---|---|
| Brand compliance | "Does this image match corporate brand guidelines? Check colors, style, and professionalism." |
| IP risk | "Could this image infringe on any intellectual property? Look for recognizable logos, characters, or copyrighted elements." |
| Content safety | "Does this image contain inappropriate, offensive, or potentially harmful content?" |
| Accessibility | "Is any text in this image readable and accessible? Check contrast and font size." |
| Prompt adherence | "How accurately does this image depict the original prompt?" |
Add Multiple Evaluation Criteria
Run several judges on the same generated image. Each judge step reads generate.outputs.images[0].path independently:
{
  "init_params": {
    "prompt": "A logo design for a tech startup"
  },
  "step_configs": {
    "generate": {
      "activity": "replicate_text2image",
      "model": "black-forest-labs/flux-schnell",
      "prompt_path": "init_params.prompt"
    },
    "brand_check": {
      "activity": "simple_judge",
      "model": "gpt-4o",
      "items_path": "generate.outputs.images[0].path",
      "judge_type": "scale",
      "scale_range": [1, 5],
      "instruction": "Rate the brand professionalism and memorability.",
      "model_provider": "openai"
    },
    "ip_check": {
      "activity": "simple_judge",
      "model": "gpt-4o",
      "items_path": "generate.outputs.images[0].path",
      "judge_type": "scale",
      "scale_range": [0, 1],
      "instruction": "Is there potential IP infringement risk?",
      "model_provider": "openai"
    }
  },
  "steps": ["generate", "brand_check", "ip_check"]
}
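Because every judge step has the same shape, a workflow like the one above can be generated from a mapping of criteria rather than written by hand. The following Python sketch is one way to do that. `build_workflow` is a hypothetical helper, not a Jetty function, and it only uses the field names that appear in the workflow above.

```python
import json


def build_workflow(prompt: str, judges: dict[str, dict]) -> dict:
    """Build a workflow with one generate step and one simple_judge step per judge."""
    if "generate" in judges:
        raise ValueError("'generate' is reserved for the image generation step")

    step_configs = {
        "generate": {
            "activity": "replicate_text2image",
            "model": "black-forest-labs/flux-schnell",
            "prompt_path": "init_params.prompt",
        }
    }
    for name, spec in judges.items():
        step_configs[name] = {
            "activity": "simple_judge",
            "model": "gpt-4o",
            "items_path": "generate.outputs.images[0].path",
            "judge_type": "scale",
            "scale_range": spec["scale_range"],
            "instruction": spec["instruction"],
            "model_provider": "openai",
        }

    return {
        "init_params": {"prompt": prompt},
        "step_configs": step_configs,
        "steps": ["generate", *judges],  # generate first, then judges in dict order
    }


workflow = build_workflow(
    "A logo design for a tech startup",
    {
        "brand_check": {
            "scale_range": [1, 5],
            "instruction": "Rate the brand professionalism and memorability.",
        },
        "ip_check": {
            "scale_range": [0, 1],
            "instruction": "Is there potential IP infringement risk?",
        },
    },
)
print(json.dumps(workflow, indent=2))
```

Adding another criterion from the table above then means adding one entry to the mapping instead of copying a whole step block.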
Next Steps
- Model Comparison - Compare outputs from different image models
- Batch Processing - Generate and evaluate hundreds of images
- LLM Evaluation - Evaluate text outputs instead of images