# Compare AI Models in 5 Minutes

Ask the same question to GPT-4, Claude, and Gemini - see how they differ.
## What You'll Build

```
Question → GPT-4  → Response 1
         → Claude → Response 2
         → Gemini → Response 3
```

All three models run in parallel!
## The Workflow

```json
{
  "init_params": {
    "question": "Explain quantum computing in one sentence.",
    "system_prompt": "You are a helpful teacher. Keep answers simple and accessible."
  },
  "step_configs": {
    "gpt4": {
      "model": "openai/gpt-4o",
      "activity": "litellm_chat",
      "user_prompt_path": "init_params.question",
      "system_prompt_path": "init_params.system_prompt"
    },
    "claude": {
      "model": "anthropic/claude-sonnet-4-20250514",
      "activity": "litellm_chat",
      "user_prompt_path": "init_params.question",
      "system_prompt_path": "init_params.system_prompt"
    },
    "gemini": {
      "model": "gemini/gemini-2.5-flash",
      "activity": "litellm_chat",
      "user_prompt_path": "init_params.question",
      "system_prompt_path": "init_params.system_prompt"
    }
  },
  "steps": ["gpt4", "claude", "gemini"]
}
```
## Try It

- Copy the workflow above
- Paste into Jetty UI or run via API
- Change the `question` to test different prompts
- Compare the three responses side-by-side
## What You'll Learn

### 1. `litellm_chat` - Universal LLM connector

LiteLLM connects to 100+ models through a unified interface:

```json
{
  "activity": "litellm_chat",
  "model": "openai/gpt-4o",
  "user_prompt_path": "init_params.question"
}
```
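The same unified interface is available directly from the `litellm` Python package. The sketch below shows the general call shape; it assumes `litellm` is installed (`pip install litellm`) and the relevant provider API key (e.g. `OPENAI_API_KEY`) is set in the environment. The `build_messages` helper is illustrative, not part of LiteLLM.

```python
def build_messages(question, system_prompt=None):
    """Assemble the chat messages list that litellm.completion() expects."""
    messages = []
    if system_prompt:
        messages.append({"role": "system", "content": system_prompt})
    messages.append({"role": "user", "content": question})
    return messages


def ask(model, question, system_prompt=None):
    """Send one question to any provider through LiteLLM's single entry point."""
    from litellm import completion  # pip install litellm

    response = completion(model=model, messages=build_messages(question, system_prompt))
    return response.choices[0].message.content


# Example (requires an API key in the environment):
# ask("openai/gpt-4o", "Explain quantum computing in one sentence.")
```

Swapping providers is just a matter of changing the `model` string; the messages format stays the same.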
### 2. Model naming conventions

| Provider | Format | Examples |
|---|---|---|
| OpenAI | `openai/model-name` | `openai/gpt-4o`, `openai/gpt-4o-mini` |
| Anthropic | `anthropic/model-name` | `anthropic/claude-sonnet-4-20250514` |
| Google | `gemini/model-name` | `gemini/gemini-2.5-flash`, `gemini/gemini-2.5-pro` |
### 3. Path expressions

`init_params.question` references your input parameter:

```json
"init_params": {
  "question": "Your question here"   ← This value
}
```
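Resolving a dotted path through nested parameters takes only a few lines. This is an illustrative sketch of the idea, not Jetty's actual implementation:

```python
from functools import reduce


def resolve_path(state, path):
    """Walk a dotted path like 'init_params.question' through nested dicts."""
    return reduce(lambda node, key: node[key], path.split("."), state)


state = {"init_params": {"question": "Explain quantum computing in one sentence."}}
value = resolve_path(state, "init_params.question")
```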
### 4. Parallel execution

All steps in the `steps` array run simultaneously when they don't depend on each other.
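Fanning independent steps out concurrently can be sketched with a thread pool; here `run_step` is a stub standing in for a real `litellm_chat` call, and the whole snippet is illustrative rather than Jetty's scheduler:

```python
from concurrent.futures import ThreadPoolExecutor


def run_step(name):
    """Stand-in for one litellm_chat call; a real engine would hit the API here."""
    return f"response from {name}"


steps = ["gpt4", "claude", "gemini"]

# Independent steps are dispatched at the same time; total latency is roughly
# that of the slowest model, not the sum of all three.
with ThreadPoolExecutor() as pool:
    results = dict(zip(steps, pool.map(run_step, steps)))
```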
## The Output

Each model's response is stored in its step outputs:

```json
{
  "gpt4": {
    "outputs": {
      "content": "Quantum computing uses quantum bits that can exist in multiple states simultaneously..."
    }
  },
  "claude": {
    "outputs": {
      "content": "Quantum computing harnesses the principles of quantum mechanics..."
    }
  },
  "gemini": {
    "outputs": {
      "content": "Quantum computing leverages quantum phenomena like superposition..."
    }
  }
}
```
## Add a Judge to Pick the Best

Let an LLM decide which response is best:

```json
{
  "init_params": {
    "question": "Explain quantum computing in one sentence.",
    "system_prompt": "You are a helpful teacher. Keep answers simple."
  },
  "step_configs": {
    "gpt4": {
      "model": "openai/gpt-4o",
      "activity": "litellm_chat",
      "user_prompt_path": "init_params.question",
      "system_prompt_path": "init_params.system_prompt"
    },
    "claude": {
      "model": "anthropic/claude-sonnet-4-20250514",
      "activity": "litellm_chat",
      "user_prompt_path": "init_params.question",
      "system_prompt_path": "init_params.system_prompt"
    },
    "gemini": {
      "model": "gemini/gemini-2.5-flash",
      "activity": "litellm_chat",
      "user_prompt_path": "init_params.question",
      "system_prompt_path": "init_params.system_prompt"
    },
    "judge": {
      "model": "openai/gpt-4o",
      "activity": "litellm_chat",
      "user_prompt": "Compare these three explanations of quantum computing and pick the best one for a beginner:\n\n1. GPT-4: {{ gpt4.outputs.content }}\n\n2. Claude: {{ claude.outputs.content }}\n\n3. Gemini: {{ gemini.outputs.content }}\n\nWhich is clearest and why?"
    }
  },
  "steps": ["gpt4", "claude", "gemini", "judge"]
}
```
**Note:** The `judge` step runs after the first three because it references their outputs.
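One way an engine can discover that ordering is to scan each step's prompt template for `{{ step.outputs... }}` references. The sketch below is purely illustrative of the idea, not Jetty's actual scheduler:

```python
import re


def referenced_steps(prompt):
    """Find step names mentioned as {{ name.outputs... }} in a prompt template."""
    return set(re.findall(r"\{\{\s*(\w+)\.outputs", prompt))


judge_prompt = (
    "1. GPT-4: {{ gpt4.outputs.content }}\n"
    "2. Claude: {{ claude.outputs.content }}\n"
    "3. Gemini: {{ gemini.outputs.content }}"
)

# Steps that must finish before the judge can start:
deps = referenced_steps(judge_prompt)
```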
## Compare Different Model Versions

Test the same provider's models:

```json
{
  "step_configs": {
    "gpt4": {
      "model": "openai/gpt-4o",
      "activity": "litellm_chat",
      "user_prompt_path": "init_params.question"
    },
    "gpt4_mini": {
      "model": "openai/gpt-4o-mini",
      "activity": "litellm_chat",
      "user_prompt_path": "init_params.question"
    }
  },
  "steps": ["gpt4", "gpt4_mini"]
}
```
## Available Configuration Options

```json
{
  "activity": "litellm_chat",
  "model": "openai/gpt-4o",
  "user_prompt_path": "init_params.question",
  "system_prompt": "You are helpful.",
  "temperature": 0.7,
  "max_tokens": 1000,
  "top_p": 0.9
}
```

| Parameter | Description | Default |
|---|---|---|
| `temperature` | Creativity (0-2) | 1.0 |
| `max_tokens` | Maximum response length | Model default |
| `top_p` | Nucleus sampling | 1.0 |
| `system_prompt` | System message | None |
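These sampling parameters correspond directly to keyword arguments of `litellm.completion()`. As a hedged sketch, a step config like the one above could be translated into a completion call as follows; `completion_kwargs` is an illustrative helper, not part of Jetty or LiteLLM:

```python
def completion_kwargs(config, user_prompt):
    """Translate a step config dict into litellm.completion() keyword arguments."""
    messages = []
    if config.get("system_prompt"):
        messages.append({"role": "system", "content": config["system_prompt"]})
    messages.append({"role": "user", "content": user_prompt})

    kwargs = {"model": config["model"], "messages": messages}
    # Forward only the sampling parameters that are actually set.
    for param in ("temperature", "max_tokens", "top_p"):
        if param in config:
            kwargs[param] = config[param]
    return kwargs


config = {
    "model": "openai/gpt-4o",
    "system_prompt": "You are helpful.",
    "temperature": 0.7,
    "max_tokens": 1000,
    "top_p": 0.9,
}
kwargs = completion_kwargs(config, "Explain quantum computing in one sentence.")
# Then: litellm.completion(**kwargs)  (requires an API key in the environment)
```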
## Next Steps
- LLM Evaluation - Score and rank model outputs automatically
- Batch Processing - Compare models across many prompts
- Image Generation - Compare image generation models