Automating Image Generation with Precision: A Developer’s Guide to the Image Generation Agent

Here’s a friendly, conversational guide to help you—and your AI team—get up and running with the Image Generation Agent for producing precise, stunning AI-generated imagery without all the manual loop-closing. You’ll see how to install the toolchain, configure your API keys, launch the backend and frontend, and even customize the prompt-refinement loop—all sprinkled with code snippets and real-world examples.
Overview
The Image Generation Agent is an open-source project designed to automate the entire prompt-refinement → image-generation → visual-feedback loop, so you can focus on creativity rather than copy-paste cycles. Under the hood, it leverages two core tools:
- generate_image, which taps the OpenAI image-generation API to turn your textual prompts into pixels.
- evaluate_generated_image, which uses Google’s Gemini vision capabilities to judge whether the image truly matches your intent and, if not, automatically tweaks and retries.
You’ll interact via a lightweight Gradio UI, watching the agent call tools, evaluate outputs, refine prompts, and ultimately hand you the final art piece—all in a few seconds.
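Conceptually, the two tools are just callables the agent can invoke. Here is a minimal sketch of their contract — the names match the tools above, but the signatures, return types, and stub bodies are illustrative, not the project's actual API:

```python
from dataclasses import dataclass

@dataclass
class Evaluation:
    passed: bool   # did the image match the prompt?
    feedback: str  # hint used to refine the prompt when it didn't

def generate_image(prompt: str) -> bytes:
    """Stand-in for the OpenAI image call: prompt in, image bytes out."""
    return f"<image for: {prompt}>".encode()

def evaluate_generated_image(image: bytes, prompt: str) -> Evaluation:
    """Stand-in for the Gemini vision check: judges prompt fidelity."""
    ok = prompt.encode() in image
    return Evaluation(passed=ok, feedback="" if ok else "retry with more detail")
```

The agent's job is simply to chain these two calls and feed the feedback back into the next prompt.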
Getting Started
Prerequisites
- Python 3.8+ (any recent version will do).
- UV: a blazing-fast pip replacement and virtual-environment manager written in Rust. UV replaces pip, pip-tools, virtualenv, and more, giving you 10–100× faster installs and a universal lockfile.
- API keys for OpenAI and Google Cloud Vision/Gemini.
Installing UV
If you don’t already have UV, grab it via pip:
pip install uv
(This uses the PyPI build; see the UV docs for standalone installers, Docker images, or Homebrew packages if you prefer.)
Cloning and Bootstrapping
# 1. Clone the repo
git clone https://github.com/run-llama/image-generation-agent
cd image-generation-agent
# 2. Sync dependencies & create a virtual environment
uv sync
source .venv/bin/activate
Congrats—you now have a clean, isolated environment with everything you need.
Configuring API Keys
Navigate into the scripts/ folder and rename the example environment file:
cd scripts
mv .env.example .env
Then edit .env (or export the variables directly) to include:
OPENAI_API_KEY="sk-…"
GOOGLE_API_KEY="AIza…"
This lets the agent call both the OpenAI image API and Google’s vision evaluator.
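Because both scripts read these keys from the process environment, a small helper like the following — hypothetical, not part of the repo — can fail fast with a clear error when a key is missing instead of surfacing a confusing API failure later:

```python
import os

def require_env(name: str) -> str:
    """Return the value of an environment variable, or raise a clear error."""
    value = os.environ.get(name)
    if not value:
        raise RuntimeError(f"Missing required environment variable: {name}")
    return value

# Example usage at startup (illustrative):
# OPENAI_API_KEY = require_env("OPENAI_API_KEY")
# GOOGLE_API_KEY = require_env("GOOGLE_API_KEY")
```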
Launching the Agent
You’ll run two processes—backend and frontend—in separate terminals:
1. Start the WebSocket Backend
# (still in scripts/, venv active)
python3 server.py
You’ll see a log confirming the WebSocket server is listening on port 8765.
2. Launch the Gradio Frontend
Open a new terminal (with .venv activated) and run:
python3 client.py
Point your browser to http://localhost:7860 and voilà—you have a friendly UI to chat with your Image Generation Agent.
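If the UI can't reach the agent, a quick way to verify the backend is actually listening — assuming the default port 8765 from server.py — is a plain TCP probe:

```python
import socket

def backend_listening(host: str = "localhost", port: int = 8765) -> bool:
    """Return True if something accepts TCP connections on host:port."""
    try:
        with socket.create_connection((host, port), timeout=2):
            return True
    except OSError:
        return False

if __name__ == "__main__":
    print("backend up:", backend_listening())
```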
Live Example: “Llama Painter”
Let’s have some fun. In the Gradio UI, type:
“A llama painter drawing mountains on a canvas.”
Hit Generate, and you’ll see:
- Agent starts: calls generate_image with your prompt.
- Intermediate images: displayed as the agent fine-tunes the prompt.
- Evaluation: the agent invokes evaluate_generated_image to check faithfulness.
- Final output: a majestic llama wielding a brush, painting alpine vistas.
All of these steps happen automatically, with status updates in the UI, so you can sit back with a ☕ and watch the magic unfold.
Deep Dive: How the Loop Works
- Initial Prompt → generate_image kicks off an OpenAI image-generation call.
- Quality Check → the evaluate_generated_image tool scores the result for prompt fidelity.
- Decision:
  - Pass → immediately return the image to the user.
  - Fail → refine the prompt (e.g., “add more brush-stroke detail”) and repeat.
This continues until the agent either achieves a satisfactory score or exhausts its refinement budget—usually just one or two extra rounds. You can tweak the refinement logic by editing scripts/server.py in the ImageGenerationAgent class for more rounds or different scoring thresholds.
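The decision loop above can be sketched in a few lines of plain Python. The function names, pass threshold, and round budget here are illustrative; the real logic lives in scripts/server.py:

```python
def run_refinement_loop(prompt, generate, evaluate, refine, max_rounds=3):
    """generate(prompt) -> image; evaluate(image, prompt) -> (score, feedback);
    refine(prompt, feedback) -> improved prompt."""
    image = None
    for _ in range(max_rounds):
        image = generate(prompt)
        score, feedback = evaluate(image, prompt)
        if score >= 0.8:  # illustrative pass threshold
            return image  # Pass: hand the image back immediately
        prompt = refine(prompt, feedback)  # Fail: tweak and retry
    return image  # budget exhausted: return the last attempt
```

Raising max_rounds or lowering the threshold trades latency for fidelity, which is exactly the knob the ImageGenerationAgent class exposes.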
Customization & Pro Tips
- Swap models: Out of the box it uses OpenAI’s image-generation API, but you can point generate_image at any compatible Hugging Face endpoint by subclassing the tool spec.
- Tune refinements: Adjust the scoring function in evaluate_generated_image to penalize color drift or reward higher-resolution detail.
- Batch mode: Modify the client to send multiple prompts in parallel—ideal for storyboard generation.
- CLI integration: Instead of Gradio, build your own CLI wrapper by importing the agent directly:
from agent import ImageGenerationAgent

# Instantiate the agent with your configuration
agent = ImageGenerationAgent(...)
# Run the full generate → evaluate → refine loop for one prompt
img = agent.generate_and_evaluate("A futuristic city skyline at sunset")
img.save("sunset_city.png")
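The batch-mode idea from the list above can be sketched with a thread pool. This is a hypothetical wrapper, and it assumes the agent's generate_and_evaluate call is safe to run concurrently:

```python
from concurrent.futures import ThreadPoolExecutor

def generate_batch(agent, prompts, max_workers=4):
    """Fan several prompts out to the agent in parallel and
    return results in the same order as the input prompts."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(agent.generate_and_evaluate, prompts))
```

For a storyboard, you would pass one prompt per frame and get the rendered frames back in order.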
Wrapping Up
With just a few commands and minimal config, you can automate the tedious trial-and-error of image-AI workflows—freeing you to iterate on ideas, not prompts. Give it a spin, share your feedback, and watch your next marketing creative or in-house prototype spring to life with precision and style!
Happy generating!
Cohorte Team
May 27, 2025