Automating Image Generation with Precision: A Developer’s Guide to the Image Generation Agent

The Image Generation Agent automates prompt refinement, image generation, and evaluation—all in one intelligent loop. This guide walks developers and AI leaders through setup, usage, and customization with hands-on examples. Learn how to generate high-quality images aligned with intent, without endless retries. Ideal for teams looking to streamline their creative workflows with a touch of empathy and efficiency.

Here’s a friendly, conversational guide to help you—and your AI team—get up and running with the Image Generation Agent for producing precise, stunning AI-generated imagery without all the manual loop-closing. You’ll see how to install the toolchain, configure your API keys, launch the backend and frontend, and even customize the prompt-refinement loop—all sprinkled with code snippets and real-world examples.

Overview

The Image Generation Agent is an open-source project designed to automate the entire prompt-refinement → image-generation → visual-feedback loop, so you can focus on creativity rather than copy-paste cycles. Under the hood, it leverages two core tools:

  1. generate_image, which taps the OpenAI image-generation API to turn your textual prompts into pixels.
  2. evaluate_generated_image, which uses Google’s Gemini vision capabilities to judge whether the image truly matches your intent—and, if not, automatically tweaks and retries.

You’ll interact via a lightweight Gradio UI, watching the agent call tools, evaluate outputs, refine prompts, and ultimately hand you the final art piece—all in a few seconds.

Getting Started

Prerequisites

  • Python 3.8+ (any recent version will do).
  • UV: a blazing-fast pip replacement and virtual environment manager written in Rust. UV replaces pip, pip-tools, virtualenv, and more, giving you 10–100× faster installs and a universal lockfile.
  • API keys for OpenAI and Google Cloud Vision/Gemini.

Installing UV

If you don’t already have UV, grab it via pip:

Bash
pip install uv

(This uses the PyPI build; see the UV docs for standalone installers, Docker images, or Homebrew packages if you prefer.)

Cloning and Bootstrapping

Bash
# 1. Clone the repo
git clone https://github.com/run-llama/image-generation-agent
cd image-generation-agent

# 2. Sync dependencies & create a virtual environment
uv sync
source .venv/bin/activate

Congrats—you now have a clean, isolated environment with everything you need.

Configuring API Keys

Navigate into the scripts/ folder and rename the example environment file:

Bash
cd scripts
mv .env.example .env

Then edit .env (or export vars directly) to include:

Bash
OPENAI_API_KEY="sk-…"
GOOGLE_API_KEY="AIza…"

This lets the agent call both the OpenAI image API and Google’s vision evaluator.
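If you prefer to load the keys programmatically rather than exporting them by hand, a minimal sketch of a .env loader looks like this (this is an illustration, not part of the repo—the agent reads the keys through its own configuration):

```python
import os

def load_env(path=".env"):
    """Parse simple KEY="value" lines from a .env file into os.environ."""
    with open(path) as f:
        for line in f:
            line = line.strip()
            # skip blanks, comments, and malformed lines
            if not line or line.startswith("#") or "=" not in line:
                continue
            key, _, value = line.partition("=")
            os.environ[key.strip()] = value.strip().strip('"')

# only load if the file is present, so missing config fails loudly later
if os.path.exists(".env"):
    load_env()
```

Real-world projects usually reach for python-dotenv instead, but the idea is the same: keys live in a file, never in your source.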

Launching the Agent

You’ll run two processes—backend and frontend—in separate terminals:

1. Start the WebSocket Backend

Bash
# (still in scripts/, venv active)
python3 server.py

You’ll see a log confirming the WebSocket server is listening on port 8765.

2. Launch the Gradio Frontend

Open a new terminal (with .venv activated) and run:

Bash
python3 client.py

Point your browser to http://localhost:7860 and voilà—you have a friendly UI to chat with your Image Generation Agent.

Live Example: “Llama Painter”

Let’s have some fun. In the Gradio UI, type:

“A llama painter drawing mountains on a canvas.”

Hit Generate, and you’ll see:

  1. Agent starts—calls generate_image with your prompt.
  2. Intermediate images—displayed as the agent fine-tunes the prompt.
  3. Evaluation—the agent invokes evaluate_generated_image to check faithfulness.
  4. Final output—a majestic llama wielding a brush, painting alpine vistas.

All of these steps happen automatically, with status updates in the UI, so you can sit back with a ☕ and watch the magic unfold.

Deep Dive: How the Loop Works

  1. Initial Prompt → generate_image kicks off an OpenAI image-generation call.
  2. Quality Check → the evaluate_generated_image tool scores the result for prompt fidelity.
  3. Decision
    • Pass → immediately return image to user.
    • Fail → refine prompt (e.g., “add more brush-stroke detail”) and repeat.

This continues until the agent either achieves a satisfactory score or exhausts its refinement budget—usually just one or two extra rounds. You can tweak the refinement logic by editing scripts/server.py in the ImageGenerationAgent class for more rounds or different scoring thresholds.
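The loop above can be sketched in a few lines of Python. Note that generate_with_refinement, the threshold, and the feedback-appending strategy here are illustrative stand-ins for the real tool calls and scoring inside ImageGenerationAgent, not the repo's actual code:

```python
def generate_with_refinement(prompt, generate_image, evaluate_generated_image,
                             threshold=0.8, max_rounds=3):
    """Generate, score, and refine until the image passes or the budget runs out."""
    for _ in range(max_rounds):
        image = generate_image(prompt)                             # image-generation call
        score, feedback = evaluate_generated_image(image, prompt)  # vision-based check
        if score >= threshold:
            return image                                           # pass → return to user
        prompt = f"{prompt}. {feedback}"                           # fail → refine and retry
    return image                                                   # best effort after budget
```

Raising max_rounds trades latency for fidelity; raising threshold makes the evaluator pickier, which can cost extra generation calls.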

Customization & Pro Tips

  • Swap models: Out of the box it uses OpenAI’s image-generation API, but you can point generate_image at any compatible endpoint (e.g., a Hugging Face diffusion model) by subclassing the tool spec.
  • Tune refinements: Adjust the scoring function in evaluate_generated_image to penalize color drift or reward higher resolution detail.
  • Batch mode: Modify the client to send multiple prompts in parallel—ideal for storyboard generation.
  • CLI integration: Instead of Gradio, build your own CLI wrapper by importing the agent directly:
Python
from server import ImageGenerationAgent  # the class defined in scripts/server.py

agent = ImageGenerationAgent(...)
img = agent.generate_and_evaluate("A futuristic city skyline at sunset")
img.save("sunset_city.png")
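For the batch-mode idea above, a thread pool is the simplest starting point. This is a hedged sketch: it assumes an agent object with a generate_and_evaluate method like the snippet above, and that the underlying API client tolerates concurrent calls:

```python
from concurrent.futures import ThreadPoolExecutor

def generate_batch(agent, prompts, max_workers=4):
    """Run several prompt → image → evaluate loops in parallel threads."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves input order: results[i] corresponds to prompts[i]
        return list(pool.map(agent.generate_and_evaluate, prompts))
```

Threads work here because the loop is I/O-bound (waiting on API responses); just keep max_workers modest to stay under your provider's rate limits.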

Wrapping Up

With just a few commands and minimal config, you can automate the tedious trial-and-error of image-AI workflows—freeing you to iterate on ideas, not prompts. Give it a spin, share your feedback, and watch your next marketing creative or in-house prototype spring to life with precision and style!

Happy generating!

Cohorte Team

May 27, 2025