Unlocking Image Creation with Flux and GPT-4o

Russ McKendrick | 16 min read

This week I decided to see how many buzzwords I could fit into one post. Given the recent release of Flux by Black Forest Labs, I had originally intended to try getting it up and running locally - however, it quickly became clear that my M3 MacBook Pro with its 36GB of RAM wasn’t going to cut it. Because of this, I decided to look at one of the many online services that offer access to the various Flux models via their APIs, which led me to fal.ai, which in turn gave me the idea for this post.

The Tools

Before we dive into the idea I had, let’s quickly get ourselves up to speed with the tools we’ll be using in this post. Each of these technologies plays a crucial role in our AI-powered image generation project.

Flux.1: The new kid on the block

A team of AI researchers and engineers, renowned for their work on foundational generative AI models like VQGAN and Latent Diffusion across academic, industrial, and open-source platforms, surprised everyone with the announcement of Black Forest Labs.

Alongside their Series Seed funding round of $31 million, they also revealed the immediate availability of three variations of their Flux.1 model:

  • FLUX.1 [pro]: The premier version of the FLUX.1 model, offering state-of-the-art image generation with exceptional prompt following, visual quality, image detail, and output diversity. FLUX.1 [pro] is accessible via API, and is also available through platforms such as Replicate and fal.ai. Additionally, it supports dedicated and customized enterprise solutions.
  • FLUX.1 [dev]: An open-weight, guidance-distilled model intended for non-commercial applications. Derived directly from FLUX.1 [pro], FLUX.1 [dev] provides similar quality and prompt adherence capabilities while being more efficient than standard models of the same size. The model’s weights are available to download from Hugging Face, and it can be executed on Replicate and fal.ai.
  • FLUX.1 [schnell]: Designed as the fastest model in the lineup, FLUX.1 [schnell] is optimized for local development and personal use. It is freely available under an Apache 2.0 license, with weights hosted on Hugging Face. Inference code can be found on GitHub and in Hugging Face’s Diffusers, and the model is integrated with ComfyUI from day one.

For an idea of what you can produce, have a look at the example images below, which were all generated using the tools we are going to cover in this post:

As you can see, not only does it handle text extremely well, it also produces high-quality photo-realistic images as well as some more abstract ones.

fal.ai: Doing the heavy lifting

As already mentioned, running even the small model locally was out of the question, so I decided to look at one of the two original launch partners providing all three models in the Flux.1 family. I chose to focus on fal.ai as I had used them previously to test another model earlier in the year.

Creating an image using cURL
export FAL_KEY="YOUR_API_KEY"
curl --request POST \
  --url https://fal.run/fal-ai/flux-pro \
  --header "Authorization: Key $FAL_KEY" \
  --header "Content-Type: application/json" \
  --data '{
    "prompt": "Extreme close-up of a single tiger eye, direct frontal view. Detailed iris and pupil. Sharp focus on eye texture and color. Natural lighting to capture authentic eye shine and depth. The word \"FLUX\" is painted over it in big, white brush strokes with visible texture."
  }'

This returned the following JSON response:

The image generation response (JSON)
{
  "images": [
    {
      "url": "https://fal.media/files/panda/oGSuf_FrMn7im_0DGnSdg_4378270c342f4170bb0f55e14cbb0636.jpg",
      "width": 1024,
      "height": 768,
      "content_type": "image/jpeg"
    }
  ],
  "timings": {},
  "seed": 932194184,
  "has_nsfw_concepts": [
    false
  ],
  "prompt": "Extreme close-up of a single tiger eye, direct frontal view. Detailed iris and pupil. Sharp focus on eye texture and color. Natural lighting to capture authentic eye shine and depth. The word \"FLUX\" is painted over it in big, white brush strokes with visible texture."
}

Opening the image in the response gave us the following:
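
If you would rather script that step than open the URL by hand, a couple of lines of Python will fetch the image for you (a minimal sketch using the requests package; the filename is just illustrative):

Downloading the generated image (PYTHON)
import requests

# URL taken from the "images" array in the JSON response above
url = "https://fal.media/files/panda/oGSuf_FrMn7im_0DGnSdg_4378270c342f4170bb0f55e14cbb0636.jpg"
response = requests.get(url, timeout=60)
response.raise_for_status()  # stop here if the download failed
with open("tiger-eye.jpg", "wb") as f:
    f.write(response.content)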

This got me thinking - there is a Python library for fal.ai, so I could use that. What other tools have Python libraries I could use alongside fal.ai?
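
As a quick taste of what that looks like, here is the same request as the cURL example made through the fal-client package (pip install fal-client) - a minimal sketch assuming fal_client.subscribe, which reads FAL_KEY from the environment:

Creating an image with the Python client (PYTHON)
import fal_client

# Same request as the cURL example above, via the Python client;
# fal_client picks up the FAL_KEY environment variable automatically
result = fal_client.subscribe(
    "fal-ai/flux-pro",
    arguments={
        "prompt": "Extreme close-up of a single tiger eye, direct frontal view."
    },
)
print(result["images"][0]["url"])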

OpenAI: Building a Better Prompt

OpenAI shouldn’t need an introduction; we will be using the GPT-4o models to help with prompting.

Streamlit: Bring it all together

Streamlit is a Python framework from Snowflake that lets you quickly build data- and AI-driven web applications with a really low barrier to entry; see the following for a quick overview:
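
To show just how low that barrier is, the few lines below are a complete Streamlit application (an illustrative example of my own, not part of the app we will be building):

A minimal Streamlit app (PYTHON)
import streamlit as st

# Run with: streamlit run hello.py
st.title("Hello, Streamlit!")
name = st.text_input("What's your name?")
if name:
    st.write(f"Nice to meet you, {name}!")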

There was just one problem with this: I am not a developer - so how could I do this?

Claude: Your personal development team

The final tool used in this post is Claude from Anthropic. While I may not be a developer, I can prompt and debug code, and since the release of Claude 3.5 Sonnet I have been using it more and more for development tasks.

So let’s give it a go!!!

The Idea

The idea I had was to use Streamlit to build an interface for interacting with the fal.ai API and the Flux family of models from my local machine. As part of this, it should also use GPT-4o to help tune the prompt to give us the best shot at getting a good result from Flux.

Running the application

Before we dive into the code, let’s just download and run the app.

Getting the keys

To run the application you will need two API keys: one for fal.ai and one for the OpenAI API. Both can be created from the respective account dashboards on fal.ai and platform.openai.com.

Make a note of the keys and keep them very safe.

Creating the environment

Next up we need to create an environment to run the application in and grab the code from GitHub.

Run the commands below to create the conda environment, switch to it, check the code out from GitHub and install the required Python packages:

Preparing the environment
conda create -n streamlit python=3.12
conda activate streamlit
git clone https://github.com/russmckendrick/flux-fal-openai-streamlit.git
cd flux-fal-openai-streamlit
pip install -r requirements.txt

Launching the application

We are nearly ready to launch the application; there is just one more thing to do, and that is to expose our API keys as environment variables for our application to read:

Exporting the API keys
export FAL_KEY="<put-your-fal.ai-key-here>"
export OPENAI_API_KEY="<put-your-openai-key-here>"

With the keys in place we can launch the application by running the command below:

Launching the application
streamlit run app.py

This will open the application window, which should look something like the following:

Using the application

Now we have the application open, let’s generate an image. First, let’s write a basic prompt:

Just clicking “☁️ Generate image” at this point will give you an impressive-looking image without any tuning; below is the image which was generated using the basic prompt above:

Let’s now get GPT-4o to expand on our basic prompt. First, tick “Use OpenAI for prompt tuning” and then select your chosen GPT-4o model from the drop-down - in this example, I will use the default of gpt-4o rather than gpt-4o-mini. Then click on the “✏️ Tune Prompt” button.

This will send your original prompt to the model, which will return a new prompt - in this run it was:

If you click on “See explanation and suggestions” you will also get more details on why the prompt has been updated:

As well as more suggestions on how you could further improve the prompt:

If you are happy with the new prompt, click on “✅ Accept Tuned Prompt” and then click the “☁️ Generate image” button; after a minute or so you will get your image back:

The screens below show how the process above looks within the application:

As you can see from the last picture, a copy of the image and a markdown summary have been saved to your local machine.

Code Highlights

Let’s take a closer look at some key parts of our application’s code. These snippets highlight the core functionality and demonstrate how we’re integrating the various tools we’ve discussed. By examining these, you’ll get a better understanding of how the different components work together to create our AI-powered image generation app.

tune_prompt_with_openai (PYTHON)
import os

import openai

def tune_prompt_with_openai(prompt, model):
    # Read the API key from the environment and fail early if it is missing
    openai_api_key = os.getenv("OPENAI_API_KEY")
    if not openai_api_key:
        raise ValueError("OPENAI_API_KEY environment variable is not set")
    client = openai.OpenAI(api_key=openai_api_key)
    # A single chat completion returns the tuned prompt, explanation and suggestions
    response = client.chat.completions.create(
        model=model,
        messages=[
            {
                "role": "system",
                "content": "You are an advanced AI assistant specialized in refining and enhancing image generation prompts. Your goal is to help users create more effective, detailed, and creative prompts for high-quality images. Respond with: 1) An improved prompt (prefix with 'PROMPT:'), 2) Explanation of changes (prefix with 'EXPLANATION:'), and 3) Additional suggestions (prefix with 'SUGGESTIONS:'). Each section should be on a new line."
            },
            {
                "role": "user",
                "content": f"Improve this image generation prompt: {prompt}"
            }
        ]
    )
    return response.choices[0].message.content.strip()

As you can see from the system prompt above, we are instructing the model to return not only the improved prompt but also an explanation and suggestions, and telling it exactly how to format its response.

This means we only make one call to the model, and the response comes back with clearly labelled sections we can pick out.
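
Pulling those sections back out is then just a string-splitting exercise. The helper below is my own sketch of how that parsing might look (the function name is illustrative, not taken from the repo):

Parsing the tuned response (PYTHON)
def parse_tuned_response(content):
    # Split the model's reply into its three labelled sections
    sections = {"PROMPT": "", "EXPLANATION": "", "SUGGESTIONS": ""}
    current = None
    for line in content.splitlines():
        stripped = line.strip()
        for key in sections:
            if stripped.startswith(f"{key}:"):
                current = key
                stripped = stripped[len(key) + 1:].strip()
                break
        if current and stripped:
            sections[current] += stripped + " "
    return {key: value.strip() for key, value in sections.items()}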

As the images may take a little while to generate, I had Claude add some messages to cycle through rather than leaving us looking at a boring spinner:

cycle_spinner_messages (PYTHON)
import itertools

def cycle_spinner_messages():
    # An endless iterator of friendly status messages to show while we wait
    messages = [
        "🎨 Mixing colors...",
        "✨ Sprinkling creativity dust...",
        "🖌️ Applying artistic strokes...",
        "🌈 Infusing with vibrant hues...",
        "🔍 Focusing on details...",
        "🖼️ Framing the masterpiece...",
        "🌟 Adding that special touch...",
        "🎭 Bringing characters to life...",
        "🏙️ Building the scene...",
        "🌅 Setting the perfect mood...",
    ]
    return itertools.cycle(messages)
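
The iterator this returns can then be drained with next() while we wait for fal.ai to respond. Below is a rough sketch of how it might be wired up to a Streamlit placeholder - the image_ready() check is hypothetical, standing in for however the app polls the request:

Cycling the messages (PYTHON)
import time

import streamlit as st

spinner_messages = cycle_spinner_messages()
placeholder = st.empty()
while not image_ready():  # image_ready() is a hypothetical "is it done yet?" check
    placeholder.info(next(spinner_messages))  # show the next message in the cycle
    time.sleep(2)
placeholder.empty()  # clear the status message once the image is back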

When I said Streamlit made creating web applications really easy, I wasn’t kidding. The code below generates the bulk of what you see on the initial screen:

main() (PYTHON)
import os

import streamlit as st

def main():
    st.title("🤖 Image Generation with fal.ai & Flux")

    # Check for environment variables
    if not os.getenv("FAL_KEY"):
        st.error("FAL_KEY environment variable is not set. Please set it before running the app.")
        return

    # Model selection dropdown
    model_options = {
        "Flux Pro": "fal-ai/flux-pro",
        "Flux Dev": "fal-ai/flux/dev",
        "Flux Schnell": "fal-ai/flux/schnell",
        "Flux Realism": "fal-ai/flux-realism"
    }
    selected_model = st.selectbox("Select Model:", list(model_options.keys()), index=0)

    # Basic parameters
    image_size = st.selectbox("Image Size:", ["square_hd", "square", "portrait_4_3", "portrait_16_9", "landscape_4_3", "landscape_16_9"], index=0)
    num_inference_steps = st.slider("Number of Inference Steps:", min_value=1, max_value=50, value=28)

    # Advanced configuration in an expander
    with st.expander("Advanced Configuration", expanded=False):
        guidance_scale = st.slider("Guidance Scale:", min_value=1.0, max_value=20.0, value=3.5, step=0.1)
        safety_tolerance = st.selectbox("Safety Tolerance:", ["1", "2", "3", "4"], index=1)

    # Initialize session state
    if 'user_prompt' not in st.session_state:
        st.session_state.user_prompt = ""
    if 'tuned_prompt' not in st.session_state:
        st.session_state.tuned_prompt = ""
    if 'prompt_accepted' not in st.session_state:
        st.session_state.prompt_accepted = False

    # User input for the prompt
    user_prompt = st.text_input("Enter your image prompt:", value=st.session_state.user_prompt)

    # Update session state when user types in the input field
    if user_prompt != st.session_state.user_prompt:
        st.session_state.user_prompt = user_prompt
        st.session_state.prompt_accepted = False

    # OpenAI prompt tuning options
    use_openai_tuning = st.checkbox("Use OpenAI for prompt tuning", value=False)
    openai_model_options = ["gpt-4o", "gpt-4o-mini"]
    selected_openai_model = st.selectbox("Select OpenAI Model:", openai_model_options, index=0, disabled=not use_openai_tuning)

The drop-downs, text input boxes, sliders and the hidden advanced settings are all native Streamlit components which “just work”. Being all native also gives the application a nice, consistent look when other things happen, like displaying the prompt being used:

Generating image (PYTHON)
# Display the prompt being used
st.subheader("☁️ Generating image with the following prompt:")
st.info(user_prompt)
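
The snippet above stops short of the fal.ai request itself. As a hedged sketch, the call made with the values collected in main() might look something like this, using fal_client.subscribe and the parameter names from the fal.ai API (check the repo for the exact call):

Calling the selected model (PYTHON)
import fal_client

# Submit the request using the values gathered from the Streamlit widgets
result = fal_client.subscribe(
    model_options[selected_model],  # e.g. "fal-ai/flux-pro"
    arguments={
        "prompt": user_prompt,
        "image_size": image_size,
        "num_inference_steps": num_inference_steps,
        "guidance_scale": guidance_scale,
        "safety_tolerance": safety_tolerance,
    },
)
image_url = result["images"][0]["url"]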

The rest of the code can be found on the application’s GitHub repo.

Some more images

Before wrapping up this post, let’s take some of the prompts I have used to generate previous blog post covers in Midjourney and run them through Flux - I will leave you to make your own mind up on which is better 😊.

Generating an Azure Storage Account SAS token using Azure Logic and Function apps

Midjourney vs. Flux.1 [pro]

Azure Firewall KQL Query

Midjourney vs. Flux.1 [pro]

Azure Virtual Desktop KQL Queries

Midjourney vs. Flux.1 [pro]

Azure DevOps Ansible Pipeline; Boosting Efficiency with Caching

Midjourney vs. Flux.1 [pro]

Day to Day Tools, the 2024 edition

Midjourney vs. Flux.1 [pro]

Conda for Python environment management on macOS

Midjourney vs. Flux.1 [pro]

Conclusion

Now that you’ve seen the potential of combining Flux, fal.ai, OpenAI, and Streamlit, why not give it a try yourself? Clone the repository, set up your environment, and start experimenting with your own prompts. Whether you’re a developer looking to build on this framework or a creative professional curious about AI-assisted image generation, there’s plenty of room for exploration and innovation. Don’t forget to share your experiences or any cool images you generate – I’d love to see what you come up with!
