Me vs GPT-4o Image Generation

Mar 30, 2025 - 00:25

Can GPT-4o Replace a Human in Photoshop?

Once again you find me going up against new technology. GPT-4o image generation dropped this week, and let’s be honest: it’s impressive. Fast, multimodal, and now capable of generating incredible images and, most impressively, images with flawless text. It can recreate art styles and remix photos.

But I had a question about Photoshop manipulation:

Can GPT-4o take raw, mismatched source images and create a cinematic, story-rich composite as well as a human who knows how to work in Photoshop?

There is a lot that goes into photomanipulation: lighting logic, composition, perspective, edge blending, ambient detail. And I wanted to see if GPT-4o, with all its intelligence, could replicate the nuanced decision-making that happens when you build a complex image by hand.

So I made a sci-fi abduction scene. Built it manually in Photoshop from a pile of parts:

A grimy urban alleyway
A 70s-era rusted-out car
A woman mid-jump, arms open
A trash pile, a few flickering lights, and a classic flying saucer

Image 1: Detailed Instructions

Prompt:

Scene Overview

A dramatic urban night scene set in a dark, narrow alley. The atmosphere is eerie and cinematic, with a strong contrast between shadows and a vibrant beam of light. A human figure is being abducted by a UFO, caught mid-air in a glowing tractor beam.

Environment & Setting

Location: Gritty alleyway between brick buildings with wet, grimy pavement
Time of Day: Nighttime, dimly lit except for a prominent blue beam of light
Lighting: Blue-white spotlight from the UFO above casts a circular glow around the abductee. Surroundings are illuminated in cyan and teal hues, with orange light spill near the garage and right-side wall
Foreground Details:
- Abductee: Woman dressed in black athletic wear, barefoot, levitating mid-air
- Street: Broken pavement, puddles reflecting light, scattered trash
- Car: Old rusty with broken headlights, graffiti reads “DOPE$”
Background: Brick buildings, garage doors, utility wires, "children at play" sign
Lighting Effects: Beam cuts through mist and darkness, detailed reflections
UFO Design: Classic saucer-style, ring of blue lights underneath, metallic finish
Color Palette: Cool tones—teal, cyan, electric blue
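
A prompt this structured is easier to reuse if the sections are kept as data and flattened into text at the end. Here is a minimal Python sketch of that idea. The `build_prompt` helper and the dictionary layout are my own illustration, not anything GPT-4o requires, and the values are shortened from the prompt above.

```python
# Hypothetical helper: the section labels mirror the prompt above,
# but the function and the data layout are illustrative assumptions.
def build_prompt(overview, sections):
    """Flatten an overview plus labeled sections into one prompt string."""
    lines = ["Scene Overview", overview, ""]
    for label, value in sections.items():
        if isinstance(value, dict):
            # Nested section, such as Foreground Details with sub-items
            lines.append(f"{label}:")
            lines.extend(f"- {key}: {text}" for key, text in value.items())
        else:
            lines.append(f"{label}: {value}")
    return "\n".join(lines)


sections = {
    "Location": "Gritty alleyway between brick buildings with wet, grimy pavement",
    "Foreground Details": {
        "Abductee": "Woman dressed in black athletic wear, barefoot, levitating mid-air",
        "Street": "Broken pavement, puddles reflecting light, scattered trash",
    },
    "UFO Design": "Classic saucer-style, ring of blue lights underneath, metallic finish",
    "Color Palette": "Cool tones: teal, cyan, electric blue",
}

prompt = build_prompt(
    "A dramatic urban night scene set in a dark, narrow alley.", sections
)
print(prompt)
```

Keeping the sections separate like this would make it simple to swap one detail, say the UFO design, and rerun the same scene.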

GPT-4o Output

What it got right:

The beam lighting is genuinely well done—nice rim light, bounce on the ground, and glow on surrounding surfaces
Floating papers were a great inferred touch
Pose of the woman feels natural and cinematic

But here’s the issue:

Depth: The alley flattens out fast—more like a set than a real place
Scale: The UFO feels small and unthreatening, and its proportions are slightly off

Image 2: Limited Instructions + Blue Grade

Prompt:

Use the images to create a dramatic scene of an UFO abduction... make it look like a sci-fi movie scene with dramatic lighting and a blue color grade film

GPT-4o Output

What it nailed:

Genuinely cinematic
Great use of color contrast—amber light vs. cyan beam
Trash, car, and signage were included

Where it loses me:

Scale: UFO too close and too small, subject too large for the space
Depth: Better than the first, but still more like a backdrop

Image 3: Vague Prompt / Just Scene Description

Prompt:

Use these images to create a cinematic sci-fi scene of an alien abduction

GPT-4o Output

What it got right:

Best sense of place—buildings visible in the background, lived-in environment
Strong composition and balance
Effective lighting and atmospheric balance

But then...

Scale: Once again, the UFO feels too compact
Integration: The abductee isn’t color-matched or lit properly to fit the environment

Verdict

The obvious strength of OpenAI’s new model is its understanding of language. That’s what really separates GPT-4o from the rest right now. I ran the exact same prompt across Recraft, Midjourney, and Flux, and while Flux came closest, none matched the scene comprehension or compositional awareness that GPT-4o delivered.

Yes, speed and rate limits are still a thing. But I expect that to smooth out as OpenAI scales capacity. What’s more important is that GPT-4o image generation actually feels like the version of AI art we’ve been waiting for—where visual storytelling and language finally start to merge in a meaningful way.

My AI Image Gen Wishlist

Scene-aware Storyboarding

I want to prompt across scenes, like building a storyboard. Let me describe five different shots and generate them in sequence while keeping the setting, lighting, and tone consistent.

Character Anchoring

Give me a way to define a character once (through text, image, or a quick builder) and then just use an @name tag to drop them into new scenes. No more re-describing facial features or outfits every time.
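
Until something like that exists, the text-only version of the idea can be approximated on my side of the prompt box. This is a rough Python sketch, assuming a hand-maintained dictionary of character descriptions. The `CHARACTERS` registry and the `expand_tags` function are hypothetical names of my own, not a GPT-4o feature.

```python
import re

# Hypothetical character registry: defined once, reused in every prompt.
CHARACTERS = {
    "abductee": "a woman in black athletic wear, barefoot, arms open",
}


def expand_tags(prompt, characters):
    """Replace each @name tag with its stored description."""
    def lookup(match):
        name = match.group(1)
        # Unknown tags are left untouched rather than guessed at
        return characters.get(name, match.group(0))

    return re.sub(r"@(\w+)", lookup, prompt)


expanded = expand_tags(
    "@abductee floating above a rusted car, watched by @stranger", CHARACTERS
)
print(expanded)
```

This only swaps text, so it saves retyping but does nothing for visual consistency between generations, which is the part that really needs model support.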

Personal Style Library

Let me upload reference images and train a mini-style model. I should be able to say “Use my noir style” or “Give this the same tone as my cyberpunk alley series.” Consistent tone and grade shouldn’t have to be reinvented with every prompt.

Overall, I'm good with this model.

And just for laughs, here are the outputs from other image models: