AI's Artistic Eye: Capturing the Scene, But Missing the Shot with Imagen-v3
The world of AI image generation is rapidly evolving, with models capable of creating stunning visuals based on text prompts. However, achieving a perfect match between the prompt and the generated image remains a challenge. This blog post examines a recent experiment that highlights the strengths and weaknesses of current AI models in capturing the essence of a scene, specifically focusing on camera position, shot composition, and aesthetic style.
Created with: imagen-v3
A Solitary Figure Contemplates the Storm
A dramatic image of a lone figure standing on a cliff, silhouetted against a stormy sea. Dark clouds fill the sky, rain falls, and the scene evokes a sense of isolation, melancholy, and foreboding.
Prompt
poses rule-of-thirds: Epic, determined, hopeful ; A lone hero standing on a cliff overlooking a vast, stormy sea; Wide shot; Heroism; Dramatic sky with crashing waves; cinematic
Characteristic
Shot : A solitary figure stands on a cliff overlooking a stormy sea. Dark clouds fill the sky and rain falls.
Aesthetic Score : 0.7
Mood : dramatic, melancholic, ominous
Quality
Entropy : 6.79
Noise : 95
Prompt Clip Score : 0.36
AI Evaluation
Likelihood of AI : 0.20
Image errors : Some slight artifacts are noticeable, particularly in the water and sky.
Whispers in the Dark: A Campfire’s Eerie Glow
Four figures huddle around a flickering campfire, their faces obscured by the shadows of a dense, ancient forest. The scene is both captivating and unsettling, with a somber mood and a sense of mystery that lingers in the air. The fire, a beacon of warmth in the darkness, becomes a focal point, drawing the viewer into the heart of the suspense.
Prompt
poses rule-of-thirds: Intriguing, mysterious, suspenseful ; A group of adventurers huddled around a campfire in a dense forest; Medium shot; Adventure; Shadows and flickering flames; cinematic
Characteristic
Shot : Four people are sitting around a campfire in a dark forest. The fire is in the center of the image, and the people are in a circle around it. The trees are tall and dark, and the mood is somber.
Aesthetic Score : 0.6
Mood : somber, mysterious, suspenseful
Quality
Entropy : 5.58
Noise : 89
Prompt Clip Score : 0.32
AI Evaluation
Likelihood of AI : 0.10
Image errors : No visible errors
The Hands of a Gamer: Immersed in the Digital World
A close-up shot captures the intensity of a gamer’s focus as their hands expertly navigate a controller, mirroring the action unfolding on the TV screen. The image evokes a sense of immersion and the thrill of the game.
Prompt
poses rule-of-thirds: Focused, intense, exhilarating ; A gamer’s hands intensely gripping a controller, the screen displaying a thrilling moment in a video game; Close-up; Gaming; Blurred background of the game’s visuals; cinematic
Characteristic
Shot : A person is playing a video game, the image focuses on their hands holding the controller in front of a TV screen with the game displayed.
Aesthetic Score : 0.5
Mood : focused, intense, gaming
Quality
Entropy : 6.46
Noise : 79
Prompt Clip Score : 0.29
AI Evaluation
Likelihood of AI : 0.20
Image errors : The image has a slight blurriness in the background, particularly the TV screen. There is also a slight graininess in the overall image.
Mirrored Majesty: A Hiker Finds Serenity in Mountain Reflections
A lone hiker stands in awe on a rocky shore, gazing at a majestic mountain range reflected in a still, clear lake. The scene evokes a sense of peace and wonder, highlighting the breathtaking beauty and scale of the natural world.
Prompt
poses rule-of-thirds: Tranquil, awe-inspiring, peaceful ; A majestic mountain range reflected in a still lake, with a lone hiker standing on a rocky outcrop; Wide shot; Tourism; Clear blue sky and vibrant green foliage; cinematic
Characteristic
Shot : A lone hiker stands on a rocky shore gazing at a majestic mountain range reflected in a still, clear lake.
Aesthetic Score : 0.9
Mood : serene, awe-inspiring, peaceful
Quality
Entropy : 6.78
Noise : 107
Prompt Clip Score : 0.32
AI Evaluation
Likelihood of AI : 0.10
Image errors : No visible errors or artifacts.
A Journey Through Time: Nostalgia and Adventure on a Vintage Train
A man gazes out the window of a vintage train, his eyes reflecting the beauty of a scenic countryside. Fields of yellow flowers stretch out in the distance, evoking a sense of tranquility and adventure. This image captures the essence of nostalgia, inviting you to imagine yourself on a journey through time and space.
Prompt
poses rule-of-thirds: Nostalgic, romantic, adventurous ; A vintage train speeding through a picturesque countryside, with a lone traveler gazing out the window; Medium shot; Travel; Rolling hills and vibrant fields; cinematic
Characteristic
Shot : A man looking out the window of a vintage train, traveling through a scenic countryside with a field of yellow flowers in the distance.
Aesthetic Score : 0.7
Mood : nostalgic, tranquil, adventurous
Quality
Entropy : 6.54
Noise : 96
Prompt Clip Score : 0.34
AI Evaluation
Likelihood of AI : 0.10
Image errors : There are no visible artifacts or errors in the image.
Laughter and Camaraderie: Friends Celebrate in a Bustling Market
A group of four friends share a moment of pure joy and laughter, enjoying drinks and each other’s company in the lively atmosphere of a crowded outdoor market. The image captures the genuine connection and happiness that comes from spending time with loved ones.
Prompt
poses rule-of-thirds: Joyful, lively, celebratory ; A group of friends laughing and enjoying a meal together at a bustling outdoor market; Medium shot; Groups; Colorful stalls and vibrant street life; cinematic
Characteristic
Shot : Four friends are laughing and enjoying drinks in a crowded outdoor market, possibly at an event.
Aesthetic Score : 0.7
Mood : happy, joyful, social
Quality
Entropy : 6.65
Noise : 95
Prompt Clip Score : 0.30
AI Evaluation
Likelihood of AI : 0.20
Image errors : No noticeable image errors.
Silhouette of Hope: A Solitary Figure Welcomes the Sunset
A lone figure stands on a sandy beach, their silhouette stark against the vibrant hues of a cloudy sunset. The scene evokes a sense of serenity, contemplation, and hope, leaving the viewer to ponder the figure’s thoughts and the promise of the coming night.
Prompt
poses rule-of-thirds: Melancholy, reflective, hopeful ; A lone figure standing on a deserted beach, watching the sun setting over the horizon; Wide shot; Heroism; Golden light illuminating the sky and water; cinematic
Characteristic
Shot : A lone figure stands on a sandy beach, facing a bright, cloudy sunset over the ocean.
Aesthetic Score : 0.7
Mood : serene, contemplative, hopeful
Quality
Entropy : 6.70
Noise : 71
Prompt Clip Score : 0.33
AI Evaluation
Likelihood of AI : 0.20
Image errors : There are no noticeable artifacts or errors in the image.
Lost in the Shadows: A Journey Through the Jungle
A group of men navigate a dense jungle, bathed in the ethereal glow of backlighting. The mysterious atmosphere and dramatic play of light and shadow create a sense of adventure and suspense, leaving the viewer wondering what lies ahead.
Prompt
poses rule-of-thirds: Intriguing, suspenseful, adventurous ; A group of explorers navigating a treacherous jungle path, with dense foliage surrounding them; Medium shot; Adventure; Lush greenery and dappled sunlight; cinematic
Characteristic
Shot : A group of men are walking through a dense jungle. The light is coming from the back of the photo, and it’s creating a mysterious atmosphere.
Aesthetic Score : 0.7
Mood : mysterious, adventurous, suspenseful
Quality
Entropy : 6.73
Noise : 116
Prompt Clip Score : 0.31
AI Evaluation
Likelihood of AI : 0.10
Image errors : The image is slightly overexposed, and the leaves in the background are slightly blurry. The image also appears to have some noise, which could be from the camera or a processing error.
Lost in Thought: A Moment of Intensity
A close-up shot captures a young man’s face, bathed in blue light, as he stares intently to the left. His headphones suggest a world of sound, while his focused expression hints at a deep internal struggle or a moment of profound contemplation. The blue lighting adds a layer of mystery, leaving the viewer to wonder what thoughts are swirling within his mind.
Prompt
poses rule-of-thirds: Focused, intense, determined ; A close-up of a gamer’s face, eyes glued to the screen, as they navigate a challenging level in a video game; Close-up; Gaming; Blurred background of the game’s visuals; cinematic
Characteristic
Shot : A close-up shot of a young man’s face, he is wearing headphones and looking to the left, the lighting is blue and he looks focused and intense.
Aesthetic Score : 0.6
Mood : intense, focused, mysterious
Quality
Entropy : 6.77
Noise : 89
Prompt Clip Score : 0.35
AI Evaluation
Likelihood of AI : 0.80
Image errors : The image has some artifacts and errors, especially in the skin texture, making it look slightly artificial. The lighting is also slightly uneven, making the subject’s face look unnatural.
A Solitary Figure Gazes Upon the City’s Vibrant Glow
A man stands alone on a rooftop, silhouetted against the dazzling lights of a nighttime cityscape. The scene evokes a sense of urban solitude and contemplation, with the man dwarfed by the vast expanse of the city below. The image captures a moment of awe and wonder, as he takes in the vibrant energy of the urban landscape.
Prompt
poses rule-of-thirds: Energetic, exciting, awe-inspiring ; A panoramic view of a bustling city skyline, with a lone tourist standing on a rooftop overlooking the scene; Wide shot; Tourism; Vibrant lights and towering buildings; cinematic
Characteristic
Shot : A man stands on a rooftop overlooking a nighttime cityscape. The city is lit up with streetlights and building lights, creating a vibrant and energetic atmosphere.
Aesthetic Score : 0.7
Mood : urban, solitude, contemplative
Quality
Entropy : 6.53
Noise : 97
Prompt Clip Score : 0.30
AI Evaluation
Likelihood of AI : 0.20
Image errors : Some minor noise visible in the dark areas.
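The post does not say how the Entropy value in each Quality block is computed. One common definition that would give numbers in this range is the Shannon entropy (in bits) of the image's 8-bit grayscale histogram, which runs from 0 for a flat image to 8 for a perfectly uniform histogram. The sketch below is an assumption about the metric, not the pipeline used here; the function name `shannon_entropy` is hypothetical.

```python
import math
from collections import Counter


def shannon_entropy(pixels):
    """Shannon entropy, in bits, of a flat sequence of grayscale values.

    Assumption: the post's "Entropy" is histogram entropy of this kind;
    the source does not confirm it.
    """
    counts = Counter(pixels)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("pixels must not be empty")
    # Sum -p * log2(p) over every intensity level that actually occurs.
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


if __name__ == "__main__":
    flat = [128] * 1000            # a single grey level carries no information
    uniform = list(range(256)) * 4  # every 8-bit level equally likely
    print(shannon_entropy(flat))
    print(shannon_entropy(uniform))
```

Under this reading, the campfire image's lower value (5.58) would be consistent with a frame dominated by dark tones, while the other nine sit between 6.46 and 6.79.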
Conclusion
The results show that the generative AI model captured the intended aesthetic well but struggled with camera position and shot composition.
Here’s a breakdown:
- Camera Position: The model scored 0.25, which is considered below average. This suggests that the model didn’t accurately capture the intended camera position described in the prompt.
- Shot Analysis: The model scored 0.475, which is also below average. This indicates that the model didn’t fully understand the scene and shot composition described in the prompt.
- Aesthetic Analysis: The model scored 0.07, which is considered very good. This means that the generated image closely matched the expected aesthetic style.
Overall, the model seems to be better at capturing the desired aesthetic than understanding the camera position and shot composition. This suggests that the model might need further training to improve its ability to interpret and translate these aspects of the prompt into the generated image.
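For readers who want to compare the per-image numbers side by side, here is a small sketch that averages the metrics listed under each image above. The values are copied from this post; the short labels and the `summarize` helper are only illustrative, and these averages are separate from the three conclusion scores, whose calculation the post does not describe.

```python
from statistics import mean

# (label, aesthetic, entropy, noise, clip_score, ai_likelihood), copied from the post.
RESULTS = [
    ("storm cliff",     0.7, 6.79,  95, 0.36, 0.20),
    ("campfire",        0.6, 5.58,  89, 0.32, 0.10),
    ("gamer hands",     0.5, 6.46,  79, 0.29, 0.20),
    ("mountain lake",   0.9, 6.78, 107, 0.32, 0.10),
    ("vintage train",   0.7, 6.54,  96, 0.34, 0.10),
    ("market friends",  0.7, 6.65,  95, 0.30, 0.20),
    ("beach sunset",    0.7, 6.70,  71, 0.33, 0.20),
    ("jungle path",     0.7, 6.73, 116, 0.31, 0.10),
    ("gamer face",      0.6, 6.77,  89, 0.35, 0.80),
    ("city rooftop",    0.7, 6.53,  97, 0.30, 0.20),
]

FIELDS = ("aesthetic", "entropy", "noise", "clip_score", "ai_likelihood")


def summarize(results):
    """Return the mean of each metric across all images, rounded to 3 places."""
    columns = zip(*(row[1:] for row in results))  # transpose rows into metric columns
    return {name: round(mean(col), 3) for name, col in zip(FIELDS, columns)}


if __name__ == "__main__":
    for name, value in summarize(RESULTS).items():
        print(f"{name}: {value}")
    # The image the evaluator found most likely to be AI-generated.
    print(max(RESULTS, key=lambda row: row[5])[0])
```

The averages work out to an aesthetic score of 0.68, a prompt CLIP score of 0.322, and an AI likelihood of 0.22, with the close-up gamer face (0.80) as the clear outlier on that last measure.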