AI's Artistic Eye: Capturing Poses, But Missing the Shot with Imagen-v2
In the realm of artificial intelligence, image generation has become a fascinating area of exploration. One of the key challenges in this field is the ability to translate textual descriptions into visually compelling images. This blog post delves into an experiment that tested an AI model’s ability to generate images based on descriptions of poses and scenes, focusing on the model’s performance in capturing the intended camera position, shot analysis, and aesthetic style. The results reveal both strengths and weaknesses, highlighting the ongoing evolution of AI in image generation.
Created with: imagen-v2
Unwavering Determination in the Face of the Mountain
A close-up portrait captures the intense gaze of a rugged hiker, his face etched with determination as he confronts the snowy mountain backdrop. The dramatic framing emphasizes his adventurous spirit and the challenges he faces.
Prompt
poses leaning-in: determined, focused ; A lone adventurer; close-up; Adventure; a vast, snow-capped mountain range; cinematic
Characteristic
Shot : Close-up portrait of a man in a blue jacket and a backpack, looking at the camera with a serious expression, standing in front of a snowy mountain landscape.
Aesthetic Score : 0.7
Mood : intense, determined, adventurous
Quality
Entropy : 6.61
Noise : 79
Prompt Clip Score : 0.25
AI Evaluation
Likelihood of AI : 0.80
Image errors : The image has some minor artifacts and blur around the subject’s hair and edges. There is a slight color shift on the snow.
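The prompts in this post appear to share one semicolon-separated layout: a pose and mood prefix, a subject, a shot type, a category, a background, and a style. The sketch below splits a prompt into those parts so the requested shot type can be compared with the shot that was actually generated. The field names and the `parse_prompt` helper are hypothetical, chosen for illustration; they are not part of any Imagen tooling.

```python
from dataclasses import dataclass


@dataclass
class PosePrompt:
    # Hypothetical field names, inferred from the prompt layout in this post.
    pose_moods: list[str]
    subject: str
    shot_type: str
    category: str
    background: str
    style: str


def parse_prompt(prompt: str) -> PosePrompt:
    """Split a semicolon-separated prompt into its six assumed fields."""
    parts = [part.strip() for part in prompt.split(";")]
    if len(parts) != 6:
        raise ValueError(f"expected 6 fields, got {len(parts)}")
    head, subject, shot_type, category, background, style = parts
    # The first field looks like "poses leaning-in: mood, mood".
    _, _, moods = head.partition(":")
    pose_moods = [m.strip() for m in moods.split(",") if m.strip()]
    return PosePrompt(pose_moods, subject, shot_type, category, background, style)


example = parse_prompt(
    "poses leaning-in: determined, focused ; A lone adventurer; close-up; "
    "Adventure; a vast, snow-capped mountain range; cinematic"
)
print(example.shot_type, example.pose_moods)
```

Only the first field is split on commas, since a background such as "a vast, snow-capped mountain range" contains commas of its own.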
Superman Soars Above a City in Flames
A dramatic scene of Superman flying through a burning city, his determined expression reflecting the urgency of the situation. The smoke and fire create a sense of intensity and heroism, highlighting the Man of Steel’s unwavering commitment to saving lives.
Prompt
poses leaning-in: powerful, heroic ; A superhero in mid-flight; dynamic shot; Heroism; a cityscape with a burning building in the background; cinematic
Characteristic
Shot : Superman is flying over a burning city with a determined expression on his face. He is in a classic Superman pose, with his cape billowing behind him.
Aesthetic Score : 0.7
Mood : heroic, dramatic, hopeful
Quality
Entropy : 6.73
Noise : 67
Prompt Clip Score : 0.28
AI Evaluation
Likelihood of AI : 0.80
Image errors : The image appears to be digitally painted and has some slight blurring and noise. There are some unnatural transitions between areas, particularly in the cape and the foreground.
The Hands That Type: A Close-Up Look at Focus and Intensity
A low-angle shot captures the focused hands of a person typing on a keyboard. The shallow depth of field draws the viewer’s attention to the intricate movements, creating a sense of intimacy and highlighting the intensity of the task at hand. The partially visible face in the background adds a layer of intrigue, while the blurry object further emphasizes the focus on the hands and the technological nature of the scene.
Prompt
poses leaning-in: intense, focused ; A gamer’s hands on a keyboard; close-up; Gaming; a brightly lit computer screen displaying a game; cinematic
Characteristic
Shot : A person is typing on a keyboard. The image is taken from a low angle, showing the person’s hand and part of their face.
Aesthetic Score : 0.6
Mood : focused, intense, digital
Quality
Entropy : 6.19
Noise : 84
Prompt Clip Score : 0.24
AI Evaluation
Likelihood of AI : 0.20
Image errors : There are no visible artifacts or errors in the image. The color saturation is slightly high, but it’s not distracting.
Silhouettes of Love at Sunset
A couple stands on a beach, bathed in the golden light of the setting sun. The woman glances back at the camera, while the man gazes out at the ocean, creating a romantic and intimate scene.
Prompt
poses leaning-in: romantic, awe-inspired ; A couple gazing at a breathtaking sunset; medium shot; Tourism; a panoramic view of a beach with the sun setting over the ocean; cinematic
Characteristic
Shot : A couple is standing on a beach at sunset. The man is facing the sunset, and the woman is looking at the camera with her head resting on his shoulder.
Aesthetic Score : 0.7
Mood : romantic, cozy, peaceful
Quality
Entropy : 6.63
Noise : 92
Prompt Clip Score : 0.27
AI Evaluation
Likelihood of AI : 0.20
Image errors : No visible errors.
Lost in the Landscape: A Moment of Contemplation on a Train
A young man gazes out the window of a moving train, his expression pensive as the green countryside blurs past. The natural light and the fleeting scenery evoke a sense of longing and introspection, capturing a moment of quiet contemplation.
Prompt
poses leaning-in: reflective, adventurous ; A backpacker looking out of a train window; close-up; Travel; a passing landscape of rolling hills and green fields; cinematic
Characteristic
Shot : A young man sits on a train and gazes out the window at a passing green countryside.
Aesthetic Score : 0.6
Mood : pensive, reflective, contemplative
Quality
Entropy : 6.47
Noise : 86
Prompt Clip Score : 0.27
AI Evaluation
Likelihood of AI : 0.20
Image errors : The image has some slight noise and grain, which may be due to the lighting or the camera used.
Mysterious Campfire Glow in the Dark Forest
A group of friends gather around a crackling campfire, its warm glow illuminating their faces and casting long shadows in the surrounding darkness. The scene evokes a sense of mystery, coziness, and adventure, promising a night filled with stories and secrets.
Prompt
poses leaning-in: intimate, warm ; A group of friends huddled together around a campfire; medium shot; Groups; a dark forest with the firelight illuminating their faces; cinematic
Characteristic
Shot : A group of people are gathered around a campfire in a forest at night.
Aesthetic Score : 0.6
Mood : mysterious, cozy, contemplative
Quality
Entropy : 6.18
Noise : 109
Prompt Clip Score : 0.30
AI Evaluation
Likelihood of AI : 0.20
Image errors : There are some noticeable artifacts in the image, particularly in the leaves on the ground. The colors are slightly muted and lacking vibrancy.
On the Front Lines: A Soldier’s Focus Amidst Chaos
A low-angle shot captures a soldier in camouflage, their rifle aimed with unwavering focus. The background blurs into a chaotic scene of smoke and fire, highlighting the intensity and suspense of the moment. The image evokes a sense of tension and seriousness, capturing the dramatic reality of war.
Prompt
poses leaning-in: intense, focused ; A soldier peering through a sniper scope; close-up; Heroism; a battlefield with smoke and explosions in the distance; cinematic
Characteristic
Shot : A soldier in camouflage is aiming a rifle with a scope. There is an explosion in the background, and the soldier appears to be in a state of heightened focus.
Aesthetic Score : 0.6
Mood : tense, dramatic, focused
Quality
Entropy : 6.84
Noise : 87
Prompt Clip Score : 0.30
AI Evaluation
Likelihood of AI : 0.10
Image errors : The image has some minor artifacts, especially in the background. There is a slight blurriness in the soldier’s face and the scope. The colors are a little bit muted.
Lost in the Fog: A Mission of Mystery and Danger
Three figures, shrouded in mist, navigate a dense jungle. Their backpacks suggest a mission, but the low camera angle and shadowy silhouettes create an atmosphere of suspense and intrigue. What secrets lie hidden within the fog?
Prompt
poses leaning-in: determined, adventurous ; A group of explorers navigating a dense jungle; wide shot; Adventure; lush green foliage and towering trees; cinematic
Characteristic
Shot : Three people are exploring a lush, dense jungle full of greenery and foliage. They are crouched low to the ground and looking at something off-camera.
Aesthetic Score : 0.6
Mood : suspenseful, adventurous, eerie
Quality
Entropy : 6.69
Noise : 115
Prompt Clip Score : 0.27
AI Evaluation
Likelihood of AI : 0.20
Image errors : There are a few minor image artifacts, such as the slightly blurry edges of the leaves.
Intense Gaze, Mysterious Light: A Portrait of Unseen Emotions
This close-up portrait captures a young man’s intense expression, bathed in colorful, low-key lighting. The dramatic framing and piercing gaze create a sense of mystery and intrigue, leaving the viewer to wonder about the story behind his emotions.
Prompt
poses leaning-in: excited, immersed ; A gamer’s face lit by the screen; close-up; Gaming; a vibrant, colorful game interface; cinematic
Characteristic
Shot : A young man’s face, looking directly at the camera, with a blurred background of colorful lights.
Aesthetic Score : 0.7
Mood : intense, dramatic, mysterious
Quality
Entropy : 5.98
Noise : 105
Prompt Clip Score : 0.18
AI Evaluation
Likelihood of AI : 0.20
Image errors : The image has some minor artifacts, particularly in the background, likely due to over-processing.
City Lights, Tiny Lives: A Moment of Contemplation
A family of four stands silhouetted against the twinkling cityscape, their small figures dwarfed by the vast expanse of the city. The scene evokes a sense of serenity and contemplation, highlighting the fleeting nature of life against the backdrop of an enduring urban landscape.
Prompt
poses leaning-in: joyful, appreciative ; A family looking out at a cityscape from a rooftop; medium shot; Tourism; a sprawling city skyline with twinkling lights; cinematic
Characteristic
Shot : A family of four is standing on a rooftop overlooking a city skyline at dusk. They are looking out at the view, silhouetted against the city lights.
Aesthetic Score : 0.7
Mood : serene, contemplative, romantic
Quality
Entropy : 6.72
Noise : 111
Prompt Clip Score : 0.29
AI Evaluation
Likelihood of AI : 0.10
Image errors : There is slight blurring around the edges of the image, most noticeable in the sky, suggesting possible image processing artifacts.
Conclusion
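The results show that the generative AI model struggled with camera position and shot analysis, but performed well in aesthetic analysis. Judging by the ratings attached to each score, lower values appear to indicate a closer match to the prompt. Here’s a breakdown: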
- Camera Position: The model scored 0.4, which is considered below average. This suggests that the model didn’t accurately capture the intended camera positions described in the prompt.
- Shot Analysis: The model scored 0.43, also below average. This indicates that the model didn’t fully understand the scene described in the prompt and didn’t create the intended shot composition.
- Aesthetic Analysis: The model scored 0.1, which is considered very good. This means that the generated images closely matched the expected aesthetic style described in the prompts.
Overall, the model seems to be better at understanding the desired aesthetic style than it is at accurately interpreting camera positions and shot descriptions.
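For a quick sense of the per-image numbers behind this summary, the sketch below averages the aesthetic score, prompt CLIP score, and likelihood of AI listed in the ten sections above. The values are transcribed from this post; the short labels are made up here for readability. These per-image metrics are separate from the three summary scores in the conclusion, whose calculation the post does not describe.

```python
from statistics import mean

# (label, aesthetic score, prompt CLIP score, likelihood of AI),
# transcribed from the ten image sections in order.
# The labels are hypothetical shorthand, not names used by the model.
RESULTS = [
    ("hiker", 0.7, 0.25, 0.80),
    ("superman", 0.7, 0.28, 0.80),
    ("typing hands", 0.6, 0.24, 0.20),
    ("sunset couple", 0.7, 0.27, 0.20),
    ("train", 0.6, 0.27, 0.20),
    ("campfire", 0.6, 0.30, 0.20),
    ("soldier", 0.6, 0.30, 0.10),
    ("jungle", 0.6, 0.27, 0.20),
    ("gamer face", 0.7, 0.18, 0.20),
    ("rooftop family", 0.7, 0.29, 0.10),
]

mean_aesthetic = mean(r[1] for r in RESULTS)
mean_clip = mean(r[2] for r in RESULTS)
mean_ai_likelihood = mean(r[3] for r in RESULTS)

# Image whose prompt CLIP score is lowest, i.e. the weakest text-image match.
weakest_label, _, weakest_clip, _ = min(RESULTS, key=lambda r: r[2])

print(f"mean aesthetic score:   {mean_aesthetic:.2f}")
print(f"mean prompt CLIP score: {mean_clip:.3f}")
print(f"mean likelihood of AI:  {mean_ai_likelihood:.2f}")
print(f"lowest CLIP score:      {weakest_label} ({weakest_clip})")
```

The gamer portrait has the lowest prompt CLIP score of the set (0.18), which fits its description: the shot shows a face against blurred colorful lights rather than the game interface the prompt asked for.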
Sources:
- https://www.writerswrite.co.za/cheat-sheets-for-writing-body-language/
- https://mads3df.wordpress.com/2013/09/04/storytelling-poses/
- https://www.pinterest.com/pegasister890/character-poses/
- https://www.youtube.com/watch?v=udky6ANxWws
- https://maven.com/articles/storytelling-techniques
- https://deepmind.google/technologies/imagen-2/