AI's Artistic Eye: Capturing Poses, But Missing the Shot with Imagen-v3-fast
9-minute read - 1828 words
AI image generation has made significant strides. One intriguing question is whether a model can understand and recreate dramatic poses within specific scenes. This post examines the results of an experiment that tested imagen-v3-fast on exactly that task. The model was given prompts describing poses and scenes across a range of emotions, settings, and camera angles. The results reveal a clear split: the model captured the aesthetic essence of the poses well, but it struggled to reproduce the intended camera positions and shot compositions. AI models are becoming adept at artistic concepts, yet they still need further development before they grasp the nuances of visual storytelling through camera work.
Created with: imagen-v3-fast
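All ten prompts in this experiment share one layout: a pose followed by its emotions, then a semicolon-separated list of subject, requested shot, genre, setting, and style. The sketch below is a hypothetical reconstruction of that template; the class and field names (`PosePrompt`, `render`, `shot`, and so on) are illustrative labels inferred from the prompt strings, not the author's actual tooling. Reproducing the first prompt exactly is a quick check that the layout was read correctly, and it isolates the camera instruction (`shot`) that the results focus on.

```python
from dataclasses import dataclass


@dataclass
class PosePrompt:
    """One experiment prompt, split into the fields visible in the post's prompts.

    Hypothetical structure: the post only shows the assembled strings.
    """
    pose: str            # e.g. "crossed-legs"
    emotions: list[str]  # e.g. ["determined", "contemplative"]
    subject: str         # who is in the scene and what they are doing
    shot: str            # requested framing: "wide shot", "medium shot", "close-up"
    genre: str           # e.g. "Adventure", "Gaming", "Travel"
    setting: str         # background description
    style: str = "cinematic"

    def render(self) -> str:
        # Mirrors the observed layout: "poses <pose>: <emotions> ; <fields joined by '; '>"
        rest = "; ".join([self.subject, self.shot, self.genre, self.setting, self.style])
        return f"poses {self.pose}: {', '.join(self.emotions)} ; {rest}"


prompt = PosePrompt(
    pose="crossed-legs",
    emotions=["determined", "contemplative"],
    subject="A lone adventurer, sitting on a cliff edge",
    shot="wide shot",
    genre="Adventure",
    setting="a vast, breathtaking mountain range",
)
print(prompt.render())
```

Keeping the fields separate makes it easy to vary only the `shot` value while holding the rest of the scene fixed, which is one way to probe camera handling in isolation.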
Solitude on the Clifftop
A lone figure contemplates the vast, green valley below, the winding river and cloudy sunset sky creating a sense of tranquility and solitude. The image evokes a feeling of peace and introspection, with the figure dwarfed by the expansive landscape.
Prompt
poses crossed-legs: determined, contemplative ; A lone adventurer, sitting on a cliff edge; wide shot; Adventure; a vast, breathtaking mountain range; cinematic
Characteristic
Shot : A lone figure sits on a cliff overlooking a vast, green valley with a winding river snaking through it. The sky is cloudy with hints of a sunset.
Aesthetic Score : 0.7
Mood : tranquil, contemplative, solitary
Quality
Entropy : 6.82
Noise : 68
Prompt Clip Score : 0.31
AI Evaluation
Likelihood of AI : 0.80
Image errors : The image is slightly blurry, with soft details and some aliasing along the edges of objects.
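The post does not define the Entropy metric. The values reported here (5.77 to 6.82) are consistent with the Shannon entropy, in bits, of an 8-bit grayscale intensity histogram, which ranges from 0 for a single flat tone to 8 when all 256 levels are equally common. The following is a minimal sketch under that assumption, using only the standard library; `histogram_entropy` is a hypothetical helper, and for a real image the pixel list could come from a grayscale conversion such as Pillow's `convert("L")`.

```python
import math
from collections import Counter


def histogram_entropy(pixels: list[int]) -> float:
    """Shannon entropy, in bits, of the intensity histogram of 8-bit pixels.

    Assumption: the post's "Entropy" is this quantity computed on a grayscale
    version of the image; the post itself does not say how it is computed.
    """
    if not pixels:
        raise ValueError("need at least one pixel")
    counts = Counter(pixels)
    total = len(pixels)
    # H = sum(p * log2(1 / p)) over the intensity levels that actually occur
    return sum((c / total) * math.log2(total / c) for c in counts.values())


# A flat image carries no information; a uniform spread over all 256 levels hits the 8-bit maximum.
flat = [128] * 1024
uniform = list(range(256)) * 4
print(histogram_entropy(flat))     # 0.0
print(histogram_entropy(uniform))  # 8.0
```

Under this reading, a score near 6.8 means the image uses a broad spread of tones, while a lower value such as the astronaut scene's 5.77 points to a narrower, more muted palette.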
A Lone Warrior Stands Amidst the Ruins of Victory
A solitary warrior, draped in a crimson cape, surveys the battlefield. Fallen soldiers lie at his feet while a city wall burns in the distance. The golden sky reflects a bittersweet victory and leaves a somber mood in its wake.
Prompt
poses crossed-legs: triumphant, confident ; A victorious warrior, standing tall on a battlefield; medium shot; Heroism; fallen enemies and a burning city in the background; cinematic
Characteristic
Shot : A lone warrior, clad in armor and a red cape, stands amidst a battlefield, with fallen soldiers at his feet. The background features a city wall with fires and a warm, golden sky.
Aesthetic Score : 0.7
Mood : dramatic, heroic, somber
Quality
Entropy : 6.63
Noise : 63
Prompt Clip Score : 0.29
AI Evaluation
Likelihood of AI : 0.90
Image errors : Slight artifacts are visible around the edges of the warrior and in the background, especially among the fallen soldiers. The textures could be more refined.
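Prompt Clip Score is likewise not defined in the post. A common way to score prompt adherence is the cosine similarity between a CLIP embedding of the image and a CLIP embedding of the prompt text; raw CLIP similarities for matching pairs often fall roughly between 0.2 and 0.4, so the 0.29 to 0.35 reported in this post fits that reading. The sketch below assumes the score is that raw cosine similarity. Producing real embeddings requires a CLIP model, so the vectors here are toy stand-ins, and `prompt_clip_score` is a hypothetical name.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two embedding vectors, in [-1, 1]."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("zero-length embedding")
    return dot / (norm_a * norm_b)


def prompt_clip_score(image_embedding: list[float], text_embedding: list[float]) -> float:
    # Assumption: the score is the raw cosine similarity between the CLIP image
    # embedding and the CLIP text embedding of the prompt. Some variants
    # (e.g. CLIPScore) instead report 2.5 * max(cos, 0).
    return cosine_similarity(image_embedding, text_embedding)


# Toy 3-dimensional vectors stand in for real CLIP embeddings (hundreds of dimensions).
image_vec = [1.0, 0.0, 0.0]
text_vec = [1.0, 1.0, 0.0]
print(round(prompt_clip_score(image_vec, text_vec), 4))  # 0.7071
```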
The Focus of a Champion: A Gamer’s Intensity in Low Light
A young gamer, radiating focus and determination, sits in his gaming chair, headphones on, eyes locked on the camera. The low lighting adds to the intensity of the moment, highlighting his competitive spirit.
Prompt
poses crossed-legs: intense, focused ; A gamer, intensely focused on a screen; close-up; Gaming; a dimly lit room with glowing monitors and gaming peripherals; cinematic
Characteristic
Shot : A young man, likely a gamer, sits in a gaming chair with headphones on, looking directly at the camera.
Aesthetic Score : 0.7
Mood : serious, focused, competitive
Quality
Entropy : 5.97
Noise : 36
Prompt Clip Score : 0.29
AI Evaluation
Likelihood of AI : 0.20
Image errors : No significant errors, just a slight blurriness in the background.
City Lights, Rooftop Dreams: Friends Embrace the Twilight
A group of five friends bask in the golden glow of twilight, enjoying the breathtaking New York City skyline from a rooftop perch. Their laughter and playful banter capture a sense of adventure and carefree joy, making this moment one to remember.
Prompt
poses crossed-legs: excited, awe-struck ; A group of tourists, admiring a breathtaking view; medium shot; Tourism; a panoramic vista of a bustling city skyline; cinematic
Characteristic
Shot : Five people sitting on a rooftop with the New York City skyline in the background at twilight.
Aesthetic Score : 0.5
Mood : happy, playful, relaxed
Quality
Entropy : 6.67
Noise : 96
Prompt Clip Score : 0.31
AI Evaluation
Likelihood of AI : 0.70
Image errors : The image has a few artifacts: the people’s skin looks slightly artificial, the blur around the buildings is a little unnatural, and the shadows look unreal.
Tranquility in Motion: A Leg, a Window, and the Blur of a Journey
A solitary leg, clad in a brown boot, rests on the window sill of a moving train. The passing landscape blurs into a dreamy haze, reflecting the quiet contemplation of the traveler. This image captures the essence of a journey, where the world rushes by, yet a sense of peace remains.
Prompt
poses crossed-legs: reflective, nostalgic ; A traveler, gazing out of a train window; close-up; Travel; a blur of passing landscapes and towns; cinematic
Characteristic
Shot : A person’s leg with a brown boot is resting on the window sill of a train. The view outside is of a blurry landscape with train tracks.
Aesthetic Score : 0.6
Mood : tranquil, contemplative, journey
Quality
Entropy : 6.28
Noise : 54
Prompt Clip Score : 0.33
AI Evaluation
Likelihood of AI : 0.20
Image errors : No visible errors.
Cozy Night in the Woods with Friends
Four friends sit on a log in a forest, bathed in the warm glow of string lights. The scene exudes a friendly, cozy, and peaceful atmosphere, capturing the essence of shared moments under the stars.
Prompt
poses crossed-legs: joyful, relaxed ; A group of friends, laughing and sharing stories around a campfire; medium shot; Groups; a serene forest setting with twinkling stars above; cinematic
Characteristic
Shot : Four friends are sitting on a log in a forest at night, lit by string lights hanging in the trees.
Aesthetic Score : 0.7
Mood : friendly, cozy, peaceful
Quality
Entropy : 6.21
Noise : 85
Prompt Clip Score : 0.35
AI Evaluation
Likelihood of AI : 0.20
Image errors : The image has some noise and compression artifacts, particularly in the darker areas.
A Moment of Reflection: An Astronaut’s View of Earth
A solitary astronaut gazes out at a distant Earth, their posture and the vast emptiness of space evoking a sense of melancholy and contemplation. The muted colors and dramatic lighting amplify the feeling of isolation, while a glimmer of hope shines through in the astronaut’s unwavering gaze.
Prompt
poses crossed-legs: awe-inspired, contemplative ; A lone astronaut, gazing at Earth from a spaceship window; close-up; Heroism; a vast, blue planet against the backdrop of space; cinematic
Characteristic
Shot : An astronaut is sitting in a spaceship looking out of the window at planet Earth.
Aesthetic Score : 0.7
Mood : melancholy, contemplative, hopeful
Quality
Entropy : 5.77
Noise : 63
Prompt Clip Score : 0.34
AI Evaluation
Likelihood of AI : 0.70
Image errors : The image appears AI-generated: the astronaut’s face is somewhat blurry and lacks the fine detail of a real person, and the Earth is rendered in a stylized way that looks more like a digital painting than a photograph.
Trapped in the Shadows: Three Men Face an Uncertain Fate
A chilling scene unfolds in a dark, narrow tunnel, where three men in workwear huddle together, illuminated only by flickering torches. The claustrophobic atmosphere and their tense expressions hint at a dangerous situation, leaving viewers on the edge of their seats.
Prompt
poses crossed-legs: suspenseful, cautious ; A group of explorers, huddled together in a dark cave; medium shot; Adventure; flickering torches illuminating the rough stone walls; cinematic
Characteristic
Shot : Three men in workwear, sitting in a dark, narrow tunnel lit by torches.
Aesthetic Score : 0.6
Mood : suspenseful, claustrophobic, gritty
Quality
Entropy : 6.69
Noise : 90
Prompt Clip Score : 0.30
AI Evaluation
Likelihood of AI : 0.10
Image errors : No noticeable errors.
Confetti Celebration: A Moment of Pure Joy
A young man beams with happiness, fists raised in triumph, as confetti rains down around him. The image captures the joy and energy of a celebratory moment.
Prompt
poses crossed-legs: exuberant, joyful ; A gamer, celebrating a victory with a triumphant fist pump; close-up; Gaming; a brightly lit room with a celebratory confetti explosion; cinematic
Characteristic
Shot : A young man is sitting cross-legged on a dark surface with confetti falling around him. He is smiling and has his fists raised in the air, as if celebrating.
Aesthetic Score : 0.7
Mood : joyful, celebratory, energetic
Quality
Entropy : 6.63
Noise : 58
Prompt Clip Score : 0.29
AI Evaluation
Likelihood of AI : 0.20
Image errors : The image is slightly overexposed, but otherwise there are no noticeable errors.
Night Market Intimacy: A Glow of Friendship and Mystery
Four friends gather on a bench, bathed in warm yellow light, sharing food and laughter in a bustling night market. The play of light and shadow creates a sense of intimacy and mystery, drawing you into their shared moment.
Prompt
poses crossed-legs: lively, adventurous ; A group of travelers, sharing a meal at a bustling street market; medium shot; Travel; vibrant colors and aromas of exotic food stalls; cinematic
Characteristic
Shot : Four people are sitting on a bench in a night market, eating food. They are lit from above with yellow lights.
Aesthetic Score : 0.6
Mood : casual, warm, friendly
Quality
Entropy : 6.76
Noise : 92
Prompt Clip Score : 0.32
AI Evaluation
Likelihood of AI : 0.20
Image errors : No significant errors in the image.
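To see the ten results at a glance, the per-image values can be tabulated and averaged. The sketch below only transcribes the numbers reported in the cards above and takes arithmetic means; it adds no new measurements. The camera-position and shot scores discussed in the conclusion are separate metrics and cannot be recomputed from these cards. The names `RESULTS`, `COLUMNS`, and `column_means` are illustrative.

```python
from statistics import mean

# (title, aesthetic score, entropy, noise, prompt CLIP score, likelihood of AI),
# transcribed from the ten image cards in this post.
RESULTS = [
    ("Solitude on the Clifftop", 0.7, 6.82, 68, 0.31, 0.80),
    ("A Lone Warrior Stands Amidst the Ruins of Victory", 0.7, 6.63, 63, 0.29, 0.90),
    ("The Focus of a Champion", 0.7, 5.97, 36, 0.29, 0.20),
    ("City Lights, Rooftop Dreams", 0.5, 6.67, 96, 0.31, 0.70),
    ("Tranquility in Motion", 0.6, 6.28, 54, 0.33, 0.20),
    ("Cozy Night in the Woods with Friends", 0.7, 6.21, 85, 0.35, 0.20),
    ("A Moment of Reflection", 0.7, 5.77, 63, 0.34, 0.70),
    ("Trapped in the Shadows", 0.6, 6.69, 90, 0.30, 0.10),
    ("Confetti Celebration", 0.7, 6.63, 58, 0.29, 0.20),
    ("Night Market Intimacy", 0.6, 6.76, 92, 0.32, 0.20),
]

COLUMNS = ["aesthetic", "entropy", "noise", "prompt_clip", "likelihood_ai"]


def column_means(rows):
    """Arithmetic mean of each numeric column (column 0 is the title)."""
    return {name: mean(row[i + 1] for row in rows) for i, name in enumerate(COLUMNS)}


means = column_means(RESULTS)
for name, value in means.items():
    print(f"{name:>14}: {value:.3f}")

# Index 4 is the prompt CLIP score.
best = max(RESULTS, key=lambda row: row[4])
print("highest prompt CLIP score:", best[0], best[4])
```

The averages come out to an aesthetic score of 0.65 and a prompt CLIP score of about 0.31, with the woodland campfire scene scoring highest on prompt adherence.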
Conclusion
The results show that the generative AI model handled the aesthetic side of the prompts well but struggled with camera position and shot composition. Here’s a breakdown:
- Camera Position: The model scored 0.45, which is considered below average. This suggests that the model didn’t accurately capture the intended camera position described in the prompt.
- Shot Analysis: The model scored 0.56, which is considered average. This indicates that the model was able to understand the scene and create a shot that somewhat matched the prompt’s description.
- Aesthetic Analysis: The model scored 0.1, which is considered very good (on this metric, a lower score means a closer match). The aesthetic of the generated images closely matched the aesthetic described in the prompts.
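The three scores above do not point the same way: for camera position and shot analysis a higher score is better, while the aesthetic score of 0.1 is described as "very good", which implies that lower is better for that metric. The sketch below assumes all three scores lie in [0, 1] and flips the aesthetic score to a "higher is better" value (1 - 0.1 = 0.9) so the three can be ranked side by side. The range and the flip are assumptions, not documented properties of the evaluation, and `as_similarity` is a hypothetical helper.

```python
# Scores reported in the conclusion, as (score, higher_is_better).
# Assumption: all three lie in [0, 1]; "lower is better" for the aesthetic
# metric is inferred from the post calling 0.1 "very good".
REPORTED = {
    "camera_position": (0.45, True),
    "shot_analysis": (0.56, True),
    "aesthetic_analysis": (0.10, False),
}


def as_similarity(score: float, higher_is_better: bool) -> float:
    """Map a score onto a common 'higher is better' scale in [0, 1]."""
    if not 0.0 <= score <= 1.0:
        raise ValueError("score outside the assumed [0, 1] range")
    return score if higher_is_better else 1.0 - score


ranked = sorted(
    ((name, as_similarity(score, hib)) for name, (score, hib) in REPORTED.items()),
    key=lambda item: item[1],
    reverse=True,
)
for name, value in ranked:
    print(f"{name}: {value:.2f}")
```

Under these assumptions the ordering is aesthetic (0.90), then shot analysis (0.56), then camera position (0.45), which matches the overall reading that the model handles aesthetics best and camera placement worst.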
Overall, the model seems to be better at understanding the aesthetic aspects of the prompt than the camera position and shot composition. This suggests that the model might need further training to improve its ability to accurately interpret and implement camera positions and shot descriptions.