Imagen-v2 Captures the Scene, but Struggles with the Pose
- 9 minute read - 1856 words
Image generation has become one of the most fascinating areas of artificial intelligence. Generative models trained on vast datasets of images and text can create stunning visuals from a textual prompt. These models still have limitations, and one of the most visible is accurately rendering a requested pose. This post examines how Imagen-v2 handles scene descriptions, camera positions, and aesthetic styles when every prompt asks for the same holding-hands pose. We look at where the model excels and where it struggles, and what that says about the current state of AI image generation and where it could improve.
Created with: imagen-v2
Two Astronauts, One Handful of Hope on a Distant World
A poignant image captures the loneliness and wonder of space exploration. Two astronauts, hand in hand, traverse a barren lunar landscape, their backs turned towards a crescent moon and distant planets. The vast emptiness of the scene underscores the fragility of human life against the backdrop of the universe.
Prompt
poses holding-hands: Hopeful, determined, camaraderie ; Two astronauts; wide shot; heroism; the vastness of space with stars and planets in the background; cinematic
Characteristic
Shot : Two astronauts in spacesuits are standing on a moon-like surface, holding hands. They are in front of a crescent-shaped moon in the background.
Aesthetic Score : 0.7
Mood : mysterious, hopeful, surreal
Quality
Entropy : 5.78
Noise : 113
Prompt Clip Score : 0.32
AI Evaluation
Likelihood of AI : 0.80
Image errors : The image has some graininess and noise, especially in the background. The shadows are slightly too harsh.
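Every Prompt line in this post follows the same semicolon-separated pattern: the pose, a comma-separated list of emotions, then the subject, shot type, theme, setting, and a closing style keyword. The sketch below rebuilds such strings from their parts. The function name and field names are hypothetical labels chosen for illustration, not part of any Imagen API; optional fields that some prompts omit (the golden-hour prompt later in the post has no subject or setting) are simply skipped.

```python
from typing import Optional, Sequence


def build_pose_prompt(
    pose: str,
    emotions: Sequence[str],
    subject: Optional[str] = None,
    shot: Optional[str] = None,
    theme: Optional[str] = None,
    setting: Optional[str] = None,
    style: str = "cinematic",
) -> str:
    """Assemble a prompt in the 'poses <pose>: <emotions> ; <fields>' pattern.

    Hypothetical helper: the parameter names are illustrative labels for the
    positions observed in this post's prompts.
    """
    head = f"poses {pose}: {', '.join(emotions)}"
    # Keep only the optional fields that were supplied, in their usual order.
    fields = [f for f in (subject, shot, theme, setting) if f]
    fields.append(style)
    return f"{head} ; {'; '.join(fields)}"


if __name__ == "__main__":
    print(build_pose_prompt(
        "holding-hands",
        ["Hopeful", "determined", "camaraderie"],
        subject="Two astronauts",
        shot="wide shot",
        theme="heroism",
        setting="the vastness of space with stars and planets in the background",
    ))
```

Called with the astronaut fields, it reproduces the prompt above; called with only shot="close-up" and theme="adventure", it reproduces the shorter golden-hour prompt.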
Lost in the Mist: A Romantic Jungle Escape
A young couple, hand in hand, ventures into a lush jungle shrouded in soft mist. The diffused light and their intimate connection create a sense of mystery and romance, promising an adventurous escape.
Prompt
poses holding-hands: Excited, adventurous, trusting ; A group of explorers; medium shot; adventure; a dense jungle with sunlight filtering through the canopy; cinematic
Characteristic
Shot : A couple is holding hands in a lush, green jungle setting. The light is soft and diffused, creating a romantic atmosphere.
Aesthetic Score : 0.7
Mood : romantic, adventurous, mysterious
Quality
Entropy : 6.80
Noise : 118
Prompt Clip Score : 0.28
AI Evaluation
Likelihood of AI : 0.20
Image errors : The image is slightly overexposed, but otherwise well-composed.
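The post does not say how its Entropy figure is computed. One common definition, assumed here, is the Shannon entropy in bits of the 8-bit grayscale histogram, which ranges from 0 (a single flat tone) to 8 (all 256 gray levels equally frequent). Under that assumption, the values of roughly 5.8 to 6.8 in these cards would indicate fairly varied tonal content. A minimal sketch over a plain list of pixel values:

```python
import math
from collections import Counter
from typing import Iterable


def grayscale_entropy(pixels: Iterable[int]) -> float:
    """Shannon entropy (in bits) of the histogram of 8-bit gray levels.

    Assumed definition: the post does not specify how its Entropy score
    is calculated.
    """
    counts = Counter(pixels)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    entropy = 0.0
    for count in counts.values():
        p = count / total
        entropy -= p * math.log2(p)
    return entropy


if __name__ == "__main__":
    flat = [128] * 1000            # a single tone
    two_tone = [0, 255] * 500      # two tones, equally frequent
    full_range = list(range(256))  # every gray level exactly once
    for name, px in (("flat", flat), ("two_tone", two_tone), ("full_range", full_range)):
        print(name, grayscale_entropy(px))
```

For a real image, the pixel list could come from Pillow, for example `Image.open(path).convert("L").getdata()`.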
Two Worlds Collide: A Moment of Intense Anticipation
A clash of gazes and a symphony of silence. Two young men, lost in their own worlds, find themselves drawn together in a moment of intense anticipation. The futuristic backdrop, bathed in a cool blue-green light, adds to the mystery and drama of the scene.
Prompt
poses holding-hands: Focused, competitive, collaborative ; Two gamers; close-up; gaming; a brightly lit gaming setup with glowing screens and controllers; cinematic
Characteristic
Shot : Two young men with headphones sit close together in a dimly lit room, likely a gaming setup. The background is blurry and they are the main focus. The lighting casts an orange glow on the man on the left and a blue glow on the man on the right.
Aesthetic Score : 0.7
Mood : intense, focused, suspenseful
Quality
Entropy : 6.51
Noise : 94
Prompt Clip Score : 0.29
AI Evaluation
Likelihood of AI : 0.80
Image errors : The image is slightly blurry, particularly in the background. There is also some noise and artifacts visible, especially in the shadows.
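In the usual CLIP-score setup, a Prompt Clip Score is the cosine similarity between the CLIP text embedding of the prompt and the CLIP embedding of the generated image. The post does not name the CLIP variant or any rescaling it applies, so this is an assumption. The sketch covers only the similarity step, with short made-up vectors standing in for real embeddings, which would come from a CLIP model.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two embedding vectors (range -1 to 1)."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("embeddings must be non-zero")
    return dot / (norm_a * norm_b)


if __name__ == "__main__":
    # Toy 4-dimensional stand-ins; real CLIP embeddings have hundreds of dimensions.
    text_embedding = [0.2, 0.1, 0.0, 0.4]
    image_embedding = [0.1, 0.3, 0.2, 0.3]
    print(round(cosine_similarity(text_embedding, image_embedding), 3))
```

Because the score only measures how closely the image matches the prompt as a whole, an image can score reasonably even if one element, such as the requested pose, is missing.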
Parisian Romance: A Silhouette of Love Against the Eiffel Tower
A couple stands silhouetted against the iconic Eiffel Tower, capturing a moment of romantic longing and hope. The scene evokes a sense of grandeur and wistful beauty, making it a perfect picture of Parisian love.
Prompt
poses holding-hands: Romantic, happy, adventurous ; A couple; medium shot; tourism; a picturesque cityscape with iconic landmarks in the background; cinematic
Characteristic
Shot : A couple is standing on a rooftop in Paris, with the Eiffel Tower visible in the background.
Aesthetic Score : 0.7
Mood : romantic, nostalgic, dreamy
Quality
Entropy : 6.60
Noise : 93
Prompt Clip Score : 0.25
AI Evaluation
Likelihood of AI : 0.10
Image errors : The image has some minor noise and grain, particularly in the shadows.
Adventure Awaits: Silhouetted Figures on a Mountaintop
Four figures stand on a mountain peak, their silhouettes stark against the dramatic backdrop of a winding road leading into a valley. The scene evokes a sense of inspiration, adventure, and hope, emphasizing the vastness of the landscape and the possibilities that lie ahead.
Prompt
poses holding-hands: Joyful, connected, adventurous ; group; long shot; travel; a scenic mountain range with a winding road leading to the peak; cinematic
Characteristic
Shot : Four people stand in a line with their arms raised in the air, looking out over a vast mountain range with a winding road in the foreground. The scene is captured from a low angle, giving a sense of grandeur and scale.
Aesthetic Score : 0.7
Mood : serene, adventurous, hopeful
Quality
Entropy : 6.68
Noise : 92
Prompt Clip Score : 0.29
AI Evaluation
Likelihood of AI : 0.80
Image errors : There are some minor artifacts in the mountains and the sky. The people’s faces and body proportions are a bit off.
Joyful Dance Under a Confetti Sky
Two women revel in the festive atmosphere, their dance moves illuminated by the falling confetti and the vibrant colors of the crowd. The scene captures the pure joy and celebration of the moment.
Prompt
poses holding-hands: Happy, celebratory, connected ; A group of friends; medium shot; groups; a vibrant festival with colorful decorations and music; cinematic
Characteristic
Shot : Two women are dancing in a crowded space, seemingly at a festival or celebration. The background is filled with out-of-focus people, brightly colored lights and blurred decorations.
Aesthetic Score : 0.6
Mood : joyful, festive, celebratory
Quality
Entropy : 6.68
Noise : 110
Prompt Clip Score : 0.20
AI Evaluation
Likelihood of AI : 0.10
Image errors : There is a slight chromatic aberration around the edges of the image. Some noise is visible in the background.
Reach for the Horizon: Adventure Awaits
A lone hiker with a backpack extends a hand towards you from a mountaintop, inviting you to join his journey. The view stretches out before him, a sea of clouds concealing the valley below. This image captures the spirit of adventure, hope, and inspiration, urging you to embrace the unknown and reach for your own horizon.
Prompt
poses holding-hands: Determined, courageous, triumphant ; A hiker; close-up; heroism; a breathtaking mountain vista with clouds swirling below; cinematic
Characteristic
Shot : A man on a mountain top is reaching out his hand towards the viewer, as if inviting them to join him on his journey.
Aesthetic Score : 0.7
Mood : adventurous, hopeful, inspiring
Quality
Entropy : 6.63
Noise : 92
Prompt Clip Score : 0.27
AI Evaluation
Likelihood of AI : 0.10
Image errors : The image has slight noise and some artifacts in the mountains.
Love in the Golden Hour: A Moment of Intimacy
Experience the warmth and romance of a sunset as two hands intertwine in a tender moment. The soft lighting and blurred background of hills create a sense of intimacy and connection, making this the perfect image for those who cherish the beauty of love.
Prompt
poses holding-hands: Playful, celebratory, carefree ; close-up; adventure; cinematic
Characteristic
Shot : A couple holding hands with a sunset in the background.
Aesthetic Score : 0.7
Mood : romantic, hopeful, serene
Quality
Entropy : 6.69
Noise : 99
Prompt Clip Score : 0.20
AI Evaluation
Likelihood of AI : 0.20
Image errors : There are some slight artifacts in the background, particularly around the sunset. This could be due to compression or noise reduction.
Spotlight Romance: A Dramatic Performance of Love and Connection
A man and a woman share an intimate moment on stage, their connection highlighted by the dramatic spotlight. The man, holding a microphone, looks down while the woman gazes up at him, her hand in his. The scene exudes romance and intimacy, creating a captivating display of emotion and drama.
Prompt
poses holding-hands: Passionate, connected, expressive ; musicians; medium shot; groups; a dimly lit stage with spotlights shining on them; cinematic
Characteristic
Shot : Two figures, a man in a suit and a woman in a long dress, are performing on a stage lit by spotlights. The scene appears to be a musical performance, with an instrument visible in the background. The man holds a microphone in his hand while the woman looks away.
Aesthetic Score : 0.7
Mood : dramatic, intense, intimate
Quality
Entropy : 6.04
Noise : 101
Prompt Clip Score : 0.23
AI Evaluation
Likelihood of AI : 0.20
Image errors : Slight graininess and noise in the image. The highlights are blown out in some areas.
Golden Hour Romance: A Desert Love Story
In this serene and adventurous scene, a couple stands amidst the vast desert landscape, their gazes lost in the horizon. As the sun sets, it paints the scene with a warm, golden light, creating an intimate and romantic atmosphere.
Prompt
poses holding-hands: Romantic, adventurous, hopeful ; couple; long shot; travel; a vast desert landscape with a setting sun in the distance; cinematic
Characteristic
Shot : A couple is walking hand-in-hand in a desert landscape, with the setting sun casting a warm glow.
Aesthetic Score : 0.7
Mood : romantic, adventurous, serene
Quality
Entropy : 6.64
Noise : 56
Prompt Clip Score : 0.27
AI Evaluation
Likelihood of AI : 0.20
Image errors : The image has a slight color cast, but no major artifacts.
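The per-image Quality scores above can also be summarized across all ten images. The sketch restates the numbers from the cards (the short titles and dictionary keys are illustrative labels) and reports each metric’s mean along with the lowest- and highest-scoring image. None of these metrics isolates pose accuracy on its own, so the summary describes overall image statistics and prompt alignment rather than how well the holding-hands pose was rendered.

```python
from statistics import mean

# Per-image metrics copied from the cards above, in order of appearance.
# Short titles and key names are illustrative labels.
RESULTS = [
    {"title": "Two Astronauts", "aesthetic": 0.7, "entropy": 5.78, "noise": 113, "clip": 0.32},
    {"title": "Lost in the Mist", "aesthetic": 0.7, "entropy": 6.80, "noise": 118, "clip": 0.28},
    {"title": "Two Worlds Collide", "aesthetic": 0.7, "entropy": 6.51, "noise": 94, "clip": 0.29},
    {"title": "Parisian Romance", "aesthetic": 0.7, "entropy": 6.60, "noise": 93, "clip": 0.25},
    {"title": "Adventure Awaits", "aesthetic": 0.7, "entropy": 6.68, "noise": 92, "clip": 0.29},
    {"title": "Joyful Dance", "aesthetic": 0.6, "entropy": 6.68, "noise": 110, "clip": 0.20},
    {"title": "Reach for the Horizon", "aesthetic": 0.7, "entropy": 6.63, "noise": 92, "clip": 0.27},
    {"title": "Love in the Golden Hour", "aesthetic": 0.7, "entropy": 6.69, "noise": 99, "clip": 0.20},
    {"title": "Spotlight Romance", "aesthetic": 0.7, "entropy": 6.04, "noise": 101, "clip": 0.23},
    {"title": "Golden Hour Romance", "aesthetic": 0.7, "entropy": 6.64, "noise": 56, "clip": 0.27},
]


def summarize(results, metric):
    """Return (mean, title of lowest, title of highest) for one metric."""
    values = [r[metric] for r in results]
    # min() and max() return the first matching entry when values tie.
    lowest = min(results, key=lambda r: r[metric])
    highest = max(results, key=lambda r: r[metric])
    return mean(values), lowest["title"], highest["title"]


if __name__ == "__main__":
    for metric in ("aesthetic", "entropy", "noise", "clip"):
        avg, low, high = summarize(RESULTS, metric)
        print(f"{metric:>9}: mean {avg:.3f}, lowest {low}, highest {high}")
```

With these values, the mean Prompt Clip Score is 0.26 and the mean Entropy is about 6.5. Joyful Dance and Love in the Golden Hour tie for the lowest Prompt Clip Score at 0.20, and the helper reports the first of the two.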
Conclusion
The results show that the generative AI model performed well at understanding the scene and matching the intended aesthetic style, but struggled with camera position. Here’s a breakdown:
- Camera Position: The model scored 0.4, which is considered below average. This suggests that the model didn’t accurately capture the intended camera position described in the prompt.
- Shot Analysis: The model scored 0.6, which is considered good. This indicates that the model was able to understand the scene described in the prompt and create a shot that aligns with it.
- Aesthetic Analysis: The model scored 0.1, which is considered very good. This means that the generated images closely matched the expected aesthetic style.
Overall, the model demonstrates a good understanding of the scene and shot composition, but needs improvement in accurately capturing the intended camera position. The aesthetic quality of the generated images is very good.