AI Captures the Pose, But Misses the Feeling with Imagen-v2
The world of AI image generation is rapidly evolving, with models capable of creating stunning visuals based on text prompts. However, achieving the desired aesthetic remains a challenge. This blog post delves into an experiment that explores the capabilities of a generative AI model in capturing specific poses and scenes, highlighting its strengths and weaknesses in achieving the desired visual style.
Created with: imagen-v2
Silhouetted Against the Sunset, a Figure Contemplates the Vastness
A lone figure, cloaked in darkness, stands with their back to the viewer, gazing out at a majestic mountain range bathed in the warm glow of the setting sun. The silhouette against the fiery sky evokes a sense of isolation and contemplation, leaving the viewer to ponder the mysteries of the scene.
Prompt
poses profile: Epic, hopeful, determined ; A lone figure, silhouetted against a setting sun; wide shot; Heroism; A vast, mountainous landscape; cinematic
Characteristic
Shot : A man in a rugged cloak stands with his back to the camera, looking out at a vast landscape of mountains and a sunset. His face is partially shadowed, creating a sense of mystery.
Aesthetic Score : 0.8
Mood : dramatic, introspective, solitary
Quality
Entropy : 6.80
Noise : 83
Prompt Clip Score : 0.24
AI Evaluation
Likelihood of AI : 0.20
Image errors : No obvious image errors
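The post does not say how the Entropy value is computed. Readings between 6 and 7 are consistent with the Shannon entropy of an 8-bit grayscale histogram, which tops out at 8 bits, so the sketch below assumes that definition. The function name `shannon_entropy` and the flat list of pixel values are illustrative choices, not the original evaluation code.

```python
import math
from collections import Counter


def shannon_entropy(pixels):
    """Shannon entropy, in bits, of a sequence of 8-bit grayscale values.

    Assumption: this is one common definition of image entropy; the
    post does not confirm that its Entropy metric is computed this way.
    """
    counts = Counter(pixels)
    total = len(pixels)
    # Sum p * log2(1/p) over every gray level that actually occurs.
    return sum((n / total) * math.log2(total / n) for n in counts.values())


# A single flat gray level carries no information.
flat = [128] * 1024
# Every gray level equally likely: the maximum for 8-bit data.
spread = list(range(256)) * 4

print(shannon_entropy(flat))    # 0.0
print(shannon_entropy(spread))  # 8.0
```

Under this reading, a value such as 6.80 would indicate an image that uses most of the available tonal range, while lower values point to flatter, more uniform images.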
A Hiker’s Perspective: Where Nature’s Majesty Meets Serenity
A lone hiker stands on a cliff, dwarfed by the breathtaking beauty of a valley with two cascading waterfalls. The overcast sky adds a dramatic touch, while the scene evokes a sense of adventure and tranquility.
Prompt
poses profile: Adventurous, free-spirited, awe-inspired ; A backpacker standing on a cliff edge, looking out at a breathtaking view; medium shot; Adventure; A sprawling valley with cascading waterfalls; cinematic
Characteristic
Shot : A man is standing on a cliff overlooking a valley with two waterfalls. The sky is overcast, but the light is still bright. The man is wearing a backpack and is looking out at the view.
Aesthetic Score : 0.8
Mood : epic, adventurous, contemplative
Quality
Entropy : 6.71
Noise : 105
Prompt Clip Score : 0.28
AI Evaluation
Likelihood of AI : 0.20
Image errors : No visible errors
The Intensity of Play: A Close-Up on Focused Hands
A close-up shot captures the intensity of a gamer’s focus as their hands grip a video game controller. The blurry background of a computer monitor and lamp adds to the sense of immersion, highlighting the player’s dedication to the game.
Prompt
poses profile: Focused, intense, passionate ; A gamer’s hands, illuminated by the glow of a monitor, holding a controller; close-up; Gaming; A dimly lit room with gaming posters on the walls; cinematic
Characteristic
Shot : Close-up shot of a person’s hands holding a video game controller. The background is blurry and out of focus, but it appears to be a gaming setup with a monitor and other electronics.
Aesthetic Score : 0.6
Mood : intense, focused, immersive
Quality
Entropy : 6.19
Noise : 90
Prompt Clip Score : 0.24
AI Evaluation
Likelihood of AI : 0.20
Image errors : Slight color banding and noise, especially in the shadows.
Contemplation in the City: A Moment of Awe
A woman stands amidst the bustling city, her gaze drawn upwards to the imposing cathedral. The muted colors and cloudy sky create a peaceful atmosphere, inviting viewers to share in her introspective moment of wonder.
Prompt
poses profile: Curious, excited, appreciative ; A tourist gazing up at a majestic cathedral; medium shot; Tourism; A bustling city square with cobblestone streets; cinematic
Characteristic
Shot : A woman in a tan coat stands in front of a large cathedral with a crowd of people in the background. The camera is focused on the woman’s face, and the cathedral is blurred in the background.
Aesthetic Score : 0.6
Mood : pensive, contemplative, peaceful
Quality
Entropy : 6.75
Noise : 88
Prompt Clip Score : 0.29
AI Evaluation
Likelihood of AI : 0.20
Image errors : The image is slightly overexposed, and the colors are a bit washed out. Some digital noise is visible in the background.
Lost in the Landscape: A Moment of Contemplation
A man gazes out the train window, his expression hinting at a melancholic introspection. The passing countryside becomes a backdrop for his inner thoughts, creating a poignant scene of longing and contemplation.
Prompt
poses profile: Reflective, contemplative, nostalgic ; A traveler sitting on a train, looking out the window at passing scenery; medium shot; Travel; A scenic train journey through rolling hills and fields; cinematic
Characteristic
Shot : A man sits by a train window, looking out at the passing countryside.
Aesthetic Score : 0.6
Mood : melancholy, contemplative, introspective
Quality
Entropy : 6.66
Noise : 103
Prompt Clip Score : 0.31
AI Evaluation
Likelihood of AI : 0.20
Image errors : The image is slightly blurry, and the colors are muted. The subject appears slightly overexposed.
Laughter and Light: Capturing the Joy of a Festive Gathering
Two women share a moment of genuine laughter at a vibrant party, their joy illuminated by festive decorations. The shallow depth of field focuses on their expressions, creating a sense of intimacy and connection amidst the celebratory atmosphere.
Prompt
poses profile: Joyful, celebratory, connected ; A group of friends laughing and celebrating together; wide shot; Groups; A lively party with colorful decorations and music; cinematic
Characteristic
Shot : Two young women are laughing and enjoying themselves at a party, surrounded by friends and colorful decorations. It’s a celebratory atmosphere with a lot of energy.
Aesthetic Score : 0.7
Mood : joyful, celebratory, carefree
Quality
Entropy : 6.62
Noise : 109
Prompt Clip Score : 0.21
AI Evaluation
Likelihood of AI : 0.10
Image errors : The image contains some noise, particularly in the background. There is also some overexposure in the highlights and some chromatic aberration visible at the edges.
Superman Takes Flight Over Metropolis
A powerful image captures the iconic superhero standing tall on a rooftop, his cape billowing in the wind as he surveys the cityscape below. The scene evokes a sense of heroism, power, and hope, emphasizing Superman’s dominance and unwavering commitment to protecting the city.
Prompt
poses profile: Powerful, confident, inspiring ; A superhero standing tall, cape billowing in the wind; medium shot; Heroism; A cityscape with towering skyscrapers; cinematic
Characteristic
Shot : A man dressed as Superman standing on a rooftop overlooking a city skyline.
Aesthetic Score : 0.6
Mood : heroic, powerful, determined
Quality
Entropy : 6.68
Noise : 72
Prompt Clip Score : 0.24
AI Evaluation
Likelihood of AI : 0.70
Image errors : The city skyline appears to be somewhat blurry and lacks detail. The red cape also has some unnatural folds.
Lost in the Jungle’s Embrace: A Mysterious Expedition
A group of explorers venture deep into a misty jungle, their path shrouded in mystery. The lush foliage and dramatic lighting create a sense of depth and intrigue, highlighting the central explorer as they navigate the unknown. This scene evokes a mood of adventure, mystery, and a touch of the eerie.
Prompt
poses profile: Intrigued, adventurous, determined ; A group of explorers navigating a dense jungle; wide shot; Adventure; Lush greenery, ancient ruins, and dappled sunlight; cinematic
Characteristic
Shot : A group of adventurers, mostly men, in period clothing (early 1900s) is walking through a lush jungle. The composition is a little awkward, with the main character in the foreground, but the light and colors are appealing and give the image an interesting mood.
Aesthetic Score : 0.6
Mood : mysterious, adventurous, jungle
Quality
Entropy : 6.71
Noise : 105
Prompt Clip Score : 0.28
AI Evaluation
Likelihood of AI : 0.80
Image errors : The image is slightly blurry and has some artifacts, particularly in the background, where the textures lose detail.
Intense Gaze, Dramatic Lighting: A Portrait of Urban Cool
A young man, bathed in vibrant red and blue light, stares directly into the camera with an intensity that demands attention. His black jacket and headphones add to the edgy, urban aesthetic, creating a mood that is both dramatic and captivating.
Prompt
poses profile: Focused, competitive, determined ; A gamer’s face, lit by the screen, showing intense concentration; close-up; Gaming; A dimly lit room with a gaming setup and neon lights; cinematic
Characteristic
Shot : A close-up portrait of a young man with curly hair, wearing a black jacket and headphones, against a backdrop of neon lights.
Aesthetic Score : 0.7
Mood : intense, serious, mysterious
Quality
Entropy : 6.09
Noise : 96
Prompt Clip Score : 0.24
AI Evaluation
Likelihood of AI : 0.80
Image errors : The image has some minor artifacts around the edges of the subject’s hair and skin, and the lighting is slightly uneven.
Sunset Serenade: A Dreamy Beach Stroll
Experience the warmth of a romantic sunset as a couple shares a dreamy walk on the beach, their silhouettes painting a picture of love against the vibrant sky. The man gazes towards the camera, while the woman’s subtle downward glance adds an air of mystery to this visually striking scene.
Prompt
poses profile: Romantic, peaceful, serene ; A couple holding hands, walking along a beach at sunset; medium shot; Tourism; A golden beach with turquoise waters and a vibrant sky; cinematic
Characteristic
Shot : A couple is walking on the beach hand in hand, with a beautiful sunset behind them.
Aesthetic Score : 0.7
Mood : romantic, peaceful, warm
Quality
Entropy : 6.72
Noise : 91
Prompt Clip Score : 0.29
AI Evaluation
Likelihood of AI : 0.30
Image errors : No visible artifacts or errors.
Conclusion
The results show that the generative AI model handled camera position and shot composition far better than it handled the desired aesthetic. Here’s a breakdown:
- Camera Position: The model scored 4.5 out of 10. This is a mid-range result, but it suggests the model can often generate images with the camera angles and perspectives specified in the prompt.
- Shot Analysis: The model scored 4.6 out of 10, a similar mid-range result. This suggests the model can often generate images with the framing and composition specified in the prompt.
- Aesthetic Analysis: The model scored 0.05 out of 10, indicating a significant difference between the expected aesthetic and the actual aesthetic of the generated images. This suggests the model struggled to capture the desired visual style or mood.
Overall, the model is noticeably more capable at interpreting and executing camera positions and shot composition than at achieving the desired aesthetic, which is where it needs the most improvement.
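For a rough overall picture, the per-image values listed above can be averaged. The sketch below copies the Aesthetic Score, Prompt Clip Score, and Likelihood of AI from each of the ten images. The tuple layout and the `summarize` helper are illustrative names, not part of the original evaluation, and these averages are a different measure from the out-of-10 scores in the breakdown.

```python
from statistics import mean

# (title, aesthetic score, prompt CLIP score, likelihood of AI),
# copied from the per-image listings in this post.
RESULTS = [
    ("Silhouetted Against the Sunset", 0.8, 0.24, 0.20),
    ("A Hiker's Perspective", 0.8, 0.28, 0.20),
    ("The Intensity of Play", 0.6, 0.24, 0.20),
    ("Contemplation in the City", 0.6, 0.29, 0.20),
    ("Lost in the Landscape", 0.6, 0.31, 0.20),
    ("Laughter and Light", 0.7, 0.21, 0.10),
    ("Superman Takes Flight Over Metropolis", 0.6, 0.24, 0.70),
    ("Lost in the Jungle's Embrace", 0.6, 0.28, 0.80),
    ("Intense Gaze, Dramatic Lighting", 0.7, 0.24, 0.80),
    ("Sunset Serenade", 0.7, 0.29, 0.30),
]


def summarize(results):
    """Return the mean of each metric, rounded to three decimals."""
    return {
        "aesthetic": round(mean([r[1] for r in results]), 3),
        "clip": round(mean([r[2] for r in results]), 3),
        "ai_likelihood": round(mean([r[3] for r in results]), 3),
    }


if __name__ == "__main__":
    for name, value in summarize(RESULTS).items():
        print(f"{name}: {value}")
```

On these figures the mean aesthetic score is 0.67, the mean prompt CLIP score is 0.262, and the mean likelihood of AI is 0.37.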
Sources:
- https://www.writerswrite.co.za/cheat-sheets-for-writing-body-language/
- https://mads3df.wordpress.com/2013/09/04/storytelling-poses/
- https://www.pinterest.com/pegasister890/character-poses/
- https://www.youtube.com/watch?v=udky6ANxWws
- https://maven.com/articles/storytelling-techniques
- https://deepmind.google/technologies/imagen-2/