AI's Artistic Struggle: Capturing the Essence of Poses with Imagen-v2
Generating images from text descriptions is one of the most interesting areas of generative AI. This post looks at images that Imagen-v2 produced from scene descriptions, focusing on how well the model captures poses: every prompt asks for crossed arms paired with a mood. The model shows a decent grasp of scene and shot composition but falls short on camera position and struggles to achieve the desired aesthetic, which highlights how hard it is for AI to capture the nuances of artistic expression. We’ll walk through its strengths and weaknesses in terms of camera position, shot analysis, and aesthetic execution, and what they reveal about the current capabilities and limitations of AI image generation.
Created with: imagen-v2
A Solitary Figure Above the Clouds
A breathtaking scene of a man standing on a mountain peak, bathed in the soft hues of a sunrise or sunset. The vast expanse of clouds below creates a sense of awe and solitude, capturing the essence of adventure and tranquility.
Prompt
poses crossed-arms: determined, confident ; A lone explorer, standing atop a windswept mountain peak; wide shot; Adventure; a vast, breathtaking panorama of snow-capped peaks and swirling clouds; cinematic
Characteristic
Shot : A lone figure stands on a mountain peak overlooking a sea of clouds, with a clear sky above.
Aesthetic Score : 0.7
Mood : serene, contemplative, adventurous
Quality
Entropy : 6.65
Noise : 84
Prompt Clip Score : 0.27
AI Evaluation
Likelihood of AI : 0.20
Image errors : No notable errors
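Every prompt in this post follows the same semicolon-separated layout: a pose with mood words, then the subject, shot size, theme, setting, and a style keyword. The sketch below splits a prompt into those fields and reassembles it. The six-part layout is inferred from the prompts themselves, not from any Imagen documentation, and `PosePrompt`, `parse_prompt`, and `build_prompt` are hypothetical names.

```python
from dataclasses import dataclass


# Hypothetical field layout inferred from the prompts in this post:
# "<pose>: <moods> ; <subject>; <shot>; <theme>; <setting>; <style>"
@dataclass
class PosePrompt:
    pose: str
    moods: list[str]
    subject: str
    shot: str
    theme: str
    setting: str
    style: str


def parse_prompt(prompt: str) -> PosePrompt:
    """Split a prompt written in the assumed six-part template into fields."""
    parts = [p.strip() for p in prompt.split(";")]
    if len(parts) != 6:
        raise ValueError(f"expected 6 ';'-separated parts, got {len(parts)}")
    head, subject, shot, theme, setting, style = parts
    # The first part looks like "poses crossed-arms: determined, confident".
    pose, _, mood_text = head.partition(":")
    moods = [m.strip() for m in mood_text.split(",") if m.strip()]
    return PosePrompt(pose.strip(), moods, subject, shot, theme, setting, style)


def build_prompt(p: PosePrompt) -> str:
    """Reassemble the fields into the same template."""
    return (f"{p.pose}: {', '.join(p.moods)} ; {p.subject}; {p.shot}; "
            f"{p.theme}; {p.setting}; {p.style}")


example = ("poses crossed-arms: determined, confident ; A lone explorer, "
           "standing atop a windswept mountain peak; wide shot; Adventure; "
           "a vast, breathtaking panorama of snow-capped peaks and swirling "
           "clouds; cinematic")
fields = parse_prompt(example)
print(fields.shot)   # wide shot
print(fields.moods)  # ['determined', 'confident']
```

Structuring prompts this way makes it easy to vary a single field, such as the shot size, while holding the rest of the scene fixed.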
Superman Stands Tall Amidst the Flames
A powerful image captures the heroic figure of Superman, standing resolute against a backdrop of fiery destruction. The burning city behind him emphasizes the intensity of the moment, highlighting his unwavering determination and strength in the face of adversity.
Prompt
poses crossed-arms: powerful, stoic ; A superhero, silhouetted against a blazing sunset; medium shot; Heroism; a cityscape with towering skyscrapers and a fiery sky; cinematic
Characteristic
Shot : Superman, the iconic superhero, stands against a city skyline at sunset, arms crossed. The city is glowing, suggesting a disaster or conflict. Superman’s appearance has a painterly, stylized quality with prominent muscle definition and a texture almost like bark.
Aesthetic Score : 0.6
Mood : dramatic, powerful, intense
Quality
Entropy : 6.36
Noise : 51
Prompt Clip Score : 0.25
AI Evaluation
Likelihood of AI : 0.90
Image errors : The lighting on Superman is inconsistent, and his skin texture does not match realistic human anatomy. The background cityscape is overly stylized and lacks detail.
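Each entry reports an Entropy value, but the post does not define it. One common definition, assumed here rather than confirmed, is the Shannon entropy in bits of an 8-bit grayscale histogram: 0 for an image with a single gray level, 8 when all 256 levels occur equally often. Under that reading, the 5.98 to 6.84 reported in this post would indicate a broad tonal range. The sketch works on a flat list of pixel values so it needs no imaging library; `image_entropy` is a hypothetical helper, not the tool behind these numbers.

```python
import math
from collections import Counter


def image_entropy(pixels: list[int]) -> float:
    """Shannon entropy, in bits, of a grayscale pixel histogram.

    Assumes 8-bit values (0 to 255), so the result lies between 0 and 8.
    """
    if not pixels:
        raise ValueError("empty image")
    counts = Counter(pixels)
    total = len(pixels)
    entropy = 0.0
    for count in counts.values():
        p = count / total           # probability of this gray level
        entropy -= p * math.log2(p)
    return entropy


flat = [128] * 1000                 # a single gray level everywhere
ramp = list(range(256)) * 4         # every level equally often
print(image_entropy(flat))  # 0.0
print(image_entropy(ramp))  # 8.0
```

For a real image, the pixel list would come from converting the picture to grayscale first; how the reported figures handle color is not stated in the post.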
Lost in the Zone: A Portrait of Focus
A young man, bathed in vibrant red and blue light, sits at his desk with a determined expression. His headphones and crossed arms convey an intense focus, capturing the essence of deep concentration.
Prompt
poses crossed-arms: focused, intense ; A group of gamers, huddled around a glowing computer screen; close-up; Gaming; a dimly lit room with neon lights and gaming peripherals; cinematic
Characteristic
Shot : A young man wearing headphones and glasses sits at a desk in front of a computer; his arms are crossed and he looks serious.
Aesthetic Score : 0.6
Mood : intense, focused, serious
Quality
Entropy : 6.50
Noise : 87
Prompt Clip Score : 0.26
AI Evaluation
Likelihood of AI : 0.10
Image errors : The image is slightly blurry, and the lighting is a bit harsh.
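The Prompt Clip Score is likewise not defined in the post. A common convention, assumed here, is the cosine similarity between a CLIP embedding of the image and a CLIP embedding of the prompt text; raw values stay well below 1 even for good matches, which is consistent with the 0.25 to 0.29 reported. The CLIPScore metric rescales this as 2.5 times the cosine, clipped at zero. The sketch shows only the similarity step on toy vectors: producing real embeddings needs a CLIP model, which is out of scope here, and `clip_similarity` and `clipscore` are hypothetical names.

```python
import math


def clip_similarity(image_emb: list[float], text_emb: list[float]) -> float:
    """Cosine similarity between two embedding vectors (assumed CLIP outputs)."""
    if len(image_emb) != len(text_emb):
        raise ValueError("embeddings must have the same dimension")
    dot = sum(a * b for a, b in zip(image_emb, text_emb))
    norm_i = math.sqrt(sum(a * a for a in image_emb))
    norm_t = math.sqrt(sum(b * b for b in text_emb))
    if norm_i == 0 or norm_t == 0:
        raise ValueError("zero-length embedding")
    return dot / (norm_i * norm_t)


def clipscore(image_emb: list[float], text_emb: list[float], w: float = 2.5) -> float:
    """Rescaled variant: w * max(cosine, 0), as in the CLIPScore metric."""
    return w * max(clip_similarity(image_emb, text_emb), 0.0)


# Toy 3-dimensional vectors standing in for real, much longer CLIP embeddings.
image_vec = [0.2, 0.9, 0.1]
text_vec = [0.1, 0.8, 0.4]
print(round(clip_similarity(image_vec, text_vec), 3))  # 0.935
```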
Lost in Thought Beneath the Eiffel Tower
A solitary figure stands before the iconic Eiffel Tower, her gaze lost in the melancholic sky. The contrast between her small form and the towering structure evokes a sense of contemplation and romantic longing.
Prompt
poses crossed-arms: awe-struck, contemplative ; A young woman, gazing out at the Eiffel Tower; medium shot; Tourism; a bustling Parisian street with charming cafes and cobblestone streets; cinematic
Characteristic
Shot : A young woman stands in front of the Eiffel Tower, looking up with a thoughtful expression.
Aesthetic Score : 0.7
Mood : pensive, romantic, Parisian
Quality
Entropy : 6.84
Noise : 71
Prompt Clip Score : 0.28
AI Evaluation
Likelihood of AI : 0.20
Image errors : The image has a slight amount of noise in the background, possibly due to the low light conditions. There is also a minor issue with the sharpness of the woman’s face.
Tranquility on a Tropical Beach
A serene scene of a man standing on a pristine white sandy beach, with a turquoise ocean before him and a bright blue sky above. Palm trees sway gently in the breeze, adding to the sense of peace. This image captures the essence of a perfect tropical getaway.
Prompt
poses crossed-arms: free-spirited, adventurous ; A backpacker, standing on a deserted beach; long shot; Travel; a pristine beach with turquoise waters and palm trees swaying in the breeze; cinematic
Characteristic
Shot : A man standing on a white sandy beach, looking towards the ocean with his arms crossed. The sky is a bright turquoise and there are palm trees in the background.
Aesthetic Score : 0.6
Mood : tropical, relaxing, adventurous
Quality
Entropy : 6.69
Noise : 108
Prompt Clip Score : 0.28
AI Evaluation
Likelihood of AI : 0.20
Image errors : The sky appears unnatural and overly saturated. There are some minor artifacts in the image, mainly in the sand.
Space Odyssey: A Moment of Suspense
Three astronauts, clad in their space suits, stand against a dark, metallic backdrop, their expressions serious and their gaze fixed on something unseen. A hint of fog adds to the atmosphere of suspense, leaving the viewer wondering what lies ahead in this futuristic odyssey.
Prompt
poses crossed-arms: determined, united ; A team of astronauts, standing in the shadow of a colossal spaceship; medium shot; Heroism; a futuristic spaceport with gleaming metal and swirling nebulae; cinematic
Characteristic
Shot : Three astronauts in space suits standing in front of a dark, mysterious background, possibly a spaceship or a planet. The astronauts have serious expressions on their faces and appear to be in a perilous situation.
Aesthetic Score : 0.7
Mood : serious, suspenseful, futuristic
Quality
Entropy : 5.98
Noise : 92
Prompt Clip Score : 0.25
AI Evaluation
Likelihood of AI : 0.70
Image errors : The image has some minor artifacts, especially noticeable on the astronauts’ suits, suggesting potential over-processing. The lighting is a bit flat and could use more contrast to enhance the mood.
Lost in the Digital Maze: A Cyberpunk VR Experience
Four figures, shrouded in the glow of VR headsets, stand in a dimly lit room, their gazes lost in the digital realm. Colorful lights dance in the background, hinting at a world of mystery and intrigue. This cyberpunk scene captures the allure and uncertainty of a future where reality blurs with the virtual.
Prompt
poses crossed-arms: excited, triumphant ; A group of friends, celebrating a victory in a virtual reality game; close-up; Gaming; a brightly lit arcade with flashing lights and immersive VR headsets; cinematic
Characteristic
Shot : Four people wearing VR headsets stand in a dimly lit room with neon lights in the background. They are all looking in different directions, seemingly lost in the virtual world.
Aesthetic Score : 0.7
Mood : futuristic, mysterious, cool
Quality
Entropy : 6.29
Noise : 89
Prompt Clip Score : 0.28
AI Evaluation
Likelihood of AI : 0.30
Image errors : Slight noise in the darker areas and slight over-sharpening in the highlights.
Contemplating the Cityscape
A solitary figure in a brown sweater and black beanie stands on a bridge, gazing out at the sprawling cityscape. The scene evokes reflection and calm, and the man’s posture against the urban backdrop conveys a strong sense of isolation.
Prompt
poses crossed-arms: reflective, introspective ; A lone traveler, standing on a bridge overlooking a bustling city; medium shot; Travel; a vibrant cityscape with towering buildings and a river flowing below; cinematic
Characteristic
Shot : A young man standing on a bridge overlooking a city skyline. The man is wearing a beanie, a brown sweater, and a backpack. He is looking off into the distance.
Aesthetic Score : 0.7
Mood : pensive, urban, contemplative
Quality
Entropy : 6.66
Noise : 96
Prompt Clip Score : 0.28
AI Evaluation
Likelihood of AI : 0.10
Image errors : No noticeable errors
Contemplating the Vastness: A Moment of Serenity on the Clifftop
Three adventurers stand on a rocky precipice, dwarfed by the majestic mountain range and the expansive valley below. The sun bathes the scene in a warm glow, creating a sense of serenity and awe. This image captures the beauty of nature and the human spirit’s desire to explore and connect with the world around us.
Prompt
poses crossed-arms: accomplished, exhilarated ; A group of hikers, standing at the summit of a mountain; wide shot; Adventure; a panoramic view of rolling hills and lush forests; cinematic
Characteristic
Shot : Three people are standing on a cliff looking out over a mountain valley at sunset.
Aesthetic Score : 0.6
Mood : tranquil, adventurous, inspiring
Quality
Entropy : 6.83
Noise : 109
Prompt Clip Score : 0.28
AI Evaluation
Likelihood of AI : 0.20
Image errors : The image has a slight color cast, the sun is overexposed, and the overall image is slightly grainy.
Friends Strike a Pose in Front of Milan’s Majestic Duomo
A group of friends capture a happy moment in front of the Duomo di Milano, the iconic Gothic cathedral. The cathedral’s impressive size and intricate details create a dramatic backdrop for their casual, touristy photo.
Prompt
poses crossed-arms: happy, excited ; A group of tourists, posing for a photo in front of a famous landmark; medium shot; Tourism; a historic landmark with intricate architecture and vibrant colors; cinematic
Characteristic
Shot : A group of friends pose in front of a large cathedral in the daytime.
Aesthetic Score : 0.6
Mood : casual, friendly, touristy
Quality
Entropy : 6.77
Noise : 97
Prompt Clip Score : 0.29
AI Evaluation
Likelihood of AI : 0.20
Image errors : The image appears to have some noise and a slight blur, which may be due to post-processing or the original capture. There is a lack of sharpness in some of the details.
Conclusion
The results show that the model understood the scene reasonably well but fell short on camera position and struggled with the aesthetic aspect. Here’s a breakdown:
- Camera Position: The model scored 0.36, which is below the “good” range of 0.5 to 0.75. This suggests that the model didn’t fully capture the intended camera positions described in the prompt.
- Shot Analysis: The model scored 0.605, falling within the “good” range. This indicates that the model was able to understand the scene and create a shot that was somewhat aligned with the prompt.
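- Aesthetic Analysis: The model scored 0.09, which sits just inside the upper bound of the “very good” range of -0.2 to 0.1 rather than above it. The score alone therefore suggests at most a marginal deviation from the expected aesthetic described in the prompts; the weaker aesthetic result shows up mainly in the per-image notes on noise, blur, and oversaturation.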
Overall, the model demonstrated a decent understanding of the scene and shot composition, but struggled to achieve the desired aesthetic.
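The three aggregate scores can be checked against the ranges quoted in the breakdown with a few lines of code. The sketch hard-codes the figures from the list. The post calls the shot-analysis score “good” without stating bounds, so that range is assumed here to match the 0.5 to 0.75 “good” range given for camera position; `band_position` is a hypothetical helper.

```python
# Scores and ranges quoted in the conclusion. The shot-analysis range is an
# assumption: the post calls 0.605 "good" without stating the bounds.
SCORES = {
    "camera_position": (0.36, "good", (0.5, 0.75)),
    "shot_analysis": (0.605, "good", (0.5, 0.75)),  # assumed range
    "aesthetic": (0.09, "very good", (-0.2, 0.1)),
}


def band_position(score: float, low: float, high: float) -> str:
    """Report whether a score lies below, inside, or above [low, high]."""
    if score < low:
        return "below"
    if score > high:
        return "above"
    return "inside"


for name, (score, label, (low, high)) in SCORES.items():
    where = band_position(score, low, high)
    print(f"{name}: {score} is {where} the '{label}' range [{low}, {high}]")
```

Run as written, the check places camera position below its range, shot analysis inside it, and the 0.09 aesthetic score inside its stated range, right at the upper edge.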