AI's Artistic Journey: Capturing Poses, But Missing the Scene with Imagen-v3
In the realm of artificial intelligence, the ability to generate realistic and visually appealing images is a coveted skill. One area of focus is the creation of images that capture specific poses and scenes. This blog post delves into a case study where a generative AI model was tasked with generating images based on detailed scene descriptions and desired poses. While the model demonstrated a strong understanding of aesthetic elements, it struggled to accurately represent the intended camera positions and scene details. This highlights the ongoing challenges in developing AI models that can seamlessly translate complex descriptions into visually compelling images.
Created with: imagen-v3
Conquering the Summit: A Climber’s Triumph Amidst Majestic Peaks
A lone climber stands victorious on a mountain peak, bathed in the ethereal glow of snow-capped summits and swirling clouds. The image evokes a sense of awe and accomplishment, capturing the epic beauty of the landscape and the climber’s unwavering spirit.
Prompt
poses crossed-arms: determined, confident ; A lone explorer, standing atop a windswept mountain peak; wide shot; Adventure; a vast, breathtaking panorama of snow-capped peaks and swirling clouds; cinematic
Characteristic
Shot : A lone climber stands on a mountain peak with a majestic view of snow-capped peaks and clouds below.
Aesthetic Score : 0.7
Mood : epic, serene, inspirational
Quality
Entropy : 6.73
Noise : 80
Prompt Clip Score : 0.33
AI Evaluation
Likelihood of AI : 0.80
Image errors : The clouds are slightly blurry and lack depth. The snow on the mountains also appears somewhat artificial and unrealistic.
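The post does not say how the Entropy figure in each record is computed. One common definition, assumed here purely for illustration, is the Shannon entropy of the 8-bit grayscale histogram, measured in bits. Under that assumption the value is bounded by 8, which would be consistent with readings around 6.6 to 6.9. The function name `shannon_entropy` is hypothetical, and the sketch works on a plain list of pixel values rather than on any particular image library.

```python
import math
from collections import Counter


def shannon_entropy(pixels):
    """Shannon entropy, in bits, of a sequence of grayscale pixel values.

    Assumption: this is one plausible definition of the "Entropy" metric;
    the post itself does not state which one was used.
    """
    total = len(pixels)
    if total == 0:
        raise ValueError("pixels must not be empty")
    counts = Counter(pixels)
    # Sum -p * log2(p) over every grey level that actually occurs.
    return sum(-(c / total) * math.log2(c / total) for c in counts.values())


if __name__ == "__main__":
    flat = [128] * 100            # a single grey level carries no information
    spread = list(range(256))     # every 8-bit level used exactly once
    print(shannon_entropy(flat), shannon_entropy(spread))
```

A perfectly flat image sits at the bottom of the scale and an image using every grey level equally sits at the top, so a higher reading suggests more tonal variety rather than higher quality.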
Superman: A Silhouette of Power at Sunset
A dramatic image capturing Superman’s heroic stance against a vibrant city skyline at sunset. The lighting and his crossed arms convey a sense of strength and determination, making for a powerful and evocative scene.
Prompt
poses crossed-arms: powerful, stoic ; A superhero, silhouetted against a blazing sunset; medium shot; Heroism; a cityscape with towering skyscrapers and a fiery sky; cinematic
Characteristic
Shot : Superman standing with crossed arms in front of a city skyline at sunset.
Aesthetic Score : 0.7
Mood : heroic, dramatic, powerful
Quality
Entropy : 6.62
Noise : 62
Prompt Clip Score : 0.31
AI Evaluation
Likelihood of AI : 0.90
Image errors : The image has some minor artifacts, particularly in the background.
The Eyes of a Champion: Gamers Locked in a Battle of Wits
Three young men, clad in their gaming jerseys, sit before a computer screen, their faces illuminated by a dramatic blue and purple glow. Their crossed arms and focused expressions speak volumes about their intense concentration and competitive spirit. The blurred background adds to the sense of immersion, highlighting the players’ unwavering determination to conquer the virtual battlefield.
Prompt
poses crossed-arms: focused, intense ; A group of gamers, huddled around a glowing computer screen; close-up; Gaming; a dimly lit room with neon lights and gaming peripherals; cinematic
Characteristic
Shot : Three young men wearing gaming jerseys sit in front of a computer with their arms crossed, looking focused and determined. The background is blurred and the image is lit with blue and purple light.
Aesthetic Score : 0.6
Mood : intense, focused, competitive
Quality
Entropy : 6.73
Noise : 88
Prompt Clip Score : 0.30
AI Evaluation
Likelihood of AI : 0.10
Image errors : No major errors. The image is slightly blurry in some areas, and the colors are a bit oversaturated.
Lost in Parisian Dreams: A Moment of Contemplation
A young woman stands alone on a Parisian street, her gaze lost in the distance. The Eiffel Tower looms in the background, a symbol of romance and nostalgia. The blurred background and her pensive expression create a sense of isolation and contemplation, capturing a dreamy mood.
Prompt
poses crossed-arms: awe-struck, contemplative ; A young woman, gazing out at the Eiffel Tower; medium shot; Tourism; a bustling Parisian street with charming cafes and cobblestone streets; cinematic
Characteristic
Shot : A young woman is standing on a street in Paris, with the Eiffel Tower in the background.
Aesthetic Score : 0.7
Mood : romantic, nostalgic, dreamy
Quality
Entropy : 6.85
Noise : 94
Prompt Clip Score : 0.36
AI Evaluation
Likelihood of AI : 0.20
Image errors : Slight blurriness in some areas, particularly in the background.
Escape to Paradise: A Man Finds Serenity on a Tropical Beach
This serene scene captures the essence of adventure and tranquility. A lone figure stands on a pristine white sand beach, surrounded by swaying palm trees and the turquoise waters of a tropical paradise. The man’s relaxed pose and the breathtaking scenery evoke a sense of peace and wonder, inviting you to imagine yourself escaping to this idyllic destination.
Prompt
poses crossed-arms: free-spirited, adventurous ; A backpacker, standing on a deserted beach; long shot; Travel; a pristine beach with turquoise waters and palm trees swaying in the breeze; cinematic
Characteristic
Shot : A man stands on a tropical beach with palm trees in the background. The water is turquoise and the sand is white.
Aesthetic Score : 0.7
Mood : serene, peaceful, adventurous
Quality
Entropy : 6.57
Noise : 99
Prompt Clip Score : 0.28
AI Evaluation
Likelihood of AI : 0.80
Image errors : The image has a slightly unnatural look to it. The man’s pose and the overall composition are a bit too perfect, which makes the image look staged and slightly unrealistic. The sand appears overly smooth and pristine.
Astronauts on the Brink of Discovery
A group of astronauts, clad in futuristic spacesuits, stand poised before a colossal spaceship, their faces etched with determination. The blurry nebula and distant spacecraft in the background hint at the vastness and uncertainty of their mission, creating a sense of anticipation and awe.
Prompt
poses crossed-arms: determined, united ; A team of astronauts, standing in the shadow of a colossal spaceship; medium shot; Heroism; a futuristic spaceport with gleaming metal and swirling nebulae; cinematic
Characteristic
Shot : A group of astronauts in futuristic spacesuits stand in front of a large spaceship. The background is a blurry nebula and a large spacecraft.
Aesthetic Score : 0.7
Mood : serious, determined, futuristic
Quality
Entropy : 6.67
Noise : 97
Prompt Clip Score : 0.30
AI Evaluation
Likelihood of AI : 0.70
Image errors : The lighting is a little flat, and the astronauts’ faces are not very expressive. The edges of the image look a bit blurry and the background seems a bit overly blurred. The characters seem a bit too perfect and posed.
VR Fun: Smiles and Excitement in a Neon-Lit Arcade
Three friends, beaming with joy, stand in a vibrant VR arcade bathed in blue and red lighting. Their crossed arms suggest a playful challenge, hinting at the immersive experiences awaiting them. The scene captures the excitement and anticipation of stepping into a virtual world.
Prompt
poses crossed-arms: excited, triumphant ; A group of friends, celebrating a victory in a virtual reality game; close-up; Gaming; a brightly lit arcade with flashing lights and immersive VR headsets; cinematic
Characteristic
Shot : Three people wearing VR headsets stand in a dimly lit room with blue and red lighting. They are smiling and have their arms crossed. The scene is likely a gaming arcade or a VR experience center.
Aesthetic Score : 0.6
Mood : happy, excited, playful
Quality
Entropy : 6.72
Noise : 85
Prompt Clip Score : 0.32
AI Evaluation
Likelihood of AI : 0.10
Image errors : No obvious image errors
Lost in the Cityscape: A Moment of Contemplation
A young man stands alone on a bridge, his silhouette a stark contrast against the sprawling cityscape and the flowing river. The image evokes a sense of solitude and reflection, capturing the quiet contemplation of urban life.
Prompt
poses crossed-arms: reflective, introspective ; A lone traveler, standing on a bridge overlooking a bustling city; medium shot; Travel; a vibrant cityscape with towering buildings and a river flowing below; cinematic
Characteristic
Shot : A young man stands on a bridge overlooking a river with boats and a cityscape in the background. The scene is framed in a way that emphasizes the man’s solitary figure and the vastness of the cityscape.
Aesthetic Score : 0.6
Mood : reflective, urban, contemplative
Quality
Entropy : 6.62
Noise : 98
Prompt Clip Score : 0.33
AI Evaluation
Likelihood of AI : 0.10
Image errors : There are no visible artifacts or errors in the image.
Conquering the Peak: Eight Friends Celebrate a Majestic Sunset
A group of eight young men stand triumphantly atop a mountain, their smiles reflecting the joy of their adventure. The low angle shot emphasizes their strength and the vastness of the landscape, while the cloudy sunset adds a touch of drama and hope to the scene. This is a moment of shared accomplishment and the promise of more adventures to come.
Prompt
poses crossed-arms: accomplished, exhilarated ; A group of hikers, standing at the summit of a mountain; wide shot; Adventure; a panoramic view of rolling hills and lush forests; cinematic
Characteristic
Shot : A group of eight young men are standing on the top of a mountain, with a view of rolling hills and forest in the distance. They are all wearing hiking gear and backpacks. The sky is cloudy, and the sun is setting. The photo is taken from a low angle, and the men are looking at the camera.
Aesthetic Score : 0.6
Mood : adventurous, joyful, hopeful
Quality
Entropy : 6.77
Noise : 94
Prompt Clip Score : 0.34
AI Evaluation
Likelihood of AI : 0.10
Image errors : The image is a bit overexposed, especially the sky and hills in the background. Some noise is visible in the sky.
Young Explorers Embrace the Majesty of a Grand Mosque
A group of seven friends stand in awe before a stunningly ornate mosque, its intricate tilework and towering minarets creating a sense of wonder and adventure. The vibrant scene captures the joy of cultural exploration and the beauty of architectural grandeur.
Prompt
poses crossed-arms: happy, excited ; A group of tourists, posing for a photo in front of a famous landmark; medium shot; Tourism; a historic landmark with intricate architecture and vibrant colors; cinematic
Characteristic
Shot : A group of seven young adults stand in front of a grand, ornate building with intricate tilework. The building has a large archway and minarets on either side.
Aesthetic Score : 0.6
Mood : happy, adventurous, cultural
Quality
Entropy : 6.69
Noise : 105
Prompt Clip Score : 0.33
AI Evaluation
Likelihood of AI : 0.20
Image errors : No noticeable errors
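The ten records above can be summarized with a few lines of Python. This is a minimal sketch: the numbers are copied from the records, while the short labels, the `summarize` helper, and the 0.5 cut-off for calling an image "flagged as likely AI" are assumptions introduced for illustration only.

```python
from statistics import mean

# (short label, aesthetic score, prompt CLIP score, likelihood of AI)
# Values are taken from the ten records in this post; labels are made up.
RECORDS = [
    ("climber", 0.7, 0.33, 0.80),
    ("superhero", 0.7, 0.31, 0.90),
    ("gamers", 0.6, 0.30, 0.10),
    ("paris", 0.7, 0.36, 0.20),
    ("beach", 0.7, 0.28, 0.80),
    ("astronauts", 0.7, 0.30, 0.70),
    ("vr_arcade", 0.6, 0.32, 0.10),
    ("bridge", 0.6, 0.33, 0.10),
    ("hikers", 0.6, 0.34, 0.10),
    ("mosque", 0.6, 0.33, 0.20),
]


def summarize(records, ai_threshold=0.5):
    """Average the scores and list images at or above the AI threshold.

    Assumption: ai_threshold=0.5 is an arbitrary cut-off, not one the
    post defines.
    """
    return {
        "mean_aesthetic": mean(r[1] for r in records),
        "mean_clip": mean(r[2] for r in records),
        "flagged": [r[0] for r in records if r[3] >= ai_threshold],
    }


if __name__ == "__main__":
    summary = summarize(RECORDS)
    print(f"mean aesthetic: {summary['mean_aesthetic']:.2f}")
    print(f"mean prompt CLIP score: {summary['mean_clip']:.2f}")
    print("flagged as likely AI:", ", ".join(summary["flagged"]))
```

Keeping the records as plain tuples makes it easy to add the Entropy and Noise columns later without changing the helper's shape.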
Conclusion
The results indicate that the generative AI model matched the intended aesthetic well, but struggled to reproduce the camera position and the scene described in the prompts. Here’s a breakdown:
- Camera Position: The model scored 0.35, which is below the “good” range of 0.5 to 0.75. This suggests that the model didn’t fully capture the intended camera position described in the prompt.
- Shot Analysis: The model scored 0.43, also below the “good” range. This indicates that the model didn’t fully understand the scene described in the prompt and didn’t create an image that accurately reflects it.
- Aesthetic Analysis: The model scored 0.1, which is within the “very good” range of -0.2 to 0.1. This means the generated image’s aesthetic closely matched the expected aesthetic described in the prompt.
Overall, the model seems to be better at understanding the desired aesthetic than the scene and camera position. It might need further training to improve its ability to accurately interpret and translate camera positions and scene descriptions into images.
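The comparison of each score against its target range can be sketched as a small check. The scores and ranges are the ones quoted in the breakdown; the `position` helper and the choice of inclusive bounds are assumptions, the latter made because the aesthetic score of 0.1 is described as falling within a range whose upper end is 0.1. Applying the 0.5 to 0.75 range to Shot Analysis is also inferred from the wording "also below the good range".

```python
# Scores and target ranges as quoted in the conclusion of this post.
RESULTS = {
    "camera_position": (0.35, (0.5, 0.75)),      # "good" range
    "shot_analysis": (0.43, (0.5, 0.75)),        # "good" range (inferred)
    "aesthetic_analysis": (0.1, (-0.2, 0.1)),    # "very good" range
}


def position(score, low, high):
    """Return where score sits relative to [low, high], bounds inclusive."""
    if score < low:
        return "below"
    if score > high:
        return "above"
    return "within"


if __name__ == "__main__":
    for name, (score, (low, high)) in RESULTS.items():
        print(f"{name}: {score} is {position(score, low, high)} "
              f"the range {low} to {high}")
```

Note that the aesthetic score sits exactly on the upper edge of its range, so with exclusive bounds it would no longer count as within it.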
Sources:
- https://www.writerswrite.co.za/cheat-sheets-for-writing-body-language/
- https://mads3df.wordpress.com/2013/09/04/storytelling-poses/
- https://www.pinterest.com/pegasister890/character-poses/
- https://www.youtube.com/watch?v=udky6ANxWws
- https://maven.com/articles/storytelling-techniques
- https://deepmind.google/technologies/imagen-3/