AI's Artistic Struggle: Capturing the Essence of Poses with Imagen-v3-fast
Generating images from textual descriptions is a rapidly evolving area of artificial intelligence. One intriguing aspect of this technology is its capacity to understand pose descriptions and translate them into visual representations. This blog post walks through an experiment in which an AI model was asked to generate images from ten prompts that all begin with a running pose, revealing both its strengths and its limitations in capturing the essence of human poses.
Created with: imagen-v3-fast
Escape from the Storm: A Lone Figure Races Against the Elements
A solitary figure, cloaked in a long coat, sprints across a desolate wasteland, pursued by a ferocious lightning storm. The dramatic backdrop amplifies the sense of urgency and danger, creating an epic and thrilling scene.
Prompt
poses running: determined, hopeful ; A lone figure in a tattered cloak; wide shot; Heroism; a desolate wasteland with a storm brewing in the distance; cinematic
Characteristic
Shot : A lone figure in a long coat runs across a barren wasteland, pursued by a dramatic lightning storm.
Aesthetic Score : 0.7
Mood : dark, dramatic, epic
Quality
Entropy : 6.89
Noise : 69
Prompt Clip Score : 0.30
AI Evaluation
Likelihood of AI : 0.90
Image errors : Some aliasing and banding artifacts are noticeable in the lightning and sky, especially in the distance, making the image appear slightly digital.
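The post does not say how the Entropy value under Quality is computed. One common definition, assumed here, is the Shannon entropy of an image's 8-bit grayscale histogram, which runs from 0 bits (a single flat tone) to 8 bits (all 256 gray levels equally likely); values between 6 and 7 would fit that scale. A minimal sketch, where `shannon_entropy` is a hypothetical helper and the pixel lists are toy data rather than pixels from the images in this post:

```python
import math
from collections import Counter

def shannon_entropy(pixels):
    """Shannon entropy, in bits, of a sequence of 8-bit gray levels.

    Assumption: the post's Entropy metric is histogram entropy; the
    source does not confirm this.
    """
    counts = Counter(pixels)
    total = len(pixels)
    # Sum p * log2(1/p) over every gray level that actually occurs.
    return sum((n / total) * math.log2(total / n) for n in counts.values())

flat = [128] * 1000              # a single tone carries no information
uniform = list(range(256)) * 4   # every gray level equally likely

print(shannon_entropy(flat))     # lowest possible value
print(shannon_entropy(uniform))  # highest possible value for 8-bit data
```

Under this reading, a higher entropy means the image uses more of the tonal range, which says something about detail and contrast but nothing about whether the pose matches the prompt.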
Into the Unknown: A Journey Through the Jungle
A lone figure races through a vibrant jungle, their destination a mysterious ancient temple shrouded in the distance. The scene evokes a sense of adventure, mystery, and hope, as the runner ventures into the unknown.
Prompt
poses running: Intrigued, eager to explore ; A lone figure, backpack slung over their shoulder, stands at the edge of a dense jungle. Ancient ruins peek through the foliage in the distance.; cinematic
Characteristic
Shot : A person is running through a lush jungle towards a mysterious ancient temple in the distance.
Aesthetic Score : 0.7
Mood : mysterious, adventurous, hopeful
Quality
Entropy : 6.53
Noise : 81
Prompt Clip Score : 0.33
AI Evaluation
Likelihood of AI : 0.90
Image errors : The image appears to be AI-generated and has some minor artifacts, particularly in the leaves and the background. The shadows are slightly unrealistic.
The Intensity of the Game
A gamer, lost in the heat of competition, leans forward, his hands flying across the keyboard. The reflection in the monitor reveals his focused expression, capturing the thrill and tension of the moment.
Prompt
poses running: intense, focused ; A gamer’s hands on a keyboard and mouse; close-up; Gaming; a brightly lit gaming room with a monitor displaying a virtual world; cinematic
Characteristic
Shot : A gamer is playing a video game in a dark room. He’s focused on the game and his hands are on the keyboard. He’s wearing headphones, and you can see his reflection in the monitor.
Aesthetic Score : 0.5
Mood : intense, focused, competitive
Quality
Entropy : 5.97
Noise : 26
Prompt Clip Score : 0.32
AI Evaluation
Likelihood of AI : 0.10
Image errors : The image is slightly blurry, and the lighting is dark.
Chasing the Sun Through Cobblestone Streets
Four friends race through a narrow, historic alleyway, their laughter echoing off the ancient stone walls. The energy is palpable, the adventure just beginning. This playful scene captures the thrill of exploration and the joy of shared moments.
Prompt
poses running: energetic, joyful ; A group of tourists running through a bustling marketplace; long shot; Tourism; a vibrant marketplace with colorful stalls and vendors; cinematic
Characteristic
Shot : Four people are running down a narrow street lined with shops on either side. The street is made of cobblestones and the buildings are tall and old.
Aesthetic Score : 0.7
Mood : energetic, adventurous, playful
Quality
Entropy : 6.74
Noise : 107
Prompt Clip Score : 0.29
AI Evaluation
Likelihood of AI : 0.20
Image errors : No visible artifacts or errors.
A Love Story Unfolds on the Shore: A Romantic Stroll by the Ocean
In this captivating scene, a couple, dressed in a blue suit and a red dress, walk hand-in-hand towards the ocean. The sky above them is a beautiful canvas of blue and white, perfectly mirroring their joyful and carefree mood. The dramatic effect of their romantic stroll creates an atmosphere of excitement and anticipation, making this a truly unforgettable moment.
Prompt
poses running: romantic, carefree ; A couple running hand-in-hand along a beach; medium shot; Travel; a beautiful beach with turquoise water and white sand; cinematic
Characteristic
Shot : A couple is walking hand-in-hand on a beach towards the ocean. The man is wearing a blue suit and the woman is wearing a red dress. The sky is blue with white clouds.
Aesthetic Score : 0.7
Mood : romantic, joyful, carefree
Quality
Entropy : 6.69
Noise : 68
Prompt Clip Score : 0.33
AI Evaluation
Likelihood of AI : 0.20
Image errors : No noticeable artifacts or errors.
Summer Fun in the Park: Friends Embrace the Joy of the Moment
Capture the essence of carefree summer days with this vibrant image. Four friends, radiating joy and laughter, run through a lush park bathed in warm sunlight. The dynamic composition and natural light evoke a sense of fun and abandon, making this a perfect snapshot of friendship and happiness.
Prompt
poses running: happy, playful ; A group of friends running through a park; wide shot; Groups; a sunny park with green grass and trees; cinematic
Characteristic
Shot : Four friends are running through a park on a sunny day. They are all smiling and laughing. The trees are green and the grass is lush.
Aesthetic Score : 0.7
Mood : joyful, carefree, summery
Quality
Entropy : 6.73
Noise : 111
Prompt Clip Score : 0.34
AI Evaluation
Likelihood of AI : 0.10
Image errors : There are no visible artifacts or errors in the image.
Superheroes in Motion: A Dramatic Nighttime Chase
Two superheroes, bathed in the glow of city lights, race through the urban landscape. The foreground hero’s dynamic pose and the blurred background create a sense of speed and urgency, while the dramatic lighting adds to the heroic and futuristic mood.
Prompt
poses running: powerful, confident ; A superhero in a bright costume; close-up; Heroism; a city skyline with skyscrapers and flashing lights; cinematic
Characteristic
Shot : Two superheroes running through a city at night. One is in the foreground and the other is in the background.
Aesthetic Score : 0.7
Mood : dramatic, heroic, futuristic
Quality
Entropy : 6.56
Noise : 54
Prompt Clip Score : 0.33
AI Evaluation
Likelihood of AI : 0.95
Image errors : The image is slightly blurry and the lighting is a bit flat.
A Solitary Journey Through a Snowy Wilderness
A lone figure races across a snow-covered valley, their destination a towering, snow-capped mountain. The vastness of the landscape and the figure’s isolation evoke a sense of awe and melancholy, hinting at an adventurous journey.
Prompt
poses running: determined, adventurous ; A lone explorer running through a snow-covered mountain pass; long shot; Adventure; a majestic mountain range with snow-capped peaks; cinematic
Characteristic
Shot : A lone figure runs through a snowy valley towards a towering, snow-capped mountain. The sky is a muted blue, and the overall atmosphere is serene and somewhat melancholic.
Aesthetic Score : 0.7
Mood : serene, melancholic, adventurous
Quality
Entropy : 6.84
Noise : 76
Prompt Clip Score : 0.32
AI Evaluation
Likelihood of AI : 0.90
Image errors : There are some minor artifacts in the image, particularly around the edges of the mountain and the figure. The snow appears slightly blurry in places.
Escape from the Colossus: Humanity’s Last Stand in a Futuristic City
A tense and suspenseful scene unfolds as a group of people desperately flee from towering alien creatures in a futuristic cityscape. The image captures the urgency and danger of their situation, leaving viewers on the edge of their seats.
Prompt
poses running: immersive, exciting ; A gamer’s avatar running through a virtual world; close-up; Gaming; a vibrant and detailed virtual world with fantastical creatures; cinematic
Characteristic
Shot : A group of people are running away from giant alien creatures in a futuristic city.
Aesthetic Score : 0.7
Mood : tense, suspenseful, futuristic
Quality
Entropy : 6.56
Noise : 63
Prompt Clip Score : 0.29
AI Evaluation
Likelihood of AI : 0.90
Image errors : The image has some blurriness, especially in the foreground, which can be distracting.
Feel the Wind in Your Hair: A Cyclist’s Mountain Adventure
Capture the thrill of the open road as a cyclist races down a mountain pass, bathed in sunshine with breathtaking views. The motion blur adds a sense of speed and excitement, evoking a feeling of freedom and adventure.
Prompt
poses running: Exhilarated, adventurous ; A lone cyclist speeds along a winding mountain road, the sun glinting off the asphalt as they crest a hill, revealing a breathtaking panorama of valleys and peaks.; cinematic
Characteristic
Shot : A cyclist rides down a mountain road on a sunny day, with beautiful views of the mountains in the background.
Aesthetic Score : 0.7
Mood : inspiring, adventurous, free
Quality
Entropy : 6.53
Noise : 66
Prompt Clip Score : 0.31
AI Evaluation
Likelihood of AI : 0.20
Image errors : No notable errors; the blur effect is consistent and well applied.
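The per-image figures listed above can be pulled together into a few summary numbers. The sketch below copies the Aesthetic Score, Prompt Clip Score, and Likelihood of AI values from the ten entries. The tuple layout, the variable names, and the 0.5 cutoff for calling an image "likely AI" are assumptions made for illustration, not part of the original evaluation.

```python
from statistics import mean

# (title, aesthetic score, prompt CLIP score, likelihood of AI),
# copied from the ten entries in this post.
images = [
    ("Escape from the Storm", 0.7, 0.30, 0.90),
    ("Into the Unknown", 0.7, 0.33, 0.90),
    ("The Intensity of the Game", 0.5, 0.32, 0.10),
    ("Chasing the Sun Through Cobblestone Streets", 0.7, 0.29, 0.20),
    ("A Love Story Unfolds on the Shore", 0.7, 0.33, 0.20),
    ("Summer Fun in the Park", 0.7, 0.34, 0.10),
    ("Superheroes in Motion", 0.7, 0.33, 0.95),
    ("A Solitary Journey Through a Snowy Wilderness", 0.7, 0.32, 0.90),
    ("Escape from the Colossus", 0.7, 0.29, 0.90),
    ("Feel the Wind in Your Hair", 0.7, 0.31, 0.20),
]

AI_CUTOFF = 0.5  # assumed threshold, not taken from the post

mean_aesthetic = round(mean(row[1] for row in images), 3)
mean_clip = round(mean(row[2] for row in images), 3)
flagged = [row[0] for row in images if row[3] >= AI_CUTOFF]

print(f"mean aesthetic score: {mean_aesthetic}")
print(f"mean prompt CLIP score: {mean_clip}")
print(f"flagged as likely AI: {len(flagged)} of {len(images)}")
```

The Prompt Clip Score stays in a narrow band (0.29 to 0.34) across all ten images, while the Likelihood of AI splits them into two clear groups.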
Conclusion
The results are mixed: measured against the stated ranges, the generative AI model did well on shot analysis and aesthetic analysis but fell short on camera position. Here’s a breakdown:
Camera Position:
- Score: 0.41
- Interpretation: This score falls below the “good” range of 0.5 to 0.75. It suggests that the model did not consistently reproduce the camera positions described in the prompts.
Shot Analysis:
- Score: 0.6
- Interpretation: This score falls within the “good” range of 0.5 to 0.75. It indicates that the model translated the scene descriptions from the prompts into the generated images fairly well.
Aesthetic Analysis:
- Score: 0.09
- Interpretation: This score falls inside the “very good” range of -0.2 to 0.1, just under its upper bound. Read against that range, the generated images’ aesthetic stayed close to the aesthetic described in the prompts, though with little margin.
Overall:
The model demonstrated a good understanding of scene descriptions and stayed within the expected aesthetic range, but it struggled to reproduce the requested camera positions. This suggests that the model might need further training to better translate shot and framing instructions into visual outputs.
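The range checks in the breakdown are plain interval comparisons and can be reproduced directly. A minimal sketch using the scores and ranges quoted above; `within` is a hypothetical helper, and treating the range bounds as inclusive is an assumption:

```python
def within(score, low, high):
    """True when score lies in the closed interval [low, high]."""
    return low <= score <= high

# (metric, score, range label, low, high) as quoted in the conclusion.
results = [
    ("Camera Position", 0.41, "good", 0.5, 0.75),
    ("Shot Analysis", 0.6, "good", 0.5, 0.75),
    ("Aesthetic Analysis", 0.09, "very good", -0.2, 0.1),
]

verdicts = {name: within(score, low, high)
            for name, score, _label, low, high in results}

for name, score, label, low, high in results:
    status = "inside" if verdicts[name] else "outside"
    print(f"{name}: {score} is {status} the '{label}' range [{low}, {high}]")
```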