AI's Artistic Struggle: Capturing the Essence of Poses with Imagen-v2
9-minute read - 1915 words
Dramatic poses are a powerful tool in visual storytelling, conveying emotions and narratives through body language. From heroic stances to contemplative gazes, these poses have been used for centuries to evoke specific feelings and engage viewers. However, replicating these poses accurately and aesthetically is a complex task, even for advanced AI models. This blog post explores the challenges and successes of an AI model attempting to generate images based on pose descriptions, highlighting the model’s strengths and weaknesses in capturing the essence of dramatic poses.
Created with: imagen-v2
A Knight’s Lonely Vigil: A Melancholic Tale of Epic Proportions
A solitary knight stands on a grassy hill, his gaze fixed on a distant castle. The cloudy sky and his dramatic pose evoke a sense of loneliness and epic grandeur. This scene, scored 0.7 for aesthetic appeal, captures a melancholic mood, leaving viewers to ponder the knight’s story.
Prompt
poses three-quarter-pose: determined, resolute, heroic ; A lone knight, standing tall on a windswept hilltop; three-quarter pose; Heroism; a vast, stormy landscape with a distant castle in the background; cinematic
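Every prompt in this post follows the same semicolon-separated template: a pose tag with three emotion words, then the subject, the pose, a theme, the setting, and a style keyword. The sketch below splits a prompt into those pieces, which makes it easier to compare what was asked for with what the model produced. The class and field names (`PosePrompt`, `parse_pose_prompt`, `theme`, and so on) are our own hypothetical labels, not part of any Imagen API.

```python
from dataclasses import dataclass


@dataclass
class PosePrompt:
    """Hypothetical names for the six ';'-separated prompt segments."""
    pose_tag: str          # e.g. "three-quarter-pose"
    emotions: list[str]    # e.g. ["determined", "resolute", "heroic"]
    subject: str
    pose: str
    theme: str
    setting: str
    style: str


def parse_pose_prompt(prompt: str) -> PosePrompt:
    """Split 'poses <tag>: <emotions> ; <subject>; <pose>; <theme>; <setting>; <style>'.

    Assumes exactly six segments and no ';' inside any segment."""
    segments = [s.strip() for s in prompt.split(";")]
    if len(segments) != 6:
        raise ValueError(f"expected 6 segments, got {len(segments)}")
    head, subject, pose, theme, setting, style = segments
    # The head looks like "poses three-quarter-pose: determined, resolute, heroic".
    tag_part, _, emotion_part = head.partition(":")
    pose_tag = tag_part.removeprefix("poses").strip()
    emotions = [e.strip() for e in emotion_part.split(",") if e.strip()]
    return PosePrompt(pose_tag, emotions, subject, pose, theme, setting, style)


# Usage with the knight prompt above.
knight = parse_pose_prompt(
    "poses three-quarter-pose: determined, resolute, heroic ; "
    "A lone knight, standing tall on a windswept hilltop; three-quarter pose; "
    "Heroism; a vast, stormy landscape with a distant castle in the background; cinematic"
)
print(knight.emotions)  # ['determined', 'resolute', 'heroic']
print(knight.theme)     # Heroism
```

Splitting on ";" before splitting on "," matters because the subject and setting segments themselves contain commas.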
Characteristic
Shot : A knight in full armor stands on a grassy hill, gazing towards a castle in the distance. The sky is overcast, moody, and gray.
Aesthetic Score : 0.7
Mood : epic, dramatic, medieval
Quality
Entropy : 6.90
Noise : 59
Prompt Clip Score : 0.26
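The post does not define its quality metrics. If "Entropy" is the Shannon entropy of an 8-bit grayscale histogram (which would be consistent with values a little under the 8-bit maximum) and "Prompt Clip Score" is the cosine similarity between CLIP embeddings of the prompt and the image, they could be computed roughly as below. Both definitions are our assumptions, the function names are illustrative, and a real pipeline would take pixel values from an image library and embeddings from a CLIP model. The post does not say how its "Noise" figure is derived, so it is left out.

```python
import math
from collections import Counter
from typing import Sequence


def shannon_entropy(pixels: Sequence[int]) -> float:
    """Shannon entropy, in bits, of 8-bit grayscale pixel values.

    Assumption: the post's "Entropy" is this histogram entropy, whose
    maximum for 256 equally frequent gray levels is 8 bits."""
    if not pixels:
        raise ValueError("empty image")
    n = len(pixels)
    counts = Counter(pixels)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two embedding vectors.

    Assumption: "Prompt Clip Score" is this value for a CLIP text embedding
    of the prompt and a CLIP image embedding of the generated image."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("zero-length vector")
    return dot / (norm_a * norm_b)


# Toy inputs, not real image data or real CLIP embeddings.
print(shannon_entropy(list(range(256))))  # 8.0: every gray level appears once
print(shannon_entropy([0, 0, 255, 255]))  # 1.0: two equally likely levels
print(cosine_similarity([1.0, 0.0], [1.0, 1.0]))  # about 0.7071
```

Under these assumptions, an entropy near 6.9 would mean the image uses a broad but uneven spread of tones, and a similarity around 0.26 is a modest text-image match rather than a percentage.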
AI Evaluation
Likelihood of AI : 0.70
Image errors : The armor on the knight has a slight plastic look. There are some minor blurring artifacts in the background.
Lost in the Jungle: A Treasure Hunter’s Quest
A lone explorer navigates the dense jungle, map in hand, seeking the secrets of an ancient temple. The sun’s glare and misty background create an atmosphere of mystery and adventure, hinting at the treasures that may lie hidden within.
Prompt
poses three-quarter-pose: adventurous, curious, hopeful ; An intrepid explorer, silhouetted against the setting sun, holding a map; three-quarter pose; Adventure; a dense jungle with ancient ruins in the distance; cinematic
Characteristic
Shot : A man in a safari outfit stands in a jungle environment, holding a map and looking towards the right side of the frame.
Aesthetic Score : 0.7
Mood : adventurous, mysterious, determined
Quality
Entropy : 6.70
Noise : 54
Prompt Clip Score : 0.27
AI Evaluation
Likelihood of AI : 0.80
Image errors : The image has some minor artifacts, particularly in the background, and the lighting feels a bit artificial. The details on the man’s clothes could be more defined.
Lost in the Game: A Moment of Intense Focus
A young man is completely absorbed in his video game, the dim lighting highlighting his focused expression and the intensity of his gameplay. The scene captures the thrill and immersion of gaming, showcasing a moment of pure concentration and dedication.
Prompt
poses three-quarter-pose: focused, intense, exhilarated ; A gamer, eyes glued to the screen, fingers flying across the keyboard; three-quarter pose; Gaming; a brightly lit gaming room with neon lights and a futuristic cityscape projected on the wall; cinematic
Characteristic
Shot : A young man wearing headphones is sitting at a desk, using a computer. The background is a blurry cityscape, lit by neon lights.
Aesthetic Score : 0.7
Mood : focused, intense, futuristic
Quality
Entropy : 6.49
Noise : 68
Prompt Clip Score : 0.28
AI Evaluation
Likelihood of AI : 0.20
Image errors : The background appears slightly blurry and out of focus, and the image also looks slightly overexposed.
Parisian Dreams: Capturing the Magic of the Eiffel Tower
A woman stands in awe before the iconic Eiffel Tower, her wide eyes and open mouth reflecting the joy and excitement of the Parisian experience. The bustling street and red awning behind her add to the vibrant atmosphere, creating a captivating scene that evokes a sense of wonder and adventure.
Prompt
poses three-quarter-pose: amazed, joyful, curious ; A tourist, gazing in awe at the Eiffel Tower, camera in hand; three-quarter pose; Tourism; a bustling Parisian street with cafes and shops lining the sidewalk; cinematic
Characteristic
Shot : A young woman is standing in front of the Eiffel Tower in Paris, France. She is looking up at the tower with a surprised expression on her face. The photo is taken from a low angle, and the tower is framed in the background.
Aesthetic Score : 0.6
Mood : joyful, surprised, romantic
Quality
Entropy : 6.77
Noise : 86
Prompt Clip Score : 0.23
AI Evaluation
Likelihood of AI : 0.10
Image errors : The image has some blur and noise, particularly in the background. There are also some artifacts around the edges of the image.
A Hiker’s Perspective: Finding Serenity Amidst Majestic Peaks
A lone hiker stands on a mountaintop, dwarfed by the vast, snow-capped peaks and a winding river below. The scene evokes a sense of awe and wonder, capturing the serenity and adventure of exploring nature’s grandeur.
Prompt
poses three-quarter-pose: free, exhilarated, adventurous ; A backpacker, standing on a mountain peak, arms outstretched, enjoying the view; three-quarter pose; Travel; a breathtaking panorama of snow-capped mountains and valleys; cinematic
Characteristic
Shot : A man in a hiking outfit is standing on a mountain peak and gazing at the valley with a river flowing through it. There are snow-capped mountains in the background.
Aesthetic Score : 0.8
Mood : serene, adventurous, contemplative
Quality
Entropy : 6.71
Noise : 83
Prompt Clip Score : 0.24
AI Evaluation
Likelihood of AI : 0.10
Image errors : No visible artifacts or errors.
Starry Night Campfire: A Moment of Tranquility
A cozy scene of four friends gathered around a campfire under a breathtaking starry sky. The warmth of the fire and the serenity of the night create a sense of peace and wonder. This image captures the essence of a perfect evening in nature.
Prompt
poses three-quarter-pose: happy, relaxed, connected ; A group of friends, laughing and sharing stories around a campfire; three-quarter pose; Groups; a serene forest clearing with stars twinkling in the night sky; cinematic
Characteristic
Shot : A group of four people are sitting around a campfire in a forest at night, with the Milky Way visible in the sky.
Aesthetic Score : 0.7
Mood : cozy, intimate, adventurous
Quality
Entropy : 6.48
Noise : 116
Prompt Clip Score : 0.28
AI Evaluation
Likelihood of AI : 0.30
Image errors : The Milky Way appears somewhat grainy and unrealistic.
Superman’s Triumph in a World of Ruin
A dramatic image captures Superman standing over a defeated villain in a post-apocalyptic cityscape. The composition emphasizes Superman’s dominance, highlighting his heroic power and the bleakness of the world he protects.
Prompt
poses three-quarter-pose: powerful, victorious, confident ; A superhero, standing triumphantly over a defeated villain; three-quarter pose; Heroism; a cityscape with smoke and debris in the background; cinematic
Characteristic
Shot : A superhero, likely Superman, stands over a fallen villain in a post-apocalyptic cityscape. The sky is filled with smoke and debris, suggesting a recent battle.
Aesthetic Score : 0.7
Mood : dramatic, powerful, victorious
Quality
Entropy : 6.68
Noise : 68
Prompt Clip Score : 0.21
AI Evaluation
Likelihood of AI : 0.80
Image errors : The background is slightly blurry and lacks detail, and the cityscape appears somewhat generic. The villain’s face is also somewhat obscured.
Lost in the Majesty: Hiking Through a Dramatic Mountain Landscape
Two adventurers navigate a breathtaking mountain range, the wide-angle lens capturing the vastness of the snow-capped peaks and the valley below. The cloudy, gray sky adds a sense of drama and serenity to this adventurous scene.
Prompt
poses three-quarter-pose: determined, focused, adventurous ; A group of adventurers, navigating a treacherous mountain path; three-quarter pose; Adventure; a rugged mountain range with snow-covered peaks and a deep valley below; cinematic
Characteristic
Shot : Two figures are walking on a mountain ridge, with snow-capped peaks in the background and a cloudy sky. The scene is reminiscent of a dramatic adventure film or fantasy novel.
Aesthetic Score : 0.7
Mood : mysterious, adventurous, dramatic
Quality
Entropy : 6.75
Noise : 82
Prompt Clip Score : 0.24
AI Evaluation
Likelihood of AI : 0.30
Image errors : There are slight signs of compression artifacts and a bit of noise in the shadows.
Focused and Competitive: Three Young Men Locked in a Digital Battle
A trio of young men huddle around a computer screen, their faces illuminated by warm artificial light. The intensity of their gaze suggests a fierce competition, with anticipation hanging heavy in the air. Are they strategizing, collaborating, or battling it out in a virtual arena? The scene is charged with a palpable sense of focus and determination.
Prompt
poses three-quarter-pose: focused, competitive, excited ; A group of gamers, huddled around a table, strategizing their next move; three-quarter pose; Gaming; a dimly lit room with flickering computer screens and a stack of pizza boxes; cinematic
Characteristic
Shot : Three young men are sitting at a table, looking at a computer screen. They are in a dark room with some light coming from behind them. The room is decorated in a gaming style, with a gaming chair and a black computer desk.
Aesthetic Score : 0.6
Mood : intense, focused, competitive
Quality
Entropy : 6.48
Noise : 97
Prompt Clip Score : 0.28
AI Evaluation
Likelihood of AI : 0.40
Image errors : The image has some minor artifacts, particularly in the hair and clothing of the subjects, which might indicate some degree of image processing.
Charming City Stroll: A Moment of Joy in a European City
A young woman radiates happiness as she stands amidst the vibrant colors and charming architecture of a European city street. The bustling market in the background adds to the lively atmosphere, creating a scene that is both relaxed and captivating.
Prompt
poses three-quarter-pose: happy, joyful, memorable ; standing in front of a famous landmark, smiling for a photo; three-quarter pose; Tourism; a vibrant city square with colorful buildings and street performers; cinematic
Characteristic
Shot : A young woman with long brown hair is standing in a European city square in front of colorful old buildings.
Aesthetic Score : 0.6
Mood : happy, casual, urban
Quality
Entropy : 6.59
Noise : 98
Prompt Clip Score : 0.20
AI Evaluation
Likelihood of AI : 0.20
Image errors : The image is somewhat blurry and grainy, with visible noise.
Conclusion
The results show that the generative AI model performed reasonably well at understanding the scene, but struggled with camera position and aesthetics. Here’s a breakdown:
- Camera Position: The model scored 0.3, which is considered below average. This suggests that the model didn’t accurately capture the intended camera position described in the prompt.
- Shot Analysis: The model scored 0.58, which is considered good. This indicates that the model was able to understand the scene and create a shot that was relatively close to what was described in the prompt.
- Aesthetic Analysis: The model scored 0.33, which is considered below average. This means that the generated images’ aesthetics deviated significantly from the aesthetic described in the prompts.
Overall, the model seems to be better at understanding the scene and shot composition than it is at capturing the desired aesthetic.
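The three aggregate scores above come from an evaluation the post does not describe, so they cannot be recomputed here. What can be done is to summarize the per-image numbers reported for the ten examples; the snippet below transcribes them from this post and prints the mean, minimum, and maximum of each. These per-image measures are not the same as the aggregate Camera Position, Shot Analysis, and Aesthetic Analysis scores, so the averages should not be read as a check on them.

```python
from statistics import mean

# Per-image metrics transcribed from the ten examples in this post, in order.
metrics = {
    "aesthetic_score": [0.7, 0.7, 0.7, 0.6, 0.8, 0.7, 0.7, 0.7, 0.6, 0.6],
    "prompt_clip_score": [0.26, 0.27, 0.28, 0.23, 0.24, 0.28, 0.21, 0.24, 0.28, 0.20],
    "likelihood_of_ai": [0.70, 0.80, 0.20, 0.10, 0.10, 0.30, 0.80, 0.30, 0.40, 0.20],
}

for name, values in metrics.items():
    print(f"{name}: mean={mean(values):.3f} min={min(values):.2f} max={max(values):.2f}")

# aesthetic_score: mean=0.680 min=0.60 max=0.80
# prompt_clip_score: mean=0.249 min=0.20 max=0.28
# likelihood_of_ai: mean=0.390 min=0.10 max=0.80
```

The spread in "Likelihood of AI" (0.10 to 0.80) is much wider than the spread in aesthetic score (0.60 to 0.80), which is one way to see that a detector's verdict and a pleasing image are separate questions.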