AI's Artistic Struggle: Capturing the Essence of Poses with Imagen-v3-fast
Dramatic poses are a powerful tool in visual storytelling, conveying emotions, actions, and character traits. From the heroic stance of a knight to the contemplative gaze of a traveler, poses can add depth and meaning to any image. In this blog post, we explore the challenges of using AI to generate images with specific poses, analyzing the results of an experiment where an AI model was tasked with creating images based on detailed scene descriptions. We’ll delve into the model’s strengths and weaknesses in capturing camera position, shot analysis, and aesthetic appeal, shedding light on the evolving capabilities of AI in the realm of artistic expression.
Created with: imagen-v3-fast
Knights of the Storm
Three knights in black armor stand defiant against a stormy sky, their presence both powerful and mysterious. The light breaking through the clouds adds a dramatic touch to this scene of impending conflict.
Prompt
poses three-quarter-pose: determined, resolute, heroic ; A lone knight, standing tall on a windswept hilltop; three-quarter pose; Heroism; a vast, stormy landscape with a distant castle in the background; cinematic
Characteristic
Shot : Three knights in full armor stand on a rocky outcrop, facing the viewer. The sky is dark and stormy, with a hint of light breaking through the clouds. The landscape consists of rolling, misty hills in green and brown. The knights are all wearing black armor and cloaks.
Aesthetic Score : 0.7
Mood : mysterious, dramatic, powerful
Quality
Entropy : 6.87
Noise : 73
Prompt Clip Score : 0.31
AI Evaluation
Likelihood of AI : 0.90
Image errors : No obvious errors.
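A note on the Entropy figure reported for each image: the post does not say how it is computed. Values between 6 and 7 are consistent with the Shannon entropy of an 8-bit grayscale histogram, which tops out at 8 bits, so the sketch below assumes that definition. The function name `shannon_entropy` and the toy pixel lists are illustrative only and are not taken from the evaluation pipeline.

```python
import math
from collections import Counter


def shannon_entropy(pixels):
    """Shannon entropy, in bits, of a sequence of 8-bit grayscale values.

    Assumption: this is one common definition of image entropy; the
    metric used in this post may be computed differently.
    """
    total = len(pixels)
    counts = Counter(pixels)
    # Sum p * log2(1/p) over every gray level that actually occurs.
    return sum((n / total) * math.log2(total / n) for n in counts.values())


# Toy inputs: a flat image carries no information, while an image that
# uses all 256 gray levels equally often reaches the 8-bit maximum.
flat_image = [128] * 64
uniform_image = list(range(256))

print(shannon_entropy(flat_image))
print(shannon_entropy(uniform_image))
```

Under this definition, a higher value means the gray levels are spread more evenly, which usually goes along with more texture and detail in the frame.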
Lost in the Jungle’s Embrace: A Silhouette of Adventure
A lone explorer stands on a rock, silhouetted against the setting sun, amidst a lush jungle. Ancient ruins peek through the mist, hinting at a forgotten past. This evocative scene captures the spirit of adventure, mystery, and nostalgia.
Prompt
poses three-quarter-pose: adventurous, curious, hopeful ; An intrepid explorer, silhouetted against the setting sun, holding a map; three-quarter pose; Adventure; a dense jungle with ancient ruins in the distance; cinematic
Characteristic
Shot : A lone explorer stands on a rock, facing towards the setting sun, in the middle of a jungle. Ruins of ancient temples are visible in the background, shrouded in mist. The scene evokes a sense of mystery, adventure, and exploration.
Aesthetic Score : 0.7
Mood : mysterious, adventurous, nostalgic
Quality
Entropy : 6.46
Noise : 79
Prompt Clip Score : 0.32
AI Evaluation
Likelihood of AI : 0.90
Image errors : The image seems to have some slight aliasing artifacts in the background vegetation, but overall the image is of good quality.
The Focus Is On
A young man, lost in the digital world, sits in a gaming chair, headphones on, eyes fixed on the screen. The dramatic lighting and his intense expression capture the focused energy of a gamer in the zone.
Prompt
poses three-quarter-pose: focused, intense, exhilarated ; A gamer, eyes glued to the screen, fingers flying across the keyboard; three-quarter pose; Gaming; a brightly lit gaming room with neon lights and a futuristic cityscape projected on the wall; cinematic
Characteristic
Shot : A young man is sitting in a gaming chair, wearing headphones and typing on a keyboard. There is a large monitor behind him.
Aesthetic Score : 0.6
Mood : focused, intense, serious
Quality
Entropy : 6.55
Noise : 53
Prompt Clip Score : 0.29
AI Evaluation
Likelihood of AI : 0.10
Image errors : The image has a slight amount of noise, particularly in the background.
Capturing Parisian Magic: A Moment of Surprise
A woman in a brown coat stands on a Parisian street, the Eiffel Tower a majestic backdrop. Her surprised expression, captured by her own camera, hints at a moment of unexpected joy and wonder. The scene evokes a sense of romance and Parisian charm.
Prompt
poses three-quarter-pose: amazed, joyful, curious ; A tourist, gazing in awe at the Eiffel Tower, camera in hand; three-quarter pose; Tourism; a bustling Parisian street with cafes and shops lining the sidewalk; cinematic
Characteristic
Shot : A woman in a brown coat is standing in a Parisian street with the Eiffel Tower in the background. She is holding a camera up to her face, and looks surprised.
Aesthetic Score : 0.7
Mood : surprised, romantic, Parisian
Quality
Entropy : 6.64
Noise : 56
Prompt Clip Score : 0.33
AI Evaluation
Likelihood of AI : 0.30
Image errors : No major errors
Conquering the Peak: A Moment of Triumph and Serenity
A lone hiker stands triumphantly on a mountain summit, arms outstretched, embracing the breathtaking panorama of snow-capped peaks and a sprawling valley below. The scene evokes a sense of awe, adventure, and inspiration, capturing the essence of human achievement against the backdrop of nature’s grandeur.
Prompt
poses three-quarter-pose: free, exhilarated, adventurous ; A backpacker, standing on a mountain peak, arms outstretched, enjoying the view; three-quarter pose; Travel; a breathtaking panorama of snow-capped mountains and valleys; cinematic
Characteristic
Shot : A lone hiker stands on a mountain peak, arms outstretched, overlooking a valley with snow-capped mountains in the distance.
Aesthetic Score : 0.8
Mood : serene, adventurous, inspirational
Quality
Entropy : 6.77
Noise : 66
Prompt Clip Score : 0.30
AI Evaluation
Likelihood of AI : 0.20
Image errors : No noticeable errors in the image.
Campfire Companionship: A Night Under the Stars
Four friends gather around a crackling campfire, sharing stories and laughter under the starry sky. The warm glow of the fire creates a cozy and inviting atmosphere, making this a perfect scene for a night of friendship and adventure.
Prompt
poses three-quarter-pose: happy, relaxed, connected ; A group of friends, laughing and sharing stories around a campfire; three-quarter pose; Groups; a serene forest clearing with stars twinkling in the night sky; cinematic
Characteristic
Shot : A group of four friends are sitting around a campfire in a forest at night. There are two tents set up in the background.
Aesthetic Score : 0.7
Mood : cozy, warm, friendly
Quality
Entropy : 6.29
Noise : 75
Prompt Clip Score : 0.35
AI Evaluation
Likelihood of AI : 0.20
Image errors : The image has a slight blur around the edges and some noise.
Superman Triumphant: Justice Prevails in the City’s Shadows
A dramatic scene unfolds in a smoke-filled alleyway as Superman stands victorious over a defeated villain. The hero’s imposing figure and the city backdrop create a powerful image of justice and strength.
Prompt
poses three-quarter-pose: powerful, victorious, confident ; A superhero, standing triumphantly over a defeated villain; three-quarter pose; Heroism; a cityscape with smoke and debris in the background; cinematic
Characteristic
Shot : Superman stands in a city alleyway with a defeated villain lying at his feet. Smoke and debris are scattered on the ground, and city buildings line both sides of the alley.
Aesthetic Score : 0.7
Mood : heroic, dramatic, powerful
Quality
Entropy : 6.59
Noise : 66
Prompt Clip Score : 0.29
AI Evaluation
Likelihood of AI : 0.80
Image errors : Some artifacts and blurring are visible, especially around the edges of the image. The shadows and highlights seem a bit artificial, particularly on Superman’s cape.
Awe-Inspiring Mountain Trek: A Journey of Serenity and Adventure
Two figures traverse a winding mountain path, their journey framed by a breathtaking panorama. The valley below unfolds, a tapestry of emerald green and silver river, cradled by snow-capped peaks. The sun bathes the scene in golden light, evoking a sense of serenity and inspiring adventure.
Prompt
poses three-quarter-pose: determined, focused, adventurous ; A group of adventurers, navigating a treacherous mountain path; three-quarter pose; Adventure; a rugged mountain range with snow-covered peaks and a deep valley below; cinematic
Characteristic
Shot : Two figures walk on a mountain path, overlooking a valley with a river flowing down the center. The valley is surrounded by snow-capped mountains. The sun is shining and the sky is clear.
Aesthetic Score : 0.8
Mood : serene, inspiring, adventurous
Quality
Entropy : 6.87
Noise : 90
Prompt Clip Score : 0.29
AI Evaluation
Likelihood of AI : 1.00
Image errors : None, the image is well-rendered.
The Blue Circle Beckons: Four Gamers Locked in Intense Competition
The glow of the blue circle illuminates the faces of four young men, their hands moving with lightning speed as they battle in a virtual world. The intensity of their focus and the competitive spirit in the air are palpable, creating a scene of pure gaming excitement.
Prompt
poses three-quarter-pose: focused, competitive, excited ; A group of gamers, huddled around a table, strategizing their next move; three-quarter pose; Gaming; a dimly lit room with flickering computer screens and a stack of pizza boxes; cinematic
Characteristic
Shot : Four young men wearing headsets are sitting at a desk, playing a video game. There is a blue glowing circle in the background with a symbol inside.
Aesthetic Score : 0.6
Mood : intense, focused, competitive
Quality
Entropy : 6.57
Noise : 61
Prompt Clip Score : 0.32
AI Evaluation
Likelihood of AI : 0.80
Image errors : The image is slightly blurry in some areas. Some of the shadows are not natural.
Golden Hour Joy: Two Friends Leap for Laughter at Sunset
Capture the spirit of carefree joy as two men leap in front of a majestic cathedral bathed in golden sunset light. The dramatic contrast of light and shadow, coupled with their playful poses, creates a vibrant and heartwarming image.
Prompt
poses three-quarter-pose: Exuberant, carefree, adventurous ; A lone figure stands before a grand cathedral, bathed in golden sunlight, a wide grin illuminating their face as they pose for a photo.; cinematic
Characteristic
Shot : Two men are jumping in front of a large cathedral with a golden hue. The image is taken at sunset, with the light reflecting off the building.
Aesthetic Score : 0.6
Mood : happy, playful, joyful
Quality
Entropy : 6.75
Noise : 100
Prompt Clip Score : 0.33
AI Evaluation
Likelihood of AI : 0.10
Image errors : There are some minor artifacts in the image, such as the slight blurring of the edges of the subjects.
Conclusion
The results show that the generative AI model understood the scenes reasonably well but struggled with camera position and aesthetics. Here’s a breakdown:
- Camera Position: The model scored 0.3, which is considered below average. This suggests that the model didn’t accurately capture the intended camera position described in the prompt.
- Shot Analysis: The model scored 0.545, which is considered good. This indicates that the model was able to understand the scene and create a shot that was relatively close to what was described in the prompt.
- Aesthetic Analysis: The model scored 0.31, which is considered below average. This means that the generated image’s aesthetic deviated significantly from the expected aesthetic described in the prompt.
Overall, the model demonstrated a good understanding of the scene and shot composition, but struggled to accurately capture the intended camera position and aesthetic.
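To put the per-image numbers side by side, here is a minimal sketch that averages the Aesthetic Score, Prompt Clip Score, and Likelihood of AI values listed for the ten images above. The values are copied from the post; the variable names and the 0.5 cut-off for "likely AI" are assumptions made for illustration. These averages are not the Camera Position, Shot Analysis, or Aesthetic Analysis scores in the breakdown, whose derivation the post does not describe.

```python
from statistics import mean

# (short title, aesthetic score, prompt CLIP score, likelihood of AI),
# copied from the per-image listings in this post.
results = [
    ("Knights of the Storm", 0.7, 0.31, 0.90),
    ("Lost in the Jungle's Embrace", 0.7, 0.32, 0.90),
    ("The Focus Is On", 0.6, 0.29, 0.10),
    ("Capturing Parisian Magic", 0.7, 0.33, 0.30),
    ("Conquering the Peak", 0.8, 0.30, 0.20),
    ("Campfire Companionship", 0.7, 0.35, 0.20),
    ("Superman Triumphant", 0.7, 0.29, 0.80),
    ("Awe-Inspiring Mountain Trek", 0.8, 0.29, 1.00),
    ("The Blue Circle Beckons", 0.6, 0.32, 0.80),
    ("Golden Hour Joy", 0.6, 0.33, 0.10),
]

mean_aesthetic = mean(r[1] for r in results)
mean_clip = mean(r[2] for r in results)
mean_ai_likelihood = mean(r[3] for r in results)

# Hypothetical threshold: treat 0.5 or higher as "judged likely AI-generated".
flagged_as_ai = [r[0] for r in results if r[3] >= 0.5]

print(f"Mean aesthetic score:   {mean_aesthetic:.2f}")
print(f"Mean prompt CLIP score: {mean_clip:.3f}")
print(f"Mean likelihood of AI:  {mean_ai_likelihood:.2f}")
print(f"Flagged as likely AI:   {len(flagged_as_ai)} of {len(results)}")
```

A summary like this makes it easier to see that the aesthetic scores cluster tightly while the likelihood-of-AI estimates vary widely from image to image.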
Sources:
- https://www.writerswrite.co.za/cheat-sheets-for-writing-body-language/
- https://mads3df.wordpress.com/2013/09/04/storytelling-poses/
- https://www.pinterest.com/pegasister890/character-poses/
- https://www.youtube.com/watch?v=udky6ANxWws
- https://maven.com/articles/storytelling-techniques
- https://deepmind.google/technologies/imagen-3/