AI Captures Poses, But Misses the Mood with Imagen-v3
AI image generation is evolving rapidly, and current models can produce strikingly realistic and imaginative visuals. They still have limitations, however. One area where they often struggle is capturing the intended aesthetic of an image, particularly when the prompt specifies a pose and a scene. For example, a prompt describing a lone adventurer standing atop a windswept mountain peak might yield an image with the correct pose and camera angle that nonetheless fails to convey the drama and solitude the prompt intended. This discrepancy highlights how hard it is to teach AI to understand and replicate the nuances of human artistic expression.
Created with: imagen-v3
A Moment of Solitude on the Mountaintop
A lone hiker stands on a mountain peak, bathed in dramatic light, gazing out over a breathtaking panorama of mountains and clouds. The scene evokes a sense of serenity, contemplation, and adventure, capturing the awe-inspiring beauty of nature.
Prompt
poses ankle-cross: Determined, confident, facing the unknown ; A lone adventurer, standing atop a windswept mountain peak; wide shot; Adventure; Dramatic sky with swirling clouds; cinematic
Characteristic
Shot : A lone hiker stands on a mountain peak, looking out over a vast expanse of mountains and clouds.
Aesthetic Score : 0.7
Mood : serene, contemplative, adventurous
Quality
Entropy : 6.86
Noise : 97
Prompt Clip Score : 0.33
AI Evaluation
Likelihood of AI : 0.10
Image errors : No noticeable artifacts or errors.
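Every prompt in this post follows the same layout: a pose label after "poses", then six fields separated by semicolons. The sketch below splits a prompt into those parts so they can be compared with the generated image. The field names (emotion, subject, shot, category, background, style) are my own labels, assumed from the content of the prompts; they are not Imagen parameters.

```python
from dataclasses import dataclass


@dataclass
class PosePrompt:
    # Field names are hypothetical labels inferred from the prompts in this post.
    pose: str
    emotion: str
    subject: str
    shot: str
    category: str
    background: str
    style: str


def parse_prompt(prompt: str) -> PosePrompt:
    """Split 'poses <pose>: <emotion> ; <subject>; <shot>; ...' into fields."""
    head, sep, rest = prompt.partition(":")
    if not sep or not head.startswith("poses "):
        raise ValueError("prompt does not start with 'poses <pose>:'")
    pose = head[len("poses "):].strip()
    fields = [part.strip() for part in rest.split(";")]
    if len(fields) != 6:
        raise ValueError(f"expected 6 ';'-separated fields, got {len(fields)}")
    return PosePrompt(pose, *fields)


example = (
    "poses ankle-cross: Determined, confident, facing the unknown ; "
    "A lone adventurer, standing atop a windswept mountain peak; wide shot; "
    "Adventure; Dramatic sky with swirling clouds; cinematic"
)
parsed = parse_prompt(example)
print(parsed.pose, "|", parsed.shot, "|", parsed.style)
```

Separating the shot field ("wide shot") from the emotion field makes it easier to see which part of a prompt an image followed and which it ignored.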
Superman: A Silhouette of Hope Against the Setting Sun
A powerful image captures Superman standing tall on a rooftop, his silhouette outlined against the fiery hues of a setting sun. The cityscape stretches out below, creating a dramatic backdrop that evokes feelings of hope and heroism.
Prompt
poses ankle-cross: Powerful, heroic, standing tall ; A superhero, silhouetted against a blazing sunset; medium shot; Heroism; City skyline with towering buildings; cinematic
Characteristic
Shot : Superman standing on a rooftop, overlooking a city at sunset.
Aesthetic Score : 0.7
Mood : epic, heroic, hopeful
Quality
Entropy : 6.11
Noise : 61
Prompt Clip Score : 0.33
AI Evaluation
Likelihood of AI : 0.80
Image errors : The image has some minor artifacts in the sky and the city skyline, which may be due to over-processing.
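The post does not say how the Entropy value is computed. The reported values (5.50 to 6.86) would fit the Shannon entropy of an 8-bit grayscale histogram, which ranges from 0 to 8 bits, but that is an assumption on my part. A minimal sketch of that calculation, using a plain list of pixel intensities instead of a real image:

```python
import math
from collections import Counter


def shannon_entropy(pixels):
    """Shannon entropy, in bits, of a sequence of pixel intensity values."""
    total = len(pixels)
    if total == 0:
        raise ValueError("no pixels given")
    counts = Counter(pixels)
    # Sum p * log2(1/p) over every intensity value that occurs.
    return sum((n / total) * math.log2(total / n) for n in counts.values())


flat_gray = [128] * 64          # a single intensity: no variation at all
full_range = list(range(256))   # every 8-bit intensity exactly once

print(shannon_entropy(flat_gray))
print(shannon_entropy(full_range))
```

Under this reading, a higher value means the image uses a wider and more even spread of tones, which says nothing by itself about whether the mood matches the prompt.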
Immersed in the Future: A Gamer’s Neon Oasis
A young man, lost in the digital realm, sits back in his gaming chair, bathed in the cool glow of neon lights. The VR headset, a portal to another world, adds a futuristic edge to this dynamic scene.
Prompt
poses ankle-cross: Immersed, concentrated, in the zone ; A gamer, intensely focused on a virtual reality headset; close-up; Gaming; Futuristic, neon-lit gaming room; cinematic
Characteristic
Shot : A young man wearing a VR headset is sitting in a gaming chair in a dimly lit room with neon lights. The man is wearing black pants and a dark t-shirt.
Aesthetic Score : 0.6
Mood : futuristic, cool, dynamic
Quality
Entropy : 6.28
Noise : 68
Prompt Clip Score : 0.33
AI Evaluation
Likelihood of AI : 0.80
Image errors : The image has some minor artifacts, including some blurriness around the edges of the man and the VR headset. The background is a bit too plain.
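The Prompt Clip Score presumably measures how closely the image matches the prompt text. CLIP-based scores are commonly computed as the cosine similarity between a text embedding and an image embedding, but the post does not confirm that this is what was used here. The sketch below shows only the similarity step; the two vectors are made-up stand-ins, not real CLIP embeddings.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("zero-length vector has no direction")
    return dot / (norm_a * norm_b)


# Hypothetical 4-dimensional embeddings, for illustration only.
text_embedding = [0.2, 0.1, 0.7, 0.0]
image_embedding = [0.1, 0.6, 0.3, 0.4]

print(round(cosine_similarity(text_embedding, image_embedding), 2))
```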
Tranquility Amidst Ancient Wonders
A woman finds peace and contemplation as she gazes upon three ancient stone stupas bathed in the soft light of the setting sun. The scene evokes a sense of serenity and wonder, inviting viewers to share in the moment of quiet reflection.
Prompt
poses ankle-cross: Awe-struck, contemplative, taking in the beauty ; A tourist, gazing out at a breathtaking vista; medium shot; Tourism; Ancient ruins with a panoramic view; cinematic
Characteristic
Shot : A woman is sitting on a stone ledge with her back to the camera, gazing out at a panoramic view of three ancient stone stupas. The sky is a soft, hazy blue with hints of pink and orange from the setting sun.
Aesthetic Score : 0.7
Mood : tranquil, serene, contemplative
Quality
Entropy : 6.83
Noise : 88
Prompt Clip Score : 0.28
AI Evaluation
Likelihood of AI : 0.10
Image errors : No significant errors in the image.
Lost in the Red Sands: A Hiker’s Journey Through Solitude
A lone figure traverses a breathtaking desert landscape, the red dunes stretching endlessly under a clear blue sky. This serene scene evokes a sense of adventure and contemplation, highlighting the vastness and isolation of the desert.
Prompt
poses ankle-cross: Free-spirited, adventurous, embracing the unknown ; A backpacker, standing at the edge of a vast desert; wide shot; Travel; Endless sand dunes stretching into the horizon; cinematic
Characteristic
Shot : A lone hiker walks across a vast desert landscape with red sand dunes. The sky is blue with white clouds.
Aesthetic Score : 0.8
Mood : serene, adventurous, contemplative
Quality
Entropy : 6.58
Noise : 90
Prompt Clip Score : 0.32
AI Evaluation
Likelihood of AI : 0.10
Image errors : No visible artifacts or errors.
Nighttime Fun with Friends: Capturing Joy in Every Smile
This image radiates pure joy! Four friends, lit by festive lights, walk down a bustling street, their smiles and playful energy captured in a moment of pure happiness. The blurred background adds a sense of movement and depth, making the scene feel alive and vibrant.
Prompt
poses ankle-cross: Joyful, carefree, enjoying each other’s company ; A group of friends, laughing and celebrating; medium shot; Groups; Vibrant, bustling street scene with colorful lights; cinematic
Characteristic
Shot : A group of four friends is walking down a street at night, all looking at the camera and smiling. There are festive lights and a blur of people in the background.
Aesthetic Score : 0.7
Mood : fun, joyful, playful
Quality
Entropy : 6.59
Noise : 105
Prompt Clip Score : 0.30
AI Evaluation
Likelihood of AI : 0.10
Image errors : Minor noise is visible in some areas, especially in the background.
A Knight’s Stand: Epic Silhouette in a Mysterious Setting
A lone knight, silhouetted against a distant castle, stands in a doorway, sword and shield at the ready. The dramatic lighting and balanced composition create a sense of epic mystery and anticipation.
Prompt
poses ankle-cross: Stoic, vigilant, protecting the realm ; A lone warrior, standing guard at a castle gate; medium shot; Heroism; Majestic castle with a moat and drawbridge; cinematic
Characteristic
Shot : A lone knight in full armor stands in a doorway, facing the viewer. Behind him is a castle in the distance. The knight is holding a sword and a shield, and he is in a defensive stance.
Aesthetic Score : 0.6
Mood : epic, dramatic, mysterious
Quality
Entropy : 6.26
Noise : 105
Prompt Clip Score : 0.31
AI Evaluation
Likelihood of AI : 0.80
Image errors : The image has some artifacts, particularly around the edges of the knight’s armor and the castle. There is also a slight blurriness to the image, which may be due to post-processing. The image looks overly processed.
Footprints in the Dark: A Cozy Mystery in the Woods
Two pairs of feet, clad in brown boots, cross in the foreground, hinting at a shared moment of warmth and intrigue. The background, blurred and shrouded in mystery, reveals a flickering fire and the silhouette of a figure, adding to the sense of a hidden story unfolding in the heart of the forest.
Prompt
poses ankle-cross: Intrigued, curious, sharing stories ; A group of explorers, huddled around a campfire; close-up; Adventure; Dense forest with flickering flames; cinematic
Characteristic
Shot : Two pairs of feet in brown boots are crossed in the foreground, with a person out of focus in the background. The scene is likely set in a forest with a fire in the background.
Aesthetic Score : 0.4
Mood : dark, cozy, mysterious
Quality
Entropy : 5.50
Noise : 79
Prompt Clip Score : 0.28
AI Evaluation
Likelihood of AI : 0.20
Image errors : The image is slightly underexposed and the focus is a little bit soft.
The Thrill of Victory: Gamer’s Excitement Captured in a Single Shot
This photo embodies the pure joy and energy of gaming. The young man’s raised fist and outstretched leg speak volumes about his excitement, while his focused gaze reveals the intensity of the moment. The playful mood is palpable, making this a perfect snapshot of the passion that drives gamers.
Prompt
poses ankle-cross: Excited, victorious, celebrating success ; A gamer, triumphantly raising their hands after winning a game; close-up; Gaming; Brightly lit gaming console with flashing lights; cinematic
Characteristic
Shot : A young man, dressed in a gaming jersey, sits at a desk in front of a computer, reacting excitedly to something on the screen. His leg is extended, his shoe is pointed towards the camera, and one hand is raised in the air.
Aesthetic Score : 0.5
Mood : excitement, energetic, playful
Quality
Entropy : 6.20
Noise : 74
Prompt Clip Score : 0.33
AI Evaluation
Likelihood of AI : 0.20
Image errors : Some minor artifacts, including compression artifacts, are noticeable, especially on the dark background and the man’s face.
Silhouettes of Love Against the City Lights
A couple stands on a bridge, their silhouettes framed against the twinkling cityscape. The soft focus background adds a touch of intimacy and mystery to this romantic scene.
Prompt
poses ankle-cross: Intimate, romantic, enjoying the view together ; A couple, standing on a balcony overlooking a bustling city; medium shot; Travel; Romantic cityscape with twinkling lights; cinematic
Characteristic
Shot : A couple standing on a bridge or elevated walkway at night, looking out at the city lights. The background is out of focus.
Aesthetic Score : 0.6
Mood : romantic, urban, intimate
Quality
Entropy : 6.32
Noise : 102
Prompt Clip Score : 0.36
AI Evaluation
Likelihood of AI : 0.20
Image errors : There are no obvious errors in the image.
Conclusion
Across the ten images, the generative AI model performed well on camera position and shot analysis but struggled with aesthetic analysis.
Here’s a breakdown:
- Camera Position: The model scored 0.5, which is considered good. This means the generated image’s camera position closely matched the prompt’s instructions.
- Shot Analysis: The model scored 0.55, also considered good. This indicates the generated image’s shot composition was fairly aligned with the prompt’s description.
- Aesthetic Analysis: The model scored 0.17, which is below average. This suggests the generated image’s aesthetic style deviated from the expected aesthetic described in the prompt.
Overall, the model followed the camera and composition instructions in these prompts but did not reliably reproduce the mood they asked for.
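For a quick view of the per-image numbers, the sketch below averages three of the metrics reported for the ten samples above. The values are copied from the post; the short names are my own labels. The per-image Aesthetic Score is a separate number from the Aesthetic Analysis score in the breakdown, and the post does not say how the latter is derived.

```python
from statistics import mean

# (label, aesthetic score, prompt CLIP score, likelihood of AI),
# copied from the ten samples in this post; labels are shorthand.
samples = [
    ("mountaintop", 0.7, 0.33, 0.10),
    ("superman", 0.7, 0.33, 0.80),
    ("vr gamer", 0.6, 0.33, 0.80),
    ("stupas", 0.7, 0.28, 0.10),
    ("desert", 0.8, 0.32, 0.10),
    ("friends", 0.7, 0.30, 0.10),
    ("knight", 0.6, 0.31, 0.80),
    ("campfire", 0.4, 0.28, 0.20),
    ("victory", 0.5, 0.33, 0.20),
    ("couple", 0.6, 0.36, 0.20),
]

mean_aesthetic = mean(s[1] for s in samples)
mean_clip = mean(s[2] for s in samples)
mean_ai_likelihood = mean(s[3] for s in samples)
lowest_aesthetic = min(samples, key=lambda s: s[1])

print(f"mean aesthetic score:  {mean_aesthetic:.2f}")
print(f"mean prompt CLIP score: {mean_clip:.3f}")
print(f"mean likelihood of AI: {mean_ai_likelihood:.2f}")
print(f"lowest aesthetic score: {lowest_aesthetic[0]} ({lowest_aesthetic[1]})")
```

The campfire image, where the model rendered the ankle-cross pose as a close-up of crossed boots, has the lowest per-image aesthetic score of the set.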
Sources:
- https://www.writerswrite.co.za/cheat-sheets-for-writing-body-language/
- https://mads3df.wordpress.com/2013/09/04/storytelling-poses/
- https://www.pinterest.com/pegasister890/character-poses/
- https://www.youtube.com/watch?v=udky6ANxWws
- https://maven.com/articles/storytelling-techniques
- https://deepmind.google/technologies/imagen-3/