AI's Artistic Eye: Capturing Poses, But Missing the Shot with Imagen-v2
Generative models can now produce images, text, and even music from user prompts. One intriguing application is generating images from a specific pose combined with a scene description. This post reviews ten images generated this way, each prompted with a hands-in-pockets pose, a set of emotions, and a scene, and looks at where the model captures the essence of the scene and where it falls short.
Created with: imagen-v2
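Every prompt in this post follows the same semicolon-separated pattern: a pose with its emotions, followed by what appear to be a subject, a shot size, a theme, scene details, and a style. As a rough illustration, the Python sketch below splits such a prompt into its parts. The `PosePrompt` class and `parse_prompt` function are hypothetical names invented for this example and are not part of any tool used to produce the images.

```python
from dataclasses import dataclass


@dataclass
class PosePrompt:
    pose: str            # e.g. "hands-in-pockets"
    emotions: list[str]  # e.g. ["determined", "confident"]
    parts: list[str]     # remaining ";"-separated descriptors, in order


def parse_prompt(prompt: str) -> PosePrompt:
    """Split a 'poses <pose>: <emotions> ; <descriptor>; ...' prompt."""
    head, *rest = [segment.strip() for segment in prompt.split(";")]
    if not head.startswith("poses "):
        raise ValueError("prompt does not start with 'poses '")
    # Everything before the first ":" is the pose, the rest the emotions.
    pose, _, emotions = head[len("poses "):].partition(":")
    return PosePrompt(
        pose=pose.strip(),
        emotions=[e.strip() for e in emotions.split(",") if e.strip()],
        parts=[segment for segment in rest if segment],
    )


# The first prompt from this post, used as sample input.
example = parse_prompt(
    "poses hands-in-pockets: determined, confident ; A lone adventurer, "
    "standing on a mountain peak; wide shot; heroism; "
    "dramatic sky with clouds; cinematic"
)
print(example.pose, example.emotions, example.parts)
```

Keeping the descriptors as an ordered list rather than named fields avoids guessing at the meaning of each slot, since the post does not define them.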
A Solitary Figure Against the Majestic Peaks
A man stands silhouetted against a dramatic mountain range, bathed in the glow of a cloudy sky. The scene evokes a sense of isolation, contemplation, and adventure, capturing the raw beauty of nature and the human spirit’s yearning for exploration.
Prompt
poses hands-in-pockets: determined, confident ; A lone adventurer, standing on a mountain peak; wide shot; heroism; dramatic sky with clouds; cinematic
Characteristic
Shot : A man standing on a mountain peak, looking out over a vast mountain range. The sky is overcast with clouds.
Aesthetic Score : 0.6
Mood : dramatic, contemplative, adventurous
Quality
Entropy : 6.95
Noise : 82
Prompt Clip Score : 0.25
AI Evaluation
Likelihood of AI : 0.10
Image errors : There are some minor artifacts in the image, particularly in the clouds and mountains.
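The post does not say how the Entropy value is computed. One common reading, assumed here, is the Shannon entropy of the image's 8-bit grayscale histogram, which ranges from 0 bits (a single flat tone) to 8 bits (all 256 tones equally likely). The sketch below shows that calculation on synthetic pixel lists; `shannon_entropy` is an illustrative helper, not the evaluation code behind these numbers.

```python
import math
from collections import Counter


def shannon_entropy(pixels):
    """Shannon entropy, in bits, of a sequence of pixel intensities."""
    counts = Counter(pixels)
    total = len(pixels)
    # Sum -p * log2(p) over every intensity that actually occurs.
    return sum(-(n / total) * math.log2(n / total) for n in counts.values())


# Synthetic data, not taken from the images in this post.
flat = [128] * 1000            # one gray level only
spread = list(range(256)) * 4  # every 8-bit level equally often

print(shannon_entropy(flat))
print(shannon_entropy(spread))
```

Under that assumption, the values in this post (roughly 6.1 to 7.0) would indicate images that use most of the tonal range, with the darker, lower-contrast scenes scoring lowest.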
Lost in the Jungle: A Solitary Explorer Faces the Unknown
A lone adventurer stands before a crumbling stone structure, the path ahead shrouded in the dense jungle. The air is thick with mystery, and the explorer’s isolation amplifies the intrigue surrounding the ruins. This captivating scene evokes a sense of adventure and the unknown, leaving you wondering what secrets lie hidden within the jungle’s depths.
Prompt
poses hands-in-pockets: curious, excited ; A young explorer, gazing at a vast jungle; medium shot; adventure; lush green foliage and ancient ruins; cinematic
Characteristic
Shot : A lone explorer stands in the middle of a path that leads to a decaying stone structure, surrounded by dense foliage in a jungle environment.
Aesthetic Score : 0.7
Mood : mysterious, adventurous, foreboding
Quality
Entropy : 6.84
Noise : 114
Prompt Clip Score : 0.30
AI Evaluation
Likelihood of AI : 0.70
Image errors : Some parts of the image exhibit slight artifacts, particularly in the foliage, which may be due to digital manipulation or the inherent limitations of the image source.
Lost in the Game: A Moment of Intense Focus
A young man, immersed in his video game, exudes an aura of cool intensity. The neon-lit background and his focused expression create a palpable sense of suspense, drawing the viewer into his world of virtual reality.
Prompt
poses hands-in-pockets: focused, intense ; A gamer, sitting at a desk with a controller in hand; close-up; gaming; neon lights and computer screens; cinematic
Characteristic
Shot : A young man wearing headphones and a hoodie is sitting in a chair, holding a video game controller. He is looking intensely at something out of frame. The background is blurred and features neon lighting.
Aesthetic Score : 0.7
Mood : intense, focused, gamer
Quality
Entropy : 6.14
Noise : 68
Prompt Clip Score : 0.29
AI Evaluation
Likelihood of AI : 0.30
Image errors : The image has some minor artifacts, particularly in the background. There is some over-sharpening, and the colors are a little bit too saturated.
A Smile in the Shadows: A Moment of Melancholy Beauty
A woman stands before a weathered, ornate building, her smile a touch too bright against the overcast sky. The scene evokes a sense of nostalgia and contemplation, with the play of light and shadow adding an air of mystery. The perfect smile feels staged, leaving a lingering question about the true nature of her joy.
Prompt
poses hands-in-pockets: amazed, happy ; A tourist, admiring a famous landmark; medium shot; tourism; bustling city streets and iconic architecture; cinematic
Characteristic
Shot : A woman standing in front of a large, ornate building with a car parked in the foreground. The building appears to be in a state of disrepair, with some of the details missing or damaged. The woman is looking up at the building, and her expression is one of wonder and awe. The overall mood of the image is one of mystery and intrigue, with a hint of melancholy.
Aesthetic Score : 0.6
Mood : mysterious, melancholy, intriguing
Quality
Entropy : 6.76
Noise : 111
Prompt Clip Score : 0.19
AI Evaluation
Likelihood of AI : 0.30
Image errors : Some minor image errors are present in the image, such as minor pixelation and slight color banding. These are most apparent around the edges of the building and in the shadows. The building also appears to have some minor artifacts, likely due to image processing.
A Solitary Journey Through Tranquil Fields
A man, backpack in tow, walks along a dirt road adorned with vibrant red flowers. His gaze is fixed on the rolling green hills in the distance, evoking a sense of tranquility and contemplation. The scene captures the essence of a solitary journey through nature, inviting viewers to imagine their own explorations.
Prompt
poses hands-in-pockets: free, adventurous ; A backpacker, walking along a scenic road; medium shot; travel; rolling hills and vibrant wildflowers; cinematic
Characteristic
Shot : A man with a backpack walks along a dirt path lined with wildflowers. The background shows rolling green hills under a soft, hazy sky.
Aesthetic Score : 0.7
Mood : tranquil, contemplative, adventurous
Quality
Entropy : 6.73
Noise : 77
Prompt Clip Score : 0.25
AI Evaluation
Likelihood of AI : 0.10
Image errors : No noticeable artifacts or errors.
Golden Hour Camaraderie on the Beach
Five friends bask in the warm glow of sunset on a tranquil beach, their relaxed smiles and the gentle crashing waves creating a sense of peaceful camaraderie.
Prompt
poses hands-in-pockets: relaxed, joyful ; A group, standing on a beach at sunset; wide shot; groups; golden sand and crashing waves; cinematic
Characteristic
Shot : Five young men standing on a beach at sunset.
Aesthetic Score : 0.6
Mood : casual, friendly, nostalgic
Quality
Entropy : 6.83
Noise : 81
Prompt Clip Score : 0.20
AI Evaluation
Likelihood of AI : 0.20
Image errors : Some minor color fringing visible around the edges of the subjects.
Firefighter Stands Tall Against Blazing Inferno
A firefighter, clad in full gear, stands resolutely in front of a raging fire, his dark uniform stark against the flames. The scene evokes a sense of heroism and drama, highlighting the bravery of those who face danger to protect others.
Prompt
poses hands-in-pockets: brave, determined ; A firefighter, standing in front of a building; medium shot; heroism; smoke and flames; cinematic
Characteristic
Shot : A firefighter in full gear stands in front of a blazing fire, with a serious expression on his face. He is standing in a doorway, with a concrete wall to his right and a wall of flames to his left.
Aesthetic Score : 0.7
Mood : serious, determined, heroic
Quality
Entropy : 6.75
Noise : 100
Prompt Clip Score : 0.28
AI Evaluation
Likelihood of AI : 0.20
Image errors : The fire appears to be somewhat artificial, lacking natural movement and detail.
Lost in the Shadows: A Moment of Adventure in the Cave
Two explorers, their faces illuminated by headlamps, stand in the depths of a mysterious cave. The play of light and shadow creates an atmosphere of intrigue and adventure, hinting at the secrets hidden within the darkness.
Prompt
poses hands-in-pockets: cautious, curious ; explorers, navigating a dark cave; medium shot; adventure; stalactites and stalagmites; cinematic
Characteristic
Shot : Two figures wearing helmets and brown jackets are standing in a cave with a lot of stalactites. The lighting is dim and the focus is on the figures.
Aesthetic Score : 0.6
Mood : mysterious, adventurous, dark
Quality
Entropy : 6.28
Noise : 72
Prompt Clip Score : 0.25
AI Evaluation
Likelihood of AI : 0.10
Image errors : There are slight artifacts in the image around the edges, as if there were some light leaking.
Confetti Rain and Smiles: Capturing the Joy of Celebration
A young man basks in the warmth of the moment, confetti raining down as he beams with happiness. The scene exudes joy, celebration, and a sense of warmth, perfectly capturing the spirit of the occasion.
Prompt
poses hands-in-pockets: excited, triumphant ; celebrating a victory with friends; close-up; gaming; celebratory confetti and flashing lights; cinematic
Characteristic
Shot : A young man with a tattoo on his arm is standing in front of a dark background with glowing lights and what looks like confetti or paper falling around him.
Aesthetic Score : 0.7
Mood : mysterious, confident, hopeful
Quality
Entropy : 6.71
Noise : 114
Prompt Clip Score : 0.19
AI Evaluation
Likelihood of AI : 0.20
Image errors : The image is slightly overexposed and the background is a bit blurry.
Love Story at the Brandenburg Gate: A Romantic Escape in Berlin
A young couple embraces the history and beauty of Berlin, standing before the iconic Brandenburg Gate. Their casual attire and the clear blue sky create a nostalgic and adventurous mood, capturing the essence of a romantic getaway.
Prompt
poses hands-in-pockets: happy, united ; standing in front of a famous monument; wide shot; tourism; historical landmark and sunny sky; cinematic
Characteristic
Shot : Two people standing in front of the Brandenburg Gate in Berlin, Germany.
Aesthetic Score : 0.6
Mood : cool, urban, adventurous
Quality
Entropy : 6.59
Noise : 90
Prompt Clip Score : 0.24
AI Evaluation
Likelihood of AI : 0.20
Image errors : Some lens flare is visible at the top of the photo. This is a common issue with older cameras and lenses.
Conclusion
The evaluation scores suggest that the model matched the requested aesthetic style more reliably than the requested camera position and shot composition. Here’s a breakdown:
- Camera Position: The model scored 0.4, which is considered below average. This suggests that the model didn’t accurately capture the intended camera positions described in the prompt.
- Shot Analysis: The model scored 0.53, which is considered average. This indicates that the model was able to understand the scene in the prompt to a reasonable degree, but not exceptionally well.
- Aesthetic Analysis: The model scored 0.1, which is considered very good. This means that the generated images closely matched the expected aesthetic style described in the prompts.
Overall, the model seems to be better at understanding the aesthetic style of the prompt than it is at accurately capturing the camera positions and shot composition.
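For a quick overview of the per-image numbers listed above, the Python sketch below averages them across the ten images. The short labels are shorthand for the section titles, and the averages are derived here from the listed values; they are separate from the three category scores in the breakdown, whose calculation the post does not describe.

```python
from statistics import mean

# (label, aesthetic, entropy, noise, prompt CLIP score, likelihood of AI)
# Values copied from the ten image sections; labels are shorthand.
results = [
    ("Majestic Peaks", 0.6, 6.95, 82, 0.25, 0.10),
    ("Jungle", 0.7, 6.84, 114, 0.30, 0.70),
    ("Gamer", 0.7, 6.14, 68, 0.29, 0.30),
    ("Landmark", 0.6, 6.76, 111, 0.19, 0.30),
    ("Backpacker", 0.7, 6.73, 77, 0.25, 0.10),
    ("Beach", 0.6, 6.83, 81, 0.20, 0.20),
    ("Firefighter", 0.7, 6.75, 100, 0.28, 0.20),
    ("Cave", 0.6, 6.28, 72, 0.25, 0.10),
    ("Confetti", 0.7, 6.71, 114, 0.19, 0.20),
    ("Brandenburg Gate", 0.6, 6.59, 90, 0.24, 0.20),
]

columns = ["aesthetic", "entropy", "noise", "clip", "ai_likelihood"]

# Average each metric column across all images.
averages = {
    name: mean(row[i + 1] for row in results)
    for i, name in enumerate(columns)
}

for name, value in averages.items():
    print(f"{name}: {value:.3f}")

# Images whose prompt CLIP score is below the average.
below_average = [row[0] for row in results if row[4] < averages["clip"]]
print("below-average prompt match:", below_average)
```

Listing the images with a below-average Prompt Clip Score is one simple way to spot where the output drifted furthest from the prompt, such as the landmark and confetti scenes.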
Sources:
- https://www.writerswrite.co.za/cheat-sheets-for-writing-body-language/
- https://mads3df.wordpress.com/2013/09/04/storytelling-poses/
- https://www.pinterest.com/pegasister890/character-poses/
- https://www.youtube.com/watch?v=udky6ANxWws
- https://maven.com/articles/storytelling-techniques
- https://deepmind.google/technologies/imagen-2/