AI's Artistic Journey: Capturing Poses, But Missing the Shot with Imagen-v3
AI image generation has made significant strides. One fascinating area of exploration is whether AI models can understand and recreate poses within specific scenes. This blog post presents the results of an experiment in which an AI model was tasked with generating images from descriptions of poses and scenes. The model captures the desired aesthetic impressively, but it falls short in accurately representing camera angles and shot types. We look at this discrepancy and discuss the potential for future advancements in AI image generation.
Created with: imagen-v3
Contemplating the Vastness: A Hiker Finds Tranquility on a Mountaintop
A lone hiker sits on the edge of a cliff, dwarfed by the majestic mountains bathed in the golden light of the setting sun. The scene evokes a sense of tranquility, contemplation, and adventure, capturing the awe-inspiring beauty of nature.
Prompt
poses crossed-legs: determined, contemplative ; A lone adventurer, sitting on a cliff edge; wide shot; Adventure; a vast, breathtaking mountain range; cinematic
Characteristic
Shot : A lone hiker sits on the edge of a cliff, overlooking a vast mountainous landscape. The sky is clear and the mountains are bathed in the soft light of the setting sun.
Aesthetic Score : 0.7
Mood : tranquil, contemplative, adventurous
Quality
Entropy : 6.90
Noise : 97
Prompt Clip Score : 0.31
AI Evaluation
Likelihood of AI : 0.20
Image errors : No noticeable artifacts or errors
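The prompts in this post appear to follow a fixed, semicolon-separated template: pose and moods, subject, shot type, theme, background, style. The sketch below splits a prompt into those parts so the requested shot type can be compared with the generated result. The field names and the `PosePrompt` class are my own labels for illustration, not part of the experiment, and the six-field order is an assumption read off the prompts shown here.

```python
from dataclasses import dataclass


@dataclass
class PosePrompt:
    """Hypothetical container; field names are illustrative labels."""
    pose: str
    moods: list[str]
    subject: str
    shot_type: str
    theme: str
    background: str
    style: str


def parse_prompt(prompt: str) -> PosePrompt:
    """Split a prompt of the assumed form
    'poses <pose>: <moods> ; <subject>; <shot>; <theme>; <background>; <style>'.
    """
    parts = [part.strip() for part in prompt.split(";")]
    if len(parts) != 6:
        raise ValueError(f"expected 6 ';'-separated fields, got {len(parts)}")
    head, subject, shot_type, theme, background, style = parts

    # The first field carries both the pose name and the mood words.
    pose_part, _, mood_part = head.partition(":")
    pose = pose_part.removeprefix("poses").strip()
    moods = [m.strip() for m in mood_part.split(",") if m.strip()]

    return PosePrompt(pose, moods, subject, shot_type, theme, background, style)


if __name__ == "__main__":
    hiker = parse_prompt(
        "poses crossed-legs: determined, contemplative ; A lone adventurer, "
        "sitting on a cliff edge; wide shot; Adventure; a vast, breathtaking "
        "mountain range; cinematic"
    )
    print(hiker.pose, hiker.moods, hiker.shot_type)
```

Note that commas inside a field (as in the subject) are left untouched; only the mood list is split on commas.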
A Lone Warrior Stands Amidst the Ruins of War
A powerful image captures the aftermath of a brutal battle. A lone warrior, clad in armor, stands amidst the fallen, his sword held high. Smoke and fire fill the background, while a distant city serves as a reminder of the world beyond the carnage. This epic scene evokes a sense of awe, wonder, and profound loss.
Prompt
poses crossed-legs: triumphant, confident ; A victorious warrior, standing tall on a battlefield; medium shot; Heroism; fallen enemies and a burning city in the background; cinematic
Characteristic
Shot : A lone warrior stands amidst a battlefield littered with the bodies of fallen soldiers. He is in full armor, with a helmet and sword. The background is filled with smoke and fire, and a city can be seen in the distance.
Aesthetic Score : 0.7
Mood : epic, dramatic, grim
Quality
Entropy : 6.49
Noise : 100
Prompt Clip Score : 0.31
AI Evaluation
Likelihood of AI : 0.30
Image errors : The image has some minor errors in the lighting and composition. The lighting is a bit uneven, and the composition is a bit static.
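The Quality block for each image reports an Entropy value, but the post does not say how it is computed. One common definition, assumed here, is the Shannon entropy of the grayscale intensity histogram in bits, which ranges from 0 for a flat image to 8 for an 8-bit image that uses every level equally. The function name and the flat-list input are illustrative choices, not the tool used in the experiment.

```python
import math
from collections import Counter
from typing import Sequence


def shannon_entropy(pixels: Sequence[int]) -> float:
    """Shannon entropy (in bits) of a flat sequence of grayscale values.

    Assumption: this histogram-based definition is only one plausible
    reading of the post's Entropy metric.
    """
    total = len(pixels)
    if total == 0:
        raise ValueError("image has no pixels")
    counts = Counter(pixels)
    # Sum p * log2(1/p) over every intensity level that occurs.
    return sum(
        (count / total) * math.log2(total / count)
        for count in counts.values()
    )


if __name__ == "__main__":
    flat = [128] * 16          # a single grey level
    two_tone = [0, 255] * 8    # two levels, equally frequent
    print(shannon_entropy(flat), shannon_entropy(two_tone))
```

Under this reading, higher values indicate an image whose tones are spread over more intensity levels.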
The Gamer’s Focus: A Moment of Intense Concentration
A low-angle shot captures a young man immersed in his gaming world. The dim lighting and his intense gaze create a sense of mystery and suspense, highlighting the seriousness and focus of his gaming session.
Prompt
poses crossed-legs: intense, focused ; A gamer, intensely focused on a screen; close-up; Gaming; a dimly lit room with glowing monitors and gaming peripherals; cinematic
Characteristic
Shot : A young man is sitting in a gaming chair, wearing headphones, focused on the screen in front of him.
Aesthetic Score : 0.6
Mood : intense, focused, serious
Quality
Entropy : 6.48
Noise : 76
Prompt Clip Score : 0.30
AI Evaluation
Likelihood of AI : 0.20
Image errors : No visible errors.
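The post does not define Prompt Clip Score. If, as the name suggests, it is a CLIP-style similarity between an embedding of the prompt text and an embedding of the generated image, the core computation would be a cosine similarity between the two vectors. The sketch below shows only that step, with toy vectors standing in for real embeddings; producing the embeddings themselves would require a CLIP model and is outside this example.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


if __name__ == "__main__":
    # Toy stand-ins for a text embedding and an image embedding.
    text_vec = [0.2, 0.1, 0.7]
    image_vec = [0.1, 0.3, 0.6]
    print(round(cosine_similarity(text_vec, image_vec), 3))
```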
Friends Embrace the City Skyline in a Moment of Joy and Wonder
Five friends, dressed in vibrant colors, share a moment of laughter and camaraderie on a ledge overlooking a sprawling cityscape. The vastness of the city creates a sense of awe and adventure, while the bright colors and excited expressions capture the joy and vibrancy of their friendship.
Prompt
poses crossed-legs: excited, awe-struck ; A group of tourists, admiring a breathtaking view; medium shot; Tourism; a panoramic vista of a bustling city skyline; cinematic
Characteristic
Shot : Five friends sit in a row on a ledge overlooking a vast cityscape, with a beautiful blue sky and fluffy clouds.
Aesthetic Score : 0.7
Mood : joyful, vibrant, adventurous
Quality
Entropy : 6.96
Noise : 99
Prompt Clip Score : 0.32
AI Evaluation
Likelihood of AI : 0.10
Image errors : There are no noticeable artifacts or errors in the image.
Finding Peace in Motion: A Moment of Tranquility on the Train
A pair of feet rest on the windowsill of a train, taking in the serene countryside scenery. The image evokes a sense of relaxed contemplation, capturing the essence of a journey and the peace found in the simple act of observation.
Prompt
poses crossed-legs: reflective, nostalgic ; A traveler, gazing out of a train window; close-up; Travel; a blur of passing landscapes and towns; cinematic
Characteristic
Shot : A person’s feet are resting on the windowsill of a train, with the view of a countryside through the window in the background.
Aesthetic Score : 0.7
Mood : relaxed, contemplative, journey
Quality
Entropy : 5.87
Noise : 70
Prompt Clip Score : 0.30
AI Evaluation
Likelihood of AI : 0.20
Image errors : None, the image is clear and well-exposed.
Campfire Camaraderie: Friends Gather Under the Stars
A group of friends share laughter and stories around a crackling campfire, the warm glow illuminating their faces and creating a joyful atmosphere. The scene evokes a sense of friendship, warmth, and shared experiences under the night sky.
Prompt
poses crossed-legs: joyful, relaxed ; A group of friends, laughing and sharing stories around a campfire; medium shot; Groups; a serene forest setting with twinkling stars above; cinematic
Characteristic
Shot : A group of friends are gathered around a campfire in a forest at night. They are laughing and talking, and the firelight illuminates their faces.
Aesthetic Score : 0.8
Mood : joyful, friendly, warm
Quality
Entropy : 5.77
Noise : 92
Prompt Clip Score : 0.35
AI Evaluation
Likelihood of AI : 0.20
Image errors : No noticeable errors.
A Moment of Reflection: Astronaut Gazes at Earth from Space
A lone astronaut, silhouetted against the vibrant glow of Earth, sits in the window of a spaceship. The image evokes a sense of reflection, awe, and solitude, highlighting the vastness of space and the fragility of our planet.
Prompt
poses crossed-legs: awe-inspired, contemplative ; A lone astronaut, gazing at Earth from a spaceship window; close-up; Heroism; a vast, blue planet against the backdrop of space; cinematic
Characteristic
Shot : A lone astronaut sits in the window of a spaceship gazing at the earth.
Aesthetic Score : 0.6
Mood : reflective, awe, solitude
Quality
Entropy : 5.14
Noise : 68
Prompt Clip Score : 0.34
AI Evaluation
Likelihood of AI : 0.80
Image errors : The astronaut’s face is blurry and the earth looks slightly artificial.
Shadows and Secrets: A Tense Encounter in the Cave Depths
Five explorers huddle in the flickering torchlight of a dimly lit cave, their faces etched with apprehension. The atmosphere crackles with suspense as they navigate the unknown, their leader’s gaze drawing the viewer into the heart of the mystery.
Prompt
poses crossed-legs: suspenseful, cautious ; A group of explorers, huddled together in a dark cave; medium shot; Adventure; flickering torches illuminating the rough stone walls; cinematic
Characteristic
Shot : A group of five people are sitting in a dimly lit cave, illuminated by torches. They appear to be explorers or adventurers. The scene is tense, and the characters appear to be on edge.
Aesthetic Score : 0.7
Mood : suspenseful, mysterious, adventurous
Quality
Entropy : 6.04
Noise : 84
Prompt Clip Score : 0.30
AI Evaluation
Likelihood of AI : 0.10
Image errors : The image is slightly overexposed, and there is some noise in the shadows.
Confetti Shower for a Champion Gamer!
A young man, radiating joy, celebrates his victory in a gaming chair surrounded by confetti. The vibrant lighting and his triumphant pose capture the excitement of the moment.
Prompt
poses crossed-legs: exuberant, joyful ; A gamer, celebrating a victory with a triumphant fist pump; close-up; Gaming; a brightly lit room with a celebratory confetti explosion; cinematic
Characteristic
Shot : A young man wearing a blue shirt and black pants sits in a gaming chair with his arms raised in victory. He is surrounded by confetti.
Aesthetic Score : 0.7
Mood : joyful, excited, celebratory
Quality
Entropy : 6.77
Noise : 84
Prompt Clip Score : 0.33
AI Evaluation
Likelihood of AI : 0.20
Image errors : No noticeable image errors. The confetti may be slightly overexposed.
Street Food Feast: A Moment of Casual Delight
Three friends gather around a brightly lit street food stall, enjoying their meals in a casual and friendly atmosphere. The play of light and shadow adds a touch of drama to the scene, capturing the vibrant energy of the moment.
Prompt
poses crossed-legs: lively, adventurous ; A group of travelers, sharing a meal at a bustling street market; medium shot; Travel; vibrant colors and aromas of exotic food stalls; cinematic
Characteristic
Shot : Three people are sitting at a street food stall, eating their meals. The stall is lit by bright lights and there are other stalls in the background.
Aesthetic Score : 0.6
Mood : casual, friendly, hungry
Quality
Entropy : 6.73
Noise : 109
Prompt Clip Score : 0.34
AI Evaluation
Likelihood of AI : 0.20
Image errors : There is some graininess in the image, especially in the darker areas.
Conclusion
The results show that the generative AI model performed well on the aesthetic aspect, but struggled with camera position and shot composition. Here’s a breakdown:
- Camera Position: The model scored 0.37, which is considered below average. This suggests that the model didn’t accurately capture the intended camera position described in the prompt.
- Shot Analysis: The model scored 0.525, which is considered average. This indicates that the model was able to understand the scene and shot type described in the prompt, but not exceptionally well.
- Aesthetic Analysis: The model scored 0.07, which is considered very good. This means that the generated image closely matched the expected aesthetic style described in the prompt.
Overall, the model seems to be better at understanding the aesthetic style than the camera position and shot composition. This suggests that the model might need further training to improve its ability to accurately interpret and implement camera positions and shot types.
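For reference, the per-image values listed in this post can be averaged with a few lines of Python. The numbers below are copied from the ten image records above; the tuple layout and the `summarize` helper are illustrative. These are plain averages of the per-image metrics, not the camera position, shot analysis, or aesthetic analysis scores in the breakdown, whose computation the post does not describe.

```python
from collections import Counter
from statistics import mean

# (requested shot type, aesthetic score, prompt CLIP score, likelihood of AI),
# in the order the images appear in the post.
RESULTS = [
    ("wide shot",   0.7, 0.31, 0.20),  # hiker on a mountaintop
    ("medium shot", 0.7, 0.31, 0.30),  # lone warrior
    ("close-up",    0.6, 0.30, 0.20),  # focused gamer
    ("medium shot", 0.7, 0.32, 0.10),  # friends and city skyline
    ("close-up",    0.7, 0.30, 0.20),  # traveler on the train
    ("medium shot", 0.8, 0.35, 0.20),  # campfire
    ("close-up",    0.6, 0.34, 0.80),  # astronaut
    ("medium shot", 0.7, 0.30, 0.10),  # cave explorers
    ("close-up",    0.7, 0.33, 0.20),  # champion gamer
    ("medium shot", 0.6, 0.34, 0.20),  # street food
]


def summarize(results):
    """Average each metric and count how often each shot type was requested."""
    shots, aesthetic, clip, ai_likelihood = zip(*results)
    return {
        "mean_aesthetic": mean(aesthetic),
        "mean_clip": mean(clip),
        "mean_ai_likelihood": mean(ai_likelihood),
        "shot_counts": Counter(shots),
    }


if __name__ == "__main__":
    for key, value in summarize(RESULTS).items():
        print(key, value)
```

The averages come out at roughly 0.68 for the aesthetic score, 0.32 for the prompt CLIP score, and 0.25 for the likelihood of AI. Only one of the ten prompts asks for a wide shot, so the sample says little about that shot type.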
Sources:
- https://www.writerswrite.co.za/cheat-sheets-for-writing-body-language/
- https://mads3df.wordpress.com/2013/09/04/storytelling-poses/
- https://www.pinterest.com/pegasister890/character-poses/
- https://www.youtube.com/watch?v=udky6ANxWws
- https://maven.com/articles/storytelling-techniques
- https://deepmind.google/technologies/imagen-3/