Imagen-v2 Captures the Essence of Poses, But Struggles with Camera Placement
In the realm of artificial intelligence, generative models are pushing the boundaries of creativity. These models can generate images, text, and even music based on textual prompts. One intriguing area of exploration is the ability of these models to understand and capture poses within images. This blog post delves into the performance of a generative AI model in creating images based on prompts that describe poses and aesthetics. We analyze the model’s strengths and weaknesses, highlighting its ability to capture the essence of poses while facing challenges in accurately capturing camera positions.
Created with: imagen-v2
A Solitary Figure Contemplates the Vastness of Nature
A single figure stands on a mountain peak, dwarfed by the towering landscape and endless sky. The scene evokes a sense of serenity and contemplation, while the dramatic contrast between the figure and their surroundings highlights the power and beauty of nature.
Prompt
poses thoughtful-pose: determined, contemplative ; Lone figure standing on a mountain peak; wide shot; heroism; dramatic sky with clouds; cinematic
Characteristic
Shot : A lone figure stands on a mountain peak, looking out over a vast, hazy landscape. The sky is filled with dramatic, swirling clouds.
Aesthetic Score : 0.8
Mood : dramatic, contemplative, solitary
Quality
Entropy : 6.72
Noise : 80
Prompt Clip Score : 0.23
AI Evaluation
Likelihood of AI : 0.20
Image errors : The image is slightly overexposed and the colors are a bit washed out.
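The post does not say how the Entropy value in each Quality block is computed. One common reading, assumed here, is the Shannon entropy of the image's 8-bit grayscale histogram, which ranges from 0 bits (a flat image) to 8 bits (all 256 levels equally likely). Values around 6 to 7, like those reported in this post, would then indicate a fairly wide tonal spread. A minimal sketch under that assumption, using a hypothetical helper named `shannon_entropy`:

```python
import math
from collections import Counter


def shannon_entropy(pixels):
    """Shannon entropy, in bits, of a flat sequence of 8-bit grayscale values.

    Assumption: this mirrors the "Entropy" metric in the post; the post
    itself does not document the formula.
    """
    total = len(pixels)
    counts = Counter(pixels)
    # Sum p * log2(1/p) over the gray levels that actually occur.
    return sum((n / total) * math.log2(total / n) for n in counts.values())


if __name__ == "__main__":
    flat = [128] * 1024              # a single gray level everywhere
    uniform = list(range(256)) * 4   # every gray level equally often
    print(shannon_entropy(flat))
    print(shannon_entropy(uniform))
```

In practice the pixel values would come from an image library after converting the picture to grayscale; that loading step is left out to keep the sketch self-contained.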
Lost in the Jungle: A Moment of Contemplation
A lone explorer, shrouded in the mystery of the jungle, ponders his next move. The dramatic lighting and his thoughtful pose create a sense of intrigue and adventure. What secrets lie ahead?
Prompt
poses thoughtful-pose: curious, adventurous ; Explorer looking at a map, surrounded by ancient ruins; medium shot; adventure; jungle foliage; cinematic
Characteristic
Shot : A man in an explorer’s outfit sits in a lush jungle, seemingly contemplating his surroundings. The image has a sense of mystery and adventure.
Aesthetic Score : 0.7
Mood : mysterious, adventurous, contemplative
Quality
Entropy : 6.70
Noise : 88
Prompt Clip Score : 0.29
AI Evaluation
Likelihood of AI : 0.70
Image errors : There are some minor artifacts present in the image, particularly around the subject’s clothing and the jungle foliage. These are most noticeable in the shadows and highlights.
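The prompts in this post all share the same semicolon-separated layout: a pose tag with moods, a subject, a shot type, a theme, a setting, and a style. The sketch below splits a prompt into those parts so that the requested shot type can be compared with what the image shows. The field names (`pose`, `moods`, `subject`, `shot`, `theme`, `setting`, `style`) are labels chosen for illustration; the post does not name them.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class PosePrompt:
    # Field names are hypothetical labels, not taken from the post.
    pose: str
    moods: List[str]
    subject: str
    shot: str
    theme: str
    setting: str
    style: str


def parse_prompt(prompt: str) -> PosePrompt:
    """Split a prompt of the form used in this post into its six parts."""
    parts = [part.strip() for part in prompt.split(";")]
    if len(parts) != 6:
        raise ValueError(f"expected 6 ';'-separated parts, got {len(parts)}")
    head, subject, shot, theme, setting, style = parts
    # The head looks like "poses thoughtful-pose: determined, contemplative".
    pose_part, _, mood_part = head.partition(":")
    pose_words = pose_part.split()
    pose = pose_words[-1] if pose_words else ""
    moods = [m.strip() for m in mood_part.split(",") if m.strip()]
    return PosePrompt(pose, moods, subject, shot, theme, setting, style)


if __name__ == "__main__":
    example = (
        "poses thoughtful-pose: curious, adventurous ; Explorer looking at a map, "
        "surrounded by ancient ruins; medium shot; adventure; jungle foliage; cinematic"
    )
    print(parse_prompt(example))
```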
The Focus Is On
A young man, lost in the game, sits in his gaming chair bathed in colorful lights. His focused expression and the dramatic lighting create a sense of intensity and anticipation.
Prompt
poses thoughtful-pose: intense, focused ; Gamer intensely focused on a screen, hands on a controller; close-up; gaming; neon lights and gaming peripherals; cinematic
Characteristic
Shot : A young man wearing headphones, possibly a gamer, is sitting in a dark room with colorful lights. He is looking to the side with a thoughtful expression.
Aesthetic Score : 0.6
Mood : intense, focused, moody
Quality
Entropy : 6.37
Noise : 60
Prompt Clip Score : 0.29
AI Evaluation
Likelihood of AI : 0.80
Image errors : The image has some minor artifacts, especially on the hair and headphones. There’s also a slight blur on the background which looks a bit unnatural.
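The Prompt Clip Score is not defined in the post. CLIP-based scores are commonly the cosine similarity between the CLIP embedding of the prompt text and the CLIP embedding of the image, and raw values in the 0.2 to 0.3 range are typical for that measure. The sketch below shows only the similarity step, with toy vectors standing in for real embeddings; obtaining the embeddings from a CLIP model is assumed and not shown.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length, non-zero vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


if __name__ == "__main__":
    # Toy stand-ins for a text embedding and an image embedding.
    text_embedding = [0.2, 0.1, 0.7]
    image_embedding = [0.1, 0.9, 0.3]
    print(round(cosine_similarity(text_embedding, image_embedding), 3))
```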
Silhouetted Against the Sunset, a Moment of Contemplation
A man sits on a ledge, his chin resting on his hand, gazing out at the city skyline as the sun sets. The warm glow of the setting sun casts long shadows, highlighting his thoughtful expression. This image captures a moment of quiet contemplation, bathed in the serene beauty of the evening light.
Prompt
poses thoughtful-pose: awe-struck, contemplative ; Tourist gazing at a breathtaking cityscape; medium shot; tourism; bustling city streets; cinematic
Characteristic
Shot : A young man is leaning on a railing, gazing out over the cityscape of New York City. The Empire State Building is visible in the background.
Aesthetic Score : 0.7
Mood : pensive, reflective, contemplative
Quality
Entropy : 6.66
Noise : 91
Prompt Clip Score : 0.24
AI Evaluation
Likelihood of AI : 0.20
Image errors : There are no obvious artifacts or errors in the image.
Silhouettes of Love at Sunset
A young couple embraces the romantic serenity of a sunset on a rocky cliff, their silhouettes painted against the soft pink and orange sky. The tranquil scene evokes a sense of peace and connection, capturing the beauty of love amidst nature’s breathtaking canvas.
Prompt
poses thoughtful-pose: relaxed, introspective ; Backpackers sitting on a cliff overlooking a vast ocean; wide shot; travel; sunset sky; cinematic
Characteristic
Shot : A couple sits on a cliff overlooking the ocean at sunset, looking out at the horizon. They are wearing backpacks, suggesting a journey.
Aesthetic Score : 0.7
Mood : serene, contemplative, romantic
Quality
Entropy : 6.71
Noise : 92
Prompt Clip Score : 0.28
AI Evaluation
Likelihood of AI : 0.10
Image errors : No obvious artifacts or errors
Campfire Tales Under the Milky Way
A group of friends huddle around a crackling campfire, their faces illuminated by the warm glow, as they share stories under a breathtaking starry sky. The Milky Way stretches across the night, adding a touch of adventure to this cozy and intimate scene.
Prompt
poses thoughtful-pose: intimate, nostalgic ; Group of friends huddled around a campfire, sharing stories; medium shot; groups; starry night sky; cinematic
Characteristic
Shot : A group of friends gather around a campfire under a starry sky, silhouetted against the night sky. The Milky Way is visible overhead.
Aesthetic Score : 0.7
Mood : nostalgic, cozy, adventurous
Quality
Entropy : 6.09
Noise : 122
Prompt Clip Score : 0.29
AI Evaluation
Likelihood of AI : 0.20
Image errors : The image has some minor artifacts, such as noise in the sky.
Lost in the City Lights: A Moment of Melancholy
A woman in a leather jacket sits pensively, her gaze lost in the distant city lights. The blurred background creates a dreamy atmosphere, amplifying the feeling of loneliness and longing. This image captures a moment of introspective melancholy, a reflection of urban life’s complexities.
Prompt
poses thoughtful-pose: reflective, hopeful ; A lone figure standing on a bridge, looking out at the city lights; medium shot; heroism; cityscape at night; cinematic
Characteristic
Shot : A woman in a leather jacket is looking out over a city skyline at night. The city lights are blurred in the background.
Aesthetic Score : 0.7
Mood : melancholy, contemplative, pensive
Quality
Entropy : 6.25
Noise : 99
Prompt Clip Score : 0.26
AI Evaluation
Likelihood of AI : 0.20
Image errors : The image has a few minor image artifacts, such as some slight blurring around the edges of the woman’s face.
Lost in the Mist: A Journey into the Unknown
Three young men venture deep into a dense, misty forest, their expressions hinting at a sense of mystery and anticipation. The dim light and the surrounding greenery create an atmosphere of suspense, leaving viewers wondering what awaits them in the shadows.
Prompt
poses thoughtful-pose: determined, cautious ; A group of adventurers navigating a dense forest; wide shot; adventure; lush green foliage; cinematic
Characteristic
Shot : Three men walking through a lush, overgrown forest. The air is thick with mist, creating an eerie and atmospheric setting.
Aesthetic Score : 0.6
Mood : mysterious, adventurous, suspenseful
Quality
Entropy : 6.70
Noise : 116
Prompt Clip Score : 0.26
AI Evaluation
Likelihood of AI : 0.20
Image errors : The image is slightly blurry, especially in the background. There is also some noise in the image, particularly in the shadows.
Fueled by Passion: A Gamer’s Intensity in Blue and Orange
A close-up portrait captures the raw energy of a young gamer, his fist clenched, eyes focused, bathed in dramatic blue and orange lighting. The scene evokes a sense of intense focus and determination, highlighting the passion that drives him.
Prompt
poses thoughtful-pose: triumphant, excited ; A gamer celebrating a victory, fist raised in the air; close-up; gaming; vibrant gaming setup; cinematic
Characteristic
Shot : A man wearing headphones and a purple hoodie is looking off to the side with a determined expression. He is lit by blue and orange lights, suggesting a gaming or technology-focused setting.
Aesthetic Score : 0.7
Mood : intense, focused, determined
Quality
Entropy : 6.44
Noise : 51
Prompt Clip Score : 0.27
AI Evaluation
Likelihood of AI : 0.90
Image errors : The image appears slightly oversharpened, especially around the edges of the subject’s face and hair.
Silhouettes of Love at Sunset
A couple stands hand-in-hand on a sun-drenched beach, their figures silhouetted against the fiery sunset. The scene evokes a sense of romance, serenity, and hope, capturing the beauty of a shared moment under the golden sky.
Prompt
poses thoughtful-pose: peaceful, hopeful ; A family standing on a beach, watching the sunrise; wide shot; tourism; golden sunrise over the ocean; cinematic
Characteristic
Shot : A couple standing on a beach at sunset, the man has his arm around the woman.
Aesthetic Score : 0.7
Mood : romantic, peaceful, nostalgic
Quality
Entropy : 6.16
Noise : 94
Prompt Clip Score : 0.23
AI Evaluation
Likelihood of AI : 0.10
Image errors : Slight blurring on the subjects’ faces and the background. There is a strong orange cast that might be due to an overactive filter or saturation, and possible overexposure is causing highlights to be blown out.
Conclusion
The results show that the generative AI model matched the aesthetic style of the prompts well and understood shot composition to a reasonable degree, but struggled with camera position. Here’s a breakdown:
- Camera Position: The model scored 0.35, which is considered below average. This suggests that the model didn’t accurately capture the intended camera position described in the prompt.
- Shot Analysis: The model scored 0.525, which is considered average. This indicates that the model was able to understand the scene and shot composition in the prompt to a reasonable degree.
- Aesthetic Analysis: The model scored 0.02, which is considered very good. This means that the generated images closely matched the expected aesthetic style described in the prompts.
Overall, the model demonstrated a reasonable understanding of the scene and shot composition, but struggled with accurately capturing the intended camera position. The generated images also closely matched the desired aesthetic style.
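The per-image figures listed above can also be summarized directly. The sketch below averages the Aesthetic Score, Prompt Clip Score, and Likelihood of AI across the ten images, and groups the Likelihood of AI by the shot type requested in each prompt. These are plain averages of the values in this post; they are not the Camera Position, Shot Analysis, or Aesthetic Analysis scores from the breakdown, whose derivation the post does not give.

```python
from collections import defaultdict
from statistics import mean

# (requested shot, aesthetic score, prompt CLIP score, likelihood of AI),
# copied from the ten image blocks in the order they appear in the post.
RESULTS = [
    ("wide shot", 0.8, 0.23, 0.20),
    ("medium shot", 0.7, 0.29, 0.70),
    ("close-up", 0.6, 0.29, 0.80),
    ("medium shot", 0.7, 0.24, 0.20),
    ("wide shot", 0.7, 0.28, 0.10),
    ("medium shot", 0.7, 0.29, 0.20),
    ("medium shot", 0.7, 0.26, 0.20),
    ("wide shot", 0.6, 0.26, 0.20),
    ("close-up", 0.7, 0.27, 0.90),
    ("wide shot", 0.7, 0.23, 0.10),
]


def overall_means(results):
    """Mean of each numeric column across all images."""
    return {
        "aesthetic": mean(r[1] for r in results),
        "clip": mean(r[2] for r in results),
        "ai_likelihood": mean(r[3] for r in results),
    }


def ai_likelihood_by_shot(results):
    """Mean Likelihood of AI, grouped by the shot type requested in the prompt."""
    groups = defaultdict(list)
    for shot, _aesthetic, _clip, ai_likelihood in results:
        groups[shot].append(ai_likelihood)
    return {shot: mean(values) for shot, values in groups.items()}


if __name__ == "__main__":
    for name, value in overall_means(RESULTS).items():
        print(f"{name}: {value:.3f}")
    for shot, value in ai_likelihood_by_shot(RESULTS).items():
        print(f"{shot}: {value:.3f}")
```

With only ten images, and only two of them close-ups, any difference between the groups is an observation about this sample rather than a general finding.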
Sources:
- https://www.writerswrite.co.za/cheat-sheets-for-writing-body-language/
- https://mads3df.wordpress.com/2013/09/04/storytelling-poses/
- https://www.pinterest.com/pegasister890/character-poses/
- https://www.youtube.com/watch?v=udky6ANxWws
- https://maven.com/articles/storytelling-techniques
- https://deepmind.google/technologies/imagen-2/