AI's Artistic Struggle: Capturing the Essence of Poses with Imagen-v2
Text-to-image models have become increasingly sophisticated at generating images from textual prompts. However, capturing the nuances of artistic style, particularly poses, remains a challenge. This blog post walks through an experiment in which an AI model was asked to generate images from prompts specifying poses and scenes, revealing both its strengths and its weaknesses in capturing the desired aesthetic.
Created with: imagen-v2
A Solitary Journey Through Majestic Mountains
A lone hiker traverses a winding path through a breathtaking mountain valley, the river below mirroring the serenity of the scene. Dramatic light and shadow play enhance the vastness of the landscape and the hiker’s sense of solitude, evoking a mood of adventure and contemplation.
Prompt
poses interactive-pose: Determined, hopeful, adventurous ; A lone adventurer; wide shot; Adventure; Majestic mountain range with a winding path leading to a hidden valley; cinematic
Characteristic
Shot : A lone hiker walks on a winding dirt path through a mountainous valley, with a river snaking through the valley floor.
Aesthetic Score : 0.8
Mood : serene, adventurous, hopeful
Quality
Entropy : 6.48
Noise : 101
Prompt Clip Score : 0.21
AI Evaluation
Likelihood of AI : 0.20
Image errors : There are minor sharpening artifacts noticeable on the rocks and edges, a slightly unnatural color saturation in the valley, and a slight blur in the distant mountains.
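The reports in this post do not say how the Entropy value is computed. One common reading, assumed here, is the Shannon entropy of the image's 8-bit grayscale histogram, which ranges from 0 bits (a flat, single-tone image) to 8 bits (all 256 tones equally used). The sketch below illustrates that assumption only; `shannon_entropy` is a hypothetical helper name, not part of the evaluation tool used for this post.

```python
import math
from collections import Counter


def shannon_entropy(pixels):
    """Shannon entropy, in bits, of a sequence of 8-bit grayscale values.

    Assumption: the "Entropy" figure in the reports is a histogram entropy
    of this kind. The post itself does not confirm the method.
    """
    counts = Counter(pixels)
    total = len(pixels)
    # Sum -p * log2(p) over every tone that actually occurs in the image.
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


# A single flat tone carries no information; an even spread over all
# 256 tones reaches the 8-bit maximum.
flat = shannon_entropy([128] * 1000)
uniform = shannon_entropy(list(range(256)))
```

Under this assumption, the values between 6.4 and 7.0 reported throughout the post would correspond to images that use a broad range of tones.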
Cozy Night In: Couple’s Playful Video Game Session
A dimly lit room sets the scene for a cozy and intimate evening. Two people, likely a couple, are engrossed in a video game, their faces illuminated by the screen’s glow. The neon sign in the background adds a touch of urban coolness, creating a sense of mystery and playful energy.
Prompt
poses interactive-pose: Excited, focused, competitive ; A group of friends; medium shot; Gaming; A dimly lit room with a large screen displaying a video game, surrounded by controllers and snacks; cinematic
Characteristic
Shot : Two young people are sitting on a couch, playing video games. There are snacks on the coffee table in front of them.
Aesthetic Score : 0.7
Mood : relaxed, playful, casual
Quality
Entropy : 6.41
Noise : 76
Prompt Clip Score : 0.26
AI Evaluation
Likelihood of AI : 0.20
Image errors : The lighting is slightly too dark, the colors are somewhat washed out, and the image is a bit blurry.
Superman at Sunset: A Hero’s Silhouette Against the City
A dramatic image capturing Superman’s power and heroism as he stands against a breathtaking sunset cityscape. The lighting and pose create a sense of awe and inspire a feeling of hope and strength.
Prompt
poses interactive-pose: Confident, powerful, heroic ; A superhero; close-up; Heroism; A cityscape with towering buildings and a dramatic sunset in the background; cinematic
Characteristic
Shot : A close-up shot of Superman standing on a rooftop, looking into the distance, with a cityscape in the background
Aesthetic Score : 0.7
Mood : heroic, powerful, determined
Quality
Entropy : 6.51
Noise : 55
Prompt Clip Score : 0.23
AI Evaluation
Likelihood of AI : 0.80
Image errors : Some noticeable artifacts in the background, particularly in the sky and the cityscape. The lighting is also somewhat unnatural.
A Whimsical Market Under a Mysterious Glow
Step into a bustling fantasy market, where vibrant life and strange wares fill the streets. The lighting casts long shadows, adding a touch of mystery to the scene. This is a place where the ordinary meets the extraordinary, and the air is thick with wonder.
Prompt
poses interactive-pose: Happy, joyful, curious ; family; medium shot; Tourism; A bustling marketplace with colorful stalls and vibrant street performers; cinematic
Characteristic
Shot : A bustling marketplace in a fantasy setting with three people walking through the narrow street.
Aesthetic Score : 0.6
Mood : magical, whimsical, nostalgic
Quality
Entropy : 6.76
Noise : 102
Prompt Clip Score : 0.21
AI Evaluation
Likelihood of AI : 0.90
Image errors : The image has some blurring and artifacts, especially in the background and details. There are some inconsistencies in the colors and lighting.
A Moment of Reflection on the Road Ahead
A woman, lost in thought, stands in a grassy field, her backpack a symbol of adventure. The winding road ahead promises new experiences, while the serene sky and gentle clouds create a sense of peace and tranquility. Her profile view and the mysterious path ahead invite contemplation and intrigue.
Prompt
poses interactive-pose: Free, adventurous, contemplative ; A traveler; close-up; Travel; A scenic landscape with rolling hills, a clear blue sky, and a winding road leading to the horizon; cinematic
Characteristic
Shot : A woman with a backpack is standing on a hill and looking at a winding road ahead. The sky is blue and there are clouds in the distance.
Aesthetic Score : 0.7
Mood : serene, contemplative, hopeful
Quality
Entropy : 6.86
Noise : 52
Prompt Clip Score : 0.24
AI Evaluation
Likelihood of AI : 0.80
Image errors : The image has some blurring, particularly around the woman’s face. The overall image has a slightly artificial look.
Glowing Circle, Powerful Poses: A Dance of Mystery
A group of dancers command the stage in a dark setting, illuminated by a vibrant orange circle. Their dramatic poses and the mysterious glow create a captivating and powerful atmosphere.
Prompt
poses interactive-pose: Energetic, expressive, joyful ; A group of dancers; wide shot; Groups; A brightly lit stage with a vibrant backdrop, showcasing a performance; cinematic
Characteristic
Shot : A group of dancers are performing on a stage lit with a large, orange spotlight. The dancers are silhouetted against the bright light, creating a dramatic and ethereal effect.
Aesthetic Score : 0.7
Mood : dramatic, ethereal, mysterious
Quality
Entropy : 6.77
Noise : 77
Prompt Clip Score : 0.24
AI Evaluation
Likelihood of AI : 0.90
Image errors : The image has some minor artifacts in the background, particularly around the edges of the dancers. The lighting is also a bit uneven, with some areas of the image being overexposed.
Lost in the Mist: A Moment of Tranquility in the Forest
A solitary figure finds peace amidst the ethereal beauty of a misty forest. Sunlight filters through the trees, casting a soft glow on the scene and creating a sense of mystery and depth. The lone figure, perched on a mossy rock, adds a touch of scale and isolation, inviting contemplation and wonder.
Prompt
poses interactive-pose: Calm, peaceful, introspective ; A lone hiker; medium shot; Adventure; A dense forest with towering trees and dappled sunlight filtering through the leaves; cinematic
Characteristic
Shot : A lone figure sits on a rock in a misty forest, surrounded by tall trees. The light is soft and diffused, creating a sense of peace and tranquility.
Aesthetic Score : 0.7
Mood : serene, tranquil, contemplative
Quality
Entropy : 6.95
Noise : 109
Prompt Clip Score : 0.23
AI Evaluation
Likelihood of AI : 0.80
Image errors : There is some blurriness in the image, particularly in the background. The colors are slightly desaturated, and there is a slight lack of detail in some areas.
A Game of Shadows: Friends Gather in a Dimly Lit Room
A group of friends huddle around a table, engrossed in a board game. The warm, inviting lighting casts long shadows, adding an air of mystery and suspense to the scene. The low camera angle makes the characters seem larger than life, drawing the viewer into their intimate world.
Prompt
poses interactive-pose: Fun, playful, competitive ; A group of friends; close-up; Gaming; A dimly lit room with a table covered in board games and snacks; cinematic
Characteristic
Shot : Three people are sitting around a table, playing a board game. They appear to be engrossed in the game, and the scene is lit by warm, soft light.
Aesthetic Score : 0.6
Mood : intrigued, focused, casual
Quality
Entropy : 6.62
Noise : 76
Prompt Clip Score : 0.23
AI Evaluation
Likelihood of AI : 0.10
Image errors : No significant errors.
Sunset Romance on the Beach
A couple embraces on a sun-drenched beach, their silhouettes framed against the fiery hues of the setting sun. The intimate moment is captured with a dreamy aesthetic, evoking feelings of love and connection.
Prompt
poses interactive-pose: Romantic, intimate, peaceful ; A couple; close-up; Tourism; A romantic sunset over a beach with the ocean waves crashing in the background; cinematic
Characteristic
Shot : A couple is standing on a beach, embracing each other while facing the sunset. The man is shirtless and the woman is wearing a bra and a skirt.
Aesthetic Score : 0.7
Mood : romantic, intimate, dreamy
Quality
Entropy : 6.65
Noise : 81
Prompt Clip Score : 0.30
AI Evaluation
Likelihood of AI : 0.60
Image errors : The image appears to have been edited, with some unnatural color saturation and smoothing.
Silhouettes of Passion: A Band Ignites the Stage
Capture the raw energy of a live performance as a band takes the stage, bathed in dramatic spotlights. The crowd’s excitement is palpable, creating a vibrant atmosphere of passion and drama.
Prompt
poses interactive-pose: Energetic, passionate, inspiring ; A group of musicians; wide shot; Groups; A concert stage with a large crowd cheering in the background; cinematic
Characteristic
Shot : A band performing on stage in front of a large crowd. The stage is lit by spotlights, and the band members are silhouetted against the bright lights. The crowd is mostly obscured in the darkness. The stage has a weathered, somewhat rundown appearance.
Aesthetic Score : 0.6
Mood : energetic, powerful, shadowy
Quality
Entropy : 6.86
Noise : 103
Prompt Clip Score : 0.20
AI Evaluation
Likelihood of AI : 0.20
Image errors : No noticeable image artifacts or errors.
Conclusion
The results show that the generative AI model performed acceptably on camera position and shot analysis, but struggled with aesthetic analysis. Here’s a breakdown:
- Camera Position: The model scored 0.45, which is considered okay. This means the camera positions in the generated images were somewhat different from what the prompts requested.
- Shot Analysis: The model scored 0.535, which is also considered okay. This indicates that the shot composition of the generated images was somewhat different from what the prompts requested.
- Aesthetic Analysis: The model scored 0.07, which is considered pretty bad. This means the aesthetic of the generated images was significantly different from what was expected based on the prompts.
Overall, the model struggles to understand and implement the desired aesthetic. It does a decent job with camera position and shot composition, but there is clear room for improvement in capturing the intended visual style.
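The per-image reports can also be summarized directly. The sketch below copies the Aesthetic Score, Prompt Clip Score, and Likelihood of AI values from the ten reports above and averages them. The `summarize` helper and the 0.5 cutoff for counting an image as "likely AI" are assumptions made for illustration, not part of the original evaluation.

```python
from statistics import mean

# (title, aesthetic score, prompt CLIP score, likelihood of AI),
# copied from the ten per-image reports in this post.
RESULTS = [
    ("A Solitary Journey Through Majestic Mountains", 0.8, 0.21, 0.20),
    ("Cozy Night In", 0.7, 0.26, 0.20),
    ("Superman at Sunset", 0.7, 0.23, 0.80),
    ("A Whimsical Market Under a Mysterious Glow", 0.6, 0.21, 0.90),
    ("A Moment of Reflection on the Road Ahead", 0.7, 0.24, 0.80),
    ("Glowing Circle, Powerful Poses", 0.7, 0.24, 0.90),
    ("Lost in the Mist", 0.7, 0.23, 0.80),
    ("A Game of Shadows", 0.6, 0.23, 0.10),
    ("Sunset Romance on the Beach", 0.7, 0.30, 0.60),
    ("Silhouettes of Passion", 0.6, 0.20, 0.20),
]


def summarize(rows, ai_cutoff=0.5):
    """Average each metric and count images at or above the AI cutoff.

    ai_cutoff is a hypothetical threshold chosen for this sketch.
    """
    return {
        "mean_aesthetic": round(mean(r[1] for r in rows), 3),
        "mean_clip": round(mean(r[2] for r in rows), 3),
        "mean_ai_likelihood": round(mean(r[3] for r in rows), 3),
        "likely_ai_count": sum(1 for r in rows if r[3] >= ai_cutoff),
    }


summary = summarize(RESULTS)
print(summary)
```

On these ten reports the mean aesthetic score is 0.68, the mean prompt CLIP score is 0.235, and six images sit at or above the assumed 0.5 AI-likelihood cutoff. The per-image Aesthetic Score is a different measure from the 0.07 Aesthetic Analysis score in the breakdown, which compares the result against the prompt's intended aesthetic.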
Sources:
- https://www.writerswrite.co.za/cheat-sheets-for-writing-body-language/
- https://mads3df.wordpress.com/2013/09/04/storytelling-poses/
- https://www.pinterest.com/pegasister890/character-poses/
- https://www.youtube.com/watch?v=udky6ANxWws
- https://maven.com/articles/storytelling-techniques
- https://deepmind.google/technologies/imagen-2/