AI's Artistic Journey: Capturing Poses, But Missing the Shot with Imagen-v3-fast
Image generation has made significant strides, but balancing artistic vision with technical accuracy remains a challenge. This blog post examines ten images that imagen-v3-fast generated from prompts combining a scene description with a pose, highlighting the model’s strengths and weaknesses. The model captures the aesthetic style of the poses remarkably well, but struggles to reproduce the requested camera position and scene composition. That gap shows how far AI still has to go in mastering visual storytelling.
Created with: imagen-v3-fast
A Hiker’s Journey to the Majestic Peak
A lone hiker traverses a stone path winding towards a snow-capped mountain peak, bathed in warm sunlight. The scene evokes a sense of serenity, inspiration, and adventure, emphasizing the vastness of nature and the thrill of exploration.
Prompt
poses interactive-pose: Determined, hopeful, adventurous ; A lone adventurer; wide shot; Adventure; Majestic mountain range with a winding path leading to a hidden valley; cinematic
Characteristic
Shot : A lone hiker walks on a stone path leading into a mountain valley. The path winds up towards a majestic, snow-capped mountain peak. The scene is bathed in warm sunlight, and there are fluffy clouds in the sky.
Aesthetic Score : 0.8
Mood : serene, inspiring, adventurous
Quality
Entropy : 6.63
Noise : 87
Prompt Clip Score : 0.31
AI Evaluation
Likelihood of AI : 0.90
Image errors : The clouds have a slightly ‘painted’ or artificial appearance. Some of the mountains, particularly in the background, lack detail.
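The post does not say how the Entropy value under Quality is computed. One common convention, assumed here, is the Shannon entropy of the image’s 8-bit grayscale histogram. It ranges from 0 bits (a single flat tone) to 8 bits (every intensity equally likely), so values between roughly 6 and 7, as reported for the images in this post, would indicate fairly rich tonal variation. A minimal sketch under that assumption, where `shannon_entropy` is a hypothetical helper name:

```python
import math
from collections import Counter


def shannon_entropy(pixels):
    """Return the Shannon entropy, in bits, of an iterable of intensity values.

    Assumption: this mirrors a histogram-based image entropy; the post does
    not confirm which definition its Entropy metric uses.
    """
    counts = Counter(pixels)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("at least one pixel value is required")
    # Sum -p * log2(p) over every intensity that actually occurs.
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


if __name__ == "__main__":
    flat = [128] * 1000            # one tone only
    uniform = list(range(256))     # every 8-bit intensity exactly once
    print(shannon_entropy(flat), shannon_entropy(uniform))
```

With a real image, the input would be the flattened grayscale pixel values, for example as loaded by an imaging library of your choice.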
Friends Gather for a Night of Excitement and Fun
Capture the joy of a game night with this image of four friends engrossed in a video game. The woman in the yellow sweater radiates excitement, her gaze directly connecting with the viewer. The playful mood and casual setting create a sense of camaraderie and shared enjoyment.
Prompt
poses interactive-pose: Excited, focused, competitive ; A group of friends; medium shot; Gaming; A dimly lit room with a large screen displaying a video game, surrounded by controllers and snacks; cinematic
Characteristic
Shot : Four friends are playing a video game together. They are all sitting on a couch in a living room. The woman in the yellow sweater is the most excited, and she is looking directly at the camera. The man in the blue shirt is also looking at the camera. The other two friends are looking at the television.
Aesthetic Score : 0.7
Mood : excited, playful, casual
Quality
Entropy : 6.50
Noise : 70
Prompt Clip Score : 0.35
AI Evaluation
Likelihood of AI : 0.20
Image errors : No significant image errors.
Heroic Silhouette Against the Setting Sun
A powerful superhero stands tall against the backdrop of a vibrant city skyline at sunset. The dramatic lighting and composition evoke a sense of heroism, determination, and grandeur, capturing the essence of a true champion.
Prompt
poses interactive-pose: Confident, powerful, heroic ; A superhero; close-up; Heroism; A cityscape with towering buildings and a dramatic sunset in the background; cinematic
Characteristic
Shot : A superhero standing in front of a city skyline at sunset
Aesthetic Score : 0.7
Mood : heroic, determined, powerful
Quality
Entropy : 6.75
Noise : 88
Prompt Clip Score : 0.28
AI Evaluation
Likelihood of AI : 0.90
Image errors : Some minor artifacts and blurring are visible in the background and on the superhero’s costume.
Street Performer Brings Life to a Charming Alleyway
A vibrant street performer in a colorful costume captivates a crowd in a narrow, historic street. The camera’s perspective, looking down the alley, emphasizes the depth and scale of the scene, creating a festive and lively atmosphere.
Prompt
poses interactive-pose: Energetic, vibrant, chaotic ; A medium shot of a bustling marketplace, showcasing a kaleidoscope of colors and textures, with street performers captivating the crowd.; cinematic
Characteristic
Shot : A street performer in a colorful costume is entertaining a crowd of people in a narrow street lined with old buildings.
Aesthetic Score : 0.7
Mood : festive, lively, charming
Quality
Entropy : 6.58
Noise : 101
Prompt Clip Score : 0.32
AI Evaluation
Likelihood of AI : 0.10
Image errors : No notable errors.
A Journey Begins at Sunset
A woman stands on a hillside, bathed in the warm glow of the setting sun, gazing at a winding road leading towards distant mountains. The scene evokes a sense of reflection, hope, and the promise of new beginnings.
Prompt
poses interactive-pose: Free, adventurous, contemplative ; A traveler; close-up; Travel; A scenic landscape with rolling hills, a clear blue sky, and a winding road leading to the horizon; cinematic
Characteristic
Shot : A woman standing on a hillside, looking at a winding road that leads towards a mountain range in the distance. The scene is bathed in the warm glow of a setting sun.
Aesthetic Score : 0.6
Mood : reflective, hopeful, introspective
Quality
Entropy : 6.87
Noise : 69
Prompt Clip Score : 0.32
AI Evaluation
Likelihood of AI : 0.20
Image errors : The image has slight overexposure in the sky, causing some loss of detail in the clouds. The road appears a bit too smooth and lacks texture.
Young Ensemble Takes Center Stage with Energetic Performance
A group of eight young adults, bathed in spotlight, exudes playful energy while posing dramatically in front of a dimly lit stage set. The vibrant lighting and theatrical staging make for a striking visual that highlights the group’s youthful exuberance and stage presence.
Prompt
poses interactive-pose: Energetic, expressive, joyful ; A group of dancers; wide shot; Groups; A brightly lit stage with a vibrant backdrop, showcasing a performance; cinematic
Characteristic
Shot : A group of eight young adults, seven men and one woman, standing in a line, arms extended, in front of a stage set with stairs and a stained glass window. The set is dimly lit and the group is well-lit, making them pop out.
Aesthetic Score : 0.4
Mood : energetic, playful, theatrical
Quality
Entropy : 6.43
Noise : 83
Prompt Clip Score : 0.26
AI Evaluation
Likelihood of AI : 0.20
Image errors : There are no noticeable artifacts or errors.
Lost in the Mist: A Solitary Figure Seeks Hope in the Forest
A single figure, dwarfed by the towering trees and shrouded in mist, walks a narrow path through a sun-dappled forest. The scene evokes a sense of mystery, hope, and serenity, highlighting the figure’s vulnerability in the vastness of nature.
Prompt
poses interactive-pose: Calm, peaceful, introspective ; A lone hiker; medium shot; Adventure; A dense forest with towering trees and dappled sunlight filtering through the leaves; cinematic
Characteristic
Shot : A solitary figure walks down a path in a misty forest, the sunlight shining through the trees in the distance.
Aesthetic Score : 0.75
Mood : mysterious, hopeful, serene
Quality
Entropy : 6.38
Noise : 72
Prompt Clip Score : 0.31
AI Evaluation
Likelihood of AI : 0.80
Image errors : The figure’s hair and clothing are slightly blurry, as if the image is a painting, not a photograph. The lighting is a bit unrealistic, and the trees have a slightly repetitive, cartoonish look.
Friends Gather for a Game of Mystery
A dimly lit table sets the scene for a suspenseful board game night. Three friends lean in, their expressions hinting at the playful tension of the game. The cozy atmosphere and focused gazes create a sense of intrigue and anticipation.
Prompt
poses interactive-pose: Fun, playful, competitive ; A group of friends; close-up; Gaming; A dimly lit room with a table covered in board games and snacks; cinematic
Characteristic
Shot : A group of three friends are playing a board game at a dimly lit table. The image is cropped slightly off-center on the left side, but otherwise is well-composed.
Aesthetic Score : 0.6
Mood : cozy, suspenseful, playful
Quality
Entropy : 6.47
Noise : 69
Prompt Clip Score : 0.30
AI Evaluation
Likelihood of AI : 0.30
Image errors : The image has some minor artifacts, such as noise and a slight blur in the background. The lighting is a little uneven.
Sunset Romance: A Couple’s Embrace on the Beach
A captivating image of a couple lost in a loving embrace, bathed in the soft glow of a setting sun. The blurred beach background adds a sense of intimacy and tranquility, capturing the essence of a romantic moment.
Prompt
poses interactive-pose: Romantic, intimate, peaceful ; A couple; close-up; Tourism; A romantic sunset over a beach with the ocean waves crashing in the background; cinematic
Characteristic
Shot : A couple is embracing in a romantic pose with a blurred beach background.
Aesthetic Score : 0.8
Mood : romantic, intimate, loving
Quality
Entropy : 6.63
Noise : 65
Prompt Clip Score : 0.29
AI Evaluation
Likelihood of AI : 0.20
Image errors : No visible artifacts or errors.
Band Ignites Stage with Joyful Performance
A six-piece band captivates a cheering crowd with their energetic performance. The musicians, bathed in bright stage lights against a dark night sky, exude pure joy as they raise their arms in celebration. The image captures the vibrant energy and celebratory mood of the event.
Prompt
poses interactive-pose: Energetic, passionate, inspiring ; A group of musicians; wide shot; Groups; A concert stage with a large crowd cheering in the background; cinematic
Characteristic
Shot : A band of six people on a stage with a large audience behind them. The band members are all kneeling and raising their arms in the air, and they are all smiling. The audience is also smiling and cheering. The stage is lit up with bright lights, and the background is a dark night sky.
Aesthetic Score : 0.7
Mood : joyful, celebratory, energetic
Quality
Entropy : 5.97
Noise : 71
Prompt Clip Score : 0.24
AI Evaluation
Likelihood of AI : 0.10
Image errors : No visible artifacts or errors.
Conclusion
The results show that the generative AI model performed well on the aesthetic aspect, but struggled with camera position and was only average at understanding the scene. Here’s a breakdown:
- Camera Position: The model scored 0.36, which is considered below average. This suggests that the model didn’t accurately capture the intended camera position described in the prompt.
- Shot Analysis: The model scored 0.55, which is considered average. This indicates that the model was able to understand the scene in the prompt to a reasonable degree, but not exceptionally well.
- Aesthetic Analysis: The model scored 0.095, which is considered very good. This means that the generated images closely matched the expected aesthetic style described in the prompts.
Overall, the model seems to be better at understanding the aesthetic style than the camera position and scene composition. This suggests that the model might need further training to improve its ability to accurately interpret and implement camera positions and shot types.
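The per-image numbers listed in this post can also be summarized directly. The sketch below copies the Aesthetic Score, Prompt Clip Score, and Likelihood of AI values from the ten entries above; the tuple layout and the names `RESULTS`, `summarize`, and `lowest_by` are illustrative assumptions. It does not reproduce the camera position, shot, or aesthetic analysis scores in the breakdown, since the post does not describe how those were computed.

```python
from statistics import mean

# (title, aesthetic score, prompt CLIP score, likelihood of AI), copied from the post.
RESULTS = [
    ("A Hiker’s Journey to the Majestic Peak", 0.80, 0.31, 0.90),
    ("Friends Gather for a Night of Excitement and Fun", 0.70, 0.35, 0.20),
    ("Heroic Silhouette Against the Setting Sun", 0.70, 0.28, 0.90),
    ("Street Performer Brings Life to a Charming Alleyway", 0.70, 0.32, 0.10),
    ("A Journey Begins at Sunset", 0.60, 0.32, 0.20),
    ("Young Ensemble Takes Center Stage with Energetic Performance", 0.40, 0.26, 0.20),
    ("Lost in the Mist: A Solitary Figure Seeks Hope in the Forest", 0.75, 0.31, 0.80),
    ("Friends Gather for a Game of Mystery", 0.60, 0.30, 0.30),
    ("Sunset Romance: A Couple’s Embrace on the Beach", 0.80, 0.29, 0.20),
    ("Band Ignites Stage with Joyful Performance", 0.70, 0.24, 0.10),
]


def summarize(results):
    """Return the mean of each per-image metric, rounded to three decimals."""
    return {
        "aesthetic": round(mean(r[1] for r in results), 3),
        "clip": round(mean(r[2] for r in results), 3),
        "ai_likelihood": round(mean(r[3] for r in results), 3),
    }


def lowest_by(results, index):
    """Return the title of the entry with the smallest value in the given column."""
    return min(results, key=lambda r: r[index])[0]


if __name__ == "__main__":
    print(summarize(RESULTS))
    print("Lowest aesthetic score:", lowest_by(RESULTS, 1))
    print("Lowest prompt CLIP score:", lowest_by(RESULTS, 2))
```

On these numbers the mean aesthetic score is 0.675, the mean prompt CLIP score is 0.298, and the mean likelihood of AI is 0.39. The lowest aesthetic score belongs to the Young Ensemble image, and the lowest prompt CLIP score to the Band image.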