AI Captures the Essence, But Misses the Shot with Imagen-v3
The world of generative AI is rapidly evolving, with models capable of creating stunningly realistic images from text prompts. However, these models are not without their limitations. This blog post examines a recent experiment where a generative AI model was tasked with creating images based on specific prompts, focusing on the model’s ability to capture both the aesthetic style and technical details of the scene. The results reveal a fascinating insight into the strengths and weaknesses of current AI image generation technology.
One of the key findings is that the model excels at capturing the overall mood and aesthetic style of the prompt. For example, when asked to create an image of a lone figure silhouetted against a setting sun, the model successfully generated an image that conveyed a sense of heroism and vastness. However, the model struggled with accurately representing the camera position and shot composition, often deviating from the intended perspective. This suggests that while AI models are becoming increasingly adept at understanding and replicating artistic styles, they still have room for improvement in terms of technical precision.
Created with: imagen-v3
Silhouetted Against the Setting Sun: A Lone Figure in the Desert
A solitary figure stands silhouetted against a vibrant sunset, their back to the viewer, holding a tool or weapon. The vast, desolate desert landscape creates a sense of mystery and isolation, emphasizing the figure’s smallness in the expanse. The dramatic lighting and contemplative mood evoke a sense of loneliness and introspection.
Prompt
poses leaning: epic, hopeful ; A lone figure, silhouetted against a setting sun; wide shot; heroism; a vast, desolate landscape; cinematic
Characteristic
Shot : A lone figure stands silhouetted against a vibrant sunset, their back to the viewer, holding a tool or weapon. The setting appears to be a vast desert landscape, likely a dry lakebed or playa.
Aesthetic Score : 0.7
Mood : dramatic, contemplative, desolate
Quality
Entropy : 6.29
Noise : 57
Prompt Clip Score : 0.30
AI Evaluation
Likelihood of AI : 0.20
Image errors : No noticeable errors; the image is clean and sharp.
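The post does not say how the Entropy value is computed. One common reading, and only an assumption here, is the Shannon entropy of the image's 8-bit grayscale histogram, measured in bits. Under that reading the value ranges from 0 (a flat image) to 8 (all 256 levels equally likely), which is consistent with the 5.7 to 6.9 range reported in this post. A minimal sketch, with `shannon_entropy` as a hypothetical helper name:

```python
import math
from collections import Counter


def shannon_entropy(pixels):
    """Return the Shannon entropy, in bits, of a flat sequence of pixel values.

    Assumption: this mirrors a histogram-based image entropy; the post does
    not document the actual formula behind its "Entropy" figure.
    """
    total = len(pixels)
    if total == 0:
        raise ValueError("pixels must not be empty")
    counts = Counter(pixels)
    # Sum -p * log2(p) over every gray level that actually occurs.
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


if __name__ == "__main__":
    flat_image = [128] * 64          # one gray level only
    full_range = list(range(256))    # every 8-bit level exactly once
    print(shannon_entropy(flat_image))
    print(shannon_entropy(full_range))
```

A real image would first be converted to grayscale and flattened into a list of 0 to 255 values before being passed in.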
Into the Unknown: Adventurers Brave a Shadowy Cave
A group of intrepid explorers venture deep into a dark and foreboding cave, their torches casting flickering shadows that dance on the rough walls. Their expressions are etched with a mixture of anticipation and apprehension, hinting at the dangers that may lie ahead. The scene is both thrilling and unsettling, promising an adventure filled with mystery and suspense.
Prompt
poses leaning: suspenseful, adventurous ; A group of adventurers, their faces illuminated by flickering torchlight; medium shot; adventure; a dark, mysterious cave; cinematic
Characteristic
Shot : A group of adventurers are walking through a dark cave, illuminated by torches. The characters are facing the camera. They look alert and cautious, as if they are aware of danger.
Aesthetic Score : 0.6
Mood : intense, mysterious, adventurous
Quality
Entropy : 5.99
Noise : 87
Prompt Clip Score : 0.29
AI Evaluation
Likelihood of AI : 0.10
Image errors : The image has some minor artifacts, particularly around the edges of the characters. These artifacts are not very noticeable, but they are present. There are also some minor color banding issues in the shadows.
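The Prompt Clip Score is not defined in the post either. CLIP-based scores are commonly derived from the cosine similarity between an image embedding and a text embedding, so the sketch below shows only that step. The vectors are made-up placeholders, not real CLIP outputs, and `cosine_similarity` is a hypothetical helper; whether this post's figure is a raw or rescaled similarity is unknown.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("vectors must be non-zero")
    return dot / (norm_a * norm_b)


if __name__ == "__main__":
    # Placeholder embeddings for illustration only (assumption: a real
    # pipeline would obtain these from a CLIP image and text encoder).
    image_embedding = [0.2, 0.1, 0.7, 0.4]
    prompt_embedding = [0.6, 0.3, 0.1, 0.2]
    print(round(cosine_similarity(image_embedding, prompt_embedding), 2))
```

Under this reading, a higher value means the image and the prompt sit closer together in the embedding space.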
The Glow of Focus: A Close-Up on Typing in the Dark
A dimly lit room, a keyboard bathed in green light, and hands moving with purpose. This image captures the intensity and focus of someone deeply engaged in their work, highlighting the techy atmosphere and the intimate connection between human and machine.
Prompt
poses leaning: intense, focused ; A gamer’s hands, fingers flying across a keyboard; close-up; gaming; a brightly lit gaming setup; cinematic
Characteristic
Shot : A person is typing on a keyboard in a dimly lit room. The keyboard is illuminated with green backlighting, highlighting the hands and fingers.
Aesthetic Score : 0.6
Mood : intense, focused, techy
Quality
Entropy : 6.36
Noise : 68
Prompt Clip Score : 0.28
AI Evaluation
Likelihood of AI : 0.20
Image errors : The image quality is good, but the lighting is a bit uneven, casting some shadows that obscure details.
Silhouette of Love: A Dreamy Night in the City
Experience the magic of a romantic night as a couple stands silhouetted against the sparkling city skyline. The dreamy scene, filled with contemplation and wonder, is a testament to the enchanting power of love and the beauty of cityscapes at night.
Prompt
poses leaning: romantic, awe-inspiring ; A couple leaning on a railing, gazing out at a breathtaking cityscape; medium shot; tourism; a vibrant, bustling city; cinematic
Characteristic
Shot : A couple is silhouetted against a city skyline at night, looking out at the view from a hilltop.
Aesthetic Score : 0.7
Mood : romantic, dreamy, contemplative
Quality
Entropy : 6.78
Noise : 110
Prompt Clip Score : 0.31
AI Evaluation
Likelihood of AI : 0.10
Image errors : There are no significant artifacts or errors in the image.
Contemplating the Peaks: A Hiker Finds Tranquility on a Winding Mountain Road
A lone hiker, backpack in tow, leans against a post on the side of a mountain road, taking in the breathtaking vista of a winding mountain pass. The vastness of the mountains, the sense of isolation, and the dramatic composition create a tranquil and contemplative mood, hinting at the adventurous spirit of the journey.
Prompt
poses leaning: reflective, adventurous ; A backpacker, leaning against a weathered signpost, looking out at a winding mountain road; medium shot; travel; a scenic mountain range; cinematic
Characteristic
Shot : A lone hiker with a backpack leans against a post on the side of a mountain road, looking out over a winding mountain pass.
Aesthetic Score : 0.7
Mood : tranquil, contemplative, adventurous
Quality
Entropy : 6.75
Noise : 93
Prompt Clip Score : 0.31
AI Evaluation
Likelihood of AI : 0.10
Image errors : No noticeable artifacts or errors.
Laughter and Light: Friends Embrace the Joy of European Travel
A group of six young adults stroll down a charming cobblestone street, their laughter echoing through the air. The warm light and relaxed postures radiate a sense of happiness and camaraderie, capturing the essence of carefree friendship and the joy of exploring a new city.
Prompt
poses leaning: joyful, carefree ; A group of friends, laughing and leaning on each other, as they walk down a cobblestone street; wide shot; groups; a charming, historic town; cinematic
Characteristic
Shot : A group of six young adults walking down a cobblestone street in a European city, laughing and enjoying each other’s company.
Aesthetic Score : 0.7
Mood : happy, carefree, friendly
Quality
Entropy : 6.85
Noise : 95
Prompt Clip Score : 0.31
AI Evaluation
Likelihood of AI : 0.10
Image errors : No visible artifacts or errors.
A Solitary Figure Contemplates the Stormy Sea
A lone figure stands on a cliff edge, arms outstretched, facing the tumultuous waves and dark sky. The dramatic contrast between the small figure and the vastness of the sea creates a powerful image of vulnerability and contemplation. The stormy weather adds to the sense of danger and uncertainty, making this a truly captivating scene.
Prompt
poses leaning: powerful, defiant ; A lone figure, standing on a cliff edge, arms outstretched, leaning into the wind; wide shot; heroism; a dramatic, stormy sea; cinematic
Characteristic
Shot : A lone figure stands on a cliff edge overlooking a stormy sea, arms outstretched. The dark sky and churning waves create a dramatic and powerful image.
Aesthetic Score : 0.8
Mood : dramatic, powerful, contemplative
Quality
Entropy : 6.39
Noise : 92
Prompt Clip Score : 0.34
AI Evaluation
Likelihood of AI : 0.10
Image errors : There is slight blurriness in the image, particularly around the figure, possibly due to movement. The composition could also be improved: the figure is slightly off-center, and centering it would add more power to the image.
Lost in the Wilderness: A Campfire Under a Starless Sky
Four adventurers huddle around a flickering campfire, their faces illuminated by the dancing flames. The dense forest surrounding them is shrouded in darkness, creating an atmosphere of mystery and suspense. Will they find their way out, or will the shadows of the night consume them?
Prompt
poses leaning: intimate, suspenseful ; A group of explorers, huddled around a campfire, sharing stories; medium shot; adventure; a dense, mysterious forest; cinematic
Characteristic
Shot : Four people are gathered around a campfire in a dense forest, likely at night. They are wearing clothing that suggests an expedition or adventure.
Aesthetic Score : 0.6
Mood : mysterious, suspenseful, adventurous
Quality
Entropy : 5.69
Noise : 94
Prompt Clip Score : 0.33
AI Evaluation
Likelihood of AI : 0.10
Image errors : The image is slightly grainy, and there is some noise in the darker areas. The lighting is somewhat uneven, with the faces of the people around the campfire being less well-lit than the fire itself.
The Moment He Knew He Won
A young gamer, bathed in the vibrant glow of his setup, stares at his screen with a mix of surprise and intense focus. The blue and purple lights cast dramatic shadows, highlighting the moment of triumph.
Prompt
poses leaning: intense, focused ; A gamer’s face, illuminated by the glow of a monitor, eyes wide with excitement; close-up; gaming; a dimly lit room; cinematic
Characteristic
Shot : A young man wearing headphones and a gaming jersey is looking at a screen with a surprised expression. The scene is lit with blue and purple lights, typical of gaming setups.
Aesthetic Score : 0.6
Mood : intense, focused, surprised
Quality
Entropy : 5.98
Noise : 69
Prompt Clip Score : 0.31
AI Evaluation
Likelihood of AI : 0.20
Image errors : The image has some minor artifacts, particularly around the edges of the headphones and the man’s hair.
Sunset Romance on the Beach
A couple finds solace and intimacy as the sun dips below the horizon, painting the sky in vibrant hues of orange and pink. The vast ocean serves as a breathtaking backdrop, emphasizing the quiet beauty of their moment.
Prompt
poses leaning: peaceful, heartwarming ; leaning on each other, watching a sunset over a vast ocean; wide shot; travel; a serene, sandy beach; cinematic
Characteristic
Shot : A couple is sitting on a beach at sunset, with their backs to the camera. The sun is setting in the distance, and the sky is a beautiful orange and pink.
Aesthetic Score : 0.7
Mood : romantic, peaceful, cozy
Quality
Entropy : 6.71
Noise : 80
Prompt Clip Score : 0.30
AI Evaluation
Likelihood of AI : 0.20
Image errors : The image is slightly overexposed, particularly in the sky. The shadows are also a bit flat. The couple’s skin tones are slightly muted.
Conclusion
The results show that the generative AI model performed well on aesthetic analysis but struggled with camera position and shot analysis. Here’s a breakdown:
- Camera Position: The model scored 0.35, which is considered below average. This suggests that the model didn’t accurately capture the intended camera positions described in the prompt.
- Shot Analysis: The model scored 0.44, also below average. This indicates that the model didn’t fully understand the scene described in the prompt and didn’t create the expected shot composition.
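- Aesthetic Analysis: The model scored 0.08, which is considered very good. Judging by the labels attached to these three scores, lower values appear to indicate a closer match, so this means the generated images closely matched the desired aesthetic style.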
Overall, the model seems to be better at understanding and capturing the aesthetic style of the prompt than it is at accurately representing camera positions and shot composition.
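The three conclusion scores are not broken down per image, so they cannot be reproduced from the listings above. What can be done is a rough cross-check using the per-image numbers: the sketch below groups the ten images by the shot type requested in each prompt and averages the reported Aesthetic Score and Prompt Clip Score. The grouping and the `summarize_by_shot` helper are illustrative assumptions, and with only ten images the averages are descriptive at best.

```python
from collections import defaultdict
from statistics import mean

# (requested shot type, Aesthetic Score, Prompt Clip Score), in post order.
RESULTS = [
    ("wide", 0.7, 0.30),      # lone figure in the desert
    ("medium", 0.6, 0.29),    # adventurers in a cave
    ("close-up", 0.6, 0.28),  # typing in the dark
    ("medium", 0.7, 0.31),    # couple and city skyline
    ("medium", 0.7, 0.31),    # hiker on a mountain road
    ("wide", 0.7, 0.31),      # friends on a cobblestone street
    ("wide", 0.8, 0.34),      # figure facing the stormy sea
    ("medium", 0.6, 0.33),    # campfire in the forest
    ("close-up", 0.6, 0.31),  # gamer's face
    ("wide", 0.7, 0.30),      # couple on the beach
]


def summarize_by_shot(results):
    """Map each shot type to (image count, mean aesthetic, mean CLIP score)."""
    grouped = defaultdict(list)
    for shot, aesthetic, clip in results:
        grouped[shot].append((aesthetic, clip))
    return {
        shot: (
            len(rows),
            mean([a for a, _ in rows]),
            mean([c for _, c in rows]),
        )
        for shot, rows in grouped.items()
    }


if __name__ == "__main__":
    for shot, (count, aesthetic, clip) in summarize_by_shot(RESULTS).items():
        print(f"{shot}: n={count}, aesthetic={aesthetic:.2f}, clip={clip:.2f}")
```

Neither metric measures whether the requested shot type was actually honored, so this summary only describes how the aesthetic and prompt-match numbers vary across the requested framings.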
Sources:
- https://www.writerswrite.co.za/cheat-sheets-for-writing-body-language/
- https://mads3df.wordpress.com/2013/09/04/storytelling-poses/
- https://www.pinterest.com/pegasister890/character-poses/
- https://www.youtube.com/watch?v=udky6ANxWws
- https://maven.com/articles/storytelling-techniques
- https://deepmind.google/technologies/imagen-3/