AI's Artistic Struggle: Capturing the Scene, Not the Feeling with Imagen-v3
In the realm of artificial intelligence, the ability to generate images based on text prompts is a rapidly evolving field. While impressive progress has been made, there are still challenges in accurately translating complex visual concepts into realistic images. This blog post delves into the results of a generative AI model tasked with creating images based on detailed scene descriptions, highlighting its strengths and weaknesses in capturing the essence of a scene.
Created with: imagen-v3
Solitude on the Summit: A Moment of Contemplation
A lone figure stands on the peak of a mountain, gazing out at a vast, cloudy sky. The mountains are shrouded in mist, creating an atmosphere of serenity and solitude. The dramatic contrast between the dark sky and the bright light evokes a sense of awe and wonder, while the lone figure adds a sense of scale and perspective to the scene.
Prompt
poses thoughtful-pose: determined, contemplative ; Lone figure standing on a mountain peak; wide shot; heroism; dramatic sky with clouds; cinematic
Characteristic
Shot : A lone figure stands on the peak of a mountain, looking out at a vast, cloudy sky. The mountains are shrouded in mist, and the overall atmosphere is one of solitude and contemplation.
Aesthetic Score : 0.8
Mood : serene, contemplative, solitary
Quality
Entropy : 6.67
Noise : 83
Prompt Clip Score : 0.31
AI Evaluation
Likelihood of AI : 0.10
Image errors : No noticeable errors. Image is well-composed and there are no compression artifacts or distortions.
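The post does not say how the Entropy figure in each Quality block is computed. One common reading, assumed here and not confirmed by the source, is the Shannon entropy of an 8-bit grayscale histogram, which tops out at 8 bits and would fit the 5.20 to 6.97 range reported in this post. A minimal sketch under that assumption (the function name `shannon_entropy` is hypothetical):

```python
import math
from collections import Counter


def shannon_entropy(pixels):
    """Shannon entropy, in bits, of a sequence of pixel intensity values.

    Assumption: this mirrors a histogram-based image entropy; the post
    does not state which formula its tooling actually uses.
    """
    total = len(pixels)
    if total == 0:
        return 0.0
    counts = Counter(pixels)
    # Sum -p * log2(p) over every intensity value that occurs.
    return sum(-(c / total) * math.log2(c / total) for c in counts.values())


# A flat image carries no information; an image that uses all 256
# intensity levels equally often reaches the 8-bit maximum.
flat = [128] * 1024
uniform = list(range(256)) * 4
print(shannon_entropy(flat), shannon_entropy(uniform))
```

Under this reading, a lower value such as 5.20 would indicate an image dominated by a narrow band of tones (for example, a dark night scene), while values near 7 would indicate a wider spread of tones.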
Lost in the Jungle: A Man’s Quest for Discovery
A lone explorer, shrouded in the verdant embrace of a jungle, meticulously studies a map before an ancient stone building. His focused expression and the mysterious surroundings create a palpable sense of suspense and anticipation, hinting at a thrilling adventure yet to unfold.
Prompt
poses thoughtful-pose: curious, adventurous ; Explorer looking at a map, surrounded by ancient ruins; medium shot; adventure; jungle foliage; cinematic
Characteristic
Shot : A man in a jungle, wearing a backpack, is studying a map in front of an ancient stone building.
Aesthetic Score : 0.6
Mood : mysterious, adventurous, contemplative
Quality
Entropy : 6.48
Noise : 86
Prompt Clip Score : 0.38
AI Evaluation
Likelihood of AI : 0.20
Image errors : No noticeable errors.
Lost in the Neon Glow: A Gamer’s Intense Focus
A young man, headphones on and controller in hand, is completely immersed in a video game. The dimly lit room, bathed in neon light, creates a futuristic atmosphere, highlighting the intensity of his gaming experience.
Prompt
poses thoughtful-pose: intense, focused ; Gamer intensely focused on a screen, hands on a controller; close-up; gaming; neon lights and gaming peripherals; cinematic
Characteristic
Shot : A young man is playing video games in a dimly lit room. He is wearing headphones and is holding a game controller. The room is lit by neon lights, giving it a futuristic feel. The focus is on the man’s face and the controller.
Aesthetic Score : 0.6
Mood : focused, intense, futuristic
Quality
Entropy : 6.49
Noise : 83
Prompt Clip Score : 0.33
AI Evaluation
Likelihood of AI : 0.20
Image errors : No noticeable errors, except minor blur around the left hand. Overall, the image is sharp and well-defined.
A Solitary Figure Contemplates the Urban Night
A lone individual stands on a stone platform, silhouetted against the dazzling lights of a sprawling cityscape. The scene evokes a sense of awe and wonder, capturing the serenity and contemplative mood of the urban night.
Prompt
poses thoughtful-pose: awe-struck, contemplative ; Tourist gazing at a breathtaking cityscape; medium shot; tourism; bustling city streets; cinematic
Characteristic
Shot : A lone person stands on a stone platform overlooking a vast cityscape at night, looking out towards a distant skyline of skyscrapers and lights.
Aesthetic Score : 0.7
Mood : serene, contemplative, urban
Quality
Entropy : 6.67
Noise : 106
Prompt Clip Score : 0.29
AI Evaluation
Likelihood of AI : 0.10
Image errors : Slight chromatic aberration visible in the edges of the image
Golden Hour Romance on the Cliffside
A couple finds tranquility and adventure perched on a dramatic cliff overlooking a vast ocean at sunset. The scene evokes a sense of awe and perspective, capturing the beauty of golden hour in a serene and romantic setting.
Prompt
poses thoughtful-pose: relaxed, introspective ; Backpackers sitting on a cliff overlooking a vast ocean; wide shot; travel; sunset sky; cinematic
Characteristic
Shot : A couple is sitting on a cliff overlooking a vast body of water, possibly the ocean, during golden hour.
Aesthetic Score : 0.7
Mood : tranquil, serene, adventurous
Quality
Entropy : 6.97
Noise : 102
Prompt Clip Score : 0.30
AI Evaluation
Likelihood of AI : 0.20
Image errors : No visible errors.
Campfire Glow: Friends Gather for a Cozy Night in the Woods
A group of friends huddle around a crackling campfire, their faces illuminated by the warm glow. The scene evokes a sense of intimacy and warmth, with the firelight creating a dramatic contrast against the surrounding darkness. The cold night air is palpable, but the warmth of the fire and the company of friends create a cozy and inviting atmosphere.
Prompt
poses thoughtful-pose: intimate, nostalgic ; Group of friends huddled around a campfire, sharing stories; medium shot; groups; starry night sky; cinematic
Characteristic
Shot : A group of friends gathered around a campfire at night in the woods. The fire is casting a warm glow on their faces and creating a cozy atmosphere. They are all wearing warm clothing, which suggests that it is cold outside.
Aesthetic Score : 0.7
Mood : cozy, intimate, warm
Quality
Entropy : 5.20
Noise : 93
Prompt Clip Score : 0.33
AI Evaluation
Likelihood of AI : 0.10
Image errors : None, the image is well-exposed and sharp.
Silhouetted in the City
A solitary figure stands against the vibrant backdrop of a city skyline, lost in contemplation as the urban lights twinkle below. The silhouette evokes a sense of mystery and solitude, capturing the essence of a quiet moment amidst the bustling city.
Prompt
poses thoughtful-pose: reflective, hopeful ; A lone figure standing on a bridge, looking out at the city lights; medium shot; heroism; cityscape at night; cinematic
Characteristic
Shot : A man stands silhouetted against a cityscape at night, looking out over a railing at the city lights.
Aesthetic Score : 0.6
Mood : solitude, contemplative, urban
Quality
Entropy : 5.44
Noise : 83
Prompt Clip Score : 0.34
AI Evaluation
Likelihood of AI : 0.20
Image errors : The image is slightly blurry and the exposure is a bit too dark. The person in the foreground is out of focus, making the image less appealing.
Lost in the Foggy Jungle
A group of four adventurers navigate a dense, verdant jungle shrouded in mist. The atmosphere is thick with mystery and foreboding, leaving the path ahead uncertain. Will they find their way out, or will the jungle claim them?
Prompt
poses thoughtful-pose: determined, cautious ; A group of adventurers navigating a dense forest; wide shot; adventure; lush green foliage; cinematic
Characteristic
Shot : A group of four people are walking through a dense jungle, following a path. The jungle is very lush and green, with many trees and vines. There is a lot of fog in the air, making it difficult to see far ahead.
Aesthetic Score : 0.7
Mood : mysterious, adventurous, foreboding
Quality
Entropy : 6.59
Noise : 103
Prompt Clip Score : 0.33
AI Evaluation
Likelihood of AI : 0.70
Image errors : The image seems to have some slight blurring, making the details in the distance look soft.
Victory Dance! Gamer Celebrates Triumph with Joyful Fist Pump
This image captures the pure joy of victory. A young man, headphones on, sits before his computer, beaming with happiness and raising his fists in the air. The vibrant lighting and his energetic expression create a sense of excitement and triumph, perfectly encapsulating the thrill of winning a video game.
Prompt
poses thoughtful-pose: triumphant, excited ; A gamer celebrating a victory, fist raised in the air; close-up; gaming; vibrant gaming setup; cinematic
Characteristic
Shot : A young man is sitting in front of a computer, wearing headphones, celebrating a victory in a video game. He has a happy expression on his face and his fists are raised in the air.
Aesthetic Score : 0.7
Mood : joyful, triumphant, energetic
Quality
Entropy : 6.33
Noise : 83
Prompt Clip Score : 0.30
AI Evaluation
Likelihood of AI : 0.10
Image errors : No major errors detected.
Silhouetted Serenity: A Man Contemplates the Sunset
A solitary figure sits on a cliff, bathed in the golden hues of a setting sun. The scene evokes a sense of peace and contemplation, as the man’s silhouette against the vibrant sky creates a powerful image of solitude and introspection.
Prompt
poses thoughtful-pose: Solitude, anticipation ; A lone figure silhouetted against the horizon, watching the sun rise over the vast, shimmering ocean.; cinematic
Characteristic
Shot : A man sitting on a cliff overlooking the ocean at sunset.
Aesthetic Score : 0.7
Mood : serene, contemplative, peaceful
Quality
Entropy : 5.28
Noise : 68
Prompt Clip Score : 0.30
AI Evaluation
Likelihood of AI : 0.20
Image errors : There are no noticeable artifacts or errors in the image.
Conclusion
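The aggregate scores point to a gap between how well the model matched the intended aesthetic and how well it captured the scene and camera position. Lower values appear to indicate a closer match here, since 0.04 is rated very good while 0.4 and 0.49 are rated below average. Here’s a breakdown: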
- Camera Position: The model scored 0.4, which is considered below average. This suggests that the model didn’t accurately capture the intended camera position described in the prompt.
- Shot Analysis: The model scored 0.49, which is considered below average. This indicates that the model didn’t fully understand the scene described in the prompt and didn’t create an image that accurately reflects it.
- Aesthetic Analysis: The model scored 0.04, which is considered very good. This means the generated image closely matched the expected aesthetic style.
Overall, the model seems to be better at capturing the desired aesthetic than understanding the scene and camera position. This suggests that the model might need further training to improve its ability to interpret and translate prompts into accurate visual representations.
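For readers who want to compare the ten images side by side, the per-image numbers reported above can be tabulated and averaged. This is a sketch that only transcribes the values listed in this post; the field names are illustrative, and these averages are separate from the Camera Position, Shot Analysis, and Aesthetic Analysis scores in the conclusion, whose calculation the post does not describe.

```python
from statistics import mean

# Per-image metrics transcribed from the sections above:
# (title, aesthetic score, entropy, noise, prompt CLIP score, likelihood of AI)
RESULTS = [
    ("Solitude on the Summit", 0.8, 6.67, 83, 0.31, 0.10),
    ("Lost in the Jungle", 0.6, 6.48, 86, 0.38, 0.20),
    ("Lost in the Neon Glow", 0.6, 6.49, 83, 0.33, 0.20),
    ("A Solitary Figure Contemplates the Urban Night", 0.7, 6.67, 106, 0.29, 0.10),
    ("Golden Hour Romance on the Cliffside", 0.7, 6.97, 102, 0.30, 0.20),
    ("Campfire Glow", 0.7, 5.20, 93, 0.33, 0.10),
    ("Silhouetted in the City", 0.6, 5.44, 83, 0.34, 0.20),
    ("Lost in the Foggy Jungle", 0.7, 6.59, 103, 0.33, 0.70),
    ("Victory Dance!", 0.7, 6.33, 83, 0.30, 0.10),
    ("Silhouetted Serenity", 0.7, 5.28, 68, 0.30, 0.20),
]

# Illustrative column names, in the same order as the tuple fields above.
FIELDS = ("aesthetic", "entropy", "noise", "clip_score", "ai_likelihood")


def summarize(results):
    """Return the mean of each metric column across all images."""
    columns = zip(*(row[1:] for row in results))
    return {name: mean(col) for name, col in zip(FIELDS, columns)}


summary = summarize(RESULTS)
# The image its evaluator rated most likely to be AI-generated.
most_ai_like = max(RESULTS, key=lambda row: row[5])

for name, value in summary.items():
    print(f"{name}: {value:.3f}")
print("Highest likelihood of AI:", most_ai_like[0], most_ai_like[5])
```

The only clear outlier in the table is the foggy jungle image, whose Likelihood of AI of 0.70 is well above the 0.10 to 0.20 range of the other nine.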
Sources:
- https://www.writerswrite.co.za/cheat-sheets-for-writing-body-language/
- https://mads3df.wordpress.com/2013/09/04/storytelling-poses/
- https://www.pinterest.com/pegasister890/character-poses/
- https://www.youtube.com/watch?v=udky6ANxWws
- https://maven.com/articles/storytelling-techniques
- https://deepmind.google/technologies/imagen-3/