AI Captures the Scene, But Misses the Mark on Style with DALL-E 3
Image generation has become one of the most visible areas of artificial intelligence, with models that create striking visuals from text prompts. In this experiment, the model understood the described scene well, but it fell short on the requested camera position and, above all, on the desired aesthetic style. That gap highlights how hard it remains to give AI a nuanced grasp of artistic expression. This blog post walks through the results of the experiment, image by image, and explores the strengths and limitations of a current AI image generation model.
Created with: dall-e-3
Silhouetted Against the Flame: A Moment of Contemplation
A man’s profile, stark against a fiery sunset, evokes a sense of intensity and contemplation. The dramatic lighting and mountain range backdrop create a powerful and mysterious scene.
Prompt
poses close-up: epic, determined ; A lone figure, silhouetted against a blazing sunset; close-up; heroism; a vast, desolate landscape; cinematic
Characteristic
Shot : A man in silhouette with a beard, facing away from the camera, standing in front of a red and orange sunset. The sun is setting over a mountain range in the distance, creating a warm and inviting feel.
Aesthetic Score : 0.6
Mood : dramatic, contemplative, hopeful
Quality
Entropy : 6.43
Noise : 101
Prompt Clip Score : 0.25
AI Evaluation
Likelihood of AI : 0.80
Image errors : The image has some slight digital artifacts, particularly in the background, and some graininess. The colors are a little too saturated.
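The post does not say how the Entropy figure in each Quality block is computed. One common reading, and only an assumption here, is the Shannon entropy of the image's 8-bit grayscale histogram, which ranges from 0 bits (a single flat tone) to 8 bits (all 256 levels equally likely). A minimal sketch of that calculation, using hypothetical toy pixel lists rather than the images above:

```python
import math
from collections import Counter


def shannon_entropy(pixels):
    """Shannon entropy, in bits, of a flat sequence of pixel intensities."""
    if not pixels:
        raise ValueError("pixels must not be empty")
    total = len(pixels)
    counts = Counter(pixels)
    # Sum p * log2(1/p) over every intensity level that actually occurs.
    return sum((c / total) * math.log2(total / c) for c in counts.values())


# Hypothetical toy inputs, not pixel data from the images in this post.
flat_image = [128] * 256            # one grey level everywhere
uniform_image = list(range(256))    # every 8-bit level exactly once

print(shannon_entropy(flat_image))     # lowest possible value
print(shannon_entropy(uniform_image))  # highest possible value for 8-bit data
```

Under this assumption, values in the 6.0 to 6.8 range, like those reported throughout this post, would indicate images that use most of the tonal range without being uniformly noisy.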
Unveiling Secrets: A Journey Through Time
A close-up shot captures the essence of exploration as hands trace a vintage map, surrounded by antique globes and lanterns. The mysterious lighting and composition create a sense of intrigue, inviting you to discover the hidden stories within.
Prompt
poses close-up: intrigued, adventurous ; A weathered map, its edges frayed, with a finger tracing a route; close-up; adventure; a dimly lit room filled with antique maps and globes; cinematic
Characteristic
Shot : A close-up of a hand tracing a map, with other antique globes and maps in the background, lit by a lantern.
Aesthetic Score : 0.8
Mood : mysterious, historical, adventurous
Quality
Entropy : 6.51
Noise : 99
Prompt Clip Score : 0.30
AI Evaluation
Likelihood of AI : 0.20
Image errors : The image has some minor artifacts, particularly in the shadows.
In the Zone: A Gamer’s Intense Focus
A young man is immersed in a video game, his face illuminated by the glow of multiple monitors. The dimly lit room and close-up perspective capture the intensity and focus of his competitive spirit. The blue and purple hues add a dramatic touch to the scene, highlighting the gamer’s dedication and passion.
Prompt
poses close-up: intense, focused ; A gamer’s hands, fingers flying across a keyboard, eyes glued to the screen; close-up; gaming; a dimly lit room with neon lights reflecting on the screen; cinematic
Characteristic
Shot : A man is playing a video game in a dimly lit room with other players in the background, focused on his hands typing on the keyboard.
Aesthetic Score : 0.6
Mood : intense, focused, competitive
Quality
Entropy : 6.53
Noise : 85
Prompt Clip Score : 0.27
AI Evaluation
Likelihood of AI : 0.30
Image errors : No visible errors in the image.
Through the Lens: A Dreamy Escape to the Clouds
A hand cradles a camera lens, framing a breathtaking vista of mountain peaks shrouded in mist. Tiny planes trace paths across the sky, mirroring the scene captured within the lens itself. This surreal and self-referential image evokes a sense of adventure and wonder, inviting you to step into the photographer’s dreamy perspective.
Prompt
poses close-up: awe-inspiring, wonder ; A hand holding a camera, capturing a breathtaking vista; close-up; tourism; a panoramic view of a mountain range with clouds swirling below; cinematic
Characteristic
Shot : A hand holding a camera lens, framing a view of a mountain range with clouds and a sunrise in the distance.
Aesthetic Score : 0.7
Mood : dreamy, serene, majestic
Quality
Entropy : 6.62
Noise : 94
Prompt Clip Score : 0.31
AI Evaluation
Likelihood of AI : 0.80
Image errors : The edges of the lens appear slightly blurred, and the reflection in the lens is somewhat distorted.
Ready for Adventure: A Still Life of Travel Preparation
A passport, compasses, a globe, and other travel essentials are arranged on a table, surrounded by maps and paper, creating a nostalgic and hopeful scene of anticipation for an upcoming adventure.
Prompt
poses close-up: nostalgic, adventurous ; A passport, open to a page with a stamp from a foreign country; close-up; travel; a cluttered backpack overflowing with travel essentials; cinematic
Characteristic
Shot : A flat lay of travel essentials, including a passport, compasses, a globe, a map, a pen, and a backpack.
Aesthetic Score : 0.8
Mood : adventurous, nostalgic, vintage
Quality
Entropy : 6.50
Noise : 109
Prompt Clip Score : 0.30
AI Evaluation
Likelihood of AI : 0.60
Image errors : There are some minor artifacts in the image, particularly around the edges of the objects.
United by Fire: A Powerful Image of Strength and Togetherness
This captivating image depicts a diverse group of individuals holding hands around a roaring bonfire, their faces illuminated by the warm glow. The scene evokes a sense of unity, warmth, and togetherness, highlighting the power of connection in the face of adversity.
Prompt
poses close-up: warm, connected ; A group of hands, clasped together in a circle, symbolizing unity; close-up; groups; a campfire burning brightly in the background; cinematic
Characteristic
Shot : A group of people stand in a circle around a campfire. Their hands are clasped together, forming a ring around the flames.
Aesthetic Score : 0.7
Mood : hopeful, unity, warmth
Quality
Entropy : 6.36
Noise : 101
Prompt Clip Score : 0.31
AI Evaluation
Likelihood of AI : 0.80
Image errors : The fire has a slightly unnatural, blown-up effect, and the hands appear a bit blurry.
A Tear Reflects the Battlefield
A close-up shot captures the raw emotion of a man’s grief, a single tear tracing a path down his cheek. The reflection in his eye reveals a scene of devastation, mirroring the turmoil within. The image evokes a profound sense of sadness and loss, leaving a lasting impact on the viewer.
Prompt
poses close-up: tragic, poignant ; A single tear rolling down a hero’s cheek, reflecting the weight of their sacrifice; close-up; heroism; a battlefield littered with fallen comrades; cinematic
Characteristic
Shot : A close-up of a person’s eye with a single tear rolling down their cheek. The tear is reflecting a scene of war, with soldiers and a battlefield in the background.
Aesthetic Score : 0.7
Mood : melancholy, somber, poignant
Quality
Entropy : 6.06
Noise : 89
Prompt Clip Score : 0.25
AI Evaluation
Likelihood of AI : 0.80
Image errors : The water droplets in the tear are not realistic and appear too large and perfectly round. The reflection of the battlefield is somewhat blurry and lacks detail.
Lost and Found: A Compass Points the Way
A close-up shot of an antique compass, its needle pointing north, held in a hand against a backdrop of sun-drenched foliage. The warm light and blurry background create a sense of mystery and adventure, hinting at a journey yet to be taken.
Prompt
poses close-up: uncertain, suspenseful ; A compass needle spinning wildly, pointing in all directions; close-up; adventure; a dense jungle with sunlight filtering through the canopy; cinematic
Characteristic
Shot : A close-up of a compass in a jungle setting with the sun shining through the leaves.
Aesthetic Score : 0.7
Mood : mysterious, adventurous, hopeful
Quality
Entropy : 6.67
Noise : 108
Prompt Clip Score : 0.31
AI Evaluation
Likelihood of AI : 0.50
Image errors : There is a slight amount of graininess in the image.
Immersed in the Action: A Gamer’s Hands Grip the Controller
A close-up shot captures the intensity of a gamer’s experience. Their hands, gripping a video game controller, are the focal point, while a blurred background of a futuristic action scene adds a sense of drama and excitement. The lighting and composition create a mood of intensity and immersion, transporting the viewer into the heart of the game.
Prompt
poses close-up: exhilarated, competitive ; A joystick, gripped tightly in a gamer’s hand, as they navigate a virtual world; close-up; gaming; a brightly lit arcade with flashing lights and sounds; cinematic
Characteristic
Shot : A person is playing a video game, holding a controller, in a setting reminiscent of a futuristic sci-fi battle with glowing lights and a blurred figure in the background.
Aesthetic Score : 0.7
Mood : intense, futuristic, dramatic
Quality
Entropy : 6.82
Noise : 105
Prompt Clip Score : 0.27
AI Evaluation
Likelihood of AI : 0.80
Image errors : Some minor artifacts in the background, potentially caused by compression or AI generation.
A Vintage Tag, A Mystery Unfolds
A gloved hand clutches a weathered luggage tag, its details obscured by the blur of a bustling airport terminal. The scene whispers of secrets and journeys, leaving you to wonder about the story behind this enigmatic artifact.
Prompt
poses close-up: hopeful, anticipatory ; A luggage tag, with a handwritten note attached, signifying a journey to a new destination; close-up; travel; a bustling airport terminal with people rushing around; cinematic
Characteristic
Shot : A gloved hand holding a vintage luggage tag with a blurred background of a busy airport terminal. The image has a mysterious and suspenseful atmosphere.
Aesthetic Score : 0.6
Mood : suspense, mystery, vintage
Quality
Entropy : 6.80
Noise : 112
Prompt Clip Score : 0.27
AI Evaluation
Likelihood of AI : 0.60
Image errors : The image appears to have some noise and compression artifacts, particularly in the blurred background. There is a bit of oversharpening in the image.
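The Prompt Clip Score reported for each image is not defined in the post. CLIP-based scores are generally built on the cosine similarity between a text embedding of the prompt and an image embedding of the result, so the sketch below shows only that core step. The embedding vectors are hypothetical three-dimensional stand-ins; real CLIP embeddings have hundreds of dimensions and come from the model itself, and whether the post applies any further scaling is unknown.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("vectors must be non-zero")
    return dot / (norm_a * norm_b)


# Hypothetical embeddings for illustration only.
prompt_embedding = [1.0, 0.0, 0.0]
image_embedding = [0.6, 0.8, 0.0]

score = cosine_similarity(prompt_embedding, image_embedding)
print(f"prompt/image similarity: {score:.2f}")
```

If the scores in this post are raw cosine similarities, the narrow 0.25 to 0.31 band across all ten images would be typical for CLIP, where even well-matched prompt and image pairs rarely score much higher.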
Conclusion
The results show that the generative AI model understood the scene well, but struggled with the camera position and, most of all, with the aesthetic aspect. Here’s a breakdown:
- Camera Position: The model scored 0.4, which is considered below average. This suggests that the model didn’t accurately capture the intended camera position described in the prompt.
- Shot Analysis: The model scored 0.57, which is considered good. This indicates that the model was able to understand the scene described in the prompt and create a shot that aligns with it.
- Aesthetic Analysis: The model scored 0.06, which is considered below average. This suggests that the generated image didn’t match the expected aesthetic style described in the prompt.
Overall, the model demonstrated a good understanding of the scene and shot composition, but struggled to accurately capture the intended camera position and aesthetic style.
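The per-image figures listed in the entries above can also be summarized directly. The sketch below averages them with Python's standard library; the numbers are copied from this post, while the column names are labels chosen here for illustration. It does not reproduce the Camera Position, Shot Analysis, or Aesthetic Analysis scores, whose calculation the post does not describe.

```python
from statistics import mean

# Per-image figures copied from the entries in this post, in order:
# (aesthetic score, entropy, noise, prompt CLIP score, likelihood of AI)
rows = [
    (0.6, 6.43, 101, 0.25, 0.80),  # Silhouetted Against the Flame
    (0.8, 6.51, 99, 0.30, 0.20),   # Unveiling Secrets
    (0.6, 6.53, 85, 0.27, 0.30),   # In the Zone
    (0.7, 6.62, 94, 0.31, 0.80),   # Through the Lens
    (0.8, 6.50, 109, 0.30, 0.60),  # Ready for Adventure
    (0.7, 6.36, 101, 0.31, 0.80),  # United by Fire
    (0.7, 6.06, 89, 0.25, 0.80),   # A Tear Reflects the Battlefield
    (0.7, 6.67, 108, 0.31, 0.50),  # Lost and Found
    (0.7, 6.82, 105, 0.27, 0.80),  # Immersed in the Action
    (0.6, 6.80, 112, 0.27, 0.60),  # A Vintage Tag
]

# Illustrative column labels, not names used by the original tooling.
names = ("aesthetic", "entropy", "noise", "clip", "ai_likelihood")

# zip(*rows) transposes the table so each column can be averaged.
averages = {name: mean(column) for name, column in zip(names, zip(*rows))}

for name, value in averages.items():
    print(f"{name}: {value:.3f}")
```

Across the ten images this gives a mean Aesthetic Score of about 0.69, a mean Prompt Clip Score of about 0.28, and a mean Likelihood of AI of about 0.62.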