AI's Artistic Struggle: Capturing Emotion in Visuals with DALL-E 3
Expressing emotion through art is a uniquely human trait. AI has made significant strides in generating realistic images, but capturing the subtle nuances of human emotion remains a challenge. This blog post describes an experiment that explores the limits of AI in translating aesthetic descriptions, particularly those related to emotional expression, into visual art. We examine images that a generative AI model created from specific scene descriptions, and assess how well the model captured camera position, shot content, and aesthetic elements.
Created with: dall-e-3
Silhouetted in the Desert Sun
A solitary figure stands in a vast desert landscape, bathed in the golden light of the setting sun. The dramatic contrast of light and shadow creates a sense of mystery and isolation, inviting contemplation of the vastness of the world.
Prompt
facial-expressions Determination: Solitude and resilience ; A lone figure; eye-level; Single Person; A vast, desolate landscape; cinematic
Characteristic
Shot : A man stands in a vast, dry, desert landscape with a dramatic cloudy sky and sun rays shining through.
Aesthetic Score : 0.7
Mood : lonely, mysterious, dramatic
Quality
Entropy : 6.75
Noise : 93
Prompt Clip Score : 0.19
AI Evaluation
Likelihood of AI : 0.60
Image errors : The desert landscape appears slightly blurry and the man’s face is obscured by shadows.
Last Stand in the Ashes
A lone, muscular figure, forged of metal, kneels amidst the fiery ruins of a city. Their gaze, unwavering and intense, speaks of a hero facing overwhelming odds. The scene is a testament to the enduring spirit of defiance in the face of destruction.
Prompt
facial-expressions Determination: Courage and unwavering resolve ; A hero standing tall; low-angle; Hero; A burning city in the background; cinematic
Characteristic
Shot : A muscular, futuristic cyborg-like figure kneels in front of a burning cityscape. The figure is lit by a warm orange glow from the flames behind him.
Aesthetic Score : 0.7
Mood : dark, intense, heroic
Quality
Entropy : 6.43
Noise : 104
Prompt Clip Score : 0.20
AI Evaluation
Likelihood of AI : 0.90
Image errors : Some slight artifacts can be noticed in the image, particularly around the figure’s edges, which appear slightly jagged.
The Weight of Industry: A Man’s Struggle in the Factory
A dramatic image captures the intensity of labor in an industrial setting. A man, his face etched with determination, pushes a heavy cart, his effort highlighted by the dramatic lighting. The scene evokes a sense of urgency and struggle, showcasing the weight of industry on the individual worker.
Prompt
facial-expressions Determination: Grit and perseverance ; A worker pushing a heavy cart; eye-level; Normal People; A bustling factory floor; cinematic
Characteristic
Shot : A man in overalls is pushing a heavy cart in a factory, surrounded by other workers wearing hardhats. The scene is chaotic and full of movement.
Aesthetic Score : 0.7
Mood : intense, determined, industrial
Quality
Entropy : 6.57
Noise : 106
Prompt Clip Score : 0.28
AI Evaluation
Likelihood of AI : 0.70
Image errors : The image has some minor artifacts in the background and a few blurred areas. The lighting is somewhat uneven, with some areas being overexposed and others being underexposed.
Lost in the Neon Glow: A Gamer’s Focus
A woman sits immersed in a dimly lit gaming room, bathed in the vibrant glow of neon lights and computer screens. The atmosphere is electric, the mood intense, and her focused gaze draws you into the heart of the action. This is a world where the future is now, and the stakes are high.
Prompt
facial-expressions Determination: Concentration and drive ; A gamer intensely focused on a screen; close-up; Gamer; A dimly lit room with glowing monitors; cinematic
Characteristic
Shot : A woman is sitting in a dimly lit room with red and blue lights, playing a video game. The room looks like a gaming arcade with other people playing in the background.
Aesthetic Score : 0.7
Mood : intense, focused, dramatic
Quality
Entropy : 6.57
Noise : 82
Prompt Clip Score : 0.27
AI Evaluation
Likelihood of AI : 0.80
Image errors : The lighting and colors are too saturated and unnatural. The image seems to be AI generated, with some artifacts in the hair and skin.
A Window to Melancholy: Woman Gazes at Desolate Landscape
A woman in a headscarf stands by a window, her soft features contrasting with the harshness of the desolate landscape outside. A field of cars sits under a stormy sky, creating a sense of unease and mystery. The window acts as a barrier, isolating the woman and the viewer, leaving us to contemplate the weight of her gaze.
Prompt
facial-expressions Determination: Inner strength and hope ; A woman staring out a window; eye-level; Single Person; A stormy sky; cinematic
Characteristic
Shot : A woman in a hijab is looking out of a window at a distant, bleak cityscape, possibly a refugee camp or an urban dystopia.
Aesthetic Score : 0.7
Mood : melancholy, hopeful, mysterious
Quality
Entropy : 6.16
Noise : 97
Prompt Clip Score : 0.24
AI Evaluation
Likelihood of AI : 0.70
Image errors : The image has a slight blurriness, particularly in the cityscape, which could be due to post-processing or a soft focus effect.
Triumphant Warrior: A Tale of Victory and Sacrifice
A powerful image captures the aftermath of a fierce battle. A bloodied warrior, possibly a king, stands victorious amidst the fallen, his sword raised high. Backlit by a golden glow, he embodies the epic struggle and the heavy cost of victory.
Prompt
facial-expressions Determination: Victory and unwavering resolve ; A hero raising a sword; low-angle; Hero; A battlefield with fallen enemies; cinematic
Characteristic
Shot : A warrior stands on a battlefield with fallen soldiers around him, holding a sword, with dramatic lighting and composition.
Aesthetic Score : 0.7
Mood : dramatic, intense, epic
Quality
Entropy : 6.86
Noise : 86
Prompt Clip Score : 0.22
AI Evaluation
Likelihood of AI : 0.40
Image errors : The image has some minor artifacts, such as blurriness around the edges of the character and some light artifacts in the background. The lighting is also a bit too harsh, which can cause some parts of the image to be overexposed.
Fear in the Flames: Family Huddles as Home Burns
A powerful image captures the raw emotion of a family facing a devastating fire. The contrast between the bright flames and their somber faces creates a dramatic scene, highlighting their vulnerability and fear.
Prompt
facial-expressions Determination: Resilience and unity ; A family huddled together; eye-level; Normal People; A burning house in the background; cinematic
Characteristic
Shot : A family huddled together in fear as their house burns in the background. The image is framed to emphasize the family’s emotional reaction to the fire.
Aesthetic Score : 0.7
Mood : dramatic, tense, fearful
Quality
Entropy : 6.68
Noise : 99
Prompt Clip Score : 0.30
AI Evaluation
Likelihood of AI : 0.80
Image errors : The image appears to be digitally enhanced or altered, which may detract from its realism and artistic value.
Caught in the Moment: Excitement and Shock on a Man’s Face
A close-up shot captures a man’s intense reaction, his mouth agape and eyes wide with excitement or shock. Dramatic lighting accentuates the emotion, creating a thrilling and captivating image.
Prompt
facial-expressions Determination: Excitement and focus ; A gamer’s hands furiously typing on a keyboard; close-up; Gamer; A brightly lit gaming room; cinematic
Characteristic
Shot : A man is sitting at a computer and typing on the keyboard. He is looking at the screen with a wide-eyed, excited expression. The scene is lit with blue and red lights.
Aesthetic Score : 0.3
Mood : intense, dramatic, focused
Quality
Entropy : 6.42
Noise : 92
Prompt Clip Score : 0.28
AI Evaluation
Likelihood of AI : 0.80
Image errors : The image has some minor artifacts, such as the blurry background and the unnatural lighting. The man’s expression looks somewhat artificial.
A Solitary Figure in the Rain-Drenched Forest
A man, shrouded in mystery, walks a path through a rain-soaked forest. The sun breaks through the clouds, casting dramatic shadows and hinting at a hidden presence in the distance. The scene evokes a sense of contemplation, suspense, and the unknown.
Prompt
facial-expressions Determination: Hope and perseverance ; A lone figure walking towards a distant light; eye-level; Single Person; A dark, foreboding forest; cinematic
Characteristic
Shot : A man with a backpack stands on a forest path, looking ahead towards another man walking away in the distance, during a heavy downpour. The image has a stylized look, as if it were a comic book panel or illustration.
Aesthetic Score : 0.7
Mood : mysterious, atmospheric, suspenseful
Quality
Entropy : 6.42
Noise : 80
Prompt Clip Score : 0.19
AI Evaluation
Likelihood of AI : 0.90
Image errors : The image has some artifacts, particularly in the rain, which appears a bit flat and unrealistic. The trees and foliage in the background lack detail and depth, looking somewhat artificial.
Heroic Silhouette: A Sunset Symphony of Hope
A powerful superhero stands tall against the fiery backdrop of a setting sun, their silhouette a symbol of hope and anticipation. This epic scene captures the essence of heroism and the promise of a brighter future.
Prompt
facial-expressions Determination: Confidence and unwavering resolve ; A hero standing on a rooftop; high-angle; Hero; A city skyline bathed in sunlight; cinematic
Characteristic
Shot : A superhero standing on a rooftop overlooking a city at sunrise.
Aesthetic Score : 0.6
Mood : epic, hopeful, powerful
Quality
Entropy : 6.74
Noise : 101
Prompt Clip Score : 0.25
AI Evaluation
Likelihood of AI : 0.80
Image errors : The city skyline is a bit blurry and the lighting is a little flat.
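The per-image metrics listed above can be summarized with a few lines of Python. This is a minimal sketch: the values are copied from the ten entries in this post, in order of appearance, while the names `METRICS` and `summarize` are illustrative and not part of any tool used in the experiment.

```python
from statistics import mean

# Values copied from the ten image entries above, in order of appearance.
METRICS = {
    "aesthetic_score": [0.7, 0.7, 0.7, 0.7, 0.7, 0.7, 0.7, 0.3, 0.7, 0.6],
    "entropy": [6.75, 6.43, 6.57, 6.57, 6.16, 6.86, 6.68, 6.42, 6.42, 6.74],
    "noise": [93, 104, 106, 82, 97, 86, 99, 92, 80, 101],
    "prompt_clip_score": [0.19, 0.20, 0.28, 0.27, 0.24, 0.22, 0.30, 0.28, 0.19, 0.25],
    "likelihood_of_ai": [0.60, 0.90, 0.70, 0.80, 0.70, 0.40, 0.80, 0.80, 0.90, 0.80],
}


def summarize(values):
    """Return (min, mean, max) for one metric, with the mean rounded to 3 decimals."""
    return min(values), round(mean(values), 3), max(values)


if __name__ == "__main__":
    for name, values in METRICS.items():
        low, avg, high = summarize(values)
        print(f"{name}: min={low} mean={avg} max={high}")
```

On these values the mean Prompt Clip Score is about 0.24 and the mean Likelihood of AI is 0.74. The sketch only aggregates the reported numbers; it makes no assumption about how each metric was measured.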
Conclusion
The analysis shows that the generative AI model performed reasonably well on shot analysis, fell slightly short on camera position, and struggled with aesthetic analysis. Here’s a breakdown:
Camera Position:
- Score: 0.4
- Interpretation: This score falls below the “good” range of 0.5 to 0.75. It suggests that the model didn’t perfectly capture the intended camera position described in the prompt.
Shot Analysis:
- Score: 0.6
- Interpretation: This score falls within the “good” range of 0.5 to 0.75. It indicates that the model was able to understand the scene described in the prompt and create a shot that aligns with it to a decent degree.
Aesthetic Analysis:
- Score: 0.15
- Interpretation: This score lies above the “very good” range of -0.2 to 0.1. It suggests that the aesthetic of the generated images deviated from the aesthetic described in the prompt.
Overall:
The model demonstrates a decent understanding of shot composition and comes close to the “good” range for camera position, but struggles to capture the desired aesthetic. This suggests that the model might need further training to better understand and translate aesthetic descriptions into visual elements.
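The range checks in the breakdown above can be written out explicitly. The sketch below uses only the scores and ranges quoted in this post. The function name `position_in_range` and the layout of `RESULTS` are assumptions made for illustration, and the post does not say how the scores themselves were computed.

```python
def position_in_range(score, low, high):
    """Say whether score lies below, within, or above the closed range [low, high]."""
    if score < low:
        return "below"
    if score > high:
        return "above"
    return "within"


# (score, range label, low, high) as quoted in the breakdown.
RESULTS = {
    "Camera Position": (0.4, "good", 0.5, 0.75),
    "Shot Analysis": (0.6, "good", 0.5, 0.75),
    "Aesthetic Analysis": (0.15, "very good", -0.2, 0.1),
}

if __name__ == "__main__":
    for name, (score, label, low, high) in RESULTS.items():
        where = position_in_range(score, low, high)
        print(f"{name}: {score} is {where} the '{label}' range {low} to {high}")
```

With the quoted values, camera position comes out below its range, shot analysis within it, and the aesthetic score above it.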
Sources:
- https://dramaresource.com/storytelling/
- https://seedsoftellers.eu/resources/the-body-language-for-young-tellers/
- https://digitalcollections.sit.edu/cgi/viewcontent.cgi?article=1288&context=sandanona&filename=1&type=additional
- https://citeseerx.ist.psu.edu/document?doi=7f842882e9bb1fa2c0e96939bc8d2c37e34e17c0&repid=rep1&type=pdf
- https://www.twinkl.co.uk/search?q=drama+facial+expression
- https://openai.com/index/dall-e-3/