AI's Struggle with Facial Expressions: A Look at the Limitations of Generative Models with Midjourney
Facial expressions are a powerful tool for conveying emotions and telling stories. They add depth to an image and make it more engaging and relatable. However, generating realistic and expressive facial expressions remains a challenge for AI models. While they can handle technical aspects such as shot composition reasonably well, they often struggle to capture the subtle nuances of human emotion. This article explores the limitations of generative AI in depicting facial expressions, using ten Midjourney images generated from fear-themed prompts as a case study. We analyze the model’s performance, highlight its strengths and weaknesses, and discuss what the results imply for AI-generated imagery.
Created with: Midjourney
Lost in the Shadows: A Figure Walks into the Unknown
A solitary figure navigates a dimly lit, narrow hallway, the textured walls and wet floor adding to the sense of mystery. The play of light and shadow creates a dramatic effect, drawing the viewer into the figure’s journey into the unknown. This image evokes a mood of loneliness, somberness, and intrigue.
Prompt
Fear Fear, anxiety: Unease, paranoia ; A lone figure; eye-level; Single Person; a dark, deserted alleyway; cinematic
Characteristic
Shot : A solitary figure walks down a long, dark hallway. The walls are textured and the floor is wet, suggesting the hallway is located in an old building.
Aesthetic Score : 0.7
Mood : dark, mysterious, lonely
Quality
Entropy : 5.03
Noise : 98
Prompt Clip Score : 0.24
AI Evaluation
Likelihood of AI : 0.20
Image errors : No significant errors in the image.
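The reports do not say how the Entropy value is computed. One common definition is the Shannon entropy of the grayscale histogram, measured in bits, and the sketch below assumes that definition. The function name `shannon_entropy` and the toy pixel lists are hypothetical, and loading a real image into a flat list of pixel values is left out.

```python
import math
from collections import Counter
from typing import Iterable


def shannon_entropy(pixels: Iterable[int]) -> float:
    """Shannon entropy, in bits, of a flat sequence of grayscale pixel values."""
    counts = Counter(pixels)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("need at least one pixel")
    # Sum p * log2(1 / p) over every gray level that actually occurs.
    return sum((n / total) * math.log2(total / n) for n in counts.values())


# Hypothetical toy inputs, not pixels taken from the article's images.
flat = [128] * 16                # a single gray level everywhere
two_tone = [0] * 8 + [255] * 8   # two equally frequent gray levels
full_range = list(range(256))    # every 8-bit level exactly once

for name, data in [("flat", flat), ("two_tone", two_tone), ("full_range", full_range)]:
    print(name, shannon_entropy(data))
```

Under this definition a single gray level yields 0 bits and an 8-bit image that uses all 256 levels equally often yields 8 bits. If the reported values follow the same definition, the figures between 5.03 and 6.57 in this article would sit in the upper half of that range.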
A Hero Emerges from the Fog
A lone superhero stands silhouetted against the misty cityscape, the Empire State Building looming in the background. The dramatic lighting and the hero’s powerful stance evoke a sense of mystery and hope, promising an epic battle to come.
Prompt
Fear Determined, apprehensive: Dread, anticipation ; A superhero standing alone on a rooftop; eye-level; Hero; a cityscape shrouded in fog; cinematic
Characteristic
Shot : A superhero stands on a rooftop overlooking a city shrouded in fog.
Aesthetic Score : 0.7
Mood : dramatic, mysterious, hopeful
Quality
Entropy : 5.70
Noise : 109
Prompt Clip Score : 0.30
AI Evaluation
Likelihood of AI : 0.90
Image errors : The image has some slight blurring around the edges and some of the buildings in the background look a bit pixelated.
Lost in the Mist: A Solitary Figure Walks Through the Night
A lone figure walks down a quiet street, shrouded in fog and illuminated by the soft glow of streetlights. The scene evokes a sense of loneliness, mystery, and melancholy, with shadows and silhouettes adding to the intrigue. The dramatic use of light and darkness draws the viewer’s attention to the figure, leaving them wondering about their story.
Prompt
Fear Fearful, cautious: Vulnerability, isolation ; A woman walking down a dimly lit street; eye-level; Normal Person; a deserted street with flickering streetlights; cinematic
Characteristic
Shot : A lone figure walks down a deserted street at night. The street is dimly lit by streetlamps, and the figure is silhouetted against the darkness.
Aesthetic Score : 0.6
Mood : mysterious, lonely, melancholic
Quality
Entropy : 5.89
Noise : 91
Prompt Clip Score : 0.28
AI Evaluation
Likelihood of AI : 0.10
Image errors : Some noise in the background, particularly around the trees and the sky.
Intense Gaze Behind the Screen
A close-up shot of a young man’s face, bathed in a moody blue-green light, creates a sense of suspense and intrigue. His intense gaze, directed straight at the viewer, leaves you wondering what secrets lie behind the screen.
Prompt
Fear Wide-eyed, panicked: Disquiet, unease ; A gamer hunched over their computer; close-up; Gamer; a flickering monitor displaying a disturbing image; cinematic
Characteristic
Shot : A close-up shot of a man’s face with an intense expression. He is looking directly at the viewer, and his eyes are wide open and focused.
Aesthetic Score : 0.7
Mood : intense, mysterious, dramatic
Quality
Entropy : 5.95
Noise : 62
Prompt Clip Score : 0.25
AI Evaluation
Likelihood of AI : 0.30
Image errors : The image appears to have been slightly over-sharpened, giving it an artificial look.
A Moment of Despair: Woman Weeps in Darkness
A poignant image captures a woman consumed by sadness, her face buried in her hands as she sits alone in a dimly lit room. The dramatic lighting and her posture evoke a sense of profound melancholy and despair.
Prompt
Fear Terrified, crying: Terror, helplessness ; hiding ; low-angle; Single Person; a dark room with shadows creeping in; cinematic
Characteristic
Shot : A woman is sitting in a dimly lit room, her head in her hands. She is crying. A piece of fabric hangs in the background.
Aesthetic Score : 0.2
Mood : sad, lonely, desperate
Quality
Entropy : 5.20
Noise : 55
Prompt Clip Score : 0.23
AI Evaluation
Likelihood of AI : 0.30
Image errors : The image is slightly blurry, especially in the background. The lighting is uneven and there are some artifacts in the shadows.
Dragon’s Fury: A Knight’s Last Stand
A lone knight faces an epic battle against a colossal, fire-breathing dragon in a desolate landscape. The dragon’s menacing grin and the smoke-filled air create a dramatic and suspenseful scene, highlighting the knight’s vulnerability against overwhelming power.
Prompt
Fear Fearful, determined: Desperation, courage ; A hero facing a monstrous creature; eye-level; Hero; a crumbling battlefield with smoke and debris; cinematic
Characteristic
Shot : A dragon, emerging from a cloud of smoke, roars menacingly at a lone knight standing on a desolate landscape.
Aesthetic Score : 0.7
Mood : epic, dark, dramatic
Quality
Entropy : 6.51
Noise : 96
Prompt Clip Score : 0.25
AI Evaluation
Likelihood of AI : 0.80
Image errors : The dragon’s scales and the knight’s armor appear somewhat blurry and lack detail.
Lightning Strikes a Moment of Suspense
A group of young people find themselves caught in a downpour, illuminated by a dramatic lightning strike. The scene evokes a sense of mystery and suspense, leaving the viewer wondering what will unfold next.
Prompt
Fear Worried, apprehensive: Anxiety, uncertainty ; A group of people huddled together in a darkened room; eye-level; Normal People; a storm raging outside with thunder and lightning; cinematic
Characteristic
Shot : A group of five young people are huddled together in the rain while a lightning strike illuminates the scene from behind. The image has a dark, mysterious feel.
Aesthetic Score : 0.7
Mood : dark, mysterious, suspenseful
Quality
Entropy : 5.60
Noise : 85
Prompt Clip Score : 0.28
AI Evaluation
Likelihood of AI : 0.10
Image errors : There are some minor artifacts in the image, but they are not very noticeable.
Panic in the Red and Blue Light
A young man sits frozen before his computer screen, bathed in an unsettling red and blue glow. His expression speaks of fear and impending danger, creating a scene of intense suspense.
Prompt
Fear Startled, panicked: Shock, adrenaline ; A gamer’s hands shaking as they play a horror game; close-up; Gamer; a screen displaying a jump scare; cinematic
Characteristic
Shot : A person is screaming in fear, possibly playing a horror video game. The scene is lit with red and blue light.
Aesthetic Score : 0.5
Mood : intense, scary, suspenseful
Quality
Entropy : 5.30
Noise : 55
Prompt Clip Score : 0.29
AI Evaluation
Likelihood of AI : 0.20
Image errors : There are some minor noise artifacts in the image, particularly in the shadows. There is also some slight blurriness.
Silhouetted Against Despair: A Lone Figure on the Edge of the World
A solitary figure stands on a precipice, their silhouette stark against a desolate landscape. Stormy clouds gather overhead, mirroring the melancholic mood of the scene. The dramatic composition evokes a sense of isolation and profound loneliness.
Prompt
Fear Despondent, resigned: Loneliness, despair ; A lone figure standing at the edge of a cliff; eye-level; Single Person; a vast, empty landscape with a stormy sky; cinematic
Characteristic
Shot : A lone figure stands on the edge of a cliff overlooking a vast, desolate landscape. The sky is overcast with dark, stormy clouds.
Aesthetic Score : 0.7
Mood : dramatic, lonely, contemplative
Quality
Entropy : 6.57
Noise : 95
Prompt Clip Score : 0.28
AI Evaluation
Likelihood of AI : 0.60
Image errors : The image appears to have some artifacts and errors in the background, particularly in the area of the clouds and landscape.
A City in Flames: One Figure Stands Amidst the Ruins
A solitary figure stands amidst a cityscape consumed by fire and smoke. The scene evokes a sense of apocalyptic desolation, with the lone figure dwarfed by the overwhelming destruction. The silhouette against the burning skyline creates a poignant and impactful image of despair.
Prompt
Fear Sad, resolute: Loss, determination ; A hero standing amidst a burning city; eye-level; Hero; a chaotic scene with smoke and flames; cinematic
Characteristic
Shot : A lone figure stands in the ruins of a city engulfed in smoke and fire. The silhouette of the man in the foreground draws the viewer’s attention, creating a sense of isolation and despair.
Aesthetic Score : 0.7
Mood : gloomy, apocalyptic, desolate
Quality
Entropy : 6.29
Noise : 94
Prompt Clip Score : 0.22
AI Evaluation
Likelihood of AI : 0.80
Image errors : The smoke and fire textures are slightly repetitive, and the figure’s silhouette is a bit flat. There are some inconsistencies in the shading of the ruins and rubble in the foreground.
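The ten reports above each list the same metrics, so a quick summary makes them easier to compare. The following is a minimal sketch that transcribes the reported values in order of appearance and computes the mean, minimum, and maximum of each metric. The variable names and the `summarize` helper are hypothetical and are not part of the tooling that produced the reports.

```python
from statistics import mean

# Values transcribed from the ten image reports above, in order of appearance.
metrics = {
    "aesthetic_score": [0.7, 0.7, 0.6, 0.7, 0.2, 0.7, 0.7, 0.5, 0.7, 0.7],
    "entropy": [5.03, 5.70, 5.89, 5.95, 5.20, 6.51, 5.60, 5.30, 6.57, 6.29],
    "noise": [98, 109, 91, 62, 55, 96, 85, 55, 95, 94],
    "prompt_clip_score": [0.24, 0.30, 0.28, 0.25, 0.23, 0.25, 0.28, 0.29, 0.28, 0.22],
    "likelihood_of_ai": [0.20, 0.90, 0.10, 0.30, 0.30, 0.80, 0.10, 0.20, 0.60, 0.80],
}


def summarize(values):
    """Return the mean, minimum, and maximum of one metric."""
    return {"mean": mean(values), "min": min(values), "max": max(values)}


summary = {name: summarize(values) for name, values in metrics.items()}

for name, stats in summary.items():
    print(f"{name}: mean={stats['mean']:.3f} min={stats['min']} max={stats['max']}")
```

The per-image values are the only inputs here, so the summary adds no new measurements. It only condenses what the individual reports already state.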
Conclusion
The analysis shows that the generative AI model handled shot composition reasonably well, but scored below average on camera position and on matching the expected aesthetic. Here’s a breakdown:
Camera Position:
- Score: 0.25
- Interpretation: This score indicates that the model’s ability to understand and implement camera positions in the generated image is below average. A score between 0.5 and 0.75 would be considered good, and above 0.75 very good.
Shot Analysis:
- Score: 0.52
- Interpretation: This score indicates that the model’s ability to understand and create the desired shot composition is moderate. A score between 0.5 and 0.75 would be considered good, and above 0.75 very good, so 0.52 sits just inside the good range.
Aesthetic Analysis:
- Score: 0.15
- Interpretation: This score indicates that the model’s ability to match the expected aesthetic of the image is below average. A score between -0.2 and 0.1 would be considered very good, and 0.15 falls outside that range.
Overall:
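The model seems to be better at creating the desired shot composition than at implementing the requested camera position or achieving the desired aesthetic. This suggests that the model might be struggling with capturing the overall visual style or mood intended by the prompt. It also bears on the question of facial expressions: in at least four of the ten images the subject is rendered as a silhouette, so the emotion named in the prompt cannot be read from a face at all.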
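The thresholds quoted in the breakdown can be applied mechanically. The sketch below takes them literally: for camera position and shot analysis, 0.5 to 0.75 counts as good and anything above 0.75 as very good; for the aesthetic analysis, -0.2 to 0.1 counts as very good. The function names are hypothetical, and the label used for scores under 0.5 is an assumption, since the article does not name the bands below the good range.

```python
def rate_composition_score(score: float) -> str:
    """Label a camera-position or shot-analysis score with the quoted bands."""
    if score > 0.75:
        return "very good"
    if score >= 0.5:
        return "good"
    # Assumption: the article does not name the bands under 0.5.
    return "below the good range"


def aesthetic_is_very_good(score: float) -> bool:
    """True when an aesthetic score lies in the quoted -0.2 to 0.1 range."""
    return -0.2 <= score <= 0.1


# Scores reported in the breakdown above.
reported = {"camera_position": 0.25, "shot_analysis": 0.52, "aesthetic": 0.15}

print("camera position:", rate_composition_score(reported["camera_position"]))
print("shot analysis:", rate_composition_score(reported["shot_analysis"]))
print("aesthetic very good:", aesthetic_is_very_good(reported["aesthetic"]))
```

Read this way, the shot analysis score of 0.52 lands just inside the good band, the camera position score of 0.25 falls below it, and the aesthetic score of 0.15 misses the very good range.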