AI's Facial Expressions: A Step Forward, But Still Room for Growth with Imagen-v3-fast
Facial expressions are a powerful tool for conveying emotions and adding depth to storytelling. In AI-generated imagery, capturing these nuances is a significant challenge. This analysis examines how well imagen-v3-fast generates images with specific facial expressions, looking at its strengths and weaknesses in capturing the desired aesthetic.
Created with: imagen-v3-fast
Fear in the Shadows: A Man’s Terrified Expression in a Dark Alley
A close-up shot captures a man’s face contorted in fear, his eyes wide with shock. He stands in a dimly lit alley, his hooded jacket blending into the darkness. The scene evokes a sense of suspense and apprehension, leaving the viewer wondering what has caused his terror.
Prompt
facial-expressions Fear: Unease, paranoia ; A lone figure; eye-level; Single Person; a dark, deserted alleyway; cinematic
Characteristic
Shot : A man in a dark hooded jacket stands in an alley, looking startled. The background shows a city street at night. His facial expression is a mix of fear and shock: his eyes are wide open and his mouth is slightly ajar.
Aesthetic Score : 0.6
Mood : suspenseful, dark, apprehensive
Quality
Entropy : 6.42
Noise : 65
Prompt Clip Score : 0.28
AI Evaluation
Likelihood of AI : 0.90
Image errors : The man’s eyes appear to be slightly blurry and the lighting on his face is too artificial.
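The prompts in this post appear to follow a fixed, semicolon-separated layout. The sketch below splits one prompt into named fields. The field names (category, emotion, nuances, subject, camera, character, setting, style) are inferred from the ten prompts shown here, not taken from a documented schema, and `parse_prompt` is a hypothetical helper.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class PromptFields:
    # Field names are assumptions inferred from the prompts in this post.
    category: str        # e.g. "facial-expressions"
    emotion: str         # e.g. "Fear"
    nuances: List[str]   # e.g. ["Unease", "paranoia"]
    subject: str
    camera: str
    character: str
    setting: str
    style: str


def parse_prompt(prompt: str) -> PromptFields:
    """Split a ';'-separated prompt into its six top-level parts."""
    parts = [part.strip() for part in prompt.split(";")]
    if len(parts) != 6:
        raise ValueError(f"expected 6 ';'-separated fields, got {len(parts)}")
    head, subject, camera, character, setting, style = parts
    # The first part looks like "<category> <emotion>: <nuance>, <nuance>".
    label, _, nuance_text = head.partition(":")
    category, _, emotion = label.strip().partition(" ")
    nuances = [n.strip() for n in nuance_text.split(",") if n.strip()]
    return PromptFields(category, emotion, nuances, subject,
                        camera, character, setting, style)


fields = parse_prompt(
    "facial-expressions Fear: Unease, paranoia ; A lone figure; eye-level; "
    "Single Person; a dark, deserted alleyway; cinematic"
)
print(fields.camera, "|", fields.nuances)
```

Splitting the prompt this way makes it easier to compare what was requested (for example the camera field) with what the Shot description reports.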
Superman: Guardian of the Night
A dramatic shot of Superman standing tall on a rooftop, overlooking a sprawling cityscape bathed in the glow of the night. The hero’s determined expression and the grandeur of the scene evoke a sense of power and contemplation.
Prompt
facial-expressions Fear: Dread, anticipation ; A superhero standing alone on a rooftop; eye-level; Hero; a cityscape shrouded in fog; cinematic
Characteristic
Shot : Superman standing on a rooftop overlooking a city skyline at night.
Aesthetic Score : 0.6
Mood : dramatic, heroic, contemplative
Quality
Entropy : 6.80
Noise : 69
Prompt Clip Score : 0.27
AI Evaluation
Likelihood of AI : 0.80
Image errors : The image appears to have some minor artifacts, particularly in the clouds and the city skyline. The texture on Superman’s costume is slightly blurry, giving it a plastic appearance.
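The post does not say how the Entropy value is computed. Values between 6 and 7 would be consistent with Shannon entropy over an 8-bit grayscale histogram, which tops out at 8 bits, but that is an assumption. Under it, a minimal sketch looks like this (`shannon_entropy` is an illustrative name, and the pixel lists are toy data, not the images above):

```python
import math
from collections import Counter


def shannon_entropy(pixels):
    """Shannon entropy, in bits, of a sequence of 8-bit grayscale values."""
    counts = Counter(pixels)
    total = len(pixels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


# Toy inputs: a single flat gray level versus every level used equally often.
flat = [128] * 1024
uniform = list(range(256)) * 4

print(shannon_entropy(flat))
print(shannon_entropy(uniform))
```

On this reading, a higher value means the image spreads its brightness over more gray levels; it says nothing by itself about whether the facial expression is convincing.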
Lost in the Shadows: A Woman’s Worried Gaze in a Suspenseful Cityscape
A woman stands alone in the middle of a dimly lit street, her worried expression and the surrounding cityscape creating a palpable sense of suspense and unease. The scene evokes feelings of anxiety and apprehension, leaving the viewer wondering what secrets lie hidden in the shadows.
Prompt
facial-expressions Fear: Vulnerability, isolation ; A woman walking down a dimly lit street; eye-level; Normal Person; a deserted street with flickering streetlights; cinematic
Characteristic
Shot : A woman standing in the middle of a street, with a background of buildings and streetlights. She is wearing a dark jacket and is looking at the camera with a worried expression.
Aesthetic Score : 0.6
Mood : suspenseful, anxious, apprehensive
Quality
Entropy : 6.53
Noise : 35
Prompt Clip Score : 0.28
AI Evaluation
Likelihood of AI : 0.10
Image errors : No significant errors, but there are a few minor compression artifacts in the background.
The Moment He Knew
A young man, headphones on, stares intently at his computer screen. His face, a mix of excitement and concentration, hints at a moment of revelation. The close-up shot amplifies the intensity, leaving the viewer wondering what he’s discovered.
Prompt
facial-expressions Fear: Disquiet, unease ; A gamer hunched over their computer; close-up; Gamer; a flickering monitor displaying a disturbing image; cinematic
Characteristic
Shot : A young man wearing headphones is looking intently at a computer screen, his face a mix of excitement and concentration.
Aesthetic Score : 0.6
Mood : intense, focused, surprised
Quality
Entropy : 6.05
Noise : 44
Prompt Clip Score : 0.35
AI Evaluation
Likelihood of AI : 0.20
Image errors : There is a slight blurriness around the edges of the image, indicating potential noise reduction or sharpening artifacts.
Fear Lurks in the Shadows
A woman hides behind a door, her face etched with fear as a hand reaches out from the darkness. The narrow doorway and ominous shadows create a palpable sense of suspense and dread.
Prompt
facial-expressions Fear: Terror, helplessness ; hiding ; low-angle; Single Person; a dark room with shadows creeping in; cinematic
Characteristic
Shot : A woman is hiding behind a door, with her face visible and a hand appearing behind her in the doorway, creating a sense of suspense.
Aesthetic Score : 0.6
Mood : suspenseful, eerie, frightened
Quality
Entropy : 6.37
Noise : 78
Prompt Clip Score : 0.28
AI Evaluation
Likelihood of AI : 0.20
Image errors : No significant image errors.
Terror in the Shadows: Monster Charges Towards Fearful Knight
A chilling scene unfolds as a monstrous humanoid creature, adorned with horns and sharp teeth, lunges towards a terrified armored man. The dark and blurry background amplifies the sense of danger, while a sliver of orange light in the distance offers a glimmer of hope. The contrasting expressions of fear and aggression, coupled with the dramatic use of light and shadow, create a powerful and suspenseful image.
Prompt
facial-expressions Fear: Desperation, courage ; A hero facing a monstrous creature; eye-level; Hero; a crumbling battlefield with smoke and debris; cinematic
Characteristic
Shot : A humanoid monster with horns and sharp teeth is charging at a scared-looking armored man. The background is dark and blurry, with a hint of orange light in the distance. The man’s expression is one of terror.
Aesthetic Score : 0.7
Mood : dark, intense, suspenseful
Quality
Entropy : 6.78
Noise : 72
Prompt Clip Score : 0.28
AI Evaluation
Likelihood of AI : 0.90
Image errors : Some aliasing is visible on the monster’s fur, and the background appears slightly blurry and lacking detail.
Fear Grips Five as Storm Unleashes Terror
A group of five individuals, huddled together in the face of an unseen threat, find themselves caught in a raging storm. Their expressions of fear and the ominous lightning strikes in the background create a palpable sense of suspense and dread.
Prompt
facial-expressions Fear: Anxiety, uncertainty ; A group of people huddled together in a darkened room; eye-level; Normal People; a storm raging outside with thunder and lightning; cinematic
Characteristic
Shot : A group of five people, four women and one man, are huddled together in fear, looking off-screen, likely at a source of danger. It appears to be a stormy night with lightning strikes in the background, and rain falling.
Aesthetic Score : 0.6
Mood : fear, suspense, tension
Quality
Entropy : 6.46
Noise : 90
Prompt Clip Score : 0.32
AI Evaluation
Likelihood of AI : 0.30
Image errors : Some of the lightning strikes appear pixelated, and the rain is distributed too uniformly. Overall, the image is quite realistic.
Terrified in the Shadows: A Close-Up on Fear
A young man sits alone in a dimly lit room, his face etched with terror. His clasped hands and the close-up shot amplify the sense of anxiety and suspense, leaving the viewer questioning what lurks in the darkness.
Prompt
facial-expressions Fear: Shock, adrenaline ; A gamer’s hands shaking as they play a horror game; close-up; Gamer; a screen displaying a jump scare; cinematic
Characteristic
Shot : A young man is sitting in a dimly lit room, looking terrified. His hands are clasped together in front of him.
Aesthetic Score : 0.5
Mood : fear, anxiety, suspense
Quality
Entropy : 6.71
Noise : 39
Prompt Clip Score : 0.33
AI Evaluation
Likelihood of AI : 0.10
Image errors : The image shows slight noise and artifacts, and it is slightly out of focus.
Fear in the Storm’s Eye
A young woman, shrouded in a hoodie, stands alone in a desolate landscape, her gaze fixed on a menacing storm sky. The image evokes a sense of dread and anticipation, hinting at an impending threat.
Prompt
facial-expressions Fear: Loneliness, despair ; A lone figure standing at the edge of a cliff; eye-level; Single Person; a vast, empty landscape with a stormy sky; cinematic
Characteristic
Shot : A young woman in a hoodie is standing in a desolate landscape, looking up in fear at a stormy sky.
Aesthetic Score : 0.6
Mood : fearful, dramatic, ominous
Quality
Entropy : 6.78
Noise : 79
Prompt Clip Score : 0.31
AI Evaluation
Likelihood of AI : 0.80
Image errors : The background appears slightly blurry and unrealistic. There are some minor artifacts in the woman’s hair and clothing.
Caught in the Crossfire: Man’s Shocked Expression Amidst Blazing Inferno
A man, his face stained with blood, stares directly at the camera with a look of sheer terror. The fiery backdrop behind him intensifies the dramatic and suspenseful mood, leaving viewers on the edge of their seats.
Prompt
facial-expressions Fear: Loss, determination ; A hero standing amidst a burning city; eye-level; Hero; a chaotic scene with smoke and flames; cinematic
Characteristic
Shot : A man with blood on his face is looking at the camera with a shocked expression. He is in front of a fiery backdrop.
Aesthetic Score : 0.7
Mood : dramatic, intense, suspenseful
Quality
Entropy : 6.70
Noise : 67
Prompt Clip Score : 0.29
AI Evaluation
Likelihood of AI : 0.10
Image errors : The image is slightly blurry, especially on the man’s face. The blood on his face is too dark to look realistic.
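Taken together, the per-image numbers listed for the ten images can be summarised with a few lines of Python. The values are copied from the entries above; the `summarize` helper and the 0.8 cut-off for "likely AI" are illustrative assumptions, not part of the original evaluation.

```python
from statistics import mean

# (title, aesthetic, entropy, noise, clip_score, ai_likelihood),
# copied from the ten entries in this post.
IMAGES = [
    ("Fear in the Shadows", 0.6, 6.42, 65, 0.28, 0.90),
    ("Superman: Guardian of the Night", 0.6, 6.80, 69, 0.27, 0.80),
    ("Lost in the Shadows", 0.6, 6.53, 35, 0.28, 0.10),
    ("The Moment He Knew", 0.6, 6.05, 44, 0.35, 0.20),
    ("Fear Lurks in the Shadows", 0.6, 6.37, 78, 0.28, 0.20),
    ("Terror in the Shadows", 0.7, 6.78, 72, 0.28, 0.90),
    ("Fear Grips Five as Storm Unleashes Terror", 0.6, 6.46, 90, 0.32, 0.30),
    ("Terrified in the Shadows", 0.5, 6.71, 39, 0.33, 0.10),
    ("Fear in the Storm's Eye", 0.6, 6.78, 79, 0.31, 0.80),
    ("Caught in the Crossfire", 0.7, 6.70, 67, 0.29, 0.10),
]


def summarize(images, ai_threshold=0.8):
    """Average each metric and list images at or above the AI-likelihood cut-off."""
    return {
        "mean_aesthetic": round(mean(row[1] for row in images), 3),
        "mean_entropy": round(mean(row[2] for row in images), 3),
        "mean_noise": round(mean(row[3] for row in images), 3),
        "mean_clip_score": round(mean(row[4] for row in images), 3),
        "mean_ai_likelihood": round(mean(row[5] for row in images), 3),
        # ai_threshold is an arbitrary, illustrative cut-off.
        "flagged_as_ai": [row[0] for row in images if row[5] >= ai_threshold],
    }


summary = summarize(IMAGES)
for key, value in summary.items():
    print(f"{key}: {value}")
```

These averages are a convenience for reading the gallery as a whole; the post does not state that the scores in the conclusion were derived this way.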
Conclusion
The results show that the generative AI model understood the scenes reasonably well, but fell short on camera position and on the aesthetic aspect. Here’s a breakdown:
- Camera Position: The model scored 0.35, which is below the “good” range of 0.5 to 0.75. This suggests that the model didn’t fully capture the camera position specified in the prompts.
- Shot Analysis: The model scored 0.6, which falls within the “good” range. This indicates that the model understood the scenes described in the prompts reasonably well.
- Aesthetic Analysis: The model scored 0.17, which is above the “very good” range of -0.2 to 0.1. This suggests that the aesthetic of the generated images deviated from the aesthetic described in the prompts.
Overall, the model shows promise in understanding the scene, but needs improvement in matching the requested camera position and the desired aesthetic.
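The range checks in the breakdown above can be reproduced with a small helper. The scores and ranges are the ones quoted in the conclusion; `position` is a hypothetical name, and using 0.5 to 0.75 as the "good" range for Shot Analysis is an assumption, since the post only states that range explicitly for Camera Position.

```python
def position(score, low, high):
    """Say whether a score sits below, within, or above an inclusive range."""
    if score < low:
        return "below"
    if score > high:
        return "above"
    return "within"


# (score, range_low, range_high) as quoted in the conclusion.
REPORTED = {
    "Camera Position": (0.35, 0.5, 0.75),
    "Shot Analysis": (0.6, 0.5, 0.75),      # range assumed, see note above
    "Aesthetic Analysis": (0.17, -0.2, 0.1),
}

results = {
    name: position(score, low, high)
    for name, (score, low, high) in REPORTED.items()
}

for name, verdict in results.items():
    print(f"{name}: {verdict} the reference range")
```

Only Shot Analysis lands inside its range; Camera Position falls below its range and Aesthetic Analysis lands above it.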