AI's Facial Expressions: A Mixed Bag of Success with Imagen-v2
Facial expressions are a powerful tool for conveying emotions and intentions in visual storytelling. Generative AI models are increasingly being used to create images with realistic facial expressions, but how well do they capture the nuances of human emotion? This blog post delves into the performance of a generative AI model in understanding and generating facial expressions across a range of scenes and aesthetics. We’ll explore the model’s strengths and weaknesses, analyzing its ability to capture camera position, shot composition, and overall aesthetic appeal.
Created with: imagen-v2
Lost in the Neon Maze: A Woman’s Worried Gaze in a City of Secrets
A woman stands alone in a bustling, neon-lit street, her worried expression hinting at a hidden story. The vibrant lights and the crowd’s anonymity create a sense of suspense and mystery, leaving you wondering what secrets lie beneath the surface.
Prompt
facial-expressions Confusion: Disoriented, overwhelmed ; A lone figure; eye-level; Single Person; a bustling city street with neon signs and crowds; cinematic
Characteristic
Shot : A woman stands in a bustling city street, with neon signs and crowds in the background. Her face is illuminated by the artificial lights.
Aesthetic Score : 0.7
Mood : mysterious, urban, melancholic
Quality
Entropy : 6.74
Noise : 92
Prompt Clip Score : 0.29
AI Evaluation
Likelihood of AI : 0.80
Image errors : The skin texture appears artificial. The lighting seems too intense, and the color balance feels unnatural, likely due to heavy editing.
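The prompts in this post appear to share a semicolon-separated layout: an emotion header, a subject, a camera position, a character type, a setting, and an aesthetic. The sketch below splits a prompt into those parts. The field names (`camera_position`, `character_type`, and so on) and the `parse_prompt` helper are our own labels inferred from the prompts shown here, not something the generation pipeline documents. Prompts that don't have all four middle fields are kept as a single free-text subject.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ParsedPrompt:
    category: str                    # e.g. "facial-expressions"
    emotion: str                     # e.g. "Confusion"
    nuances: List[str]               # e.g. ["Disoriented", "overwhelmed"]
    subject: str
    camera_position: Optional[str]   # assumed third field, e.g. "eye-level"
    character_type: Optional[str]    # assumed fourth field, e.g. "Single Person"
    setting: Optional[str]
    aesthetic: str                   # assumed last field, e.g. "cinematic"


def parse_prompt(prompt: str) -> ParsedPrompt:
    """Split a prompt on ';' using the field order inferred from this post."""
    parts = [p.strip() for p in prompt.split(";") if p.strip()]
    if len(parts) < 2:
        raise ValueError("expected at least a header and an aesthetic field")

    # Header looks like "facial-expressions Confusion: Disoriented, overwhelmed".
    category, _, emotion_part = parts[0].partition(" ")
    emotion, _, nuance_part = emotion_part.partition(":")
    nuances = [n.strip() for n in nuance_part.split(",") if n.strip()]

    aesthetic = parts[-1]
    middle = parts[1:-1]
    if len(middle) == 4:
        subject, camera, character, setting = middle
    else:
        # Free-form scene description: keep it whole rather than guess fields.
        subject, camera, character, setting = " ".join(middle), None, None, None

    return ParsedPrompt(category, emotion.strip(), nuances, subject,
                        camera, character, setting, aesthetic)


example = parse_prompt(
    "facial-expressions Confusion: Disoriented, overwhelmed ; A lone figure; "
    "eye-level; Single Person; a bustling city street with neon signs and "
    "crowds; cinematic"
)
print(example.camera_position, "|", example.nuances, "|", example.aesthetic)
```

Splitting a prompt this way makes it easier to check a generated image against one requested attribute at a time, such as whether an "eye-level" shot was actually produced.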
A Lone Warrior Contemplates the Vast Desert
An epic and adventurous scene unfolds as a solitary warrior stands atop a rocky outcrop, gazing out over a desolate desert landscape. A small oasis shimmers in the distance, offering a glimmer of hope amidst the vast emptiness. The warrior’s pose and the dramatic scale of the surroundings evoke a sense of solitude and the promise of thrilling adventures to come.
Prompt
facial-expressions Confusion: Doubt, uncertainty ; A lone adventurer, their worn leather armor patched with scavenged materials, stands atop a crumbling stone tower. The wind whips through the ruins of a forgotten city, carrying the scent of dust and decay. In the distance, a shimmering oasis shimmers in the harsh desert sun.; cinematic
Characteristic
Shot : A lone female warrior stands on a rocky cliff in a desert landscape. There is a green oasis in the distance.
Aesthetic Score : 0.7
Mood : epic, dramatic, mysterious
Quality
Entropy : 6.62
Noise : 108
Prompt Clip Score : 0.26
AI Evaluation
Likelihood of AI : 0.80
Image errors : There are no major image errors, but the textures on the character’s clothing and the rocks look slightly artificial.
A Look of Concern in the Face of Uncertainty
A woman in a suit, her face etched with worry, gazes upwards in an office setting. The blurred background adds to the sense of tension and anticipation, leaving the viewer wondering what she is looking at and what the future holds.
Prompt
facial-expressions Confusion: Lost, unmoored ; A woman in a business suit; eye-level; Normal People; a sterile office with fluorescent lights and cubicles; cinematic
Characteristic
Shot : A woman in a business suit is looking upwards, likely in an office setting.
Aesthetic Score : 0.7
Mood : serious, intense, apprehensive
Quality
Entropy : 6.91
Noise : 94
Prompt Clip Score : 0.26
AI Evaluation
Likelihood of AI : 0.20
Image errors : No noticeable artifacts or errors are present.
Caught in the Moment: A Face of Intense Focus and Surprise
A close-up shot captures a young man lost in his world, headphones on, his expression a blend of concentration and surprise. The intensity of the moment is palpable, leaving the viewer on the edge of their seat, wondering what unfolds next.
Prompt
facial-expressions Confusion: Frustration, bewilderment ; A gamer with headphones on; close-up; Gamer; a dimly lit room with a computer screen displaying a complex game interface; cinematic
Characteristic
Shot : Close-up portrait of a young man wearing headphones, looking slightly worried or surprised.
Aesthetic Score : 0.6
Mood : intense, focused, suspenseful
Quality
Entropy : 6.14
Noise : 93
Prompt Clip Score : 0.29
AI Evaluation
Likelihood of AI : 0.20
Image errors : The image appears slightly over-sharpened, leading to some halos around edges. Some minor noise is present, particularly in the darker areas.
The Shadow in the City
A man shrouded in mystery, his fedora casting a shadow over his intense gaze. The city lights blur behind him, adding to the air of intrigue and danger. This image evokes a sense of brooding mystery, leaving you wondering what secrets lie hidden in the shadows.
Prompt
facial-expressions Confusion: Suspicious, wary ; A man in a trench coat; eye-level; Single Person; a foggy alleyway with flickering streetlights; cinematic
Characteristic
Shot : A man in a fedora and trench coat stands in a dimly lit environment with an out-of-focus light source behind him.
Aesthetic Score : 0.8
Mood : mysterious, intense, film noir
Quality
Entropy : 6.81
Noise : 70
Prompt Clip Score : 0.26
AI Evaluation
Likelihood of AI : 0.90
Image errors : The image appears slightly over-sharpened, producing artifacts in the face and clothing textures. The lighting looks artificial, and the overall image feels somewhat flat and staged.
The Knight’s Watch: A Shadowy Gaze in the Forest
A knight in full armor stands amidst a dark, foreboding forest, his gaze fixed directly on the viewer. The scene is steeped in mystery and tension, leaving you wondering what secrets lie hidden in the shadows.
Prompt
facial-expressions Confusion: Disillusioned, lost ; A knight in shining armor; eye-level; Hero; a dark forest with twisted trees and ominous shadows; cinematic
Characteristic
Shot : A knight in armor, likely in a forest, with dramatic, moody lighting.
Aesthetic Score : 0.7
Mood : dramatic, mysterious, serious
Quality
Entropy : 6.47
Noise : 110
Prompt Clip Score : 0.29
AI Evaluation
Likelihood of AI : 0.50
Image errors : The image has some noise and artifacting, especially in the shadows.
Family Tension: A Messy Kitchen Reflects a Troubled Home
In this snapshot of a family gathered around a cluttered table, their body language speaks volumes about tension and discomfort. The messy kitchen setting amplifies the sense of chaos and stress, hinting at a heated moment within the family.
Prompt
facial-expressions Confusion: Awkward, uncomfortable ; A family at a dinner table; eye-level; Normal People; a brightly lit kitchen with mismatched plates and silverware; cinematic
Characteristic
Shot : A family sits around a kitchen table in a cluttered kitchen. There are dishes and other things on the table, including a glass pitcher. The people in the image look like they are having a tense conversation.
Aesthetic Score : 0.5
Mood : tense, uneasy, uncomfortable
Quality
Entropy : 6.78
Noise : 110
Prompt Clip Score : 0.28
AI Evaluation
Likelihood of AI : 0.10
Image errors : The image is slightly blurry and there is some graininess.
Adrenaline Rush: Gamer’s Shock at the Edge of Victory
A young woman’s face is etched with surprise and focus as she navigates a thrilling video game. The explosion on the TV screen and the blurred background create a sense of intense action and suspense, capturing the raw emotion of a close call in the digital world.
Prompt
facial-expressions Confusion: Overwhelmed, disoriented ; A gamer holding a controller; close-up; Gamer; a brightly lit room with a TV screen displaying a chaotic game scene; cinematic
Characteristic
Shot : A young woman, possibly in her 20s, with blonde hair, is playing a video game, looking at the screen in the background. She is holding a game controller in her hands.
Aesthetic Score : 0.6
Mood : intense, focused, suspenseful
Quality
Entropy : 6.58
Noise : 60
Prompt Clip Score : 0.30
AI Evaluation
Likelihood of AI : 0.60
Image errors : There are some minor artifacts in the woman’s hair, particularly around her forehead. The lighting in the scene also appears somewhat uneven.
Lost in the City: A Moment of Anxiety
A woman stands amidst the bustling city, her worried gaze fixed on something unseen. The blurred background and low lighting heighten the sense of suspense, leaving the viewer wondering what has caused her distress.
Prompt
facial-expressions Confusion: Lost, alienated ; A woman walking down a crowded street; eye-level; Single Person; a bustling city street with people rushing past; cinematic
Characteristic
Shot : A woman with short brown hair is standing in a city street, looking up with a worried expression. The background is blurred, suggesting a bustling crowd and a sense of urgency.
Aesthetic Score : 0.7
Mood : suspenseful, anxious, worried
Quality
Entropy : 6.86
Noise : 80
Prompt Clip Score : 0.25
AI Evaluation
Likelihood of AI : 0.20
Image errors : No noticeable artifacts or errors are present.
Superman Stands Tall, Hopeful Against the Night
A dramatic image captures Superman, bathed in moonlight, gazing upwards with a determined expression. The city lights below and the vastness of the night sky create a sense of heroic grandeur and hopeful anticipation.
Prompt
facial-expressions Confusion: Doubt, questioning ; A superhero standing on a rooftop; eye-level; Hero; a cityscape with twinkling lights and a full moon; cinematic
Characteristic
Shot : A man dressed as Superman stands against a cityscape and a large full moon, looking upward with a pensive expression.
Aesthetic Score : 0.7
Mood : heroic, contemplative, dramatic
Quality
Entropy : 6.56
Noise : 88
Prompt Clip Score : 0.23
AI Evaluation
Likelihood of AI : 0.80
Image errors : The subject’s skin appears slightly plastic and unreal. There are some subtle artifacts in the background, especially around the cityscape.
Conclusion
The results show that the generative AI model performed well in understanding the scene and its aesthetic, but struggled with camera position. Here’s a breakdown:
- Camera Position: The model scored 0.33, which is below average. This suggests that the model didn’t accurately capture the intended camera position described in the prompt.
- Shot Analysis: The model scored 0.54, which is considered good. This indicates that the model was able to understand the scene and create a shot that was somewhat aligned with the prompt.
- Aesthetic Analysis: The model scored 0.11, which is considered very good. This means that the generated image’s aesthetic closely matched the expected aesthetic described in the prompt.
Overall, the model demonstrated a good understanding of the scene and its aesthetic, but struggled with accurately capturing the intended camera position.
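To put the per-image numbers side by side, here is a small sketch that averages the metrics listed for the ten images in this post. The values are copied from the sections above; the shortened image titles, the tuple layout, and the `summarize` helper are ours. Note that these averages are separate from the camera position, shot, and aesthetic analysis scores in the breakdown, which are not derived from them here.

```python
from statistics import mean

# Per-image values copied from this post, in order of appearance:
# (aesthetic score, entropy, noise, prompt CLIP score, likelihood of AI)
RESULTS = {
    "Neon Maze":        (0.7, 6.74, 92, 0.29, 0.80),
    "Desert Warrior":   (0.7, 6.62, 108, 0.26, 0.80),
    "Office Concern":   (0.7, 6.91, 94, 0.26, 0.20),
    "Gamer Headphones": (0.6, 6.14, 93, 0.29, 0.20),
    "Shadow in City":   (0.8, 6.81, 70, 0.26, 0.90),
    "Knight's Watch":   (0.7, 6.47, 110, 0.29, 0.50),
    "Family Tension":   (0.5, 6.78, 110, 0.28, 0.10),
    "Adrenaline Rush":  (0.6, 6.58, 60, 0.30, 0.60),
    "Lost in the City": (0.7, 6.86, 80, 0.25, 0.20),
    "Superman":         (0.7, 6.56, 88, 0.23, 0.80),
}

METRICS = ("aesthetic", "entropy", "noise", "clip", "ai_likelihood")


def summarize(results):
    """Return the mean of each metric across all images, rounded to 3 places."""
    columns = zip(*results.values())  # transpose rows into per-metric columns
    return {name: round(mean(col), 3) for name, col in zip(METRICS, columns)}


summary = summarize(RESULTS)
for name, value in summary.items():
    print(f"{name:>14}: {value}")
```

Across the ten images this gives a mean Aesthetic Score of 0.67, a mean Prompt Clip Score of about 0.27, and a mean Likelihood of AI of 0.51, with four of the ten images rated at 0.80 or higher on that last measure.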