AI's Struggle with Facial Expressions: A Deep Dive into Generative Models with Imagen-v3-fast
Facial expressions are a powerful tool for conveying emotions and intentions. They play a crucial role in human communication, allowing us to understand each other’s feelings and motivations. For generative AI models, depicting them accurately remains a significant challenge. This article examines an experiment in which imagen-v3-fast generated ten images from prompts that each specified an expression of shame (for example “desolate, lonely, regretful”) along with a subject, camera position, setting, and style. The analysis shows that while the model handles scene composition reasonably well, it struggles to capture the intended aesthetic and the emotional depth of the requested facial expressions. Closing that gap will require further research into generating realistic, emotionally nuanced expressions.
Created with: imagen-v3-fast
Lost in the Rain: A Hooded Figure Walks the Desolate Streets
A solitary figure, shrouded in darkness, navigates a deserted city street under a torrential downpour. The moody lighting and relentless rain create an atmosphere of mystery and isolation, leaving the viewer to wonder about the man’s secrets and his destination.
Prompt
facial-expressions Shame: Desolate, lonely, regretful ; A lone figure, hunched over, walking down a deserted street; eye-level; Single Person; Rain-slicked pavement and flickering streetlights; cinematic
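Every prompt in this article appears to follow the same semicolon-delimited template: an expression label with a list of emotions, then a subject, a camera position, a person category, a setting, and a style. The following is a minimal sketch of how such a prompt could be split into those slots. The field names and the `parse_prompt` helper are assumptions made for illustration; the article does not document the template.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class PromptParts:
    # Field names are illustrative guesses, not taken from the article.
    expression: str
    emotions: List[str]
    subject: str
    camera: str
    category: str
    setting: str
    style: str


def parse_prompt(prompt: str) -> PromptParts:
    """Split a semicolon-delimited prompt into its six apparent slots."""
    parts = [p.strip() for p in prompt.split(";")]
    if len(parts) != 6:
        raise ValueError(f"expected 6 segments, got {len(parts)}")
    head, subject, camera, category, setting, style = parts
    # The first segment looks like "facial-expressions Shame: a, b, c".
    label, _, emotion_text = head.partition(":")
    expression = label.split()[-1]
    emotions = [e.strip().lower() for e in emotion_text.split(",") if e.strip()]
    return PromptParts(expression, emotions, subject, camera, category, setting, style)


prompt = (
    "facial-expressions Shame: Desolate, lonely, regretful ; "
    "A lone figure, hunched over, walking down a deserted street; "
    "eye-level; Single Person; "
    "Rain-slicked pavement and flickering streetlights; cinematic"
)
parts = parse_prompt(prompt)
print(parts.emotions)
print(parts.camera, "|", parts.category, "|", parts.style)
```

Splitting the prompt this way makes it easier to compare the requested emotions with the Mood reported for each image.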
Characteristic
Shot : A hooded man walks alone down a deserted city street at night, during a heavy downpour.
Aesthetic Score : 0.6
Mood : dark, mysterious, brooding
Quality
Entropy : 6.65
Noise : 96
Prompt Clip Score : 0.29
AI Evaluation
Likelihood of AI : 0.70
Image errors : The image is slightly overexposed, and the rain effect is a bit artificial.
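The article does not say how the Entropy value is computed. One common definition, assumed here, is the Shannon entropy of the image's 8-bit grayscale histogram, which ranges from 0 bits (a single grey level) to 8 bits (all 256 levels equally frequent). The sketch below works on a plain list of pixel intensities; loading a real image is left out, and `shannon_entropy` is a hypothetical helper, not the tool used for this article.

```python
import math
from collections import Counter


def shannon_entropy(pixels):
    """Shannon entropy, in bits, of a sequence of pixel intensities."""
    counts = Counter(pixels)
    total = len(pixels)
    # Sum of p * log2(1 / p) over every intensity that actually occurs.
    return sum((n / total) * math.log2(total / n) for n in counts.values())


flat = [128] * 1024              # one grey level only
uniform = list(range(256)) * 4   # every 8-bit level equally often

print(shannon_entropy(flat))
print(shannon_entropy(uniform))
```

Under this assumption, the values reported in this article (6.12 to 6.91) would indicate images that use most of the available tonal range.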
The Weight of the World: Superman’s Silent Sorrow
A close-up of Superman’s face reveals the toll of his heroism. The setting sun casts a golden glow over the cityscape, highlighting the weariness in his eyes and the scars etched on his face. This image captures the melancholy and burden of responsibility that comes with being a hero.
Prompt
facial-expressions Shame: Melancholy, disillusioned, burdened ; A superhero, their mask removed, revealing a face etched with pain; eye-level; Hero; A cityscape bathed in the glow of a setting sun; cinematic
Characteristic
Shot : A close-up of Superman’s face, with a cityscape in the background. The setting sun is casting a golden glow over the city, suggesting the end of a battle. The man looks sorrowful and tired, showing the toll of his heroism.
Aesthetic Score : 0.7
Mood : melancholy, heroic, sad
Quality
Entropy : 6.78
Noise : 77
Prompt Clip Score : 0.30
AI Evaluation
Likelihood of AI : 0.80
Image errors : The image is a bit blurry in some areas. There are some minor artifacts in the background cityscape and the Superman costume, suggesting some processing might be needed. Some of the details are a bit too sharp and cartoonish.
Lost in Thought: A Moment of Solitude
A woman sits alone at a restaurant table, her face hidden by her hands. The image evokes a sense of sadness, loneliness, and contemplation, creating a dramatic and poignant scene.
Prompt
facial-expressions Shame: Embarrassed, defeated, self-loathing ; A woman, her face buried in her hands, sitting alone at a crowded diner table; eye-level; Normal Person; The bustling activity of the diner, a stark contrast to her isolation; cinematic
Characteristic
Shot : A woman is sitting at a table in a restaurant, her face hidden by her hands.
Aesthetic Score : 0.5
Mood : sad, lonely, contemplative
Quality
Entropy : 6.91
Noise : 59
Prompt Clip Score : 0.33
AI Evaluation
Likelihood of AI : 0.20
Image errors : There is a slight blur in the background.
Lost in the Game: A Moment of Intense Focus
A man sits in a dimly lit room, his face illuminated by the screen of his video game console. He holds the controller with a serious expression, completely absorbed in the virtual world. The lighting creates a sense of mystery and suspense, highlighting the intensity of his focus.
Prompt
facial-expressions Shame: Empty, defeated, lost in a digital world ; A gamer, staring blankly at a screen, his controller lying idle; eye-level; Gamer; A dimly lit room filled with gaming paraphernalia, a sense of disconnection; cinematic
Characteristic
Shot : A man is sitting at a desk in a dimly lit room, holding a video game controller, looking down at it with a serious expression.
Aesthetic Score : 0.4
Mood : focused, serious, intense
Quality
Entropy : 6.12
Noise : 37
Prompt Clip Score : 0.32
AI Evaluation
Likelihood of AI : 0.20
Image errors : There are no visible artifacts or errors in the image.
A Look of Concern in the Shadows
A man, his face etched with worry, stares directly into the camera. The dimly lit room and blurred figures in the background create an atmosphere of tension and suspicion. What secrets lie hidden in the shadows?
Prompt
facial-expressions Shame: Anxious, self-conscious, out of place ; A man, standing in a crowded room, his eyes darting nervously around; eye-level; Single Person; A party scene, filled with laughter and conversation, but he feels isolated; cinematic
Characteristic
Shot : A man in a blue shirt looks directly at the camera with a worried expression. He is in a dimly lit room with other people in the background out of focus.
Aesthetic Score : 0.6
Mood : tense, concerned, suspicious
Quality
Entropy : 6.71
Noise : 68
Prompt Clip Score : 0.27
AI Evaluation
Likelihood of AI : 0.10
Image errors : There is some noise in the image, particularly in the shadows.
Intense Focus: A Man’s Serious Gaze
A close-up shot captures the intensity of a man’s expression as he looks down, his serious face framed by the hood of his dark green jacket. The mood is one of contemplation and thoughtfulness, creating a sense of drama and intrigue.
Prompt
facial-expressions Shame: Disheartened, disillusioned, questioning his purpose ; A hero, standing on a rooftop, looking down at the city below; not too close; Hero; A panoramic view of the city, but he feels small and insignificant; cinematic
Characteristic
Shot : A close-up shot of a man’s face. The man is wearing a dark green jacket with a hood; his face is serious and he is looking down.
Aesthetic Score : 0.7
Mood : serious, intense, thoughtful
Quality
Entropy : 6.51
Noise : 59
Prompt Clip Score : 0.29
AI Evaluation
Likelihood of AI : 0.95
Image errors : The image has some minor artifacts around the edges of the man’s face and hair, but they are not particularly noticeable. The background is also a bit blurry and lacks detail.
The Weight of Loneliness
A young woman sits alone at a kitchen table, her plate of food untouched, reflecting a sense of sadness and isolation. The image captures a moment of quiet despair, leaving the viewer to ponder the weight of her emotions.
Prompt
facial-expressions Shame: Depressed, unmotivated, lost in her thoughts ; A woman, sitting at her kitchen table, staring at a plate of untouched food; eye-level; Normal Person; A cluttered kitchen, a reflection of her inner turmoil; cinematic
Characteristic
Shot : A young woman sits at a kitchen table, looking dejected. There is a plate of food in front of her.
Aesthetic Score : 0.4
Mood : sad, lonely, pensive
Quality
Entropy : 6.80
Noise : 56
Prompt Clip Score : 0.33
AI Evaluation
Likelihood of AI : 0.10
Image errors : No visible errors
Lost in the Code: A Moment of Intense Focus
A young man, bathed in the cool glow of his computer screen, is completely absorbed in his work. Headphones on, fingers flying across the keyboard, his gaze is unwavering, reflecting a deep concentration and dedication to the task at hand.
Prompt
facial-expressions Shame: Despair, addiction, a sense of being lost ; A gamer, hunched over his keyboard, his fingers flying across the keys, but his eyes are filled with sadness; eye-level; Gamer; A brightly lit gaming room, but he feels trapped in a digital world; cinematic
Characteristic
Shot : A young man is sitting in front of a computer, wearing headphones and typing on a keyboard. The room is dimly lit and the image has a slightly cool color tone.
Aesthetic Score : 0.6
Mood : focused, concentrated, serious
Quality
Entropy : 6.35
Noise : 44
Prompt Clip Score : 0.30
AI Evaluation
Likelihood of AI : 0.10
Image errors : No noticeable artifacts or errors. The image is well-exposed and sharp.
Lost in Thought: A Moment of Introspection
A man in a green jacket stands alone, his gaze downcast, against a blurred backdrop of two other figures. The scene evokes a sense of pensive contemplation and a hint of melancholy, leaving the viewer to wonder about his thoughts and the story behind his solitude.
Prompt
facial-expressions Shame: Rejected, isolated, a sense of being unwanted ; A man, walking away from a group of people, his head down, his shoulders slumped; eye-level; Single Person; A bustling street, but he feels alone and invisible; cinematic
Characteristic
Shot : A man in a green jacket is standing in front of a blurred background of two other men. The man appears to be thoughtful or melancholic.
Aesthetic Score : 0.6
Mood : pensive, introspective, moody
Quality
Entropy : 6.81
Noise : 58
Prompt Clip Score : 0.28
AI Evaluation
Likelihood of AI : 0.20
Image errors : There are no visible image errors.
A Warrior’s Gaze: Intensity and Determination in a Medieval Portrait
This close-up portrait captures the serious expression of a man in armor, his gaze intense and determined. The blurred background hints at a medieval cityscape, adding to the sense of drama and tension created by the lighting and the subject’s powerful presence.
Prompt
facial-expressions Shame: Guilt, regret, a sense of responsibility ; A hero, standing in the ruins of a battle, his armor dented and his face covered in grime; not too close; Hero; A scene of destruction, a reminder of the cost of his actions; cinematic
Characteristic
Shot : Close-up portrait of a man in armor with a serious expression. The background is blurry and out of focus, with a suggestion of a medieval cityscape.
Aesthetic Score : 0.7
Mood : serious, intense, determined
Quality
Entropy : 6.65
Noise : 75
Prompt Clip Score : 0.32
AI Evaluation
Likelihood of AI : 0.90
Image errors : The image has some minor artifacts, particularly around the man’s hair and beard.
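The per-image numbers above can be summarized with a few lines of code. The sketch below copies the Aesthetic Score, Prompt Clip Score, and Likelihood of AI from the ten records, tags each with the person category from its prompt, and averages them overall and per category. Grouping by category is a choice made here for illustration, not something the article's own analysis does.

```python
from collections import defaultdict
from statistics import mean

# (category from the prompt, Aesthetic Score, Prompt Clip Score, Likelihood of AI),
# copied from the ten records above, in order.
records = [
    ("Single Person", 0.6, 0.29, 0.70),
    ("Hero", 0.7, 0.30, 0.80),
    ("Normal Person", 0.5, 0.33, 0.20),
    ("Gamer", 0.4, 0.32, 0.20),
    ("Single Person", 0.6, 0.27, 0.10),
    ("Hero", 0.7, 0.29, 0.95),
    ("Normal Person", 0.4, 0.33, 0.10),
    ("Gamer", 0.6, 0.30, 0.10),
    ("Single Person", 0.6, 0.28, 0.20),
    ("Hero", 0.7, 0.32, 0.90),
]


def column_means(rows):
    """Mean of each metric column, rounded to three decimals."""
    return tuple(round(mean(col), 3) for col in zip(*rows))


def summarize(rows):
    """Group the metric triples by category and average each group."""
    groups = defaultdict(list)
    for category, aesthetic, clip, ai in rows:
        groups[category].append((aesthetic, clip, ai))
    return {category: column_means(values) for category, values in groups.items()}


overall = column_means([r[1:] for r in records])
by_category = summarize(records)

print("overall (aesthetic, clip, ai):", overall)
for category, means in by_category.items():
    print(category, means)
```

On these ten records the mean Aesthetic Score is 0.58, and the three Hero images have both the highest Aesthetic Score (0.7 each) and the highest Likelihood of AI values.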
Conclusion
The results show that the generative AI model performed well in understanding the scene, but struggled with camera position and with the aesthetic aspect. Here’s a breakdown:
- Camera Position: The model scored 0.36, which is below the “good” range of 0.5 to 0.75. This suggests that the model didn’t fully capture the intended camera position described in the prompts.
- Shot Analysis: The model scored 0.63, which falls within the “good” range. This indicates that the model was able to understand the scenes described in the prompts and create shots that align with them.
- Aesthetic Analysis: The model scored 0.20, which falls outside the “very good” range of -0.2 to 0.1. This suggests that the generated images didn’t match the expected aesthetic style described in the prompts.
Overall, the model shows promise in understanding scene composition, but needs improvement in camera positioning and in capturing the desired aesthetic.
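The range checks in the breakdown above can be reproduced with a short sketch. It assumes that the bounds are inclusive and that Shot Analysis uses the same “good” range of 0.5 to 0.75 quoted for Camera Position; the article states neither explicitly, and `in_range` is a hypothetical helper.

```python
def in_range(score, low, high):
    """True when score lies within [low, high], bounds included (an assumption)."""
    return low <= score <= high


# metric name -> (reported score, (low, high) of the range quoted in the text)
results = {
    "Camera Position": (0.36, (0.5, 0.75)),
    "Shot Analysis": (0.63, (0.5, 0.75)),   # same "good" range assumed
    "Aesthetic Analysis": (0.20, (-0.2, 0.1)),
}

verdicts = {
    name: in_range(score, low, high)
    for name, (score, (low, high)) in results.items()
}

for name, ok in verdicts.items():
    print(f"{name}: {'inside' if ok else 'outside'} the quoted range")
```

Only the Shot Analysis score lands inside its quoted range, which matches the reading that scene understanding is the model's strongest area in this experiment.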