AI's Facial Expressions: A Mixed Bag of Success with Imagen-v3-fast
Facial expressions are a powerful tool for conveying emotions and intentions. In the realm of generative AI, the ability to accurately depict these expressions is crucial for creating realistic and engaging images. This blog post delves into the performance of a generative AI model in capturing facial expressions across diverse scenes, analyzing its strengths and weaknesses in understanding camera position, shot composition, and aesthetic style. We’ll explore examples of how the model excels in certain areas while highlighting areas where it needs improvement, providing insights into the ongoing development of AI-powered image generation.
Created with: imagen-v3-fast
Lost in Thought: A Moment of Unease in a Busy Cafe
A young woman with long brown hair sits alone in a cafe, her concerned expression and the blurred background hinting at a hidden worry. The scene evokes a sense of pensive thought and unspoken anxieties, leaving the viewer to wonder what troubles her mind.
Prompt
facial-expressions Embarrassment: Awkward and self-conscious ; A single woman; eye-level; Single Persons; A crowded cafe with loud chatter and laughter; cinematic
Characteristic
Shot : A young woman with long brown hair is seated in a cafe, looking concerned and off to the side
Aesthetic Score : 0.6
Mood : pensive, concerned, thoughtful
Quality
Entropy : 6.83
Noise : 52
Prompt Clip Score : 0.29
AI Evaluation
Likelihood of AI : 0.10
Image errors : Some slight blurriness, especially in the background
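Every prompt in this post follows the same semicolon-separated layout, so it can be split into named parts before comparing it with the generated image. The sketch below is one possible way to do that. The field names (category, emotion, nuance, subject, camera_position, group, scene, style) are assumptions inferred from the prompts shown here, not names taken from the tool that produced them.

```python
from dataclasses import dataclass


@dataclass
class PromptParts:
    # Field names are hypothetical labels inferred from the prompt layout.
    category: str
    emotion: str
    nuance: str
    subject: str
    camera_position: str
    group: str
    scene: str
    style: str


def parse_prompt(prompt: str) -> PromptParts:
    """Split a prompt of the form used in this post into its six fields."""
    fields = [field.strip() for field in prompt.split(";")]
    if len(fields) != 6:
        raise ValueError(f"expected 6 semicolon-separated fields, got {len(fields)}")
    head, subject, camera, group, scene, style = fields
    # The first field looks like "<category> <emotion>: <nuance>".
    category, _, rest = head.partition(" ")
    emotion, _, nuance = rest.partition(":")
    return PromptParts(category, emotion.strip(), nuance.strip(),
                       subject, camera, group, scene, style)


parts = parse_prompt(
    "facial-expressions Embarrassment: Awkward and self-conscious ; "
    "A single woman; eye-level; Single Persons; "
    "A crowded cafe with loud chatter and laughter; cinematic"
)
print(parts.emotion, "|", parts.camera_position, "|", parts.style)
```

Splitting the prompt this way makes it easier to see which requested element (here the emotion "Embarrassment") is or is not reflected in the Mood line of each result.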
Superman Faces Crisis in the Heart of the City
A tense scene unfolds as Superman, his iconic symbol emblazoned on his chest, stands amidst a throng of distressed citizens. The mood is heavy with intensity and drama, highlighting the weight of responsibility that rests on the Man of Steel’s shoulders.
Prompt
facial-expressions Embarrassment: Humiliated and exposed ; A superhero in a full costume; eye-level; Heroes; A bustling city street with people staring; cinematic
Characteristic
Shot : Superman, in his costume, stands in the middle of a city street, surrounded by a crowd of people, seemingly in distress.
Aesthetic Score : 0.6
Mood : intense, dramatic, serious
Quality
Entropy : 6.62
Noise : 64
Prompt Clip Score : 0.29
AI Evaluation
Likelihood of AI : 0.20
Image errors : The image appears to be slightly over-sharpened, and there are some minor compression artifacts present.
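The Entropy value listed under Quality is not defined in this post. One common reading, assumed here, is the Shannon entropy of the image's 8-bit grayscale histogram, which ranges from 0 bits (a flat, single-tone image) to 8 bits (all 256 levels equally likely). The reported values between 6.17 and 6.83 would fit that range, but this is a guess at the method, not a description of the actual pipeline. A minimal sketch of that calculation, using plain Python lists as stand-in pixel data:

```python
import math
from collections import Counter


def shannon_entropy(pixels):
    """Shannon entropy, in bits, of a sequence of pixel intensity values."""
    total = len(pixels)
    if total == 0:
        raise ValueError("pixels must not be empty")
    counts = Counter(pixels)
    # Sum -p * log2(p) over every intensity level that actually occurs.
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


# Stand-in data (not taken from the images in this post):
flat = [128] * 1000            # one tone only
two_tone = [0, 255] * 500      # two equally likely tones
full_range = list(range(256))  # every 8-bit level exactly once

for name, data in [("flat", flat), ("two_tone", two_tone), ("full_range", full_range)]:
    print(name, round(shannon_entropy(data), 2))
```

Under this assumption, a higher value means the image uses a wider and more even spread of tones, which is consistent with the busy cafe scene scoring higher than the dimly lit gaming room.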
A Shadow of Doubt: Anxiety at a Formal Gathering
A man in a suit, his face etched with worry, stands amidst the formality of an event. The scene is tense, the mood heavy with anticipation. His isolated figure and the dramatic framing create a sense of unease, leaving the viewer wondering what secrets lie beneath the surface.
Prompt
facial-expressions Embarrassment: Mortified and ashamed ; A man in a business suit; eye-level; Normal People; A formal dinner party with elegant guests; cinematic
Characteristic
Shot : A man in a suit is looking worried or anxious, likely at a formal event.
Aesthetic Score : 0.7
Mood : tense, worried, formal
Quality
Entropy : 6.51
Noise : 39
Prompt Clip Score : 0.29
AI Evaluation
Likelihood of AI : 0.00
Image errors : No major artifacts or errors are visible in the image.
Lost in the Code: A Moment of Intense Focus
A young man, headphones on, is completely absorbed in his work, the dim lighting highlighting his concentrated expression. The scene captures the intensity and immersion of a tech-focused moment.
Prompt
facial-expressions Embarrassment: Cringing and defeated ; A gamer in a gaming chair; eye-level; Gamer; A dimly lit room with flashing screens and empty pizza boxes; cinematic
Characteristic
Shot : A young man wearing headphones is seated in a dimly lit room, concentrating on a computer screen.
Aesthetic Score : 0.6
Mood : focused, intense, tech
Quality
Entropy : 6.17
Noise : 45
Prompt Clip Score : 0.27
AI Evaluation
Likelihood of AI : 0.20
Image errors : The image has a slight grainy texture, likely from the low lighting and/or compression.
A Moment of Reflection: Bride’s Thoughtful Gaze Amidst the Celebration
A bride stands alone in a dimly lit room, her gaze fixed on something beyond the camera’s view. Her thoughtful expression and the muted lighting create a sense of mystery and introspection, hinting at a deeper story unfolding amidst the joyous wedding reception.
Prompt
facial-expressions Embarrassment: Lonely and out of place ; A woman in a wedding dress; eye-level; Single Persons; A crowded wedding reception with happy couples; cinematic
Characteristic
Shot : A bride standing in a dimly lit room, looking away from the camera with a thoughtful expression. Other people are visible in the background, suggesting a wedding reception.
Aesthetic Score : 0.6
Mood : melancholy, thoughtful, introspective
Quality
Entropy : 6.63
Noise : 45
Prompt Clip Score : 0.28
AI Evaluation
Likelihood of AI : 0.10
Image errors : The image appears slightly overexposed, causing some details in the background to be lost. There is some noise in the shadows, likely from the lighting conditions.
Superman Inspires Hope Amidst the Crowd
A powerful image captures the essence of heroism as a Superman figure stands tall, bathed in dramatic lighting, before a crowd whose arms are raised. The scene evokes feelings of hope and determination, showcasing the enduring power of a symbol.
Prompt
facial-expressions Embarrassment: Embarrassed and self-conscious ; A superhero in a cape; eye-level; Heroes; A cheering crowd at a victory parade; cinematic
Characteristic
Shot : A man dressed as Superman stands in front of a crowd of people with their arms raised in the air
Aesthetic Score : 0.6
Mood : heroic, hopeful, dramatic
Quality
Entropy : 6.46
Noise : 55
Prompt Clip Score : 0.27
AI Evaluation
Likelihood of AI : 0.80
Image errors : The image has some slight blurring and graininess.
Lost in Thought: A Moment of Solitude in a Dimly Lit Restaurant
A woman sits alone at a table, her face etched with contemplation. The selective focus draws attention to her pensive expression, highlighting a sense of loneliness and introspection in the dimly lit restaurant.
Prompt
facial-expressions Embarrassment: Uncomfortable and out of place ; A woman in a casual outfit; eye-level; Normal People; A fancy restaurant with white tablecloths and expensive wine; cinematic
Characteristic
Shot : A woman sits at a table in a dimly lit restaurant. She is the only one in focus and appears to be in deep thought. The background is blurred out, highlighting the woman’s expression.
Aesthetic Score : 0.7
Mood : melancholy, introspective, thoughtful
Quality
Entropy : 6.49
Noise : 56
Prompt Clip Score : 0.28
AI Evaluation
Likelihood of AI : 0.10
Image errors : There is a slight blurring around the edges of the image, particularly noticeable in the background. This could be due to noise reduction or post-processing.
Lost in Thought: A Moment of Introspection
A young man, shrouded in a blue hoodie, stands alone in the vastness of a stadium or concert hall. His head is bowed, his gaze fixed on something unseen, lost in a world of contemplation. The low-angle shot emphasizes his isolation, capturing a moment of profound introspection.
Prompt
facial-expressions Embarrassment: Humiliated and defeated ; A gamer in a hoodie; eye-level; Gamer; A crowded esports tournament with loud cheers and flashing lights; cinematic
Characteristic
Shot : A young man in a blue hoodie, head down, looking contemplative, in front of a blurred background of a stadium or concert hall
Aesthetic Score : 0.6
Mood : pensive, reflective, introspective
Quality
Entropy : 6.43
Noise : 54
Prompt Clip Score : 0.31
AI Evaluation
Likelihood of AI : 0.10
Image errors : No visible artifacts or errors.
A Moment of Uncertainty
A man in a tuxedo, bathed in the soft glow of candlelight, sits alone at a table, a glass of wine in hand. His gaze is fixed on something unseen, his expression a mixture of concern and contemplation. The low-key lighting adds to the sense of drama and tension, leaving the viewer to wonder what secrets lie beneath the surface.
Prompt
facial-expressions Embarrassment: Awkward and uncomfortable ; A man in a tuxedo; eye-level; Single Persons; A romantic dinner for two with candles and flowers; cinematic
Characteristic
Shot : A man in a tuxedo is sitting at a table with a glass of wine. There are candles on the table. He is looking off to the side with a concerned or slightly anxious expression.
Aesthetic Score : 0.6
Mood : serious, anxious, contemplative
Quality
Entropy : 6.21
Noise : 33
Prompt Clip Score : 0.31
AI Evaluation
Likelihood of AI : 0.10
Image errors : There are no major errors in the image.
Superman Faces the Press Amidst Growing Tensions
A tense atmosphere surrounds Superman as he fields questions from reporters, the blurred background hinting at a deeper story unfolding behind the scenes. The contrast between his iconic costume and the serious expressions of those present adds to the dramatic effect.
Prompt
facial-expressions Embarrassment: Mortified and ashamed ; A superhero in a mask; eye-level; Heroes; A news conference with reporters asking difficult questions; cinematic
Characteristic
Shot : A man dressed as Superman is being interviewed by reporters. He is wearing a blue and red superhero costume with a yellow ‘S’ on the chest. The reporters are holding microphones in front of him. The background is blurred, but it appears to be a dark room with a couple of men standing behind him.
Aesthetic Score : 0.7
Mood : serious, dramatic, tense
Quality
Entropy : 6.48
Noise : 58
Prompt Clip Score : 0.27
AI Evaluation
Likelihood of AI : 0.10
Image errors : There are no visible errors in the image.
Conclusion
The results show that the generative AI model performed well in understanding the scene and matching the requested aesthetic, but struggled with camera position. Here’s a breakdown:
- Camera Position: The model scored 0.25, which is considered below average. This suggests that the model didn’t accurately capture the intended camera position described in the prompt.
- Shot Analysis: The model scored 0.615, which is considered good. This indicates that the model was able to understand the scene and create a shot that was relatively close to what was described in the prompt.
- Aesthetic Analysis: The model scored 0.09, which is considered very good. This means that the generated image’s aesthetic was very close to the expected aesthetic described in the prompt.
Overall, the model seems to be better at understanding the scene and creating a shot that matches the prompt, but it needs improvement in accurately capturing the intended camera position. The model’s ability to create an image with the desired aesthetic is a strong point.
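The per-image metrics listed throughout this post can also be summarized directly. The sketch below copies the ten rows of values from the entries above, in order of appearance, and computes a mean for each metric. The column names and the helper functions summarize and flag are illustrative choices, not part of the evaluation tool used for this post.

```python
from statistics import mean

# Column names are hypothetical labels for the metrics shown per image.
COLUMNS = ("aesthetic", "entropy", "noise", "clip_score", "likelihood_of_ai")

# One row per image, in the order the images appear in this post.
RESULTS = [
    (0.6, 6.83, 52, 0.29, 0.10),
    (0.6, 6.62, 64, 0.29, 0.20),
    (0.7, 6.51, 39, 0.29, 0.00),
    (0.6, 6.17, 45, 0.27, 0.20),
    (0.6, 6.63, 45, 0.28, 0.10),
    (0.6, 6.46, 55, 0.27, 0.80),
    (0.7, 6.49, 56, 0.28, 0.10),
    (0.6, 6.43, 54, 0.31, 0.10),
    (0.6, 6.21, 33, 0.31, 0.10),
    (0.7, 6.48, 58, 0.27, 0.10),
]


def summarize(rows):
    """Return the mean of each metric column, rounded to three decimals."""
    return {name: round(mean(col), 3) for name, col in zip(COLUMNS, zip(*rows))}


def flag(rows, column, threshold):
    """Return 1-based positions of images whose value in `column` is >= threshold."""
    idx = COLUMNS.index(column)
    return [pos for pos, row in enumerate(rows, start=1) if row[idx] >= threshold]


print(summarize(RESULTS))
print(flag(RESULTS, "likelihood_of_ai", 0.5))
```

On these values the mean Aesthetic Score is 0.63 and the mean Prompt Clip Score is 0.286, and only the sixth image (the victory parade) has a Likelihood of AI of 0.5 or higher.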