AI's Facial Expressions: A Step Forward, But Still Room for Growth with Flux-dev
Facial expressions are a powerful tool in storytelling, conveying emotions and adding depth to characters. In the realm of AI-generated imagery, capturing realistic and nuanced facial expressions remains a challenge. This blog post examines the results of a generative AI model tasked with creating images based on specific scenes and camera angles, focusing on the model’s ability to generate convincing facial expressions. We’ll explore the model’s strengths and weaknesses, highlighting the areas where it excels and where it still needs improvement.
Created with: flux-dev
Lost in Thought: A Moment of Quiet Contemplation
A woman with long dark hair sits in a cafe, her gaze fixed on something beyond the frame. Her pensive expression and relaxed posture evoke a sense of peaceful introspection. The scene captures a fleeting moment of contemplation and invites viewers to share in it.
Prompt
facial-expressions Embarrassment: Awkward and self-conscious ; A single woman; eye-level; Single Persons; A crowded cafe with loud chatter and laughter; cinematic
Characteristic
Shot : A young woman is sitting in a cafe, looking thoughtful and slightly sad.
Aesthetic Score : 0.6
Mood : melancholy, contemplative, introspective
Quality
Entropy : 6.52
Noise : 54
Prompt Clip Score : 0.21
AI Evaluation
Likelihood of AI : 0.20
Image errors : There are some minor artifacts and noise in the background, particularly around the woman’s hair.
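The prompts in this post all follow the same semicolon-separated pattern, so they can be split into named parts to compare what was requested (here, embarrassment) with what the Mood line reports. The following is a minimal sketch of that split. The field names (topic, emotion, expression, subject, camera, category, scene, style) are assumptions chosen for illustration and are not taken from the tool that produced the prompts.

```python
from dataclasses import dataclass


@dataclass
class ScenePrompt:
    # All field names are hypothetical labels for the prompt segments.
    topic: str        # e.g. "facial-expressions"
    emotion: str      # e.g. "Embarrassment"
    expression: str   # e.g. "Awkward and self-conscious"
    subject: str      # e.g. "A single woman"
    camera: str       # e.g. "eye-level"
    category: str     # e.g. "Single Persons"
    scene: str        # e.g. "A crowded cafe with loud chatter and laughter"
    style: str        # e.g. "cinematic"


def parse_prompt(prompt: str) -> ScenePrompt:
    """Split a prompt of the form used in this post into its segments."""
    parts = [p.strip() for p in prompt.split(";")]
    if len(parts) != 6:
        raise ValueError(f"expected 6 ';'-separated fields, got {len(parts)}")
    # The first segment looks like "<topic> <emotion>: <expression>".
    head, _, expression = parts[0].partition(":")
    topic, _, emotion = head.strip().partition(" ")
    return ScenePrompt(topic, emotion.strip(), expression.strip(), *parts[1:])


if __name__ == "__main__":
    example = (
        "facial-expressions Embarrassment: Awkward and self-conscious ; "
        "A single woman; eye-level; Single Persons; "
        "A crowded cafe with loud chatter and laughter; cinematic"
    )
    print(parse_prompt(example))
```

Parsing the prompt this way makes it easy to put the requested emotion and camera angle next to the Shot and Mood lines of each result.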
A Moment of Reflection: Bride’s Pensive Elegance in Dimly Lit Room
A bride in a flowing white gown stands amidst a softly lit gathering, her expression hinting at a mix of romance and contemplation. The dim lighting casts a warm glow, enhancing the intimate and elegant atmosphere of the scene.
Prompt
facial-expressions Embarrassment: Lonely and out of place ; A woman in a wedding dress; eye-level; Single Persons; A crowded wedding reception with happy couples; cinematic
Characteristic
Shot : A bride stands in the middle of the frame at a wedding reception, looking directly at the camera, with guests behind her. The lighting is warm and soft, creating a romantic atmosphere.
Aesthetic Score : 0.7
Mood : romantic, elegant, formal
Quality
Entropy : 6.70
Noise : 57
Prompt Clip Score : 0.17
AI Evaluation
Likelihood of AI : 0.10
Image errors : The image appears to have slight noise in the background, especially noticeable on the walls.
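The post does not say how the Entropy value under Quality is computed. One common definition, assumed here, is the Shannon entropy of the grayscale intensity histogram, which for 8-bit images ranges from 0 to 8 bits; the reported values of roughly 6.5 to 6.9 would fit that scale. A minimal sketch under that assumption, taking an already-flattened sequence of pixel intensities:

```python
import math
from collections import Counter
from typing import Iterable


def shannon_entropy(pixels: Iterable[int]) -> float:
    """Shannon entropy, in bits, of the histogram of the given pixel values.

    Assumption: this histogram-based definition is only one plausible
    reading of the "Entropy" figure reported in the post.
    """
    counts = Counter(pixels)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no pixels given")
    entropy = 0.0
    for n in counts.values():
        p = n / total
        entropy -= p * math.log2(p)
    return entropy


if __name__ == "__main__":
    flat = [0] * 100                 # a single flat color carries no information
    spread = list(range(256))        # every 8-bit level used equally often
    print(shannon_entropy(flat), shannon_entropy(spread))
```

Under this reading, a higher value means the image uses a wider and more even spread of brightness levels. It says nothing about whether the facial expression matches the prompt.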
A Moment of Anticipation at a Formal Gathering
A man in a tuxedo stands amidst a crowd, his gaze fixed upwards, creating a sense of mystery and anticipation. The formal setting and his contemplative pose suggest a moment of heightened awareness, leaving the viewer wondering what has captured his attention.
Prompt
facial-expressions Embarrassment: Mortified and ashamed ; A man in a business suit; eye-level; Normal People; A formal dinner party with elegant guests; cinematic
Characteristic
Shot : A man in a tuxedo is seated at a formal event, likely a wedding or gala.
Aesthetic Score : 0.7
Mood : elegant, sophisticated, anticipation
Quality
Entropy : 6.57
Noise : 56
Prompt Clip Score : 0.23
AI Evaluation
Likelihood of AI : 0.20
Image errors : Slight noise in the background, particularly noticeable around the man’s hair.
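The Prompt Clip Score is likewise not defined in the post. CLIP-based scores are usually the cosine similarity between an image embedding and a text embedding, and the sketch below assumes that reading. The embeddings themselves would come from a CLIP model, which is not shown; the short vectors in the example are made-up placeholders, not real embeddings.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two embedding vectors of equal length."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


if __name__ == "__main__":
    # Placeholder vectors standing in for image and prompt embeddings.
    image_embedding = [0.2, 0.1, 0.9]
    prompt_embedding = [0.8, 0.3, 0.1]
    print(round(cosine_similarity(image_embedding, prompt_embedding), 2))
```

If the score is indeed a raw cosine similarity, it mainly supports comparisons between images scored the same way, for example the 0.17 for the bride against the 0.28 for the gamer in the hoodie.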
An Evening of Elegance and Intrigue
Experience the allure of a romantic, intimate setting as a man in a tuxedo is seated at a softly lit table, his gaze inviting you into a world of mystery and elegance. The warm, blurred background adds to the enchanting atmosphere, creating a scene that is both captivating and intimate.
Prompt
facial-expressions Embarrassment: Awkward and uncomfortable ; A man in a tuxedo; eye-level; Single Persons; A romantic dinner for two with candles and flowers; cinematic
Characteristic
Shot : A man in a tuxedo is sitting in a dimly lit room with candles on the table. The background is blurry, and the focus is on the man’s face.
Aesthetic Score : 0.7
Mood : mysterious, intimate, formal
Quality
Entropy : 6.53
Noise : 50
Prompt Clip Score : 0.22
AI Evaluation
Likelihood of AI : 0.10
Image errors : The image is slightly blurry in the background. The lighting is uneven, with some areas appearing too dark. The man’s face is also slightly overexposed in some areas.
Superman Stands Tall, A City’s Hope in Focus
A solitary figure in a sea of blur, Superman stands resolute in a city street. The shallow depth of field emphasizes his heroic presence, conveying a sense of power and isolation. The mood is serious, yet hopeful, as he embodies the city’s last bastion of strength.
Prompt
facial-expressions Embarrassment: Humiliated and exposed ; A superhero in a full costume; eye-level; Heroes; A bustling city street with people staring; cinematic
Characteristic
Shot : A man dressed as Superman is standing in a city street, with blurred people in the background.
Aesthetic Score : 0.7
Mood : heroic, confident, dramatic
Quality
Entropy : 6.73
Noise : 69
Prompt Clip Score : 0.21
AI Evaluation
Likelihood of AI : 0.20
Image errors : No notable image errors.
Young Superman Gazes Hopefully into the Future
A child actor, dressed as Superman, looks up with a hopeful and innocent expression, creating a sense of anticipation and wonder. The image evokes a feeling of excitement and optimism, suggesting a bright future ahead.
Prompt
facial-expressions Embarrassment: Embarrassed and self-conscious ; A superhero in a cape; eye-level; Heroes; A cheering crowd at a victory parade; cinematic
Characteristic
Shot : A young boy, possibly dressed as Superman, is looking up at something. There are other people in the background but they are out of focus. The setting is unclear but it looks like an outdoor event.
Aesthetic Score : 0.8
Mood : playful, hopeful, curious
Quality
Entropy : 6.79
Noise : 66
Prompt Clip Score : 0.23
AI Evaluation
Likelihood of AI : 0.10
Image errors : The image is slightly blurry, and the colors are a little bit washed out. There is some noise in the image.
Lost in Thought: A Woman’s Pensive Gaze
A woman in a white blouse sits at a restaurant table, her hands resting on her chin as she gazes thoughtfully at the camera. Her pensive expression and posture create an air of mystery and intrigue, inviting the viewer to ponder her thoughts.
Prompt
facial-expressions Embarrassment: Uncomfortable and out of place ; A woman in a casual outfit; eye-level; Normal People; A fancy restaurant with white tablecloths and expensive wine; cinematic
Characteristic
Shot : A woman in a white blazer sits at a table in a dimly lit restaurant, resting her chin on her hands and looking pensively at the camera.
Aesthetic Score : 0.6
Mood : pensive, elegant, mysterious
Quality
Entropy : 6.88
Noise : 65
Prompt Clip Score : 0.25
AI Evaluation
Likelihood of AI : 0.20
Image errors : No significant errors.
Lost in Thought: A Moment of Mystery
A young man, shrouded in shadow, sits alone in a dimly lit room. His serious expression and the blurred figures in the background create an air of intrigue and introspection. The low-key lighting adds to the sense of mystery, leaving the viewer wondering what secrets lie within this enigmatic scene.
Prompt
facial-expressions Embarrassment: Humiliated and defeated ; A gamer in a hoodie; eye-level; Gamer; A crowded esports tournament with loud cheers and flashing lights; cinematic
Characteristic
Shot : A young man wearing a blue hoodie is sitting in a dimly lit room with a lot of colorful lights in the background.
Aesthetic Score : 0.7
Mood : mysterious, pensive, edgy
Quality
Entropy : 6.50
Noise : 67
Prompt Clip Score : 0.28
AI Evaluation
Likelihood of AI : 0.20
Image errors : The image is slightly overexposed, causing the subject’s face to appear washed out. The colors are also a bit oversaturated, making the image look artificial.
Lost in the Shadows: A Gamer’s Despair
A dimly lit room, a gaming setup, and a figure slumped in a chair, their face hidden by their hands. The image captures the raw emotion of defeat and loneliness, leaving the viewer to ponder the gamer’s hidden struggles.
Prompt
facial-expressions Embarrassment: Cringing and defeated ; A gamer in a gaming chair; eye-level; Gamer; A dimly lit room with flashing screens and empty pizza boxes; cinematic
Characteristic
Shot : A person sitting in a gaming chair in a dimly lit room, with their face hidden by their hands, looking distressed.
Aesthetic Score : 0.4
Mood : sad, lonely, despair
Quality
Entropy : 6.50
Noise : 60
Prompt Clip Score : 0.23
AI Evaluation
Likelihood of AI : 0.30
Image errors : The image is slightly blurry and the colors are a bit washed out.
The Bat Signal Shines: A Moment of Intensity
A brooding figure in a dark Batman costume stares directly at the camera, his face etched with seriousness. The dimly lit room and the blurred figures in the background create a sense of suspense and mystery, hinting at a dramatic event unfolding. The microphone in the foreground adds to the intensity, leaving the viewer wondering what secrets lie behind the mask.
Prompt
facial-expressions Embarrassment: Mortified and ashamed ; A superhero in a mask; eye-level; Heroes; A news conference with reporters asking difficult questions; cinematic
Characteristic
Shot : A man dressed as Batman is looking intensely at the camera with a menacing expression.
Aesthetic Score : 0.6
Mood : intense, dark, serious
Quality
Entropy : 6.78
Noise : 61
Prompt Clip Score : 0.25
AI Evaluation
Likelihood of AI : 0.30
Image errors : The image is slightly grainy, with visible noise.
Conclusion
The results show that the generative AI model understood the scene reasonably well, but fell short on camera position and, according to the evaluation, on the aesthetic aspect. Here’s a breakdown:
- Camera Position: The model scored 0.35, which is below the “good” range of 0.5 to 0.75. This suggests that the model didn’t fully capture the intended camera position described in the prompt.
- Shot Analysis: The model scored 0.67, which falls within the “good” range. This indicates that the model was able to understand the scene and create a shot that was relatively close to what was described in the prompt.
- Aesthetic Analysis: The model scored 0.08. The “very good” range is quoted as -0.2 to 0.1, which as written would contain 0.08, yet the evaluation treats this score as a significant deviation from the aesthetic described in the prompt.
Overall, the model demonstrated a decent understanding of the scene and shot composition, but struggled to achieve the desired aesthetic.
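The per-image numbers listed above can be summarized alongside the conclusion's range check. The sketch below copies the Aesthetic Score, Prompt Clip Score, and Likelihood of AI values from the ten entries in order and tests the two scores that are compared against the "good" range of 0.5 to 0.75. Treating that range as inclusive at both ends is an assumption, as is the helper name `in_range`. How the three conclusion scores were derived is not stated in the post, so they are not recomputed here.

```python
from statistics import mean

# Values copied from the ten image entries above, in order of appearance.
AESTHETIC = [0.6, 0.7, 0.7, 0.7, 0.7, 0.8, 0.6, 0.7, 0.4, 0.6]
CLIP = [0.21, 0.17, 0.23, 0.22, 0.21, 0.23, 0.25, 0.28, 0.23, 0.25]
AI_LIKELIHOOD = [0.20, 0.10, 0.20, 0.10, 0.20, 0.10, 0.20, 0.20, 0.30, 0.30]

GOOD_RANGE = (0.5, 0.75)  # the "good" range quoted in the conclusion


def in_range(score: float, bounds: tuple[float, float]) -> bool:
    """True if score lies within bounds (assumed inclusive at both ends)."""
    low, high = bounds
    return low <= score <= high


summary = {
    "mean_aesthetic": round(mean(AESTHETIC), 3),
    "mean_clip": round(mean(CLIP), 3),
    "mean_ai_likelihood": round(mean(AI_LIKELIHOOD), 3),
}
camera_position_good = in_range(0.35, GOOD_RANGE)  # camera position score
shot_analysis_good = in_range(0.67, GOOD_RANGE)    # shot analysis score

if __name__ == "__main__":
    print(summary)
    print("camera position in good range:", camera_position_good)
    print("shot analysis in good range:", shot_analysis_good)
```

Across the ten images the mean Aesthetic Score is about 0.65 and the mean Prompt Clip Score about 0.23. The gamer image with the hidden face has the lowest aesthetic score at 0.4.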