AI Struggles to Capture Facial Expressions in Images with Stability-ai-ultra
Facial expressions are a powerful tool for conveying emotions and intentions in visual storytelling. They can add depth and realism to characters, enhancing the viewer’s engagement with the narrative. However, generating images with accurate and nuanced facial expressions remains a challenge for generative AI models. This blog post examines a set of images generated from prompts that request specific emotions, and highlights where the results diverge from the requested expressions.
Created with: stability-ai-ultra
A Solitary Figure Contemplates the Fury of the Storm
A lone figure stands defiant against the elements, silhouetted against a backdrop of churning waves and dark, brooding clouds. The dramatic contrast between the individual and the powerful forces of nature creates a sense of awe and vulnerability, capturing the raw power of the storm.
Prompt
facial-expressions Disagreement: Melancholy, isolated, conflicted ; A lone figure standing on a clifftop, looking out at a stormy sea; eye-level; Single Person; Dramatic, stormy sky with crashing waves; cinematic
Characteristic
Shot : A lone figure stands on a rocky cliff overlooking a stormy sea, with large, dark clouds overhead and waves crashing in the foreground.
Aesthetic Score : 0.8
Mood : dramatic, melancholic, powerful
Quality
Entropy : 6.84
Noise : 91
Prompt Clip Score : 0.28
AI Evaluation
Likelihood of AI : 0.20
Image errors : The image has some slight noise and a bit of over-sharpening in some areas, especially in the sky and waves.
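The prompt for this image, like the ones that follow, appears to use a semicolon-separated layout. A minimal Python sketch of splitting such a prompt into parts is shown here. The field names (`emotions`, `scene`, `camera`, `subject`, `background`, `style`) and the helper `parse_prompt` are assumptions inferred from the prompt text, not a documented schema.

```python
# Hypothetical field names, inferred from the order of the prompt segments.
FIELDS = ["emotions", "scene", "camera", "subject", "background", "style"]
PREFIX = "facial-expressions Disagreement:"

EXAMPLE = (
    "facial-expressions Disagreement: Melancholy, isolated, conflicted ; "
    "A lone figure standing on a clifftop, looking out at a stormy sea; "
    "eye-level; Single Person; Dramatic, stormy sky with crashing waves; cinematic"
)


def parse_prompt(prompt):
    """Split a semicolon-separated prompt into a dict keyed by FIELDS."""
    text = prompt.strip()
    if text.startswith(PREFIX):
        text = text[len(PREFIX):]
    parts = [part.strip() for part in text.split(";")]
    if len(parts) != len(FIELDS):
        raise ValueError(f"expected {len(FIELDS)} segments, got {len(parts)}")
    parsed = dict(zip(FIELDS, parts))
    # The first segment lists the requested emotions, separated by commas.
    parsed["emotions"] = [e.strip().lower() for e in parsed["emotions"].split(",")]
    return parsed


if __name__ == "__main__":
    for key, value in parse_prompt(EXAMPLE).items():
        print(f"{key}: {value}")
```

Separating the requested emotions from the scene description makes it easier to compare them with the mood reported for each image.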
Superman Faces Down Disaster, Crowd Flees in Panic
A fiery inferno rages in the background as Superman stands resolute, facing the camera with a serious expression. The fleeing crowd and the billowing smoke create a sense of urgency and danger, highlighting the gravity of the situation.
Prompt
facial-expressions Disagreement: Urgent, conflicted, determined ; A superhero, cape billowing in the wind, standing in front of a burning building, looking at a group of people fleeing; eye-level; Hero; City skyline with smoke and flames; cinematic
Characteristic
Shot : Superman stands in the foreground with a determined expression. In the background, a city street is on fire, with people fleeing in panic. The image has a dramatic and heroic feel.
Aesthetic Score : 0.7
Mood : dramatic, heroic, tense
Quality
Entropy : 6.76
Noise : 80
Prompt Clip Score : 0.30
AI Evaluation
Likelihood of AI : 0.10
Image errors : The image has a slightly grainy texture and some noise. The fire in the background appears a bit artificial.
The Silent Storm: A Couple’s Heated Exchange in Dimly Lit Restaurant
A tense moment unfolds in a dimly lit restaurant as a couple engages in a heated conversation. The man’s gaze is fixed on the woman, who averts her eyes with a furrowed brow. The close-up framing and dramatic lighting heighten the intensity and intimacy of the scene, leaving the viewer to wonder what secrets lie beneath the surface.
Prompt
facial-expressions Disagreement: Angry, tense, frustrated ; A couple arguing in a crowded restaurant, their faces close together; close-up; Normal People; Busy restaurant interior with other diners; cinematic
Characteristic
Shot : A couple is having a tense conversation at a restaurant. There are other people in the background, but they are out of focus.
Aesthetic Score : 0.6
Mood : tense, dramatic, confrontational
Quality
Entropy : 6.80
Noise : 90
Prompt Clip Score : 0.28
AI Evaluation
Likelihood of AI : 0.10
Image errors : There are some minor artifacts in the background, and the image is slightly overexposed.
Lost in the Neon Glow: A Gamer’s Intense Focus
A young man, bathed in the cool light of his monitor, is completely absorbed in his game. The dimly lit room, punctuated by neon accents, adds to the edgy and intense atmosphere, making it feel like he’s on the edge of his seat in a thrilling virtual world.
Prompt
facial-expressions Disagreement: Frustrated, intense, focused ; A gamer, hunched over a computer screen, furiously clicking a mouse; close-up; Gamer; Dark room with glowing computer screen and peripherals; cinematic
Characteristic
Shot : A man is gaming in a room with pink and blue lighting. He is sitting in front of a computer and looking intently at the screen. There are large speakers next to the computer.
Aesthetic Score : 0.6
Mood : intense, focused, cool
Quality
Entropy : 6.35
Noise : 69
Prompt Clip Score : 0.26
AI Evaluation
Likelihood of AI : 0.20
Image errors : The image is a bit blurry. The lighting is also a bit harsh, making the image look overexposed in some areas. There is a bit of noise in the image, especially in the darker areas.
Lost in the Moment: A Woman Finds Peace in a Busy Cafe
A young woman, wrapped in a cozy grey sweater, sits alone at a cafe table, her gaze fixed on her phone. The bustling cafe fades into a blur behind her, highlighting her quiet contemplation. The scene evokes a sense of both intimacy and isolation, capturing the fleeting moments of peace we find in the midst of everyday life.
Prompt
facial-expressions Disagreement: Disappointed, lonely, withdrawn ; A woman sitting alone in a coffee shop, staring at a phone with a blank expression; eye-level; Single Person; Cozy coffee shop interior with other patrons; cinematic
Characteristic
Shot : A young woman is sitting at a cafe table, looking at her phone. There are other people in the background, but they are out of focus. The lighting is warm and inviting.
Aesthetic Score : 0.7
Mood : calm, introspective, peaceful
Quality
Entropy : 6.77
Noise : 84
Prompt Clip Score : 0.27
AI Evaluation
Likelihood of AI : 0.10
Image errors : No visible errors or artifacts
Shadowed Secrets: A Man’s Suspicious Glance in a Gritty Alley
A lone figure, clad in leather, stands in a dimly lit alleyway, his gaze fixed on something unseen. The narrow space, adorned with graffiti, amplifies the sense of suspense and danger. This image evokes a feeling of mystery and intrigue, leaving you wondering what secrets lurk in the shadows.
Prompt
facial-expressions Disagreement: Confident, determined, defiant ; A hero, standing in a dark alleyway, looking at a villain with a determined expression; eye-level; Hero; Dark, gritty alleyway with shadows and graffiti; cinematic
Characteristic
Shot : A man in a leather jacket is standing in a dark alley, looking at someone off-camera.
Aesthetic Score : 0.6
Mood : suspenseful, mysterious, gritty
Quality
Entropy : 6.53
Noise : 86
Prompt Clip Score : 0.27
AI Evaluation
Likelihood of AI : 0.30
Image errors : The image is slightly grainy and the shadows are a bit harsh.
The Argument in the Park
A tense confrontation unfolds between two young people in a park, their faces close-up and filled with emotion. The blurred background isolates them, emphasizing the intensity of their argument.
Prompt
facial-expressions Disagreement: Angry, frustrated, heated ; A group of friends arguing in a park, their voices raised; medium shot; Normal People; Sunny park with trees and benches; cinematic
Characteristic
Shot : A young couple is arguing in a park, while other people are sitting on a bench in the background.
Aesthetic Score : 0.6
Mood : tense, dramatic, conflict
Quality
Entropy : 6.83
Noise : 84
Prompt Clip Score : 0.29
AI Evaluation
Likelihood of AI : 0.10
Image errors : The image is slightly blurry, particularly in the background. The lighting is also a bit flat and lacking in contrast.
Victory! Gamer’s Excitement Explodes in a Burst of Red and Blue
This image captures the raw energy of a gamer’s triumph. The man’s ecstatic expression, fist pump, and the vibrant red and blue lighting create a sense of intense excitement and drama. The two computer screens, one showcasing the game and the other blurred, add to the immersive atmosphere.
Prompt
facial-expressions Disagreement: Frustrated, angry, defeated ; A gamer, slamming his fist on a desk, yelling at the computer screen; close-up; Gamer; Brightly lit gaming room with multiple monitors; cinematic
Characteristic
Shot : A man is playing a video game and is getting very excited, screaming in triumph. He is lit with red and blue light.
Aesthetic Score : 0.6
Mood : excited, intense, dramatic
Quality
Entropy : 6.75
Noise : 69
Prompt Clip Score : 0.24
AI Evaluation
Likelihood of AI : 0.20
Image errors : The image has some slight compression artifacts and noise. The colors are a bit oversaturated.
Lost in the City Lights
A solitary figure navigates a bustling cityscape, their path obscured by the blur of urban life. The melancholy mood and dramatic use of blur evoke a sense of anonymity and isolation, leaving the viewer to ponder the figure’s thoughts and journey.
Prompt
facial-expressions Disagreement: Sad, lonely, rejected ; A man walking away from a group of people, his head down; long shot; Single Person; Busy city street with people walking by; cinematic
Characteristic
Shot : A man in a black jacket and brown backpack is walking in a crowded city street. The street is lined with buildings and shops, and the scene is lit by streetlights and neon signs.
Aesthetic Score : 0.5
Mood : urban, anonymous, everyday
Quality
Entropy : 6.94
Noise : 74
Prompt Clip Score : 0.25
AI Evaluation
Likelihood of AI : 0.20
Image errors : No errors detected
Silhouetted Against the Stars: A Moment of Contemplation
A solitary figure stands on a rooftop, bathed in the soft glow of city lights below. The vast expanse of the night sky, dotted with twinkling stars, creates a sense of both isolation and wonder. This image evokes a mood of melancholy, serenity, and contemplation, leaving the viewer to ponder the figure’s thoughts and the mysteries of the universe.
Prompt
facial-expressions Disagreement: Thoughtful, conflicted, determined ; A hero, standing on a rooftop, looking at a city skyline with a conflicted expression; eye-level; Hero; City skyline at night with twinkling lights; cinematic
Characteristic
Shot : A solitary figure stands on a rooftop overlooking a nighttime cityscape, illuminated by twinkling lights. The sky is filled with stars and the silhouette of the person is captured in the foreground, conveying a sense of solitude and contemplation.
Aesthetic Score : 0.8
Mood : melancholy, contemplative, peaceful
Quality
Entropy : 6.45
Noise : 62
Prompt Clip Score : 0.26
AI Evaluation
Likelihood of AI : 1.00
Image errors : No visible artifacts or errors in the image.
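The per-image numbers above are easier to compare side by side. The following is a minimal Python sketch that collects the reported values and summarizes them. The short labels, the tuple layout, and the helper `summarize_gallery` are illustrative assumptions, and nothing here reproduces how the scores were originally computed.

```python
from statistics import mean, median

# (label, aesthetic score, prompt clip score, likelihood of AI), copied from
# the listings above. The labels are hypothetical shorthand for the titles.
IMAGES = [
    ("storm", 0.8, 0.28, 0.20),
    ("superman", 0.7, 0.30, 0.10),
    ("restaurant", 0.6, 0.28, 0.10),
    ("gamer_focus", 0.6, 0.26, 0.20),
    ("cafe", 0.7, 0.27, 0.10),
    ("alley", 0.6, 0.27, 0.30),
    ("park", 0.6, 0.29, 0.10),
    ("gamer_victory", 0.6, 0.24, 0.20),
    ("city_lights", 0.5, 0.25, 0.20),
    ("rooftop", 0.8, 0.26, 1.00),
]


def summarize_gallery(images):
    """Return simple summary statistics for the reported per-image values."""
    aesthetic = [row[1] for row in images]
    clip = [row[2] for row in images]
    ai_likelihood = [row[3] for row in images]
    return {
        "mean_aesthetic": round(mean(aesthetic), 3),
        "mean_prompt_clip": round(mean(clip), 3),
        "median_ai_likelihood": median(ai_likelihood),
        # Image whose result matched its prompt least, by Prompt Clip Score.
        "lowest_prompt_clip": min(images, key=lambda row: row[2])[0],
    }


if __name__ == "__main__":
    for key, value in summarize_gallery(IMAGES).items():
        print(f"{key}: {value}")
```

In this sketch the lowest Prompt Clip Score belongs to the gamer image whose prompt asked for frustration and defeat but whose result was described as a scream of triumph.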
Conclusion
The results show that the model fell short on camera position and shot analysis, while its aesthetic score landed inside the range treated as very good. Here’s a breakdown:
- Camera Position: The model scored 0.25, which is below the “good” range of 0.5 to 0.75. This suggests that the generated images often missed the camera positions described in the prompts.
- Shot Analysis: The model scored 0.43, also below the “good” range. This indicates that the generated scenes only partly matched the scenes described in the prompts.
- Aesthetic Analysis: The model scored 0.03, which falls within the “very good” range of -0.2 to 0.1. This means that the aesthetic of the generated images stayed close to the aesthetic expected from the prompts.
Overall, the model struggled to interpret the prompts accurately: the camera positions and scene content, including the requested facial expressions, often diverged from what was asked for, even though the overall aesthetic held up.
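The comparison of each score against its target range can be sketched in a few lines of Python. This is a minimal sketch: the scores and range bounds are taken from the breakdown above, while the names `in_range` and `check_scores`, the dictionary layout, the inclusive treatment of the bounds, and the reuse of the 0.5 to 0.75 range for shot analysis are assumptions made for illustration.

```python
# Scores and target ranges as reported in the breakdown above.
# Assumptions: bounds are inclusive, and shot analysis uses the same
# "good" range as camera position.
SCORES = {
    "camera_position": (0.25, (0.5, 0.75)),
    "shot_analysis": (0.43, (0.5, 0.75)),
    "aesthetic_analysis": (0.03, (-0.2, 0.1)),
}


def in_range(score, bounds):
    """Return True when score lies within the inclusive (low, high) bounds."""
    low, high = bounds
    return low <= score <= high


def check_scores(scores):
    """Map each metric name to whether its score is inside its target range."""
    return {name: in_range(score, bounds) for name, (score, bounds) in scores.items()}


if __name__ == "__main__":
    for name, ok in check_scores(SCORES).items():
        print(f"{name}: {'inside' if ok else 'outside'} target range")
```

Writing the ranges down explicitly makes it easy to confirm which scores actually sit inside or outside their stated targets.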