AI's Struggle with Facial Expressions: A Tale of Two Cities with Stable Diffusion
Facial expressions are a powerful tool for conveying emotions and intentions. They play a crucial role in human communication, adding depth and nuance to our interactions. However, for AI models, accurately capturing these subtle expressions remains a significant challenge. This blog post explores the limitations of generative AI models in understanding and depicting facial expressions, using a case study to illustrate the complexities involved. We’ll examine how these models perform in different aspects of image generation, including camera position, scene composition, and aesthetic style. By understanding these limitations, we can gain valuable insights into the ongoing development of AI and its potential to create more realistic and emotionally engaging content.
Created with: stability-ai-core
A Shadow in the Rain: Mystery and Foreboding on a City Street
A solitary figure, shrouded in a dark coat, stands amidst the falling rain on a city street. The scene evokes a sense of mystery and somberness, with the man’s expression adding a layer of foreboding to the atmosphere.
Prompt
facial-expressions Guilt: Desolate, regretful ; A lone figure; eye-level; Single Person; Empty street at night, rain falling; cinematic
Characteristic
Shot : A man in a dark jacket stands in a rainy street. The background is blurred, giving a moody feeling.
Aesthetic Score : 0.7
Mood : dark, mysterious, contemplative
Quality
Entropy : 6.59
Noise : 76
Prompt Clip Score : 0.22
AI Evaluation
Likelihood of AI : 0.30
Image errors : The image is slightly grainy and the rain drops are somewhat artificial.
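Every prompt in this post appears to follow the same semicolon-separated layout: an emotion label with descriptors, a subject, a camera position, a character type, a scene, and a style. The sketch below splits a prompt into those parts so each one can be compared against what the image actually shows. The field names (`subject`, `camera`, `character`, and so on) and the `parse_prompt` helper are assumptions made for illustration, not part of any Stability AI tooling.

```python
from dataclasses import dataclass


@dataclass
class PromptParts:
    topic: str              # e.g. "facial-expressions"
    emotion: str            # e.g. "Guilt"
    descriptors: list[str]  # e.g. ["Desolate", "regretful"]
    subject: str
    camera: str
    character: str
    scene: str
    style: str


def parse_prompt(prompt: str) -> PromptParts:
    """Split a prompt on ';' into the six fields this post appears to use."""
    fields = [f.strip() for f in prompt.split(";")]
    if len(fields) != 6:
        raise ValueError(f"expected 6 ';'-separated fields, got {len(fields)}")
    head, subject, camera, character, scene, style = fields
    # The first field looks like "<topic> <emotion>: <descriptor>, <descriptor>".
    label, _, descriptor_text = head.partition(":")
    topic, _, emotion = label.strip().partition(" ")
    descriptors = [d.strip() for d in descriptor_text.split(",") if d.strip()]
    return PromptParts(topic, emotion, descriptors, subject, camera, character, scene, style)


parts = parse_prompt(
    "facial-expressions Guilt: Desolate, regretful ; A lone figure; eye-level; "
    "Single Person; Empty street at night, rain falling; cinematic"
)
print(parts.emotion, parts.descriptors, parts.camera, parts.style)
# Guilt ['Desolate', 'regretful'] eye-level cinematic
```

Keeping the camera, scene, and style fields separate mirrors the three categories scored in the conclusion, and the descriptors are the part that the facial expression is supposed to carry.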
Superman: Hope Amidst the Ashes
A powerful image of Superman standing tall amidst a city ravaged by disaster. The smoke plume and his determined stance evoke a sense of urgency and heroism, promising a fight for hope in the face of destruction.
Prompt
facial-expressions Guilt: Heavy, burdened, conflicted ; A superhero, cape billowing in the wind; medium shot; Hero; City skyline, destroyed buildings in the background; cinematic
Characteristic
Shot : A superhero, likely Superman, stands in a destroyed city with a smoke plume in the background. The city is recognizable as a modern metropolis, possibly Chicago.
Aesthetic Score : 0.7
Mood : heroic, dramatic, somber
Quality
Entropy : 6.73
Noise : 75
Prompt Clip Score : 0.28
AI Evaluation
Likelihood of AI : 0.80
Image errors : Minor artifacts are visible in the smoke plume and in the background buildings.
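The post does not say how the Entropy figure for each image is computed. The reported values, all between 5.88 and 6.84, would be consistent with the Shannon entropy of an 8-bit grayscale histogram, which ranges from 0 to 8 bits, but that reading is an assumption. Under that assumption, a minimal sketch of the calculation looks like this; the pixel lists are synthetic stand-ins, not data from the images in this post.

```python
import math
import random
from collections import Counter


def shannon_entropy(pixels):
    """Shannon entropy in bits of a sequence of grayscale pixel values."""
    total = len(pixels)
    if total == 0:
        raise ValueError("pixels must not be empty")
    counts = Counter(pixels)
    # Sum p * log2(1/p) over every gray level that actually occurs.
    return sum((n / total) * math.log2(total / n) for n in counts.values())


# Synthetic stand-ins for real image data (hypothetical, not the post's images).
flat = [128] * 10_000            # one gray level everywhere
two_tone = [0, 255] * 5_000      # two equally common levels
rng = random.Random(0)
noisy = [rng.randrange(256) for _ in range(100_000)]  # roughly uniform 8-bit noise

print(shannon_entropy(flat))      # 0.0: no variation at all
print(shannon_entropy(two_tone))  # 1.0: two equally likely levels
print(shannon_entropy(noisy))     # close to the 8-bit maximum of 8
```

On this reading, a higher value means the image spreads its pixels across more tonal levels, which would fit the darker, flatter gamer scenes scoring lowest in this set.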
A Worried Glance, A Blurred Past
A woman holds a faded photograph, her expression etched with concern. The image within, slightly out of focus, depicts another woman enjoying a meal in a familiar kitchen. The scene evokes a sense of melancholy and introspection, leaving the viewer to ponder the unspoken story behind the blurred memories.
Prompt
facial-expressions Guilt: Nostalgic, melancholic ; A woman holding a photo of a loved one; close-up; Normal Person; A cluttered kitchen, dishes piled in the sink; cinematic
Characteristic
Shot : A woman in a kitchen holds a picture of herself. In the picture, she is sitting at a table with a plate of food, looking sad.
Aesthetic Score : 0.6
Mood : sad, pensive, contemplative
Quality
Entropy : 6.83
Noise : 75
Prompt Clip Score : 0.30
AI Evaluation
Likelihood of AI : 0.20
Image errors : No noticeable artifacts or errors.
Immersed in the Game: A Gamer’s Focused Intensity
A young man, bathed in the glow of neon lights, sits in a darkened room, headphones on, eyes fixed on his laptop. Pizza sits forgotten as he navigates the digital world, his expression a testament to the intense focus of a competitive gamer.
Prompt
facial-expressions Guilt: Isolated, self-loathing ; A gamer, hunched over a computer screen; close-up; Gamer; Neon lights reflecting in their eyes, empty pizza boxes scattered around; cinematic
Characteristic
Shot : A young man is sitting at a desk in a dimly lit room, wearing headphones and looking intently at a laptop. There are neon signs in the background, and pizza slices are in the foreground.
Aesthetic Score : 0.6
Mood : focused, techy, casual
Quality
Entropy : 5.88
Noise : 61
Prompt Clip Score : 0.33
AI Evaluation
Likelihood of AI : 0.10
Image errors : Slight blurriness in some areas, especially the neon signs.
A Moment of Joy in the Crowd
A man stands amidst a festive crowd. His surprised smile and the twinkling string lights in the background capture a moment of pure happiness and anticipation.
Prompt
facial-expressions Guilt: Alienated, invisible ; A man standing in a crowded room, looking lost; wide shot; Single Person; A party, people laughing and dancing, oblivious to him; cinematic
Characteristic
Shot : A man in a blue shirt is standing in a crowd of people. He is looking up and smiling. The background is blurry and there are flags hanging from the ceiling.
Aesthetic Score : 0.7
Mood : happy, joyful, hopeful
Quality
Entropy : 6.75
Noise : 77
Prompt Clip Score : 0.21
AI Evaluation
Likelihood of AI : 0.20
Image errors : The image quality is good. There are no major artifacts or errors.
Amidst the Ashes, a Sole Survivor Stands
A lone figure in uniform stands defiant against a backdrop of smoldering ruins and raging fire. The image captures the intensity and somber mood of a post-apocalyptic world, hinting at a story of resilience and survival.
Prompt
facial-expressions Guilt: Torn, conflicted, remorseful ; A hero, standing over a fallen villain; medium shot; Hero; A battlefield, smoke and debris everywhere; cinematic
Characteristic
Shot : A man in a military uniform stands in the midst of a post-apocalyptic landscape, surrounded by debris and fire.
Aesthetic Score : 0.7
Mood : dramatic, tense, apocalyptic
Quality
Entropy : 6.84
Noise : 75
Prompt Clip Score : 0.25
AI Evaluation
Likelihood of AI : 0.20
Image errors : The image is slightly blurry, especially in the background.
A Dinner Party Gone Wrong: Tension and Mystery at the Table
A family gathers for dinner, but the dimly lit room and their strained expressions reveal a hidden tension. What secrets lie beneath the surface? This unsettling scene invites you to unravel the mystery.
Prompt
facial-expressions Guilt: Awkward, strained, unspoken ; A family gathered around a table, but the atmosphere is tense; medium shot; Normal People; A dimly lit dining room, empty chairs at the table; cinematic
Characteristic
Shot : A family sitting at a dinner table. The lighting is dim and the mood is tense. The image is well-composed and the colors are muted.
Aesthetic Score : 0.7
Mood : tense, serious, intimate
Quality
Entropy : 6.47
Noise : 67
Prompt Clip Score : 0.23
AI Evaluation
Likelihood of AI : 0.10
Image errors : There are no noticeable artifacts or errors in the image. The image is well-exposed and the colors are accurate.
The Weight of Focus: A Young Man’s Intense Gaze
A young man sits in a dimly lit room, his serious expression and direct gaze captivating the viewer. The simple composition, with a computer monitor and a shelf of games in the background, emphasizes his focused state. The low lighting and thoughtful mood create a sense of intensity and introspection.
Prompt
facial-expressions Guilt: Disillusioned, defeated, empty ; A gamer, staring at a blank screen, controller in hand; close-up; Gamer; A dimly lit room, empty energy drink cans scattered around; cinematic
Characteristic
Shot : A young man sitting at a desk in a dimly lit room, with a gaming console and energy drinks on the desk. He appears focused and intense, suggesting he is about to engage in a competitive gaming session.
Aesthetic Score : 0.6
Mood : intense, focused, edgy
Quality
Entropy : 6.02
Noise : 54
Prompt Clip Score : 0.30
AI Evaluation
Likelihood of AI : 0.10
Image errors : The image appears to have a slight degree of noise in the darker areas. The background is relatively plain and could benefit from more depth or visual interest.
Lost in the City’s Embrace
A solitary figure walks down a cobblestone street, her brown coat blending with the aged architecture. The blurred background emphasizes her quiet solitude, creating a sense of calm introspection amidst the urban bustle.
Prompt
facial-expressions Guilt: Lonely, isolated, rejected ; A woman walking away from a group of friends; long shot; Single Person; A bustling city street, people rushing by; cinematic
Characteristic
Shot : A woman in a brown coat walks down a cobbled street in a European city. The street is lined with shops and buildings, and there are other people walking around.
Aesthetic Score : 0.7
Mood : melancholic, urban, mysterious
Quality
Entropy : 6.82
Noise : 73
Prompt Clip Score : 0.22
AI Evaluation
Likelihood of AI : 0.20
Image errors : No notable artifacts or errors.
Silhouetted Solitude: A Moment of Contemplation in the City
A solitary figure stands on a rooftop, their silhouette stark against the backdrop of a moonlit cityscape. The scene evokes a sense of melancholy and contemplation, capturing the quiet loneliness of urban life.
Prompt
facial-expressions Guilt: Reflective, contemplative, seeking redemption ; A hero, standing on a rooftop, looking out at the city; wide shot; Hero; A cityscape bathed in moonlight, a sense of peace; cinematic
Characteristic
Shot : A lone figure stands on a rooftop overlooking a city skyline at night, with a large full moon in the sky.
Aesthetic Score : 0.7
Mood : melancholy, contemplative, urban
Quality
Entropy : 6.61
Noise : 68
Prompt Clip Score : 0.24
AI Evaluation
Likelihood of AI : 0.30
Image errors : There are no visible artifacts or errors in the image.
Conclusion
The analysis shows that the generative AI model matched the requested aesthetic style well, understood the scene only moderately, and struggled with the camera position. Here’s a breakdown:
- Camera Position: The model scored 0.35, which is considered below average. This suggests that the model didn’t accurately capture the intended camera position described in the prompt.
- Shot Analysis: The model scored 0.53, which is considered average. This indicates that the model was able to understand the scene in the prompt to a reasonable degree, but not exceptionally well.
- Aesthetic Analysis: The model scored 0.08, which is considered very good. This means that the generated image closely matched the expected aesthetic style described in the prompt.
Overall, the model seems to be better at understanding the aesthetic style than the camera position and scene composition.
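The per-image numbers can also be read side by side. The sketch below copies the requested guilt descriptors, the detected mood, and the Prompt Clip Score from each entry in this post (titles shortened), then reports the mean score and the best and worst matches. It does not reproduce the three category scores in the breakdown, whose calculation the post does not describe, and a Prompt Clip Score covers the whole prompt rather than the facial expression alone.

```python
from statistics import mean

# (title, requested guilt descriptors, detected mood, Prompt Clip Score),
# copied from the entries in this post.
results = [
    ("A Shadow in the Rain", "Desolate, regretful", "dark, mysterious, contemplative", 0.22),
    ("Superman: Hope Amidst the Ashes", "Heavy, burdened, conflicted", "heroic, dramatic, somber", 0.28),
    ("A Worried Glance, A Blurred Past", "Nostalgic, melancholic", "sad, pensive, contemplative", 0.30),
    ("Immersed in the Game", "Isolated, self-loathing", "focused, techy, casual", 0.33),
    ("A Moment of Joy in the Crowd", "Alienated, invisible", "happy, joyful, hopeful", 0.21),
    ("Amidst the Ashes, a Sole Survivor Stands", "Torn, conflicted, remorseful", "dramatic, tense, apocalyptic", 0.25),
    ("A Dinner Party Gone Wrong", "Awkward, strained, unspoken", "tense, serious, intimate", 0.23),
    ("The Weight of Focus", "Disillusioned, defeated, empty", "intense, focused, edgy", 0.30),
    ("Lost in the City's Embrace", "Lonely, isolated, rejected", "melancholic, urban, mysterious", 0.22),
    ("Silhouetted Solitude", "Reflective, contemplative, seeking redemption", "melancholy, contemplative, urban", 0.24),
]

scores = [score for *_, score in results]
print(f"mean Prompt Clip Score: {mean(scores):.3f}")  # 0.258

lowest = min(results, key=lambda r: r[3])
highest = max(results, key=lambda r: r[3])
for label, (title, requested, detected, score) in (("lowest", lowest), ("highest", highest)):
    print(f"{label}: {title} ({score:.2f}) requested '{requested}', detected '{detected}'")
```

The lowest score belongs to the party scene, where the prompt asked for an alienated, invisible figure and the detected mood came back as happy and joyful. The highest belongs to the first gamer image, whose detected mood (focused, techy, casual) still does not reflect the requested self-loathing.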