AI's Facial Expressions: A Mixed Bag of Success with Stable Diffusion
Facial expressions are a powerful tool in storytelling, conveying emotions and intentions without words. Generative AI is increasingly used to create images with specific facial expressions, but how well does it capture the nuances of human emotion? This blog post explores that question by evaluating how closely the generated images follow the requested camera position, shot, and aesthetic. We’ll examine ten AI-generated images prompted for different shades of fear, highlighting both the successes and the areas for improvement. By understanding the strengths and weaknesses of AI in this area, we can better appreciate its potential and limitations in creating compelling and emotionally resonant imagery.
Created with: stability-ai-core
Intense Gaze in the Shadows
A man stands in a dimly lit brick alleyway, his piercing gaze locked on the viewer. The narrow confines and mysterious atmosphere create a sense of suspense and intrigue.
Prompt
facial-expressions Fear: Unease, paranoia ; A lone figure; eye-level; Single Person; a dark, deserted alleyway; cinematic
Characteristic
Shot : A man in a black coat standing in a narrow alleyway with brick walls on either side. He is looking directly at the camera. A second man is walking away in the background.
Aesthetic Score : 0.6
Mood : dark, mysterious, suspenseful
Quality
Entropy : 6.37
Noise : 71
Prompt Clip Score : 0.21
AI Evaluation
Likelihood of AI : 0.30
Image errors : Some minor graininess and noise are visible, particularly in the shadows. The image also appears slightly underexposed, leading to a darker overall tone.
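The prompts in this post all follow the same semicolon-separated template, which makes them easy to split into parts when comparing a prompt against the shot description. The sketch below is a minimal illustration; the field names (`subject`, `camera`, `character`, `setting`, `style`) are assumptions made for this example and are not named anywhere in the original prompts.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class PromptParts:
    # Field names are hypothetical labels for the six prompt segments.
    emotion: str
    nuances: List[str]
    subject: str
    camera: str
    character: str
    setting: str
    style: str


def parse_prompt(prompt: str) -> PromptParts:
    """Split a prompt of the form used in this post into its six segments."""
    fields = [part.strip() for part in prompt.split(";")]
    if len(fields) != 6:
        raise ValueError(f"expected 6 segments, got {len(fields)}")
    head, subject, camera, character, setting, style = fields

    # The first segment looks like "facial-expressions Fear: Unease, paranoia".
    label, _, nuance_text = head.partition(":")
    emotion = label.split()[-1]
    nuances = [n.strip() for n in nuance_text.split(",") if n.strip()]

    return PromptParts(emotion, nuances, subject, camera, character, setting, style)


if __name__ == "__main__":
    example = (
        "facial-expressions Fear: Unease, paranoia ; A lone figure; eye-level; "
        "Single Person; a dark, deserted alleyway; cinematic"
    )
    print(parse_prompt(example))
```

Splitting on `;` rather than `,` matters here because the setting and the emotion nuances contain commas of their own.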
Superman: A Hero in the Fog
A split-screen image captures Superman’s heroic presence against a backdrop of a foggy cityscape. The dark and epic mood, combined with the dramatic effect of the fog, creates a sense of mystery and intrigue.
Prompt
facial-expressions Fear: Dread, anticipation ; A superhero standing alone on a rooftop; eye-level; Hero; a cityscape shrouded in fog; cinematic
Characteristic
Shot : Superman standing on a rooftop overlooking a foggy cityscape. The top image shows Superman’s face while the bottom image shows his back.
Aesthetic Score : 0.6
Mood : heroic, dramatic, ominous
Quality
Entropy : 6.69
Noise : 69
Prompt Clip Score : 0.29
AI Evaluation
Likelihood of AI : 0.80
Image errors : The fog is too uniform and looks slightly artificial. There are some minor artifacts around the edges of the image. There is a visible seam in the middle where the two images are joined.
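The post does not say how the Entropy value in the Quality block is computed. One common definition, which would be consistent with values between 0 and 8 for 8-bit images, is the Shannon entropy of the grayscale histogram. The sketch below assumes that definition and works on a plain sequence of pixel values; it is an illustration, not the pipeline actually used here.

```python
import math
from collections import Counter
from typing import Iterable


def shannon_entropy(pixels: Iterable[int]) -> float:
    """Shannon entropy, in bits, of a sequence of grayscale pixel values.

    Assumption: this is one plausible reading of the "Entropy" metric;
    the original post does not define it.
    """
    values = list(pixels)
    if not values:
        raise ValueError("need at least one pixel value")
    total = len(values)
    counts = Counter(values)
    # Sum of -p * log2(p) over every intensity level that occurs.
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


if __name__ == "__main__":
    flat = [128] * 1000              # a single grey level: no information
    spread = list(range(256)) * 4    # every 8-bit level equally often
    print(shannon_entropy(flat), shannon_entropy(spread))
```

Under this reading, a flat image scores 0 and an image that uses all 256 levels equally scores 8, so the values of roughly 4.7 to 6.9 reported in this post would indicate moderately to highly varied tonal content.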
Lost in the Shadows: A Woman’s Lonely Walk Through a Mysterious City
A woman walks alone down a cobblestone street at night, her worried expression and the dim lighting creating a sense of suspense and mystery. The buildings on either side and the streetlights overhead add to the feeling of isolation, leaving the viewer wondering what secrets lie ahead.
Prompt
facial-expressions Fear: Vulnerability, isolation ; A woman walking down a dimly lit street; eye-level; Normal Person; a deserted street with flickering streetlights; cinematic
Characteristic
Shot : A woman walking down a narrow, cobblestone street at dusk, illuminated by streetlights. The street is lined with buildings on both sides, and the woman is the main focus of the image.
Aesthetic Score : 0.7
Mood : mysterious, urban, melancholic
Quality
Entropy : 6.36
Noise : 71
Prompt Clip Score : 0.27
AI Evaluation
Likelihood of AI : 0.10
Image errors : The image is slightly overexposed, and there is some noise in the shadows. There are also some distracting elements in the background, such as the blurry figures in the distance.
The Hacker in the Shadows
A man hunches over his keyboard in a dimly lit room, his intense focus hinting at a secret mission. The atmosphere is thick with tension, leaving you wondering what he’s working on and what the stakes might be.
Prompt
facial-expressions Fear: Disquiet, unease ; A gamer hunched over their computer; close-up; Gamer; a flickering monitor displaying a disturbing image; cinematic
Characteristic
Shot : A man is sitting in front of a computer, typing on a keyboard. He is looking at the screen. The room is dark and dimly lit.
Aesthetic Score : 0.6
Mood : focused, intense, serious
Quality
Entropy : 5.96
Noise : 60
Prompt Clip Score : 0.24
AI Evaluation
Likelihood of AI : 0.20
Image errors : The image has some noise and grain, and the focus is slightly soft.
Intense Gaze in the Shadows
A close-up portrait captures a woman’s serious expression, bathed in dramatic, shadowy lighting. Her dark hair and sweater blend with the blurred background, creating a sense of mystery and intensity.
Prompt
facial-expressions Fear: Terror, helplessness ; hiding ; low-angle; Single Person; a dark room with shadows creeping in; cinematic
Characteristic
Shot : A close-up portrait of a woman with a dark background, looking directly at the viewer.
Aesthetic Score : 0.6
Mood : intense, mysterious, serious
Quality
Entropy : 4.74
Noise : 55
Prompt Clip Score : 0.21
AI Evaluation
Likelihood of AI : 0.20
Image errors : No obvious errors, but the image could be sharper and better lit.
A Lone Warrior in a Wasteland of Despair
A lone warrior, clad in battle armor, stands defiant in a post-apocalyptic wasteland. A destroyed vehicle serves as a grim reminder of the past, while a shadowy figure in the distance hints at the dangers that lurk. The image is filled with tension and foreboding, capturing the gritty reality of a world on the brink.
Prompt
facial-expressions Fear: Desperation, courage ; A hero facing a monstrous creature; eye-level; Hero; a crumbling battlefield with smoke and debris; cinematic
Characteristic
Shot : A man in dark armor stands in a post-apocalyptic landscape, his face grimed with dirt and blood. Behind him, a monstrous creature lurks in the smoke-filled distance.
Aesthetic Score : 0.7
Mood : intense, dramatic, gritty
Quality
Entropy : 6.85
Noise : 81
Prompt Clip Score : 0.27
AI Evaluation
Likelihood of AI : 0.30
Image errors : No major errors. Some minor noise is visible, especially in the smoke.
Stormy Skies and Troubled Faces: A Moment of Suspense
A group of people, mostly men, stand in a row, their faces turned upwards towards a dramatic, stormy sky. Lightning flashes illuminate the scene, creating a sense of foreboding and suspense. The woman in the middle, with her intense expression, adds to the tension of the moment.
Prompt
facial-expressions Fear: Anxiety, uncertainty ; A group of people huddled together in a darkened room; eye-level; Normal People; a storm raging outside with thunder and lightning; cinematic
Characteristic
Shot : A group of people are looking up in fear as lightning strikes in the background.
Aesthetic Score : 0.6
Mood : suspense, dramatic, eerie
Quality
Entropy : 6.27
Noise : 69
Prompt Clip Score : 0.27
AI Evaluation
Likelihood of AI : 0.50
Image errors : Some blurring is present around the edges, especially noticeable in the lightning. The lightning is not realistically rendered and has a slightly artificial look.
Man’s Shocked Reaction Captures the Moment of Truth
A dimly lit image reveals a man’s stunned expression as he stares at a computer screen. His wide-open mouth and hands clasped to his head convey a sense of shock and tension, leaving viewers wondering what could have caused such a dramatic reaction.
Prompt
facial-expressions Fear: Shock, adrenaline ; A gamer’s hands shaking as they play a horror game; close-up; Gamer; a screen displaying a jump scare; cinematic
Characteristic
Shot : A man sitting in front of a computer, looking shocked and surprised, his hands raised in a gesture of shock.
Aesthetic Score : 0.4
Mood : surprised, shocked, intense
Quality
Entropy : 5.73
Noise : 62
Prompt Clip Score : 0.27
AI Evaluation
Likelihood of AI : 0.10
Image errors : The image is slightly blurry, and the lighting is not ideal.
Lost in the Storm’s Embrace
A solitary figure stands defiant against the raw power of a stormy ocean. The dramatic contrast of dark skies and crashing waves evokes a sense of awe and melancholic beauty, leaving the viewer contemplating the vastness of nature.
Prompt
facial-expressions Fear: Loneliness, despair ; A lone figure standing at the edge of a cliff; eye-level; Single Person; a vast, empty landscape with a stormy sky; cinematic
Characteristic
Shot : A lone figure stands on a cliff overlooking a stormy ocean, with dramatic dark clouds overhead.
Aesthetic Score : 0.8
Mood : dramatic, melancholic, awe
Quality
Entropy : 6.46
Noise : 79
Prompt Clip Score : 0.29
AI Evaluation
Likelihood of AI : 0.10
Image errors : Slight overexposure, slight banding
Amidst the Flames, a Steadfast Figure
A lone figure, clad in black tactical gear, stands defiant against a backdrop of burning city streets. The intensity of the flames and the determined expression on his face capture the raw drama and urgency of an apocalyptic scene.
Prompt
facial-expressions Fear: Loss, determination ; A hero standing amidst a burning city; eye-level; Hero; a chaotic scene with smoke and flames; cinematic
Characteristic
Shot : A man in a black jacket and a backpack stands in a city street with burning cars behind him. There is smoke in the air. The man looks serious and determined.
Aesthetic Score : 0.7
Mood : intense, dramatic, apocalyptic
Quality
Entropy : 6.77
Noise : 76
Prompt Clip Score : 0.28
AI Evaluation
Likelihood of AI : 0.80
Image errors : The fire and smoke look a little bit artificial. The composition is a bit too static.
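The per-image numbers above are easier to compare once they are collected in one place. The sketch below copies the Aesthetic Score, Prompt Clip Score, and Likelihood of AI for the ten images and computes simple averages. The 0.5 cut-off for "likely AI" and the "(alleyway)" and "(portrait)" suffixes, added to tell the two identically titled images apart, are assumptions made for this example.

```python
from statistics import mean

# (title, aesthetic score, prompt CLIP score, likelihood of AI),
# copied from the ten entries above.
RESULTS = [
    ("Intense Gaze in the Shadows (alleyway)", 0.6, 0.21, 0.30),
    ("Superman: A Hero in the Fog", 0.6, 0.29, 0.80),
    ("Lost in the Shadows", 0.7, 0.27, 0.10),
    ("The Hacker in the Shadows", 0.6, 0.24, 0.20),
    ("Intense Gaze in the Shadows (portrait)", 0.6, 0.21, 0.20),
    ("A Lone Warrior in a Wasteland of Despair", 0.7, 0.27, 0.30),
    ("Stormy Skies and Troubled Faces", 0.6, 0.27, 0.50),
    ("Man’s Shocked Reaction", 0.4, 0.27, 0.10),
    ("Lost in the Storm’s Embrace", 0.8, 0.29, 0.10),
    ("Amidst the Flames, a Steadfast Figure", 0.7, 0.28, 0.80),
]


def summarize(results, ai_threshold=0.5):
    """Average the three scores and pick out the extremes."""
    return {
        "mean_aesthetic": round(mean(r[1] for r in results), 2),
        "mean_clip": round(mean(r[2] for r in results), 2),
        "mean_ai_likelihood": round(mean(r[3] for r in results), 2),
        "best_aesthetic": max(results, key=lambda r: r[1])[0],
        "worst_aesthetic": min(results, key=lambda r: r[1])[0],
        # ai_threshold is an assumed cut-off, not one defined in the post.
        "likely_ai": sum(1 for r in results if r[3] >= ai_threshold),
    }


if __name__ == "__main__":
    for key, value in summarize(RESULTS).items():
        print(f"{key}: {value}")
```

Across the ten images, the mean aesthetic score is 0.63, the mean prompt CLIP score is 0.26, and the mean likelihood of AI is 0.34. Three images reach the assumed 0.5 threshold.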
Conclusion
The results show that the generative AI model performed well on shot analysis, but fell short on camera position and aesthetic analysis.
Here’s a breakdown:
- Camera Position: The model scored 0.35, which is below the “good” range of 0.5 to 0.75. This suggests that the model didn’t quite capture the intended camera position as described in the prompt.
- Shot Analysis: The model scored 0.58, which falls within the “good” range. This indicates that the model was able to understand the scene in the prompt and create a shot that was relatively close to what was expected.
- Aesthetic Analysis: The model scored 0.14, which lies just above the “very good” range of -0.2 to 0.1. This suggests that the generated images’ aesthetic deviated from the aesthetic described in the prompt.
Overall, the model seems to be better at understanding the scene and composing the shot than it is at matching the requested camera position or the desired aesthetic.
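The range comparisons in the breakdown can be checked mechanically. The sketch below uses the scores and ranges quoted in the breakdown; it assumes the “good” range for Shot Analysis is the same 0.5 to 0.75 range given for Camera Position, since the post does not state it separately.

```python
def compare_to_range(score: float, low: float, high: float):
    """Return where a score sits relative to [low, high] and by how much it misses."""
    if score < low:
        return "below", round(low - score, 2)
    if score > high:
        return "above", round(score - high, 2)
    return "within", 0.0


# (metric, score, range label, low, high) as quoted in the conclusion.
# The Shot Analysis bounds are an assumption (reused from Camera Position).
SCORES = [
    ("Camera Position", 0.35, "good", 0.5, 0.75),
    ("Shot Analysis", 0.58, "good", 0.5, 0.75),
    ("Aesthetic Analysis", 0.14, "very good", -0.2, 0.1),
]

if __name__ == "__main__":
    for metric, score, label, low, high in SCORES:
        position, gap = compare_to_range(score, low, high)
        print(f"{metric}: {score} is {position} the '{label}' range "
              f"[{low}, {high}] (gap {gap})")
```

With these numbers, Camera Position misses its range by 0.15, while Aesthetic Analysis overshoots its range by only 0.04.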