AI's Struggle with Facial Expressions: A Look at the Emotional Gap with Flux-dev
Facial expressions are a fundamental aspect of human communication, conveying a wide range of emotions and intentions. Dramatic facial expressions, in particular, play a crucial role in storytelling, theater, and film, adding depth and impact to narratives. However, replicating these expressions accurately in AI-generated content remains a challenge. This article examines the performance of a generative AI model in understanding and generating facial expressions, focusing on its ability to capture the dramatic and nuanced aspects of human emotion.
Created with: flux-dev
A Lone Warrior Stands Amidst the Ruins of Battle
A solitary figure, cloaked in crimson, stands amidst a battlefield littered with fallen comrades. The hazy smoke and dust paint a somber backdrop, highlighting the warrior’s isolation and the tragic aftermath of the conflict. The silhouette against the smoky sky creates a dramatic effect, emphasizing the weight of the battle and the warrior’s solitary vigil.
Prompt
facial-expressions Anger: Anger and determination ; A hero, standing on a battlefield, surrounded by fallen enemies; eye-level; Hero; A battlefield littered with bodies, with smoke and dust filling the air; cinematic
Characteristic
Shot : A lone warrior in a red cape stands in a field of fallen warriors, bathed in golden light.
Aesthetic Score : 0.7
Mood : dramatic, melancholic, somber
Quality
Entropy : 6.07
Noise : 73
Prompt Clip Score : 0.24
AI Evaluation
Likelihood of AI : 0.80
Image errors : There is slight blurriness in the background, likely due to the fog effect. The edges of some figures are a bit pixelated, indicating potential upscaling or image processing.
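The prompts in this article appear to follow a fixed, semicolon-separated template: a topic and emotion label, then the subject, camera position, character type, setting and style. The sketch below splits a prompt into those parts. The field names (`topic`, `category`, `emotion`, `subject`, `camera`, `character`, `setting`, `style`) are our own reading of the template, not names defined by flux-dev or by the article's tooling.

```python
from dataclasses import dataclass


@dataclass
class PromptParts:
    # Field names are illustrative assumptions about the template.
    topic: str
    category: str
    emotion: str
    subject: str
    camera: str
    character: str
    setting: str
    style: str


def parse_prompt(prompt: str) -> PromptParts:
    """Split a semicolon-separated prompt into its six template fields."""
    fields = [part.strip() for part in prompt.split(";")]
    if len(fields) != 6:
        raise ValueError(f"expected 6 fields, got {len(fields)}")
    head, subject, camera, character, setting, style = fields
    # The first field looks like "facial-expressions Anger: Anger and determination".
    label, emotion = head.split(":", 1)
    topic, category = label.split(None, 1)
    return PromptParts(topic, category.strip(), emotion.strip(),
                       subject, camera, character, setting, style)


example = (
    "facial-expressions Anger: Anger and determination ; A hero, standing on a "
    "battlefield, surrounded by fallen enemies; eye-level; Hero; A battlefield "
    "littered with bodies, with smoke and dust filling the air; cinematic"
)
parts = parse_prompt(example)
print(parts.emotion, "|", parts.camera, "|", parts.style)
```

Separating the fields this way makes it easier to compare what was requested (for example the emotion and camera position) with what the Shot and Mood descriptions report for each image.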
Unspoken Tension: A Night at the Table
In the heart of a bustling restaurant, a man and woman share an intimate moment over drinks. Their conversation is lively, yet the woman’s tense expression hints at an underlying drama, adding a layer of suspense to their romantic evening.
Prompt
facial-expressions Anger: Frustration and rage ; A couple, arguing in a crowded restaurant, their voices raised in anger; eye-level; Normal People; A bustling restaurant, with other diners looking on; cinematic
Characteristic
Shot : A man and a woman are seated at a table in a restaurant, possibly on a date. They are engaged in a conversation, but the woman appears to be looking away from the man, with a slightly annoyed expression.
Aesthetic Score : 0.6
Mood : tense, awkward, intimate
Quality
Entropy : 6.71
Noise : 70
Prompt Clip Score : 0.27
AI Evaluation
Likelihood of AI : 0.10
Image errors : Some minor artifacts present in the image, especially in the shadows and the background.
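The article does not say how the Entropy value under Quality is computed. A common choice, and one consistent with values between 0 and 8, is the Shannon entropy of the 8-bit grayscale histogram. The following is a minimal sketch under that assumption; the function name `shannon_entropy` is ours.

```python
import math
from collections import Counter
from typing import Iterable


def shannon_entropy(pixels: Iterable[int]) -> float:
    """Shannon entropy, in bits, of a sequence of pixel intensities."""
    counts = Counter(pixels)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no pixels given")
    # H = -sum(p * log2(p)) over the observed intensity levels.
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


# With Pillow installed, the pixels of an image file could be obtained via
# Image.open(path).convert("L").getdata() and passed to shannon_entropy.
flat = shannon_entropy([128] * 1000)       # a single gray level
uniform = shannon_entropy(range(256))      # every level equally likely
print(flat, uniform)
```

Under this definition a flat, single-tone image scores 0 and an image that uses all 256 gray levels equally scores 8, so higher values indicate more tonal variety.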
Lost in the Mist: A Silhouette of Solitude
A lone figure walks through a misty night, their silhouette shrouded in mystery against the glow of streetlamps. The scene evokes a sense of loneliness and intrigue, leaving the viewer to wonder about the figure’s journey and destination.
Prompt
facial-expressions Anger: Despair and rage ; A lone figure, standing in the middle of a deserted street; eye-level; Single Person; Rain pouring down, streetlights casting long shadows; cinematic
Characteristic
Shot : A lone figure walks down a deserted street in the rain at night. The street is dimly lit by streetlights, and the atmosphere is moody and atmospheric.
Aesthetic Score : 0.6
Mood : mysterious, lonely, somber
Quality
Entropy : 6.67
Noise : 79
Prompt Clip Score : 0.31
AI Evaluation
Likelihood of AI : 0.10
Image errors : The image is slightly blurry, and there is some noise in the shadows.
Silhouetted Against the Inferno
A solitary figure stands defiant against a backdrop of a burning city, the flames and smoke creating a dramatic and apocalyptic scene. The stark contrast between light and shadow evokes a sense of loneliness and vulnerability, leaving the viewer to ponder the figure’s fate in this somber landscape.
Prompt
facial-expressions Anger: Anger and determination ; A hero, standing on a rooftop, overlooking a city in flames; eye-level; Hero; A fiery inferno engulfing the city, with smoke billowing into the sky; cinematic
Characteristic
Shot : A lone figure stands silhouetted against a fiery cityscape. The figure is dressed in a long, flowing robe. The cityscape is in the distance, and the sky is filled with smoke and fire.
Aesthetic Score : 0.7
Mood : dramatic, intense, apocalyptic
Quality
Entropy : 6.75
Noise : 71
Prompt Clip Score : 0.29
AI Evaluation
Likelihood of AI : 0.80
Image errors : There are no visible errors in the image.
Screaming in the Dark: A Moment of Raw Emotion
A woman’s face, illuminated in a stark contrast of light and shadow, captures a moment of intense distress. Her scream, amplified by the close-up framing, evokes a sense of raw emotion and dramatic tension.
Prompt
facial-expressions Anger: Despair and rage ; A woman, screaming into the void, her face contorted in anger; close-up; Single Person; A dark, empty room, with only a single flickering light; cinematic
Characteristic
Shot : A woman with long dark hair is screaming with her eyes closed. She is wearing a white shirt. The background is dark and blurry.
Aesthetic Score : 0.4
Mood : intense, dramatic, fear
Quality
Entropy : 5.46
Noise : 38
Prompt Clip Score : 0.30
AI Evaluation
Likelihood of AI : 0.30
Image errors : The image appears to be slightly overexposed and the woman’s skin tone is a bit unnatural.
Superman Stands Tall, Ready for Action
A powerful image captures Superman in a moment of intense focus, standing amidst a blurred cityscape. The lighting and composition create a sense of dramatic intensity, highlighting the hero’s unwavering determination.
Prompt
facial-expressions Anger: Fury and determination ; A superhero, fists clenched, facing down a horde of villains; eye-level; Hero; A crumbling cityscape, smoke and debris filling the air; cinematic
Characteristic
Shot : A close-up shot of Superman, with a blurred background of city streets and other figures.
Aesthetic Score : 0.7
Mood : intense, dramatic, heroic
Quality
Entropy : 6.47
Noise : 78
Prompt Clip Score : 0.27
AI Evaluation
Likelihood of AI : 0.10
Image errors : Some minor artifacts visible in the background blur and slight banding in the costume. The lighting on the subject is a bit flat and inconsistent.
Caught in the Act: A Man’s Surprised Reaction Amidst Chaos
A man in a blue shirt sits at a desk, his face etched with surprise as he stares directly at the camera. Papers are scattered across the desk, hinting at a moment of urgency and chaos. The blurry office background adds to the sense of intensity and anxiety.
Prompt
facial-expressions Anger: Frustration and rage ; A man, slamming his fist on a table, surrounded by scattered papers; eye-level; Normal Person; A cluttered office, with a window showing a stormy sky; cinematic
Characteristic
Shot : A man in a blue shirt is leaning over a table with papers scattered on it, looking directly at the camera with an expression of anger and frustration.
Aesthetic Score : 0.6
Mood : intense, frustrated, angry
Quality
Entropy : 6.83
Noise : 74
Prompt Clip Score : 0.30
AI Evaluation
Likelihood of AI : 0.20
Image errors : There are no visible artifacts or errors in the image.
The Glow of Focus: A Gamer’s Intensity Captured
A young man, bathed in the vibrant light of his computer screen, is completely engrossed in a video game. His focused expression and dramatic lighting create a powerful image of intense concentration and digital immersion.
Prompt
facial-expressions Anger: Frustration and rage ; A gamer, smashing his keyboard in a fit of rage; close-up; Gamer; A dimly lit room, with a computer screen displaying a game over screen; cinematic
Characteristic
Shot : A man is playing a video game on his computer and is looking very focused.
Aesthetic Score : 0.6
Mood : intense, focused, dark
Quality
Entropy : 6.64
Noise : 69
Prompt Clip Score : 0.32
AI Evaluation
Likelihood of AI : 0.10
Image errors : There is a slight blurriness to the image.
Mystery in the Rain: A Hooded Figure Walks the City Streets
A shadowy figure, shrouded in a hooded jacket, navigates a rain-soaked city. The dim lighting and obscured face create an air of mystery and intrigue, leaving viewers to wonder about the man’s intentions. This moody scene evokes a sense of urban drama and unspoken secrets.
Prompt
facial-expressions Anger: Despair and rage ; A man, standing in the rain, his face obscured by the downpour; eye-level; Single Person; A dark, deserted street, with only the sound of rain and thunder; cinematic
Characteristic
Shot : A person in a hooded jacket walks through the rain in an urban setting. The scene is dark and mysterious, with a sense of loneliness and isolation.
Aesthetic Score : 0.6
Mood : dark, mysterious, lonely
Quality
Entropy : 6.13
Noise : 79
Prompt Clip Score : 0.28
AI Evaluation
Likelihood of AI : 0.10
Image errors : There are no visible artifacts or errors in the image.
The Gamer’s Focus: A Moment of Intensity
A young man, lost in the digital world, sits in his gaming chair, headphones on, eyes fixed on something in his hands. The low-key lighting and his intense focus create a sense of drama and suspense, while the energy drink cans on the floor add a touch of grunge. This image captures the essence of a gamer’s dedication and the thrill of the game.
Prompt
facial-expressions Anger: Frustration and rage ; A gamer, throwing his headset on the floor, surrounded by empty energy drink cans; eye-level; Gamer; A dimly lit room, with a computer screen displaying a game in progress; cinematic
Characteristic
Shot : A young man wearing headphones is sitting in a dimly lit room with a computer monitor in the background, there are two energy drinks on the table.
Aesthetic Score : 0.6
Mood : dark, contemplative, casual
Quality
Entropy : 6.68
Noise : 74
Prompt Clip Score : 0.32
AI Evaluation
Likelihood of AI : 0.20
Image errors : There is a slight blur around the edges of the image, which might be due to camera shake or processing. There is also a bit of noise in the shadows.
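To put the ten per-image records side by side, the sketch below averages three of the reported values. The numbers are copied from the records above in order of appearance; the variable names are ours, and these simple averages are separate from the scores quoted in the conclusion.

```python
from statistics import mean

# Values copied from the ten image records, in order of appearance.
aesthetic_scores = [0.7, 0.6, 0.6, 0.7, 0.4, 0.7, 0.6, 0.6, 0.6, 0.6]
prompt_clip_scores = [0.24, 0.27, 0.31, 0.29, 0.30, 0.27, 0.30, 0.32, 0.28, 0.32]
ai_likelihoods = [0.80, 0.10, 0.10, 0.80, 0.30, 0.10, 0.20, 0.10, 0.10, 0.20]

summary = {
    "aesthetic": round(mean(aesthetic_scores), 2),
    "prompt_clip": round(mean(prompt_clip_scores), 2),
    "ai_likelihood": round(mean(ai_likelihoods), 2),
}
print(summary)
# {'aesthetic': 0.61, 'prompt_clip': 0.29, 'ai_likelihood': 0.28}
```

The averages hide some spread: the two silhouette images are the only ones with an AI likelihood of 0.80, and the close-up scream has the lowest aesthetic score at 0.4.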
Conclusion
The analysis shows that the generative AI model understood the described scenes well, reacted only moderately to the requested camera positions, and came close to, but did not reach, the target aesthetic range. Here’s a breakdown:
- Camera Position: The model scored 0.3, indicating a moderate ability to react to the camera positions given in the prompts. This is considered average: it falls below the 0.5 to 0.75 range that is considered good, and well below the 0.75 threshold for very good.
- Shot Analysis: The model scored 0.64, indicating a good ability to understand the scene described in the prompt. This is within the good range of 0.5 to 0.75.
- Aesthetic Analysis: The model scored 0.17, indicating a slight deviation from the expected aesthetic. Scores between -0.2 and 0.1 are considered very good, so 0.17 falls just outside that range.
Overall, the model demonstrates a good understanding of the scene and a moderate response to camera position, and could benefit from further development in capturing the desired aesthetic.
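The score bands quoted in the breakdown can be written down as a small lookup. This is a sketch that applies the bands exactly as stated; the function names and the labels used outside the stated bands ("below good", "outside the very good range") are our assumptions, since the article names only the good and very good ranges.

```python
def rate_alignment(score: float) -> str:
    """Rate a camera-position or shot score using the stated bands."""
    if score > 0.75:
        return "very good"
    if score >= 0.5:
        return "good"
    return "below good"  # label assumed; the article calls 0.3 "average"


def rate_aesthetic(score: float) -> str:
    """Rate an aesthetic deviation score; -0.2 to 0.1 is stated as very good."""
    if -0.2 <= score <= 0.1:
        return "very good"
    return "outside the very good range"  # label assumed


for name, value, rate in [
    ("Camera Position", 0.3, rate_alignment),
    ("Shot Analysis", 0.64, rate_alignment),
    ("Aesthetic Analysis", 0.17, rate_aesthetic),
]:
    print(f"{name}: {value} -> {rate(value)}")
```

Applied to the three reported scores, only Shot Analysis lands inside a named band.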
Sources:
- https://dramaresource.com/storytelling/
- https://seedsoftellers.eu/resources/the-body-language-for-young-tellers/
- https://digitalcollections.sit.edu/cgi/viewcontent.cgi?article=1288&context=sandanona&filename=1&type=additional
- https://citeseerx.ist.psu.edu/document?doi=7f842882e9bb1fa2c0e96939bc8d2c37e34e17c0&repid=rep1&type=pdf
- https://www.twinkl.co.uk/search?q=drama+facial+expression
- https://fal.ai/models/fal-ai/flux/dev/api