AI's Struggle with Facial Expressions: A Look at the Gap Between Vision and Emotion with Flux-pro
In the realm of artificial intelligence, the ability to generate realistic and emotionally evocative images is a coveted goal. While significant progress has been made in recent years, replicating the nuances of human facial expressions remains a challenge. This blog post delves into the results of an experiment where an AI model was tasked with generating images featuring specific facial expressions, revealing both the model’s strengths and limitations in capturing the essence of human emotion.
Created with: flux-pro
Lost in the Shadows: A Solitary Figure Walks the Wet Streets
A single figure traverses a rain-slicked street, bathed in the ethereal glow of distant streetlights. The interplay of light and shadow evokes a sense of melancholy and mystery, leaving the viewer to ponder the figure’s solitary journey.
Prompt
facial-expressions Anger: Despair and rage ; A lone figure, standing in the middle of a deserted street; eye-level; Single Person; Rain pouring down, streetlights casting long shadows; cinematic
Characteristic
Shot : A lone figure walking down a rainy street at night, illuminated by streetlights.
Aesthetic Score : 0.7
Mood : melancholy, solitude, mystery
Quality
Entropy : 6.72
Noise : 98
Prompt Clip Score : 0.31
AI Evaluation
Likelihood of AI : 0.30
Image errors : No major artifacts, but there’s some slight graininess in the shadows.
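Every prompt in this post follows the same semicolon-separated layout, so it can be split into parts and compared field by field with the generated shot. The sketch below is one hypothetical way to do that. The field names (category, emotion_family, emotion_detail, subject, camera, persona, setting, style) are assumptions made for illustration; the post itself does not name them.

```python
from dataclasses import dataclass


@dataclass
class PromptParts:
    # All field names are hypothetical labels, not taken from the post.
    category: str        # e.g. "facial-expressions"
    emotion_family: str  # e.g. "Anger"
    emotion_detail: str  # e.g. "Despair and rage"
    subject: str
    camera: str
    persona: str
    setting: str
    style: str


def parse_prompt(prompt: str) -> PromptParts:
    """Split one prompt from this post into its semicolon-separated fields."""
    fields = [part.strip() for part in prompt.split(";")]
    if len(fields) != 6:
        raise ValueError(f"expected 6 fields, got {len(fields)}")
    head, subject, camera, persona, setting, style = fields
    # The first field has the shape "<category> <family>: <detail>".
    label, _, detail = head.partition(":")
    category, _, family = label.strip().partition(" ")
    return PromptParts(
        category, family.strip(), detail.strip(),
        subject, camera, persona, setting, style,
    )


parts = parse_prompt(
    "facial-expressions Anger: Despair and rage ; A lone figure, standing in "
    "the middle of a deserted street; eye-level; Single Person; Rain pouring "
    "down, streetlights casting long shadows; cinematic"
)
print(parts.emotion_detail, "|", parts.camera)
```

With the fields separated, the requested emotion ("Despair and rage") can be set next to the Mood line of the result (melancholy, solitude, mystery) to see where the image drifted from the prompt.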
Superman in Action: A City in Motion
A dramatic and intense image captures Superman in mid-stride, cape billowing, as he races through a bustling city. The scene evokes a sense of hope and urgency, with the hero’s determined expression and the blurred background highlighting the speed and power of his movement.
Prompt
facial-expressions Anger: Fury and determination ; A superhero, fists clenched, facing down a horde of villains; eye-level; Hero; A crumbling cityscape, smoke and debris filling the air; cinematic
Characteristic
Shot : Superman, in his iconic red and blue suit, charges towards the camera in a crowded city street, with a woman in black attire beside him. The image is likely from a film or television show, capturing a moment of intense action.
Aesthetic Score : 0.7
Mood : dramatic, intense, heroic
Quality
Entropy : 6.62
Noise : 94
Prompt Clip Score : 0.27
AI Evaluation
Likelihood of AI : 0.20
Image errors : The image appears slightly oversharpened, with some minor artifacts around the edges of Superman’s suit and the woman’s hair. However, these are relatively minor and do not significantly detract from the overall quality.
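The Entropy value listed under Quality is not defined in this post. One common reading, assumed here, is the Shannon entropy of the image's 8-bit grayscale histogram, which ranges from 0 bits (a flat, single-tone image) to 8 bits (all 256 gray levels equally used). The values reported in this post, roughly 6.4 to 6.8, would fit that range. A minimal sketch under that assumption, working on a flat list of gray levels rather than a real image file:

```python
import math
from collections import Counter


def shannon_entropy(pixels):
    """Shannon entropy, in bits, of a flat sequence of gray levels (0-255)."""
    total = len(pixels)
    if total == 0:
        raise ValueError("no pixels given")
    counts = Counter(pixels)
    # Sum p * log2(1/p) over every gray level that actually occurs.
    return sum((n / total) * math.log2(total / n) for n in counts.values())


flat_image = [128] * 1000        # a single gray tone everywhere
full_range = list(range(256))    # every gray level used exactly once
print(shannon_entropy(flat_image))
print(shannon_entropy(full_range))
```

The two inputs mark the extremes of the scale: the single-tone image carries no tonal variety, while the evenly spread one reaches the 8-bit maximum.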
The Weight of Responsibility: A Man Crumbles Under Pressure
A man, overwhelmed by stress, clutches his head in frustration as papers litter his desk. The warm overhead light casts long shadows, amplifying the sense of urgency and unease in this dramatic scene.
Prompt
facial-expressions Anger: Frustration and rage ; A man, slamming his fist on a table, surrounded by scattered papers; eye-level; Normal Person; A cluttered office, with a window showing a stormy sky; cinematic
Characteristic
Shot : A man is sitting at a desk, looking stressed and frustrated. He is clutching his head with both hands, as if in pain or despair. There are papers scattered around him, suggesting he may be overwhelmed with work. The lighting is dim, giving the image a somber and slightly claustrophobic feel.
Aesthetic Score : 0.4
Mood : stressed, overwhelmed, desperate
Quality
Entropy : 6.63
Noise : 71
Prompt Clip Score : 0.29
AI Evaluation
Likelihood of AI : 0.20
Image errors : No significant artifacts or errors are noticeable in this image.
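The Prompt Clip Score is also left undefined in this post. A plausible assumption is that it is the cosine similarity between a CLIP embedding of the prompt text and a CLIP embedding of the generated image. The sketch below shows only that similarity step; the two vectors are made-up placeholders, since real embeddings would come from a CLIP model that is not part of this example.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length embedding vectors."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("embeddings must be non-zero")
    return dot / (norm_a * norm_b)


# Placeholder vectors standing in for text and image embeddings (assumed data).
text_embedding = [3.0, 4.0]
image_embedding = [4.0, 3.0]
print(round(cosine_similarity(text_embedding, image_embedding), 2))
```

Under this reading, a higher score means the image sits closer to the prompt in the model's embedding space, which says nothing directly about whether the requested facial expression was rendered.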
The Focus Is Intense: A Gamer’s Moment of Truth
A young man, lost in the glow of his screen, sits in a dimly lit room, his focus unwavering. Two energy drink cans hint at the intensity of his gaming session, while the low light and his intense gaze create a palpable sense of suspense and anticipation. What is he playing? What is he waiting for? The answer lies in the shadows.
Prompt
facial-expressions Anger: Frustration and rage ; A gamer, throwing his headset on the floor, surrounded by empty energy drink cans; eye-level; Gamer; A dimly lit room, with a computer screen displaying a game in progress; cinematic
Characteristic
Shot : A young man wearing headphones is sitting in a dimly lit room, looking down at his hands. There are two cans of soda in front of him, and a computer monitor is in the background.
Aesthetic Score : 0.6
Mood : pensive, focused, relaxed
Quality
Entropy : 6.75
Noise : 79
Prompt Clip Score : 0.34
AI Evaluation
Likelihood of AI : 0.20
Image errors : The image is slightly blurry. Some noise is visible in the shadows, and the color balance is slightly off. The image also appears slightly over-saturated.
The Scream: A Moment of Raw Terror
A close-up shot captures a woman’s face contorted in a silent scream, her eyes squeezed shut. The dim lighting and blurred background heighten the intensity of the moment, leaving the viewer with a sense of unease and fear. The image is a powerful testament to the raw emotions of human distress.
Prompt
facial-expressions Anger: Despair and rage ; A woman, screaming into the void, her face contorted in anger; close-up; Single Person; A dark, empty room, with only a single flickering light; cinematic
Characteristic
Shot : A close-up of a woman’s face. She is screaming with her eyes closed and mouth wide open. The lighting is dark and moody.
Aesthetic Score : 0.2
Mood : intense, dark, dramatic
Quality
Entropy : 6.67
Noise : 72
Prompt Clip Score : 0.30
AI Evaluation
Likelihood of AI : 0.20
Image errors : There is slight blurriness around the edges of the image.
Silhouette of Despair: Lone Figure Witnesses City in Flames
A solitary figure stands on a rooftop, silhouetted against a cityscape consumed by fire. The scene evokes a sense of dramatic desolation and melancholic despair, as a massive plume of smoke rises in the distance, hinting at a catastrophic event.
Prompt
facial-expressions Anger: Anger and determination ; A hero, standing on a rooftop, overlooking a city in flames; eye-level; Hero; A fiery inferno engulfing the city, with smoke billowing into the sky; cinematic
Characteristic
Shot : A lone figure stands on a rooftop overlooking a city engulfed in flames. The scene is dominated by a large, fiery explosion in the background, creating a sense of chaos and destruction.
Aesthetic Score : 0.7
Mood : dramatic, melancholic, apocalyptic
Quality
Entropy : 6.67
Noise : 90
Prompt Clip Score : 0.25
AI Evaluation
Likelihood of AI : 0.90
Image errors : There are some minor artifacts and errors in the image, such as blurring around the edges of the figure and some inconsistencies in the fire.
The Argument: A Tense Encounter in a Dimly Lit Restaurant
A couple’s heated conversation unfolds in a dimly lit restaurant. The woman looks away, while the man points accusingly, his anger evident in his expression. The lighting and framing amplify the tension, creating a dramatic and unsettling scene.
Prompt
facial-expressions Anger: Frustration and rage ; A couple, arguing in a crowded restaurant, their voices raised in anger; eye-level; Normal People; A bustling restaurant, with other diners looking on; cinematic
Characteristic
Shot : A couple is having a heated discussion in a dimly lit restaurant. The man seems to be yelling at the woman.
Aesthetic Score : 0.5
Mood : intense, dramatic, frustrated
Quality
Entropy : 6.75
Noise : 78
Prompt Clip Score : 0.34
AI Evaluation
Likelihood of AI : 0.10
Image errors : The image has some noise and grain, particularly in the darker areas.
The Face of Defeat: Man Reacts to ‘Game Over’
A man sits hunched over his computer, his face contorted in frustration as the screen displays the dreaded ‘Game Over’ message. The image captures the raw emotion of defeat, with an exaggerated expression that speaks volumes about the intensity of the moment.
Prompt
facial-expressions Anger: Frustration and rage ; A gamer, smashing his keyboard in a fit of rage; close-up; Gamer; A dimly lit room, with a computer screen displaying a game over screen; cinematic
Characteristic
Shot : A young man is sitting in front of a computer screen. The screen displays a ‘Game Over’ message. The man is yelling in frustration. The room is lit with red and blue lights.
Aesthetic Score : 0.5
Mood : frustrated, angry, dramatic
Quality
Entropy : 6.45
Noise : 72
Prompt Clip Score : 0.29
AI Evaluation
Likelihood of AI : 0.10
Image errors : The image has some slight noise and artifacts, particularly in the shadows. The red and blue lights are also very harsh and create a bit of a distracting glare.
Lost in the Rain: A Man’s Solitary Struggle
A poignant image of a man shrouded in rain, his face hidden in the shadows. The scene evokes a sense of melancholy and mystery, leaving the viewer to ponder his thoughts and the weight of his solitude.
Prompt
facial-expressions Anger: Despair and rage ; A man, standing in the rain, his face obscured by the downpour; eye-level; Single Person; A dark, deserted street, with only the sound of rain and thunder; cinematic
Characteristic
Shot : A man in a leather jacket stands in the rain, looking down. The background is blurry and out of focus, with the rain creating a moody atmosphere.
Aesthetic Score : 0.7
Mood : melancholy, mysterious, pensive
Quality
Entropy : 6.57
Noise : 95
Prompt Clip Score : 0.28
AI Evaluation
Likelihood of AI : 0.10
Image errors : None
A Lone Warrior Stands Amidst the Fallen
A dramatic and somber scene unfolds as a lone armored figure, cloaked in red, stands amidst a field of fallen warriors. The shallow depth of field isolates the figure, emphasizing his strength and resolve. The hazy background adds an epic and timeless feel to the scene.
Prompt
facial-expressions Anger: Anger and determination ; A hero, standing on a battlefield, surrounded by fallen enemies; eye-level; Hero; A battlefield littered with bodies, with smoke and dust filling the air; cinematic
Characteristic
Shot : A lone warrior, clad in armor and a crimson cloak, stands amidst a battlefield strewn with fallen figures. The setting sun casts a warm glow, highlighting the dust and smoke that hangs in the air. The warrior’s expression is determined, his gaze fixed on something beyond the frame.
Aesthetic Score : 0.7
Mood : epic, dramatic, melancholic
Quality
Entropy : 6.78
Noise : 69
Prompt Clip Score : 0.22
AI Evaluation
Likelihood of AI : 0.20
Image errors : The image appears to have some minor compression artifacts, particularly in the shadows and highlights.
Conclusion
The analysis shows that the generative AI model performed well in understanding the scene, but struggled with camera position and the aesthetic aspect. Here’s a breakdown:
- Camera Position: The model scored 0.2, which is considered poor. This means there’s a significant difference between the camera position described in the prompt and the one used in the generated image.
- Shot Analysis: The model scored 0.62, which is considered good. This indicates that the model was able to understand the scene described in the prompt and create a shot that aligns with it.
- Aesthetic Analysis: The model scored 0.29, which is considered okay. This means the generated image’s aesthetic is somewhat different from what was expected based on the prompt.
Overall, the model seems to be better at understanding the scene and shot composition than it is at matching the requested camera position or capturing the desired aesthetic.
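The post does not say how the three summary scores were derived, so the sketch below does not try to reproduce them. It simply averages the per-image Aesthetic Score, Prompt Clip Score and Likelihood of AI values listed for the ten images, as one way to look at the set as a whole. The shortened titles and variable names are illustrative choices.

```python
from statistics import mean

# (short title, aesthetic score, prompt CLIP score, likelihood of AI),
# copied from the ten image entries in this post.
results = [
    ("Lost in the Shadows", 0.7, 0.31, 0.30),
    ("Superman in Action", 0.7, 0.27, 0.20),
    ("The Weight of Responsibility", 0.4, 0.29, 0.20),
    ("The Focus Is Intense", 0.6, 0.34, 0.20),
    ("The Scream", 0.2, 0.30, 0.20),
    ("Silhouette of Despair", 0.7, 0.25, 0.90),
    ("The Argument", 0.5, 0.34, 0.10),
    ("The Face of Defeat", 0.5, 0.29, 0.10),
    ("Lost in the Rain", 0.7, 0.28, 0.10),
    ("A Lone Warrior", 0.7, 0.22, 0.20),
]

mean_aesthetic = mean(row[1] for row in results)
mean_clip = mean(row[2] for row in results)
mean_ai_likelihood = mean(row[3] for row in results)
lowest_aesthetic = min(results, key=lambda row: row[1])

print(f"mean aesthetic score:   {mean_aesthetic:.2f}")
print(f"mean prompt CLIP score: {mean_clip:.3f}")
print(f"mean likelihood of AI:  {mean_ai_likelihood:.2f}")
print(f"lowest aesthetic score: {lowest_aesthetic[0]} ({lowest_aesthetic[1]})")
```

Keeping the numbers in one table like this also makes outliers easy to spot, such as the single image whose Likelihood of AI is far above the rest.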