AI Struggles to Capture the Nuance of Human Emotion with Flux-schnell
The ability to convey emotion through facial expressions is a fundamental aspect of human communication. It’s a complex interplay of muscle movements, subtle shifts in expression, and the context of the situation. While AI has made significant strides in image generation, capturing the nuance of human emotion remains a hard problem. This blog post explores an experiment that highlights the difficulty: flux-schnell was given ten prompts, each pairing a scene with facial expressions of shame, and the resulting images were scored on how well they matched.
Created with: flux-schnell
Mystery in the Shadows: A Hooded Figure Walks the Wet City Streets
A lone figure, shrouded in darkness, navigates the rain-slicked streets. The low-key lighting and the hooded figure create an atmosphere of mystery and intrigue, leaving you wondering about their secrets and their destination.
Prompt
facial-expressions Shame: Desolate, lonely, regretful ; A lone figure, hunched over, walking down a deserted street; eye-level; Single Person; Rain-slicked pavement and flickering streetlights; cinematic
Characteristic
Shot : A hooded figure walking down a deserted street in a foggy city at night. Street lights illuminate the scene and buildings on both sides of the street. There are cars parked on the side of the street.
Aesthetic Score : 0.6
Mood : lonely, mysterious, somber
Quality
Entropy : 6.22
Noise : 75
Prompt Clip Score : 0.26
AI Evaluation
Likelihood of AI : 0.10
Image errors : The image is slightly blurry, especially in the background. This could be due to low light conditions or camera shake.
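The prompts in this post all appear to share one semicolon-delimited layout: an emotion header, then the subject, camera position, character type, setting, and style. The following is a minimal sketch of how such a prompt could be split into named fields for later comparison with the evaluation output. The field names and the `ScenePrompt` and `parse_prompt` helpers are assumptions made for illustration; the post does not name the fields or show how the prompts were built.

```python
from dataclasses import dataclass


@dataclass
class ScenePrompt:
    # Field names are hypothetical labels, not taken from the post.
    emotion: str
    feelings: list[str]
    subject: str
    camera: str
    character: str
    setting: str
    style: str


def parse_prompt(raw: str) -> ScenePrompt:
    """Split one semicolon-delimited prompt into its six assumed fields."""
    parts = [p.strip() for p in raw.split(";")]
    if len(parts) != 6:
        raise ValueError(f"expected 6 fields, got {len(parts)}")
    head, subject, camera, character, setting, style = parts
    # The header looks like "facial-expressions Shame: Desolate, lonely, regretful".
    label, _, feelings = head.partition(":")
    emotion = label.replace("facial-expressions", "", 1).strip()
    return ScenePrompt(
        emotion=emotion,
        feelings=[f.strip() for f in feelings.split(",") if f.strip()],
        subject=subject,
        camera=camera,
        character=character,
        setting=setting,
        style=style,
    )


prompt = parse_prompt(
    "facial-expressions Shame: Desolate, lonely, regretful ; "
    "A lone figure, hunched over, walking down a deserted street; "
    "eye-level; Single Person; "
    "Rain-slicked pavement and flickering streetlights; cinematic"
)
print(prompt.emotion, prompt.feelings, prompt.camera)
```

With the prompt split this way, the requested feelings (here desolate, lonely, regretful) can be set side by side with the Mood line reported for the image.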
Masked Hero at Sunset
A figure cloaked in mystery stands against a breathtaking cityscape at sunset. The dramatic lighting and heroic pose evoke a sense of intrigue and anticipation. Who is this masked figure, and what secrets lie ahead?
Prompt
facial-expressions Shame: Melancholy, disillusioned, burdened ; A superhero, their mask removed, revealing a face etched with pain; eye-level; Hero; A cityscape bathed in the glow of a setting sun; cinematic
Characteristic
Shot : A masked figure stands in front of a city skyline during sunset.
Aesthetic Score : 0.7
Mood : mysterious, intense, dramatic
Quality
Entropy : 6.53
Noise : 71
Prompt Clip Score : 0.26
AI Evaluation
Likelihood of AI : 0.50
Image errors : The image has a slightly grainy texture and some noise.
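The post does not say how the Entropy value under Quality is computed. One common reading is the Shannon entropy of the image’s 8-bit grayscale histogram, which ranges from 0 bits (one flat tone) to 8 bits (all 256 levels equally likely). The reported values of roughly 6.1 to 6.9 fit under that ceiling, but that is only consistent with this reading, not confirmation of it. A minimal sketch under that assumption, using a plain list of pixel values rather than a real image:

```python
import math
from collections import Counter


def shannon_entropy(pixels: list[int]) -> float:
    """Shannon entropy, in bits, of a sequence of grayscale pixel values."""
    if not pixels:
        raise ValueError("pixels must not be empty")
    total = len(pixels)
    counts = Counter(pixels)
    # Sum p * log2(1/p) over every gray level that actually occurs.
    return sum((n / total) * math.log2(total / n) for n in counts.values())


flat = [128] * 64            # a single gray level: nothing to predict
uniform = list(range(256))   # every 8-bit level exactly once
print(shannon_entropy(flat))     # 0.0
print(shannon_entropy(uniform))  # 8.0
```

Under this reading, a higher value means tonal values are spread more evenly across the image; it says nothing about whether the intended emotion is present.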
Lost in the Shadows: A Moment of Melancholy
A woman sits alone in a dimly lit restaurant, her head in her hands, her sadness palpable. The low light and her posture create a sense of isolation and despair, drawing the viewer into her emotional turmoil.
Prompt
facial-expressions Shame: Embarrassed, defeated, self-loathing ; A woman, her face buried in her hands, sitting alone at a crowded diner table; eye-level; Normal Person; The bustling activity of the diner, a stark contrast to her isolation; cinematic
Characteristic
Shot : A woman is sitting at a table in a dimly lit restaurant, covering her face with her hands. She appears to be crying.
Aesthetic Score : 0.4
Mood : sad, contemplative, lonely
Quality
Entropy : 6.72
Noise : 74
Prompt Clip Score : 0.28
AI Evaluation
Likelihood of AI : 0.30
Image errors : The image has some noise and grain. The lighting is a little uneven, but it is probably intentional to create the desired mood.
Lost in the Game: A Moment of Intense Focus
A young man, bathed in the soft glow of a screen, is completely absorbed in his video game. The low lighting and his intense concentration create a palpable sense of suspense and excitement, drawing the viewer into his world of digital immersion.
Prompt
facial-expressions Shame: Empty, defeated, lost in a digital world ; A gamer, staring blankly at a screen, his controller lying idle; eye-level; Gamer; A dimly lit room filled with gaming paraphernalia, a sense of disconnection; cinematic
Characteristic
Shot : A young man wearing headphones is playing video games in a dimly lit room. There are other pictures on the walls and a bookshelf in the background.
Aesthetic Score : 0.6
Mood : focused, intense, serious
Quality
Entropy : 6.19
Noise : 72
Prompt Clip Score : 0.27
AI Evaluation
Likelihood of AI : 0.20
Image errors : There are no visible image errors.
A Shadow of Doubt: Mystery and Tension in a Dimly Lit Room
A young man, his face etched with apprehension, stands amidst a crowd in a dimly lit room. The atmosphere is thick with tension, leaving viewers to wonder what secrets lie hidden in the shadows. The lighting and the man’s expression create a sense of mystery and intrigue, drawing you into a world of unspoken anxieties.
Prompt
facial-expressions Shame: Anxious, self-conscious, out of place ; A man, standing in a crowded room, his eyes darting nervously around; eye-level; Single Person; A party scene, filled with laughter and conversation, but he feels isolated; cinematic
Characteristic
Shot : A man stands in the foreground of a dimly lit bar or club. The background is blurred, showing other people in the space.
Aesthetic Score : 0.6
Mood : intense, focused, introspective
Quality
Entropy : 6.12
Noise : 65
Prompt Clip Score : 0.27
AI Evaluation
Likelihood of AI : 0.10
Image errors : No notable image errors.
Solitude in the City
A lone figure stands on a rooftop, gazing out at the sprawling cityscape. The blue sky and fluffy clouds create a sense of calm, while the figure’s isolation evokes a feeling of contemplation and quiet reflection.
Prompt
facial-expressions Shame: Disheartened, disillusioned, questioning his purpose ; A hero, standing on a rooftop, looking down at the city below; not too close; Hero; A panoramic view of the city, but he feels small and insignificant; cinematic
Characteristic
Shot : A man standing on a rooftop overlooking a city skyline, likely New York City.
Aesthetic Score : 0.7
Mood : solitude, contemplative, urban
Quality
Entropy : 6.81
Noise : 90
Prompt Clip Score : 0.23
AI Evaluation
Likelihood of AI : 0.10
Image errors : The image is slightly blurry, particularly in the distant buildings. The colors are a bit muted, lacking vibrancy. There is some noise in the sky.
A Moment of Quiet Contemplation
A woman sits alone at a kitchen table, bathed in warm artificial light. Her posture and expression suggest a quiet sadness, creating a sense of intimacy and melancholy. The scene evokes a feeling of contemplation and introspection.
Prompt
facial-expressions Shame: Depressed, unmotivated, lost in her thoughts ; A woman, sitting at her kitchen table, staring at a plate of untouched food; eye-level; Normal Person; A cluttered kitchen, a reflection of her inner turmoil; cinematic
Characteristic
Shot : A woman is sitting at a kitchen table, looking down at a plate of food. She appears to be in a thoughtful or sad mood.
Aesthetic Score : 0.6
Mood : thoughtful, somber, introspective
Quality
Entropy : 6.85
Noise : 79
Prompt Clip Score : 0.29
AI Evaluation
Likelihood of AI : 0.20
Image errors : There are no visible errors in the image. However, the image quality is slightly blurry, which is likely due to the soft lighting.
Cyberpunk Focus: A Man Lost in the Digital Maze
A young man sits hunched over his keyboard, bathed in the vibrant glow of a cyberpunk cityscape. His intense focus and the mysterious aura surrounding him create a captivating scene of digital immersion.
Prompt
facial-expressions Shame: Despair, addiction, a sense of being lost ; A gamer, hunched over his keyboard, his fingers flying across the keys, but his eyes are filled with sadness; eye-level; Gamer; A brightly lit gaming room, but he feels trapped in a digital world; cinematic
Characteristic
Shot : A young man is sitting at a computer desk, typing on a keyboard. The image is taken from a low angle, giving the viewer a close-up view of the man’s face and hands.
Aesthetic Score : 0.6
Mood : focused, intense, serious
Quality
Entropy : 6.86
Noise : 88
Prompt Clip Score : 0.25
AI Evaluation
Likelihood of AI : 0.20
Image errors : There are some slight artifacts around the edges of the man’s hair, and the keyboard keys are a bit blurry.
Lost in the City: A Man’s Anonymous Journey
A solitary figure walks through a bustling urban landscape, his path shrouded in mystery. The blurred background and sharp focus on the man create a sense of anonymity and intrigue, leaving the viewer to wonder about his destination and purpose.
Prompt
facial-expressions Shame: Rejected, isolated, a sense of being unwanted ; A man, walking away from a group of people, his head down, his shoulders slumped; eye-level; Single Person; A bustling street, but he feels alone and invisible; cinematic
Characteristic
Shot : A man walks away from the camera in a crowded city street. The scene is busy and filled with people, but the man is the main focus.
Aesthetic Score : 0.6
Mood : urban, anonymous, contemplative
Quality
Entropy : 6.76
Noise : 85
Prompt Clip Score : 0.24
AI Evaluation
Likelihood of AI : 0.20
Image errors : The image has some graininess, a bit of blur in the background, and slight chromatic aberration.
A Warrior’s Gaze Amidst the Ruins
A lone figure in armor stands amidst a scene of devastation, his gaze fixed on something unseen. The soft lighting and dramatic composition create a sense of intensity and mystery, leaving the viewer to wonder what transpired and what awaits the warrior next.
Prompt
facial-expressions Shame: Guilt, regret, a sense of responsibility ; A hero, standing in the ruins of a battle, his armor dented and his face covered in grime; not too close; Hero; A scene of destruction, a reminder of the cost of his actions; cinematic
Characteristic
Shot : A man in medieval armor stands in front of a blurry background of other people in the distance.
Aesthetic Score : 0.7
Mood : dark, dramatic, intense
Quality
Entropy : 6.89
Noise : 85
Prompt Clip Score : 0.27
AI Evaluation
Likelihood of AI : 0.20
Image errors : Some artifacts are visible in the background. The image is also slightly over-sharpened.
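To see the ten images side by side, the per-image numbers listed above can be collected and averaged. The sketch below copies the Aesthetic Score, Prompt Clip Score, and Likelihood of AI values from the entries in this post; the `RESULTS` table and `summarize` helper are illustrative names, and these simple averages are not claimed to be the scores quoted in the conclusion, whose derivation the post does not describe.

```python
from statistics import mean

# (title, aesthetic score, prompt CLIP score, likelihood of AI), copied from the entries above.
RESULTS = [
    ("Mystery in the Shadows", 0.6, 0.26, 0.10),
    ("Masked Hero at Sunset", 0.7, 0.26, 0.50),
    ("Lost in the Shadows", 0.4, 0.28, 0.30),
    ("Lost in the Game", 0.6, 0.27, 0.20),
    ("A Shadow of Doubt", 0.6, 0.27, 0.10),
    ("Solitude in the City", 0.7, 0.23, 0.10),
    ("A Moment of Quiet Contemplation", 0.6, 0.29, 0.20),
    ("Cyberpunk Focus", 0.6, 0.25, 0.20),
    ("Lost in the City", 0.6, 0.24, 0.20),
    ("A Warrior's Gaze Amidst the Ruins", 0.7, 0.27, 0.20),
]


def summarize(rows):
    """Return mean scores plus the titles with the lowest and highest CLIP score."""
    return {
        "mean_aesthetic": mean(r[1] for r in rows),
        "mean_clip": mean(r[2] for r in rows),
        "mean_ai_likelihood": mean(r[3] for r in rows),
        "lowest_clip": min(rows, key=lambda r: r[2])[0],
        "highest_clip": max(rows, key=lambda r: r[2])[0],
    }


for key, value in summarize(RESULTS).items():
    print(key, value if isinstance(value, str) else f"{value:.3f}")
```

The Prompt Clip Score stays in a narrow band (0.23 to 0.29) across all ten images, even where the reported Mood diverges from the feelings requested in the prompt, as in the two gamer images.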
Conclusion
The results show that the generative AI model understood the scene reasonably well, but struggled with camera position and with the aesthetic aspect. Here’s a breakdown:
- Camera Position: The model scored 0.15, which is below the “good” range of 0.5 to 0.75. This indicates that the model didn’t accurately capture the intended camera position described in the prompt.
- Shot Analysis: The model scored 0.55, which falls within the “good” range. This suggests that the model was able to understand the scene described in the prompt and create a shot that aligns with it.
- Aesthetic Analysis: The model scored 0.17, which is above the upper bound of the “very good” range of -0.2 to 0.1. This indicates that the generated images’ aesthetic deviated from the aesthetic described in the prompt.
Overall: While the model demonstrated a good understanding of the scene, it struggled to reproduce the requested camera position and the desired aesthetic. This suggests that the model might need further training to better translate framing and aesthetic preferences into visual outputs.
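The comparison of each score with its target range can be made explicit. The sketch below measures how far a score falls outside a range, using the three scores and the two ranges quoted in the breakdown. The `Band` type and `gap_to_band` function are hypothetical names, and applying the 0.5 to 0.75 “good” range to Shot Analysis is an assumption, since the post states that range only for Camera Position.

```python
from typing import NamedTuple


class Band(NamedTuple):
    label: str
    low: float
    high: float


def gap_to_band(score: float, band: Band) -> float:
    """Signed distance to the band: 0.0 inside, negative below, positive above."""
    if score < band.low:
        return score - band.low
    if score > band.high:
        return score - band.high
    return 0.0


GOOD = Band("good", 0.5, 0.75)            # range quoted for Camera Position
VERY_GOOD = Band("very good", -0.2, 0.1)  # range quoted for Aesthetic Analysis

CHECKS = {
    "Camera Position": (0.15, GOOD),
    "Shot Analysis": (0.55, GOOD),        # assumed to share the same "good" range
    "Aesthetic Analysis": (0.17, VERY_GOOD),
}

for name, (score, band) in CHECKS.items():
    gap = gap_to_band(score, band)
    status = "inside" if gap == 0.0 else f"outside by {gap:+.2f}"
    print(f"{name}: {score} vs {band.label} [{band.low}, {band.high}] -> {status}")
```

Read this way, camera position misses its range by the widest margin (0.35 below the lower bound), while the aesthetic score sits 0.07 above the upper bound of its range.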
Sources:
- https://dramaresource.com/storytelling/
- https://seedsoftellers.eu/resources/the-body-language-for-young-tellers/
- https://digitalcollections.sit.edu/cgi/viewcontent.cgi?article=1288&context=sandanona&filename=1&type=additional
- https://citeseerx.ist.psu.edu/document?doi=7f842882e9bb1fa2c0e96939bc8d2c37e34e17c0&repid=rep1&type=pdf
- https://www.twinkl.co.uk/search?q=drama+facial+expression
- https://fal.ai/models/fal-ai/flux/schnell/api