AI's Mixed Bag: Capturing Emotion, Missing the Mark on Style with Flux-dev
The ability to convey emotion through facial expressions is a fundamental aspect of human communication, yet replicating this complexity in AI-generated images remains a challenge. This post presents a case study in which flux-dev was given ten prompts, each asking for a facial expression in the disappointment family (frustration, melancholy, despair, and so on). The results reveal both successes and limitations in capturing the nuances of human emotion. We’ll look at how the model performed in terms of camera position, scene understanding, and aesthetic quality, highlighting where it does well and where it still needs improvement.
Created with: flux-dev
Lost in the Code: A Young Man’s Intense Focus Under Neon Lights
A young man, bathed in blue and red light, sits before his computer screen, headphones on, eyes locked on the camera. His neutral expression and the dramatic lighting create a palpable sense of tension, hinting at a moment of intense focus and determination.
Prompt
facial-expressions Disappointment: Frustration, anger ; A gamer sitting in front of a computer screen; eye-level; Gamer; a dimly lit room with flashing lights and the glow of the monitor reflecting in their eyes; cinematic
Characteristic
Shot : A young man wearing headphones, looking directly at the camera in a dimly lit room with a blue screen in the background.
Aesthetic Score : 0.7
Mood : intense, focused, serious
Quality
Entropy : 6.45
Noise : 61
Prompt Clip Score : 0.23
AI Evaluation
Likelihood of AI : 0.20
Image errors : No visible artifacts or errors.
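The post does not say how the Entropy value in each Quality block is computed. One common reading is the Shannon entropy of the 8-bit grayscale histogram, which ranges from 0 to 8 bits. The sketch below assumes that definition; the function name `shannon_entropy` and the sample pixel lists are illustrative and not part of the original pipeline.

```python
import math
from collections import Counter


def shannon_entropy(pixels):
    """Shannon entropy, in bits, of a sequence of integer pixel intensities."""
    counts = Counter(pixels)
    total = len(pixels)
    # Sum -p * log2(p) over every intensity value that actually occurs.
    return sum(-(n / total) * math.log2(n / total) for n in counts.values())


# Hypothetical data: a single flat grey level carries no information,
# while an even spread over all 256 levels reaches the 8-bit maximum.
flat = [128] * 1024
uniform = list(range(256)) * 4

print(shannon_entropy(flat))
print(shannon_entropy(uniform))
```

Under this assumed definition, the values around 6.1 to 6.8 reported in this post would indicate images that use most, but not all, of the available tonal range.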
Lost in the Neon Glow: A Woman’s Solitary Walk
A woman walks alone through a city street bathed in vibrant neon light. The interplay of light and shadow creates an atmosphere of mystery and introspection, hinting at a story waiting to be told. This image evokes a sense of urban melancholy and invites viewers to contemplate the woman’s journey.
Prompt
facial-expressions Disappointment: Melancholy, isolation ; A lone figure; eye-level; Single Person; a bustling city street at night, with neon signs and blurred lights; cinematic
Characteristic
Shot : A woman in a dark coat walks down a city street at night. The street is lit by neon signs and streetlights. The woman’s face is obscured by her dark hair and the shadows of the street lights.
Aesthetic Score : 0.7
Mood : mysterious, lonely, melancholic
Quality
Entropy : 6.46
Noise : 59
Prompt Clip Score : 0.26
AI Evaluation
Likelihood of AI : 0.20
Image errors : The image is slightly blurry, particularly in the background.
Lost in the Shadows: A Man’s Mysterious Journey
A solitary figure, cloaked in darkness, walks a path illuminated only by flickering streetlights. The narrow street, lined with imposing buildings, whispers secrets in the night. This moody and dramatic scene evokes a sense of mystery and intrigue, leaving you wondering what secrets lie ahead.
Prompt
facial-expressions Disappointment: Loneliness, despair ; A man walking down a deserted street; eye-level; Single Person; a street lined with closed shops and flickering streetlights; cinematic
Characteristic
Shot : A man in a dark coat walks down a dark alley lit by streetlamps. The image has a film noir aesthetic.
Aesthetic Score : 0.6
Mood : mysterious, moody, suspenseful
Quality
Entropy : 6.44
Noise : 64
Prompt Clip Score : 0.22
AI Evaluation
Likelihood of AI : 0.20
Image errors : There is some noise in the image, particularly in the shadows.
Solitude Amidst the Mist
A lone figure in a black robe stands on a cliff, silhouetted against a misty mountain range. The muted green sky and vast landscape create a serene and contemplative atmosphere, evoking a sense of peace and mystery.
Prompt
facial-expressions Disappointment: Isolation, disillusionment ; A hero standing on a mountaintop; eye-level; Hero; a vast landscape stretching out before them, but with a sense of emptiness in the air; cinematic
Characteristic
Shot : A lone figure, possibly a monk, stands on a mountain peak overlooking a vast, hazy mountain range. The scene is shrouded in a soft, ethereal light.
Aesthetic Score : 0.7
Mood : tranquility, solitude, mystery
Quality
Entropy : 6.14
Noise : 59
Prompt Clip Score : 0.23
AI Evaluation
Likelihood of AI : 0.60
Image errors : The image appears slightly overexposed, resulting in a washed-out look. The sky and mountains lack detail and appear somewhat flat.
Lost in the City Lights
A solitary figure silhouetted against a backdrop of vibrant city lights, reflecting a mood of melancholy and longing. The rain-streaked window and blurred car lights amplify the sense of isolation and introspection.
Prompt
facial-expressions Disappointment: Sadness, longing ; A woman standing at a window; eye-level; Single Person; a rainy day with the city streets blurred in the background; cinematic
Characteristic
Shot : A young woman with long dark hair stands looking out a window at a blurred city street at night. The window has raindrops on it.
Aesthetic Score : 0.7
Mood : melancholy, contemplative, introspective
Quality
Entropy : 6.42
Noise : 66
Prompt Clip Score : 0.26
AI Evaluation
Likelihood of AI : 0.20
Image errors : There are no noticeable artifacts or errors in the image.
Intimate Gathering Under Warm Light
A cozy scene unfolds as four individuals share a meal around a table bathed in soft, warm light. The intimate setting and dim lighting create a sense of closeness and comfort, capturing a moment of shared connection.
Prompt
facial-expressions Disappointment: Tension, estrangement ; A family gathered around a dinner table; eye-level; Normal People; a table set with a simple meal, but with an uncomfortable silence hanging in the air; cinematic
Characteristic
Shot : A group of people are sitting at a dining table in a dimly lit room, eating and talking.
Aesthetic Score : 0.6
Mood : cozy, intimate, relaxed
Quality
Entropy : 6.60
Noise : 74
Prompt Clip Score : 0.26
AI Evaluation
Likelihood of AI : 0.30
Image errors : There are some slight artifacts around the edges of the subjects and the table.
The Weight of Defeat: A Moment of Loss Captured in Blue and Red
A solitary figure sits before a computer screen, the stark words ‘Game Over’ illuminating their face. The blue and red lighting casts a somber mood, reflecting the disappointment and loss etched on their expression. This image captures the raw emotion of defeat, leaving a lasting impression of the weight of failure.
Prompt
facial-expressions Disappointment: Defeat, frustration ; A gamer staring at a game over screen; eye-level; Gamer; a darkened room with the glow of the monitor reflecting in their eyes, showing a game over message; cinematic
Characteristic
Shot : A young person sits in front of a computer screen, illuminated by blue light, with the words ‘GAME OVER’ displayed in large red letters. The scene evokes a sense of disappointment and defeat.
Aesthetic Score : 0.6
Mood : melancholy, defeat, digital
Quality
Entropy : 6.23
Noise : 47
Prompt Clip Score : 0.26
AI Evaluation
Likelihood of AI : 0.20
Image errors : The image appears to have some noise, potentially from compression or low light conditions.
A Moment of Solitude: A Woman’s Pensive Gaze
A woman sits alone at a table, her head resting on her hand, lost in thought. The soft lighting and her introspective pose evoke a sense of loneliness and isolation. The presence of food adds a touch of domesticity, creating a poignant contrast with her melancholic mood.
Prompt
facial-expressions Disappointment: Hopelessness, resignation ; A woman sitting at a kitchen table; eye-level; Normal Person; a cluttered kitchen with dirty dishes and a half-eaten meal; cinematic
Characteristic
Shot : A woman sits at a kitchen table with a plate of food in front of her. She is looking down at the food, and her expression is one of sadness or worry.
Aesthetic Score : 0.6
Mood : melancholy, somber, contemplative
Quality
Entropy : 6.77
Noise : 80
Prompt Clip Score : 0.26
AI Evaluation
Likelihood of AI : 0.10
Image errors : The image has a slight amount of noise and grain, particularly in the shadows. The lighting is a bit flat.
Shadow of Doubt: A Post-Apocalyptic Encounter
A cloaked figure looms over a fallen man in a desolate landscape, bathed in dramatic lighting. The scene evokes a sense of mystery and suspense, hinting at a post-apocalyptic world where shadows hold secrets.
Prompt
facial-expressions Disappointment: Disappointment, regret ; A hero standing over a fallen villain; eye-level; Hero; a battlefield littered with debris and smoke, with the villain’s defeated form at the hero’s feet; cinematic
Characteristic
Shot : A man in a long coat stands over a fallen figure in a dusty, rocky landscape. The setting sun casts a warm glow over the scene.
Aesthetic Score : 0.6
Mood : dramatic, somber, intense
Quality
Entropy : 6.56
Noise : 65
Prompt Clip Score : 0.24
AI Evaluation
Likelihood of AI : 0.10
Image errors : There are no visible errors in the image.
Silhouetted Hero at Sunset
A lone figure, cloaked in red, stands at the edge of a rooftop, their silhouette a powerful symbol of hope and heroism against the vibrant sunset over the city skyline.
Prompt
facial-expressions Disappointment: Defeated, disillusioned ; A superhero standing on a rooftop; eye-level; Hero; a cityscape bathed in the orange glow of a setting sun, with the hero’s cape billowing in the wind; cinematic
Characteristic
Shot : A lone figure, silhouetted against a setting sun, stands on a rooftop overlooking a cityscape. They are wearing a cape, which billows in the wind. The cityscape is in the background, blurred and slightly out of focus.
Aesthetic Score : 0.7
Mood : dramatic, heroic, hopeful
Quality
Entropy : 6.63
Noise : 44
Prompt Clip Score : 0.29
AI Evaluation
Likelihood of AI : 0.10
Image errors : No visible artifacts or errors in the image.
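To compare the ten images at a glance, the per-image numbers listed above can be averaged. In the sketch below the values are copied from the sections above, in the order the images appear; the dictionary keys and variable names are only illustrative.

```python
from statistics import mean

# Per-image values, in the order the images appear in this post.
metrics = {
    "aesthetic_score": [0.7, 0.7, 0.6, 0.7, 0.7, 0.6, 0.6, 0.6, 0.6, 0.7],
    "entropy": [6.45, 6.46, 6.44, 6.14, 6.42, 6.60, 6.23, 6.77, 6.56, 6.63],
    "noise": [61, 59, 64, 59, 66, 74, 47, 80, 65, 44],
    "prompt_clip_score": [0.23, 0.26, 0.22, 0.23, 0.26, 0.26, 0.26, 0.26, 0.24, 0.29],
    "likelihood_of_ai": [0.20, 0.20, 0.20, 0.60, 0.20, 0.30, 0.20, 0.10, 0.10, 0.10],
}

# Average each metric across the ten images, rounded for readability.
averages = {name: round(mean(values), 3) for name, values in metrics.items()}

for name, value in averages.items():
    print(f"{name}: {value}")
```

On these numbers the mean aesthetic score is 0.65 and the mean prompt CLIP score is about 0.25, with the misty mountaintop image standing out for its higher likelihood-of-AI value of 0.60.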
Conclusion
The analysis of the generated images reveals mixed results:
- Camera Position: The model scored 0.15 on capturing the intended camera position. This is well below the “good” range of 0.5 to 0.75, indicating clear discrepancies between the prompts and the final images.
- Shot Analysis: The model demonstrated good understanding of the scene described in the prompt, achieving a score of 0.57. This falls within the “good” range, suggesting the model successfully translated the prompt’s description into a visually coherent scene.
- Aesthetic Analysis: The aesthetic score was -0.08. The generated images are judged to deviate from the expected aesthetic, but -0.08 lies inside the quoted “very good” range of -0.2 to 0.1, so the reported score and range do not on their own support a significant deviation.
Overall, scene understanding was the model’s strongest area. Camera position matched the prompts poorly, and the desired aesthetic was judged to be missed, although the reported aesthetic score and range are not consistent with each other.
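The range comparisons in the bullets above are simple interval checks. A minimal sketch, using only the scores and ranges quoted in this conclusion; the helper `in_range` is a hypothetical name, and treating the ranges as inclusive is an assumption.

```python
def in_range(score, low, high):
    """Return True if score lies within the inclusive interval [low, high]."""
    return low <= score <= high


# Scores and ranges as quoted in the conclusion.
camera_position = in_range(0.15, 0.5, 0.75)  # "good" range for camera position
aesthetic = in_range(-0.08, -0.2, 0.1)       # "very good" range for aesthetic

print("camera position in range:", camera_position)
print("aesthetic in range:", aesthetic)
```

Running a check like this against each reported score is a quick way to confirm that the written verdict agrees with the number behind it.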