AI's Artistic Struggle: Capturing Emotion in Visuals with Flux-schnell
Generating realistic, emotionally evocative visuals is a long-standing goal for generative AI. This blog post examines how well a generative AI model turns detailed scene descriptions into images, focusing on the model’s ability to capture facial expressions. The model handles scene and shot composition well, but it falls short on camera position and on the desired aesthetic, particularly the facial expressions the prompts ask for. This highlights the ongoing challenge of giving AI-generated visuals the nuanced emotional depth that human artists achieve. The examples below pair each prompt with automated measurements of the resulting image, and the conclusion summarizes the scores.
Created with: flux-schnell
Contemplation in the Bleak Landscape
A solitary figure, clad in black, stands amidst a desolate landscape under a brooding sky. His serious gaze, directed straight at the viewer, evokes a sense of contemplation and foreboding. The stark contrast between the man’s expression and the bleak surroundings creates a powerful dramatic effect.
Prompt
facial-expressions Determination: Solitude and resilience ; A lone figure; eye-level; Single Person; A vast, desolate landscape; cinematic
Characteristic
Shot : A man stands in a desolate landscape, looking directly at the camera. The sky is overcast and the ground is covered in sparse vegetation.
Aesthetic Score : 0.6
Mood : serious, contemplative, melancholic
Quality
Entropy : 6.33
Noise : 54
Prompt Clip Score : 0.20
AI Evaluation
Likelihood of AI : 0.10
Image errors : None
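The post does not say how the Entropy figure under Quality is computed. One common definition is the Shannon entropy of the grayscale intensity histogram, which tops out at 8 bits for an 8-bit image, so values between 6 and 7 would indicate fairly varied tonal content. The sketch below assumes that definition; the function name `shannon_entropy` and the sample pixel lists are illustrative and not taken from the original pipeline.

```python
import math
from collections import Counter


def shannon_entropy(pixels):
    """Shannon entropy, in bits, of a sequence of integer pixel intensities."""
    if not pixels:
        raise ValueError("pixels must not be empty")
    total = len(pixels)
    counts = Counter(pixels)
    # Sum -p * log2(p) over every intensity value that actually occurs.
    return sum(-(n / total) * math.log2(n / total) for n in counts.values())


# Hypothetical data: a flat grey image and one that uses all 256 levels equally.
flat_image = [128] * 1024
uniform_image = list(range(256)) * 4

print(f"flat:    {shannon_entropy(flat_image):.2f} bits")
print(f"uniform: {shannon_entropy(uniform_image):.2f} bits")
```

A flat image carries no tonal information, while an image that uses every level equally reaches the 8-bit ceiling; real photographs and renders fall somewhere in between.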
Heroic Figure Emerges from the Flames
A man clad in futuristic armor and a flowing red cape stands defiantly against a backdrop of a burning city. His intense expression and the fiery chaos behind him create a sense of urgency and drama, hinting at a heroic struggle against overwhelming odds.
Prompt
facial-expressions Determination: Courage and unwavering resolve ; A hero standing tall; low-angle; Hero; A burning city in the background; cinematic
Characteristic
Shot : A man with a serious expression is wearing a dark and intricate costume with a star on the chest. The background is a blurry fiery scene, likely a cityscape engulfed in flames. The man’s costume and the fiery backdrop create a dramatic and visually striking scene.
Aesthetic Score : 0.7
Mood : intense, dramatic, heroic
Quality
Entropy : 6.73
Noise : 92
Prompt Clip Score : 0.22
AI Evaluation
Likelihood of AI : 0.60
Image errors : No visible artifacts or errors.
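The prompts in this post appear to share one semicolon-separated layout: a category and emotion with a nuance after the colon, then a subject, a camera angle, a character type, a setting, and a style. The sketch below splits a prompt into those parts. The field names and the `ScenePrompt` and `parse_prompt` helpers are assumptions made for illustration; the post does not name the fields or describe how the prompts were built.

```python
from dataclasses import dataclass


@dataclass
class ScenePrompt:
    # Field names are guesses based on the prompts shown in this post.
    category: str
    emotion: str
    nuance: str
    subject: str
    camera: str
    character: str
    setting: str
    style: str


def parse_prompt(text):
    """Split a prompt of the form 'category Emotion: nuance ; subject; camera; ...'."""
    parts = [part.strip() for part in text.split(";")]
    if len(parts) != 6:
        raise ValueError(f"expected 6 semicolon-separated fields, got {len(parts)}")
    head, subject, camera, character, setting, style = parts
    label, _, nuance = head.partition(":")
    category, _, emotion = label.strip().partition(" ")
    return ScenePrompt(category, emotion.strip(), nuance.strip(),
                       subject, camera, character, setting, style)


example = parse_prompt(
    "facial-expressions Determination: Courage and unwavering resolve ; "
    "A hero standing tall; low-angle; Hero; "
    "A burning city in the background; cinematic"
)
print(example.emotion, "|", example.camera, "|", example.setting)
```

Separating the fields this way makes it easier to compare what was requested (for example, the camera angle) with what the Shot description reports.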
The Weight of Industry: A Man’s Contemplative Gaze
A worker, clad in his uniform, stands amidst the blur of an industrial setting. His serious expression and the depth of the scene create a sense of contemplation and the weight of his surroundings. The image evokes a mood of seriousness and industrial grit.
Prompt
facial-expressions Determination: Grit and perseverance ; A worker pushing a heavy cart; eye-level; Normal People; A bustling factory floor; cinematic
Characteristic
Shot : A man in a grey work uniform and cap stands in front of a blurry industrial setting. He appears to be holding onto a metal bar.
Aesthetic Score : 0.6
Mood : industrial, serious, worn
Quality
Entropy : 6.78
Noise : 98
Prompt Clip Score : 0.26
AI Evaluation
Likelihood of AI : 0.10
Image errors : The image has a slight blur in the background and the subject’s face is not perfectly sharp. The lighting could be more balanced.
Lost in the Code: A Portrait of Focus
A young man, headphones on, stares intently at a computer screen. The background blurs, leaving only his focused face and the mystery of his digital world. This image captures the intensity of concentration, the quiet power of a mind immersed in code.
Prompt
facial-expressions Determination: Concentration and drive ; A gamer intensely focused on a screen; close-up; Gamer; A dimly lit room with glowing monitors; cinematic
Characteristic
Shot : A young man with headphones on, looking intensely at a computer screen. The scene is dimly lit, with a few other computer screens visible in the background.
Aesthetic Score : 0.6
Mood : intense, focused, mysterious
Quality
Entropy : 6.34
Noise : 69
Prompt Clip Score : 0.24
AI Evaluation
Likelihood of AI : 0.10
Image errors : The image has some noise and graininess, particularly in the darker areas. The colors are also slightly muted and lacking vibrancy.
Silhouetted in the Setting Sun: A Moment of Contemplation
A woman gazes out a window, her face partially illuminated by the fading light of the setting sun. The dramatic lighting casts her features in shadow, creating a sense of mystery and intrigue. Her pensive expression suggests a moment of deep contemplation, tinged with melancholy.
Prompt
facial-expressions Determination: Inner strength and hope ; A woman staring out a window; eye-level; Single Person; A stormy sky; cinematic
Characteristic
Shot : A woman gazes out of a window with a thoughtful expression as the sun sets in the background.
Aesthetic Score : 0.7
Mood : melancholy, contemplative, introspective
Quality
Entropy : 6.39
Noise : 48
Prompt Clip Score : 0.26
AI Evaluation
Likelihood of AI : 0.20
Image errors : The image seems to be slightly overexposed and has a bit of noise in the darker areas.
One Man’s Fist, A Crowd’s Hope: A Moment of Defiance
A lone figure, clad in brown, raises his fist to the sky, his silhouette stark against the blurred backdrop of a cheering crowd. The scene evokes a sense of dramatic intensity, suggesting a moment of defiance or rallying, filled with hope and inspiration.
Prompt
facial-expressions Determination: Victory and unwavering resolve ; A hero raising a sword; low-angle; Hero; A battlefield with fallen enemies; cinematic
Characteristic
Shot : A man in a brown robe is looking up at the sky with his fist raised. He is in a crowd of people, possibly a battlefield or a gathering of sorts.
Aesthetic Score : 0.7
Mood : dramatic, hopeful, determined
Quality
Entropy : 6.49
Noise : 67
Prompt Clip Score : 0.21
AI Evaluation
Likelihood of AI : 0.10
Image errors : None
Children Face Danger as Fire Engulfs House
A tense scene unfolds as three children stand before a burning house, the flames casting an ominous glow. The old wooden structure is consumed by fire, creating a dramatic and suspenseful atmosphere. The children’s expressions suggest fear and uncertainty as they witness the unfolding disaster.
Prompt
facial-expressions Determination: Resilience and unity ; A family huddled together; eye-level; Normal People; A burning house in the background; cinematic
Characteristic
Shot : A group of four people, including two children, are standing in front of a house with a fire burning in the upper windows. The house is old and worn, with a dark gray exterior. The people appear to be concerned or frightened, but it is not clear what is happening.
Aesthetic Score : 0.6
Mood : tense, anxious, dramatic
Quality
Entropy : 6.84
Noise : 89
Prompt Clip Score : 0.28
AI Evaluation
Likelihood of AI : 0.10
Image errors : The image is slightly overexposed, which has resulted in some loss of detail in the highlights. There is also some noise in the shadows, but this is not very noticeable.
In the Zone: Gamer’s Intensity Under Red-Hot Lights
A young man, bathed in red and orange light, is completely absorbed in his game. His focused expression and rapid keystrokes tell a story of intense competition and unwavering determination. The blurred background emphasizes the drama of the moment, capturing the raw energy of a gamer in their element.
Prompt
facial-expressions Determination: Excitement and focus ; A gamer’s hands furiously typing on a keyboard; close-up; Gamer; A brightly lit gaming room; cinematic
Characteristic
Shot : A young man is playing a video game, wearing headphones and looking intensely at the screen. He is sitting in a dimly lit room with red lighting and a blurry figure in the background.
Aesthetic Score : 0.6
Mood : intense, focused, competitive
Quality
Entropy : 6.75
Noise : 74
Prompt Clip Score : 0.28
AI Evaluation
Likelihood of AI : 0.30
Image errors : The image has some noise and grain, especially in the darker areas. The focus is slightly soft on the background.
Lost in the Mist: A Figure Vanishes into the Unknown
A solitary figure walks through a dense, misty forest, their form almost entirely obscured by the swirling fog. The stark contrast between the pale blue sky and the shadowy trees creates an eerie atmosphere, leaving the viewer with a sense of mystery and foreboding.
Prompt
facial-expressions Determination: Hope and perseverance ; A lone figure walking towards a distant light; eye-level; Single Person; A dark, foreboding forest; cinematic
Characteristic
Shot : A lone figure walks through a dense foggy forest at night. The light at the end of the path creates an eerie and mysterious atmosphere.
Aesthetic Score : 0.7
Mood : mysterious, eerie, suspenseful
Quality
Entropy : 6.03
Noise : 71
Prompt Clip Score : 0.22
AI Evaluation
Likelihood of AI : 0.30
Image errors : No major errors detected, but the image feels a bit flat, lacking depth and a strong sense of presence.
Silhouetted Against the Sunset, a Moment of Contemplation
A man with short brown hair stands against a blurred urban backdrop, bathed in the warm glow of the setting sun. His serious expression and the play of light and shadow create a mood of quiet contemplation, hinting at a story waiting to be told.
Prompt
facial-expressions Determination: Confidence and unwavering resolve ; A hero standing on a rooftop; high-angle; Hero; A city skyline bathed in sunlight; cinematic
Characteristic
Shot : A man is standing in front of a blurry city background, with the sun setting behind him.
Aesthetic Score : 0.6
Mood : serious, contemplative, masculine
Quality
Entropy : 6.61
Noise : 65
Prompt Clip Score : 0.22
AI Evaluation
Likelihood of AI : 0.10
Image errors : The image is slightly overexposed, resulting in some blown-out highlights.
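The per-image figures listed above can be averaged to get a rough overall picture. The sketch below copies the ten sets of values from the listings in order; the tuple layout and variable names are illustrative. These averages are separate from the three comparison scores in the conclusion, whose computation the post does not describe.

```python
from statistics import mean

# (aesthetic score, entropy, noise, prompt CLIP score, likelihood of AI),
# copied from the ten image listings in the order they appear.
images = [
    (0.6, 6.33, 54, 0.20, 0.10),
    (0.7, 6.73, 92, 0.22, 0.60),
    (0.6, 6.78, 98, 0.26, 0.10),
    (0.6, 6.34, 69, 0.24, 0.10),
    (0.7, 6.39, 48, 0.26, 0.20),
    (0.7, 6.49, 67, 0.21, 0.10),
    (0.6, 6.84, 89, 0.28, 0.10),
    (0.6, 6.75, 74, 0.28, 0.30),
    (0.7, 6.03, 71, 0.22, 0.30),
    (0.6, 6.61, 65, 0.22, 0.10),
]

labels = ["aesthetic", "entropy", "noise", "clip", "ai_likelihood"]
# zip(*images) turns the rows into one column per metric.
averages = {label: round(mean(column), 3)
            for label, column in zip(labels, zip(*images))}

for label, value in averages.items():
    print(f"{label:>14}: {value}")
```

With these values the averages come out to an aesthetic score of 0.64, entropy of 6.529, noise of 72.7, a prompt CLIP score of 0.239, and an AI likelihood of 0.2.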
Conclusion
The results show that the generative AI model understood the scene and shot composition well, but struggled with camera position and the aesthetic aspect. Here’s a breakdown:
- Camera Position: The model scored 0.35, which is below the “good” range of 0.5 to 0.75. This suggests that the model didn’t fully capture the intended camera position described in the prompt.
- Shot Analysis: The model scored 0.645, which falls within the “good” range. This indicates that the model was able to understand the scene and create a shot that was relatively close to what was described in the prompt.
- Aesthetic Analysis: The model scored 0.16, which is above the “very good” range of -0.2 to 0.1. This suggests that the generated images’ aesthetic didn’t quite match the aesthetic described in the prompts.
Overall, the model demonstrated a good understanding of the scene and shot composition, but needs improvement in matching the requested camera position and in capturing the desired aesthetic.
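The range comparisons in the breakdown can be written as a small check. The three scores and the two quoted ranges come from the text above. The post does not restate the “good” range for shot analysis, so this sketch assumes it is the same 0.5 to 0.75 range given for camera position; treating the bounds as inclusive is also an assumption.

```python
def in_range(score, low, high):
    """Return True when score lies within [low, high], bounds included."""
    return low <= score <= high


# (score, lower bound, upper bound) for each analysis in the breakdown.
checks = {
    "camera_position": (0.35, 0.5, 0.75),
    "shot_analysis": (0.645, 0.5, 0.75),      # range assumed, see note above
    "aesthetic_analysis": (0.16, -0.2, 0.1),
}

results = {name: in_range(*values) for name, values in checks.items()}

for name, ok in results.items():
    print(f"{name}: {'within' if ok else 'outside'} target range")
```

Only the shot analysis score lands inside its target range, which matches the summary: composition is handled well, while camera position and aesthetic fall outside their ranges.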
Sources:
- https://dramaresource.com/storytelling/
- https://seedsoftellers.eu/resources/the-body-language-for-young-tellers/
- https://digitalcollections.sit.edu/cgi/viewcontent.cgi?article=1288&context=sandanona&filename=1&type=additional
- https://citeseerx.ist.psu.edu/document?doi=7f842882e9bb1fa2c0e96939bc8d2c37e34e17c0&repid=rep1&type=pdf
- https://www.twinkl.co.uk/search?q=drama+facial+expression
- https://fal.ai/models/fal-ai/flux/schnell/api