AI Captures the Scene, But Struggles with the Shot with Flux-dev
In the realm of artificial intelligence, generative models are pushing the boundaries of creativity. These models can generate images, text, and even music from user prompts. One area of particular interest is their ability to capture and express human emotion through facial expressions. This blog post explores how flux-dev handles ten prompts built around different shades of sadness, from frustration to grief. We’ll analyze the model’s strengths and weaknesses, examining how it handles different scenarios and the nuances of human emotion.
Created with: flux-dev
The Frustration of a Bug
A young man sits at his desk, headphones on, eyes glued to the computer screen. His hand rests on the keyboard, but his expression is one of frustration. The error message on the screen speaks volumes about the struggle he’s facing. This image captures the all-too-familiar feeling of being stuck, unable to move forward.
Prompt
facial-expressions Sadness: Frustration, defeat ; A gamer’s hands on a keyboard; close-up; Gamer; Screen displaying a game over message; cinematic
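Every prompt in this post follows the same semicolon-separated layout, so it can be split into parts before comparing it with the generated shot. The snippet below is a minimal sketch of that idea. The field names (`category`, `emotion`, `nuances`, `subject`, `camera`, `character`, `setting`, `style`) are my own labels inferred from the prompts shown here, not names used by flux-dev or by the tool that produced these results.

```python
from dataclasses import dataclass


@dataclass
class ScenePrompt:
    # All field names are hypothetical labels for the six prompt segments.
    category: str        # e.g. "facial-expressions"
    emotion: str         # e.g. "Sadness"
    nuances: list[str]   # e.g. ["Frustration", "defeat"]
    subject: str         # what the image should show
    camera: str          # requested camera position, e.g. "close-up"
    character: str       # character type, e.g. "Gamer"
    setting: str         # background details
    style: str           # e.g. "cinematic"


def parse_prompt(prompt: str) -> ScenePrompt:
    """Split a prompt of the form used in this post into labelled fields."""
    fields = [part.strip() for part in prompt.split(";")]
    if len(fields) != 6:
        raise ValueError(f"expected 6 ';'-separated fields, got {len(fields)}")
    head, subject, camera, character, setting, style = fields

    # The first field looks like "facial-expressions Sadness: Frustration, defeat".
    tag, _, nuance_text = head.partition(":")
    category, _, emotion = tag.strip().partition(" ")
    nuances = [n.strip() for n in nuance_text.split(",") if n.strip()]

    return ScenePrompt(category, emotion.strip(), nuances,
                       subject, camera, character, setting, style)


if __name__ == "__main__":
    parsed = parse_prompt(
        "facial-expressions Sadness: Frustration, defeat ; "
        "A gamer’s hands on a keyboard; close-up; Gamer; "
        "Screen displaying a game over message; cinematic"
    )
    print(parsed.camera, "|", parsed.setting)
```

Separating the requested camera position from the rest of the prompt makes it easier to check that one field against the shot description on its own.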
Characteristic
Shot : A person is sitting in front of a computer, typing on the keyboard, and looking at the screen. The person is wearing a headset.
Aesthetic Score : 0.4
Mood : focused, concentrated, technological
Quality
Entropy : 6.42
Noise : 60
Prompt Clip Score : 0.21
AI Evaluation
Likelihood of AI : 0.10
Image errors : The image is a bit blurry and the colors are not very vibrant. There is some noise in the image, especially around the person’s face and the keyboard.
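The post does not say how the Entropy value is computed. The reported values (6.11 to 6.72) are consistent with the Shannon entropy of an 8-bit grayscale histogram, which tops out at 8 bits, so the sketch below assumes that definition. It takes a flat sequence of gray levels rather than an image file, and the function name is hypothetical.

```python
import math
from collections import Counter


def shannon_entropy(pixels):
    """Shannon entropy, in bits, of a flat sequence of gray levels.

    Assumption: this mirrors the "Entropy" figure in the post only if that
    figure is a histogram entropy; the post does not confirm the method.
    """
    pixels = list(pixels)
    if not pixels:
        raise ValueError("need at least one pixel")
    total = len(pixels)
    counts = Counter(pixels)
    # H = sum over gray levels of p * log2(1 / p)
    return sum((c / total) * math.log2(total / c) for c in counts.values())


if __name__ == "__main__":
    flat = [128] * 64             # a single gray level: no information
    two_tone = [0, 255] * 32      # two equally likely levels: 1 bit
    print(shannon_entropy(flat), shannon_entropy(two_tone))
```

Under this definition, a flat image scores 0 and an image that uses all 256 gray levels equally often scores 8, so values around 6.4 would indicate a fairly wide spread of tones.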
Silhouette of Mystery in a Gloomy Hallway
A single, dark silhouette of a child stands in a dimly lit hallway, casting an eerie presence. The contrast between the light spilling from an open door and the surrounding darkness creates a sense of suspense and isolation. This image evokes a feeling of gloom and mystery, leaving the viewer wondering what secrets lie within the shadows.
Prompt
facial-expressions Sadness: Loneliness, abandonment ; A child standing in a doorway; eye-level; Single Person; Empty hallway, dim lighting; cinematic
Characteristic
Shot : A child stands in a dimly lit hallway with walls on either side and a door at the end.
Aesthetic Score : 0.4
Mood : melancholy, eerie, suspenseful
Quality
Entropy : 6.26
Noise : 46
Prompt Clip Score : 0.18
AI Evaluation
Likelihood of AI : 0.20
Image errors : The image is slightly grainy and the shadows are very dark.
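The Prompt Clip Score is not defined in the post either. A common way to score prompt-image agreement is the cosine similarity between a text embedding and an image embedding, and the values reported here (0.18 to 0.28) are plausible for that kind of score. The sketch below shows only the similarity step on plain lists of floats. Producing the embeddings would need a CLIP-style model, which is left out, and the function name is hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length embedding vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("zero vector has no direction")
    return dot / (norm_a * norm_b)


if __name__ == "__main__":
    # Toy 2-D vectors standing in for real text and image embeddings.
    text_embedding = [3.0, 4.0]
    image_embedding = [4.0, 3.0]
    print(cosine_similarity(text_embedding, image_embedding))
```

If the score is computed this way, a lower value means the image drifted further from the wording of the prompt, which fits the hallway image receiving the lowest score in this set.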
Lost in the Fog: A Moment of Contemplation
A solitary figure sits on a park bench, enveloped in a thick fog. The changing leaves and the man’s posture evoke a sense of melancholy and contemplation, highlighting the feeling of isolation in the midst of nature’s beauty.
Prompt
facial-expressions Sadness: Melancholy, loneliness ; A lone figure; eye-level; Single Person; Empty park bench with fallen leaves; cinematic
Characteristic
Shot : A lone man sits on a bench in a park, looking out at the foggy trees. There are fallen leaves around him, suggesting autumn.
Aesthetic Score : 0.6
Mood : melancholy, contemplative, lonely
Quality
Entropy : 6.72
Noise : 62
Prompt Clip Score : 0.28
AI Evaluation
Likelihood of AI : 0.20
Image errors : The image appears to be slightly overexposed, especially in the background, resulting in some loss of detail.
Lost in the City’s Embrace
A woman, shrouded in mystery, walks through a blurred cityscape. Her pensive expression and the dramatic lighting create a sense of melancholy and intrigue, leaving the viewer wondering about her story.
Prompt
facial-expressions Sadness: Alienation, loneliness ; A woman walking down a crowded street; eye-level; Single Person; People passing by, oblivious to her; cinematic
Characteristic
Shot : A young woman in a black coat and scarf walks through a city street. The background is blurry and the focus is on the woman’s face.
Aesthetic Score : 0.6
Mood : melancholy, contemplative, urban
Quality
Entropy : 6.42
Noise : 61
Prompt Clip Score : 0.24
AI Evaluation
Likelihood of AI : 0.20
Image errors : The image is slightly grainy and there are some artifacts in the background.
Silhouetted Against the City Lights: A Moment of Contemplation
A man in a suit stands alone on a rooftop, his silhouette stark against the glittering cityscape. The night air is thick with the weight of his thoughts, a sense of loneliness and melancholy hanging heavy in the air. The dramatic effect of his isolation against the vibrant backdrop evokes a powerful sense of contemplation.
Prompt
facial-expressions Sadness: Reflection, introspection ; A hero standing on a rooftop; eye-level; Hero; City lights twinkling in the distance; cinematic
Characteristic
Shot : A man in a suit stands against a city skyline at dusk.
Aesthetic Score : 0.7
Mood : melancholy, contemplative, lonely
Quality
Entropy : 6.51
Noise : 36
Prompt Clip Score : 0.24
AI Evaluation
Likelihood of AI : 0.20
Image errors : Slight blurriness in the background. The subject’s hair appears a little too perfect and there is a faint white artifact at the top of his head.
The Dark Knight’s Secret: A Glimpse of the Unexpected
A close-up portrait reveals a superhero shrouded in mystery, their identity obscured by the rain-soaked city backdrop. The Superman symbol on their chest hints at a hidden connection, leaving us to wonder: who is this enigmatic figure, and what secrets lie beneath the surface?
Prompt
facial-expressions Sadness: Despair, disillusionment ; A superhero in their costume; eye-level; Hero; City skyline at night, rain falling; cinematic
Characteristic
Shot : A close-up shot of a person dressed as Batman, with the Superman symbol on their chest, standing in a dark, rainy city.
Aesthetic Score : 0.7
Mood : dark, brooding, mysterious
Quality
Entropy : 6.58
Noise : 81
Prompt Clip Score : 0.26
AI Evaluation
Likelihood of AI : 0.80
Image errors : The image has a slight blurriness around the edges, which is likely due to the rain. There is also some noise in the image, particularly in the darker areas.
Lost in the Code: A Young Man’s Focused Determination
A dimly lit room, a pizza box in the foreground, and a young man engrossed in his work, headphones on, face illuminated by the computer screen. This image captures the intensity of focus and dedication, a testament to the power of concentration in the pursuit of a goal.
Prompt
facial-expressions Sadness: Isolation, withdrawal ; A gamer hunched over their computer; close-up; Gamer; Empty pizza boxes, energy drink cans; cinematic
Characteristic
Shot : A young man is sitting at a desk in a dimly lit room, wearing headphones and looking at a computer screen. There is a pizza box in front of him, suggesting he is taking a break from working or gaming.
Aesthetic Score : 0.6
Mood : focused, relaxed, techy
Quality
Entropy : 6.11
Noise : 58
Prompt Clip Score : 0.19
AI Evaluation
Likelihood of AI : 0.10
Image errors : The image has some noise and grain, particularly in the darker areas. The color balance is a bit off, with the blue tones being a little too strong.
Lost in Thought: A Moment of Quiet Contemplation
A woman sits alone at a kitchen table, lost in thought as she sips her coffee. The subdued lighting and her pensive posture create a sense of quiet contemplation and melancholic introspection.
Prompt
facial-expressions Sadness: Hopelessness, grief ; A woman sitting at a kitchen table; eye-level; Normal People; Empty coffee cup, unwashed dishes; cinematic
Characteristic
Shot : A young woman sits at a kitchen table, looking down thoughtfully, with a cup of tea in front of her. The kitchen is dimly lit and has a warm, cozy atmosphere.
Aesthetic Score : 0.7
Mood : melancholic, introspective, contemplative
Quality
Entropy : 6.60
Noise : 68
Prompt Clip Score : 0.28
AI Evaluation
Likelihood of AI : 0.10
Image errors : No significant image errors.
Silhouetted Against the Setting Sun: A Soldier’s Melancholy
A lone soldier, silhouetted against a blazing sunset, kneels with a backpack on his shoulder. The dramatic image evokes a sense of melancholy and contemplation, highlighting the weight of the soldier’s experience.
Prompt
facial-expressions Sadness: Loss, regret ; A soldier kneeling on a battlefield; eye-level; Hero; Explosions in the distance, smoke filling the air; cinematic
Characteristic
Shot : A silhouette of a soldier in military fatigues and a backpack, kneeling in a field with a sunset in the background. The sunset is a fiery orange color with a large cloud of smoke billowing above it.
Aesthetic Score : 0.6
Mood : somber, reflective, melancholic
Quality
Entropy : 6.61
Noise : 53
Prompt Clip Score : 0.27
AI Evaluation
Likelihood of AI : 0.20
Image errors : No visible artifacts or errors.
A Moment of Quiet Reflection
A couple shares a quiet moment on the couch, bathed in soft light. The woman looks at her partner while he looks down, lost in thought. The scene evokes a sense of intimacy and introspection, capturing a moment of shared contemplation.
Prompt
facial-expressions Sadness: Silence, unspoken tension ; A couple sitting on a couch; eye-level; Normal People; Empty popcorn bowl, remote control on the floor; cinematic
Characteristic
Shot : A man and a woman are sitting on a couch, looking at each other. There is a bowl of popcorn on the coffee table between them. The scene is set in a living room, and it appears to be evening.
Aesthetic Score : 0.5
Mood : cozy, contemplative, romantic
Quality
Entropy : 6.39
Noise : 64
Prompt Clip Score : 0.27
AI Evaluation
Likelihood of AI : 0.30
Image errors : The image is slightly blurry, and the colors are a bit muted. The lighting is also a bit uneven.
Conclusion
The results show that the generative AI model understood the scenes and produced shots that match the prompts, but struggled to reproduce the requested camera position. Here’s a breakdown:
- Camera Position: The model scored 0.3, which is below average. This suggests that the model didn’t accurately capture the intended camera position described in the prompt.
- Shot Analysis: The model scored 0.54, which is considered good. This indicates that the model was able to understand the scene described in the prompt and create a shot that aligns with it.
- Aesthetic Analysis: The model scored 0.21, which is considered very good. This means that the generated image’s aesthetic closely matched the expected aesthetic described in the prompt.
Overall, the model demonstrates a good understanding of the scene and shot composition, but needs improvement in accurately capturing the intended camera position. The aesthetic of the generated image is very close to the expected aesthetic.
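To put the ten images side by side, the per-image figures listed in this post can be collected and averaged. The sketch below does that with the standard library only. The record layout and the shortened titles are my own choices, and the averages are simple means of the values shown above, not the Camera Position, Shot Analysis, or Aesthetic Analysis scores, whose computation the post does not describe.

```python
from statistics import mean
from typing import NamedTuple


class ImageResult(NamedTuple):
    title: str
    aesthetic: float
    entropy: float
    noise: int
    clip: float
    ai_likelihood: float


# Values copied from the per-image sections of this post.
RESULTS = [
    ImageResult("The Frustration of a Bug", 0.4, 6.42, 60, 0.21, 0.10),
    ImageResult("Silhouette of Mystery in a Gloomy Hallway", 0.4, 6.26, 46, 0.18, 0.20),
    ImageResult("Lost in the Fog", 0.6, 6.72, 62, 0.28, 0.20),
    ImageResult("Lost in the City's Embrace", 0.6, 6.42, 61, 0.24, 0.20),
    ImageResult("Silhouetted Against the City Lights", 0.7, 6.51, 36, 0.24, 0.20),
    ImageResult("The Dark Knight's Secret", 0.7, 6.58, 81, 0.26, 0.80),
    ImageResult("Lost in the Code", 0.6, 6.11, 58, 0.19, 0.10),
    ImageResult("Lost in Thought", 0.7, 6.60, 68, 0.28, 0.10),
    ImageResult("Silhouetted Against the Setting Sun", 0.6, 6.61, 53, 0.27, 0.20),
    ImageResult("A Moment of Quiet Reflection", 0.5, 6.39, 64, 0.27, 0.30),
]


def summarize(results):
    """Return the mean of each numeric metric across all images."""
    metrics = ("aesthetic", "entropy", "noise", "clip", "ai_likelihood")
    return {m: mean(getattr(r, m) for r in results) for m in metrics}


if __name__ == "__main__":
    for name, value in summarize(RESULTS).items():
        print(f"{name}: {value:.3f}")
    weakest = min(RESULTS, key=lambda r: r.clip)
    print("lowest Prompt Clip Score:", weakest.title)
```

Sorting or filtering the same records (for example by `ai_likelihood`) is a quick way to spot outliers such as the superhero image, the only one rated above 0.30 for likelihood of AI.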
Sources:
- https://dramaresource.com/storytelling/
- https://seedsoftellers.eu/resources/the-body-language-for-young-tellers/
- https://digitalcollections.sit.edu/cgi/viewcontent.cgi?article=1288&context=sandanona&filename=1&type=additional
- https://citeseerx.ist.psu.edu/document?doi=7f842882e9bb1fa2c0e96939bc8d2c37e34e17c0&repid=rep1&type=pdf
- https://www.twinkl.co.uk/search?q=drama+facial+expression
- https://fal.ai/models/fal-ai/flux/dev/api