AI's Struggle with Camera Angles: A Case Study in Facial Expressions with Stable Diffusion
Generating images with specific facial expressions remains a challenging task for generative AI. This blog post examines how a generative AI model handles ten prompts built around facial expressions, focusing on three things: camera position, shot composition, and aesthetic style. The model produces visually coherent shots and largely achieves the intended aesthetic, but it struggles to reproduce the camera position the prompt asks for. The per-image results that follow show where these strengths and limitations appear.
Created with: stability-ai-core
Lost in the Neon Rain
A solitary figure walks through a rain-soaked city alley, their silhouette stark against the vibrant neon reflections. The atmosphere is dark and mysterious, hinting at secrets hidden in the shadows.
Prompt
facial-expressions Surprise: Eerie, suspenseful ; A lone figure walking down a deserted street; eye-level; Single Person; neon signs reflecting in puddles; cinematic
Characteristic
Shot : A lone figure walks down a wet, narrow alleyway at night, illuminated by neon signs and reflections in puddles.
Aesthetic Score : 0.8
Mood : mysterious, urban, atmospheric
Quality
Entropy : 6.01
Noise : 84
Prompt Clip Score : 0.31
AI Evaluation
Likelihood of AI : 0.20
Image errors : No visible artifacts or errors.
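The prompt for this image, like every prompt in this post, appears to follow a fixed semicolon-separated template. The sketch below splits a prompt into its six parts so that the requested camera position can be compared against the generated shot. The field names (tone, scene, camera, subject, setting, style) are assumptions made for illustration; the post does not name them.

```python
from dataclasses import dataclass


@dataclass
class PromptParts:
    # Field names are hypothetical labels for the six prompt segments.
    tone: str
    scene: str
    camera: str
    subject: str
    setting: str
    style: str


def parse_prompt(prompt: str) -> PromptParts:
    """Split a semicolon-separated prompt into its six segments."""
    fields = [part.strip() for part in prompt.split(";")]
    if len(fields) != 6:
        raise ValueError(f"expected 6 segments, got {len(fields)}")
    # The first segment looks like "facial-expressions Surprise: Eerie, suspenseful";
    # keep only the text after the colon as the tone.
    tone = fields[0].split(":", 1)[-1].strip()
    return PromptParts(tone, *fields[1:])


example = (
    "facial-expressions Surprise: Eerie, suspenseful ; "
    "A lone figure walking down a deserted street; eye-level; Single Person; "
    "neon signs reflecting in puddles; cinematic"
)
parts = parse_prompt(example)
print(parts.camera)
```

Pulling the camera segment out this way makes it easy to see which position each prompt requested, whether eye-level or close-up, before judging the resulting shot.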
Superman, Guardian of the Night
A dramatic shot of Superman standing tall on a rooftop, bathed in the glow of the city lights. The image captures his heroic presence and the power he commands, leaving viewers in awe of the Man of Steel.
Prompt
facial-expressions Surprise: Triumphant, awe-inspiring ; A superhero standing on a rooftop, looking out over the city; eye-level; Hero; cityscape at night, with flashing lights and sirens in the distance; cinematic
Characteristic
Shot : Superman standing on a rooftop overlooking a cityscape at night
Aesthetic Score : 0.6
Mood : heroic, dramatic, powerful
Quality
Entropy : 6.73
Noise : 74
Prompt Clip Score : 0.25
AI Evaluation
Likelihood of AI : 0.70
Image errors : There are some minor artifacts in the image, particularly around the edges of the cape and the city lights. The image appears to be slightly over-sharpened, which results in a slightly artificial look.
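Each image lists an Entropy value under Quality, but the post does not say how it is computed. One common choice, assumed here, is the Shannon entropy of the 8-bit grayscale histogram, which ranges from 0 to 8 bits and would be consistent with values such as 6.01 and 6.73. The function below is a minimal sketch under that assumption and takes a flat sequence of pixel intensities rather than an image file.

```python
import math
from collections import Counter


def shannon_entropy(pixels):
    """Shannon entropy in bits of a sequence of integer pixel intensities.

    Assumption: this mirrors a histogram-based entropy measure; the post
    does not document the method it actually used.
    """
    total = len(pixels)
    if total == 0:
        return 0.0
    counts = Counter(pixels)
    # Sum -p * log2(p) over every intensity value that occurs.
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


flat = [128] * 1024            # a single flat gray tone
uniform = list(range(256)) * 4  # every 8-bit intensity equally often
print(shannon_entropy(flat), shannon_entropy(uniform))
```

Under this definition a flat image scores 0 and an image that uses all 256 intensities equally scores 8, so higher values indicate a richer tonal distribution.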
A Family Dinner, But Something Feels Off
A warm, dimly lit kitchen scene reveals a family gathered for dinner. The adults’ gaze towards the camera and the children’s focus on their parents create a palpable tension. The quiet atmosphere and lack of eye contact suggest an unspoken discomfort, leaving the viewer to wonder what secrets lie beneath the surface.
Prompt
facial-expressions Surprise: Innocent, unsettling ; A family having dinner together, unaware of the approaching danger; eye-level; Normal People; cozy kitchen, warm lighting; cinematic
Characteristic
Shot : A family is sitting around a dinner table in a warm, dimly lit kitchen. The table is set with plates of food, glasses of wine, and a lit candle. The family members are looking at each other and talking, creating a sense of intimacy and connection.
Aesthetic Score : 0.7
Mood : intimate, warm, subdued
Quality
Entropy : 6.72
Noise : 75
Prompt Clip Score : 0.31
AI Evaluation
Likelihood of AI : 0.10
Image errors : No visible errors or artifacts in the image.
Lost in the Code: A Moment of Intense Focus
A young man, bathed in the glow of his computer screen, is completely absorbed in his work. The dimly lit room and the dramatic play of light and shadow emphasize his intense focus, creating a powerful image of dedication and technological immersion.
Prompt
facial-expressions Surprise: Intense, focused ; A gamer sitting in a dimly lit room, eyes glued to the screen; close-up; Gamer; glowing monitor, keyboard, and mouse; cinematic
Characteristic
Shot : A young man sits at a desk in a dark room, wearing headphones and typing on a keyboard. There are two computer monitors in the background.
Aesthetic Score : 0.6
Mood : focused, intense, serious
Quality
Entropy : 6.16
Noise : 61
Prompt Clip Score : 0.28
AI Evaluation
Likelihood of AI : 0.10
Image errors : No visible artifacts or errors.
Lost in the Crowd: Fear Grips a Woman in a Bustling Station
A woman stands amidst a sea of faces at a train station, her expression etched with fear. The blurry background adds to the sense of urgency and uncertainty, leaving the viewer wondering what she is running from.
Prompt
facial-expressions Surprise: Panic, frantic ; A woman standing in a crowded train station, suddenly realizing she’s lost her purse; eye-level; Single Person; bustling crowd, hurried footsteps; cinematic
Characteristic
Shot : A woman is standing in a crowded train station, looking startled, with a train in the background.
Aesthetic Score : 0.6
Mood : suspense, anxiety, tense
Quality
Entropy : 6.60
Noise : 74
Prompt Clip Score : 0.31
AI Evaluation
Likelihood of AI : 0.20
Image errors : There are no visible errors in the image.
City in Flames: A Post-Apocalyptic Vision
A haunting collage captures the chaos and destruction of a city consumed by fire. The intense flames and billowing smoke create a sense of urgency and danger, while the figures in the foreground highlight the devastating scale of the apocalypse.
Prompt
facial-expressions Surprise: Brave, heroic ; A hero emerging from a burning building, carrying a child; eye-level; Hero; smoke and flames, collapsing structure; cinematic
Characteristic
Shot : A montage of three images depicting a warzone. Buildings are burning and people are running for their lives. The focus is on the fire and the chaos of the war.
Aesthetic Score : 0.7
Mood : intense, chaotic, dramatic
Quality
Entropy : 6.85
Noise : 86
Prompt Clip Score : 0.29
AI Evaluation
Likelihood of AI : 0.10
Image errors : There is a slight blur on the left side of the first image, and some noise in the second and third images, especially in the darker areas.
A Moment of Shared Wonder
A group of friends, united in laughter and curiosity, share a picnic in a sun-drenched park. Their gaze is fixed on something unseen, creating a sense of playful anticipation and shared joy. The vibrant colors and relaxed atmosphere capture the essence of a perfect summer day.
Prompt
facial-expressions Surprise: Peaceful, ominous ; A group of friends enjoying a picnic in a park, unaware of the strange object falling from the sky; eye-level; Normal People; sunny day, green grass, blue sky; cinematic
Characteristic
Shot : A group of friends are enjoying a picnic in a park on a sunny day. They are sitting on a blanket and eating food. There are trees in the background.
Aesthetic Score : 0.7
Mood : happy, relaxed, friendly
Quality
Entropy : 6.75
Noise : 82
Prompt Clip Score : 0.28
AI Evaluation
Likelihood of AI : 0.10
Image errors : The image has a slight blur in the background. This could be due to the shallow depth of field or the movement of the subjects. The image is also slightly overexposed, which makes the colors look washed out.
Lost in the Code: A Young Man’s Intense Focus Under Dim Lights
A young man, headphones on, is completely absorbed in his work, typing furiously on a keyboard in a dimly lit room. The low lighting and close-up shot create a palpable sense of tension and suspense, highlighting the intensity of his focus.
Prompt
facial-expressions Surprise: Disbelief, frustration ; A gamer’s hands frantically moving across the keyboard, as a sudden glitch appears on the screen; close-up; Gamer; distorted screen, flashing lights; cinematic
Characteristic
Shot : A man wearing headphones is looking intently at a computer screen. He is typing on a keyboard. The scene is dimly lit.
Aesthetic Score : 0.5
Mood : intense, focused, serious
Quality
Entropy : 5.81
Noise : 66
Prompt Clip Score : 0.27
AI Evaluation
Likelihood of AI : 0.20
Image errors : There are no visible artifacts or errors in the image.
A Shadow in the Woods: Man Encounters Monstrous Creature
A hiker stumbles upon a chilling sight in the heart of the forest. A monstrous creature, adorned with antlers and covered in moss, lurks unseen behind him. The composition evokes a sense of unease and anticipation, leaving the viewer wondering what fate awaits the unsuspecting man.
Prompt
facial-expressions Surprise: Mystical, awe-inspiring ; A man walking through a forest, suddenly finding himself face-to-face with a mythical creature; eye-level; Single Person; dense foliage, dappled sunlight; cinematic
Characteristic
Shot : A man is walking through a forest, unaware of a large, monstrous deer creature standing behind him.
Aesthetic Score : 0.6
Mood : eerie, mysterious, suspenseful
Quality
Entropy : 6.83
Noise : 92
Prompt Clip Score : 0.30
AI Evaluation
Likelihood of AI : 0.80
Image errors : The monster’s antlers and face appear slightly unnatural, and the lighting is somewhat inconsistent.
A Soldier’s Gaze into the Heart of War
A lone soldier stands amidst the devastation of a war-torn landscape, his serious expression reflecting the tense and dramatic atmosphere. Flames and smoke billow in the background, creating a sense of danger and chaos, while rubble litters the ground, a stark reminder of the destruction wrought by conflict.
Prompt
facial-expressions Surprise: Melancholy, reflective ; A hero standing on a battlefield, surrounded by fallen enemies, realizing the true cost of victory; eye-level; Hero; smoke and debris, wounded soldiers; cinematic
Characteristic
Shot : A soldier in a tattered uniform stands in a war-torn landscape, with smoke and flames in the background.
Aesthetic Score : 0.7
Mood : dramatic, somber, gritty
Quality
Entropy : 6.82
Noise : 74
Prompt Clip Score : 0.28
AI Evaluation
Likelihood of AI : 0.30
Image errors : Slight noise in the background and some artifacts in the smoke.
Conclusion
The analysis shows that the generative AI model performed well on shot composition and aesthetic style, but struggled with camera position. Here’s a breakdown:
- Camera Position: The model scored 0.15, which is considered below average. This suggests that the model didn’t accurately capture the intended camera position described in the prompt.
- Shot Analysis: The model scored 0.52, which is considered good. This indicates that the model was able to understand and translate the scene description in the prompt into a visually coherent shot.
- Aesthetic Analysis: The model scored 0.12, which is considered very good. This means that the generated images closely matched the expected aesthetic style, despite the issues with camera position.
Overall, the model demonstrates a good understanding of shot composition but needs improvement in accurately capturing the intended camera position. The model’s ability to achieve the desired aesthetic style is a positive sign.
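As a rough cross-check, the per-image figures listed earlier can be averaged. The sketch below copies the Aesthetic Score, Prompt Clip Score, and Likelihood of AI values from the ten images in order of appearance; the variable and function names are illustrative. These averages are separate from the Camera Position, Shot Analysis, and Aesthetic Analysis scores in the breakdown, whose computation the post does not describe.

```python
from statistics import mean

# Values copied from the ten image entries, in order of appearance.
aesthetic = [0.8, 0.6, 0.7, 0.6, 0.6, 0.7, 0.7, 0.5, 0.6, 0.7]
clip = [0.31, 0.25, 0.31, 0.28, 0.31, 0.29, 0.28, 0.27, 0.30, 0.28]
ai_likelihood = [0.20, 0.70, 0.10, 0.10, 0.20, 0.10, 0.10, 0.20, 0.80, 0.30]


def summarize(values):
    """Return the rounded mean plus the minimum and maximum of a metric."""
    return {"mean": round(mean(values), 3), "min": min(values), "max": max(values)}


summary = {
    "aesthetic": summarize(aesthetic),
    "clip": summarize(clip),
    "ai_likelihood": summarize(ai_likelihood),
}
for name, stats in summary.items():
    print(name, stats)
```

For these values the means come out to 0.65 for Aesthetic Score, 0.288 for Prompt Clip Score, and 0.28 for Likelihood of AI, with the Superman and forest-creature images standing out at 0.70 and 0.80 on the last metric.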