Stable Diffusion Captures the Essence of Emotion, But Struggles with Camera Angles
The ability to generate realistic and expressive facial expressions is a crucial aspect of AI-generated imagery. This study explores how well a generative AI model captures the nuances of human emotion through facial expressions. The model is impressively good at capturing the essence of an emotion and achieving the desired aesthetic, but it struggles to reproduce the camera position described in the prompts. This suggests that it does not yet fully understand the relationship between camera angle and the resulting perspective in an image. This blog post walks through the generated images and their scores, exploring the strengths and weaknesses of the model and what they imply for the future of AI-generated imagery.
Created with: stability-ai-core
Autumn Melancholy
A woman sits on a park bench, surrounded by fallen leaves, lost in thought. The vibrant autumn colors and her wistful expression evoke a sense of quiet contemplation and perhaps a touch of sadness.
Prompt
facial-expressions Attentiveness: Melancholy, yet observant ; A lone figure sitting on a park bench; eye-level; Single Person; bustling city park in the background; cinematic
Characteristic
Shot : A woman is sitting on a bench in a park, surrounded by autumn leaves. She is looking off to the side, and appears to be lost in thought.
Aesthetic Score : 0.7
Mood : melancholy, contemplative, wistful
Quality
Entropy : 6.78
Noise : 74
Prompt Clip Score : 0.26
AI Evaluation
Likelihood of AI : 0.20
Image errors : No noticeable errors
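Every prompt in this post follows the same semicolon-separated layout: an expression cue, a scene description, a camera position, a subject type, a background, and a style keyword. As a rough illustration, the sketch below splits a prompt into those six parts. The field names and the parse_prompt helper are labels assumed here for readability; they are not part of any Stability AI API or of the tooling used for this post.

```python
from typing import NamedTuple


class PromptParts(NamedTuple):
    # Field names are assumptions chosen for this example.
    expression: str
    scene: str
    camera: str
    subject: str
    background: str
    style: str


def parse_prompt(prompt: str) -> PromptParts:
    """Split a semicolon-separated prompt into its six parts."""
    parts = [part.strip() for part in prompt.split(";")]
    if len(parts) != 6:
        raise ValueError(f"expected 6 parts, got {len(parts)}")
    return PromptParts(*parts)


example = (
    "facial-expressions Attentiveness: Melancholy, yet observant ; "
    "A lone figure sitting on a park bench; eye-level; Single Person; "
    "bustling city park in the background; cinematic"
)
parsed = parse_prompt(example)
print(parsed.camera)   # the camera position requested by the prompt
print(parsed.subject)  # the subject type requested by the prompt
```

Having the camera position as a separate field makes it easier to compare what was requested with what the shot description reports.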
Superman: A Silhouette of Power
A dramatic image of Superman standing on a rooftop at dusk, his cape billowing in the wind. The lighting and pose create a sense of heroism and power, capturing the essence of the iconic superhero.
Prompt
facial-expressions Attentiveness: Determined, vigilant ; A superhero standing on a rooftop, looking out over the city; eye-level; Hero; cityscape with twinkling lights; cinematic
Characteristic
Shot : A man dressed as Superman stands on a rooftop overlooking a city at dusk.
Aesthetic Score : 0.7
Mood : heroic, dramatic, powerful
Quality
Entropy : 6.87
Noise : 74
Prompt Clip Score : 0.26
AI Evaluation
Likelihood of AI : 0.60
Image errors : There are some minor artifacts in the image, particularly in the background city skyline. The subject’s costume appears to be somewhat plastic and unrealistic.
Lost in Thought: A Moment of Contemplation on the Train
A young woman finds solace in a book, her pensive gaze fixed on the passing scenery. The blurred background emphasizes her isolation and introspective mood, creating a sense of calm and quiet reflection.
Prompt
facial-expressions Attentiveness: Focused, absorbed ; A woman reading a book on a train; eye-level; Normal Person; blurred passengers and train windows; cinematic
Characteristic
Shot : A woman wearing glasses sits on a train and reads a book. There are other passengers in the background.
Aesthetic Score : 0.7
Mood : calm, contemplative, thoughtful
Quality
Entropy : 6.72
Noise : 70
Prompt Clip Score : 0.35
AI Evaluation
Likelihood of AI : 0.20
Image errors : There are no significant image errors, but some minor details, like the woman’s glasses and the book’s pages, could be a bit sharper.
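At 0.35, this image has the highest Prompt Clip Score in the set. The post does not say how that score is computed; a common approach, assumed here, is the cosine similarity between a CLIP-style embedding of the prompt and an embedding of the image. The sketch below shows only that final similarity step on made-up toy vectors. Real embeddings would come from a CLIP model, which is outside the scope of this example.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# Toy stand-ins for a prompt embedding and an image embedding (hypothetical values).
text_embedding = [0.2, 0.7, 0.1]
image_embedding = [0.1, 0.9, 0.3]

score = cosine_similarity(text_embedding, image_embedding)
print(round(score, 2))
```

Under this assumption, a higher score means the image sits closer to the prompt in the embedding space, which is why it is read as a measure of prompt adherence.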
Focused Intensity: A Gamer’s Dedication
A young man, lost in the digital world, sits at his desk with unwavering focus. The dimly lit room and his intense gaze speak volumes about his dedication to the task at hand. This image captures the essence of a gamer’s concentration, highlighting the serious and immersive nature of their passion.
Prompt
facial-expressions Attentiveness: Thrilled, competitive ; A gamer intensely focused on a screen, fingers flying across the keyboard; close-up; Gamer; dimly lit room with glowing monitor; cinematic
Characteristic
Shot : A young man in a dark room with a headset, sitting in front of a computer, typing on a keyboard. The room is lit with blue and green lighting.
Aesthetic Score : 0.7
Mood : serious, focused, intense
Quality
Entropy : 5.94
Noise : 60
Prompt Clip Score : 0.26
AI Evaluation
Likelihood of AI : 0.20
Image errors : There is a slight blur on the background monitors and the image quality is slightly grainy, making it appear as if the image has been digitally altered.
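The Entropy of 5.94 is the lowest in this set, which would be consistent with a dark scene that uses a narrow range of tones. The post does not define the metric; one common definition, assumed here, is the Shannon entropy of the 8-bit grayscale histogram, which ranges from 0 bits (a single flat tone) to 8 bits (all 256 levels used equally). The shannon_entropy helper below is a hypothetical illustration on plain lists of pixel values, not the tool that produced the numbers in this post.

```python
import math
from collections import Counter


def shannon_entropy(pixels):
    """Shannon entropy, in bits, of a sequence of pixel intensities."""
    total = len(pixels)
    if total == 0:
        raise ValueError("need at least one pixel")
    counts = Counter(pixels)
    # Sum -p * log2(p) over every intensity level that actually occurs.
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


flat = [128] * 1000          # one tone only
two_tone = [0, 255] * 500    # two tones, equally frequent
full_range = list(range(256)) * 4  # every 8-bit level, equally frequent

for name, data in [("flat", flat), ("two_tone", two_tone), ("full_range", full_range)]:
    print(name, round(shannon_entropy(data), 2))
```

On this reading, the values between roughly 6 and 7 reported throughout the post would indicate images that use most, but not all, of the available tonal range.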
Lost in the City’s Pulse
A solitary figure navigates the bustling urban landscape, his serious gaze and the blurred background highlighting a moment of introspection amidst the city’s relentless energy.
Prompt
facial-expressions Attentiveness: Lost in thought, introspective ; A man walking down a crowded street, seemingly oblivious to the chaos around him; eye-level; Single Person; bustling city street with people and traffic; cinematic
Characteristic
Shot : A man is walking down a busy city street, looking straight ahead with a focused expression. The people around him are blurred, giving the impression of movement and anonymity.
Aesthetic Score : 0.7
Mood : serious, urban, contemplative
Quality
Entropy : 6.78
Noise : 75
Prompt Clip Score : 0.28
AI Evaluation
Likelihood of AI : 0.20
Image errors : No notable image errors. There may be slight noise in the background due to blurring.
Heroic Stand Amidst Chaos
A lone warrior, bathed in the glow of explosions, stands defiant against a backdrop of smoke and destruction. His stoic gaze and the dramatic lighting create a powerful image of courage and resilience in the face of overwhelming odds.
Prompt
facial-expressions Attentiveness: Brave, fearless ; A hero standing in the middle of a battle, eyes locked on the enemy; eye-level; Hero; chaotic battlefield with explosions and smoke; cinematic
Characteristic
Shot : A lone warrior stands in the foreground, facing the camera. He is in the middle of a battlefield with fire and smoke in the background. He is clad in dark armor with gold accents. Several other armored warriors are visible in the background, as well as some burning debris.
Aesthetic Score : 0.7
Mood : intense, dramatic, epic
Quality
Entropy : 6.77
Noise : 78
Prompt Clip Score : 0.25
AI Evaluation
Likelihood of AI : 0.80
Image errors : The image has some minor artifacts, particularly in the smoke and fire areas. The lighting is slightly uneven, with some areas appearing overexposed or underexposed.
Generations United: A Tender Moment of Connection
In a heartwarming scene, a grandmother and her two young granddaughters share a tender moment in the comfort of their living room. The grandmother’s gentle touch and the girls’ curious expressions create an intimate atmosphere, while the mysterious lighting adds a touch of drama to the family scene.
Prompt
facial-expressions Attentiveness: Curious, engaged ; A young girl listening intently to her grandmother tell a story; eye-level; Normal Person; cozy living room with warm lighting; cinematic
Characteristic
Shot : An elderly woman sitting next to two young girls. It seems like a story about three generations, the grandmother, the mother, and the daughter.
Aesthetic Score : 0.7
Mood : warm, intimate, contemplative
Quality
Entropy : 6.64
Noise : 76
Prompt Clip Score : 0.31
AI Evaluation
Likelihood of AI : 0.20
Image errors : The image is slightly blurry and the lighting is uneven. The colors are also a bit muted.
Pure Joy: Capturing the Excitement of the Game
This photo perfectly encapsulates the thrill of watching a game with friends. The wide smile and focused gaze of the main subject, set against the blur of the cheering crowd, speak volumes about the energy and excitement of the moment. It’s a snapshot of pure joy and shared passion.
Prompt
facial-expressions Attentiveness: Joyful, triumphant ; A gamer celebrating a victory, eyes wide with excitement; close-up; Gamer; brightly lit room with cheering friends; cinematic
Characteristic
Shot : A group of young men wearing headphones are celebrating a victory. The focus is on the man in the foreground, who has a wide smile.
Aesthetic Score : 0.7
Mood : joyful, excited, celebratory
Quality
Entropy : 6.72
Noise : 70
Prompt Clip Score : 0.32
AI Evaluation
Likelihood of AI : 0.20
Image errors : No visible errors.
Lost in Thought: A Moment of Contemplation in a Cozy Cafe
A woman finds solace in a bustling cafe, her thoughtful gaze fixed on the world outside. The warm lighting and gentle atmosphere create a sense of intimacy and introspection, capturing a moment of quiet contemplation.
Prompt
facial-expressions Attentiveness: Observant, introspective ; A woman sitting alone in a cafe, observing the people around her; eye-level; Single Person; bustling cafe with tables and chairs; cinematic
Characteristic
Shot : A woman sits alone at a cafe table, lost in thought, with a cup of coffee in front of her. The cafe is dimly lit, with a large window that looks out onto a busy street.
Aesthetic Score : 0.7
Mood : pensive, contemplative, relaxed
Quality
Entropy : 6.65
Noise : 69
Prompt Clip Score : 0.30
AI Evaluation
Likelihood of AI : 0.10
Image errors : There are no visible artifacts or errors in the image.
Solitude on the Mountaintop: A Dreamy Landscape of Serenity
A lone figure stands silhouetted against a breathtaking vista of a winding river, distant snow-capped peaks, and a sky filled with fluffy clouds. The scene evokes a sense of serene contemplation and vastness, with the figure’s isolation highlighting the scale of the natural world.
Prompt
facial-expressions Attentiveness: Reflective, contemplative ; A hero standing on a cliff, looking out at the vast landscape; eye-level; Hero; dramatic mountain range with clouds and sunlight; cinematic
Characteristic
Shot : A lone figure stands on a mountain peak, gazing out at a vast valley with a river winding through it. The sky is filled with dramatic clouds, and the mountains in the distance are shrouded in a soft haze.
Aesthetic Score : 0.8
Mood : tranquil, contemplative, majestic
Quality
Entropy : 6.79
Noise : 77
Prompt Clip Score : 0.29
AI Evaluation
Likelihood of AI : 0.10
Image errors : None
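To put the ten results side by side, the sketch below averages the reported metrics and counts the requested camera positions. The numbers are copied from the image sections above; the shortened titles, tuple layout, and variable names are conventions chosen for this example only.

```python
from collections import Counter
from statistics import mean

# (short title, requested camera, aesthetic, entropy, noise, clip score, AI likelihood)
RESULTS = [
    ("Autumn Melancholy", "eye-level", 0.7, 6.78, 74, 0.26, 0.20),
    ("Superman", "eye-level", 0.7, 6.87, 74, 0.26, 0.60),
    ("Train", "eye-level", 0.7, 6.72, 70, 0.35, 0.20),
    ("Focused Intensity", "close-up", 0.7, 5.94, 60, 0.26, 0.20),
    ("City Pulse", "eye-level", 0.7, 6.78, 75, 0.28, 0.20),
    ("Heroic Stand", "eye-level", 0.7, 6.77, 78, 0.25, 0.80),
    ("Generations United", "eye-level", 0.7, 6.64, 76, 0.31, 0.20),
    ("Pure Joy", "close-up", 0.7, 6.72, 70, 0.32, 0.20),
    ("Cozy Cafe", "eye-level", 0.7, 6.65, 69, 0.30, 0.10),
    ("Mountaintop", "eye-level", 0.8, 6.79, 77, 0.29, 0.10),
]

# Column index of each metric inside a RESULTS row.
FIELDS = {"aesthetic": 2, "entropy": 3, "noise": 4, "clip_score": 5, "ai_likelihood": 6}

averages = {name: mean(row[i] for row in RESULTS) for name, i in FIELDS.items()}
camera_counts = Counter(row[1] for row in RESULTS)
most_ai_like = max(RESULTS, key=lambda row: row[6])

for name, value in averages.items():
    print(f"{name}: {value:.3f}")
print("Requested camera positions:", dict(camera_counts))
print("Highest likelihood of AI:", most_ai_like[0])
```

Note that eight of the ten prompts request an eye-level shot and only two request a close-up, so the set covers a narrow range of camera positions.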
Conclusion
The results show that the generative AI model performed well in understanding the scene and achieving the desired aesthetic, but struggled with camera position. Here’s a breakdown:
- Camera Position: The model scored 0.1, indicating a very low ability to accurately represent the camera position described in the prompt. This suggests the model may not be very good at understanding and implementing camera angles.
- Shot Analysis: The model scored 0.53, which is considered good. This means the model was able to understand the scene described in the prompt and create an image that reflects it reasonably well.
- Aesthetic Analysis: The model scored 0.09, which is considered very good. This indicates that the generated image closely matched the expected aesthetic style.
Overall, the model seems to be better at understanding the scene and achieving the desired aesthetic than it is at accurately representing the camera position.