AI's Facial Expressions: A Step Towards Realism, But Still Room for Growth with Stable Diffusion
Facial expressions are a powerful tool in storytelling, conveying emotions and adding depth to characters. In the realm of AI-generated imagery, capturing these expressions accurately is crucial for creating compelling and engaging visuals. This analysis explores the performance of a generative AI model in generating images with specific facial expressions, highlighting its strengths and weaknesses in understanding and translating complex prompts.
Created with: stability-ai-core
Lost in the Neon Maze
A solitary figure navigates the bustling city streets at night, bathed in the vibrant glow of neon signs. The shallow depth of field isolates the man, creating a sense of mystery and intrigue. His expression hints at a hidden story, leaving you wondering what secrets lie within the urban labyrinth.
Prompt
facial-expressions Confusion: Disoriented, overwhelmed ; A lone figure; eye-level; Single Person; a bustling city street with neon signs and crowds; cinematic
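The prompts in this analysis all appear to follow the same semicolon-separated layout: an expression label, a subject, a camera position, a subject type, a setting, and a style. As a minimal sketch, assuming that six-field layout holds for every prompt, they could be split into named parts like this. The `ExpressionPrompt` class and `parse_prompt` function are hypothetical names used for illustration, not part of any Stability AI tooling.

```python
from dataclasses import dataclass


@dataclass
class ExpressionPrompt:
    category: str        # e.g. "Confusion"
    expressions: list    # e.g. ["Disoriented", "overwhelmed"]
    subject: str         # e.g. "A lone figure"
    camera: str          # e.g. "eye-level"
    subject_type: str    # e.g. "Single Person"
    setting: str         # scene description
    style: str           # e.g. "cinematic"


def parse_prompt(prompt: str) -> ExpressionPrompt:
    """Split a prompt into its six ';'-separated fields (assumed layout)."""
    fields = [part.strip() for part in prompt.split(";")]
    if len(fields) != 6:
        raise ValueError(f"expected 6 ';'-separated fields, got {len(fields)}")
    head, subject, camera, subject_type, setting, style = fields
    # The first field looks like "facial-expressions Confusion: Disoriented, overwhelmed".
    label, _, expression_list = head.partition(":")
    category = label.split()[-1]
    expressions = [e.strip() for e in expression_list.split(",") if e.strip()]
    return ExpressionPrompt(
        category, expressions, subject, camera, subject_type, setting, style
    )


if __name__ == "__main__":
    example = (
        "facial-expressions Confusion: Disoriented, overwhelmed ; A lone figure; "
        "eye-level; Single Person; a bustling city street with neon signs and "
        "crowds; cinematic"
    )
    print(parse_prompt(example))
```

Splitting the prompt this way makes it easier to compare the requested expression and camera position against the shot description reported for each image.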
Characteristic
Shot : A man is standing in the middle of a city street surrounded by many neon signs, with a lot of light pollution.
Aesthetic Score : 0.7
Mood : mysterious, urban, lonely
Quality
Entropy : 6.46
Noise : 67
Prompt Clip Score : 0.26
AI Evaluation
Likelihood of AI : 0.20
Image errors : The image has some noise, particularly in the shadows. The neon signs are also a bit overexposed, which makes them lose some detail.
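The article does not say how the Entropy figure is computed. One common definition, and the one assumed in this sketch, is the Shannon entropy of the image's 8-bit grayscale histogram, which ranges from 0 (a flat, single-tone image) to 8 bits (every gray level equally likely). The reported values of roughly 6 to 7 would be consistent with that scale, but this is an assumption rather than the author's confirmed method. The function name `shannon_entropy` is illustrative.

```python
import math
from collections import Counter


def shannon_entropy(pixels):
    """Return the Shannon entropy, in bits, of a list of pixel values."""
    total = len(pixels)
    if total == 0:
        raise ValueError("pixels must not be empty")
    counts = Counter(pixels)
    # H = -sum(p * log2(p)) over the observed pixel values.
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


if __name__ == "__main__":
    flat = [128] * 1000          # one gray level only
    spread = list(range(256))    # every 8-bit gray level exactly once
    print(shannon_entropy(flat))
    print(shannon_entropy(spread))
```

Under this definition, a higher value means the tones are spread more evenly across the available range, which tends to go with busier, more detailed scenes.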
Superman Rises from the Ashes
A gritty and realistic depiction of Superman standing amidst a destroyed city, his determined gaze promising hope in the face of devastation. The contrast between his powerful physique and the ruined cityscape creates a strong sense of drama and intensity.
Prompt
facial-expressions Confusion: Doubt, uncertainty ; A superhero in a tattered costume; eye-level; Hero; a destroyed cityscape with smoke and debris; cinematic
Characteristic
Shot : A superhero, possibly Superman, is walking through a destroyed city, looking determined and with a slight grimace on his face.
Aesthetic Score : 0.7
Mood : dramatic, gritty, somber
Quality
Entropy : 6.84
Noise : 84
Prompt Clip Score : 0.29
AI Evaluation
Likelihood of AI : 0.60
Image errors : No significant image errors. The lighting could be slightly improved.
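The Prompt Clip Score is not defined in the article. A CLIP-style score is commonly the cosine similarity between a text embedding of the prompt and an image embedding of the result, and the sketch below assumes that reading. The vectors here are toy values chosen for illustration; real embeddings would come from a CLIP model and have hundreds of dimensions.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("vectors must be non-zero")
    return dot / (norm_a * norm_b)


if __name__ == "__main__":
    # Hypothetical stand-ins for a prompt embedding and an image embedding.
    text_embedding = [3.0, 4.0]
    image_embedding = [4.0, 3.0]
    print(cosine_similarity(text_embedding, image_embedding))
```

If the scores in this analysis are raw cosine similarities, values between about 0.25 and 0.31 would indicate that each image is related to its prompt without matching it closely.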
Corporate Tensions Rise: What’s the Big Secret?
A sense of unease hangs in the air as a group of corporate professionals gather, their gazes fixed on something unseen. The mood is serious, the atmosphere tense, and the question remains: what is the source of this palpable tension?
Prompt
facial-expressions Confusion: Lost, unmoored ; A woman in a business suit; eye-level; Normal People; a sterile office with fluorescent lights and cubicles; cinematic
Characteristic
Shot : The image portrays a series of scenes within an office setting, featuring individuals in professional attire, likely involved in a high-stakes situation. The visual style leans towards a dramatic and suspenseful aesthetic.
Aesthetic Score : 0.7
Mood : tense, professional, serious
Quality
Entropy : 6.77
Noise : 67
Prompt Clip Score : 0.29
AI Evaluation
Likelihood of AI : 0.20
Image errors : No significant errors were detected in the image. The lighting and color balance are consistent across the scenes, and the image appears to be of high quality.
In the Zone: A Gamer’s Focus Under Low Light
A young man, headphones on, sits before a wall of computer monitors, his expression intense and focused. The low lighting adds a dramatic edge, highlighting his concentration as he navigates the digital world.
Prompt
facial-expressions Confusion: Frustration, bewilderment ; A gamer with headphones on; close-up; Gamer; a dimly lit room with a computer screen displaying a complex game interface; cinematic
Characteristic
Shot : A young man wearing headphones is sitting in front of a computer screen in a dimly lit room. He appears to be focused on something on the screen.
Aesthetic Score : 0.6
Mood : focused, serious, concentrated
Quality
Entropy : 6.11
Noise : 66
Prompt Clip Score : 0.25
AI Evaluation
Likelihood of AI : 0.20
Image errors : There are no visible artifacts or errors in the image.
Lost in the Fog: A Man’s Shadowy Journey
A solitary figure, cloaked in a trench coat, stands amidst the swirling fog of a narrow alleyway. The dim glow of streetlamps casts long, eerie shadows, adding to the atmosphere of mystery and suspense. This brooding scene evokes a sense of intrigue, leaving the viewer to wonder about the man’s secrets and the path he is destined to take.
Prompt
facial-expressions Confusion: Suspicious, wary ; A man in a trench coat; eye-level; Single Person; a foggy alleyway with flickering streetlights; cinematic
Characteristic
Shot : A man in a trench coat stands in a foggy, cobblestone alleyway. The light from streetlamps casts long shadows.
Aesthetic Score : 0.7
Mood : mysterious, moody, atmospheric
Quality
Entropy : 6.63
Noise : 71
Prompt Clip Score : 0.31
AI Evaluation
Likelihood of AI : 0.10
Image errors : The image appears to have some noise and artifacting, especially in the shadows.
The Antlered Knight: A Mystery in the Woods
A shadowy figure in medieval armor, adorned with antlers, stands amidst a blurred forest. The image evokes a sense of mystery and suspense, leaving the viewer to ponder the knight’s purpose and the secrets hidden within the woods.
Prompt
facial-expressions Confusion: Disillusioned, lost ; A knight in shining armor; eye-level; Hero; a dark forest with twisted trees and ominous shadows; cinematic
Characteristic
Shot : A collage of nine images of a man in a knight’s armor and helmet, set in a dark forest with tall trees
Aesthetic Score : 0.6
Mood : dark, mysterious, fantasy
Quality
Entropy : 6.66
Noise : 84
Prompt Clip Score : 0.27
AI Evaluation
Likelihood of AI : 0.70
Image errors : The images appear to have been digitally manipulated, with some imperfections in the edges and seams, and some slight blurring.
Tension at the Table: A Moment of Uncomfortable Truth
A group of people gather around a kitchen table, their faces etched with tension. The warm lighting and remnants of a meal create a stark contrast to the palpable unease in the air. The composition, with characters tightly clustered, amplifies the feeling of claustrophobia and suspense, leaving the viewer wondering what secrets lie beneath the surface.
Prompt
facial-expressions Confusion: Awkward, uncomfortable ; A family at a dinner table; eye-level; Normal People; a brightly lit kitchen with mismatched plates and silverware; cinematic
Characteristic
Shot : A group of people are sitting around a table eating dinner. There is a tense atmosphere in the room, and it appears as if they are having a difficult conversation. The lighting is warm and inviting, but the composition is not very dynamic.
Aesthetic Score : 0.6
Mood : tense, serious, somber
Quality
Entropy : 6.79
Noise : 77
Prompt Clip Score : 0.25
AI Evaluation
Likelihood of AI : 0.10
Image errors : The image is slightly overexposed in some areas, and the colors are a little bit muted.
Gamer’s Shock: Caught in the Heat of the Game
A young man’s face is etched with surprise and intensity as he plays video games, surrounded by multiple screens. The scene captures the thrill and focus of a gamer fully immersed in the digital world.
Prompt
facial-expressions Confusion: Overwhelmed, disoriented ; A gamer holding a controller; close-up; Gamer; a brightly lit room with a TV screen displaying a chaotic game scene; cinematic
Characteristic
Shot : A young man wearing a headset and holding a gaming controller, sitting in a dimly lit room with multiple screens behind him.
Aesthetic Score : 0.6
Mood : intense, focused, surprised
Quality
Entropy : 6.53
Noise : 62
Prompt Clip Score : 0.28
AI Evaluation
Likelihood of AI : 0.20
Image errors : No visible artifacts or errors in the image.
Lost in the City: A Woman’s Solitary Stroll
A woman walks through a bustling city street, her face shrouded in mystery as the shallow depth of field isolates her from the surrounding crowd. The urban landscape and pensive mood create a sense of intrigue, leaving you wondering about her story.
Prompt
facial-expressions Confusion: Lost, alienated ; A woman walking down a crowded street; eye-level; Single Person; a bustling city street with people rushing past; cinematic
Characteristic
Shot : A woman in a trench coat is walking down a busy city street; the city background is blurred and out of focus.
Aesthetic Score : 0.7
Mood : mysterious, urban, cool
Quality
Entropy : 6.79
Noise : 77
Prompt Clip Score : 0.26
AI Evaluation
Likelihood of AI : 0.20
Image errors : No significant errors
City Lights, City Hope: A Superhero Stands Watch
A lone figure, silhouetted against the moonlit cityscape, embodies hope and heroism. This dramatic image captures the essence of a superhero’s unwavering commitment to protecting the city below.
Prompt
facial-expressions Confusion: Doubt, questioning ; A superhero standing on a rooftop; eye-level; Hero; a cityscape with twinkling lights and a full moon; cinematic
Characteristic
Shot : A superhero, possibly Superman, stands on a rooftop overlooking a city at night, the full moon illuminating the scene.
Aesthetic Score : 0.6
Mood : dramatic, heroic, powerful
Quality
Entropy : 6.70
Noise : 70
Prompt Clip Score : 0.29
AI Evaluation
Likelihood of AI : 0.70
Image errors : The image exhibits some noticeable artifacts, particularly in the background cityscape, suggesting some level of digital manipulation.
Conclusion
The results show that the generative AI model matched the intended aesthetic well but struggled with scene understanding and camera position. Here’s a breakdown:
- Camera Position: The model scored 0.25, which is below average. This suggests that the generated images did not accurately capture the camera position described in the prompts.
- Shot Analysis: The model scored 0.45, which is also below average. This indicates that the model did not fully understand the scenes described in the prompts, and the resulting images do not accurately reflect them.
- Aesthetic Analysis: The model scored 0.1, which is considered very good. This means that the generated images closely matched the expected aesthetic style, despite the issues with camera position and scene understanding.
Overall, the model seems to be better at capturing the desired aesthetic than understanding the scene and camera position. This suggests that the model might need further training to improve its ability to interpret and translate complex prompts into visually accurate images.
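The per-image figures listed above can also be summarized directly. The sketch below is not the scoring method behind the three conclusion scores, which the article does not describe. It simply averages the reported Aesthetic Score, Prompt Clip Score, and Likelihood of AI values, and groups Likelihood of AI by the subject type named in each prompt. The tuple layout and function names are illustrative choices.

```python
from collections import defaultdict
from statistics import mean

# (title, subject type from the prompt, aesthetic score,
#  prompt CLIP score, likelihood of AI), copied from the entries above.
RESULTS = [
    ("Lost in the Neon Maze", "Single Person", 0.7, 0.26, 0.20),
    ("Superman Rises from the Ashes", "Hero", 0.7, 0.29, 0.60),
    ("Corporate Tensions Rise", "Normal People", 0.7, 0.29, 0.20),
    ("In the Zone", "Gamer", 0.6, 0.25, 0.20),
    ("Lost in the Fog", "Single Person", 0.7, 0.31, 0.10),
    ("The Antlered Knight", "Hero", 0.6, 0.27, 0.70),
    ("Tension at the Table", "Normal People", 0.6, 0.25, 0.10),
    ("Gamer's Shock", "Gamer", 0.6, 0.28, 0.20),
    ("Lost in the City", "Single Person", 0.7, 0.26, 0.20),
    ("City Lights, City Hope", "Hero", 0.6, 0.29, 0.70),
]


def overall_means(results):
    """Average each numeric column over all images."""
    return {
        "aesthetic": mean(r[2] for r in results),
        "clip": mean(r[3] for r in results),
        "ai_likelihood": mean(r[4] for r in results),
    }


def ai_likelihood_by_subject(results):
    """Average the Likelihood of AI value per subject type."""
    groups = defaultdict(list)
    for _title, subject_type, _aesthetic, _clip, ai in results:
        groups[subject_type].append(ai)
    return {subject: mean(values) for subject, values in groups.items()}


if __name__ == "__main__":
    print(overall_means(RESULTS))
    print(ai_likelihood_by_subject(RESULTS))
```

Run on these figures, the mean aesthetic score is 0.65, and the three Hero images average roughly 0.67 for Likelihood of AI, against 0.15 to 0.20 for the other subject types.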
Sources:
- https://dramaresource.com/storytelling/
- https://seedsoftellers.eu/resources/the-body-language-for-young-tellers/
- https://digitalcollections.sit.edu/cgi/viewcontent.cgi?article=1288&context=sandanona&filename=1&type=additional
- https://citeseerx.ist.psu.edu/document?doi=7f842882e9bb1fa2c0e96939bc8d2c37e34e17c0&repid=rep1&type=pdf
- https://www.twinkl.co.uk/search?q=drama+facial+expression
- https://stability.ai