AI's Artistic Eye: Capturing Emotion, Missing the Shot with Stability-ai-ultra
Facial expressions are a powerful tool in storytelling, conveying emotions and intentions without words. In the realm of generative AI, the ability to create images with nuanced facial expressions is a crucial step towards realistic and engaging content. This blog post delves into the performance of a generative AI model in capturing facial expressions, analyzing its strengths and weaknesses in various scenarios.
Created with: stability-ai-ultra
Lost in the Neon Rain
A solitary figure walks through a city drenched in rain, the vibrant glow of neon signs reflecting off the wet pavement. The scene evokes a sense of mystery and solitude, creating a visually striking and almost surreal atmosphere.
Prompt
facial-expressions Realization: Melancholy, introspective ; A lone figure; eye-level; Single Person; a bustling city street at night, with neon signs and rain reflecting on the wet pavement; cinematic
Characteristic
Shot : A lone figure walks down a wet, neon-lit street in a city, likely in Asia, in the rain. The cityscape is blurred and there are many signs in an unknown language.
Aesthetic Score : 0.8
Mood : melancholy, urban, mysterious
Quality
Entropy : 6.90
Noise : 92
Prompt Clip Score : 0.29
AI Evaluation
Likelihood of AI : 0.90
Image errors : The image appears to have some artifacts and blur, especially on the neon signs and the figure.
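The post does not say how the Entropy value is computed. One common definition is the Shannon entropy of the 8-bit grayscale pixel histogram, which ranges from 0 to 8 bits and is consistent with the values reported here (6.37 to 6.90). A minimal sketch under that assumption follows; the function name `shannon_entropy` is hypothetical and this is not necessarily the metric used for these images.

```python
import math
from collections import Counter


def shannon_entropy(pixels):
    """Shannon entropy, in bits, of a sequence of 8-bit grayscale values.

    ASSUMPTION: the post does not define its Entropy metric; this is one
    common definition, shown for illustration only.
    """
    total = len(pixels)
    if total == 0:
        return 0.0
    counts = Counter(pixels)
    # Sum -p * log2(p) over every gray level that actually occurs.
    return sum(-(n / total) * math.log2(n / total) for n in counts.values())


# A flat image carries no information; an image that uses all 256 gray
# levels equally often reaches the 8-bit maximum.
flat = [128] * 1024
uniform = list(range(256)) * 4
print(shannon_entropy(flat), shannon_entropy(uniform))
```

Under this definition, values close to 7 would indicate images that use most of the available tonal range.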
Superman Silhouetted Against a Hopeful Sunset
A powerful image captures Superman standing atop a skyscraper, bathed in the golden light of a setting sun. The dramatic lighting and his heroic pose evoke a sense of epic grandeur and hope for the future.
Prompt
facial-expressions Realization: Triumphant, awe-inspiring ; A superhero, standing atop a skyscraper; wide shot; Hero; a sprawling cityscape bathed in the golden light of sunset; cinematic
Characteristic
Shot : A superhero, likely Superman, stands on a rooftop overlooking a cityscape at sunset. The sun is setting in the distance, casting a warm glow over the city. The superhero’s back is to the viewer, and they are looking out at the city.
Aesthetic Score : 0.6
Mood : heroic, hopeful, dramatic
Quality
Entropy : 6.68
Noise : 90
Prompt Clip Score : 0.32
AI Evaluation
Likelihood of AI : 0.80
Image errors : The image appears to be AI-generated, with some blurriness and unrealistic details in the cityscape and the superhero. The lighting is also slightly off.
A Moment of Quiet Despair
A young woman sits alone at a messy kitchen table, her gaze fixed on the camera. Her floral dress and the scattered food scraps create a stark contrast to the melancholy that hangs in the air. The image evokes a sense of unease and loneliness, leaving the viewer to ponder the unspoken story behind her sad expression.
Prompt
facial-expressions Realization: Disillusioned, resigned ; A young woman, sitting at a kitchen table; close-up; Normal People; a cluttered kitchen, with dishes piled in the sink and a half-eaten meal on the table; cinematic
Characteristic
Shot : A young woman sits at a kitchen table with a plate of salad in front of her. The table is messy with food scraps and crumbs.
Aesthetic Score : 0.6
Mood : melancholy, thoughtful, somber
Quality
Entropy : 6.73
Noise : 78
Prompt Clip Score : 0.34
AI Evaluation
Likelihood of AI : 0.10
Image errors : No notable artifacts or errors.
Lost in the Game: A Gamer’s Intense Focus Under Neon Lights
A young man is completely absorbed in his video game, his face illuminated by the vibrant blue and red lighting of his dimly lit room. The intensity of his focus is palpable, creating a sense of excitement and immersion in the digital world.
Prompt
facial-expressions Realization: Intense, focused ; A gamer, hunched over a computer screen; close-up; Gamer; a dimly lit room, with flashing lights from the monitor and empty pizza boxes scattered around; cinematic
Characteristic
Shot : A young man is playing a game on his computer in a dimly lit room. He is wearing headphones and is focused on the screen.
Aesthetic Score : 0.6
Mood : intense, focused, dramatic
Quality
Entropy : 6.63
Noise : 72
Prompt Clip Score : 0.24
AI Evaluation
Likelihood of AI : 0.20
Image errors : The image is slightly blurry in the background.
Lost in the Crowd: A Man’s Solitary Gaze
A man stands alone in a bustling train station, his intense gaze piercing through the camera lens. The blurred background and low lighting amplify his sense of isolation, creating a mood of intensity, seriousness, and loneliness.
Prompt
facial-expressions Realization: Lost, alienated ; A man, walking through a crowded train station; eye-level; Single Person; a sea of faces, all rushing in different directions; cinematic
Characteristic
Shot : A man stands out in a crowd of people. He is looking directly at the camera with a serious expression.
Aesthetic Score : 0.7
Mood : serious, intense, mysterious
Quality
Entropy : 6.84
Noise : 81
Prompt Clip Score : 0.27
AI Evaluation
Likelihood of AI : 0.10
Image errors : No visible errors
Superman Rises from the Ashes in Epic Showdown
A dramatic scene unfolds as Superman stands tall amidst a city consumed by flames. The contrasting colors of the burning cityscape and his iconic costume create a powerful visual, highlighting the intensity of the moment and the hero’s unwavering resolve.
Prompt
facial-expressions Realization: Determined, resolute ; A superhero, standing in the middle of a battle; wide shot; Hero; a chaotic scene of destruction and explosions, with enemies closing in; cinematic
Characteristic
Shot : Superman stands in the midst of a fiery cityscape, with large explosions and rubble around him.
Aesthetic Score : 0.7
Mood : intense, heroic, dramatic
Quality
Entropy : 6.84
Noise : 99
Prompt Clip Score : 0.26
AI Evaluation
Likelihood of AI : 0.80
Image errors : Minor artifacts around the explosions and some blurriness on the background elements, but they don’t detract significantly from the overall image.
A Family’s Joyful Gathering Around a Candlelit Table
This heartwarming scene captures a family of four sharing a meal and laughter, bathed in warm light. The candle in the center adds a touch of intimacy, creating a sense of togetherness and comfort.
Prompt
facial-expressions Realization: Nostalgic, heartwarming ; A family, gathered around a dinner table; medium shot; Normal People; a warm and inviting kitchen, with the aroma of home-cooked food filling the air; cinematic
Characteristic
Shot : A family of four is enjoying a meal together at a table. The scene is warm and inviting. The lighting is soft and the table is set with food and a candle.
Aesthetic Score : 0.7
Mood : happy, warm, cozy
Quality
Entropy : 6.90
Noise : 83
Prompt Clip Score : 0.25
AI Evaluation
Likelihood of AI : 0.20
Image errors : No visible errors or artifacts in the image.
Lost in the Code: A Silhouette of Focus
A solitary figure, bathed in warm orange light, sits before a blank computer screen. Headphones on, they are lost in a world of code, their silhouette a testament to their intense focus. The dimly lit room adds to the sense of mystery and intrigue, leaving us to wonder what secrets lie within the digital realm.
Prompt
facial-expressions Realization: Defeated, frustrated ; A gamer, staring at a blank screen; close-up; Gamer; a dimly lit room, with the only light coming from the monitor, which is now displaying a game over message; cinematic
Characteristic
Shot : A young man sits at a desk facing a computer monitor in a dimly lit room. He is wearing headphones and a grey t-shirt. The screen is blank. The image has a low angle view.
Aesthetic Score : 0.4
Mood : dark, focused, solitary
Quality
Entropy : 6.37
Noise : 66
Prompt Clip Score : 0.24
AI Evaluation
Likelihood of AI : 0.20
Image errors : No significant errors
Silhouetted Solitude at Sunset
A lone figure stands on a cliff, silhouetted against a breathtaking sunset. The vibrant hues of the sky and the golden glow on the water create a serene and contemplative mood, emphasizing the majesty of nature and the solitude of the moment.
Prompt
facial-expressions Realization: Reflective, contemplative ; A woman, standing on a cliff overlooking the ocean; eye-level; Single Person; a vast expanse of blue water stretching out to the horizon, with the sun setting in the distance; cinematic
Characteristic
Shot : A solitary figure stands on a rocky cliff overlooking a vast ocean at sunset.
Aesthetic Score : 0.75
Mood : serene, peaceful, contemplative
Quality
Entropy : 6.67
Noise : 84
Prompt Clip Score : 0.30
AI Evaluation
Likelihood of AI : 0.20
Image errors : The image appears to be slightly overexposed, particularly in the sky. There may be some minor noise in the darker areas, but it’s not significant.
Superman: Hope Amidst the Ruins
As the sun sets on a ravaged city, Superman stands tall, a beacon of hope amidst the destruction. The dramatic backdrop of smoke, debris, and the fading light evokes a sense of impending doom, yet Superman’s presence inspires a glimmer of optimism for the future.
Prompt
facial-expressions Realization: Hopeful, determined ; A superhero, standing in the ruins of a city; wide shot; Hero; a desolate landscape, with smoke rising from the rubble and the sun breaking through the clouds; cinematic
Characteristic
Shot : A man dressed as Superman stands in a destroyed city with rubble and smoke around him. The background is a sunset sky.
Aesthetic Score : 0.7
Mood : heroic, dramatic, powerful
Quality
Entropy : 6.86
Noise : 84
Prompt Clip Score : 0.27
AI Evaluation
Likelihood of AI : 0.20
Image errors : The image has a few minor artifacts, particularly in the smoke and the rubble. The color balance is slightly off, with the colors appearing a bit too saturated.
Conclusion
The results show that the generative AI model performed well on aesthetic analysis, but struggled with camera position and shot analysis.
Here’s a breakdown:
- Camera Position: The model scored 0.25, which is considered below average. This suggests that the model didn’t accurately capture the intended camera position described in the prompt.
- Shot Analysis: The model scored 0.435, which is also below average. This indicates that the model didn’t fully understand the scene described in the prompt and didn’t create an image that accurately reflected the intended shot.
- Aesthetic Analysis: The model scored 0.14, which is considered very good. This means that the generated images closely matched the expected aesthetic style, despite the issues with camera position and shot analysis.
Overall, the model seems to be better at capturing the desired aesthetic style than accurately interpreting the camera position and shot descriptions. This suggests that the model might need further training to improve its understanding of these aspects of image generation.
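As a quick cross-check, the per-image numbers listed for the ten images can be averaged directly. The sketch below copies the Aesthetic Score, Prompt Clip Score, and Likelihood of AI values from the post; the names `RESULTS` and `summarize`, the plain arithmetic mean, and the 0.5 cutoff for "likely AI" are assumptions made for illustration. It does not reproduce the camera position, shot analysis, or aesthetic analysis scores quoted in the conclusion, which the post does not define.

```python
from statistics import mean

# (title, aesthetic score, prompt CLIP score, likelihood of AI), as listed in the post
RESULTS = [
    ("Lost in the Neon Rain", 0.8, 0.29, 0.90),
    ("Superman Against a Hopeful Sunset", 0.6, 0.32, 0.80),
    ("A Moment of Quiet Despair", 0.6, 0.34, 0.10),
    ("Lost in the Game", 0.6, 0.24, 0.20),
    ("Lost in the Crowd", 0.7, 0.27, 0.10),
    ("Superman Rises from the Ashes", 0.7, 0.26, 0.80),
    ("A Family's Joyful Gathering", 0.7, 0.25, 0.20),
    ("Lost in the Code", 0.4, 0.24, 0.20),
    ("Silhouetted Solitude at Sunset", 0.75, 0.30, 0.20),
    ("Superman: Hope Amidst the Ruins", 0.7, 0.27, 0.20),
]


def summarize(results, ai_cutoff=0.5):
    """Average the per-image metrics and list images rated as likely AI.

    ASSUMPTION: a plain mean and a 0.5 cutoff; the post does not say how
    its own summary scores were aggregated.
    """
    return {
        "aesthetic": round(mean(r[1] for r in results), 3),
        "clip": round(mean(r[2] for r in results), 3),
        "ai_likelihood": round(mean(r[3] for r in results), 3),
        "likely_ai": [r[0] for r in results if r[3] >= ai_cutoff],
    }


summary = summarize(RESULTS)
print(summary)
```

With these inputs the mean Aesthetic Score is 0.655, the mean Prompt Clip Score is 0.278, and the mean Likelihood of AI is 0.37. Three images, the neon street scene and two of the Superman shots, are rated at 0.80 or higher.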