AI's Artistic Eye: Capturing Emotion, Missing the Shot with Stability-ai-ultra
Facial expressions are a powerful tool for conveying emotions and intentions in visual storytelling. They add depth and realism to characters and help viewers understand and engage with the narrative. Capturing these subtle nuances in AI-generated images, however, remains a challenge. This blog post presents a case study in which a generative AI model, stability-ai-ultra, was given ten prompts, each asking for a different shade of guilt, to test how well it captures facial expressions. The results highlight the model’s strengths and weaknesses, providing insights into the ongoing development of AI image generation.
Created with: stability-ai-ultra
Melancholy Streets: A Solitary Figure Walks Through the Rain
A lone figure traverses a rain-soaked city street at night, bathed in the warm glow of streetlights. The scene evokes a sense of melancholy and mystery, with the dramatic effect of isolation and contemplation.
Prompt
facial-expressions Guilt: Desolate, regretful ; A lone figure; eye-level; Single Person; Empty street at night, rain falling; cinematic
Characteristic
Shot : A lonely figure walks down a wet city street at night, with rain falling and the streetlights reflecting in the puddles.
Aesthetic Score : 0.7
Mood : gloomy, melancholic, mysterious
Quality
Entropy : 6.33
Noise : 101
Prompt Clip Score : 0.25
AI Evaluation
Likelihood of AI : 0.80
Image errors : The image is slightly pixelated and some details are blurred. The edges are also slightly blurry, which may indicate that the image was cropped.
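The post does not say how the Entropy figure is computed. One common reading is the Shannon entropy of the image's 8-bit grayscale histogram, measured in bits per pixel. The sketch below assumes that definition; the function name `shannon_entropy` and the flat pixel list are illustrative and not taken from the evaluation pipeline used here.

```python
import math
from collections import Counter


def shannon_entropy(pixels):
    """Shannon entropy (bits per pixel) of a flat sequence of grayscale values.

    Assumption: the post's "Entropy" is a histogram entropy of this kind.
    """
    total = len(pixels)
    if total == 0:
        raise ValueError("pixels must not be empty")
    # Count how often each gray level occurs, then apply H = -sum(p * log2(p)).
    counts = Counter(pixels)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


# Toy inputs, not real image data.
two_tone = [0, 255] * 8          # two equally likely gray levels
all_levels = list(range(256))    # every 8-bit level exactly once
print(shannon_entropy(two_tone))
print(shannon_entropy(all_levels))
```

Under this reading, an image using all 256 gray levels equally often reaches the 8-bit maximum of 8 bits, so the values of 5.87 to 6.91 reported in this post would sit in the upper part of that scale.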
Superman: A City Awaits
A powerful and heroic image of Superman standing tall in a cityscape, his cape billowing behind him. The scene evokes a sense of drama and anticipation, hinting at the challenges that lie ahead.
Prompt
facial-expressions Guilt: Heavy, burdened, conflicted ; A superhero, cape billowing in the wind; medium shot; Hero; City skyline, destroyed buildings in the background; cinematic
Characteristic
Shot : Superman stands on a rooftop looking out over a city skyline.
Aesthetic Score : 0.7
Mood : heroic, dramatic, powerful
Quality
Entropy : 6.71
Noise : 89
Prompt Clip Score : 0.22
AI Evaluation
Likelihood of AI : 0.90
Image errors : The background appears to be slightly blurry and the lighting is a bit uneven.
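The Prompt Clip Score is not defined in the post either. A frequent convention is the cosine similarity between a CLIP text embedding of the prompt and a CLIP image embedding of the result. The sketch below shows only that similarity step, with made-up three-dimensional vectors standing in for real embeddings; `cosine_similarity` and both vectors are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("vectors must be non-zero")
    return dot / (norm_a * norm_b)


# Hypothetical stand-ins for a prompt embedding and an image embedding.
prompt_vec = [0.2, 0.9, 0.1]
image_vec = [0.8, 0.3, 0.5]
print(round(cosine_similarity(prompt_vec, image_vec), 2))
```

If the metric is indeed a raw cosine similarity, a higher value means the image sits closer to the prompt in embedding space, which is how the 0.20 to 0.29 figures in this post can be compared with one another.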
A Moment of Reflection: A Woman’s Silent Grief
A poignant image captures a woman in a yellow floral dress, her gaze fixed on a framed photograph. Her expression, a blend of sadness and contemplation, hints at a deep connection to the woman in the picture. The kitchen setting adds a touch of intimacy, while the overall mood evokes a sense of loss or longing.
Prompt
facial-expressions Guilt: Nostalgic, melancholic ; A woman holding a photo of a loved one; close-up; Normal Person; A cluttered kitchen, dishes piled in the sink; cinematic
Characteristic
Shot : A young woman is holding a framed photograph of an older woman, possibly her mother, in a kitchen setting. The background is a bit cluttered with kitchen items, but the focus is on the two women in the picture.
Aesthetic Score : 0.7
Mood : melancholic, sentimental, nostalgic
Quality
Entropy : 6.91
Noise : 87
Prompt Clip Score : 0.29
AI Evaluation
Likelihood of AI : 0.20
Image errors : There are no visible artifacts or errors in the image.
Lost in the Digital Realm
A young man, eyes glowing with digital light, is completely absorbed in the world unfolding on his computer screen. The neon-drenched room and his intense focus create a futuristic and captivating scene.
Prompt
facial-expressions Guilt: Isolated, self-loathing ; A gamer, hunched over a computer screen; close-up; Gamer; Neon lights reflecting in their eyes, empty pizza boxes scattered around; cinematic
Characteristic
Shot : A young man wearing a headset is playing a video game. He is lit from behind by neon pink and blue lights. His eye is glowing with a digital display. There are pizza boxes and snacks in the foreground.
Aesthetic Score : 0.6
Mood : cyberpunk, futuristic, intense
Quality
Entropy : 6.63
Noise : 76
Prompt Clip Score : 0.25
AI Evaluation
Likelihood of AI : 0.50
Image errors : The image is slightly blurry, but the digital elements are well rendered.
Lost in the Crowd, Found in the Moment
A man stands amidst the vibrant chaos of a party, his gaze fixed on something beyond the revelry. The warm glow of string lights casts a nostalgic hue, hinting at a moment of quiet reflection amidst the celebration.
Prompt
facial-expressions Guilt: Alienated, invisible ; A man standing in a crowded room, looking lost; wide shot; Single Person; A party, people laughing and dancing, oblivious to him; cinematic
Characteristic
Shot : A man is standing in a crowded room, looking away from the camera. There are other people in the background, blurred, but the man is the focus.
Aesthetic Score : 0.6
Mood : pensive, intimate, melancholic
Quality
Entropy : 6.28
Noise : 84
Prompt Clip Score : 0.22
AI Evaluation
Likelihood of AI : 0.10
Image errors : The image is slightly grainy, and there are some artifacts in the background.
Hero Stands Victorious Amidst the Ashes
A powerful superhero dominates the scene, standing over a defeated villain in a post-apocalyptic cityscape. Smoke and fire engulf the background, creating a dramatic and intense atmosphere. The image captures the hero’s triumph and the devastating consequences of the battle.
Prompt
facial-expressions Guilt: Torn, conflicted, remorseful ; A hero, standing over a fallen villain; medium shot; Hero; A battlefield, smoke and debris everywhere; cinematic
Characteristic
Shot : A superhero, possibly Superman, stands over a fallen villain in a post-apocalyptic cityscape. The background features flames and smoke, creating a sense of destruction and chaos.
Aesthetic Score : 0.6
Mood : intense, dramatic, heroic
Quality
Entropy : 6.86
Noise : 76
Prompt Clip Score : 0.24
AI Evaluation
Likelihood of AI : 0.10
Image errors : No noticeable image errors.
A Silent Dinner, Heavy with Tension
Four figures huddle around a dimly lit table, their expressions betraying a palpable unease. The low light and the weight of the silence amplify the awkwardness, leaving the viewer to wonder what secrets lie beneath the surface.
Prompt
facial-expressions Guilt: Awkward, strained, unspoken ; A family gathered around a table, but the atmosphere is tense; medium shot; Normal People; A dimly lit dining room, empty chairs at the table; cinematic
Characteristic
Shot : A group of people are sitting around a table in a dimly lit room. They appear to be having a conversation, but the mood is somber.
Aesthetic Score : 0.6
Mood : somber, tense, quiet
Quality
Entropy : 6.45
Noise : 82
Prompt Clip Score : 0.26
AI Evaluation
Likelihood of AI : 0.10
Image errors : The image is slightly grainy and the colors are a little bit washed out.
In the Zone: A Gamer’s Focus Illuminated
A young gamer, bathed in warm and cool lighting, sits intently at their computer desk, headphones on and controller in hand. The dramatic play of light and shadow emphasizes their laser-like focus on the game, capturing the intensity and seriousness of the moment.
Prompt
facial-expressions Guilt: Disillusioned, defeated, empty ; A gamer, staring at a blank screen, controller in hand; close-up; Gamer; A dimly lit room, empty energy drink cans scattered around; cinematic
Characteristic
Shot : A young man, wearing headphones, sits in front of a computer screen with a controller in his hand, gaming.
Aesthetic Score : 0.6
Mood : focused, intense, gaming
Quality
Entropy : 5.87
Noise : 69
Prompt Clip Score : 0.28
AI Evaluation
Likelihood of AI : 0.10
Image errors : The image has some slight noise in the background and a bit of blur at the edges.
Lost in the City’s Pulse
A solitary figure walks away from the camera, disappearing into the bustling urban landscape. The anonymity of the crowd and the vibrant neon glow create a sense of both isolation and intrigue. This image captures the contemplative mood of a city dweller navigating the urban jungle.
Prompt
facial-expressions Guilt: Lonely, isolated, rejected ; A woman walking away from a group of friends; long shot; Single Person; A bustling city street, people rushing by; cinematic
Characteristic
Shot : A woman walks down a busy city street with many people around her. The focus is on the woman, and the background is blurred.
Aesthetic Score : 0.6
Mood : city, urban, bustling
Quality
Entropy : 6.79
Noise : 85
Prompt Clip Score : 0.20
AI Evaluation
Likelihood of AI : 0.30
Image errors : There are some slight artifacts in the image, but they are not very noticeable. The image is a little bit noisy in the background.
Silhouetted Against the City, a Moment of Solitude
A lone figure stands on a rooftop, bathed in the ethereal glow of a full moon. The cityscape stretches out below, a sea of twinkling lights. The scene evokes a sense of quiet contemplation and urban loneliness, with the man’s silhouette adding a touch of mystery.
Prompt
facial-expressions Guilt: Reflective, contemplative, seeking redemption ; A hero, standing on a rooftop, looking out at the city; wide shot; Hero; A cityscape bathed in moonlight, a sense of peace; cinematic
Characteristic
Shot : A man stands on a rooftop overlooking a city skyline at night, with a full moon in the sky.
Aesthetic Score : 0.7
Mood : lonely, contemplative, urban
Quality
Entropy : 6.70
Noise : 67
Prompt Clip Score : 0.25
AI Evaluation
Likelihood of AI : 0.50
Image errors : The image appears to be a bit blurry and the details of the cityscape are not very clear. The moon also seems a bit too bright and unnatural.
Conclusion
The results show that the generative AI model performed well on aesthetic analysis but struggled with camera position and shot analysis.
Here’s a breakdown. The scoring scale is not stated, but the labels below imply that lower scores are better:
- Camera Position: The model scored 0.25, which is considered below average. This suggests that the model didn’t accurately capture the intended camera position described in the prompt.
- Shot Analysis: The model scored 0.47, which is also below average. This indicates that the model didn’t fully understand the scene described in the prompt and didn’t create an image that accurately reflected it.
- Aesthetic Analysis: The model scored 0.13, which is considered very good. This means that the generated images closely matched the aesthetic style described in the prompts.
Overall, the model seems to be better at capturing the desired aesthetic style than understanding the camera position and scene description.
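The per-image figures listed in this post can also be averaged to get a rough picture of the whole set. The sketch below copies the ten rows from the sections above and computes their means. The `ImageMetrics` type and `summarize` helper are illustrative names, and these averages are a separate summary: the post does not say how the three scores in the breakdown were derived, so they should not be read as coming from this calculation.

```python
from statistics import mean
from typing import NamedTuple


class ImageMetrics(NamedTuple):
    title: str
    aesthetic: float
    entropy: float
    noise: float
    clip: float
    ai_likelihood: float


# Values copied from the per-image sections of this post.
IMAGES = [
    ImageMetrics("Melancholy Streets", 0.7, 6.33, 101, 0.25, 0.80),
    ImageMetrics("Superman: A City Awaits", 0.7, 6.71, 89, 0.22, 0.90),
    ImageMetrics("A Moment of Reflection", 0.7, 6.91, 87, 0.29, 0.20),
    ImageMetrics("Lost in the Digital Realm", 0.6, 6.63, 76, 0.25, 0.50),
    ImageMetrics("Lost in the Crowd, Found in the Moment", 0.6, 6.28, 84, 0.22, 0.10),
    ImageMetrics("Hero Stands Victorious Amidst the Ashes", 0.6, 6.86, 76, 0.24, 0.10),
    ImageMetrics("A Silent Dinner, Heavy with Tension", 0.6, 6.45, 82, 0.26, 0.10),
    ImageMetrics("In the Zone", 0.6, 5.87, 69, 0.28, 0.10),
    ImageMetrics("Lost in the City’s Pulse", 0.6, 6.79, 85, 0.20, 0.30),
    ImageMetrics("Silhouetted Against the City", 0.7, 6.70, 67, 0.25, 0.50),
]


def summarize(rows):
    """Return the mean of each numeric metric, rounded to three decimals."""
    fields = ("aesthetic", "entropy", "noise", "clip", "ai_likelihood")
    return {f: round(mean(getattr(r, f) for r in rows), 3) for f in fields}


summary = summarize(IMAGES)
best_match = max(IMAGES, key=lambda r: r.clip)
for name, value in summary.items():
    print(f"{name}: {value}")
print("highest Prompt Clip Score:", best_match.title)
```

Keeping the rows in one typed list makes it easy to add further images or to rank them by any single metric, for example by Likelihood of AI instead of Prompt Clip Score.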