AI's Mixed Bag: Capturing Emotion in Images with Imagen-v3
The ability to convey emotion through facial expressions is a fundamental aspect of human communication. In AI-generated imagery, replicating these subtle nuances poses a significant challenge. This blog post examines a case study in which an AI model attempts to generate images with specific facial expressions, revealing both its successes and its limitations in capturing the complexities of human emotion. We’ll explore how the model interprets prompts, analyze its performance in capturing camera position, shot composition, and aesthetic, and discuss the implications for the future of AI-generated imagery.
Created with: imagen-v3
Lost in the City Lights
A solitary figure walks through the urban landscape, bathed in the soft glow of distant lights. The mood is melancholic, reflecting a sense of loneliness and introspection. The low-light conditions create a dramatic effect, highlighting the man’s downcast expression and the vastness of the city around him.
Prompt
facial-expressions Disappointment: Melancholy, isolation ; A lone figure; eye-level; Single Person; a bustling city street at night, with neon signs and blurred lights; cinematic
Characteristic
Shot : A man is walking in a city at night, the lights of the city are blurred in the background.
Aesthetic Score : 0.7
Mood : lonely, moody, melancholic
Quality
Entropy : 5.89
Noise : 64
Prompt Clip Score : 0.31
AI Evaluation
Likelihood of AI : 0.10
Image errors : No significant image errors.
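The prompts in this post appear to follow a semicolon-delimited template: a category tag and target emotion with its nuances, then a few descriptive fields, then a closing style keyword. The Python sketch below shows one way to split such a prompt. The names `ParsedPrompt` and `parse_prompt` are hypothetical, and reading the middle fields as subject, camera position, character type, and setting is an assumption inferred from the examples, not something the generation pipeline documents.

```python
from dataclasses import dataclass


@dataclass
class ParsedPrompt:
    category: str        # e.g. "facial-expressions"
    emotion: str         # e.g. "Disappointment"
    nuances: list[str]   # e.g. ["Melancholy", "isolation"]
    fields: list[str]    # middle fields, kept in their original order
    style: str           # closing keyword, e.g. "cinematic"


def parse_prompt(prompt: str) -> ParsedPrompt:
    """Split a semicolon-delimited prompt into its parts (assumed layout)."""
    parts = [p.strip() for p in prompt.split(";") if p.strip()]
    if len(parts) < 2:
        raise ValueError("expected at least a header and a style field")
    header, *middle, style = parts
    # Header looks like "<category> <emotion>: <nuance>, <nuance>"
    tag, _, nuance_text = header.partition(":")
    category, _, emotion = tag.strip().partition(" ")
    nuances = [n.strip() for n in nuance_text.split(",") if n.strip()]
    return ParsedPrompt(category, emotion.strip(), nuances, middle, style)


example = (
    "facial-expressions Disappointment: Melancholy, isolation ; A lone figure; "
    "eye-level; Single Person; a bustling city street at night, with neon signs "
    "and blurred lights; cinematic"
)
parsed = parse_prompt(example)
print(parsed.emotion)   # Disappointment
print(parsed.nuances)   # ['Melancholy', 'isolation']
print(parsed.fields)    # the four middle fields, in order
```

The middle fields stay in a plain list because not every prompt in the post has the same number of them, so a fixed mapping to named slots would fail on the shorter prompts.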
Superman: A Silhouette of Hope Against the Setting Sun
A powerful image captures Superman standing tall on a rooftop, silhouetted against a vibrant sunset. The dramatic lighting and the hero’s pose evoke a sense of hope and heroism, promising a brighter future for the city below.
Prompt
facial-expressions Disappointment: Defeated, disillusioned ; A superhero standing on a rooftop; eye-level; Hero; a cityscape bathed in the orange glow of a setting sun, with the hero’s cape billowing in the wind; cinematic
Characteristic
Shot : Superman standing on a rooftop overlooking a city skyline at sunset.
Aesthetic Score : 0.7
Mood : heroic, dramatic, hopeful
Quality
Entropy : 6.76
Noise : 76
Prompt Clip Score : 0.30
AI Evaluation
Likelihood of AI : 0.70
Image errors : The city skyline appears to be a bit blurry and unrealistic. The lighting is also a bit over-saturated.
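The post does not say how the Entropy value in each Quality block is computed. One common definition is the Shannon entropy of the pixel-intensity histogram, which for 8-bit grayscale data ranges from 0 (a flat image) to 8 bits (every intensity equally likely). The reported values of roughly 5.5 to 6.9 would be consistent with that range. The sketch below illustrates that definition under this assumption; `shannon_entropy` is a hypothetical helper, not the tool used for the post.

```python
import math
from collections import Counter


def shannon_entropy(pixels):
    """Shannon entropy, in bits, of a sequence of integer intensity values."""
    counts = Counter(pixels)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("need at least one pixel value")
    # H = sum over intensities of p * log2(1 / p)
    return sum((c / total) * math.log2(total / c) for c in counts.values())


flat = [128] * 1000        # a single gray level everywhere
ramp = list(range(256))    # every 8-bit level exactly once
print(shannon_entropy(flat))  # 0.0
print(shannon_entropy(ramp))  # 8.0
```

Under this definition, a higher value means the image uses a wider, more even spread of intensities. It does not by itself indicate higher quality.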
A Moment of Melancholy in the Kitchen
A woman sits alone at a cluttered kitchen table, her posture and expression conveying a sense of sadness and contemplation. The scene evokes a feeling of melancholy and isolation.
Prompt
facial-expressions Disappointment: Hopelessness, resignation ; A woman sitting at a kitchen table; eye-level; Normal Person; a cluttered kitchen with dirty dishes and a half-eaten meal; cinematic
Characteristic
Shot : A woman is sitting at a table in a kitchen. There are dirty dishes on the table and she is looking down, seemingly sad or contemplative.
Aesthetic Score : 0.4
Mood : melancholy, somber, contemplative
Quality
Entropy : 6.84
Noise : 88
Prompt Clip Score : 0.30
AI Evaluation
Likelihood of AI : 0.10
Image errors : The image seems to have a slight graininess, particularly in the shadows. This could be due to the lighting or the image capture process.
The Intensity of the Game: A Gamer’s Focused Expression
A close-up shot captures the intense focus of a young gamer, illuminated by dramatic red and blue lighting. His expression reveals the competitive spirit and dedication required to conquer the virtual world.
Prompt
facial-expressions Disappointment: Frustration, anger ; A gamer sitting in front of a computer screen; eye-level; Gamer; a dimly lit room with flashing lights and the glow of the monitor reflecting in their eyes; cinematic
Characteristic
Shot : A young man is playing video games, with a focused and intense expression on his face. He is wearing a black shirt with white logos and a headset with a microphone. The scene is lit with red and blue hues, creating a dramatic and atmospheric effect. The image is cropped at the torso, focusing on the gamer’s face and hands.
Aesthetic Score : 0.6
Mood : intense, focused, competitive
Quality
Entropy : 6.01
Noise : 72
Prompt Clip Score : 0.30
AI Evaluation
Likelihood of AI : 0.10
Image errors : The image is slightly blurry and the lighting is uneven. The gamer’s face is slightly overexposed.
Lost in the Shadows: A Man’s Solitary Walk Through the Night
A lone figure walks down a dimly lit street, swallowed by the darkness. The play of light and shadow creates a sense of mystery and isolation, leaving the viewer to wonder about the man’s journey and the secrets he carries.
Prompt
facial-expressions Disappointment: Loneliness, despair ; A man walking down a deserted street; eye-level; Single Person; a street lined with closed shops and flickering streetlights; cinematic
Characteristic
Shot : A man walking down a narrow street at night, lit by street lights.
Aesthetic Score : 0.6
Mood : dark, mysterious, lonely
Quality
Entropy : 5.92
Noise : 93
Prompt Clip Score : 0.30
AI Evaluation
Likelihood of AI : 0.10
Image errors : No significant errors
Heroic Stand Amidst the Ashes
A lone warrior stands defiant, a symbol of hope amidst the desolation of a post-apocalyptic city. Smoke and fire engulf the scene, highlighting the stark contrast between the warrior’s unwavering resolve and the fallen comrade at his feet. This powerful image captures the somber mood and heroic spirit of a world on the brink.
Prompt
facial-expressions Disappointment: Disappointment, regret ; A hero standing over a fallen villain; eye-level; Hero; a battlefield littered with debris and smoke, with the villain’s defeated form at the hero’s feet; cinematic
Characteristic
Shot : A lone warrior stands over a fallen comrade in a post-apocalyptic city, engulfed in smoke and fire.
Aesthetic Score : 0.7
Mood : desolate, somber, heroic
Quality
Entropy : 6.82
Noise : 75
Prompt Clip Score : 0.29
AI Evaluation
Likelihood of AI : 0.10
Image errors : No significant errors detected.
The Weight of Solitude
A solitary figure sits amidst a half-eaten meal, bathed in the dim glow of a single light. The scene evokes a sense of loneliness and contemplation, with the man’s posture and the empty plate hinting at a heavy heart.
Prompt
facial-expressions Disappointment: Loneliness, stagnation ; A lone figure sits at a dimly lit table, a half-eaten meal before them. The room is cluttered with unfinished projects, a testament to their solitude.; cinematic
Characteristic
Shot : A man sits alone at a table with a half-eaten meal and a glass of liquid. The room is dimly lit with a single overhead light. The man appears to be in a state of contemplation or perhaps melancholy. The mess on the table and the man’s expression contribute to a sense of loneliness and isolation.
Aesthetic Score : 0.5
Mood : gloomy, lonely, somber
Quality
Entropy : 5.54
Noise : 60
Prompt Clip Score : 0.32
AI Evaluation
Likelihood of AI : 0.20
Image errors : No visible image errors
In the Zone: Gamer’s Face Lit by the Glow of Victory (or Defeat)
A close-up shot captures the intensity of a young man engrossed in a video game. Blue and red light illuminate his face, highlighting his focused expression and creating a sense of suspense. Is he on the verge of triumph or facing a crushing defeat? The moment is electric.
Prompt
facial-expressions Disappointment: Defeat, frustration ; A gamer staring at a game over screen; eye-level; Gamer; a darkened room with the glow of the monitor reflecting in their eyes, showing a game over message; cinematic
Characteristic
Shot : A close-up shot of a young man’s face, illuminated by blue and red light, looking down at a computer screen, likely playing video games.
Aesthetic Score : 0.6
Mood : intense, focused, suspenseful
Quality
Entropy : 5.85
Noise : 56
Prompt Clip Score : 0.33
AI Evaluation
Likelihood of AI : 0.10
Image errors : No visible errors or artifacts
A Rainy Night, A Lonely Heart
A woman gazes out a window at a rain-soaked city, her expression filled with sadness. The melancholic atmosphere and her longing gaze create a poignant scene of loneliness and reflection.
Prompt
facial-expressions Disappointment: Sadness, longing ; A woman standing at a window; eye-level; Single Person; a rainy day with the city streets blurred in the background; cinematic
Characteristic
Shot : A woman is looking out a window at a rainy city night. Her expression is sad.
Aesthetic Score : 0.7
Mood : sad, melancholic, lonely
Quality
Entropy : 5.94
Noise : 73
Prompt Clip Score : 0.33
AI Evaluation
Likelihood of AI : 0.10
Image errors : No obvious errors, the image is slightly underexposed but this adds to the mood.
Silhouetted on the Summit: A Moment of Contemplation
A solitary figure stands on a mountain peak, their silhouette stark against the misty expanse below. The scene evokes a sense of mystery, contemplation, and serene isolation.
Prompt
facial-expressions Disappointment: Isolation, disillusionment ; A hero standing on a mountaintop; eye-level; Hero; a vast landscape stretching out before them, but with a sense of emptiness in the air; cinematic
Characteristic
Shot : A lone figure stands on a mountaintop, looking out at a vast, misty landscape.
Aesthetic Score : 0.7
Mood : mysterious, contemplative, serene
Quality
Entropy : 6.56
Noise : 101
Prompt Clip Score : 0.29
AI Evaluation
Likelihood of AI : 0.20
Image errors : No noticeable image errors.
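To compare the ten images at a glance, the per-image numbers reported above can be collected and averaged. The Python sketch below does only that. The values are copied from the cards in this post; the short labels and the `ImageResult` and `summarize` names are hypothetical shorthand introduced for illustration.

```python
from statistics import mean
from typing import NamedTuple


class ImageResult(NamedTuple):
    label: str            # shorthand for the image title (hypothetical)
    aesthetic: float      # Aesthetic Score
    entropy: float        # Entropy
    noise: int            # Noise
    clip: float           # Prompt Clip Score
    ai_likelihood: float  # Likelihood of AI


# Values copied from the ten image cards in this post.
RESULTS = [
    ImageResult("city lights", 0.7, 5.89, 64, 0.31, 0.10),
    ImageResult("superman sunset", 0.7, 6.76, 76, 0.30, 0.70),
    ImageResult("kitchen", 0.4, 6.84, 88, 0.30, 0.10),
    ImageResult("gamer focus", 0.6, 6.01, 72, 0.30, 0.10),
    ImageResult("solitary walk", 0.6, 5.92, 93, 0.30, 0.10),
    ImageResult("heroic stand", 0.7, 6.82, 75, 0.29, 0.10),
    ImageResult("weight of solitude", 0.5, 5.54, 60, 0.32, 0.20),
    ImageResult("gamer game over", 0.6, 5.85, 56, 0.33, 0.10),
    ImageResult("rainy night", 0.7, 5.94, 73, 0.33, 0.10),
    ImageResult("summit", 0.7, 6.56, 101, 0.29, 0.20),
]


def summarize(results):
    """Return the mean of each numeric metric across all images."""
    numeric = [f for f in ImageResult._fields if f != "label"]
    return {f: mean(getattr(r, f) for r in results) for f in numeric}


summary = summarize(RESULTS)
most_ai_like = max(RESULTS, key=lambda r: r.ai_likelihood)
for name, value in summary.items():
    print(f"{name}: {value:.3f}")
print("highest AI likelihood:", most_ai_like.label)
```

Averaged this way, the ten images have a mean Aesthetic Score of 0.62 and a mean Prompt Clip Score of about 0.31. The Superman image is the only one with an AI likelihood above 0.20.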
Conclusion
The analysis of the generated images reveals mixed results:
Camera Position: The model scores 0.15 on capturing the intended camera position, which this evaluation rates as fairly good: better than average, though not excellent. The model appears able to interpret the camera position described in the prompt and carry it into the image to some degree.
Shot Analysis: The model scores 0.46 on producing a shot that matches the prompt, which this evaluation rates as pretty good. The model grasps the overall scene and composes a shot that broadly aligns with the prompt’s description.
Aesthetic Analysis: The model scores -0.06 on achieving the desired aesthetic, which is below average. The generated images deviate from the expected aesthetic and can miss the intended visual style or mood. For example, the rooftop superhero prompt asked for “Defeated, disillusioned”, yet the resulting image was rated “heroic, dramatic, hopeful”.
Overall, the model shows some strengths in understanding the camera position and scene description, but struggles to achieve the intended aesthetic.