AI's Artistic Eye: Capturing Emotion in Images with Imagen-v2
Facial expressions are a powerful tool in storytelling, conveying emotions and adding depth to characters. Generative AI models are increasingly used to create realistic images, and a key part of that realism is depicting facial expressions accurately. This blog post examines how well one such model, Imagen-v2, captures facial expressions, and analyzes its strengths and weaknesses. We’ll explore how the model handles different scenes, camera positions, and aesthetic styles, to give a sense of the current state of AI-generated facial expressions.
Created with: imagen-v2
Lost in Thought, On the Edge of Something Big
A solitary figure walks through a city, his gaze fixed on something unseen. The blurred background emphasizes his isolation and the weight of his internal focus. A sense of mystery and anticipation hangs in the air, hinting at a pivotal moment about to unfold.
Prompt
facial-expressions Interest: Intrigued, observant ; A lone figure; eye-level; Single Person; bustling city street; cinematic
Characteristic
Shot : A man with a serious expression is walking on a city street. His face is in focus, and the background is blurred.
Aesthetic Score : 0.6
Mood : serious, contemplative, mysterious
Quality
Entropy : 6.84
Noise : 93
Prompt Clip Score : 0.20
AI Evaluation
Likelihood of AI : 0.20
Image errors : There are no visible image errors or artifacts.
Superman Faces the Flames: A City in Peril
A close-up of Superman’s resolute face, set against the backdrop of a burning city, captures the dramatic intensity of the moment. The burning cityscape evokes a sense of urgency and danger, while Superman’s determined expression suggests he is ready to face the challenge head-on.
Prompt
facial-expressions Interest: Focused, determined ; A superhero in a dramatic pose; medium shot; Hero; cityscape with a burning building in the background; cinematic
Characteristic
Shot : A close-up portrait of Superman, with a burning building in the background and a cityscape in the distance. The image is stylized and dramatic.
Aesthetic Score : 0.7
Mood : serious, heroic, determined
Quality
Entropy : 6.67
Noise : 68
Prompt Clip Score : 0.21
AI Evaluation
Likelihood of AI : 0.90
Image errors : There are some slight artifacts and errors in the image, particularly around the edges of Superman’s cape and the background city. The lighting is uneven and there is some color banding.
Lost in the Pages: A Moment of Serenity in a Cozy Cafe
A young woman finds peace and contemplation amidst the gentle hum of a cafe, her face illuminated by the warm glow of the setting sun. The intimate lighting and composition draw the viewer into her world, capturing a moment of quiet reflection.
Prompt
facial-expressions Interest: Engrossed, absorbed ; A woman reading a book in a coffee shop; eye-level; Normal People; warm, inviting cafe interior; cinematic
Characteristic
Shot : A young woman is sitting in a cafe, reading a book.
Aesthetic Score : 0.7
Mood : thoughtful, serene, introspective
Quality
Entropy : 6.62
Noise : 85
Prompt Clip Score : 0.24
AI Evaluation
Likelihood of AI : 0.10
Image errors : No significant artifacts or errors.
Lost in the Moment: A Young Man’s Intense Focus
A young man, bathed in dramatic blue and orange lighting, is completely absorbed in something just outside the frame. His headphones amplify the intensity of the moment, leaving us to wonder what captivating scene has captured his attention.
Prompt
facial-expressions Interest: Excited, concentrated ; A gamer intensely focused on a screen; close-up; Gamer; dimly lit room with glowing monitor; cinematic
Characteristic
Shot : A young man wearing headphones, lit by blue and orange light, sitting in a dimly lit room, possibly a gaming room.
Aesthetic Score : 0.7
Mood : intense, focused, dramatic
Quality
Entropy : 6.23
Noise : 82
Prompt Clip Score : 0.26
AI Evaluation
Likelihood of AI : 0.30
Image errors : Some artifacts are visible in the image, particularly around the edges of the subject’s hair and the headphones.
A Life Lived in Thought
A close-up portrait captures the contemplative gaze of an older man, his gray hair and wistful expression hinting at a lifetime of experiences and unspoken stories. The dramatic lighting adds a layer of intrigue, leaving the viewer to ponder the thoughts swirling within his mind.
Prompt
facial-expressions Interest: Contemplative, thoughtful ; A man gazing out a window at a stormy sky; eye-level; Single Person; dark, moody interior; cinematic
Characteristic
Shot : Close-up portrait of an elderly man with a contemplative expression, looking out of a window.
Aesthetic Score : 0.7
Mood : pensive, introspective, thoughtful
Quality
Entropy : 6.61
Noise : 80
Prompt Clip Score : 0.24
AI Evaluation
Likelihood of AI : 0.30
Image errors : The image exhibits some minor artifacting around the edges of the face, suggesting it may have been digitally enhanced.
Unveiling the Mystery: A Man in the Shadows
A captivating image of a man in a dark costume, his gaze piercing through the lens. The blurry cityscape and hazy sky create an atmosphere of intrigue and suspense, leaving you wondering about his secrets.
Prompt
facial-expressions Interest: Confident, determined ; A hero standing on a rooftop overlooking a city; wide shot; Hero; panoramic cityscape with dramatic lighting; cinematic
Characteristic
Shot : A man with piercing blue eyes stares intensely at the viewer in front of a city skyline at dusk. He is wearing a dark cape with a red lining and a large, silver button on his shoulder.
Aesthetic Score : 0.8
Mood : intense, mysterious, powerful
Quality
Entropy : 6.50
Noise : 59
Prompt Clip Score : 0.22
AI Evaluation
Likelihood of AI : 0.90
Image errors : The image is slightly blurry in areas, particularly in the background. The image appears somewhat artificial, likely due to AI generation.
Laughter and Light: Capturing the Joy of Togetherness
A heartwarming scene unfolds with three friends gathered around a table, sharing laughter and conversation. The warm lighting and casual atmosphere create a sense of intimacy and contentment, making this image a perfect representation of the joy of shared moments.
Prompt
facial-expressions Interest: Happy, engaged ; A group of friends laughing together at a dinner table; eye-level; Normal People; cozy, homey dining room; cinematic
Characteristic
Shot : Three friends, two women and a man, are sitting at a table in a warmly lit dining room. They are laughing and having a good time. There are plates of food and glasses of wine on the table.
Aesthetic Score : 0.7
Mood : joyful, warm, intimate
Quality
Entropy : 6.77
Noise : 88
Prompt Clip Score : 0.30
AI Evaluation
Likelihood of AI : 0.20
Image errors : The image appears to be slightly overexposed. There are some artifacts visible on the walls and on the subjects’ skin. Some blurriness is visible on the woman to the right.
Lost in the Code: A Neon-Lit Focus
A lone figure, bathed in the glow of a neon light, is completely absorbed in their work. The intensity of their focus is palpable, creating a sense of mystery and futuristic intrigue. This image captures the essence of dedication and the power of technology in a captivating way.
Prompt
facial-expressions Interest: Thrilled, focused ; A gamer’s hands rapidly moving across a keyboard and mouse; close-up; Gamer; brightly lit gaming setup with flashing lights; cinematic
Characteristic
Shot : A person in a red hoodie, wearing headphones, is looking intensely at a keyboard with glowing blue lights. The background is a blurred orange and yellow abstract pattern.
Aesthetic Score : 0.7
Mood : intense, focused, futuristic
Quality
Entropy : 5.94
Noise : 72
Prompt Clip Score : 0.25
AI Evaluation
Likelihood of AI : 0.30
Image errors : The image has some minor artifacts, particularly in the darker areas. The lighting appears slightly uneven, and the focus could be sharper.
Lost in Art: A Moment of Contemplation
A woman stands captivated before a painting in an art gallery, her gaze drawn upwards. The teal walls and gold frames create an elegant backdrop, while the composition evokes a sense of mystery and intrigue. Her expression suggests a moment of deep contemplation, inviting viewers to share in her experience.
Prompt
facial-expressions Interest: Appreciative, curious ; A woman looking at a painting in a museum; eye-level; Single Person; grand museum hall with intricate artwork; cinematic
Characteristic
Shot : A woman in a gallery, gazing upwards at a painting, in front of other paintings, with rich teal walls and gold-trimmed picture frames. The gallery is empty.
Aesthetic Score : 0.7
Mood : mysterious, contemplative, artful
Quality
Entropy : 6.76
Noise : 79
Prompt Clip Score : 0.25
AI Evaluation
Likelihood of AI : 0.20
Image errors : There is slight blurring around the edges of the picture frames and the woman’s hair.
Intense Gaze in the Shadows
A close-up portrait captures the serious expression of a man, his gaze piercing through the darkness. The dramatic lighting and close-up shot create a sense of mystery and intensity, leaving the viewer questioning the story behind his gaze.
Prompt
facial-expressions Interest: Intense, focused ; A hero facing off against a villain; medium shot; Hero; dramatic, action-packed scene with explosions and smoke; cinematic
Characteristic
Shot : A close-up portrait of a man with a determined expression, set against a blurred background of smoke and light.
Aesthetic Score : 0.8
Mood : intense, dramatic, mysterious
Quality
Entropy : 6.48
Noise : 84
Prompt Clip Score : 0.21
AI Evaluation
Likelihood of AI : 0.70
Image errors : The image has a slight amount of noise, and the lighting is a bit harsh. The background is blurred, but the blur is not very smooth.
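Looking across all ten images, it can help to see the per-image numbers side by side. The snippet below is a minimal sketch that averages each metric over the gallery. The values are copied from the entries above in order of appearance; the `METRICS` dictionary and the `summarize_metrics` helper are names made up for this illustration and are not part of the evaluation pipeline used for this post.

```python
from statistics import mean

# Per-image values copied from the ten entries above, in order of appearance.
# The dictionary layout is an assumption made for this illustration.
METRICS = {
    "Aesthetic Score": [0.6, 0.7, 0.7, 0.7, 0.7, 0.8, 0.7, 0.7, 0.7, 0.8],
    "Entropy": [6.84, 6.67, 6.62, 6.23, 6.61, 6.50, 6.77, 5.94, 6.76, 6.48],
    "Noise": [93, 68, 85, 82, 80, 59, 88, 72, 79, 84],
    "Prompt Clip Score": [0.20, 0.21, 0.24, 0.26, 0.24, 0.22, 0.30, 0.25, 0.25, 0.21],
    "Likelihood of AI": [0.20, 0.90, 0.10, 0.30, 0.30, 0.90, 0.20, 0.30, 0.20, 0.70],
}


def summarize_metrics(metrics):
    """Return the mean of each metric across all images."""
    return {name: mean(values) for name, values in metrics.items()}


if __name__ == "__main__":
    for name, value in summarize_metrics(METRICS).items():
        print(f"{name}: {value:.3f}")
```

These are plain averages of the numbers reported per image. They are not the same thing as the Camera Position, Shot Analysis, and Aesthetic Analysis scores discussed in the conclusion, whose calculation is not described in this post.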
Conclusion
The results show that the generative AI model performed well in understanding the scene and matching the expected aesthetic, but struggled with camera position. Here’s a breakdown:
- Camera Position: The model scored 0.35, which is below the “good” range of 0.5 to 0.75. This suggests that the model didn’t quite capture the intended camera position as described in the prompt.
- Shot Analysis: The model scored 0.64, which falls within the “good” range. This indicates that the model was able to understand the scene and create a shot that was relatively close to what was described in the prompt.
- Aesthetic Analysis: The model scored 0.1, which is within the “very good” range of -0.2 to 0.1. This means that the generated image’s aesthetic was very close to the expected aesthetic, despite the camera position and shot analysis not being perfect.
Overall, the model seems to be capable of understanding the scene and creating a visually appealing image, but it could benefit from improvements in accurately capturing the intended camera position.
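The comparison of each score against its target range can be written out as a small check. This is only a sketch under stated assumptions: the `Band` class and `check_scores` function are hypothetical names, the range bounds are treated as inclusive, and Shot Analysis is assumed to use the same 0.5 to 0.75 “good” range quoted for Camera Position. Only the two ranges mentioned above are encoded; any other bands the evaluation may use are not known from this post.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Band:
    """A named score range. Inclusive bounds are an assumption."""
    label: str
    low: float
    high: float

    def contains(self, score: float) -> bool:
        return self.low <= score <= self.high


# The only two ranges quoted in the breakdown above.
GOOD = Band("good", 0.5, 0.75)
VERY_GOOD_AESTHETIC = Band("very good", -0.2, 0.1)

# Score and target range for each criterion, as reported above.
# Using GOOD for Shot Analysis is an assumption.
RESULTS = {
    "Camera Position": (0.35, GOOD),
    "Shot Analysis": (0.64, GOOD),
    "Aesthetic Analysis": (0.1, VERY_GOOD_AESTHETIC),
}


def check_scores(results):
    """Map each criterion to True if its score falls inside its target range."""
    return {name: band.contains(score) for name, (score, band) in results.items()}


if __name__ == "__main__":
    for name, ok in check_scores(RESULTS).items():
        score, band = RESULTS[name]
        status = "within" if ok else "outside"
        print(f"{name}: {score} is {status} the '{band.label}' range "
              f"({band.low} to {band.high})")
```

With the numbers from this post, only Camera Position falls outside its target range, which matches the summary above.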