AI's Facial Expressions: A Step Forward, But Still Room for Growth with Imagen-v2
Facial expressions are a powerful tool in storytelling: they convey a wide range of emotions and add depth to characters. For generative AI, capturing these nuances is a distinct challenge. This blog post examines an experiment in which an AI model generated images from specific scene descriptions, with a focus on how well it depicts facial expressions. While the model shows promise in understanding scene composition, it struggles with the requested camera angles and with the desired aesthetic, particularly in facial expressions. We analyze the model’s strengths and weaknesses and discuss the potential for future improvements.
Created with: imagen-v2
Lost in the City Lights: A Moment of Wonder
A young woman gazes up at the dazzling cityscape, her expression a mix of surprise and awe. The blurred lights create a sense of mystery and intrigue, leaving the viewer to wonder what has captured her attention.
Prompt
facial-expressions Excitement: Thrilled, anticipation ; A lone figure; eye-level; Single Person; bustling city street at night; cinematic
Characteristic
Shot : A woman with long brown hair is standing in a city at night, looking up in awe. The scene is lit with warm and cool colors.
Aesthetic Score : 0.6
Mood : dreamy, magical, hopeful
Quality
Entropy : 6.47
Noise : 51
Prompt Clip Score : 0.24
AI Evaluation
Likelihood of AI : 0.80
Image errors : The woman’s face appears slightly distorted and the lighting is somewhat unnatural, especially around her hair.
Superman Soars Above the City at Sunset
A powerful image captures Superman in flight over a cityscape, bathed in the dramatic hues of sunset. His intense expression and the dramatic lighting create a sense of heroism and power.
Prompt
facial-expressions Excitement: Triumphant, exhilarating ; A superhero in mid-air; low-angle; Hero; cityscape with a dramatic sunset; cinematic
Characteristic
Shot : Superman flying over a city skyline at sunset.
Aesthetic Score : 0.7
Mood : heroic, dramatic, powerful
Quality
Entropy : 6.61
Noise : 60
Prompt Clip Score : 0.22
AI Evaluation
Likelihood of AI : 0.90
Image errors : The image has some minor artifacts, such as the slight blurring around Superman’s head and the overly-smooth skin textures. The city skyline also appears a little blurry and lacking in detail.
Sun-Kissed Laughter: Friends Embrace the Joy of Movement
Four young adults revel in the carefree spirit of a sunny day, their laughter echoing through a vibrant green field. The image captures the energy of their run, with blurred feet and outstretched arms painting a picture of pure joy and youthful exuberance.
Prompt
facial-expressions Excitement: Joyful, carefree ; A group of friends laughing and running; eye-level; Normal People; a sunny park with a vibrant green lawn; cinematic
Characteristic
Shot : Four young adults are running towards the camera in a grassy field. There is a sunset in the background. The image is composed from a low angle perspective.
Aesthetic Score : 0.6
Mood : joyful, carefree, playful
Quality
Entropy : 6.74
Noise : 107
Prompt Clip Score : 0.25
AI Evaluation
Likelihood of AI : 0.10
Image errors : The image has some graininess and the lighting is a bit uneven.
The Intensity of Focus
A close-up shot captures a man engrossed in his work, his face illuminated by the glow of the computer screen. The blurred background and dramatic lighting create a sense of suspense and isolation, drawing the viewer into the moment of intense concentration.
Prompt
facial-expressions Excitement: Intense, focused ; A gamer’s hands furiously tapping on a keyboard; close-up; Gamer; a dimly lit room with glowing screens; cinematic
Characteristic
Shot : A young man, wearing a headset, is intensely focused on a computer keyboard, lit with dramatic lighting.
Aesthetic Score : 0.6
Mood : intense, focused, competitive
Quality
Entropy : 6.16
Noise : 80
Prompt Clip Score : 0.28
AI Evaluation
Likelihood of AI : 0.80
Image errors : The edges of the keyboard and the hands appear slightly blurred and lacking in detail.
Hope Amidst the Storm
A woman stands silhouetted against a dramatic sunset, her gaze fixed on the swirling clouds above. The golden light illuminates her face, reflecting a sense of awe and wonder. This captivating image captures a moment of hope and resilience, even in the face of adversity.
Prompt
facial-expressions Excitement: Awe-inspiring, liberating ; A woman standing on a cliff overlooking a vast ocean; eye-level; Single Person; dramatic clouds and a setting sun; cinematic
Characteristic
Shot : A woman with blonde hair is staring up at the sky. The sky is a mix of cloudy grey and orange, and there is a body of water behind the woman.
Aesthetic Score : 0.7
Mood : dramatic, surprised, awe
Quality
Entropy : 6.68
Noise : 70
Prompt Clip Score : 0.28
AI Evaluation
Likelihood of AI : 0.80
Image errors : The hair seems too perfect, and there are some blur and sharpness inconsistencies.
Superman’s Last Stand: A Hero in Peril
A close-up shot captures Superman, his costume torn and battered, flying towards the viewer with a determined expression. The background is engulfed in smoke and fire, hinting at a fierce battle. The lighting, smoke, and Superman’s intense gaze create a sense of urgency and impending danger, leaving the viewer wondering if he will prevail.
Prompt
facial-expressions Excitement: Brave, adrenaline-fueled ; A hero charging into battle; low-angle; Hero; a chaotic battlefield with explosions and smoke; cinematic
Characteristic
Shot : A close-up shot of Superman running through a battle scene with smoke and fire in the background. He is looking determined and focused. The costume is slightly gritty and weathered, which is a visual storytelling device.
Aesthetic Score : 0.7
Mood : intense, dramatic, heroic
Quality
Entropy : 6.76
Noise : 56
Prompt Clip Score : 0.21
AI Evaluation
Likelihood of AI : 0.80
Image errors : The image appears to have some minor artifacts in the background, and the subject’s skin texture is slightly unnatural.
Birthday Bliss on the Rooftop
Four friends celebrate a birthday with laughter, balloons, and confetti on a rooftop, capturing the essence of carefree joy and celebration.
Prompt
facial-expressions Excitement: Happy, celebratory ; A group of friends celebrating a graduation; eye-level; Normal People; a brightly decorated rooftop with balloons and streamers; cinematic
Characteristic
Shot : Four friends are celebrating on a rooftop, throwing confetti and holding balloons. They are all smiling and laughing. The background is a cityscape.
Aesthetic Score : 0.7
Mood : joyful, carefree, celebratory
Quality
Entropy : 6.77
Noise : 103
Prompt Clip Score : 0.26
AI Evaluation
Likelihood of AI : 0.20
Image errors : No significant errors; slight compression artifacts are noticeable on the clothes.
Intense Gaze, Dramatic Lighting: A Portrait of Mystery
This close-up portrait captures a man’s face bathed in cool and warm lighting, creating a dramatic effect. His intense gaze and the play of light and shadow evoke a sense of tension and mystery, leaving the viewer captivated.
Prompt
facial-expressions Excitement: Engrossed, focused ; A gamer’s face illuminated by the screen; close-up; Gamer; a dark room with neon lights reflecting on the screen; cinematic
Characteristic
Shot : Close up portrait of a man with a serious expression, lit with blue and red light.
Aesthetic Score : 0.6
Mood : intense, dramatic, mysterious
Quality
Entropy : 6.17
Noise : 52
Prompt Clip Score : 0.24
AI Evaluation
Likelihood of AI : 0.80
Image errors : Some artifacts around the eyes and the jawline.
The Thrill of the Ride: A Close-Up Look at Pure Excitement
This intense close-up captures the raw emotion of a roller coaster ride. The man’s wide-eyed scream and the blurred background create a sense of immediacy, pulling you right into the heart of the action.
Prompt
facial-expressions Excitement: Thrilling, exhilarating ; A man riding a rollercoaster; POV shot; Single Person; a fast-paced ride with twists and turns; cinematic
Characteristic
Shot : A close-up shot of two people riding a roller coaster. The person in the foreground is looking directly at the camera with a surprised expression. The person in the background is partially visible and is also looking at the camera.
Aesthetic Score : 0.4
Mood : excitement, surprise, thrill
Quality
Entropy : 6.72
Noise : 86
Prompt Clip Score : 0.31
AI Evaluation
Likelihood of AI : 0.10
Image errors : There is some blur in the background of the image, which may be due to the movement of the roller coaster. There is also some noise in the image, which may be due to the low light conditions.
Iron Man Faces the Storm
A brooding Iron Man stands atop a skyscraper, the cityscape spread out before him. The sky is a canvas of swirling clouds, mirroring the intensity of the moment. His expression is grim, hinting at the danger that lies ahead.
Prompt
facial-expressions Excitement: Victorious, powerful ; A hero standing triumphantly on a rooftop; high-angle; Hero; a cityscape with a dramatic storm in the background; cinematic
Characteristic
Shot : A man in an Iron Man suit stands on a rooftop with a city behind him. He appears to be shouting or screaming. The sky has a stormy look to it.
Aesthetic Score : 0.6
Mood : dramatic, heroic, intense
Quality
Entropy : 6.65
Noise : 48
Prompt Clip Score : 0.23
AI Evaluation
Likelihood of AI : 0.80
Image errors : The image has a somewhat painted look, with visible brushstrokes, especially on the suit. The details of the suit are not very sharp.
Conclusion
The results show that the generative AI model performed well in understanding the scene, but struggled with the camera position and the aesthetic aspect. Here’s a breakdown:
- Camera Position: The model scored 0.22, which is below the “good” threshold of 0.5. This suggests that the model didn’t accurately capture the intended camera position described in the prompt.
- Shot Analysis: The model scored 0.52, which falls within the “good” range. This indicates that the model was able to understand the scene described in the prompt and create a shot that aligns with it.
- Aesthetic Analysis: The model scored 0.26, which falls outside the “very good” range of -0.2 to 0.1. This suggests that the generated images didn’t match the expected aesthetic style described in the prompts.
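The comparisons in the list above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Band` helper and the dictionary keys are hypothetical names, the 1.0 upper bound on the “good” bands is assumed, and the post only states the 0.5 threshold for camera position, so reusing it for shot analysis is a guess.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Band:
    """A labelled, inclusive score range (hypothetical helper)."""
    label: str
    low: float
    high: float

    def contains(self, score: float) -> bool:
        return self.low <= score <= self.high


# Bands quoted in the post. ASSUMPTIONS: the 1.0 upper bound on the "good"
# bands, and reusing the 0.5 camera threshold for shot analysis.
BANDS = {
    "camera_position": Band("good", 0.5, 1.0),
    "shot_analysis": Band("good", 0.5, 1.0),
    "aesthetic_analysis": Band("very good", -0.2, 0.1),
}

# Summary scores reported in the conclusion.
SCORES = {
    "camera_position": 0.22,
    "shot_analysis": 0.52,
    "aesthetic_analysis": 0.26,
}


def evaluate(scores: dict, bands: dict) -> dict:
    """Return, per metric, whether the score falls inside its band."""
    return {name: bands[name].contains(score) for name, score in scores.items()}


if __name__ == "__main__":
    for name, inside in evaluate(SCORES, BANDS).items():
        band = BANDS[name]
        status = "inside" if inside else "outside"
        print(f"{name}: {SCORES[name]:.2f} is {status} the "
              f"'{band.label}' band [{band.low}, {band.high}]")
```

Under these assumptions only the shot analysis score lands inside its band, which matches the breakdown above.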
Overall, the model shows promise in understanding the scene and shot composition, but needs improvement in matching the requested camera position and in capturing the desired aesthetic.
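For readers who want to check the per-image numbers, the following sketch collects the metrics listed for the ten images and averages them. The values are copied from the entries above; the `ImageMetrics` type and the function names are hypothetical and are not part of whatever tooling produced the scores in this post.

```python
from statistics import mean
from typing import NamedTuple


class ImageMetrics(NamedTuple):
    title: str
    aesthetic: float      # Aesthetic Score
    entropy: float        # Entropy
    noise: int            # Noise
    clip: float           # Prompt Clip Score
    ai_likelihood: float  # Likelihood of AI


# Values copied from the ten image entries in this post.
IMAGES = [
    ImageMetrics("Lost in the City Lights", 0.6, 6.47, 51, 0.24, 0.80),
    ImageMetrics("Superman Soars Above the City", 0.7, 6.61, 60, 0.22, 0.90),
    ImageMetrics("Sun-Kissed Laughter", 0.6, 6.74, 107, 0.25, 0.10),
    ImageMetrics("The Intensity of Focus", 0.6, 6.16, 80, 0.28, 0.80),
    ImageMetrics("Hope Amidst the Storm", 0.7, 6.68, 70, 0.28, 0.80),
    ImageMetrics("Superman's Last Stand", 0.7, 6.76, 56, 0.21, 0.80),
    ImageMetrics("Birthday Bliss on the Rooftop", 0.7, 6.77, 103, 0.26, 0.20),
    ImageMetrics("Intense Gaze, Dramatic Lighting", 0.6, 6.17, 52, 0.24, 0.80),
    ImageMetrics("The Thrill of the Ride", 0.4, 6.72, 86, 0.31, 0.10),
    ImageMetrics("Iron Man Faces the Storm", 0.6, 6.65, 48, 0.23, 0.80),
]

NUMERIC_FIELDS = ("aesthetic", "entropy", "noise", "clip", "ai_likelihood")


def summarize(images):
    """Average every numeric field across the given images."""
    return {f: mean(getattr(m, f) for m in images) for f in NUMERIC_FIELDS}


def mean_noise_by_ai_likelihood(images, cutoff=0.5):
    """Mean noise for images below and at/above an AI-likelihood cutoff.

    The 0.5 cutoff is an arbitrary choice for illustration. Both groups
    must be non-empty, otherwise statistics.mean raises StatisticsError.
    """
    low = [m.noise for m in images if m.ai_likelihood < cutoff]
    high = [m.noise for m in images if m.ai_likelihood >= cutoff]
    return mean(low), mean(high)


if __name__ == "__main__":
    for field, value in summarize(IMAGES).items():
        print(f"mean {field}: {value:.3f}")
    low, high = mean_noise_by_ai_likelihood(IMAGES)
    print(f"mean noise, AI likelihood < 0.5: {low:.1f}")
    print(f"mean noise, AI likelihood >= 0.5: {high:.1f}")
```

On the figures listed, the three images rated least likely to be AI-generated are also the three with the highest noise values, though ten images are too few to draw a firm conclusion.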
Sources:
- https://dramaresource.com/storytelling/
- https://seedsoftellers.eu/resources/the-body-language-for-young-tellers/
- https://digitalcollections.sit.edu/cgi/viewcontent.cgi?article=1288&context=sandanona&filename=1&type=additional
- https://citeseerx.ist.psu.edu/document?doi=7f842882e9bb1fa2c0e96939bc8d2c37e34e17c0&repid=rep1&type=pdf
- https://www.twinkl.co.uk/search?q=drama+facial+expression
- https://deepmind.google/technologies/imagen-2/