AI's Struggle with Facial Expressions: A Deep Dive into Generative Models with Imagen-v3-fast
Facial expressions are a powerful tool for conveying emotions and intentions. They play a crucial role in human communication, adding depth and nuance to our interactions. However, capturing these subtle expressions in AI-generated images remains a challenge. This blog post explores the limitations of generative AI models in understanding and replicating facial expressions, using ten images generated with imagen-v3-fast as a case study. For each image we list the prompt, a description of the resulting shot, and a set of quality and evaluation scores, then assess how well the model captured camera position, scene composition, and aesthetic style. Understanding these limitations gives a more realistic picture of how far AI can currently go in capturing the human experience through facial expressions.
Created with: imagen-v3-fast
Lost in the Golden Hour
A young man with long hair, shrouded in a dark scarf, stands amidst a desert landscape bathed in the warm glow of a setting sun. His serious gaze, captured in the dramatic golden hour light, evokes a sense of melancholy and contemplation.
Prompt
facial-expressions Curiosity: Melancholy, contemplative ; A lone figure, silhouetted against a setting sun; eye-level; Single Person; vast, empty desert landscape; cinematic
Characteristic
Shot : A young man with long hair, wearing a dark scarf, stands in a desert landscape during a golden hour sunset. He looks directly at the camera with a serious expression.
Aesthetic Score : 0.7
Mood : melancholy, contemplative, dramatic
Quality
Entropy : 6.58
Noise : 67
Prompt Clip Score : 0.30
AI Evaluation
Likelihood of AI : 0.20
Image errors : There are no significant errors in the image, but the subject’s hair appears somewhat unnatural in texture and lighting.
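The post does not say how the Entropy value in each Quality block is computed. One common reading, and purely an assumption here, is the Shannon entropy of the image's 8-bit grayscale histogram, which ranges from 0 bits (a single flat tone) to 8 bits (all 256 levels equally likely). Under that reading, values around 6.4 to 6.9 would indicate images with a broad tonal range. A minimal sketch on toy pixel lists (not the post's images):

```python
import math
from collections import Counter


def shannon_entropy(pixels):
    """Shannon entropy, in bits, of a flat sequence of 8-bit grayscale values."""
    counts = Counter(pixels)
    total = len(pixels)
    # Sum p * log2(1 / p) over every gray level that actually occurs.
    return sum((n / total) * math.log2(total / n) for n in counts.values())


# Toy inputs, chosen only to show the two extremes of the scale.
flat = [128] * 256          # one gray level everywhere
uniform = list(range(256))  # every gray level exactly once

print(shannon_entropy(flat))     # 0.0
print(shannon_entropy(uniform))  # 8.0
```

For a real image, the pixel list would come from a grayscale conversion of the file; that loading step is left out to keep the sketch dependency-free.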
Hope Rises Above the City Lights
A powerful superhero stands tall against the backdrop of a futuristic cityscape, their gaze fixed on the unknown. The dramatic pose and vibrant city lights create a sense of anticipation and hope, hinting at a thrilling story unfolding.
Prompt
facial-expressions Curiosity: Determined, hopeful ; A superhero, standing atop a skyscraper, looking out at the city; eye-level; Hero; bustling cityscape with neon lights; cinematic
Characteristic
Shot : A superhero standing in a futuristic city at night, looking up.
Aesthetic Score : 0.7
Mood : dramatic, hopeful, powerful
Quality
Entropy : 6.55
Noise : 57
Prompt Clip Score : 0.30
AI Evaluation
Likelihood of AI : 0.80
Image errors : No obvious artifacts or errors.
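Most prompts in this post follow a semicolon-separated pattern: an emotion header, a scene description, a camera position, a subject type, a setting, and a style. The sketch below splits a prompt into those parts so the requested camera position can be compared with the resulting shot. The field names (emotions, scene, camera, subject, setting, style) are labels assumed for illustration, not terms defined by the post or by the model, and prompts that omit the middle fields are handled by leaving them empty.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ParsedPrompt:
    # Hypothetical field names, inferred from the prompt layout in this post.
    emotions: List[str]
    scene: str
    camera: Optional[str]
    subject: Optional[str]
    setting: Optional[str]
    style: str


def parse_prompt(prompt: str) -> ParsedPrompt:
    parts = [p.strip() for p in prompt.split(";")]
    if len(parts) < 3:
        raise ValueError("expected at least emotions, scene, and style")
    head, scene, *middle, style = parts
    # The head looks like "facial-expressions Curiosity: Determined, hopeful".
    _, _, emotion_text = head.partition(":")
    emotions = [e.strip().lower() for e in emotion_text.split(",") if e.strip()]
    # Only the six-part layout carries camera, subject, and setting.
    camera, subject, setting = middle if len(middle) == 3 else (None, None, None)
    return ParsedPrompt(emotions, scene, camera, subject, setting, style)


example = ("facial-expressions Curiosity: Determined, hopeful ; A superhero, "
           "standing atop a skyscraper, looking out at the city; eye-level; Hero; "
           "bustling cityscape with neon lights; cinematic")
parsed = parse_prompt(example)
print(parsed.emotions)  # ['determined', 'hopeful']
print(parsed.camera)    # eye-level
```

Here the prompt asks for an eye-level shot from atop a skyscraper, while the shot description reports a figure standing in the city and looking up, which is the kind of mismatch the camera-position assessment is concerned with.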
Silhouettes of Solitude: A Moment of Contemplation in the Setting Sun
A solitary figure sits on a bench in a cobblestone street, their back to the camera, bathed in the golden light of the setting sun. The scene evokes a sense of melancholy and contemplation, with the dramatic play of light and shadow highlighting the figure’s isolation.
Prompt
facial-expressions Curiosity: Melancholy, contemplative ; A lone figure sits on a weathered bench, gazing at the bustling city street below. The sun casts long shadows across the cobblestones.; cinematic
Characteristic
Shot : A solitary figure sits on a bench in a cobblestone street, facing away from the camera, with buildings on both sides and the setting sun in the background.
Aesthetic Score : 0.7
Mood : melancholy, contemplative, lonely
Quality
Entropy : 6.51
Noise : 81
Prompt Clip Score : 0.33
AI Evaluation
Likelihood of AI : 0.20
Image errors : The image is slightly overexposed, and the colors are a bit muted. The figure’s silhouette is somewhat blurry.
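The Prompt Clip Score values in this post all fall between 0.26 and 0.33. The post does not describe the exact computation; a common approach, assumed here, is the cosine similarity between a CLIP text embedding of the prompt and a CLIP image embedding of the result, where a higher value means a closer match. The sketch below shows only the similarity step, using made-up four-dimensional vectors in place of real embeddings:

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# Hypothetical toy embeddings; real CLIP embeddings have hundreds of dimensions
# and would come from a CLIP text encoder and image encoder.
text_embedding = [1.0, 0.0, 0.0, 0.0]
image_embedding = [0.3, 0.9, 0.3, 0.1]

print(round(cosine_similarity(text_embedding, image_embedding), 2))  # 0.3
```

If the scores are indeed raw cosine similarities, the narrow 0.26 to 0.33 spread suggests that prompt adherence, as this metric sees it, was fairly uniform across the ten images.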
Lost in the Code: A Moment of Intense Focus
A young man, shrouded in dim light, is completely absorbed in his work. The close-up shot and mysterious atmosphere draw you into his world, leaving you wondering what secrets he’s uncovering.
Prompt
facial-expressions Curiosity: Intense, focused ; A gamer, hunched over a computer screen, eyes glued to the monitor; close-up; Gamer; dimly lit room with flashing lights from the screen; cinematic
Characteristic
Shot : A young man wearing headphones is looking intently at a computer screen. The lighting is dim, creating a sense of mystery and intrigue.
Aesthetic Score : 0.6
Mood : intense, focused, mysterious
Quality
Entropy : 6.36
Noise : 52
Prompt Clip Score : 0.31
AI Evaluation
Likelihood of AI : 0.20
Image errors : No significant errors, though the image is slightly underexposed.
Lost in the City’s Pulse: A Man’s Mysterious Gaze
A lone figure in a leather jacket stands amidst the bustling chaos of a city market. His intense gaze cuts through the blurred background, suggesting intrigue and unspoken stories. The urban landscape becomes a canvas for his enigmatic presence, painting a picture of mystery and intensity.
Prompt
facial-expressions Curiosity: Intrigued, observant ; A man, walking through a crowded marketplace, his eyes darting around; eye-level; Single Person; bustling marketplace with colorful stalls and vendors; cinematic
Characteristic
Shot : A man in a leather jacket standing in a busy market, the background is blurred
Aesthetic Score : 0.7
Mood : mysterious, intense, urban
Quality
Entropy : 6.77
Noise : 70
Prompt Clip Score : 0.32
AI Evaluation
Likelihood of AI : 0.10
Image errors : No significant errors, just slight blurriness in the background.
Shadow Amidst the Flames
A figure clad in dark armor stands defiant against a backdrop of raging fire and swirling smoke. The intense contrast evokes a sense of mystery and dramatic tension, hinting at a story of courage and resilience.
Prompt
facial-expressions Curiosity: Brave, resolute ; A hero, standing in the middle of a chaotic battle, looking determined; eye-level; Hero; smoke-filled battlefield with explosions and debris; cinematic
Characteristic
Shot : A man in dark armor stands before a backdrop of fire and smoke.
Aesthetic Score : 0.7
Mood : intense, dramatic, mysterious
Quality
Entropy : 6.73
Noise : 69
Prompt Clip Score : 0.28
AI Evaluation
Likelihood of AI : 0.90
Image errors : Some artifacts and blurring around the edges of the armor. The fire looks somewhat digitally painted.
Intimate Conversation: A Glimpse into Warm Camaraderie
Experience the warmth and intimacy of a focused conversation between four individuals, bathed in the soft glow of a cozy room. The scene invites you to join the engaging discussion, creating a sense of belonging and closeness.
Prompt
facial-expressions Curiosity: Joyful, connected ; A group of friends, gathered around a table, sharing stories and laughter; eye-level; Normal People; cozy living room with warm lighting; cinematic
Characteristic
Shot : Four people are sitting at a table, having a conversation. The room is lit with warm light, creating a cozy atmosphere.
Aesthetic Score : 0.7
Mood : intimate, focused, conversational
Quality
Entropy : 6.60
Noise : 53
Prompt Clip Score : 0.26
AI Evaluation
Likelihood of AI : 0.10
Image errors : The image appears slightly overexposed, and the people’s faces are washed out.
Caught in the Heat of the Moment: Gamer’s Intense Focus Under Neon Lights
A young man’s face is illuminated by blue and orange light as he plays a video game, his expression a blend of surprise and intense focus. The dramatic lighting and tight framing capture the raw emotion of the moment, highlighting the gamer’s complete immersion in the virtual world.
Prompt
facial-expressions Curiosity: Excited, engaged ; A gamer, holding a controller, eyes wide with excitement; close-up; Gamer; brightly lit gaming room with colorful lights; cinematic
Characteristic
Shot : A young man is playing a video game, looking surprised, in a dimly lit room with blue and orange light.
Aesthetic Score : 0.7
Mood : intense, focused, surprised
Quality
Entropy : 6.66
Noise : 45
Prompt Clip Score : 0.33
AI Evaluation
Likelihood of AI : 0.20
Image errors : There are no visible artifacts or errors in the image.
Lost in the Vastness: A Moment of Contemplation on the Cliffside
A solitary woman stands on a cliff, her back turned towards the viewer, gazing out at the endless ocean. The scene evokes a sense of serenity and introspection, highlighting the dramatic contrast between the woman’s smallness and the vastness of the sea. This image captures a moment of quiet contemplation, inviting viewers to reflect on their own connection to nature and the world around them.
Prompt
facial-expressions Curiosity: Contemplative, introspective ; A woman, standing at the edge of a cliff, gazing out at the vast ocean; eye-level; Single Person; dramatic cliffside with crashing waves; cinematic
Characteristic
Shot : A woman standing on a cliff overlooking the ocean, with her back turned towards the viewer and her gaze focused on the distant horizon.
Aesthetic Score : 0.7
Mood : serene, contemplative, introspective
Quality
Entropy : 6.91
Noise : 61
Prompt Clip Score : 0.29
AI Evaluation
Likelihood of AI : 0.20
Image errors : No noticeable artifacts or errors.
Facing the Flames: A Man’s Courage in the Firelight
A solitary figure, clad in green, stands defiant against a backdrop of blurred flames. The intense focus on his face, illuminated by the fire’s glow, creates a dramatic and suspenseful scene. The contrast between the man’s composure and the chaotic fire evokes a sense of strength and resilience in the face of danger.
Prompt
facial-expressions Curiosity: Brave, selfless ; A hero, standing in front of a burning building, ready to save people; eye-level; Hero; chaotic scene with smoke and flames; cinematic
Characteristic
Shot : A man in a green jacket is standing in front of a fire. The fire is blurred and out of focus, and the man’s face is in sharp focus. The background is dark, and the fire is the only light source.
Aesthetic Score : 0.7
Mood : intense, dramatic, suspenseful
Quality
Entropy : 6.82
Noise : 73
Prompt Clip Score : 0.30
AI Evaluation
Likelihood of AI : 0.80
Image errors : The fire in the background appears somewhat artificial, with slightly unrealistic flames. There are minor artifacts around the edges of the man’s hair.
Conclusion
The analysis shows that the generative AI model matched the intended aesthetic style well, but was weaker at reproducing the requested camera position and only average at capturing the scene. Here’s a breakdown:
- Camera Position: The model scored 0.35, which is considered below average. This suggests that the model didn’t accurately capture the intended camera position described in the prompt.
- Shot Analysis: The model scored 0.55, which is considered average. This indicates that the model was able to understand the scene in the prompt to a reasonable degree, but not exceptionally well.
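- Aesthetic Analysis: The model scored 0.08, which is considered very good. This means the generated images closely matched the expected aesthetic style. The scale for this metric evidently differs from the two above, since a low value is rated favorably here.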
Overall, the model seems to be better at understanding the aesthetic style than the camera position and scene. It might be helpful to provide more specific instructions regarding camera angles and shot composition in future prompts.
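As a complement to the breakdown, the per-image numbers reported in the image sections of this post can be tallied by the subject type named in each prompt. The sketch below is a simple grouping exercise, not the method behind the scores in the breakdown; the figures are copied from the post, and the "unspecified" label for the one prompt without a subject type is an assumption made for the tally.

```python
from collections import defaultdict
from statistics import mean

# (title, subject type from the prompt, Prompt Clip Score, Likelihood of AI)
# Values are copied from the image sections of this post. The third prompt
# names no subject type, so it is recorded here as "unspecified".
IMAGES = [
    ("Lost in the Golden Hour", "Single Person", 0.30, 0.20),
    ("Hope Rises Above the City Lights", "Hero", 0.30, 0.80),
    ("Silhouettes of Solitude", "unspecified", 0.33, 0.20),
    ("Lost in the Code", "Gamer", 0.31, 0.20),
    ("Lost in the City's Pulse", "Single Person", 0.32, 0.10),
    ("Shadow Amidst the Flames", "Hero", 0.28, 0.90),
    ("Intimate Conversation", "Normal People", 0.26, 0.10),
    ("Caught in the Heat of the Moment", "Gamer", 0.33, 0.20),
    ("Lost in the Vastness", "Single Person", 0.29, 0.20),
    ("Facing the Flames", "Hero", 0.30, 0.80),
]


def summarize(images):
    """Group images by subject type and average their two scores."""
    groups = defaultdict(list)
    for _title, subject, clip, ai in images:
        groups[subject].append((clip, ai))
    return {
        subject: {
            "count": len(rows),
            "mean_clip": round(mean(c for c, _ in rows), 3),
            "mean_ai_likelihood": round(mean(a for _, a in rows), 3),
        }
        for subject, rows in groups.items()
    }


for subject, stats in summarize(IMAGES).items():
    print(subject, stats)
```

In this tally the three Hero prompts have by far the highest mean Likelihood of AI (about 0.83), while every other group stays at or below 0.2. With only ten images this is an observation rather than a statistically meaningful result, but it suggests that stylized hero scenes are where the images look most obviously generated.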