AI's Facial Expressions: A Step Towards Realism, But Still a Long Way to Go with Imagen-v3
Facial expressions are a powerful tool for conveying emotions and intentions. In the realm of artificial intelligence, the ability to generate images with realistic facial expressions is a crucial step towards creating more engaging and believable virtual experiences. This blog post delves into the results of a recent experiment that tested a generative AI model’s ability to create images with specific facial expressions. The results reveal both promising progress and areas where further development is needed.
Created with: imagen-v3
Lost in the Neon Glow: A Solitary Figure Amidst the City’s Pulse
A lone figure navigates the bustling streets of a futuristic metropolis, bathed in vibrant neon light. The towering buildings and crowded sidewalks create a sense of awe and wonder, while the solitary figure adds a touch of mystery and solitude to the scene. The glowing sign reading ‘CHINO’ hints at a hidden story waiting to be discovered.
Prompt
facial-expressions Skepticism: Melancholy, disillusioned ; A lone figure, back turned, walking away from a brightly lit city skyline; eye-level; Single Person; Urban, neon signs, bustling crowds; cinematic
Characteristic
Shot : A lone figure walks through a neon-lit city street, looking towards a glowing sign that reads “CHINO”. The street is crowded with other people and the buildings are tall and imposing.
Aesthetic Score : 0.75
Mood : futuristic, vibrant, mysterious
Quality
Entropy : 6.47
Noise : 77
Prompt Clip Score : 0.30
AI Evaluation
Likelihood of AI : 0.90
Image errors : The image has some minor errors, including some unnatural-looking lighting and a few areas where the resolution is low. The neon lighting is a bit too sharp and saturated, and the edges of the objects have some aliasing artifacts.
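The prompts in this post appear to share a semicolon-separated template: an expression label with emotion words, then a subject description, a camera position, a character type, a setting, and a style. The sketch below splits a prompt into those six parts. The field names (`topic`, `expression`, `camera`, and so on) are assumptions made for illustration; the post does not name them.

```python
from dataclasses import dataclass


@dataclass
class ParsedPrompt:
    # Field names are hypothetical labels, not taken from the post.
    topic: str
    expression: str
    emotions: list
    subject: str
    camera: str
    character: str
    setting: str
    style: str


def parse_prompt(prompt: str) -> ParsedPrompt:
    """Split a prompt into the six semicolon-separated parts seen in this post."""
    parts = [part.strip() for part in prompt.split(";")]
    if len(parts) != 6:
        raise ValueError(f"expected 6 parts, got {len(parts)}")
    head, subject, camera, character, setting, style = parts

    # The first part looks like "facial-expressions Skepticism: Melancholy, disillusioned".
    label, _, emotion_text = head.partition(":")
    topic, _, expression = label.partition(" ")
    emotions = [e.strip() for e in emotion_text.split(",") if e.strip()]

    return ParsedPrompt(
        topic=topic,
        expression=expression.strip(),
        emotions=emotions,
        subject=subject,
        camera=camera,
        character=character,
        setting=setting,
        style=style,
    )
```

Parsing the prompt above this way gives `camera == "eye-level"` and `emotions == ["Melancholy", "disillusioned"]`, which makes it easier to compare the requested camera position and mood against the Shot and Mood fields reported for each image.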
Hero Rises from the Ashes
A powerful image captures the essence of heroism amidst chaos. A lone superhero stands defiant against a backdrop of a city engulfed in flames, their silhouette a beacon of hope in the face of destruction.
Prompt
facial-expressions Skepticism: Doubtful, conflicted ; A superhero, cape billowing, standing on a rooftop, looking down at a city in chaos; eye-level; Hero; Smoke, fire, destruction; cinematic
Characteristic
Shot : A superhero stands on a rooftop overlooking a city in flames. The smoke and fire create a sense of chaos and destruction. The figure of the superhero stands out against the backdrop of the disaster, emphasizing their power and resilience.
Aesthetic Score : 0.6
Mood : dark, dramatic, heroic
Quality
Entropy : 6.24
Noise : 69
Prompt Clip Score : 0.30
AI Evaluation
Likelihood of AI : 0.80
Image errors : The smoke and fire in the background appear to be somewhat artificial and unrealistic.
Lost in Thought: A Moment of Contemplation in a Parisian Cafe
A woman, shrouded in the soft glow of a cafe’s dim lighting, immerses herself in the pages of a newspaper. Her focused expression and the somber atmosphere create a sense of mystery and intrigue, leaving the viewer to wonder about the thoughts swirling in her mind.
Prompt
facial-expressions Skepticism: Cynical, disbelieving ; A woman, dressed in everyday clothes, holding a newspaper with a sensational headline; eye-level; Normal People; Coffee shop, people going about their day; cinematic
Characteristic
Shot : A woman in a cafe, reading a newspaper, with a cup of coffee on the table.
Aesthetic Score : 0.7
Mood : focused, contemplative, somber
Quality
Entropy : 6.39
Noise : 68
Prompt Clip Score : 0.31
AI Evaluation
Likelihood of AI : 0.10
Image errors : No visible errors.
The Gamer’s Focus: A Moment of Intensity Under Neon Lights
A young man, clad in a gaming jersey and headphones, sits at his desk, eyes locked on the computer screen. The blue and green neon lights cast a futuristic glow, highlighting the intense focus and determination of a gamer in the heat of the game. Pizza boxes and an energy drink litter the desk, a testament to the dedication required for victory.
Prompt
facial-expressions Skepticism: Suspicious, wary ; A gamer, hunched over a computer screen, surrounded by empty pizza boxes and energy drink cans; close-up; Gamer; Dark room, flashing lights, gaming peripherals; cinematic
Characteristic
Shot : A young man is sitting at his desk, wearing headphones and a gaming jersey. He is looking at his computer screen and has a serious expression on his face. There are some pizza boxes and a can of energy drink on the desk. The scene is lit with blue and green neon lights, creating a futuristic atmosphere.
Aesthetic Score : 0.6
Mood : intense, focused, gaming
Quality
Entropy : 6.30
Noise : 78
Prompt Clip Score : 0.28
AI Evaluation
Likelihood of AI : 0.00
Image errors : No visible artifacts or errors.
Lost in the Neon Rain
A solitary figure sits at a bar, their gaze lost in the downpour outside. Neon lights paint the wet window with a melancholic glow, reflecting the mood of the moment.
Prompt
facial-expressions Skepticism: Doubtful, introspective ; A man, sitting alone in a dimly lit bar, staring into his drink; eye-level; Single Person; Empty bar, flickering neon lights, rain outside; cinematic
Characteristic
Shot : A man sits alone at a bar, looking out the window at the rain. Neon lights are reflected in the wet glass.
Aesthetic Score : 0.7
Mood : melancholy, lonely, contemplative
Quality
Entropy : 6.37
Noise : 74
Prompt Clip Score : 0.31
AI Evaluation
Likelihood of AI : 0.20
Image errors : No significant errors, just a bit grainy in the background.
Warrior on the Brink of Glory
A lone figure in dark armor, weapon in hand, stands before a roaring crowd in a dimly lit arena. The air crackles with anticipation as the warrior prepares for an epic battle, his focused expression hinting at the intensity and suspense that lies ahead.
Prompt
facial-expressions Skepticism: Uncertain, hesitant ; A hero, standing in front of a crowd, holding a weapon, but looking conflicted; eye-level; Hero; cheering crowd, bright lights, stage; cinematic
Characteristic
Shot : A man in dark armor, holding a weapon, standing in front of a cheering crowd in a dimly lit arena.
Aesthetic Score : 0.7
Mood : intense, dramatic, suspenseful
Quality
Entropy : 6.12
Noise : 72
Prompt Clip Score : 0.28
AI Evaluation
Likelihood of AI : 0.20
Image errors : Slight graininess and noise in the image, particularly in the background.
Intrigued Gazes: A Moment of Shared Curiosity
Three friends huddle on a cozy couch, their eyes fixed on something unseen. The relaxed atmosphere and snacks on the coffee table suggest a casual setting, but the intensity of their focus hints at a captivating moment unfolding off-screen. What could they be watching so intently?
Prompt
facial-expressions Skepticism: Disbelieving, amused ; A group of friends, gathered around a table, listening to a story with skeptical expressions; eye-level; Normal People; Cozy living room, warm lighting, snacks; cinematic
Characteristic
Shot : Three people are sitting on a couch, watching something. It’s a living room with a coffee table and some snacks on it.
Aesthetic Score : 0.5
Mood : relaxed, curious, focused
Quality
Entropy : 6.43
Noise : 80
Prompt Clip Score : 0.28
AI Evaluation
Likelihood of AI : 0.10
Image errors : The image has slight noise and some compression artifacts, particularly in the shadows. The colors are a bit washed out.
In the Zone: The Intensity of Gaming
A young man, headphones on, is completely immersed in his video game. His focused expression and the tight crop of the image capture the intensity of the gaming experience.
Prompt
facial-expressions Skepticism: Frustrated, doubtful ; A gamer, staring intently at a screen, but with a look of frustration; close-up; Gamer; Brightly lit room, gaming setup, controller in hand; cinematic
Characteristic
Shot : A young man wearing headphones is playing a video game with a controller.
Aesthetic Score : 0.5
Mood : intense, focused, serious
Quality
Entropy : 6.63
Noise : 81
Prompt Clip Score : 0.32
AI Evaluation
Likelihood of AI : 0.20
Image errors : The image appears slightly overexposed, leading to a loss of detail in the highlights.
Lost in the Crowd: A Woman’s Worried Journey
A woman navigates a bustling, narrow street, her worried expression hinting at a hidden tension. The urban backdrop, possibly in Asia, adds to the sense of suspense and apprehension. The scene evokes a feeling of foreboding, leaving the viewer wondering what lies ahead for the woman.
Prompt
facial-expressions Skepticism: Paranoid, distrustful ; A woman, walking through a crowded street, looking around with suspicion; eye-level; Single Person; Busy city street, people rushing by, street vendors; cinematic
Characteristic
Shot : A woman with a worried expression is walking down a narrow, crowded street. The scene appears to be in an urban area, possibly Asia, with a busy market in the background.
Aesthetic Score : 0.6
Mood : suspenseful, apprehensive, urban
Quality
Entropy : 6.54
Noise : 66
Prompt Clip Score : 0.26
AI Evaluation
Likelihood of AI : 0.10
Image errors : Slight blurriness in the background, possibly from camera movement or focus issues.
Lost in the City Lights: A Silhouette of Mystery
A young man, shrouded in shadow, stands on a rooftop overlooking a vibrant cityscape. The dramatic lighting casts him in silhouette, creating an atmosphere of brooding mystery and urban isolation.
Prompt
facial-expressions Skepticism: Isolated, disillusioned ; A hero, standing on a rooftop, looking out at a city skyline, but with a sense of loneliness; eye-level; Hero; City lights, distant sounds of the city; cinematic
Characteristic
Shot : A young man in a hooded jacket stands on a rooftop, looking out over a city skyline at night. The city is bathed in the glow of streetlights and distant buildings.
Aesthetic Score : 0.7
Mood : mysterious, brooding, urban
Quality
Entropy : 6.13
Noise : 78
Prompt Clip Score : 0.29
AI Evaluation
Likelihood of AI : 0.20
Image errors : The image has a slightly grainy texture, which could be due to the low-light conditions or the compression algorithm used to save the image.
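The post does not say how the Entropy values above were computed. Values between 6.1 and 6.6 are consistent with the Shannon entropy, in bits, of an 8-bit grayscale histogram (which tops out at 8), so the sketch below assumes that definition. It is an illustration of that common measure, not necessarily the tool used for this experiment.

```python
import math
from collections import Counter


def shannon_entropy(pixels):
    """Shannon entropy in bits of a flat sequence of pixel intensities.

    Assumption: intensities are integers such as 0..255 grayscale values.
    """
    pixels = list(pixels)
    if not pixels:
        raise ValueError("pixels must not be empty")
    total = len(pixels)
    counts = Counter(pixels)
    # Sum -p * log2(p) over every intensity that actually occurs.
    return sum(-(c / total) * math.log2(c / total) for c in counts.values())
```

Under this definition a single flat color scores 0, an image split evenly between two intensities scores 1, and an image that uses all 256 intensities equally often scores 8. Higher values indicate a more varied tonal distribution, which fits the detailed, busy scenes in this set.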
Conclusion
The results show that the generative AI model handled the aesthetic aspect well but fell short on scene understanding and camera position. Here’s a breakdown:
- Camera Position: The model scored 0.15, which is considered below average. This suggests that the model didn’t accurately capture the intended camera position described in the prompt.
- Shot Analysis: The model scored 0.47, which is considered below average. This indicates that the model didn’t fully understand the scene described in the prompt and didn’t create an image that accurately reflects it.
- Aesthetic Analysis: The model scored 0.095, which is considered very good. This means that the generated images closely matched the expected aesthetic style.
Overall, the model seems to be better at capturing the desired aesthetic than understanding the scene and camera position. This suggests that the model might need further training to improve its ability to interpret and translate prompts into accurate visual representations.
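For readers who want to compare the images side by side, the per-image numbers reported in this post can be aggregated with a few lines of Python. The rows below are copied from the metric blocks in order of appearance. The `summarize` and `count_flagged` helpers, and the 0.5 cutoff for "likely AI", are assumptions made for this sketch and are not part of the original evaluation.

```python
from statistics import mean

FIELDS = ("aesthetic", "entropy", "noise", "prompt_clip", "ai_likelihood")

# One row per image, in the order the images appear in this post.
ROWS = [
    (0.75, 6.47, 77, 0.30, 0.90),  # Lost in the Neon Glow
    (0.60, 6.24, 69, 0.30, 0.80),  # Hero Rises from the Ashes
    (0.70, 6.39, 68, 0.31, 0.10),  # Lost in Thought
    (0.60, 6.30, 78, 0.28, 0.00),  # The Gamer's Focus
    (0.70, 6.37, 74, 0.31, 0.20),  # Lost in the Neon Rain
    (0.70, 6.12, 72, 0.28, 0.20),  # Warrior on the Brink of Glory
    (0.50, 6.43, 80, 0.28, 0.10),  # Intrigued Gazes
    (0.50, 6.63, 81, 0.32, 0.20),  # In the Zone
    (0.60, 6.54, 66, 0.26, 0.10),  # Lost in the Crowd
    (0.70, 6.13, 78, 0.29, 0.20),  # Lost in the City Lights
]


def summarize(rows):
    """Return mean, min, and max for each metric column."""
    columns = zip(*rows)
    return {
        name: {"mean": mean(col), "min": min(col), "max": max(col)}
        for name, col in zip(FIELDS, columns)
    }


def count_flagged(rows, threshold=0.5):
    """Count images whose AI likelihood meets a (hypothetical) threshold."""
    ai_index = FIELDS.index("ai_likelihood")
    return sum(1 for row in rows if row[ai_index] >= threshold)


if __name__ == "__main__":
    for name, stats in summarize(ROWS).items():
        print(f"{name}: mean={stats['mean']:.3f} "
              f"min={stats['min']} max={stats['max']}")
    print("flagged as likely AI:", count_flagged(ROWS))
```

On these ten images the mean Aesthetic Score comes out at about 0.635, the mean Prompt Clip Score at about 0.293, and two of the ten images have a Likelihood of AI of 0.5 or higher. These per-image averages are separate from the three scores quoted in the breakdown above, whose scale the post does not define.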