AI's Artistic Struggle: Capturing Emotion in Visuals with Scenario
In the realm of artificial intelligence, the ability to generate realistic and emotionally evocative images is a coveted goal. While significant strides have been made in recent years, AI models still face challenges in capturing the subtle nuances of human expression. This blog post examines the results of a generative AI model tasked with creating images based on specific scene descriptions, focusing on the model’s performance in depicting facial expressions. We’ll explore the model’s strengths and weaknesses, highlighting the areas where it excels and where it falls short. Through this analysis, we aim to shed light on the ongoing quest to bridge the gap between AI-generated art and the human capacity for emotional expression.
Created with: Scenario
Intimate Portrait: A Moment of Serene Beauty
A close-up portrait captures the soft beauty of a woman in a knitted sweater. The warm lighting and gentle focus create an intimate atmosphere, inviting the viewer to connect with her serene expression.
Prompt
facial-expressions Determination: Solitude and resilience ; A lone figure; eye-level; Single Person; A vast, desolate landscape; cinematic
Characteristic
Shot : Close-up portrait of a woman with long dark hair. She is wearing a white sweater and has a soft, natural makeup look.
Aesthetic Score : 0.8
Mood : soft, dreamy, alluring
Quality
Entropy : 6.71
Noise : 96
Prompt Clip Score : 0.14
AI Evaluation
Likelihood of AI : 0.90
Image errors : There are some minor artifacts around the edges of the image. The skin looks a bit too smooth.
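Each prompt in this post follows the same semicolon-separated pattern, so it can be split into fields and compared with the generated shot. The Python sketch below is an illustration only: the field names (category, emotion, nuance, subject, camera, character, setting, style) are assumed labels inferred from the prompts themselves and are not documented by Scenario.

```python
from typing import NamedTuple


class ScenePrompt(NamedTuple):
    # Field names are assumptions inferred from the prompt layout.
    category: str
    emotion: str
    nuance: str
    subject: str
    camera: str
    character: str
    setting: str
    style: str


def parse_prompt(prompt: str) -> ScenePrompt:
    """Split a prompt of the form used in this post into labeled fields."""
    parts = [p.strip() for p in prompt.split(";")]
    if len(parts) != 6:
        raise ValueError(f"expected 6 semicolon-separated parts, got {len(parts)}")
    # First part looks like "facial-expressions Determination: Solitude and resilience".
    head, _, nuance = parts[0].partition(":")
    category, _, emotion = head.strip().partition(" ")
    return ScenePrompt(category, emotion.strip(), nuance.strip(), *parts[1:])


example = parse_prompt(
    "facial-expressions Determination: Solitude and resilience ; A lone figure; "
    "eye-level; Single Person; A vast, desolate landscape; cinematic"
)
print(example.camera, "|", example.setting)
```

Parsed this way, the requested setting ("A vast, desolate landscape") can be set side by side with the reported shot, which here is a close-up portrait.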
Hope Amidst the Ashes: A Woman’s Resilience in a Burning City
A powerful image captures the spirit of hope and resilience in the face of destruction. A young woman, clad in tactical gear, gazes towards a burning city, her windblown hair and determined expression conveying a sense of strength and unwavering optimism.
Prompt
facial-expressions Determination: Courage and unwavering resolve ; A hero standing tall; low-angle; Hero; A burning city in the background; cinematic
Characteristic
Shot : A young woman stands in front of a burning city with her back to the viewer. She is wearing a green jacket and a white scarf. The fire is in the background and the woman is looking up at it.
Aesthetic Score : 0.7
Mood : dramatic, intense, hope
Quality
Entropy : 6.85
Noise : 98
Prompt Clip Score : 0.26
AI Evaluation
Likelihood of AI : 0.90
Image errors : The image has some blur in the background and slight artifacting around the fire.
Strength in Steel: A Woman’s Focus in the Industrial Landscape
A woman in a blue shirt and white overalls stands confidently in a bustling factory, her gaze fixed on a distant point. Her white hardhat and determined expression convey a sense of focus and strength, highlighting the power and resilience of women in the industrial workforce.
Prompt
facial-expressions Determination: Grit and perseverance ; A worker pushing a heavy cart; eye-level; Normal People; A bustling factory floor; cinematic
Characteristic
Shot : A female worker wearing a white hard hat and blue shirt over beige overalls is pushing a cart in a factory. The scene is set in a factory environment with metal structures, conveyor belts, and other workers in the background.
Aesthetic Score : 0.7
Mood : determined, focused, industrial
Quality
Entropy : 6.58
Noise : 94
Prompt Clip Score : 0.30
AI Evaluation
Likelihood of AI : 0.80
Image errors : The image is well-composed and there are no visible artifacts or errors.
Lost in Thought: A Moment of Quiet Reflection
A young woman, bathed in soft, warm light, gazes thoughtfully into the distance. Her headset and microphone suggest a recent conversation, leaving her lost in contemplation. The scene evokes a sense of calm and intimacy, inviting viewers to share in her quiet moment of reflection.
Prompt
facial-expressions Determination: Concentration and drive ; A gamer intensely focused on a screen; close-up; Gamer; A dimly lit room with glowing monitors; cinematic
Characteristic
Shot : Close-up portrait of a young woman wearing a headset with blue light. The background is blurred and out of focus.
Aesthetic Score : 0.7
Mood : serious, contemplative, futuristic
Quality
Entropy : 6.54
Noise : 93
Prompt Clip Score : 0.24
AI Evaluation
Likelihood of AI : 0.20
Image errors : No noticeable errors
Lost in Thought, Watching the Rain Fall
A woman with long, dark hair gazes out a rain-streaked window, her expression a blend of contemplation and wistful longing. The blurry windowpane reflects the melancholic mood, suggesting a sense of isolation and introspection.
Prompt
facial-expressions Determination: Inner strength and hope ; A woman staring out a window; eye-level; Single Person; A stormy sky; cinematic
Characteristic
Shot : A close-up portrait of a woman looking out of a window, with her hair falling around her face.
Aesthetic Score : 0.8
Mood : dreamy, wistful, contemplative
Quality
Entropy : 6.81
Noise : 90
Prompt Clip Score : 0.23
AI Evaluation
Likelihood of AI : 0.80
Image errors : The hair is slightly blurry in places and the skin tone appears overly smooth, suggesting AI generation.
Warrior’s Resolve: A Moment of Hope Amidst the Chaos
A lone female warrior, clad in gleaming armor, stands defiant on a battlefield, her sword held high. The blurry army behind her suggests the intensity of the battle, while her determined gaze and the dramatic lighting create a sense of hope and anticipation. This image captures the essence of courage and resilience in the face of adversity.
Prompt
facial-expressions Determination: Victory and unwavering resolve ; A hero raising a sword; low-angle; Hero; A battlefield with fallen enemies; cinematic
Characteristic
Shot : A woman in armor, with a sword raised in the air, stands in a battlefield with other warriors in the background.
Aesthetic Score : 0.8
Mood : epic, powerful, determined
Quality
Entropy : 6.82
Noise : 94
Prompt Clip Score : 0.23
AI Evaluation
Likelihood of AI : 0.80
Image errors : The background is slightly blurry and the lighting appears a little too dramatic and unrealistic; however, the model is very well rendered.
A Shadow of Loss: A Family Faces Ruin
A sepia-toned image captures a poignant moment of melancholy. Three adults and a young girl stand before a damaged house, their somber expressions reflecting the tragedy that has befallen them. The ruined structure in the background serves as a stark reminder of loss, creating an eerie and somber atmosphere.
Prompt
facial-expressions Determination: Resilience and unity ; A family huddled together; eye-level; Normal People; A burning house in the background; cinematic
Characteristic
Shot : A group of four people, two men and two women, standing in front of a dilapidated house. The house is in the background and is partially obscured by smoke. The people are looking at the camera, and they all have a serious expression on their faces.
Aesthetic Score : 0.6
Mood : melancholy, somber, dramatic
Quality
Entropy : 6.67
Noise : 112
Prompt Clip Score : 0.27
AI Evaluation
Likelihood of AI : 0.20
Image errors : No noticeable artifacts or errors are present in the image.
Gaming Bliss: A Moment of Pure Joy
This image captures the infectious energy of a young woman lost in the world of gaming. Her smile and laughter, illuminated by a pink glow, radiate pure happiness. The blurred background of computer screens adds to the immersive atmosphere, highlighting the joy of the experience.
Prompt
facial-expressions Determination: Excitement and focus ; A gamer’s hands furiously typing on a keyboard; close-up; Gamer; A brightly lit gaming room; cinematic
Characteristic
Shot : A young woman, wearing headphones, is smiling and typing on a keyboard in a dimly lit room, with a computer monitor and colorful lights in the background.
Aesthetic Score : 0.7
Mood : happy, joyful, focused
Quality
Entropy : 6.76
Noise : 81
Prompt Clip Score : 0.29
AI Evaluation
Likelihood of AI : 0.20
Image errors : There are no visible artifacts or errors in the image.
Lost in Thought, Amidst the Whispering Trees
A young woman, her face etched with contemplation, gazes into the depths of a mysterious forest. The tall, slender trees create an ethereal atmosphere, hinting at secrets hidden within. The contrast between her delicate features and the stark backdrop evokes a sense of melancholy and intrigue.
Prompt
facial-expressions Determination: Hope and perseverance ; A lone figure walking towards a distant light; eye-level; Single Person; A dark, foreboding forest; cinematic
Characteristic
Shot : A woman with a contemplative expression stands in front of a dark, dense forest.
Aesthetic Score : 0.8
Mood : mysterious, melancholic, pensive
Quality
Entropy : 6.71
Noise : 113
Prompt Clip Score : 0.19
AI Evaluation
Likelihood of AI : 0.90
Image errors : The image appears to be drawn digitally. While the details are well executed, the overall look lacks the texture and richness of a real pencil drawing. Some of the tree trunks appear repetitive and lack natural variations.
Sunset Serenity: A Moment of Peace Above the City
A young woman finds solace on a rooftop, bathed in the golden hues of a setting sun. The sprawling cityscape below evokes a sense of grandeur and aspiration, while her peaceful expression speaks of quiet contemplation and hope for the future.
Prompt
facial-expressions Determination: Confidence and unwavering resolve ; A hero standing on a rooftop; high-angle; Hero; A city skyline bathed in sunlight; cinematic
Characteristic
Shot : A woman is standing on a rooftop with a cityscape in the background; the sun is setting and casting a golden light on the scene.
Aesthetic Score : 0.7
Mood : dreamy, peaceful, hopeful
Quality
Entropy : 6.67
Noise : 87
Prompt Clip Score : 0.25
AI Evaluation
Likelihood of AI : 0.10
Image errors : The lighting is uneven and the glare on the woman’s face is distracting.
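The per-image figures reported above can be summarized with a short Python sketch. The values are copied from the ten reports in this post; the shortened titles and the choice of a plain average are assumptions made for illustration.

```python
from statistics import mean

# (short title, aesthetic score, prompt CLIP score, likelihood of AI)
# Values copied from the ten reports above; titles are shortened labels.
results = [
    ("Intimate Portrait", 0.8, 0.14, 0.90),
    ("Hope Amidst the Ashes", 0.7, 0.26, 0.90),
    ("Strength in Steel", 0.7, 0.30, 0.80),
    ("Lost in Thought (headset)", 0.7, 0.24, 0.20),
    ("Watching the Rain Fall", 0.8, 0.23, 0.80),
    ("Warrior's Resolve", 0.8, 0.23, 0.80),
    ("A Shadow of Loss", 0.6, 0.27, 0.20),
    ("Gaming Bliss", 0.7, 0.29, 0.20),
    ("Amidst the Whispering Trees", 0.8, 0.19, 0.90),
    ("Sunset Serenity", 0.7, 0.25, 0.10),
]

avg_aesthetic = mean(r[1] for r in results)
avg_clip = mean(r[2] for r in results)
avg_ai = mean(r[3] for r in results)

# Images with the weakest and strongest prompt alignment by CLIP score.
lowest_clip = min(results, key=lambda r: r[2])
highest_clip = max(results, key=lambda r: r[2])

print(f"Average aesthetic score:   {avg_aesthetic:.2f}")  # 0.73
print(f"Average prompt CLIP score: {avg_clip:.2f}")       # 0.24
print(f"Average likelihood of AI:  {avg_ai:.2f}")         # 0.58
print(f"Lowest CLIP score:  {lowest_clip[0]} ({lowest_clip[2]:.2f})")
print(f"Highest CLIP score: {highest_clip[0]} ({highest_clip[2]:.2f})")
```

The lowest Prompt Clip Score (0.14) belongs to the first image, where the prompt asked for a lone figure in a vast, desolate landscape and the reported shot is a close-up portrait.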
Conclusion
The results show that the generative AI model understood the scene well but was less reliable at reproducing the requested camera position. Here’s a breakdown:
- Camera Position: The model scored 0.46, which is just below the “good” range of 0.5 to 0.75. This suggests that the model didn’t fully capture the intended camera position described in the prompt.
- Shot Analysis: The model scored 0.69, which falls within the “good” range. This indicates that the model was able to understand the scene and create a shot that was generally consistent with the prompt.
- Aesthetic Analysis: The model scored 0.09, which falls within the “very good” range of -0.2 to 0.1, close to its upper bound. This suggests that the generated images’ aesthetic stayed close to the aesthetic described in the prompt.
Overall, the model demonstrated a good understanding of the scene and shot composition, while camera position was its weakest area.
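The comparisons in the breakdown come down to checking each score against a quoted range. The sketch below is a minimal illustration: the `position` helper is hypothetical, and it assumes the Shot Analysis “good” range is the same 0.5 to 0.75 range quoted for Camera Position, which the post does not state explicitly.

```python
def position(score: float, low: float, high: float) -> str:
    """Report where a score sits relative to an inclusive [low, high] range."""
    if score < low:
        return "below"
    if score > high:
        return "above"
    return "within"


# metric -> (score, range low, range high), using the figures quoted above
checks = {
    "Camera Position": (0.46, 0.5, 0.75),
    "Shot Analysis": (0.69, 0.5, 0.75),  # assumption: same "good" range
    "Aesthetic Analysis": (0.09, -0.2, 0.1),
}

verdicts = {name: position(*values) for name, values in checks.items()}

for name, verdict in verdicts.items():
    score, low, high = checks[name]
    print(f"{name}: {score} is {verdict} the range {low} to {high}")
```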