AI Captures the Essence of Emotion but Struggles with Camera Angles in Stable Diffusion
In artificial intelligence, generating realistic and emotionally evocative images is a rapidly evolving field. This blog post examines how well a generative AI model captures facial expressions across ten scenes. The model conveys the intended emotional tone remarkably well, but struggles to reproduce the requested camera angles. We explore these findings, highlight the model’s strengths and weaknesses, and discuss the implications for future development of AI-generated imagery.
Created with: stability-ai-core
Silhouetted Solitude: A Moment of Contemplation in the Desert
A lone figure stands silhouetted against a vibrant sunset, casting a long shadow across the vast desert landscape. The scene evokes a sense of solitude and contemplation, highlighting the vastness of the world and the smallness of the individual within it.
Prompt
facial-expressions Curiosity: Melancholy, contemplative ; A lone figure, silhouetted against a setting sun; eye-level; Single Person; vast, empty desert landscape; cinematic
Characteristic
Shot : A lone figure stands in a desert landscape, facing the setting sun. The vast expanse of sand dunes creates a sense of isolation and awe.
Aesthetic Score : 0.7
Mood : serene, contemplative, desolate
Quality
Entropy : 6.77
Noise : 64
Prompt Clip Score : 0.24
AI Evaluation
Likelihood of AI : 0.20
Image errors : There are no visible artifacts or errors in the image.
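Every prompt in this post appears to follow the same semicolon-separated layout: an emotion header, a subject description, a camera angle, a character type, a setting, and a style. The sketch below splits a prompt into those parts so that the requested camera angle can later be compared with what the image actually shows. The `ScenePrompt` class, its field names, and `parse_prompt` are hypothetical names chosen for illustration; they are not taken from the pipeline that produced these images.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ScenePrompt:
    # Field names are illustrative guesses about the prompt layout.
    theme: str
    emotions: List[str]
    subject: str
    camera: str
    character_type: str
    setting: str
    style: str


def parse_prompt(prompt: str) -> ScenePrompt:
    """Split a semicolon-separated prompt into its six apparent fields."""
    parts = [part.strip() for part in prompt.split(";")]
    if len(parts) != 6:
        raise ValueError(f"expected 6 fields, got {len(parts)}")
    head, subject, camera, character_type, setting, style = parts
    # The first field looks like "<theme>: <emotion>, <emotion>".
    theme, _, emotion_text = head.partition(":")
    emotions = [e.strip() for e in emotion_text.split(",") if e.strip()]
    return ScenePrompt(
        theme.strip(), emotions, subject, camera, character_type, setting, style
    )


example = parse_prompt(
    "facial-expressions Curiosity: Melancholy, contemplative ; "
    "A lone figure, silhouetted against a setting sun; eye-level; "
    "Single Person; vast, empty desert landscape; cinematic"
)
print(example.camera)    # eye-level
print(example.emotions)  # ['Melancholy', 'contemplative']
```

Splitting on semicolons rather than commas matters here, because the subject and setting fields contain commas of their own.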
Silhouetted Hero: A Futuristic Guardian Stands Watch
A lone superhero, clad in advanced armor, dominates the skyline. The dramatic silhouette against the cityscape evokes a sense of power and isolation, hinting at the weight of their heroic burden in this futuristic world.
Prompt
facial-expressions Curiosity: Determined, hopeful ; A superhero, standing atop a skyscraper, looking out at the city; eye-level; Hero; bustling cityscape with neon lights; cinematic
Characteristic
Shot : A superhero in a futuristic suit standing on a rooftop overlooking a city skyline at night.
Aesthetic Score : 0.7
Mood : heroic, dramatic, mysterious
Quality
Entropy : 6.52
Noise : 70
Prompt Clip Score : 0.19
AI Evaluation
Likelihood of AI : 0.80
Image errors : The lighting on the superhero’s face is a little unnatural. The cityscape appears a bit too blurred and artificial.
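The post does not say how the Entropy figure under Quality is computed. One common reading is the Shannon entropy of the image’s grayscale histogram, which ranges from 0 bits (a flat, single-tone image) to 8 bits (all 256 levels equally likely); the values reported here, between 5.91 and 6.90, would fit that range. The sketch below works under that assumption only. `shannon_entropy` is a hypothetical helper, and loading real pixel data from an image file is left out.

```python
import math
from collections import Counter
from typing import Iterable


def shannon_entropy(pixels: Iterable[int]) -> float:
    """Shannon entropy, in bits, of a sequence of 8-bit grayscale values."""
    counts = Counter(pixels)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no pixels given")
    # Sum p * log2(1 / p) over every gray level that occurs.
    return sum((n / total) * math.log2(total / n) for n in counts.values())


flat = [128] * 64           # a single gray level: no information
uniform = list(range(256))  # every gray level exactly once: maximum spread

print(shannon_entropy(flat))     # 0.0
print(shannon_entropy(uniform))  # 8.0
```

Under this reading, a lower value such as the 5.91 of the dimly lit hacker scene would reflect an image dominated by a narrow band of dark tones.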
Finding Serenity Amidst the Blossoms
A young woman finds peace and contemplation on a park bench, surrounded by the vibrant beauty of cherry blossoms and tulips. The soft colors and gentle atmosphere create a sense of tranquility, inviting viewers to share in her moment of quiet reflection.
Prompt
facial-expressions Curiosity: Peaceful, observant ; A young woman, sitting on a park bench, watching children play; eye-level; Normal People; vibrant park with blooming flowers; cinematic
Characteristic
Shot : A young woman sitting on a bench in a park, looking out of frame with a wistful expression. She is surrounded by blooming pink trees and orange flowers, with a blurred background of people and trees.
Aesthetic Score : 0.7
Mood : melancholy, peaceful, serene
Quality
Entropy : 6.86
Noise : 78
Prompt Clip Score : 0.21
AI Evaluation
Likelihood of AI : 0.20
Image errors : The image shows slight artifacts in the form of blur and color banding. The image also suffers from a slight over-exposure in the highlights.
The Hacker in the Shadows
A young man, shrouded in darkness, sits hunched over a keyboard, his face illuminated by the blue glow of multiple computer screens. The atmosphere is intense, focused, and undeniably futuristic. The play of light and shadow creates an air of mystery, drawing you into the depths of his digital world.
Prompt
facial-expressions Curiosity: Intense, focused ; A gamer, hunched over a computer screen, eyes glued to the monitor; close-up; Gamer; dimly lit room with flashing lights from the screen; cinematic
Characteristic
Shot : A young man is sitting in front of a computer, typing on the keyboard. The room is dark, and the only light comes from the screens in the background. The man looks intense and focused.
Aesthetic Score : 0.7
Mood : serious, focused, intense
Quality
Entropy : 5.91
Noise : 61
Prompt Clip Score : 0.18
AI Evaluation
Likelihood of AI : 0.10
Image errors : The image is slightly blurry, and the subject’s face is a bit too dark.
Lost in the Crowd: A Moment of Intensity
A man stands amidst the bustling chaos of a market, his gaze locked directly on the viewer. The blurred background creates a sense of intimacy, drawing you into his world of focused intensity. This urban scene evokes a feeling of seriousness and tension, leaving you wondering what story lies behind his piercing stare.
Prompt
facial-expressions Curiosity: Intrigued, observant ; A man, walking through a crowded marketplace, his eyes darting around; eye-level; Single Person; bustling marketplace with colorful stalls and vendors; cinematic
Characteristic
Shot : A man stands in the middle of a crowded marketplace, with the camera focused on his face, background is blurry and mostly out of focus
Aesthetic Score : 0.7
Mood : serious, intense, contemplative
Quality
Entropy : 6.79
Noise : 77
Prompt Clip Score : 0.22
AI Evaluation
Likelihood of AI : 0.20
Image errors : No major artifacts or errors present, some minor noise in the background.
A Knight Amidst the Ashes: A City’s Fall in Flames
A lone knight stands defiant in a war-torn city, engulfed in fire and smoke. The scene evokes a sense of epic drama and somber reflection, highlighting the devastating consequences of conflict.
Prompt
facial-expressions Curiosity: Brave, resolute ; A hero, standing in the middle of a chaotic battle, looking determined; eye-level; Hero; smoke-filled battlefield with explosions and debris; cinematic
Characteristic
Shot : A man in armor is standing in a post-apocalyptic city, surrounded by fire and smoke.
Aesthetic Score : 0.6
Mood : grim, apocalyptic, desolate
Quality
Entropy : 6.90
Noise : 87
Prompt Clip Score : 0.23
AI Evaluation
Likelihood of AI : 0.70
Image errors : Some of the images seem to be slightly out of focus.
Candlelit Laughter: A Moment of Joy and Connection
A heartwarming scene of friends and family gathered around a table, bathed in the warm glow of candlelight. Their laughter and smiles radiate joy and intimacy, capturing the essence of shared moments and cherished connections.
Prompt
facial-expressions Curiosity: Joyful, connected ; A group of friends, gathered around a table, sharing stories and laughter; eye-level; Normal People; cozy living room with warm lighting; cinematic
Characteristic
Shot : A group of friends are having a meal together around a table lit by candles. The scene is warm and inviting, and the people are all smiling and laughing.
Aesthetic Score : 0.7
Mood : joyful, warm, intimate
Quality
Entropy : 6.80
Noise : 77
Prompt Clip Score : 0.22
AI Evaluation
Likelihood of AI : 0.20
Image errors : None
Lost in the Game: A Moment of Pure Joy
A young woman, bathed in soft light, is completely engrossed in her video game. Her smile and focused expression radiate pure joy and excitement, capturing the immersive power of gaming.
Prompt
facial-expressions Curiosity: Excited, engaged ; A gamer, holding a controller, eyes wide with excitement; close-up; Gamer; brightly lit gaming room with colorful lights; cinematic
Characteristic
Shot : A young woman is playing video games in a gaming room. She is wearing a headset and holding a video game controller.
Aesthetic Score : 0.7
Mood : joyful, focused, energetic
Quality
Entropy : 6.59
Noise : 71
Prompt Clip Score : 0.29
AI Evaluation
Likelihood of AI : 0.10
Image errors : There are some minor artifacts in the image, such as the slightly blurry background and the slightly soft edges of the woman’s hair.
Solitude on the Stormy Edge
A woman stands silhouetted against the dramatic backdrop of a stormy sea, her back to the viewer. The rugged cliffs and choppy waves create a sense of melancholy and isolation, emphasizing the dramatic lighting and her contemplative pose.
Prompt
facial-expressions Curiosity: Contemplative, introspective ; A woman, standing at the edge of a cliff, gazing out at the vast ocean; eye-level; Single Person; dramatic cliffside with crashing waves; cinematic
Characteristic
Shot : A woman stands on a cliff overlooking a stormy sea with a dramatic cliff face in the background
Aesthetic Score : 0.7
Mood : dramatic, melancholy, contemplative
Quality
Entropy : 6.79
Noise : 74
Prompt Clip Score : 0.25
AI Evaluation
Likelihood of AI : 0.10
Image errors : The image appears to be slightly overexposed, causing some loss of detail in the highlights.
Soldier Stands Amidst the Ruins of War
A lone soldier, silhouetted against a backdrop of flames, embodies the intensity and devastation of war. The burning city and the soldier’s stoic expression create a powerful and somber scene, highlighting the human cost of conflict.
Prompt
facial-expressions Curiosity: Brave, selfless ; A hero, standing in front of a burning building, ready to save people; eye-level; Hero; chaotic scene with smoke and flames; cinematic
Characteristic
Shot : A soldier stands in front of a burning building, the flames are high and the building is partially destroyed. The soldier is wearing a tactical vest and an American flag patch on his arm. There is smoke and debris all around. The image is shot from a low angle, looking up at the soldier.
Aesthetic Score : 0.6
Mood : intense, dramatic, chaotic
Quality
Entropy : 6.82
Noise : 78
Prompt Clip Score : 0.24
AI Evaluation
Likelihood of AI : 0.60
Image errors : Some pixelation artifacts are visible in the smoke and flames, especially in the background.
Conclusion
The results show that the generative AI model understood the scenes and matched the expected aesthetic well, but struggled with camera position. Here’s a breakdown:
- Camera Position: The model scored 0.2, indicating it did not perform well in capturing the intended camera position. The generated images often have a different camera angle or perspective than the prompt describes. For example, the soldier prompt asked for an eye-level shot, but the resulting image is described as shot from a low angle, looking up at the soldier.
- Shot Analysis: The model scored 0.55, which is considered good. This means the generated image captured the scene elements and composition reasonably well, but there might be some discrepancies compared to the prompt’s description.
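- Aesthetic Analysis: The model scored 0.09, which is considered very good. Unlike the two scores above, this value appears to measure the distance from the expected aesthetic, so lower is better. It indicates that the generated images’ aesthetic closely matched the expected aesthetic, suggesting the model successfully captured the desired visual style.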
Overall, the model demonstrated a good understanding of the scene and its composition, but struggled with accurately capturing the intended camera position. The aesthetic of the generated image was very close to the expected aesthetic.
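To put the per-image numbers side by side, the short sketch below averages three of the metrics listed for the ten images above. The values are copied from this post; the tuple layout and variable names are illustrative only, and these averages are not the same thing as the Camera Position, Shot Analysis, and Aesthetic Analysis scores in the conclusion, whose computation the post does not describe.

```python
from statistics import mean

# (title, aesthetic score, prompt CLIP score, likelihood of AI), as listed in this post.
results = [
    ("Silhouetted Solitude", 0.7, 0.24, 0.20),
    ("Silhouetted Hero", 0.7, 0.19, 0.80),
    ("Finding Serenity Amidst the Blossoms", 0.7, 0.21, 0.20),
    ("The Hacker in the Shadows", 0.7, 0.18, 0.10),
    ("Lost in the Crowd", 0.7, 0.22, 0.20),
    ("A Knight Amidst the Ashes", 0.6, 0.23, 0.70),
    ("Candlelit Laughter", 0.7, 0.22, 0.20),
    ("Lost in the Game", 0.7, 0.29, 0.10),
    ("Solitude on the Stormy Edge", 0.7, 0.25, 0.10),
    ("Soldier Stands Amidst the Ruins of War", 0.6, 0.24, 0.60),
]

mean_aesthetic = mean(r[1] for r in results)
mean_clip = mean(r[2] for r in results)
mean_ai = mean(r[3] for r in results)

# Image whose content the CLIP score rates as closest to its prompt.
best_clip = max(results, key=lambda r: r[2])
# Images the evaluator considers more likely than not to be AI-generated.
flagged = [r[0] for r in results if r[3] > 0.5]

print(f"mean aesthetic score: {mean_aesthetic:.2f}")
print(f"mean prompt CLIP score: {mean_clip:.3f}")
print(f"mean likelihood of AI: {mean_ai:.2f}")
print(f"best prompt match: {best_clip[0]}")
print(f"flagged as likely AI: {flagged}")
```

On these numbers, the three images flagged as likely AI-generated are the three produced from "Hero" prompts, while the everyday scenes all stay at 0.20 or below.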