AI's Artistic Eye: Capturing Emotion, Missing the Scene with Flux-dev
Facial expressions are a powerful tool for conveying emotions and telling stories. In the realm of generative AI, capturing these expressions accurately is crucial for creating compelling and engaging visuals. This blog post examines a case study where a generative AI model was tasked with generating images based on prompts describing facial expressions and scenes. While the model demonstrated impressive aesthetic capabilities, it struggled with accurately representing the intended camera position and scene details. This highlights the ongoing challenge of bridging the gap between human understanding and AI’s ability to interpret and translate complex prompts into visual representations.
Created with: flux-dev
Silhouetted Solitude: A Moment of Contemplation
A lone figure stands against the fiery backdrop of a setting sun, casting a long shadow across a vast, empty landscape. The scene evokes a sense of serenity and contemplation, tinged with a hint of loneliness. The stark contrast between the bright sun and the dark silhouette creates a dramatic effect, leaving the viewer to ponder the figure’s thoughts and the mysteries of the surrounding world.
Prompt
facial-expressions Curiosity: Melancholy, contemplative ; A lone figure, silhouetted against a setting sun; eye-level; Single Person; vast, empty desert landscape; cinematic
Characteristic
Shot : A lone woman stands silhouetted against a setting sun in a desert landscape.
Aesthetic Score : 0.7
Mood : tranquil, contemplative, serene
Quality
Entropy : 6.23
Noise : 38
Prompt Clip Score : 0.24
AI Evaluation
Likelihood of AI : 0.10
Image errors : No noticeable errors.
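Every prompt in this post follows the same semicolon-delimited layout, so it can be split into labelled parts before comparing the request with the result. The sketch below is one possible way to do that. The field names (`expression`, `subject`, `camera_position`, `character_type`, `setting`, `style`) are hypothetical labels chosen for illustration and are not part of flux-dev or of the original pipeline.

```python
from dataclasses import dataclass

# Hypothetical field names for the six prompt segments (an assumption of this sketch).
FIELDS = ("expression", "subject", "camera_position", "character_type", "setting", "style")


@dataclass
class PromptParts:
    expression: str
    subject: str
    camera_position: str
    character_type: str
    setting: str
    style: str


def parse_prompt(prompt: str) -> PromptParts:
    """Split a semicolon-delimited prompt into six labelled, whitespace-trimmed segments."""
    parts = [segment.strip() for segment in prompt.split(";")]
    if len(parts) != len(FIELDS):
        raise ValueError(f"expected {len(FIELDS)} segments, got {len(parts)}")
    return PromptParts(*parts)


example = (
    "facial-expressions Curiosity: Melancholy, contemplative ; "
    "A lone figure, silhouetted against a setting sun; eye-level; "
    "Single Person; vast, empty desert landscape; cinematic"
)
parsed = parse_prompt(example)
print(parsed.camera_position)  # eye-level
```

Having the camera position as its own field makes it easy to check a generated image's description against what was actually requested.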
Lost in the Glow: A Moment of Focus and Mystery
A young person sits engrossed in their computer, bathed in the soft glow of colorful lights. The dim room adds an air of intrigue, leaving us to wonder what secrets lie within the digital world they’re exploring.
Prompt
facial-expressions Curiosity: Intense, focused ; A gamer, hunched over a computer screen, eyes glued to the monitor; close-up; Gamer; dimly lit room with flashing lights from the screen; cinematic
Characteristic
Shot : A young man is sitting in a dimly lit room, looking at a computer screen. The room is lit by colorful lights, including a red light in the background. The man is wearing a dark hoodie and has a focused expression on his face.
Aesthetic Score : 0.6
Mood : focused, intense, digital
Quality
Entropy : 6.13
Noise : 70
Prompt Clip Score : 0.18
AI Evaluation
Likelihood of AI : 0.20
Image errors : The image has some noise and compression artifacts, especially in the darker areas.
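The post does not say how the Entropy figure is computed. Values between 6 and 7 are consistent with the Shannon entropy, in bits, of an 8-bit grayscale histogram, which has a maximum of 8. Assuming that definition, a minimal sketch looks like this; the function name `shannon_entropy` is illustrative only.

```python
import math
from collections import Counter
from typing import Iterable


def shannon_entropy(pixels: Iterable[int]) -> float:
    """Shannon entropy in bits of a sequence of grayscale pixel values."""
    counts = Counter(pixels)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no pixels given")
    # H = sum over observed values of -p * log2(p)
    return sum(-(c / total) * math.log2(c / total) for c in counts.values())


# A flat image carries no information; an image using all 256 levels equally is maximal.
flat = [128] * 1000
uniform = list(range(256)) * 4
print(shannon_entropy(flat), shannon_entropy(uniform))
```

Under this reading, higher entropy means a more evenly spread tonal range, which fits the silhouette image scoring lower than the busier market and coastal scenes.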
Lost in the City’s Tapestry
A solitary figure navigates the vibrant chaos of a bustling market, his pensive gaze reflecting the urban pulse. The blurred background creates a sense of isolation, drawing attention to his contemplative mood.
Prompt
facial-expressions Curiosity: Intrigued, observant ; A man, walking through a crowded marketplace, his eyes darting around; eye-level; Single Person; bustling marketplace with colorful stalls and vendors; cinematic
Characteristic
Shot : A man in a black jacket stands in a busy market, looking off to the side. The background is blurred, with colorful awnings and lights creating a sense of depth.
Aesthetic Score : 0.7
Mood : reflective, pensive, urban
Quality
Entropy : 6.71
Noise : 71
Prompt Clip Score : 0.23
AI Evaluation
Likelihood of AI : 0.10
Image errors : No noticeable errors in this image.
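The Prompt Clip Score is reported without a definition. A common convention is the cosine similarity between a CLIP embedding of the image and a CLIP embedding of the prompt, and the values here (roughly 0.2 to 0.3) are in the range that convention usually produces. The sketch below shows only the similarity step, under that assumption. The vectors are toy stand-ins, not real embeddings, and no CLIP model is loaded.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length, non-zero vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("zero vector has no direction")
    return dot / (norm_a * norm_b)


# Toy 3-dimensional vectors standing in for image and text embeddings (hypothetical values).
image_embedding = [1.0, 0.0, 0.0]
text_embedding = [1.0, 1.0, 0.0]
score = cosine_similarity(image_embedding, text_embedding)
print(round(score, 3))
```

Because the score depends only on direction, it measures how well image and prompt agree in embedding space rather than how polished the image looks.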
A Little Hero’s Dream: A Silhouette Against the City Lights
A young boy, cloaked in a superhero cape, stands on the edge of a cityscape bathed in twilight. His silhouette, framed by the twinkling lights below, evokes a sense of hope, solitude, and adventure. The dramatic use of light and shadow creates a captivating scene, hinting at a story waiting to unfold.
Prompt
facial-expressions Curiosity: Determined, hopeful ; A superhero, standing atop a skyscraper, looking out at the city; eye-level; Hero; bustling cityscape with neon lights; cinematic
Characteristic
Shot : A lone figure, seemingly a child, wearing a red cape, stands on a rooftop overlooking a cityscape at dusk, with a blurry background of lights.
Aesthetic Score : 0.7
Mood : mysterious, hopeful, contemplative
Quality
Entropy : 6.71
Noise : 93
Prompt Clip Score : 0.25
AI Evaluation
Likelihood of AI : 0.20
Image errors : The image appears to have some slight graininess and blur, particularly in the background.
Solitude on the Stormy Coast
A woman stands alone on a cliff edge, the wind whipping her hair as she gazes out at the tumultuous ocean below. The dramatic lighting and her solitary figure evoke a sense of isolation and contemplation, creating a powerful and melancholic scene.
Prompt
facial-expressions Curiosity: Contemplative, introspective ; A woman, standing at the edge of a cliff, gazing out at the vast ocean; eye-level; Single Person; dramatic cliffside with crashing waves; cinematic
Characteristic
Shot : A woman standing on a cliff overlooking the ocean, with waves crashing against the rocks below. The sky is cloudy and the air is misty.
Aesthetic Score : 0.8
Mood : dramatic, moody, contemplative
Quality
Entropy : 6.77
Noise : 72
Prompt Clip Score : 0.24
AI Evaluation
Likelihood of AI : 0.10
Image errors : No visible artifacts or errors
Man Stands Before the Flames, Adventure Beckons
A lone figure, clad in dark green and a vibrant red scarf, stands amidst a field, a roaring fire casting an intense glow behind him. The scene evokes a sense of adventure and drama, with the fire adding a touch of danger and excitement. The presence of others in the background suggests a larger story unfolding.
Prompt
facial-expressions Curiosity: Brave, resolute ; A hero, standing in the middle of a chaotic battle, looking determined; eye-level; Hero; smoke-filled battlefield with explosions and debris; cinematic
Characteristic
Shot : A man in a burgundy cloak, a dark green long-sleeved shirt, and a brown leather strap stands in front of a fiery backdrop, while several blurry figures stand around him.
Aesthetic Score : 0.7
Mood : serious, determined, epic
Quality
Entropy : 6.80
Noise : 74
Prompt Clip Score : 0.21
AI Evaluation
Likelihood of AI : 0.20
Image errors : No visible artifacts or errors.
Lost in the Flames: A Moment of Melancholy
A solitary figure, cloaked in darkness, stands before a roaring fire, his thoughtful expression reflecting the somber mood. The stark contrast between the fire’s light and the man’s silhouette creates a dramatic scene, emphasizing his isolation and the power of the flames.
Prompt
facial-expressions Curiosity: Brave, selfless ; A hero, standing in front of a burning building, ready to save people; eye-level; Hero; chaotic scene with smoke and flames; cinematic
Characteristic
Shot : A man in a hooded jacket stands in front of a fiery backdrop, silhouetted against a large blaze. Other figures in the distance appear to be firefighters or emergency personnel, suggesting an event of chaos and danger.
Aesthetic Score : 0.5
Mood : intense, dramatic, somber
Quality
Entropy : 6.70
Noise : 64
Prompt Clip Score : 0.24
AI Evaluation
Likelihood of AI : 0.20
Image errors : No significant errors; the image quality appears good. The fire may be slightly overexposed, and the figures in the distance appear blurry, which adds to the sense of distance.
A Moment of Quiet Contemplation
A young woman, dressed casually, sits on a park bench, her gaze fixed directly on the camera. Her relaxed posture and introspective expression evoke a sense of quiet contemplation, inviting the viewer to share in her moment of reflection.
Prompt
facial-expressions Curiosity: Peaceful, observant ; A young woman, sitting on a park bench, watching children play; eye-level; Normal People; vibrant park with blooming flowers; cinematic
Characteristic
Shot : A young woman wearing a light blue shirt sits on a park bench in a summery setting, with a green background behind her.
Aesthetic Score : 0.7
Mood : calm, relaxed, contemplative
Quality
Entropy : 6.69
Noise : 81
Prompt Clip Score : 0.21
AI Evaluation
Likelihood of AI : 0.10
Image errors : No significant errors; slight blurriness in the background, especially in the tree on the right.
Candlelit Laughter: A Cozy Gathering of Friends
A group of friends share warm smiles and laughter around a candlelit table, creating an intimate and inviting atmosphere. The soft lighting and relaxed expressions evoke a sense of closeness and friendship.
Prompt
facial-expressions Curiosity: Joyful, connected ; A group of friends, gathered around a table, sharing stories and laughter; eye-level; Normal People; cozy living room with warm lighting; cinematic
Characteristic
Shot : A group of friends is gathered around a table, enjoying a meal and conversation. The soft lighting creates a warm and intimate atmosphere.
Aesthetic Score : 0.7
Mood : cozy, warm, friendly
Quality
Entropy : 6.69
Noise : 75
Prompt Clip Score : 0.22
AI Evaluation
Likelihood of AI : 0.10
Image errors : There are no obvious artifacts or errors in the image.
Immersed in the Game: A Boy’s Moment of Excitement
A young boy, headphones on, is completely engrossed in his video game. His surprised expression and focused gaze capture the intensity and excitement of the moment, drawing you into his world of virtual adventure.
Prompt
facial-expressions Curiosity: Excited, engaged ; A gamer, holding a controller, eyes wide with excitement; close-up; Gamer; brightly lit gaming room with colorful lights; cinematic
Characteristic
Shot : A young boy wearing headphones is playing video games and looks surprised or excited. The scene is lit by colored light.
Aesthetic Score : 0.6
Mood : intense, excited, surprised
Quality
Entropy : 6.90
Noise : 63
Prompt Clip Score : 0.30
AI Evaluation
Likelihood of AI : 0.20
Image errors : No significant errors; some noise and grain may be present, but they are negligible.
Conclusion
The results show that the generative AI model performed well on aesthetics but struggled to reproduce the camera position and scene described in the prompts. Here’s a breakdown:
- Camera Position: The model scored 0.25, which is considered below average. This suggests that the model didn’t accurately capture the intended camera position described in the prompt.
- Shot Analysis: The model scored 0.49, which is also below average. This indicates that the model didn’t fully understand the scene described in the prompt and didn’t create an image that accurately reflects it.
- Aesthetic Analysis: The model scored 0.11, which is considered very good. This means that the generated image closely matched the expected aesthetic style, despite the issues with camera position and scene understanding.
Overall, the model seems to be better at capturing the desired aesthetic than understanding the scene and camera position. This suggests that the model might need further training to improve its ability to interpret and translate prompts into accurate visual representations.
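For a quick overview of the whole set, the per-image figures reported in the ten entries above can be averaged. The sketch below simply transcribes those values and takes their means. These averages are separate from the 0.25, 0.49, and 0.11 scores in the breakdown, whose computation the post does not describe, and the helper name `summarize` is illustrative.

```python
from statistics import mean

# Per-image metrics transcribed from the ten entries above, in order of appearance:
# (aesthetic score, entropy, noise, prompt CLIP score)
RESULTS = [
    (0.7, 6.23, 38, 0.24),
    (0.6, 6.13, 70, 0.18),
    (0.7, 6.71, 71, 0.23),
    (0.7, 6.71, 93, 0.25),
    (0.8, 6.77, 72, 0.24),
    (0.7, 6.80, 74, 0.21),
    (0.5, 6.70, 64, 0.24),
    (0.7, 6.69, 81, 0.21),
    (0.7, 6.69, 75, 0.22),
    (0.6, 6.90, 63, 0.30),
]

METRIC_NAMES = ("aesthetic", "entropy", "noise", "clip")


def summarize(rows):
    """Return the mean of each metric column, rounded to three decimals."""
    columns = zip(*rows)
    return {name: round(mean(col), 3) for name, col in zip(METRIC_NAMES, columns)}


summary = summarize(RESULTS)
for name, value in summary.items():
    print(f"{name}: {value}")
```

The aesthetic scores cluster tightly around 0.7, while the CLIP scores stay low across the board, which matches the overall picture of strong aesthetics and weaker prompt adherence.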
Sources:
- https://dramaresource.com/storytelling/
- https://seedsoftellers.eu/resources/the-body-language-for-young-tellers/
- https://digitalcollections.sit.edu/cgi/viewcontent.cgi?article=1288&context=sandanona&filename=1&type=additional
- https://citeseerx.ist.psu.edu/document?doi=7f842882e9bb1fa2c0e96939bc8d2c37e34e17c0&repid=rep1&type=pdf
- https://www.twinkl.co.uk/search?q=drama+facial+expression
- https://fal.ai/models/fal-ai/flux/dev/api