AI's Artistic Eye: Capturing Aesthetics, Missing the Story with Flux-schnell
AI image generation is evolving rapidly, and current models can produce striking visuals from text prompts. Translating a complex scene into a compelling image remains a challenge, however. This post examines an experiment in which a generative model was given detailed scene descriptions, and it highlights the model’s strengths and weaknesses in capturing both the aesthetics and the narrative of each scene. Every prompt uses a ‘dramatic style facial-expressions’ cue intended to heighten the emotional impact of the image, and the ten examples below apply it in a range of contexts.
Created with: flux-schnell
Silhouetted Against the Setting Sun: A Moment of Solitude
A lone figure stands in silhouette against a vibrant sunset, evoking a sense of melancholy and contemplation. The dramatic lighting and the vastness of the landscape emphasize the individual’s isolation and introspective mood.
Prompt
facial-expressions Curiosity: Melancholy, contemplative ; A lone figure, silhouetted against a setting sun; eye-level; Single Person; vast, empty desert landscape; cinematic
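The prompts in this post appear to follow a fixed, semicolon-separated template: an expression cue, the subject, the camera position, a subject category, the setting, and a style keyword. As a minimal sketch, the Python below splits a prompt into those six parts. The field names (`expression`, `subject`, `camera`, `category`, `setting`, `style`) are illustrative labels chosen here, not terminology from the experiment.

```python
from typing import NamedTuple


class ScenePrompt(NamedTuple):
    # Field names are illustrative labels, not part of the original experiment.
    expression: str
    subject: str
    camera: str
    category: str
    setting: str
    style: str


def parse_prompt(prompt: str) -> ScenePrompt:
    """Split a semicolon-separated prompt into its six assumed fields."""
    parts = [part.strip() for part in prompt.split(";")]
    if len(parts) != 6:
        raise ValueError(f"expected 6 fields, got {len(parts)}")
    return ScenePrompt(*parts)


prompt = (
    "facial-expressions Curiosity: Melancholy, contemplative ; "
    "A lone figure, silhouetted against a setting sun; eye-level; "
    "Single Person; vast, empty desert landscape; cinematic"
)
fields = parse_prompt(prompt)
print(fields.camera)   # eye-level
print(fields.setting)  # vast, empty desert landscape
```

Separating the fields this way makes it easier to compare what was requested (for example the camera position) with what the evaluation later reports for the generated image.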
Characteristic
Shot : A silhouette of a man standing in a desert landscape during sunset.
Aesthetic Score : 0.6
Mood : melancholy, contemplative, lonely
Quality
Entropy : 5.66
Noise : 20
Prompt Clip Score : 0.18
AI Evaluation
Likelihood of AI : 0.20
Image errors : The image is slightly overexposed, which makes it difficult to see the man’s face.
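The post does not say how the Entropy figure is computed. One common definition is the Shannon entropy of the grayscale intensity histogram, which ranges from 0 to 8 bits for 8-bit images, and the values reported here fall inside that range. Assuming that definition, a minimal sketch looks like this (the function name `shannon_entropy` is hypothetical, and the input is a plain list of intensities rather than a real image):

```python
import math
from collections import Counter
from typing import Iterable


def shannon_entropy(pixels: Iterable[int]) -> float:
    """Shannon entropy, in bits, of a sequence of grayscale intensities."""
    counts = Counter(pixels)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no pixels given")
    # Sum p * log2(1 / p) over every intensity that actually occurs.
    return sum((n / total) * math.log2(total / n) for n in counts.values())


flat = [128] * 64        # a single gray level: no variation at all
even = list(range(256))  # every 8-bit level exactly once
print(shannon_entropy(flat))  # 0.0
print(shannon_entropy(even))  # 8.0
```

Under this reading, a low value such as the 5.66 above would be consistent with an image dominated by a few tones, as a silhouette against a sunset tends to be.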
Superman Takes Flight Over a Dazzling Cityscape
A dramatic silhouette of Superman stands on a rooftop, bathed in the glow of a vibrant city at night. The Empire State Building shines in the distance, adding to the heroic and hopeful mood of the scene.
Prompt
facial-expressions Curiosity: Determined, hopeful ; A superhero, standing atop a skyscraper, looking out at the city; eye-level; Hero; bustling cityscape with neon lights; cinematic
Characteristic
Shot : A young man dressed as Superman stands on a rooftop overlooking a cityscape at dusk.
Aesthetic Score : 0.7
Mood : heroic, dramatic, hopeful
Quality
Entropy : 6.85
Noise : 75
Prompt Clip Score : 0.21
AI Evaluation
Likelihood of AI : 0.60
Image errors : The image appears slightly overexposed, resulting in a loss of detail in the highlights. There is also some noticeable grain in the background.
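The Prompt Clip Score is not defined in the post either. A common way to compute such a score is the cosine similarity between a CLIP embedding of the prompt and a CLIP embedding of the image. The sketch below shows only that final similarity step, assuming that interpretation; the two short vectors are toy stand-ins, since real embeddings would come from a CLIP model that is not shown here.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("zero vector has no direction")
    return dot / (norm_a * norm_b)


# Toy stand-ins for a text embedding and an image embedding (assumption).
text_embedding = [3.0, 4.0]
image_embedding = [4.0, 3.0]
print(cosine_similarity(text_embedding, image_embedding))  # 0.96
```

If the scores in this post are raw cosine similarities of this kind, they are best read relative to one another rather than as percentages.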
A Moment of Quiet Reflection in the Park
A young woman sits alone on a bench, her pensive expression and posture hinting at a moment of quiet contemplation. The soft lighting and blurred background create a sense of isolation, emphasizing the woman’s introspective mood. The presence of other figures in the distance adds a touch of loneliness to the scene.
Prompt
facial-expressions Curiosity: Peaceful, observant ; A young woman, sitting on a park bench, watching children play; eye-level; Normal People; vibrant park with blooming flowers; cinematic
Characteristic
Shot : A young woman sits on a park bench in a relaxed pose, with a blurred background of greenery and other people.
Aesthetic Score : 0.7
Mood : calm, contemplative, serene
Quality
Entropy : 6.79
Noise : 94
Prompt Clip Score : 0.24
AI Evaluation
Likelihood of AI : 0.20
Image errors : No noticeable errors.
Lost in the Glow: A Moment of Intense Focus
A young man, bathed in the blue light of his computer screen, is completely absorbed in his work. The dimly lit room and out-of-focus background lights create a sense of isolation and intensity, highlighting the subject’s focused and contemplative mood.
Prompt
facial-expressions Curiosity: Intense, focused ; A gamer, hunched over a computer screen, eyes glued to the monitor; close-up; Gamer; dimly lit room with flashing lights from the screen; cinematic
Characteristic
Shot : A young man is sitting in front of a computer screen, wearing headphones. The lighting is dim and blueish, creating a sense of focus and concentration.
Aesthetic Score : 0.7
Mood : intense, focused, contemplative
Quality
Entropy : 5.96
Noise : 59
Prompt Clip Score : 0.20
AI Evaluation
Likelihood of AI : 0.20
Image errors : The image is slightly grainy, particularly in the darker areas.
A Moment of Reflection in the Marketplace
A man, captured in a close-up shot, walks through a bustling marketplace, his gaze fixed directly on the camera. His neutral expression suggests a moment of introspection, inviting the viewer to connect with his inner thoughts and feelings.
Prompt
facial-expressions Curiosity: Intrigued, observant ; A man, walking through a crowded marketplace, his eyes darting around; eye-level; Single Person; bustling marketplace with colorful stalls and vendors; cinematic
Characteristic
Shot : A close-up portrait of a young man with a beard, standing in a busy outdoor market setting. The image is a bit cluttered with people and objects in the background.
Aesthetic Score : 0.6
Mood : serious, introspective, contemplative
Quality
Entropy : 6.80
Noise : 92
Prompt Clip Score : 0.20
AI Evaluation
Likelihood of AI : 0.10
Image errors : The image has a slight amount of grain and some noise, particularly in the darker areas of the image.
One Man Stands Against the Flames
A young soldier, clad in military gear, stands defiant amidst a raging inferno. Smoke and flames billow behind him, highlighting his isolation and determination in the face of overwhelming odds. This dramatic scene evokes a sense of suspense and urgency, capturing the intensity of war.
Prompt
facial-expressions Curiosity: Brave, resolute ; A hero, standing in the middle of a chaotic battle, looking determined; eye-level; Hero; smoke-filled battlefield with explosions and debris; cinematic
Characteristic
Shot : A young man in military gear stands in front of a blurry background of explosions and smoke. The scene is set in a war-torn city, and the man looks determined and ready for battle.
Aesthetic Score : 0.7
Mood : intense, dramatic, gritty
Quality
Entropy : 6.73
Noise : 74
Prompt Clip Score : 0.24
AI Evaluation
Likelihood of AI : 0.20
Image errors : No noticeable artifacts or errors. The background is slightly blurry, which could be intentional.
Friends Gather for a Cozy Evening of Laughter and Joy
A group of friends share a warm and inviting moment, filled with laughter and good company. The warm lighting and genuine smiles create a sense of happiness and togetherness, capturing the essence of a perfect evening with loved ones.
Prompt
facial-expressions Curiosity: Joyful, connected ; A group of friends, gathered around a table, sharing stories and laughter; eye-level; Normal People; cozy living room with warm lighting; cinematic
Characteristic
Shot : A group of four friends are gathered around a table, laughing and enjoying each other’s company. They are drinking and having a good time.
Aesthetic Score : 0.7
Mood : happy, relaxed, casual
Quality
Entropy : 6.86
Noise : 101
Prompt Clip Score : 0.25
AI Evaluation
Likelihood of AI : 0.10
Image errors : There are no significant errors in the image.
Caught in the Moment: A Boy’s Video Game Excitement
A close-up shot captures a young boy’s intense focus and surprised expression as he plays a video game. The blur in the background adds to the sense of excitement and energy, making this a truly captivating image.
Prompt
facial-expressions Curiosity: Excited, engaged ; A gamer, holding a controller, eyes wide with excitement; close-up; Gamer; brightly lit gaming room with colorful lights; cinematic
Characteristic
Shot : A young boy playing video games with a controller in his hands. He’s wearing a yellow hoodie, and his eyes are wide with excitement. There’s a blurred figure in the background.
Aesthetic Score : 0.7
Mood : excited, intense, playful
Quality
Entropy : 6.89
Noise : 76
Prompt Clip Score : 0.29
AI Evaluation
Likelihood of AI : 0.10
Image errors : The boy’s left eye appears slightly blurred and distorted. The background seems a little too blurred.
Contemplating the Vastness: A Woman on the Edge of the World
A solitary figure stands on a windswept cliff, gazing out at a turbulent ocean. The dramatic contrast between the woman’s small form and the vastness of the sea creates a powerful sense of contemplation and longing. The crashing waves and cloudy sky add to the mood of serenity and drama.
Prompt
facial-expressions Curiosity: Contemplative, introspective ; A woman, standing at the edge of a cliff, gazing out at the vast ocean; eye-level; Single Person; dramatic cliffside with crashing waves; cinematic
Characteristic
Shot : A woman stands on a cliff overlooking a vast, stormy ocean. The cliff is jagged and rocky, and the waves are crashing against the shore.
Aesthetic Score : 0.7
Mood : dramatic, pensive, moody
Quality
Entropy : 6.78
Noise : 97
Prompt Clip Score : 0.22
AI Evaluation
Likelihood of AI : 0.10
Image errors : No obvious errors. The image is slightly blurry.
Man Stands Defiant Against Burning Building
A lone figure, clad in a dark jacket, stands resolute in front of a blazing inferno. His serious expression and the intense flames create a dramatic scene, hinting at a story of courage and resilience in the face of danger.
Prompt
facial-expressions Curiosity: Brave, selfless ; A hero, standing in front of a burning building, ready to save people; eye-level; Hero; chaotic scene with smoke and flames; cinematic
Characteristic
Shot : A man in a dark jacket stands in front of a burning building. The flames are visible behind him, and his face is lit by the fire.
Aesthetic Score : 0.7
Mood : dramatic, suspenseful, dark
Quality
Entropy : 6.69
Noise : 79
Prompt Clip Score : 0.24
AI Evaluation
Likelihood of AI : 0.10
Image errors : The image is well-exposed and the colors are vibrant. There are no noticeable artifacts or errors.
Conclusion
The image analysis indicates that the generative AI model performed well on aesthetics and camera position but struggled to understand the scenes described in the prompts.
Here’s a breakdown:
- Aesthetic Analysis: The model achieved a score of 0.1, which falls within the “very good” range of -0.2 to 0.1. This means the generated images closely matched the expected aesthetic style.
- Camera Position Analysis: The model was rated “good”, the band that covers scores from 0.5 to 0.75, which suggests it captured the camera position described in the prompts reasonably well. The listed score of 0.1 does not fall inside that band, so the figure itself should be treated with caution.
- Shot Analysis: The model scored 0.425, which falls below the “good” band of 0.5 to 0.75. This indicates that the model had difficulty understanding the scenes described in the prompts and translating them into the generated images.
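The breakdown above compares each score with a named band. As a quick sanity check, the sketch below tests the three listed scores against the bands quoted in the list. Treating both ends of a band as inclusive is an assumption, and `in_band` is a hypothetical helper, not part of the original analysis.

```python
def in_band(score: float, low: float, high: float) -> bool:
    """True if score lies in [low, high]; inclusive bounds are an assumption."""
    return low <= score <= high


# (score, band label, band bounds), copied from the breakdown above.
results = {
    "aesthetic": (0.1, "very good", (-0.2, 0.1)),
    "camera position": (0.1, "good", (0.5, 0.75)),
    "shot": (0.425, "good", (0.5, 0.75)),
}

for name, (score, label, (low, high)) in results.items():
    verdict = "inside" if in_band(score, low, high) else "outside"
    print(f"{name}: {score} is {verdict} the '{label}' band [{low}, {high}]")
```

Run as written, this reports the aesthetic score as inside its band and the camera position and shot scores as outside theirs.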
Overall, the model demonstrated strong performance in capturing the desired aesthetic and camera position, but struggled with accurately interpreting the scene. This suggests that the model may need further training to improve its ability to understand and represent complex scenes.
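For context, the per-image figures listed for the ten images can be summarized with a few lines of Python. The values below are copied from the records in this post, in order of appearance; the variable names are my own, and the summary is only a convenience, not part of the original analysis.

```python
from statistics import mean

# Per-image values copied from the ten records in this post, in order.
aesthetic_scores = [0.6, 0.7, 0.7, 0.7, 0.6, 0.7, 0.7, 0.7, 0.7, 0.7]
clip_scores = [0.18, 0.21, 0.24, 0.20, 0.20, 0.24, 0.25, 0.29, 0.22, 0.24]

print(round(mean(aesthetic_scores), 2))    # 0.68
print(round(mean(clip_scores), 3))         # 0.227
print(min(clip_scores), max(clip_scores))  # 0.18 0.29
```

The lowest prompt-image score belongs to the desert silhouette and the highest to the boy with the controller, while the aesthetic scores barely vary across the set.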
Sources:
- https://dramaresource.com/storytelling/
- https://seedsoftellers.eu/resources/the-body-language-for-young-tellers/
- https://digitalcollections.sit.edu/cgi/viewcontent.cgi?article=1288&context=sandanona&filename=1&type=additional
- https://citeseerx.ist.psu.edu/document?doi=7f842882e9bb1fa2c0e96939bc8d2c37e34e17c0&repid=rep1&type=pdf
- https://www.twinkl.co.uk/search?q=drama+facial+expression
- https://fal.ai/models/fal-ai/flux/schnell/api