AI's Artistic Eye: Capturing Aesthetics, Missing the Shot with Flux-schnell
Generative AI models can now produce images that mimic human artistry, but their ability to translate complex visual instructions into pictures is still limited. This analysis examines how one such model, Flux-schnell, renders images from detailed scene descriptions, focusing on how well it follows instructions about camera position, shot composition, and aesthetic details. The results reveal a clear split: the model captures the desired aesthetic well but struggles to follow camera-position and shot-composition instructions, which shows how far these models still are from fully understanding complex visual concepts.
Created with: flux-schnell
Lost in Thought: A Man’s Pensive Gaze in the Urban Maze
A solitary figure stands on a bustling city street, his gaze fixed on something unseen in the distance. The blurred background adds to the mystery, leaving the viewer to wonder what has captured his attention. The mood is pensive, urban, and tinged with intrigue, creating a captivating image that invites contemplation.
Prompt
facial-expressions Daydreaming: Melancholy, lost in thought ; A lone figure; eye-level; Single Person; bustling city street; cinematic
Characteristic
Shot : A man is looking up at the sky in a city street. The scene is busy with people walking and cars driving by. The buildings in the background are out of focus, giving the image a sense of depth.
Aesthetic Score : 0.6
Mood : pensive, mysterious, urban
Quality
Entropy : 6.77
Noise : 72
Prompt Clip Score : 0.23
AI Evaluation
Likelihood of AI : 0.20
Image errors : No major errors; the slight blurriness in the background is intentional.
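Every prompt in this post appears to follow the same template: a category and theme, the expression after a colon, then subject, camera angle, character type, setting, and style separated by semicolons. The post does not say how the prompts were generated, so the following is only a minimal sketch, assuming that layout; the `ScenePrompt` class, its field names, and `parse_prompt` are hypothetical.

```python
from dataclasses import dataclass


# Hypothetical field names inferred from the prompts in this post;
# the post itself does not document a prompt schema.
@dataclass
class ScenePrompt:
    category: str    # e.g. "facial-expressions"
    theme: str       # e.g. "Daydreaming"
    expression: str  # e.g. "Melancholy, lost in thought"
    subject: str     # e.g. "A lone figure"
    camera: str      # e.g. "eye-level"
    character: str   # e.g. "Single Person"
    setting: str     # e.g. "bustling city street"
    style: str       # e.g. "cinematic"

    def render(self) -> str:
        # Rebuild the observed layout: "<category> <theme>: <expression> ; <a>; <b>; ..."
        head = f"{self.category} {self.theme}: {self.expression}"
        tail = "; ".join([self.subject, self.camera, self.character, self.setting, self.style])
        return f"{head} ; {tail}"


def parse_prompt(text: str) -> ScenePrompt:
    # The expression is separated from the remaining fields by " ; " (space before the semicolon).
    head, tail = text.split(" ; ", 1)
    prefix, expression = head.split(": ", 1)
    category, theme = prefix.split(" ", 1)
    # The five remaining fields are plain "; "-separated values.
    subject, camera, character, setting, style = [part.strip() for part in tail.split(";")]
    return ScenePrompt(category, theme, expression, subject, camera, character, setting, style)


if __name__ == "__main__":
    prompt = parse_prompt(
        "facial-expressions Daydreaming: Melancholy, lost in thought ; A lone figure; "
        "eye-level; Single Person; bustling city street; cinematic"
    )
    print(prompt.camera, "|", prompt.setting)
```

Splitting the prompt this way makes it easy to line up the requested camera field (for example "high angle" or "close-up") against what the Shot description reports for each image.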
Superman Takes Flight in the City of Shadows
A lone figure, clad in the iconic red and blue, stands atop a rooftop, bathed in the dramatic glow of the city lights. The towering silhouette of a church steeple in the distance adds to the sense of mystery and heroism. This image captures the essence of Superman’s power and his unwavering commitment to justice.
Prompt
facial-expressions Daydreaming: Confident, determined ; A superhero standing on a rooftop; high angle; Hero; cityscape at night; cinematic
Characteristic
Shot : A man dressed as Superman stands on a rooftop overlooking a city skyline. The city lights and the twilight sky create a dramatic backdrop.
Aesthetic Score : 0.6
Mood : dramatic, heroic, contemplative
Quality
Entropy : 6.79
Noise : 80
Prompt Clip Score : 0.23
AI Evaluation
Likelihood of AI : 0.10
Image errors : The image has slight blurriness and some graininess, particularly in the background.
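The post reports an Entropy value for each image without defining it. One common definition, assumed here, is the Shannon entropy of the 8-bit grayscale intensity histogram, measured in bits: 0 for a single flat tone, up to 8 when all 256 levels occur equally often. The values reported in this post (5.82 to 6.89) would fit that scale, but the actual metric may differ. A minimal sketch that takes a flat list of pixel values (for a real image, Pillow's `Image.open(path).convert("L").getdata()` yields such values):

```python
import math
from collections import Counter


def shannon_entropy(pixels: list[int]) -> float:
    """Shannon entropy, in bits, of the histogram of 8-bit grayscale pixel values."""
    if not pixels:
        return 0.0
    n = len(pixels)
    counts = Counter(pixels)  # how often each intensity level occurs
    # H = sum over occurring levels of p * log2(1 / p), with p = count / n
    return sum((c / n) * math.log2(n / c) for c in counts.values())


if __name__ == "__main__":
    flat = [128] * 1000                   # one tone only
    two_tone = [0] * 500 + [255] * 500    # two equally common tones
    gradient = list(range(256))           # every level exactly once
    print(shannon_entropy(flat), shannon_entropy(two_tone), shannon_entropy(gradient))
```

Under this assumption, a higher value means the image spreads its pixels across more intensity levels; it says nothing on its own about whether those levels form a good composition.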
Lost in Thought: A Moment of Calm at the Cafe
A young woman finds solace in a warm, inviting cafe, where her thoughtful expression and the soft lighting create a mood of calm contemplation. The scene captures a moment of introspective peace and invites viewers to share in its quiet beauty.
Prompt
facial-expressions Daydreaming: Peaceful, content ; A woman sipping coffee in a cafe; eye-level; Normal People; warm, inviting cafe interior; cinematic
Characteristic
Shot : A young woman with long dark hair sits in a cafe, holding a white mug in her hands, and looking off to the side.
Aesthetic Score : 0.7
Mood : thoughtful, relaxed, cozy
Quality
Entropy : 6.66
Noise : 78
Prompt Clip Score : 0.30
AI Evaluation
Likelihood of AI : 0.20
Image errors : No noticeable artifacts or errors
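Prompt Clip Score is likewise left undefined. A common approach, assumed here, is to embed both the image and the prompt text with a CLIP model and take the cosine similarity of the two embedding vectors. The 0.21 to 0.30 values in this post would be consistent with that, but the post names neither the CLIP variant nor any rescaling. The sketch below covers only the similarity step; the embeddings are assumed to come from elsewhere (for example `CLIPModel.get_image_features` and `get_text_features` in Hugging Face Transformers), and the vectors in the usage block are made up.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two embedding vectors, in [-1, 1]."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cannot compare a zero vector")
    return dot / (norm_a * norm_b)


if __name__ == "__main__":
    # Made-up 4-dimensional vectors standing in for real CLIP embeddings,
    # which typically have hundreds of dimensions.
    image_embedding = [0.2, 0.1, 0.7, 0.3]
    text_embedding = [0.1, 0.6, 0.2, 0.4]
    score = cosine_similarity(image_embedding, text_embedding)
    print(f"prompt CLIP score (sketch): {score:.2f}")
```

Because cosine similarity ignores vector length, the score reflects only how closely the image and prompt point in the same direction in the shared embedding space.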
Intense Focus: A Moment of Discovery
A young man wearing a headset, eyes wide with surprise, stares intently ahead. The low light and close-up shot amplify the intensity of his focus, hinting at a moment of revelation or a critical decision.
Prompt
facial-expressions Daydreaming: Engrossed, excited ; A gamer intensely focused on a screen; close-up; Gamer; dimly lit room with gaming peripherals; cinematic
Characteristic
Shot : A young man wearing a headset is intensely focused on something, likely a computer screen or a game. The lighting is dramatic and emphasizes his profile.
Aesthetic Score : 0.7
Mood : intense, focused, determined
Quality
Entropy : 5.82
Noise : 53
Prompt Clip Score : 0.25
AI Evaluation
Likelihood of AI : 0.20
Image errors : The image appears to have slight noise in the shadows, which can be attributed to a low-light setting or post-processing. The lighting is somewhat uneven.
A Child’s Window to Wonder
A young child gazes out of a window, their pensive expression hinting at a world of dreams and possibilities. The lush green garden beyond offers a glimpse of hope and beauty, inviting the viewer to share in the child’s contemplative mood.
Prompt
facial-expressions Daydreaming: Curious, imaginative ; A child staring out a window; eye-level; Single Person; lush green garden; cinematic
Characteristic
Shot : A young child is looking out of a window; their face is slightly out of focus and the background is blurred.
Aesthetic Score : 0.6
Mood : pensive, contemplative, curious
Quality
Entropy : 6.49
Noise : 76
Prompt Clip Score : 0.26
AI Evaluation
Likelihood of AI : 0.10
Image errors : The image has a slight blur, particularly on the child’s face, which could be due to the camera settings, low light, or motion blur.
A Knight’s Journey Through the Misty Forest
A lone knight in shining armor rides through a sun-dappled forest, his path shrouded in mist. The dramatic play of light and shadow creates a sense of mystery and adventure, hinting at the noble quest that lies ahead.
Prompt
facial-expressions Daydreaming: Brave, adventurous ; A knight in shining armor riding through a forest; wide shot; Hero; mystical forest with dappled sunlight; cinematic
Characteristic
Shot : A young person in medieval armor is riding a horse through a forest. The sun is shining through the trees and creating a warm glow on the scene. The focus is on the rider, but the forest is also well defined.
Aesthetic Score : 0.7
Mood : mysterious, whimsical, adventurous
Quality
Entropy : 6.73
Noise : 92
Prompt Clip Score : 0.29
AI Evaluation
Likelihood of AI : 0.20
Image errors : Slight noise reduction appears to have been applied; this is not always ideal, but here it does not distract.
Friends Sharing Laughter and Joy on a Sunny Picnic
A heartwarming scene of four friends gathered on a checkered picnic blanket, radiating joy and laughter under the warm, natural light. The image captures the essence of friendship and carefree moments spent together.
Prompt
facial-expressions Daydreaming: Joyful, carefree ; A group of friends laughing together at a picnic; eye-level; Normal People; sunny park with picnic blanket; cinematic
Characteristic
Shot : A group of friends having a picnic in a park, laughing and enjoying themselves.
Aesthetic Score : 0.7
Mood : joyful, friendly, carefree
Quality
Entropy : 6.89
Noise : 98
Prompt Clip Score : 0.27
AI Evaluation
Likelihood of AI : 0.10
Image errors : No significant image errors. The image is well-lit and the colors are balanced.
Lost in the Code: A Young Man’s Intense Focus in a Futuristic Setting
A young man sits in a dimly lit room, his gaze fixed on a computer keyboard. Headphones on, he’s deeply engrossed in his work, creating an atmosphere of mystery and intrigue. The vibrant colors on the monitor in the background add a touch of excitement and dynamism to the scene, hinting at a futuristic world where technology reigns supreme.
Prompt
facial-expressions Daydreaming: Thrilled, competitive ; A gamer’s hands rapidly moving across a keyboard; close-up; Gamer; brightly lit gaming setup with glowing screen; cinematic
Characteristic
Shot : A young man wearing headphones sits at a desk in a dimly lit room. He is using a computer keyboard, and there are two computer monitors in the background.
Aesthetic Score : 0.7
Mood : focused, serious, intense
Quality
Entropy : 6.72
Noise : 69
Prompt Clip Score : 0.27
AI Evaluation
Likelihood of AI : 0.10
Image errors : The image is slightly blurry in the background. There is also some noise in the image, especially around the subject’s face.
Lost in the Moment: A Dreamy Beachscape
A young woman finds solace on a tranquil beach, her hair dancing in the wind. The soft light and shadows cast a dreamy spell, while the calming ocean whispers tales of peace and contemplation.
Prompt
facial-expressions Daydreaming: Reflective, introspective ; A woman walking alone on a beach; eye-level; Single Person; vast, empty beach with crashing waves; cinematic
Characteristic
Shot : A young woman standing on a beach, looking to the right side of the frame with a thoughtful expression. The ocean is in the background, with waves breaking on the shore.
Aesthetic Score : 0.7
Mood : peaceful, contemplative, serene
Quality
Entropy : 6.70
Noise : 87
Prompt Clip Score : 0.21
AI Evaluation
Likelihood of AI : 0.20
Image errors : No noticeable artifacts or errors.
Soaring High: A Young Superman Embraces the Adventure
This image captures the essence of youthful optimism and adventure. A young man wearing a Superman shirt and a backpack flies through the clouds, his determined gaze fixed on the sky. The dramatic lighting creates a halo effect around his head, adding a touch of the divine to his heroic journey.
Prompt
facial-expressions Daydreaming: Empowered, triumphant ; A superhero soaring through the sky; high angle; Hero; dramatic cloudscape with city skyline in the distance; cinematic
Characteristic
Shot : A young man in a Superman t-shirt, with arms raised, is looking up at the sky. There are clouds in the background.
Aesthetic Score : 0.6
Mood : optimistic, hopeful, adventurous
Quality
Entropy : 6.69
Noise : 73
Prompt Clip Score : 0.26
AI Evaluation
Likelihood of AI : 0.50
Image errors : None
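The per-image scores above can be summarized with simple arithmetic. The sketch below copies the Aesthetic Score, Entropy, and Prompt Clip Score values from the ten records in order of appearance (the short titles are abbreviations added for readability) and reports their means along with the best and worst prompt match.

```python
from statistics import mean

# Values copied from the ten image records, in order of appearance.
titles = [
    "Urban Maze", "City of Shadows", "Cafe", "Moment of Discovery", "Window to Wonder",
    "Misty Forest", "Sunny Picnic", "Lost in the Code", "Dreamy Beachscape", "Young Superman",
]
aesthetic = [0.6, 0.6, 0.7, 0.7, 0.6, 0.7, 0.7, 0.7, 0.7, 0.6]
entropy = [6.77, 6.79, 6.66, 5.82, 6.49, 6.73, 6.89, 6.72, 6.70, 6.69]
clip = [0.23, 0.23, 0.30, 0.25, 0.26, 0.29, 0.27, 0.27, 0.21, 0.26]

# Index of the highest and lowest prompt CLIP score.
best = max(range(len(clip)), key=clip.__getitem__)
worst = min(range(len(clip)), key=clip.__getitem__)

if __name__ == "__main__":
    print(f"mean aesthetic score: {mean(aesthetic):.2f}")
    print(f"mean entropy:         {mean(entropy):.2f}")
    print(f"mean prompt CLIP:     {mean(clip):.3f}")
    print(f"highest CLIP score: {titles[best]} ({clip[best]:.2f})")
    print(f"lowest CLIP score:  {titles[worst]} ({clip[worst]:.2f})")
```

Across the ten images the mean aesthetic score is 0.66 and the mean prompt CLIP score is 0.257; the cafe scene matches its prompt best (0.30) and the beach scene worst (0.21).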
Conclusion
The results show that the generative AI model captured the desired aesthetic well but struggled with camera position and shot composition.
Here’s a breakdown:
- Camera Position: The model scored 0.26, which is below the “good” range of 0.5 to 0.75. This suggests that the model didn’t quite capture the intended camera positions as described in the prompt.
- Shot Analysis: The model scored 0.45, also below the “good” range. This indicates that the model didn’t fully understand the scene described in the prompt and didn’t create the expected shot composition.
- Aesthetic Analysis: The model scored 0.14, just above the upper limit of the “very good” range of -0.2 to 0.1. This suggests that the generated images’ aesthetic came close to the expected aesthetic described in the prompt.
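The range checks in this breakdown can be made mechanical. The sketch below encodes each band quoted in the text and tests the reported score against it. It assumes the bounds are inclusive, and the `BANDS` table and `in_band` helper are hypothetical names; the post does not say whether higher or lower is better for the aesthetic metric. Checking each score this way shows that all three fall outside their quoted bands, including the aesthetic score of 0.14, which sits just above the 0.1 upper limit.

```python
# Band limits quoted in the breakdown; inclusive bounds are an assumption.
BANDS = {
    "camera_position": ("good", 0.5, 0.75),
    "shot_analysis": ("good", 0.5, 0.75),
    "aesthetic_analysis": ("very good", -0.2, 0.1),
}

# Scores reported in the breakdown.
SCORES = {"camera_position": 0.26, "shot_analysis": 0.45, "aesthetic_analysis": 0.14}


def in_band(metric: str, score: float) -> bool:
    """True if the score lies within the quoted band for this metric."""
    _, low, high = BANDS[metric]
    return low <= score <= high


if __name__ == "__main__":
    for metric, score in SCORES.items():
        label, low, high = BANDS[metric]
        status = "inside" if in_band(metric, score) else "outside"
        print(f"{metric}: {score:.2f} is {status} the '{label}' band [{low}, {high}]")
```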
Overall: While the model excelled in capturing the desired aesthetic, it struggled with accurately interpreting the camera position and shot composition instructions.