AI Captures the Essence, But Misses the Angle: A Look at 'style-aesthetic' Generation with Flux-dev
The ‘style-aesthetic’ approach to image generation aims to capture the essence of a scene, not just its literal representation. This means understanding the desired mood, emotion, and visual style and translating them into a compelling image. In this experiment, we tested an AI model’s ability to generate images from prompts that combined a specific scene, a camera position, and a ‘style-aesthetic’ descriptor. The results show that the model captures the desired aesthetic well but struggles with accurate camera positioning, which sheds light on the challenges and opportunities in developing AI models that can truly understand and translate human creative intent.
Created with: flux-dev
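Every prompt in this experiment follows the same semicolon-separated pattern: a style tag with mood words, then the scene, the camera position, a theme, the setting, and a closing "cinematic" keyword. The sketch below shows one way such prompts could be assembled. It is an illustration only: the `StylePrompt` class and its field names are hypothetical and are not taken from the tooling used for the experiment.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class StylePrompt:
    """Hypothetical container for the fields seen in this post's prompts."""
    style: str             # e.g. "Cinema Verité"
    moods: Sequence[str]   # e.g. ["Awe-inspiring", "determined"]
    scene: str             # what happens in the image
    shot: str              # requested camera position, e.g. "wide shot"
    theme: str             # e.g. "Adventure"
    setting: str           # background or environment
    finish: str = "cinematic"

    def render(self) -> str:
        # The prompts in this post put a space before the first semicolon,
        # so the head keeps a trailing space to reproduce them exactly.
        head = f"style-aesthetic {self.style}: {', '.join(self.moods)} "
        tail = "; ".join(
            [self.scene, self.shot, self.theme, self.setting, self.finish]
        )
        return f"{head}; {tail}"


hiker = StylePrompt(
    style="Cinema Verité",
    moods=["Awe-inspiring", "determined"],
    scene="A lone hiker",
    shot="wide shot",
    theme="Adventure",
    setting="Majestic mountain range with snow-capped peaks",
)
print(hiker.render())
```

Keeping the camera position in its own field makes it easy to vary the shot type while holding the rest of the prompt fixed, which is useful when checking how closely the model follows that one instruction.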
Lost in the Digital World: A Moment of Intense Focus
A young person, absorbed in their computer screen, embodies the immersive power of digital entertainment. The dim lighting and close-up shot create a sense of intimacy, highlighting their focused expression and their connection to the virtual world.
Prompt
style-aesthetic Cinema Verité: Focused, intense, absorbed ; A gamer’s face lit by the glow of a computer screen, eyes glued to the action; close-up; Gaming; Dark room with only the screen illuminating the face; cinematic
Characteristic
Shot : A young man is sitting in a dimly lit room, wearing headphones and looking intently at a computer monitor. The monitor is displaying a colorful, animated image. The man is holding a keyboard, likely playing a video game.
Aesthetic Score : 0.6
Mood : focused, intense, playful
Quality
Entropy : 5.89
Noise : 45
Prompt Clip Score : 0.20
AI Evaluation
Likelihood of AI : 0.20
Image errors : There is slight noise in the dark areas of the image, particularly in the background, which could be attributed to low-light conditions or compression.
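The post does not define how the Entropy value in each record is computed. One common reading, assumed here, is the Shannon entropy of the image's 8-bit grayscale histogram, measured in bits. That quantity ranges from 0 (a single flat tone) to 8 (all 256 gray levels equally likely), and the values reported in this post fall inside that range. A minimal sketch under that assumption, using only the standard library and a hypothetical `shannon_entropy` helper:

```python
import math
from collections import Counter


def shannon_entropy(pixels):
    """Shannon entropy, in bits, of a flat list of gray levels (0-255).

    Assumption: this mirrors a histogram-based entropy metric; the metric
    actually used for the records in this post is not documented.
    """
    total = len(pixels)
    if total == 0:
        raise ValueError("need at least one pixel")
    counts = Counter(pixels)
    # H = -sum(p * log2(p)) over the gray levels that actually occur.
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


flat = [128] * 1000              # one tone only: no information
two_tone = [0, 255] * 500        # two equally likely tones: one bit
full_range = list(range(256))    # every level equally likely: eight bits

print(shannon_entropy(flat), shannon_entropy(two_tone), shannon_entropy(full_range))
```

Under this reading, a higher value indicates a wider and more even spread of tones, not necessarily a better image.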
Sun-Kissed Smiles and a Selfie Moment
Three friends, radiating happiness, capture a joyful moment in front of an archway. The bright sunlight and their infectious smiles create a warm and inviting atmosphere.
Prompt
style-aesthetic Cinema Verité: Joyful, celebratory, memorable ; A family laughing and taking photos in front of a famous landmark; medium shot; Tourism; Vibrant cityscape with iconic architecture; cinematic
Characteristic
Shot : A group of friends are taking a selfie in front of a large archway.
Aesthetic Score : 0.6
Mood : happy, casual, friendly
Quality
Entropy : 6.66
Noise : 58
Prompt Clip Score : 0.19
AI Evaluation
Likelihood of AI : 0.10
Image errors : There are no visible artifacts or errors in the image.
Blur of Speed: Immersed in the Thrill of the Race
A gamer’s intense focus is captured as they navigate a virtual city street, the blurry background adding to the sense of speed and immersion in the racing game.
Prompt
style-aesthetic Cinema Verité: Intense, focused, exhilarating ; A gamer’s hands furiously manipulating a controller; close-up; Gaming; Blurred background of a computer screen displaying a fast-paced game; cinematic
Characteristic
Shot : A person is playing a video game, a racing game, on a large screen television. The person is holding a controller, and the game is showing a blurred city street.
Aesthetic Score : 0.5
Mood : focused, immersive, playful
Quality
Entropy : 6.76
Noise : 51
Prompt Clip Score : 0.26
AI Evaluation
Likelihood of AI : 0.20
Image errors : No noticeable artifacts.
Silhouettes of Passion: A Stadium United in Celebration
A powerful image captures the collective energy of a crowd at a sporting event. Silhouetted against the bright lights, the raised arms of fans create a sense of unity and shared excitement, leaving viewers with a feeling of hope and passion.
Prompt
style-aesthetic Cinema Verité: Energetic, passionate, communal ; A group of friends cheering on their favorite team at a sporting event; wide shot; Heroism; Stadium filled with excited fans; cinematic
Characteristic
Shot : Silhouettes of people in a stadium celebrating, possibly after a football match.
Aesthetic Score : 0.7
Mood : joyful, excited, celebratory
Quality
Entropy : 6.47
Noise : 64
Prompt Clip Score : 0.24
AI Evaluation
Likelihood of AI : 0.10
Image errors : The image appears to have some slight noise, particularly in the shadows and in the distant crowd.
Silhouetted Against Hope: A Moment of Contemplation at Sunset
A solitary figure stands on a hilltop, their silhouette stark against the vibrant hues of a setting sun. The cityscape stretches out before them, a canvas of distant lights and fading day. The scene evokes a sense of melancholic contemplation, yet also hints at a glimmer of hope in the fading light.
Prompt
style-aesthetic Cinema Verité: Tranquil, contemplative, awe-inspiring ; A backpacker gazing out at a breathtaking sunset over a foreign city; long shot; Travel; Silhouettes of buildings against a fiery sky; cinematic
Characteristic
Shot : A lone figure stands silhouetted against a fiery sunset, overlooking a city skyline.
Aesthetic Score : 0.7
Mood : melancholy, contemplative, hopeful
Quality
Entropy : 6.60
Noise : 38
Prompt Clip Score : 0.27
AI Evaluation
Likelihood of AI : 0.30
Image errors : No major errors, slight blur in the cityscape.
Silhouette of Courage: Firefighter Battles Blaze
A dramatic image captures the silhouette of a firefighter, hose in hand, battling a raging inferno. The intense flames illuminate the hero’s form, creating a powerful symbol of bravery and sacrifice.
Prompt
style-aesthetic Cinema Verité: Urgent, heroic, chaotic ; A firefighter battling a blaze; close-up; Heroism; Smoke and flames engulfing a building; cinematic
Characteristic
Shot : A firefighter in silhouette is fighting a fire. The scene is dramatic, with fire in the background and a spray of water emanating from the firefighter’s hose.
Aesthetic Score : 0.6
Mood : intense, dramatic, heroic
Quality
Entropy : 6.72
Noise : 45
Prompt Clip Score : 0.25
AI Evaluation
Likelihood of AI : 0.10
Image errors : There are no significant errors in the image; the resolution is sufficient and the image is well-exposed.
A Hiker’s Moment of Awe: Contemplating the Majestic Peaks
A lone hiker stands on a mountain trail, their gaze fixed on a breathtaking snow-capped mountain range. The vastness of the landscape evokes a sense of serenity and adventure, highlighting the beauty and power of nature.
Prompt
style-aesthetic Cinema Verité: Awe-inspiring, determined ; A lone hiker; wide shot; Adventure; Majestic mountain range with snow-capped peaks; cinematic
Characteristic
Shot : A lone hiker stands on a mountain trail, looking out at a majestic snow-capped peak in the distance. The sky is clear and blue, and the sun is shining brightly.
Aesthetic Score : 0.7
Mood : serene, inspiring, adventurous
Quality
Entropy : 6.49
Noise : 88
Prompt Clip Score : 0.25
AI Evaluation
Likelihood of AI : 0.10
Image errors : The image is slightly overexposed, resulting in a washed-out appearance.
A Moment of Wonder: A Child’s Gentle Touch and a Butterfly’s Flight
A child’s hand, delicate and small, cradles a butterfly in a moment of pure wonder. The soft focus of the background, with its lush green grass and white wildflowers, adds to the sense of innocence and hope. This image captures the beauty of nature and the joy of simple moments.
Prompt
style-aesthetic Cinema Verité: Innocent, curious, heartwarming ; A young child’s hand reaching out to touch a butterfly; close-up; Family; Lush green meadow with wildflowers; cinematic
Characteristic
Shot : A close-up of a child’s hand gently holding a butterfly in a field of flowers.
Aesthetic Score : 0.7
Mood : peaceful, delicate, whimsical
Quality
Entropy : 6.63
Noise : 59
Prompt Clip Score : 0.27
AI Evaluation
Likelihood of AI : 0.20
Image errors : No visible errors.
Campfire Nights Under a Starry Sky: Cozy and Nostalgic
A group of friends gather around a crackling campfire, bathed in the warm glow of the flames. The night sky above is a canvas of twinkling stars, creating a sense of mystery and wonder. This scene evokes feelings of cozy comfort, peaceful serenity, and nostalgic memories.
Prompt
style-aesthetic Cinema Verité: Warm, intimate, nostalgic ; A family sharing a meal together around a campfire; medium shot; Family; Campsite under a starry night sky; cinematic
Characteristic
Shot : A group of friends are gathered around a campfire under a starry sky, creating a warm and inviting atmosphere. The firelight illuminates their faces and the surrounding scenery, while the night sky above sparkles with countless stars.
Aesthetic Score : 0.7
Mood : cozy, friendly, nostalgic
Quality
Entropy : 6.49
Noise : 76
Prompt Clip Score : 0.31
AI Evaluation
Likelihood of AI : 0.10
Image errors : Some noise and grain are present, especially in the darker areas of the image.
A Vibrant Romance: A Stroll Through India’s Colorful Market
Experience the lively atmosphere of an Indian market as a couple embarks on a romantic journey amidst a bustling crowd. With colorful fruits, vegetables, traditional shops, and vibrant fabrics as their backdrop, the couple’s silhouettes against the warm lighting create a captivating and intriguing scene.
Prompt
style-aesthetic Cinema Verité: Adventurous, curious, vibrant ; A couple exploring a bustling market in a foreign country; medium shot; Travel; Colorful stalls overflowing with exotic goods; cinematic
Characteristic
Shot : A couple walking through a bustling outdoor market in India, showcasing colorful fabrics, fruits, and vegetables.
Aesthetic Score : 0.6
Mood : romantic, exotic, vibrant
Quality
Entropy : 6.64
Noise : 103
Prompt Clip Score : 0.31
AI Evaluation
Likelihood of AI : 0.20
Image errors : Some noise and grain are visible, especially in the shadows.
Conclusion
The results indicate that the generative AI model performed well on the aesthetic aspect and reasonably well at understanding the scene, but struggled with camera position. Here’s a breakdown:
- Camera Position: The model scored 0.4, which is considered below average. This suggests that the model didn’t accurately capture the intended camera position described in the prompt.
- Shot Analysis: The model scored 0.56, which is considered average. This indicates that the model was able to understand the scene in the prompt to a reasonable degree, but there were some discrepancies between the prompt and the generated image.
- Aesthetic Analysis: The model scored 0.09, which the evaluation rates as very good, so a lower value appears to be better for this metric. This means that the generated images closely matched the expected aesthetic style, despite the other shortcomings.
Overall, the model shows promise in understanding the scene and achieving the desired aesthetic, but needs improvement in accurately capturing the intended camera position.
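The per-image values listed above can also be summarized directly. The sketch below copies the requested camera position, Aesthetic Score, and Prompt Clip Score from the ten records and averages them, overall and per requested shot type. It does not reproduce the three analysis scores in the breakdown above, whose calculation is not described in this post; the `summarize` helper is hypothetical.

```python
from collections import defaultdict
from statistics import mean

# (requested shot, Aesthetic Score, Prompt Clip Score), in the order the
# images appear in this post.
RECORDS = [
    ("close-up", 0.6, 0.20),
    ("medium shot", 0.6, 0.19),
    ("close-up", 0.5, 0.26),
    ("wide shot", 0.7, 0.24),
    ("long shot", 0.7, 0.27),
    ("close-up", 0.6, 0.25),
    ("wide shot", 0.7, 0.25),
    ("close-up", 0.7, 0.27),
    ("medium shot", 0.7, 0.31),
    ("medium shot", 0.6, 0.31),
]


def summarize(records):
    """Average the per-image scores overall and by requested shot type."""
    by_shot = defaultdict(list)
    for shot, aesthetic, _clip in records:
        by_shot[shot].append(aesthetic)
    return {
        "mean_aesthetic": mean(a for _, a, _ in records),
        "mean_clip": mean(c for _, _, c in records),
        "aesthetic_by_shot": {s: mean(v) for s, v in by_shot.items()},
    }


summary = summarize(RECORDS)
print(round(summary["mean_aesthetic"], 3), round(summary["mean_clip"], 3))
for shot, score in summary["aesthetic_by_shot"].items():
    print(f"{shot}: {score:.2f}")
```

Across the ten images the mean Aesthetic Score is 0.64 and the mean Prompt Clip Score is 0.255. With only one to four images per shot type, the per-shot averages are too thin to support firm conclusions.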
Sources:
- https://heartofnoir.com/knowing-noir/aesthetic-of-noir/
- https://www.yellowbrick.co/blog/film/maximizing-the-visual-impact-unveiling-the-art-of-film-aesthetics
- https://www.questjournals.org/jrhss/papers/vol10-issue8/1008255260.pdf
- https://www.jstor.org/stable/3331672
- https://www.cinepoetics.fu-berlin.de/activities/workshops/2020-12-ws/index.html
- https://resource.download.wjec.co.uk/vtc/2016-17/16-17_1-22/eng/Part%201%20What%20is%20Aesthetics.pdf
- https://fal.ai/models/fal-ai/flux/dev/api