AI's Artistic Struggle: Capturing the Essence of Style with Flux-schnell
Image generation models keep improving, but they still struggle to understand and reproduce a named aesthetic style. This blog post explores that challenge through a case study: ten images generated with flux-schnell from prompts that all request a Pop art aesthetic. We look at how well the model handles camera position and shot composition, and at where it falls short of the requested aesthetic.
Created with: flux-schnell
Silhouette of Hope: Superhero Stands Tall at Sunset
A dramatic silhouette of a superhero stands on a rooftop, bathed in the golden light of sunset. The city skyline stretches out behind them, creating a powerful image of hope and heroism.
Prompt
style-aesthetic Pop art: Epic, hopeful ; A lone superhero, silhouetted against a blazing sunset; wide shot; Heroism; cityscape with towering skyscrapers; cinematic
Characteristic
Shot : A lone superhero silhouette stands against a large setting sun, overlooking a cityscape at sunset.
Aesthetic Score : 0.7
Mood : dramatic, hopeful, powerful
Quality
Entropy : 6.70
Noise : 44
Prompt Clip Score : 0.26
AI Evaluation
Likelihood of AI : 0.80
Image errors : Some pixelation is visible around the edges of the superhero and the sun. The cityscape is slightly blurry in the background.
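The post does not say how the Entropy figure is computed. Values between 6 and 7 are consistent with the Shannon entropy of an 8-bit grayscale histogram, which tops out at 8 bits, so the following is a minimal sketch under that assumption. The function name `shannon_entropy` and the flat list of gray levels as input are illustrative choices, not the pipeline used for this post.

```python
import math
from collections import Counter


def shannon_entropy(pixels):
    """Return the Shannon entropy, in bits, of a flat sequence of gray levels.

    Assumption: each value is an 8-bit gray level (0-255), so the result
    lies between 0 (a single flat tone) and 8 (all 256 levels equally used).
    """
    total = len(pixels)
    if total == 0:
        raise ValueError("pixels must not be empty")
    counts = Counter(pixels)
    # Sum -p * log2(p) over the gray levels that actually occur.
    return -sum((n / total) * math.log2(n / total) for n in counts.values())
```

Under this reading, a higher value means the image spreads its pixels over more tonal levels, which fits the dark cave image scoring lowest in the set.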
Uncharted Horizons: Adventurers on the Brink of Discovery
Four explorers stand poised on a cliff, gazing towards ancient ruins nestled within a vibrant jungle. The scene evokes a sense of mystery and adventure, hinting at the secrets that lie ahead. Their hopeful expressions suggest a journey filled with both danger and promise.
Prompt
style-aesthetic Pop art: Excited, adventurous ; A group of adventurers, their faces painted with determination, standing on the edge of a jungle; medium shot; Adventure; lush green foliage and ancient ruins; cinematic
Characteristic
Shot : A group of four people, likely adventurers or explorers, are standing on a cliff overlooking a lush green jungle with ancient ruins in the background.
Aesthetic Score : 0.6
Mood : adventurous, mysterious
Quality
Entropy : 6.92
Noise : 127
Prompt Clip Score : 0.29
AI Evaluation
Likelihood of AI : 0.10
Image errors : The image appears to be slightly overexposed, causing some details in the background to be washed out. There is a slight softness in the image, particularly in the foreground, which might be due to a lack of sharpness.
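The prompts in this post share one fixed layout: a style tag, a pair of moods, a subject, a shot type, a theme, a setting, and a closing "cinematic". As a rough sketch of how such prompts could be assembled, the helper below rebuilds that layout from its parts. The function name `build_prompt` and the field names are labels invented here for illustration; the post does not name the fields.

```python
def build_prompt(style, moods, subject, shot, theme, setting, finish="cinematic"):
    """Assemble a prompt in the layout used by the examples in this post.

    The field names are hypothetical labels; only the resulting string
    layout is taken from the post's prompts.
    """
    mood_text = ", ".join(moods)
    # Note the space before the first semicolon, which the original prompts keep.
    return (
        f"style-aesthetic {style}: {mood_text} ; "
        f"{subject}; {shot}; {theme}; {setting}; {finish}"
    )


prompt = build_prompt(
    style="Pop art",
    moods=["Excited", "adventurous"],
    subject=(
        "A group of adventurers, their faces painted with determination, "
        "standing on the edge of a jungle"
    ),
    shot="medium shot",
    theme="Adventure",
    setting="lush green foliage and ancient ruins",
)
print(prompt)
```

Keeping the shot type as its own field makes it easy to compare what was requested (here a medium shot) against the Shot description reported for the generated image.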
Neon Lights & Focused Gameplay: The Thrill of Gaming Captured
This image captures the intensity of gaming with a young man engrossed in his game, bathed in vibrant red and blue neon lights. The scene exudes energy and anticipation, showcasing the excitement of the gaming experience.
Prompt
style-aesthetic Pop art: Intense, focused ; A gamer, eyes glued to the screen, fingers flying across the keyboard; close-up; Gaming; neon-lit gaming room with flashing lights; cinematic
Characteristic
Shot : A young man wearing a headset is gaming in a dimly lit room with a pink and red color scheme, focusing on his keyboard.
Aesthetic Score : 0.6
Mood : focused, intense, cool
Quality
Entropy : 6.46
Noise : 66
Prompt Clip Score : 0.28
AI Evaluation
Likelihood of AI : 0.30
Image errors : The image has some minor noise and graininess, particularly in the darker areas. The subject’s face appears slightly overexposed and the colors in the scene are a bit washed out.
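The Prompt Clip Score is not defined in the post. A common definition is the cosine similarity between the CLIP text embedding of the prompt and the CLIP image embedding of the result, and the sketch below assumes that definition. It takes the two embeddings as plain lists of floats and leaves out the model that would produce them; `cosine_similarity` is an illustrative name.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors.

    Assumption: `a` and `b` stand in for a CLIP text embedding and a CLIP
    image embedding; producing those embeddings is outside this sketch.
    """
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("vectors must be non-zero")
    return dot / (norm_a * norm_b)
```

If this is the metric in use, the narrow 0.26 to 0.30 band reported across the images would mean the prompt and image embeddings are similarly aligned in every case, so the score says little about which images missed the Pop art style.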
Love in the City of Light: A Timeless Romance in Paris
Experience the magic of Paris as a couple stands hand-in-hand in front of the iconic Eiffel Tower. The whimsical and nostalgic mood is set by the charming Parisian street in the background, while the grandeur of the Eiffel Tower adds a touch of romance to this timeless scene.
Prompt
style-aesthetic Pop art: Romantic, nostalgic ; A couple, hand in hand, gazing at the Eiffel Tower; medium shot; Tourism; bustling Parisian street with vibrant colors; cinematic
Characteristic
Shot : A couple stands in front of the Eiffel Tower in Paris, with a street scene behind them.
Aesthetic Score : 0.7
Mood : romantic, charming, Parisian
Quality
Entropy : 6.82
Noise : 103
Prompt Clip Score : 0.29
AI Evaluation
Likelihood of AI : 0.10
Image errors : No major image errors. The image quality is good overall.
A Lone Hiker Conquers the Clouds
A vibrant orange jacket stands out against the breathtaking backdrop of a mountaintop, where a lone hiker gazes out over a sea of clouds and distant peaks. This inspiring scene captures the essence of adventure and the awe-inspiring power of nature.
Prompt
style-aesthetic Pop art: Free, adventurous ; A backpacker, with a map in hand, standing on a mountain peak; wide shot; Travel; breathtaking mountain range with clouds swirling below; cinematic
Characteristic
Shot : A lone hiker stands on a mountaintop, gazing out at a vast expanse of clouds and mountain ranges. The sky is a clear blue, and the sun is shining brightly.
Aesthetic Score : 0.7
Mood : serene, adventurous, contemplative
Quality
Entropy : 6.83
Noise : 88
Prompt Clip Score : 0.28
AI Evaluation
Likelihood of AI : 0.10
Image errors : There are no visible errors in the image.
Family Fun in the Park: A Day of Laughter and Love
A heartwarming scene of a family enjoying a sunny day in the park. The father kneels on the grass, while the mother and daughter stand beside him, radiating joy and happiness. The soft lighting and natural colors create a warm and inviting atmosphere, capturing the essence of carefree family moments.
Prompt
style-aesthetic Pop art: Happy, heartwarming ; A family, laughing and playing in a park; medium shot; Family; bright green grass, blooming flowers, and a sunny sky; cinematic
Characteristic
Shot : A family of three, a man, woman and their young daughter, are enjoying a sunny day in a park. They are standing in a grassy area, with trees in the background. The woman is wearing a floral dress and the man is wearing a blue hoodie. The young girl is wearing a white shirt and blue jeans.
Aesthetic Score : 0.7
Mood : joyful, happy, playful
Quality
Entropy : 6.64
Noise : 97
Prompt Clip Score : 0.26
AI Evaluation
Likelihood of AI : 0.10
Image errors : The image is slightly blurry, which could be due to the low light conditions. The image has been slightly oversharpened, which is visible on the grass and some edges of the image.
Red Cape Hero Takes Flight Amidst a City in Flames
A powerful image captures a man in a red cape soaring through the air above a city skyline, leaving a dramatic trail of smoke in his wake. The scene evokes a sense of action and drama, hinting at a heroic struggle against overwhelming odds.
Prompt
style-aesthetic Pop art: Dynamic, powerful ; A superhero, leaping through the air, leaving a trail of colorful smoke; dynamic shot; Heroism; cityscape with iconic landmarks; cinematic
Characteristic
Shot : A man in a black suit and red cape leaps through the air over a city with smoke plumes in the background.
Aesthetic Score : 0.7
Mood : dramatic, action, urban
Quality
Entropy : 6.84
Noise : 104
Prompt Clip Score : 0.30
AI Evaluation
Likelihood of AI : 0.90
Image errors : The image has some minor artifacts and errors, such as the slight blurriness of the man’s legs and the unrealistic appearance of the smoke plumes.
Into the Unknown: A Journey Through the Icy Cave
Four figures, shrouded in winter gear, venture deeper into a dark, icy cave. The only light comes from a mysterious blue glow at the exit, casting long shadows and hinting at the secrets that lie ahead. This image evokes a sense of mystery, adventure, and a touch of somberness, leaving the viewer wondering what awaits beyond the cave’s entrance.
Prompt
style-aesthetic Pop art: Suspenseful, thrilling ; A group of adventurers, navigating a treacherous cave; close-up; Adventure; dark and mysterious cave with glowing crystals; cinematic
Characteristic
Shot : A group of people are walking through a dark cave towards a light at the end of the tunnel. The cave is lit up by the light and some icicles hanging from the ceiling. Some yellow objects resembling plants or mushrooms are on the right side of the image.
Aesthetic Score : 0.6
Mood : mysterious, adventurous, dark
Quality
Entropy : 6.19
Noise : 100
Prompt Clip Score : 0.27
AI Evaluation
Likelihood of AI : 0.80
Image errors : The yellow objects on the right side of the image look like they were created by a computer program.
Victory Dance! Gamer Celebrates Triumph in Style
A young man, radiating joy, celebrates a gaming victory in a dimly lit room adorned with colorful lights. His energetic pose and beaming smile capture the thrill of triumph, making this a moment of pure gaming bliss.
Prompt
style-aesthetic Pop art: Exuberant, joyful ; A gamer, celebrating a victory with a triumphant fist pump; close-up; Gaming; brightly colored video game interface with flashing lights; cinematic
Characteristic
Shot : A young man wearing headphones and sunglasses is celebrating a victory in a gaming environment. The room is lit up with colorful lights and there is a computer monitor in the background.
Aesthetic Score : 0.7
Mood : joyful, energetic, excited
Quality
Entropy : 6.84
Noise : 82
Prompt Clip Score : 0.28
AI Evaluation
Likelihood of AI : 0.20
Image errors : No significant artifacts or errors are present.
Friends, Food, and Festive Lights: A Night of Joy at the Food Stall
Capture the vibrant energy of a group of friends sharing a meal under a canopy of colorful lanterns. The scene exudes joy, vibrancy, and a sense of community, making it a perfect snapshot of a fun-filled evening.
Prompt
style-aesthetic Pop art: Joyful, authentic ; A family, enjoying a delicious meal at a street food stall; medium shot; Travel; vibrant street market with colorful food stalls; cinematic
Characteristic
Shot : A group of friends enjoying a meal at a street food stall in a bustling market, likely in Southeast Asia.
Aesthetic Score : 0.7
Mood : joyful, vibrant, friendly
Quality
Entropy : 6.93
Noise : 101
Prompt Clip Score : 0.27
AI Evaluation
Likelihood of AI : 0.10
Image errors : Slight blurriness on the background, especially the lanterns. Some minor overexposure.
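To compare the ten images at a glance, the per-image figures reported above can be averaged. The snippet below is a small sketch that copies those figures in order of appearance; the tuple layout, `FIELDS`, and `summarize` are illustrative names introduced here.

```python
from statistics import mean

# Column order for each row below.
FIELDS = ("aesthetic", "entropy", "noise", "clip", "ai_likelihood")

# Figures copied from the ten image entries in this post, in order.
REPORTED = [
    (0.7, 6.70, 44, 0.26, 0.80),   # Silhouette of Hope
    (0.6, 6.92, 127, 0.29, 0.10),  # Uncharted Horizons
    (0.6, 6.46, 66, 0.28, 0.30),   # Neon Lights & Focused Gameplay
    (0.7, 6.82, 103, 0.29, 0.10),  # Love in the City of Light
    (0.7, 6.83, 88, 0.28, 0.10),   # A Lone Hiker Conquers the Clouds
    (0.7, 6.64, 97, 0.26, 0.10),   # Family Fun in the Park
    (0.7, 6.84, 104, 0.30, 0.90),  # Red Cape Hero Takes Flight
    (0.6, 6.19, 100, 0.27, 0.80),  # Into the Unknown
    (0.7, 6.84, 82, 0.28, 0.20),   # Victory Dance!
    (0.7, 6.93, 101, 0.27, 0.10),  # Friends, Food, and Festive Lights
]


def summarize(rows):
    """Return the mean of each metric column, keyed by field name."""
    columns = zip(*rows)
    return {name: mean(column) for name, column in zip(FIELDS, columns)}


summary = summarize(REPORTED)
for name, value in summary.items():
    print(f"{name}: {value:.3f}")
```

The averages come out to an Aesthetic Score of 0.67, a Prompt Clip Score of about 0.28, and a Likelihood of AI of 0.35, with three images (the two superhero scenes and the cave) rated 0.80 or higher on the last measure.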
Conclusion
The generative AI model showed only a partial grasp of camera position and shot composition, and it struggled most with aesthetic analysis. Here’s a breakdown:
- Camera Position: The model scored 0.3, below the 0.5 threshold for a good result. It does not reliably understand and implement the camera positions given in the prompt. A score between 0.5 and 0.75 would be considered good, and above 0.75 very good.
- Shot Analysis: The model scored 0.43, also below the 0.5 threshold. This suggests the model has some difficulty understanding the scene described in the prompt and translating it into a visually coherent shot. A score between 0.5 and 0.75 would be considered good, and above 0.75 very good.
- Aesthetic Analysis: The model scored 0.34, which is well above the ideal range of -0.2 to 0.1. This indicates a considerable difference between the expected aesthetic of the image and the actual aesthetic of the generated image.
Overall, the model needs improvement in all three areas, particularly in understanding and implementing the desired aesthetic.
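The thresholds in the breakdown above can be written down as a small check. This is a sketch of the stated rules only: the cutoffs 0.5 and 0.75 and the range -0.2 to 0.1 come from the post, while the function names and the "below good" label are made up here, and the treatment of values exactly on a boundary is an assumption.

```python
def rate_score(score, good=0.5, very_good=0.75):
    """Rate a camera-position or shot-analysis score against the post's thresholds."""
    if score > very_good:
        return "very good"
    if score >= good:
        return "good"
    return "below good"  # hypothetical label; the post gives no name for this band


def in_ideal_range(score, low=-0.2, high=0.1):
    """Check whether an aesthetic-analysis score lies in the stated ideal range."""
    return low <= score <= high


print("Camera Position:", rate_score(0.3))
print("Shot Analysis:", rate_score(0.43))
print("Aesthetic Analysis in ideal range:", in_ideal_range(0.34))
```

Run on the reported values, both 0.3 and 0.43 land in the band below "good", and 0.34 falls outside the ideal aesthetic range.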
Sources:
- https://heartofnoir.com/knowing-noir/aesthetic-of-noir/
- https://www.yellowbrick.co/blog/film/maximizing-the-visual-impact-unveiling-the-art-of-film-aesthetics
- https://www.questjournals.org/jrhss/papers/vol10-issue8/1008255260.pdf
- https://www.jstor.org/stable/3331672
- https://www.cinepoetics.fu-berlin.de/activities/workshops/2020-12-ws/index.html
- https://resource.download.wjec.co.uk/vtc/2016-17/16-17_1-22/eng/Part%201%20What%20is%20Aesthetics.pdf
- https://fal.ai/models/fal-ai/flux/schnell/api