AI Struggles with Camera Angles: A Look at Generative Model Limitations with Flux-schnell
Generative AI models are revolutionizing the way we create images, but they still struggle to render a requested camera position accurately. Camera position matters because it shapes the mood, perspective, and impact of a scene. For example, a wide shot from a low angle can evoke a sense of grandeur and heroism, while a close-up shot can create intimacy and tension. This blog post examines how well flux-schnell honors camera position in generated images, using ten prompts that each request a worm’s eye view.
Created with: flux-schnell
A Solitary Figure Conquers the Clouds
A lone hiker stands triumphant on a mountain peak, dwarfed by the majestic expanse of clouds and snow-capped peaks. The scene evokes a sense of serenity, awe, and inspiration, capturing the breathtaking beauty of nature’s grandeur.
Prompt
camera-positions Worm’s eye view: inspiring, triumphant ; A lone hiker standing on a mountain peak; wide shot; heroism; a vast, breathtaking panorama of snow-capped mountains and clouds; cinematic
Characteristic
Shot : A lone hiker stands on a mountain peak, overlooking a vast expanse of clouds. The sun shines brightly in the sky, casting a warm glow over the scene.
Aesthetic Score : 0.75
Mood : serene, majestic, adventurous
Quality
Entropy : 6.73
Noise : 100
Prompt Clip Score : 0.26
AI Evaluation
Likelihood of AI : 0.20
Image errors : The image is slightly overexposed, with some blown-out highlights in the sky. The clouds are also a bit too uniform and lack depth.
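The post does not say how the Entropy figure is computed. One common reading, assumed here, is the Shannon entropy of the image’s gray-level histogram, which ranges from 0 to 8 bits for 8-bit pixels. Under that reading, a value such as 6.73 would indicate a fairly wide spread of tones. A minimal Python sketch of that assumption follows; the function name and the toy pixel lists are hypothetical and are not taken from the images in this post.

```python
import math
from collections import Counter


def shannon_entropy(pixels):
    """Shannon entropy, in bits, of a flat sequence of gray levels."""
    counts = Counter(pixels)
    total = len(pixels)
    # Sum p * log2(1/p) over every gray level that actually occurs.
    return sum((n / total) * math.log2(total / n) for n in counts.values())


# Hypothetical toy data, not pixels from any image in this post.
flat = [128] * 16           # a single gray level: no variation
spread = list(range(256))   # every 8-bit gray level used exactly once

print(shannon_entropy(flat))    # lowest possible value
print(shannon_entropy(spread))  # highest possible value for 8-bit levels
```

A real image would first have to be converted to grayscale and flattened into such a list before calling the function.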
Hope Beckons in the Cave’s Shadow
A group of figures venture through a dark, mysterious cave, their path illuminated by a glimmer of light at the end. The scene evokes a sense of adventure, hope, and the unknown, leaving viewers captivated by the promise of what lies ahead.
Prompt
camera-positions Worm’s eye view: suspenseful, adventurous ; A group of explorers entering a dark, mysterious cave; medium shot; adventure; ancient stone walls and flickering torches; cinematic
Characteristic
Shot : A group of people are exploring a dark cave. They are silhouetted against the light from the opening of the cave.
Aesthetic Score : 0.6
Mood : mysterious, adventurous, eerie
Quality
Entropy : 4.95
Noise : 64
Prompt Clip Score : 0.25
AI Evaluation
Likelihood of AI : 0.10
Image errors : There are no noticeable errors in the image. The image quality is acceptable for a low light scene.
Cyberpunk Dreams: A Glimpse into a Digital World
A shadowy figure, hands flying across a keyboard, faces a screen pulsing with vibrant, abstract patterns. This cyberpunk scene evokes a sense of mystery and intrigue, hinting at a world where technology and imagination collide.
Prompt
camera-positions Worm’s eye view: intense, focused ; A gamer’s hands furiously tapping on a keyboard; close-up; gaming; a brightly lit computer screen displaying a complex game interface; cinematic
Characteristic
Shot : A person is typing on a keyboard in front of a computer screen. The screen is displaying a colorful and detailed interface with a red and green color scheme.
Aesthetic Score : 0.6
Mood : intense, focused, techy
Quality
Entropy : 6.60
Noise : 66
Prompt Clip Score : 0.19
AI Evaluation
Likelihood of AI : 0.10
Image errors : Some glare on the screen, and some graininess in the shadows.
Bustling European Street Scene: A Snapshot of City Life
This image captures the vibrant energy of a European city street, filled with people, shops, and cafes. The scene is lively and bustling, but the composition lacks a strong focal point, giving it a slightly flat feel.
Prompt
camera-positions Worm’s eye view: lively, vibrant ; A bustling city square filled with tourists; wide shot; tourism; colorful buildings, street performers, and souvenir stalls; cinematic
Characteristic
Shot : A bustling street scene in a European city, with shops and buildings on either side and a crowd of people walking through the middle.
Aesthetic Score : 0.6
Mood : busy, vibrant, lively
Quality
Entropy : 6.86
Noise : 113
Prompt Clip Score : 0.17
AI Evaluation
Likelihood of AI : 0.10
Image errors : The image is slightly blurry, and there are some artifacts in the sky.
A Journey Through Verdant Valleys: A Train Ride Filled with Hope
Experience the thrill of a red train speeding through a lush mountain valley. The perspective from inside the train captures the sense of motion and speed, creating a peaceful, adventurous, and hopeful mood. This scenic journey is sure to inspire wanderlust and a sense of optimism.
Prompt
camera-positions Worm’s eye view: tranquil, nostalgic ; A train speeding through a picturesque countryside; long shot; travel; rolling green hills, quaint villages, and a clear blue sky; cinematic
Characteristic
Shot : A red passenger train moving through a mountain valley on a sunny day. The camera is looking out the window of the train.
Aesthetic Score : 0.7
Mood : adventure, travel, freedom
Quality
Entropy : 6.60
Noise : 65
Prompt Clip Score : 0.22
AI Evaluation
Likelihood of AI : 0.10
Image errors : There are some artifacts in the image, particularly in the areas of the blur and the sky. The color balance is slightly off.
Under a Starry Sky, Friendship Glows
A wide-angle lens captures the warmth and joy of a group of friends gathered around a crackling campfire under a breathtaking night sky. The vastness of the universe makes their laughter and camaraderie feel even more precious.
Prompt
camera-positions Worm’s eye view: joyful, intimate ; A group of friends laughing and celebrating around a campfire; medium shot; groups; a starry night sky, a crackling fire, and a sense of camaraderie; cinematic
Characteristic
Shot : A group of friends gathered around a campfire under a starry night sky.
Aesthetic Score : 0.7
Mood : warm, cozy, friendly
Quality
Entropy : 6.46
Noise : 92
Prompt Clip Score : 0.26
AI Evaluation
Likelihood of AI : 0.10
Image errors : Slight noise and some slight chromatic aberration, but these are minor and do not detract from the image.
A Solitary Figure Contemplates the City’s Promise
A lone figure stands on a rooftop, silhouetted against a breathtaking sunset. The sprawling cityscape below, bathed in the warm glow of twilight, evokes a sense of hope and possibility. The dramatic sky and the figure’s isolation create a mood of contemplation and introspection.
Prompt
camera-positions Worm’s eye view: powerful, awe-inspiring ; A lone superhero standing atop a skyscraper; wide shot; heroism; a sprawling cityscape with twinkling lights and a dramatic storm in the distance; cinematic
Characteristic
Shot : A lone figure in a superhero costume stands on a rooftop overlooking a city skyline, with a dramatic stormy sky in the background.
Aesthetic Score : 0.6
Mood : dramatic, lonely, epic
Quality
Entropy : 6.53
Noise : 91
Prompt Clip Score : 0.27
AI Evaluation
Likelihood of AI : 0.70
Image errors : The image appears to be a composite of different sources, and the edges of the buildings and the sky seem slightly blurred and unnatural.
Lost in the Mist: A Tranquil Forest Adventure
A group of explorers ventures through a lush, misty forest, their path illuminated by the soft glow of the morning sun. A vibrant parrot perches on a nearby branch, its presence adding a touch of mystery and intrigue to the tranquil scene. This captivating image evokes a sense of adventure and wonder, inviting you to step into the heart of the forest and discover its secrets.
Prompt
camera-positions Worm’s eye view: mysterious, adventurous ; A group of adventurers navigating a dense jungle; medium shot; adventure; lush greenery, towering trees, and the sound of exotic birds; cinematic
Characteristic
Shot : A group of people walking through a lush green forest. There is a bird in the top right corner of the image.
Aesthetic Score : 0.6
Mood : serene, peaceful, adventurous
Quality
Entropy : 6.67
Noise : 121
Prompt Clip Score : 0.28
AI Evaluation
Likelihood of AI : 0.20
Image errors : There are no noticeable artifacts or errors in the image.
Lost in the Game: A Moment of Focused Play
A player is fully immersed in their game, the blurred background highlighting their focused hands and the controller. The scene evokes a sense of playful concentration and relaxed enjoyment.
Prompt
camera-positions Worm’s eye view: immersive, captivating ; A gamer’s hands holding a controller, immersed in a virtual world; close-up; gaming; a blurry background of a game’s environment and characters; cinematic
Characteristic
Shot : A person is playing video games in a dimly lit room. In the background is a blurry image of what appears to be a video game scene playing on a large screen.
Aesthetic Score : 0.4
Mood : relaxed, focused, immersive
Quality
Entropy : 6.79
Noise : 50
Prompt Clip Score : 0.26
AI Evaluation
Likelihood of AI : 0.30
Image errors : The image appears to be slightly grainy. The lighting is uneven and there are some distracting shadows in the background.
Anticipation and Wonder: Tourists Approach the Taj Mahal
A tranquil scene unfolds as a group of tourists, filled with curiosity, walk towards the majestic Taj Mahal. The white marble mausoleum stands tall, its grandeur captivating the approaching visitors. The image captures the anticipation and awe that washes over those who witness this architectural marvel.
Prompt
camera-positions Worm’s eye view: awe-inspiring, majestic ; A group of travelers gazing at the majestic Taj Mahal; wide shot; tourism; the iconic white marble structure against a clear blue sky; cinematic
Characteristic
Shot : A group of tourists is visiting the Taj Mahal in India. The scene is sunny and the tourists are walking towards the famous mausoleum.
Aesthetic Score : 0.6
Mood : travel, exploration, awe
Quality
Entropy : 6.84
Noise : 76
Prompt Clip Score : 0.27
AI Evaluation
Likelihood of AI : 0.10
Image errors : The image is slightly overexposed, particularly in the sky and the white marble of the Taj Mahal. This makes the image appear flat and lacking in contrast.
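The ten prompts above share one semicolon-separated layout: a camera-position prefix with mood words, then the subject, shot size, theme, setting details, and a closing style word. The sketch below rebuilds the first prompt from those parts. The function and its field names are hypothetical labels chosen for illustration; the post does not describe how the prompts were actually assembled.

```python
def build_prompt(camera_position, moods, subject, shot, theme, setting,
                 style="cinematic"):
    """Assemble a prompt in the layout used by the ten examples above."""
    # The mood list is followed by " ; " in the originals; the remaining
    # fields are joined with "; ".
    head = f"camera-positions {camera_position}: {', '.join(moods)} "
    return "; ".join([head, subject, shot, theme, setting, style])


prompt = build_prompt(
    camera_position="Worm’s eye view",
    moods=["inspiring", "triumphant"],
    subject="A lone hiker standing on a mountain peak",
    shot="wide shot",
    theme="heroism",
    setting="a vast, breathtaking panorama of snow-capped mountains and clouds",
)
print(prompt)
```

Keeping the camera position as its own field makes it easy to rerun the same scene with a different angle and compare the results.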
Conclusion
The results show that the generative AI model performed well in understanding the scene and shot composition, but struggled with camera positioning. Here’s a breakdown:
- Camera Position: The model scored 0.3, indicating that it had difficulty translating the camera position requested in the prompt into the generated image. The model needs to improve at understanding and rendering camera angles and perspectives.
- Shot Analysis: The model scored 0.47, which is considered good. It understood the scene and produced a shot relatively close to the one described in the prompt.
- Aesthetic Analysis: The model scored 0.38, which is considered good. The aesthetic of the generated images was close to the one the prompts called for.
Overall: The model demonstrates a good understanding of scene composition and aesthetics, but needs improvement in accurately interpreting camera positions.
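For a quick cross-check, the per-image figures listed in the ten entries can be averaged directly. The sketch below does only that; the titles are shortened from the headings above. The post does not explain how the three conclusion scores were derived, so these averages are a separate summary and are not expected to reproduce them.

```python
from statistics import mean

# (short title, aesthetic score, prompt CLIP score, likelihood of AI),
# copied from the ten entries in this post.
rows = [
    ("Hiker on the peak",     0.75, 0.26, 0.20),
    ("Cave explorers",        0.60, 0.25, 0.10),
    ("Cyberpunk keyboard",    0.60, 0.19, 0.10),
    ("European street scene", 0.60, 0.17, 0.10),
    ("Train in the valley",   0.70, 0.22, 0.10),
    ("Campfire friends",      0.70, 0.26, 0.10),
    ("Rooftop superhero",     0.60, 0.27, 0.70),
    ("Misty forest",          0.60, 0.28, 0.20),
    ("Gamer with controller", 0.40, 0.26, 0.30),
    ("Taj Mahal tourists",    0.60, 0.27, 0.10),
]

mean_aesthetic = mean(r[1] for r in rows)
mean_clip = mean(r[2] for r in rows)
mean_ai = mean(r[3] for r in rows)
lowest_clip = min(rows, key=lambda r: r[2])

print(f"mean aesthetic score:  {mean_aesthetic:.3f}")
print(f"mean prompt CLIP score: {mean_clip:.3f}")
print(f"mean likelihood of AI:  {mean_ai:.3f}")
print(f"lowest prompt CLIP score: {lowest_clip[0]} ({lowest_clip[2]})")
```

The lowest prompt CLIP score belongs to the street scene, where the prompt asked for a city square with street performers and souvenir stalls.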
Sources:
- https://www.studiobinder.com/blog/types-of-camera-shot-angles-in-film/
- https://www.learnaboutfilm.com/film-language/picture/camera-position/
- https://boords.com/blog/16-types-of-camera-shots-and-angles-with-gifs
- https://shorthand.com/the-craft/8-tips-for-great-visual-storytelling/
- https://fal.ai/models/fal-ai/flux/schnell/api