AI's Eye for Storytelling: A Mixed Bag with Flux-schnell
In the realm of visual storytelling, camera positions play a crucial role in conveying emotions, setting the scene, and guiding the viewer’s attention. Dramatic camera positions, such as low-angle shots for power or high-angle shots for vulnerability, are essential tools in the filmmaker’s arsenal. This blog post explores the capabilities of generative AI in understanding and implementing these camera positions, analyzing its performance in creating images based on prompts that include specific camera angles and shot compositions.
Created with: flux-schnell
A Solitary Figure Contemplates the Vastness of the Clouds
A lone figure stands on a mountain peak, dwarfed by the endless expanse of clouds below. The scene evokes a sense of tranquility, contemplation, and the majesty of nature. The small figure against the vastness of the clouds creates a dramatic effect, highlighting the feeling of isolation and the power of the natural world.
Prompt
camera-positions Point-of-view (POV) shot: Epic, triumphant, awe-inspiring ; A lone figure standing on a mountain peak; wide shot; heroism; dramatic cloudscape; cinematic
Characteristic
Shot : A lone figure stands on the peak of a mountain, overlooking a vast expanse of clouds below. The sky is a light blue, and the clouds are white and fluffy. The figure is silhouetted against the sky, giving the image a sense of mystery and isolation.
Aesthetic Score : 0.7
Mood : tranquil, contemplative, serene
Quality
Entropy : 6.29
Noise : 86
Prompt Clip Score : 0.27
AI Evaluation
Likelihood of AI : 0.40
Image errors : The image appears to be slightly overexposed, resulting in a washed-out appearance. The clouds are also somewhat repetitive and lack much variation in texture.
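The prompts in this post all follow the same semicolon-separated pattern: a shot type, a list of moods, a subject, a framing, a theme, a setting, and a style keyword. The sketch below splits a prompt into those pieces. The field names (`shot_type`, `moods`, `subject`, `framing`, `theme`, `setting`, `style`) are hypothetical labels chosen for illustration; they are not taken from the tooling that produced the prompts.

```python
from dataclasses import dataclass


@dataclass
class PromptParts:
    # Hypothetical field names; the post itself does not label these segments.
    shot_type: str
    moods: list[str]
    subject: str
    framing: str
    theme: str
    setting: str
    style: str


def parse_prompt(prompt: str) -> PromptParts:
    """Split a prompt of the form used in this post into labeled parts."""
    # Everything before the first colon names the camera position.
    head, rest = prompt.split(":", 1)
    shot_type = head.removeprefix("camera-positions").strip()

    # The remainder is a semicolon-separated list of six segments.
    parts = [p.strip() for p in rest.split(";")]
    if len(parts) != 6:
        raise ValueError(f"expected 6 segments, got {len(parts)}")

    # The first segment is a comma-separated list of moods.
    moods = [m.strip() for m in parts[0].split(",")]
    return PromptParts(shot_type, moods, *parts[1:])


example = parse_prompt(
    "camera-positions Point-of-view (POV) shot: Epic, triumphant, awe-inspiring ; "
    "A lone figure standing on a mountain peak; wide shot; heroism; "
    "dramatic cloudscape; cinematic"
)
print(example.shot_type, "|", example.framing)
```

Splitting the prompt this way makes it easier to compare what was requested (for example, the shot type and framing) with what the generated image actually shows.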
A Hand Reaches for Light in the Eerie Depths
A mysterious hand emerges from the darkness, reaching towards a faint light source in a shadowy cave. A weathered wooden chest sits in the foreground, hinting at secrets and adventure. The stark contrast between light and shadow creates a dramatic and eerie atmosphere.
Prompt
camera-positions Point-of-view (POV) shot: Intriguing, suspenseful, adventurous ; A hand reaching for a treasure chest; close-up; adventure; dark, mysterious cave; cinematic
Characteristic
Shot : A hand reaching out from the darkness towards a treasure chest in a cave, bathed in a shaft of light.
Aesthetic Score : 0.6
Mood : mysterious, adventurous, hopeful
Quality
Entropy : 5.79
Noise : 50
Prompt Clip Score : 0.33
AI Evaluation
Likelihood of AI : 0.30
Image errors : There is a slight blur in the background, which is likely due to the low light conditions.
Lost in the Game: A Moment of Focused Intensity
A dimly lit room, a controller gripped tight, and a sense of anticipation hanging in the air. This image captures the focused intensity of a gamer fully immersed in their virtual world. The dim lighting and the focus on the controller create a sense of intimacy and immersion, hinting at the thrill of the game.
Prompt
camera-positions Point-of-view (POV) shot: Focused, intense, exhilarating ; A player’s hands manipulating a controller; close-up; gaming; brightly lit gaming room; cinematic
Characteristic
Shot : A person holds a game controller in their hands; the controller is in focus. There is a TV screen in the background; the scene is blurry.
Aesthetic Score : 0.6
Mood : relaxing, cozy, gaming
Quality
Entropy : 6.78
Noise : 51
Prompt Clip Score : 0.26
AI Evaluation
Likelihood of AI : 0.20
Image errors : The image is slightly blurry and the colors are a bit washed out. The lighting is also a bit too dark.
Hong Kong’s Vibrant Streets: A Symphony of Life
Experience the energy and chaos of Hong Kong’s bustling streets, where shops, businesses, and people converge in a vibrant tapestry of life. The image captures the depth and perspective of the city, with buildings receding into the distance, creating a sense of awe and wonder.
Prompt
camera-positions Point-of-view (POV) shot: Energetic, exciting, overwhelming ; A bustling city street; wide shot; tourism; vibrant, colorful buildings; cinematic
Characteristic
Shot : A street in Hong Kong, with many signs and shops lining both sides. The street is bustling with people and traffic.
Aesthetic Score : 0.6
Mood : busy, urban, lively
Quality
Entropy : 6.91
Noise : 112
Prompt Clip Score : 0.20
AI Evaluation
Likelihood of AI : 0.20
Image errors : There is some slight noise in the image, particularly in the shadows.
Tranquil Journey Through Rolling Hills
A serene view from a train window, capturing the beauty of rolling hills and farmland. The image evokes a sense of nostalgia and the thrill of exploration, as the landscape rushes by.
Prompt
camera-positions Point-of-view (POV) shot: Tranquil, contemplative, nostalgic ; A train window view of passing landscapes; medium shot; travel; rolling hills and fields; cinematic
Characteristic
Shot : A view of a rolling countryside from a train window. The sky is clear and blue, and the sun is shining. The landscape is a mixture of fields and hills.
Aesthetic Score : 0.6
Mood : tranquil, serene, peaceful
Quality
Entropy : 6.59
Noise : 61
Prompt Clip Score : 0.29
AI Evaluation
Likelihood of AI : 0.10
Image errors : The image is slightly blurry and lacks sharp focus. The lighting is uneven, with some areas being too bright and others too dark.
Campfire Tales Under a Starry Sky
A group of friends gather around a crackling campfire, their faces illuminated by the warm glow. The night sky above is a canvas of twinkling stars, creating a cozy and inviting atmosphere. This scene evokes feelings of warmth, friendship, and shared stories.
Prompt
camera-positions Point-of-view (POV) shot: Warm, intimate, joyful ; A group of friends laughing and talking around a campfire; medium shot; groups; starry night sky; cinematic
Characteristic
Shot : A group of friends sitting around a campfire under a starry night sky.
Aesthetic Score : 0.7
Mood : warm, cozy, friendly
Quality
Entropy : 6.24
Noise : 77
Prompt Clip Score : 0.29
AI Evaluation
Likelihood of AI : 0.10
Image errors : Slight blurriness in some areas, especially around the edges of the image, possibly due to a low aperture.
Soaring Above the Clouds: A Pilot’s Perspective
Experience the thrill of flight from the cockpit of a small aircraft, as it navigates a sea of clouds towards a distant runway. This serene and adventurous scene evokes a sense of hope and freedom, capturing the immersive and exciting perspective of being airborne.
Prompt
camera-positions Point-of-view (POV) shot: Thrilling, exhilarating, powerful ; A pilot’s view of the cockpit during takeoff; close-up; heroism; runway and clouds; cinematic
Characteristic
Shot : A cockpit view of a plane flying over a vast sea of clouds.
Aesthetic Score : 0.7
Mood : serene, adventurous, calm
Quality
Entropy : 6.06
Noise : 75
Prompt Clip Score : 0.29
AI Evaluation
Likelihood of AI : 0.20
Image errors : No noticeable artifacts or errors.
Dive into Serenity: A Scuba Adventure in a Vibrant Reef
Experience the tranquility of the underwater world as a scuba diver explores a colorful coral reef. The sun’s rays illuminate the scene, casting a mysterious silhouette of the diver against the azure water. This serene and adventurous moment captures the beauty and wonder of the ocean.
Prompt
camera-positions Point-of-view (POV) shot: Peaceful, serene, awe-inspiring ; A diver exploring a coral reef; wide shot; adventure; colorful fish and marine life; cinematic
Characteristic
Shot : A scuba diver swims in the clear blue ocean, surrounded by coral reefs and colorful fish.
Aesthetic Score : 0.7
Mood : serene, peaceful, adventurous
Quality
Entropy : 6.75
Noise : 97
Prompt Clip Score : 0.28
AI Evaluation
Likelihood of AI : 0.20
Image errors : None
Lost in the Game: Immersive Fantasy RPG Captures Player’s Attention
A player is fully engrossed in a vibrant fantasy RPG, their focus drawn to the lush green environment on the large monitor. The image evokes a sense of playful immersion, capturing the captivating power of the game world.
Prompt
camera-positions Point-of-view (POV) shot: Immersive, engaging, exciting ; A gamer’s screen displaying a virtual world; close-up; gaming; vibrant, fantastical landscape; cinematic
Characteristic
Shot : A person is playing a video game on a large monitor; the game appears to be a fantasy-style RPG.
Aesthetic Score : 0.6
Mood : focused, immersive, gaming
Quality
Entropy : 6.53
Noise : 65
Prompt Clip Score : 0.26
AI Evaluation
Likelihood of AI : 0.20
Image errors : Some graininess and noise are visible in the image, likely due to the lighting conditions.
Capturing Tranquility: Sunset Hues Over the Ocean
A solitary figure stands on the beach, camera in hand, capturing the breathtaking beauty of a sunset over the ocean. The warm colors of the sky create a sense of peace and tranquility, while the hand holding the camera adds a human element to the scene.
Prompt
camera-positions Point-of-view (POV) shot: Romantic, peaceful, serene ; A panoramic view of a sunset over a beach; wide shot; travel; golden light and waves; cinematic
Characteristic
Shot : A person is taking a picture of a beautiful sunset over the ocean. They are on the beach, and their hand is in the foreground of the image.
Aesthetic Score : 0.7
Mood : serene, tranquil, warm
Quality
Entropy : 6.86
Noise : 71
Prompt Clip Score : 0.28
AI Evaluation
Likelihood of AI : 0.10
Image errors : The image appears to be slightly overexposed in the sky and there is some noise in the shadows.
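For a quick overview of the ten images, the per-image numbers listed above can be averaged. The sketch below copies the Aesthetic Score and Prompt Clip Score values from each entry; the shortened titles and variable names are only for illustration, and the averages are not claimed to be the scores reported in the conclusion, which may be computed differently.

```python
from statistics import mean

# (short title, aesthetic score, prompt CLIP score), copied from the entries above.
IMAGES = [
    ("Solitary figure above the clouds", 0.7, 0.27),
    ("Hand reaching for light", 0.6, 0.33),
    ("Gamer with controller", 0.6, 0.26),
    ("Hong Kong streets", 0.6, 0.20),
    ("Train window view", 0.6, 0.29),
    ("Campfire under the stars", 0.7, 0.29),
    ("Pilot's perspective", 0.7, 0.29),
    ("Scuba diver on the reef", 0.7, 0.28),
    ("Fantasy RPG on monitor", 0.6, 0.26),
    ("Sunset over the ocean", 0.7, 0.28),
]

mean_aesthetic = round(mean([a for _, a, _ in IMAGES]), 2)
mean_clip = round(mean([c for _, _, c in IMAGES]), 3)

# The image whose prompt CLIP score is lowest, i.e. the weakest prompt match.
weakest = min(IMAGES, key=lambda row: row[2])

print(f"mean aesthetic score: {mean_aesthetic}")
print(f"mean prompt CLIP score: {mean_clip}")
print(f"lowest prompt CLIP score: {weakest[0]} ({weakest[2]})")
```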
Conclusion
The results show that the generative AI model delivered mixed results in understanding and implementing camera positions and shot composition.
Here’s a breakdown:
- Camera Position Analysis: The score of 0.35 indicates that the model’s ability to react to camera positions in the prompt is below average. A score between 0.5 and 0.75 would be considered good, and above 0.75 very good.
- Shot Analysis: The score of 0.47 suggests that the model’s understanding of the scene in the prompt and its ability to create a corresponding shot is also below average. A score between 0.5 and 0.75 would be considered good, and above 0.75 very good.
- Aesthetic Analysis: The score of 0.22 is rated very good, indicating that the generated image closely matches the expected aesthetic. Note, however, that the stated very-good range is -0.2 to 0.1, and 0.22 lies outside it.
Overall: While the model excels in capturing the desired aesthetic, it struggles with accurately interpreting and implementing camera positions and shot composition. This suggests that the model may need further training to improve its understanding of these aspects of visual storytelling.
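The rating bands quoted for the Camera Position and Shot Analysis scores can be written down as a small helper. This is a minimal sketch: the function name `rate_score` is hypothetical, and treating everything below 0.5 as "below average" and exactly 0.75 as "good" are assumptions read off the wording above. The aesthetic score uses a different scale and is not covered here.

```python
def rate_score(score: float) -> str:
    """Map a camera-position or shot-analysis score to the bands used in this post.

    Assumed boundaries: below 0.5 is "below average", 0.5 to 0.75 inclusive
    is "good", and anything above 0.75 is "very good".
    """
    if score > 0.75:
        return "very good"
    if score >= 0.5:
        return "good"
    return "below average"


# The two scores reported in the conclusion.
for name, score in [("Camera Position Analysis", 0.35), ("Shot Analysis", 0.47)]:
    print(f"{name}: {score} -> {rate_score(score)}")
```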
Sources:
- https://www.studiobinder.com/blog/types-of-camera-shot-angles-in-film/
- https://www.learnaboutfilm.com/film-language/picture/camera-position/
- https://boords.com/blog/16-types-of-camera-shots-and-angles-with-gifs
- https://shorthand.com/the-craft/8-tips-for-great-visual-storytelling/
- https://fal.ai/models/fal-ai/flux/schnell/api