AI Captures the Scene, But Struggles with the Shot with Leonardo-ai
In the realm of AI image generation, capturing the essence of a scene goes beyond simply depicting objects and characters. It involves understanding the nuances of camera position, shot type, and the overall aesthetic. This blog post delves into an experiment that tested an AI model’s ability to generate images based on detailed scene descriptions, exploring its strengths and weaknesses in capturing these crucial elements.
Created with: leonardo-ai
Silhouetted Knight at Sunset: A Moment of Epic Loneliness
A lone knight stands in a field, bathed in the golden light of the setting sun. Their silhouette against the fiery sky evokes a sense of epic loneliness and heroic determination. The scene is both dramatic and mysterious, leaving the viewer to ponder the knight’s story and the battles they may have faced.
Prompt
poses fighting: epic, determined ; A lone warrior; wide shot; heroism; a desolate battlefield with the setting sun in the background; cinematic
Characteristic
Shot : A lone knight in full armor stands with his back to the viewer, gazing at a sunset over a barren landscape. His sword is drawn and held in his right hand. The light from the sunset is casting a warm glow on the scene.
Aesthetic Score : 0.7
Mood : epic, heroic, solitary
Quality
Entropy : 6.89
Noise : 96
Prompt Clip Score : 0.21
AI Evaluation
Likelihood of AI : 0.20
Image errors : There are no visible artifacts or errors in the image.
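Every prompt in this experiment follows the same semicolon-separated template: a pose-and-mood field, a subject, a shot type, a theme, a setting, and a style. The following is a minimal sketch of how such a prompt could be split into its parts so the requested shot type can be compared with the result. The field names (`pose_moods`, `theme`, and so on) are illustrative assumptions, not names used by the experiment or by Leonardo-ai.

```python
from dataclasses import dataclass


@dataclass
class ScenePrompt:
    """One prompt split into parts; field names are illustrative only."""
    pose_moods: list[str]
    subject: str
    shot_type: str
    theme: str
    setting: str
    style: str


def parse_prompt(prompt: str) -> ScenePrompt:
    """Split a semicolon-separated prompt into its six fields."""
    parts = [p.strip() for p in prompt.split(";")]
    if len(parts) != 6:
        raise ValueError(f"expected 6 fields, got {len(parts)}")
    # The first field looks like "poses fighting: epic, determined".
    _, _, moods = parts[0].partition(":")
    return ScenePrompt(
        pose_moods=[m.strip() for m in moods.split(",") if m.strip()],
        subject=parts[1],
        shot_type=parts[2],
        theme=parts[3],
        setting=parts[4],
        style=parts[5],
    )


knight = parse_prompt(
    "poses fighting: epic, determined ; A lone warrior; wide shot; heroism; "
    "a desolate battlefield with the setting sun in the background; cinematic"
)
print(knight.shot_type)   # wide shot
print(knight.pose_moods)  # ['epic', 'determined']
```

Pulling out `shot_type` this way makes it easy to line up what was requested (here, a wide shot) against the shot description reported for each image.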
Uncharted Territory: A Temple Beckons in the Jungle’s Embrace
A group of intrepid adventurers, clad in their finest exploration gear, stand before a colossal stone temple shrouded in a dense jungle. The air hangs heavy with the promise of mystery and danger, as rain falls upon the overgrown ruins. What secrets lie within this ancient edifice? Will they find glory or face a perilous fate?
Prompt
poses fighting: intense, adventurous ; A group of adventurers; medium shot; adventure; a dense jungle with ancient ruins in the distance; cinematic
Characteristic
Shot : A group of adventurers are exploring a jungle temple, with tall palm trees and lush greenery surrounding the ancient ruins.
Aesthetic Score : 0.7
Mood : mysterious, adventurous, dramatic
Quality
Entropy : 6.90
Noise : 117
Prompt Clip Score : 0.29
AI Evaluation
Likelihood of AI : 0.30
Image errors : The image has some minor artifacts, such as some blurring around the edges of the characters.
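The post does not say how the Entropy figure is computed. One common definition, assumed here, is the Shannon entropy of the grayscale pixel histogram, which ranges from 0 to 8 bits for 8-bit images and would be consistent with the values of roughly 6 to 7 reported for these images. The sketch below works on a plain list of pixel values; the function name `shannon_entropy` is hypothetical.

```python
import math
from collections import Counter


def shannon_entropy(pixels):
    """Shannon entropy, in bits, of a sequence of 8-bit grayscale values."""
    total = len(pixels)
    if total == 0:
        raise ValueError("need at least one pixel")
    counts = Counter(pixels)
    # Sum p * log2(1/p) over every gray level that actually occurs.
    return sum((n / total) * math.log2(total / n) for n in counts.values())


flat = [128] * 1000             # one gray level only: no variation
uniform = list(range(256)) * 4  # all 256 levels, equally frequent

print(shannon_entropy(flat))     # 0.0
print(shannon_entropy(uniform))  # 8.0
```

Under this assumed definition, a higher value means the pixel intensities are spread more evenly across the available range, which tends to go with busier, more detailed images.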
Neon City Enigma: A Woman on the Verge of Something Big
A mysterious figure in futuristic gear stands poised on a platform overlooking a neon-drenched cityscape. The dramatic lighting and her enigmatic pose hint at a story waiting to unfold. This cyberpunk scene evokes a sense of intrigue and anticipation, leaving you wondering what secrets lie ahead.
Prompt
poses fighting: dynamic, futuristic ; A player character; close-up; gaming; a neon-lit cityscape with holographic projections; cinematic
Characteristic
Shot : A woman in a futuristic outfit is crouching on a platform with a neon cityscape in the background.
Aesthetic Score : 0.7
Mood : cyberpunk, futuristic, mysterious
Quality
Entropy : 6.47
Noise : 93
Prompt Clip Score : 0.28
AI Evaluation
Likelihood of AI : 0.20
Image errors : The image has some slight artifacts around the woman’s hair and on the neon lights in the background.
Life in Motion: A Bustling Street Market in India
Experience the vibrant energy of a crowded Indian street market, where life unfolds in a whirlwind of colors, sounds, and smells. This image captures the chaotic beauty of daily life, with a sense of depth that draws you into the heart of the action.
Prompt
poses fighting: chaotic, humorous ; Two tourists; medium shot; tourism; a bustling marketplace with colorful stalls and vibrant crowds; cinematic
Characteristic
Shot : A busy street market in India. People are walking through the narrow streets, and there are shops on either side. The air is filled with smoke and the ground is wet.
Aesthetic Score : 0.6
Mood : chaotic, vibrant, bustling
Quality
Entropy : 6.90
Noise : 110
Prompt Clip Score : 0.25
AI Evaluation
Likelihood of AI : 0.10
Image errors : There are no visible image errors.
A Solitary Journey Across the Vast Desert
A lone figure traverses a breathtaking desert landscape, the vastness of the dunes and the clear blue sky evoking feelings of solitude, adventure, and hope. The dramatic lighting and soft colors create a contemplative atmosphere, highlighting the figure’s smallness against the grand scale of nature.
Prompt
poses fighting: isolated, desperate ; A lone traveler; long shot; travel; a vast desert landscape with a lone sand dune in the foreground; cinematic
Characteristic
Shot : A lone figure in a wide desert landscape, walking across sand dunes towards a mountain range in the distance. The sky is cloudy and the light is soft.
Aesthetic Score : 0.7
Mood : solitude, adventure, contemplation
Quality
Entropy : 6.65
Noise : 103
Prompt Clip Score : 0.25
AI Evaluation
Likelihood of AI : 0.30
Image errors : The image is slightly blurry, especially in the background. The figure’s shadow is a bit too dark and the colors are a bit muted.
Silhouettes of Danger: Rooftop Showdown Under City Lights
Three figures stand poised for battle on a rooftop, their silhouettes stark against the glittering cityscape. The night sky and urban glow amplify the intensity and suspense of the scene, hinting at a gritty, urban conflict.
Prompt
poses fighting: energetic, playful ; A group of friends; medium shot; groups; a rooftop overlooking a city skyline at night; cinematic
Characteristic
Shot : Three young adults, two women and one man, are standing on a rooftop at night, facing each other in a fighting stance. The city lights are visible in the background.
Aesthetic Score : 0.6
Mood : tense, dramatic, urban
Quality
Entropy : 6.52
Noise : 94
Prompt Clip Score : 0.31
AI Evaluation
Likelihood of AI : 0.10
Image errors : Some minor noise and compression artifacts are visible in the image, particularly in the darker areas.
A Warrior’s Burden: The Aftermath of Battle
A lone warrior, shrouded in smoke and fire, stands amidst a burning field, their back turned towards the viewer. The scene evokes a sense of somber reflection, hinting at a difficult choice made or a grim aftermath. The dramatic composition invites you to imagine the story unfolding, leaving you with a lingering sense of mystery and anticipation.
Prompt
poses fighting: tragic, determined ; A lone warrior; close-up; heroism; a burning village with smoke billowing in the air; cinematic
Characteristic
Shot : A lone warrior in armor stands in a field of fire, their back to the viewer. Black smoke fills the sky, and the figure’s silhouette is backlit by the flames. Behind them, a wooden hut is visible, also engulfed in flames. The ground is dark and scorched, and the overall mood is one of destruction and despair.
Aesthetic Score : 0.7
Mood : epic, dark, intense
Quality
Entropy : 6.52
Noise : 95
Prompt Clip Score : 0.26
AI Evaluation
Likelihood of AI : 0.60
Image errors : The fire appears a bit artificial, and the smoke is very smooth, possibly AI-generated. The edges of the image look a bit soft, as if they were slightly blurred.
Lost in the Shadows: A Mysterious Cave Adventure
Three figures, silhouetted against the mist, navigate a dark and foreboding cave. Their flashlights pierce the gloom, revealing a hidden waterfall and a sense of adventure and suspense. What secrets lie within?
Prompt
poses fighting: suspenseful, adventurous ; A group of explorers; wide shot; adventure; a dark cave with flickering torches and mysterious shadows; cinematic
Characteristic
Shot : Three figures are silhouetted against a bright, misty waterfall, illuminated by handheld lamps, inside a dark cave. The light creates an ethereal, mysterious atmosphere.
Aesthetic Score : 0.6
Mood : mysterious, suspenseful, adventurous
Quality
Entropy : 5.94
Noise : 93
Prompt Clip Score : 0.29
AI Evaluation
Likelihood of AI : 0.10
Image errors : No noticeable artifacts or errors in the image.
The Future of Interaction: VR Blurs the Lines Between Reality and Fantasy
Two men engage in a playful, futuristic interaction. One, immersed in a virtual world through a VR headset, is guided by the other, creating a dynamic scene of technological exploration and human connection.
Prompt
poses fighting: immersive, intense ; A gamer; close-up; gaming; a virtual reality headset with a pixelated world projected in the background; cinematic
Characteristic
Shot : Two men are interacting in a dark room with screens behind them. The man on the left is wearing a VR headset and is gesturing with his hands. The man on the right has his hand raised as if to interact with the VR user.
Aesthetic Score : 0.6
Mood : futuristic, tech, interactive
Quality
Entropy : 6.13
Noise : 97
Prompt Clip Score : 0.26
AI Evaluation
Likelihood of AI : 0.10
Image errors : The image is slightly blurry and the colors are a bit washed out.
The Rush Hour Symphony: A Sea of People at the Train Station
A bustling train station platform comes alive with a sea of people rushing to their destinations. The red train in the background adds a splash of color to the scene, creating a sense of movement and energy. This image captures the vibrant chaos of urban life.
Prompt
poses fighting: fast-paced, chaotic ; Two travelers; medium shot; travel; a crowded train station with people rushing in all directions; cinematic
Characteristic
Shot : A crowded train station platform where people are waiting for a train. The train is in the background, and the platform is marked with yellow and black stripes.
Aesthetic Score : 0.6
Mood : busy, crowded, urban
Quality
Entropy : 6.59
Noise : 109
Prompt Clip Score : 0.23
AI Evaluation
Likelihood of AI : 0.10
Image errors : There is a lot of noise in the image, especially in the shadows. There are also some artifacts in the train windows.
Conclusion
The results show that the generative AI model understood the scenes reasonably well and matched the requested aesthetic, but struggled with camera position. Here’s a breakdown:
- Camera Position: The model scored 0.45, which is considered below average. This suggests that the model didn’t accurately capture the intended camera position described in the prompt.
- Shot Analysis: The model scored 0.56, which is considered average. This indicates that the model was able to understand the scene in the prompt to a reasonable degree, but not exceptionally well.
- Aesthetic Analysis: The model scored 0.08, which is considered very good (a low value appears to be better for this metric). This means that the generated images closely matched the expected aesthetic style described in the prompts.
Overall, the model demonstrated a decent understanding of the scene, but could benefit from improvements in accurately capturing the intended camera position. The model excelled in generating images that matched the desired aesthetic.
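For readers who want to work with the per-image figures, the sketch below tabulates the requested shot type, Aesthetic Score, and Prompt Clip Score of the ten images in this post and computes simple averages. The values are copied from the entries above. This is only an illustrative summary: the post does not describe how the three conclusion scores (0.45, 0.56, 0.08) were derived, and this code does not attempt to reproduce them.

```python
from collections import Counter
from statistics import mean

# (requested shot type, Aesthetic Score, Prompt Clip Score), in post order.
images = [
    ("wide shot", 0.7, 0.21),    # knight at sunset
    ("medium shot", 0.7, 0.29),  # jungle temple
    ("close-up", 0.7, 0.28),     # neon city
    ("medium shot", 0.6, 0.25),  # street market
    ("long shot", 0.7, 0.25),    # desert journey
    ("medium shot", 0.6, 0.31),  # rooftop showdown
    ("close-up", 0.7, 0.26),     # warrior's burden
    ("wide shot", 0.6, 0.29),    # cave adventure
    ("close-up", 0.6, 0.26),     # VR interaction
    ("medium shot", 0.6, 0.23),  # train station
]

shot_counts = Counter(shot for shot, _, _ in images)
mean_aesthetic = mean(score for _, score, _ in images)
mean_clip = mean(clip for _, _, clip in images)

print(dict(shot_counts))
print(f"mean Aesthetic Score:   {mean_aesthetic:.2f}")
print(f"mean Prompt Clip Score: {mean_clip:.3f}")
```

Three of the ten prompts ask for a close-up, yet the corresponding shot descriptions mention full figures and visible backgrounds, which fits the below-average camera position result.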