AI's Artistic Eye: Capturing Poses, But Missing the Shot with Stable Diffusion
In the realm of AI-generated art, capturing the essence of a scene goes beyond simply rendering pixels. It involves understanding the nuances of composition, camera angles, and the overall aesthetic. This blog post delves into the results of an experiment where an AI model was tasked with generating images based on specific poses and scene descriptions. The results reveal a fascinating insight into the strengths and limitations of AI in capturing the artistic vision.
Created with: stability-ai-core
A Lone Knight Stands Against the Setting Sun
In this epic scene, a lone knight, silhouetted against the fiery sunset, stands amidst a desolate landscape with a group of warriors behind him. The dramatic lighting and composition evoke a sense of isolation and power, creating a heroic and dramatic mood.
Prompt
poses fighting: epic, determined ; A lone warrior; wide shot; heroism; a desolate battlefield with the setting sun in the background; cinematic
Characteristic
Shot : A lone warrior stands in a rocky field with his sword drawn, looking towards a group of soldiers in the distance, as the sun sets behind them in the sky.
Aesthetic Score : 0.7
Mood : epic, heroic, dramatic
Quality
Entropy : 6.77
Noise : 73
Prompt Clip Score : 0.23
AI Evaluation
Likelihood of AI : 0.90
Image errors : There are some minor artifacts in the image, particularly in the background where the soldiers are blurred. Some of the rocks in the foreground are also pixelated.
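The post does not define how the Entropy value in each Quality block is computed. One plausible reading, and it is only an assumption, is the Shannon entropy of the image's 8-bit grayscale histogram, which ranges from 0 bits (a flat image) to 8 bits (every gray level equally common). The reported values of roughly 6 to 7 would fit that range. A minimal sketch under that assumption, where `shannon_entropy` is a hypothetical helper rather than the tool used for this post:

```python
import math
from collections import Counter


def shannon_entropy(pixels):
    """Shannon entropy, in bits, of a sequence of grayscale pixel values.

    Assumption: this mirrors the post's "Entropy" metric; the post does not
    say how that number was actually produced.
    """
    total = len(pixels)
    if total == 0:
        return 0.0
    counts = Counter(pixels)
    # Sum p * log2(1/p) over every gray level that occurs in the image.
    return sum((c / total) * math.log2(total / c) for c in counts.values())


# A flat gray image carries no information.
flat_image = [128] * 1000
# Every 8-bit level used equally often reaches the 8-bit maximum.
uniform_image = list(range(256)) * 4

print(shannon_entropy(flat_image))
print(shannon_entropy(uniform_image))
```

Under this reading, a lower value such as the 6.13 reported for the cave scene would correspond to an image dominated by a narrow band of dark tones.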
Warriors on the Brink: A Dramatic Encounter in the Jungle
Four warriors, armed and ready, stand guard amidst ancient stone ruins in a sun-drenched jungle. The scene is charged with tension and anticipation, hinting at an epic battle to come. The warm glow of the sun offers a glimmer of hope amidst the impending danger.
Prompt
poses fighting: intense, adventurous ; A group of adventurers; medium shot; adventure; a dense jungle with ancient ruins in the distance; cinematic
Characteristic
Shot : A group of warriors stand in a jungle clearing, surrounded by ancient ruins.
Aesthetic Score : 0.75
Mood : adventurous, mysterious, epic
Quality
Entropy : 6.85
Noise : 86
Prompt Clip Score : 0.28
AI Evaluation
Likelihood of AI : 0.90
Image errors : Some of the textures, particularly on the rocks, appear slightly blurry and lack detail.
Neon Shadows, Cyberpunk City: A Lone Figure in the Night
On a futuristic city street bathed in neon light, a lone figure in a cyberpunk-style suit stands in the foreground, their silhouette in stark contrast to the blurred background. The scene evokes a sense of mystery and tension, hinting at an action-packed story unfolding in the shadows.
Prompt
poses fighting: dynamic, futuristic ; A player character; close-up; gaming; a neon-lit cityscape with holographic projections; cinematic
Characteristic
Shot : A futuristic cyberpunk city street at night, with a lone cyborg in the foreground, a blurry background of other cyborgs, and neon lights reflecting off the wet pavement.
Aesthetic Score : 0.7
Mood : dark, futuristic, edgy
Quality
Entropy : 6.75
Noise : 67
Prompt Clip Score : 0.27
AI Evaluation
Likelihood of AI : 1.00
Image errors : The image has a few minor artifacts, such as some blurring around the edges of the cyborg and some aliasing on the neon signs.
Tense Standoff in a Bustling Asian Market
Three men are locked in a heated confrontation amidst the vibrant chaos of a crowded street market. Their dynamic poses and intense expressions create a palpable sense of urgency and tension, drawing the viewer into the heart of the conflict.
Prompt
poses fighting: chaotic, humorous ; Two tourists; medium shot; tourism; a bustling marketplace with colorful stalls and vibrant crowds; cinematic
Characteristic
Shot : Three men are wrestling in a crowded street in a Southeast Asian city.
Aesthetic Score : 0.6
Mood : intense, chaotic, energetic
Quality
Entropy : 6.85
Noise : 81
Prompt Clip Score : 0.29
AI Evaluation
Likelihood of AI : 0.20
Image errors : There is a slight amount of noise in the image.
Lost in the Golden Hour
A solitary figure traverses a desolate desert landscape, bathed in the warm glow of the setting sun. The vastness of the dunes emphasizes their smallness, creating a poignant sense of loneliness and contemplation.
Prompt
poses fighting: isolated, desperate ; A lone traveler; long shot; travel; a vast desert landscape with a lone sand dune in the foreground; cinematic
Characteristic
Shot : A lone figure walks across a vast desert landscape, his back to the viewer. The sun is setting, casting long shadows across the sand dunes.
Aesthetic Score : 0.7
Mood : mysterious, solitary, desolate
Quality
Entropy : 6.71
Noise : 63
Prompt Clip Score : 0.25
AI Evaluation
Likelihood of AI : 0.20
Image errors : The image has a slight blurriness to it. This could be due to the original photo being taken in low light or the digital editing process.
City Lights, Young Hearts: Rooftop Fun with a View
Four friends capture the energy of the city at night, their playful poses and smiles radiating youthful joy against the backdrop of a glittering cityscape. This vibrant scene evokes a sense of urban adventure and excitement, making it a perfect snapshot of carefree fun.
Prompt
poses fighting: energetic, playful ; A group of friends; medium shot; groups; a rooftop overlooking a city skyline at night; cinematic
Characteristic
Shot : Four young adults, two women and two men, are posing on a rooftop at night, with a cityscape background. They are smiling and appear to be having fun.
Aesthetic Score : 0.7
Mood : fun, youthful, energetic
Quality
Entropy : 6.52
Noise : 70
Prompt Clip Score : 0.26
AI Evaluation
Likelihood of AI : 0.10
Image errors : No significant errors
Knight of Ashes: A Stoic Figure Amidst Devastation
A lone knight in full armor stands defiant against a backdrop of fiery destruction. Flames and smoke engulf the village, creating a scene of chaos and despair. Yet, the knight’s unwavering stance suggests resilience and a determination to overcome the devastation.
Prompt
poses fighting: tragic, determined ; A lone warrior; close-up; heroism; a burning village with smoke billowing in the air; cinematic
Characteristic
Shot : A lone knight stands in the middle of a burning village, flames raging around him. The smoke billows in the air, casting a dark shadow over the scene. The knight is heavily armed and armored, his face obscured by his helmet. He holds two swords at the ready, his eyes fixed on the viewer.
Aesthetic Score : 0.7
Mood : dramatic, intense, heroic
Quality
Entropy : 6.82
Noise : 78
Prompt Clip Score : 0.25
AI Evaluation
Likelihood of AI : 0.60
Image errors : The image has some artifacts, especially in the smoke and flames. The edges of the image are also a bit blurry.
Lost in the Shadows: Adventurers Brave a Mysterious Cave
Five adventurers, armed with flickering torches, navigate the claustrophobic depths of a dark and narrow cave. The atmosphere is thick with mystery and suspense, as they venture deeper into the unknown. Will they find what they seek, or will the shadows claim them?
Prompt
poses fighting: suspenseful, adventurous ; A group of explorers; wide shot; adventure; a dark cave with flickering torches and mysterious shadows; cinematic
Characteristic
Shot : A group of adventurers armed with torches are exploring a dark cave, they are standing in a narrow canyon, a faint blue light is illuminating the scene from above.
Aesthetic Score : 0.7
Mood : mysterious, adventurous, suspenseful
Quality
Entropy : 6.13
Noise : 72
Prompt Clip Score : 0.29
AI Evaluation
Likelihood of AI : 0.90
Image errors : Some artifacts are visible on the rocks in the background and the torch flames. The characters’ faces are also slightly unnatural in the background.
Lost in the Virtual Battlefield: A Man’s Immersive VR Experience
This image captures the thrill of virtual reality, showcasing a man fully immersed in a futuristic combat scenario. The dark room and large screen behind him create a sense of isolation and intensity, highlighting the immersive nature of the experience.
Prompt
poses fighting: immersive, intense ; A gamer; close-up; gaming; a virtual reality headset with a pixelated world projected in the background; cinematic
Characteristic
Shot : A man wearing a VR headset is immersed in a virtual reality experience. He is looking at a screen in front of him which depicts a futuristic, sci-fi environment.
Aesthetic Score : 0.6
Mood : futuristic, intense, immersive
Quality
Entropy : 6.23
Noise : 60
Prompt Clip Score : 0.27
AI Evaluation
Likelihood of AI : 0.20
Image errors : There are some minor artifacts in the image, such as slight blurriness in the background and some graininess.
Desperate Pursuit: A Tense Encounter at the Station
A man’s desperate attempt to grab a woman’s arm as she flees, captured in a moment of intense drama. The dynamic lines, strong contrast, and sense of urgency create a powerful visual narrative, leaving the viewer questioning the events leading up to this tense encounter.
Prompt
poses fighting: fast-paced, chaotic ; Two travelers; medium shot; travel; a crowded train station with people rushing in all directions; cinematic
Characteristic
Shot : A man grabs a woman’s arm and pulls her away from a train at a train station. The woman is struggling to resist him.
Aesthetic Score : 0.7
Mood : suspense, intense, urgent
Quality
Entropy : 6.61
Noise : 78
Prompt Clip Score : 0.30
AI Evaluation
Likelihood of AI : 0.10
Image errors : no visible errors, blur is intentional
Conclusion
The results show that the generative AI model matched the requested aesthetic style well, but was only average at shot analysis and below average at camera position. Here’s a breakdown:
- Camera Position: The model scored 0.45, which is considered below average. This suggests that the model didn’t accurately capture the intended camera positions described in the prompt.
- Shot Analysis: The model scored 0.56, which is considered average. This indicates that the model was able to understand the scene in the prompt to a reasonable degree, but not exceptionally well.
- Aesthetic Analysis: The model scored 0.1, which is considered very good. The scale is not stated, but the rating implies that a lower value is better for this metric. This means that the generated images closely matched the expected aesthetic style described in the prompt.
Overall, the model seems to be better at understanding the aesthetic style of the prompt than it is at accurately capturing the camera positions and shot composition.
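To see the per-image figures side by side, the ten metric blocks above can be collected and averaged. The sketch below copies the values from this post; the names `IMAGES`, `summarize`, and `flagged_as_ai`, and the 0.5 threshold, are illustrative assumptions. It does not reproduce the Camera Position, Shot Analysis, or Aesthetic Analysis scores in the conclusion, whose calculation the post does not describe.

```python
from statistics import mean

# (title, aesthetic, entropy, noise, prompt_clip, ai_likelihood), copied from the post.
IMAGES = [
    ("A Lone Knight Stands Against the Setting Sun", 0.70, 6.77, 73, 0.23, 0.90),
    ("Warriors on the Brink", 0.75, 6.85, 86, 0.28, 0.90),
    ("Neon Shadows, Cyberpunk City", 0.70, 6.75, 67, 0.27, 1.00),
    ("Tense Standoff in a Bustling Asian Market", 0.60, 6.85, 81, 0.29, 0.20),
    ("Lost in the Golden Hour", 0.70, 6.71, 63, 0.25, 0.20),
    ("City Lights, Young Hearts", 0.70, 6.52, 70, 0.26, 0.10),
    ("Knight of Ashes", 0.70, 6.82, 78, 0.25, 0.60),
    ("Lost in the Shadows", 0.70, 6.13, 72, 0.29, 0.90),
    ("Lost in the Virtual Battlefield", 0.60, 6.23, 60, 0.27, 0.20),
    ("Desperate Pursuit", 0.70, 6.61, 78, 0.30, 0.10),
]

FIELDS = ("aesthetic", "entropy", "noise", "prompt_clip", "ai_likelihood")


def summarize(images):
    """Return the mean of each metric across all images."""
    columns = zip(*(row[1:] for row in images))
    return {name: mean(col) for name, col in zip(FIELDS, columns)}


def flagged_as_ai(images, threshold=0.5):
    """Titles whose Likelihood of AI meets a hypothetical threshold."""
    return [row[0] for row in images if row[5] >= threshold]


summary = summarize(IMAGES)
for name, value in summary.items():
    print(f"{name:>14}: {value:.3f}")
print(len(flagged_as_ai(IMAGES)), "of", len(IMAGES), "images at or above the threshold")
```

On these figures, the mean Prompt Clip Score comes out at about 0.27 and the mean Aesthetic Score at about 0.69, and five of the ten images receive a Likelihood of AI of 0.5 or higher.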