AI Captures the Essence, But Misses the Shot with Stable Diffusion
Dramatic poses are often used in visual storytelling to convey emotion, action, and a sense of grandeur. They can be found in everything from classic paintings to modern movies. In this experiment, we tested an AI model’s ability to generate images based on descriptions of dramatic poses and scenes. The results revealed that while the model excelled at capturing the desired aesthetic style, it struggled with accurately interpreting camera position and shot composition instructions. This highlights the ongoing challenge of teaching AI to understand and translate complex visual concepts.
Created with: stability-ai-core
Silhouetted Against the Setting Sun
A solitary figure stands in contemplation against a breathtaking sunset, casting a long shadow across the vast desert landscape. The scene evokes a sense of melancholy and wonder, leaving the viewer to ponder the figure’s thoughts and the mysteries of the desert.
Prompt
poses leaning: epic, hopeful ; A lone figure, silhouetted against a setting sun; wide shot; heroism; a vast, desolate landscape; cinematic
Characteristic
Shot : A silhouette of a lone figure standing on a hilltop overlooking a vast desert landscape at sunset.
Aesthetic Score : 0.6
Mood : solitude, dramatic, melancholic
Quality
Entropy : 6.19
Noise : 53
Prompt Clip Score : 0.26
AI Evaluation
Likelihood of AI : 0.10
Image errors : The image has a slight overexposure, causing the sun to be blown out and the sky to appear washed out. There is also a bit of noise in the darker areas of the image.
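The article does not say how the Entropy value under Quality is computed. One common reading, assumed here, is the Shannon entropy of the image's 8-bit grayscale histogram, which ranges from 0 bits (a flat image) to 8 bits (all 256 levels equally likely); values around 6 to 7, as reported for these images, would then indicate fairly rich tonal variation. The sketch below is a minimal illustration of that assumption on a plain list of pixel intensities; the function name `shannon_entropy` is hypothetical and not taken from the original pipeline.

```python
import math
from collections import Counter


def shannon_entropy(pixels):
    """Return the Shannon entropy, in bits, of a sequence of pixel intensities.

    Assumption: this mirrors a histogram-based entropy metric; the article
    does not confirm which definition it actually uses.
    """
    total = len(pixels)
    if total == 0:
        raise ValueError("need at least one pixel")
    counts = Counter(pixels)
    # H = -sum(p * log2(p)) over the observed intensity levels
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


if __name__ == "__main__":
    flat = [128] * 1000            # a single gray level
    full_range = list(range(256))  # every 8-bit level exactly once
    print(shannon_entropy(flat))
    print(shannon_entropy(full_range))
```

Under this definition, a low score means the image is dominated by a few tones, which is one reason silhouette and low-light shots can score lower than busy daylight scenes.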
Lost in the Shadows: Adventurers Face the Unknown
A group of brave explorers stand poised in a dimly lit cave, their faces illuminated by flickering torches. The atmosphere is thick with mystery and suspense, as they navigate the treacherous depths of the unknown. The dramatic lighting and composition heighten the sense of danger, leaving viewers wondering what secrets lie ahead.
Prompt
poses leaning: suspenseful, adventurous ; A group of adventurers, their faces illuminated by flickering torchlight; medium shot; adventure; a dark, mysterious cave; cinematic
Characteristic
Shot : A group of adventurers, dressed in explorer attire, are standing in a dark cave lit by torches. The cave walls are made of rough-hewn rock, and the adventurers are looking out towards the light of the torches.
Aesthetic Score : 0.7
Mood : mysterious, adventurous, suspenseful
Quality
Entropy : 6.15
Noise : 77
Prompt Clip Score : 0.29
AI Evaluation
Likelihood of AI : 0.20
Image errors : There are no noticeable artifacts or errors in the image.
The Typing in the Shadows
A close-up shot captures a hand furiously typing on a keyboard in a dimly lit room. The blurred background and low light create an atmosphere of mystery and intensity, leaving the viewer wondering what secrets are being typed.
Prompt
poses leaning: intense, focused ; A gamer’s hands, fingers flying across a keyboard; close-up; gaming; a brightly lit gaming setup; cinematic
Characteristic
Shot : A person’s hand is reaching for a keyboard in a dimly lit room, with a computer screen in the background. The screen is displaying a blurry image of the word ‘Gamer’.
Aesthetic Score : 0.5
Mood : dark, mysterious, focused
Quality
Entropy : 6.26
Noise : 55
Prompt Clip Score : 0.19
AI Evaluation
Likelihood of AI : 0.10
Image errors : There is a slight blurriness in the image, particularly around the edges. The colors are also slightly muted and lack vibrancy.
Silhouettes of Love Against the Setting Sun
A couple embraces on a rooftop, their silhouettes painted against the fiery sunset sky. The cityscape stretches out below, adding a sense of grandeur to this intimate moment of love and affection.
Prompt
poses leaning: romantic, awe-inspiring ; A couple leaning on a railing, gazing out at a breathtaking cityscape; medium shot; tourism; a vibrant, bustling city; cinematic
Characteristic
Shot : A couple in love embracing on a rooftop overlooking a city skyline during a sunset.
Aesthetic Score : 0.7
Mood : romantic, intimate, hopeful
Quality
Entropy : 6.32
Noise : 56
Prompt Clip Score : 0.28
AI Evaluation
Likelihood of AI : 0.20
Image errors : No major artifacts or errors. Some minor noise in the background, but it’s acceptable for the style of the image.
A Hiker’s Moment of Awe in the Swiss Alps
A lone hiker stands on a gravel road, dwarfed by the majestic Swiss Alps. The tranquil scene evokes a sense of adventure and serenity, as the hiker contemplates the vastness of the mountains.
Prompt
poses leaning: reflective, adventurous ; A backpacker, leaning against a weathered signpost, looking out at a winding mountain road; medium shot; travel; a scenic mountain range; cinematic
Characteristic
Shot : A lone hiker stands on a gravel road in the Swiss Alps, looking towards a majestic mountain range in the distance. The road is surrounded by grass and small trees, while the mountains are covered in snow and vegetation.
Aesthetic Score : 0.7
Mood : serene, adventurous, inspiring
Quality
Entropy : 6.81
Noise : 80
Prompt Clip Score : 0.25
AI Evaluation
Likelihood of AI : 0.10
Image errors : There are no noticeable artifacts or errors in the image.
Friendship Blooms on a Cobblestone Street
Four friends, radiating joy and laughter, stroll down a charming European street. The warm lighting and intimate composition capture the essence of their carefree camaraderie.
Prompt
poses leaning: joyful, carefree ; A group of friends, laughing and leaning on each other, as they walk down a cobblestone street; wide shot; groups; a charming, historic town; cinematic
Characteristic
Shot : A group of friends are walking down a cobblestone street in a European city. They are all smiling and laughing, and seem to be enjoying their time together.
Aesthetic Score : 0.7
Mood : happy, friendly, carefree
Quality
Entropy : 6.83
Noise : 84
Prompt Clip Score : 0.29
AI Evaluation
Likelihood of AI : 0.10
Image errors : No noticeable errors.
A Solitary Figure Defies the Storm
A lone figure stands on a clifftop, arms outstretched, facing the raw power of a stormy sea. The dramatic contrast between the small human form and the vast, crashing waves evokes a sense of both vulnerability and resilience. This powerful image captures the beauty and awe-inspiring nature of a tempestuous seascape.
Prompt
poses leaning: powerful, defiant ; A lone figure, standing on a cliff edge, arms outstretched, leaning into the wind; wide shot; heroism; a dramatic, stormy sea; cinematic
Characteristic
Shot : A man stands on a cliff overlooking a stormy ocean with large waves crashing against the rocks. The sky is dark and cloudy.
Aesthetic Score : 0.7
Mood : dramatic, intense, powerful
Quality
Entropy : 6.71
Noise : 75
Prompt Clip Score : 0.25
AI Evaluation
Likelihood of AI : 0.10
Image errors : There are no visible errors in the image.
Campfire Camaraderie in the Deep Woods
Four friends gather around a crackling campfire, their faces illuminated by the warm glow. The dense forest surrounding them adds an air of mystery and adventure, creating a cozy and inviting atmosphere.
Prompt
poses leaning: intimate, suspenseful ; A group of explorers, huddled around a campfire, sharing stories; medium shot; adventure; a dense, mysterious forest; cinematic
Characteristic
Shot : Four men are sitting around a campfire in a forest. The scene is lit by the fire and the surrounding trees are dark and mysterious. The men are all wearing outdoor clothing and appear to be enjoying their time around the fire.
Aesthetic Score : 0.7
Mood : rustic, adventurous, contemplative
Quality
Entropy : 6.30
Noise : 82
Prompt Clip Score : 0.28
AI Evaluation
Likelihood of AI : 0.10
Image errors : None detected.
Lost in the Digital Realm: A Moment of Intense Focus
A young man, bathed in the soft glow of his computer screen, is completely absorbed in his digital world. The low light and his focused gaze create a sense of intensity, highlighting the power of technology to captivate and engage. The blurry background of computer monitors further emphasizes his immersion in this digital realm, leaving the viewer to wonder what captivating activity he is engaged in.
Prompt
poses leaning: intense, focused ; A gamer’s face, illuminated by the glow of a monitor, eyes wide with excitement; close-up; gaming; a dimly lit room; cinematic
Characteristic
Shot : A young man wearing headphones is sitting in front of a computer, looking at something out of the frame. He is wearing a dark blue hoodie and has a serious expression on his face.
Aesthetic Score : 0.6
Mood : intense, focused, serious
Quality
Entropy : 5.99
Noise : 59
Prompt Clip Score : 0.25
AI Evaluation
Likelihood of AI : 0.20
Image errors : There are some minor artifacts in the image, particularly around the edges of the man’s head.
Silhouettes of Hope: A Family’s Sunset Moment
A serene and peaceful scene captures a family of four standing on a beach at sunset, their silhouettes against the vibrant sky creating a sense of hope and tranquility. The moment evokes a feeling of calm and connection, as they gaze out towards the vast ocean.
Prompt
poses leaning: peaceful, heartwarming ; A family, leaning on each other, watching a sunset over a vast ocean; wide shot; travel; a serene, sandy beach; cinematic
Characteristic
Shot : A family of four stands on a beach at sunset, looking out at the ocean.
Aesthetic Score : 0.7
Mood : peaceful, serene, hopeful
Quality
Entropy : 6.77
Noise : 66
Prompt Clip Score : 0.26
AI Evaluation
Likelihood of AI : 0.20
Image errors : None
Conclusion
The results show that the generative AI model struggled with camera position and shot analysis, but performed well in aesthetic analysis. Here’s a breakdown:
- Camera Position: The model scored 0.4, which is considered below average. This suggests that the model didn’t accurately capture the intended camera positions described in the prompt.
- Shot Analysis: The model scored 0.495, which is also below average. This indicates that the model didn’t fully understand the scene described in the prompt and didn’t create the expected shot composition.
- Aesthetic Analysis: The model scored 0.07, which is considered very good. This means that the generated image closely matched the expected aesthetic style described in the prompt.
Overall, the model seems to have difficulty interpreting the camera position and shot composition instructions, but it successfully captured the desired aesthetic style.
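For readers who want to compare the ten images side by side, the per-image numbers listed in each section can be summarized with a few lines of Python. This is a rough sketch: the values are copied from the sections above, the names `RESULTS` and `summarize` are illustrative, and these averages are separate from the three scores quoted in the conclusion, whose calculation the article does not describe.

```python
from collections import defaultdict
from statistics import mean

# (short title, requested shot type, aesthetic score, prompt CLIP score)
# Values copied from the per-image sections of this article.
RESULTS = [
    ("Silhouetted Against the Setting Sun", "wide shot", 0.6, 0.26),
    ("Lost in the Shadows", "medium shot", 0.7, 0.29),
    ("The Typing in the Shadows", "close-up", 0.5, 0.19),
    ("Silhouettes of Love Against the Setting Sun", "medium shot", 0.7, 0.28),
    ("A Hiker's Moment of Awe in the Swiss Alps", "medium shot", 0.7, 0.25),
    ("Friendship Blooms on a Cobblestone Street", "wide shot", 0.7, 0.29),
    ("A Solitary Figure Defies the Storm", "wide shot", 0.7, 0.25),
    ("Campfire Camaraderie in the Deep Woods", "medium shot", 0.7, 0.28),
    ("Lost in the Digital Realm", "close-up", 0.6, 0.25),
    ("Silhouettes of Hope", "wide shot", 0.7, 0.26),
]


def summarize(results):
    """Return overall means, the weakest prompt match, and mean CLIP score per shot type."""
    by_shot = defaultdict(list)
    for _title, shot, _aesthetic, clip in results:
        by_shot[shot].append(clip)
    weakest = min(results, key=lambda row: row[3])
    return {
        "mean_aesthetic": mean(row[2] for row in results),
        "mean_clip": mean(row[3] for row in results),
        "weakest_title": weakest[0],
        "clip_by_shot": {shot: mean(scores) for shot, scores in by_shot.items()},
    }


if __name__ == "__main__":
    summary = summarize(RESULTS)
    for key, value in summary.items():
        print(key, value)
```

On these numbers the mean aesthetic score is about 0.66 and the mean prompt CLIP score about 0.26. The weakest prompt match is "The Typing in the Shadows" at 0.19, the image where the prompt asked for a brightly lit gaming setup and the output showed a dimly lit room.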