AI Captures Poses, But Struggles with the Feel with Stability-ai-ultra
Text-to-image generation is evolving rapidly, and a key test of any model is whether it captures not just the literal elements of a scene but also its mood and aesthetic. This blog post examines the results of a recent experiment in which a generative AI model was asked to create images from detailed scene descriptions. The model shows a reasonable grasp of camera positions and shot types, but it struggles to capture the desired aesthetic, which highlights the ongoing challenge of replicating human artistic vision in AI.
Created with: stability-ai-ultra
Soldiers Brace for Impact Amidst Exploding Chaos
A line of camouflaged soldiers stands firm, rifles at the ready, as a massive explosion erupts behind them. The intensity of the moment is palpable, with the soldiers’ focused expressions reflecting the gravity of the situation.
Prompt
poses standing-in-a-row: determined, courageous, hopeful ; A group of soldiers; wide shot; heroism; a battlefield with smoke and explosions in the background; cinematic
Characteristic
Shot : A group of soldiers stand in a line, armed with rifles, in front of a large explosion.
Aesthetic Score : 0.7
Mood : intense, powerful, dramatic
Quality
Entropy : 6.55
Noise : 98
Prompt Clip Score : 0.30
AI Evaluation
Likelihood of AI : 0.30
Image errors : The image is slightly overexposed and has some noise, especially in the background.
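The prompt above, like every prompt in this post, follows the same semicolon-separated layout: a pose with a list of emotions, then a subject, a shot type, a theme, a setting, and a style. The sketch below splits a prompt into those parts so that the requested shot type or mood can be compared with the evaluation fields. The field names and the `parse_prompt` helper are illustrative assumptions, not part of the original experiment.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ScenePrompt:
    # Field names are illustrative labels, not terms defined by the post.
    pose: str
    emotions: List[str]
    subject: str
    shot_type: str
    theme: str
    setting: str
    style: str


def parse_prompt(prompt: str) -> ScenePrompt:
    """Split 'poses <pose>: <emotions> ; subject; shot; theme; setting; style'."""
    parts = [p.strip() for p in prompt.split(";")]
    if len(parts) != 6:
        raise ValueError(f"expected 6 ';'-separated fields, got {len(parts)}")
    head, subject, shot_type, theme, setting, style = parts

    # The first field looks like 'poses standing-in-a-row: determined, courageous, hopeful'.
    pose_part, _, emotion_part = head.partition(":")
    pose = pose_part.strip()
    if pose.startswith("poses"):
        pose = pose[len("poses"):].strip()
    emotions = [e.strip() for e in emotion_part.split(",") if e.strip()]

    return ScenePrompt(pose, emotions, subject, shot_type, theme, setting, style)


prompt = (
    "poses standing-in-a-row: determined, courageous, hopeful ; A group of soldiers; "
    "wide shot; heroism; a battlefield with smoke and explosions in the background; cinematic"
)
parsed = parse_prompt(prompt)
print(parsed.pose)       # standing-in-a-row
print(parsed.shot_type)  # wide shot
print(parsed.emotions)   # ['determined', 'courageous', 'hopeful']
```

Parsing the prompt this way makes it easy to check, for example, whether the requested emotions overlap with the words reported under Mood.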
Uncharted Territory: Explorers Brace for Jungle Adventure
Five intrepid explorers, gear in hand, stand poised before a mysterious jungle temple. Their backpacks hint at the journey ahead, filled with anticipation and the promise of discovery. The ancient temple, shrouded in foliage, adds an air of mystery and wonder, beckoning them deeper into the unknown.
Prompt
poses standing-in-a-row: excited, curious, adventurous ; A team of explorers; medium shot; adventure; a lush jungle with ancient ruins in the distance; cinematic
Characteristic
Shot : Five men, wearing khaki shirts and pants and carrying backpacks, stand in a jungle setting with a stone structure and steps in the background.
Aesthetic Score : 0.6
Mood : adventurous, mysterious, daring
Quality
Entropy : 6.76
Noise : 121
Prompt Clip Score : 0.32
AI Evaluation
Likelihood of AI : 0.80
Image errors : The image appears to have been generated using AI, with noticeable artifacts, particularly in the foliage and the stone structure. The textures and details lack natural variation and depth.
In the Heat of the Game: Gamers Immersed in a World of Competition
A dimly lit room pulsates with vibrant blue and red lighting, highlighting a row of gamers intensely focused on their computers. Headsets isolate them, immersing them in a world of competition. The dramatic lighting and composition draw the viewer into the heart of the action, capturing the raw energy and focus of these dedicated players.
Prompt
poses standing-in-a-row: focused, competitive, passionate ; A group of gamers; close-up shot; gaming; a brightly lit esports arena with cheering fans; cinematic
Characteristic
Shot : A group of young men wearing headsets are sitting at computers in a brightly lit room, likely participating in a gaming competition or tournament.
Aesthetic Score : 0.7
Mood : intense, focused, competitive
Quality
Entropy : 6.79
Noise : 73
Prompt Clip Score : 0.29
AI Evaluation
Likelihood of AI : 0.10
Image errors : There are no significant errors in the image. However, the lighting is a bit harsh and creates some unwanted glare in the foreground.
Family Adventure in the Majestic Mountains
A heartwarming scene of a family of seven enjoying a picturesque mountain vista. Their smiles and relaxed postures capture the joy of adventure and the awe-inspiring beauty of nature.
Prompt
poses standing-in-a-row: happy, relaxed, joyful ; A family of tourists; long shot; tourism; a breathtaking view of a mountain range with a clear blue sky; cinematic
Characteristic
Shot : A group of seven people, including adults and a child, are standing in a line in front of a mountain range. They are all wearing backpacks and smiling.
Aesthetic Score : 0.6
Mood : happy, adventurous, family
Quality
Entropy : 6.42
Noise : 86
Prompt Clip Score : 0.32
AI Evaluation
Likelihood of AI : 0.20
Image errors : The image has a slight blurriness to it, but it’s not noticeable until you look closely.
Sun-Kissed Adventure: A Journey Through the Jungle
Four friends, backpacks in tow, embark on a sun-drenched adventure through a lush jungle. The setting sun casts a golden glow, creating an atmosphere of mystery and intrigue as they walk along the dirt road, surrounded by towering palm trees. This serene and hopeful scene captures the essence of exploration and the promise of what lies ahead.
Prompt
poses standing-in-a-row: free-spirited, adventurous, optimistic ; A group of backpackers; medium shot; travel; a dusty road leading to a distant village with palm trees; cinematic
Characteristic
Shot : Four people with backpacks walking away from the camera on a dirt road in a tropical jungle setting. The sun is setting behind them, creating a warm glow in the sky.
Aesthetic Score : 0.7
Mood : serene, adventurous, hopeful
Quality
Entropy : 6.78
Noise : 105
Prompt Clip Score : 0.30
AI Evaluation
Likelihood of AI : 0.10
Image errors : No noticeable errors
Shadows and Secrets: A Gathering in the Dark
A dimly lit room, bathed in the glow of spotlights, holds a group of silhouetted figures. The atmosphere is heavy with mystery and anticipation, hinting at a gathering shrouded in secrecy and somber undertones.
Prompt
poses standing-in-a-row: harmonious, powerful, emotional ; A choir singing in harmony; close-up shot; groups; a dimly lit stage with spotlights; cinematic
Characteristic
Shot : A group of people are silhouetted in front of a stage with spotlights. The people are standing in a circle, and the man in the center is facing the audience.
Aesthetic Score : 0.5
Mood : mysterious, dramatic, anticipation
Quality
Entropy : 5.69
Noise : 76
Prompt Clip Score : 0.27
AI Evaluation
Likelihood of AI : 0.20
Image errors : Some minor noise is present in the dark areas of the image, and the silhouette edges are a bit rough.
Dazzling Silhouettes: Women Shine Under the Spotlight
A vibrant stage comes alive with a group of women in colorful dresses, bathed in dramatic lighting that accentuates their silhouettes and the celebratory mood. The scene exudes glamour and energy, capturing a moment of pure joy and spectacle.
Prompt
poses standing-in-a-row: energetic, synchronized, joyful ; A line of dancers; wide shot; groups; a brightly lit stage with colorful costumes; cinematic
Characteristic
Shot : A group of women in colorful outfits are lined up on a stage, with spotlights shining down on them. The stage is empty except for them, and there is a sense of anticipation and excitement in the air.
Aesthetic Score : 0.7
Mood : colorful, festive, confident
Quality
Entropy : 6.89
Noise : 86
Prompt Clip Score : 0.30
AI Evaluation
Likelihood of AI : 0.20
Image errors : Some slight blurriness in some of the subjects. Some minor artifacts in the lighting.
Golden Hour Friendships: Silhouettes of Joy at Sunset
Capture the essence of friendship and nostalgia with this breathtaking sunset scene. Silhouetted figures against the golden light evoke a sense of mystery and romance, while the calm water reflects the warm and inviting mood.
Prompt
poses standing-in-a-row: relaxed, happy, nostalgic ; A group of friends; medium shot; groups; a sunset over a beach with waves crashing in the background; cinematic
Characteristic
Shot : A group of friends stand on a beach at sunset, silhouetted against the golden sky.
Aesthetic Score : 0.7
Mood : joyful, carefree, nostalgic
Quality
Entropy : 6.27
Noise : 81
Prompt Clip Score : 0.31
AI Evaluation
Likelihood of AI : 0.10
Image errors : There are no obvious artifacts or errors in the image.
Intriguing Discovery: Scientists Gather in a High-Tech Lab
A group of scientists, bathed in the cool blue and white glow of a high-tech laboratory, stand in a moment of intense focus. The woman in the foreground, arms crossed, is the center of attention, her gaze fixed on something unseen. The dramatic lighting and the scientists’ serious expressions create an atmosphere of mystery and anticipation, hinting at a groundbreaking discovery.
Prompt
poses standing-in-a-row: focused, determined, innovative ; A team of scientists; close-up shot; groups; a laboratory with complex machinery and glowing screens; cinematic
Characteristic
Shot : A group of scientists or lab technicians standing in a sterile lab environment, a woman in the foreground with her arms crossed, a man to her left with his arms crossed, other people blurred in the background, a computer monitor and laboratory equipment in the foreground
Aesthetic Score : 0.6
Mood : serious, professional, scientific
Quality
Entropy : 6.86
Noise : 78
Prompt Clip Score : 0.29
AI Evaluation
Likelihood of AI : 0.20
Image errors : The image has a slight blur in the background, which is likely due to the focus being on the foreground subjects.
Protesters Take to the Streets in a Show of Unity and Defiance
A passionate crowd of protesters fills the city streets, their raised arms and signs creating a powerful display of unity and defiance. Signs written in various languages convey a shared message of resistance.
Prompt
poses standing-in-a-row: determined, passionate, hopeful ; A group of protesters; long shot; groups; a city street with banners and signs; cinematic
Characteristic
Shot : A group of people are protesting in the street, holding signs and banners. They are standing in the middle of the road, with a busy city street in the background.
Aesthetic Score : 0.6
Mood : intense, passionate, determined
Quality
Entropy : 6.57
Noise : 97
Prompt Clip Score : 0.23
AI Evaluation
Likelihood of AI : 0.20
Image errors : The image has some slight blurriness, especially on the people in the background.
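Taken together, the ten sets of per-image numbers can be summarized in a few lines. The sketch below is a minimal aggregation, with the values copied from the sections above. The short labels and the tuple layout are assumptions made for illustration, and the post does not say how the original scores were produced.

```python
from statistics import mean

# (label, aesthetic score, prompt CLIP score, likelihood of AI)
# Values are copied from the ten image sections; labels are shorthand.
results = [
    ("soldiers",    0.7, 0.30, 0.30),
    ("explorers",   0.6, 0.32, 0.80),
    ("gamers",      0.7, 0.29, 0.10),
    ("family",      0.6, 0.32, 0.20),
    ("backpackers", 0.7, 0.30, 0.10),
    ("choir",       0.5, 0.27, 0.20),
    ("dancers",     0.7, 0.30, 0.20),
    ("friends",     0.7, 0.31, 0.10),
    ("scientists",  0.6, 0.29, 0.20),
    ("protesters",  0.6, 0.23, 0.20),
]

mean_aesthetic = round(mean(r[1] for r in results), 2)
mean_clip = round(mean(r[2] for r in results), 3)
mean_ai = round(mean(r[3] for r in results), 2)

lowest_clip = min(results, key=lambda r: r[2])[0]
lowest_aesthetic = min(results, key=lambda r: r[1])[0]
most_ai_like = max(results, key=lambda r: r[3])[0]

print(mean_aesthetic, mean_clip, mean_ai)           # 0.64 0.293 0.24
print(lowest_clip, lowest_aesthetic, most_ai_like)  # protesters choir explorers
```

The averages hide some spread: the protesters image has the weakest prompt match, the choir image the lowest aesthetic score, and the explorers image is the only one rated as likely AI-generated.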
Conclusion
The results show that the generative AI model performed reasonably on camera position and shot analysis, but struggled with aesthetic analysis.
Here’s a breakdown:
- Camera Position: The model scored 0.46, which is slightly below the “good” range of 0.5 to 0.75. This suggests that the model’s ability to interpret and recreate camera positions from the prompt is decent, but could be improved.
- Shot Analysis: The model scored 0.52, which falls within the “good” range. This indicates that the model is capable of understanding the scene described in the prompt and generating images that reflect the intended shot type.
- Aesthetic Analysis: The model scored 0.13, which falls outside the “very good” range of -0.2 to 0.1, exceeding its upper bound by 0.03. This suggests that the aesthetic of the generated images deviated from the aesthetic described in the prompts.
Overall, the model demonstrates a reasonable understanding of camera positions and shot types, with shot analysis the only score inside its target range, but it needs improvement in capturing the desired aesthetic.
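The comparison of each score against its range can be written out explicitly. The sketch below is a minimal check using only the numbers quoted in the breakdown. It assumes that shot analysis uses the same “good” range of 0.5 to 0.75 as camera position, which the post implies but does not state, and the helper name `distance_outside` is hypothetical.

```python
def distance_outside(score: float, low: float, high: float) -> float:
    """Return 0.0 if score lies in [low, high], else how far outside it falls."""
    if score < low:
        return round(low - score, 2)
    if score > high:
        return round(score - high, 2)
    return 0.0


# metric: (score, range low, range high, range label)
# The range for shot analysis is an assumption (same as camera position).
checks = {
    "Camera Position":    (0.46, 0.5, 0.75, "good"),
    "Shot Analysis":      (0.52, 0.5, 0.75, "good"),
    "Aesthetic Analysis": (0.13, -0.2, 0.1, "very good"),
}

summary = {}
for metric, (score, low, high, label) in checks.items():
    gap = distance_outside(score, low, high)
    summary[metric] = gap
    status = "inside" if gap == 0.0 else f"outside by {gap}"
    print(f"{metric}: {score} is {status} the '{label}' range [{low}, {high}]")
```

Run as written, this reports camera position as 0.04 below its range, shot analysis as inside its range, and aesthetic analysis as 0.03 above its range.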