AI Captures the Scene, But Misses the Mood with Stability-ai-ultra
The world of AI image generation is rapidly evolving, with models capable of creating stunning visuals based on text prompts. However, the journey towards replicating human creativity is still ongoing. This experiment focused on testing an AI model’s ability to generate images based on specific poses and scene descriptions. While the model demonstrated impressive capabilities in understanding camera angles and shot composition, it struggled to capture the intended aesthetic, highlighting the ongoing challenges in replicating human artistic vision.
Created with: stability-ai-ultra
A Moment of Solitude on the Mountaintop
A lone hiker stands silhouetted against a breathtaking backdrop of snow-capped peaks and swirling clouds, capturing the essence of serenity, contemplation, and adventure. The vastness of the landscape evokes a sense of awe and solitude, inviting viewers to imagine themselves standing on the precipice of discovery.
Prompt
poses hands-in-pockets: determined, confident ; A lone adventurer, standing on a mountain peak; wide shot; heroism; dramatic sky with clouds; cinematic
Characteristic
Shot : A lone hiker stands on the peak of a rocky mountain, looking out at a distant snow-capped range obscured by clouds.
Aesthetic Score : 0.7
Mood : serene, contemplative, adventurous
Quality
Entropy : 6.82
Noise : 79
Prompt Clip Score : 0.24
AI Evaluation
Likelihood of AI : 0.10
Image errors : No noticeable errors, the image is clear and well-exposed.
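The post does not say how the Entropy figure under Quality is computed. One common reading, and it is only an assumption here, is the Shannon entropy of the image's 8-bit grayscale histogram, which ranges from 0 bits (a flat, single-tone image) to 8 bits (all 256 levels equally used). Values such as 6.82 fit that range. A minimal sketch under that assumption, with the hypothetical helper `shannon_entropy` taking a flat list of pixel values:

```python
import math
from collections import Counter


def shannon_entropy(pixels):
    """Return the Shannon entropy, in bits, of a sequence of pixel values.

    Assumption: `pixels` is a flat sequence of 8-bit grayscale values (0-255).
    This is an illustrative guess at the metric, not the post's actual method.
    """
    total = len(pixels)
    if total == 0:
        raise ValueError("pixels must not be empty")
    counts = Counter(pixels)
    # H = -sum(p * log2(p)) over every gray level that actually occurs.
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


if __name__ == "__main__":
    flat = [128] * 100            # one gray level only
    spread = list(range(256))     # every gray level exactly once
    print(shannon_entropy(flat), shannon_entropy(spread))
```

Under this reading, a higher value means the tonal range is used more evenly, which is consistent with the darker cave image later in the post scoring lowest.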
A Boy’s Journey to the Ancient Temple
A young explorer stands on the edge of a lush jungle, gazing towards a majestic stone temple. The vastness of the surroundings and the boy’s small stature create a sense of wonder and adventure, hinting at a mysterious and hopeful journey ahead.
Prompt
poses hands-in-pockets: curious, excited ; A young explorer, gazing at a vast jungle; medium shot; adventure; lush green foliage and ancient ruins; cinematic
Characteristic
Shot : A young boy stands on a rocky outcropping, looking toward a stone temple in the middle of a lush jungle. The jungle is dense and green, with large leaves and vines.
Aesthetic Score : 0.7
Mood : mysterious, adventurous, serene
Quality
Entropy : 6.85
Noise : 110
Prompt Clip Score : 0.30
AI Evaluation
Likelihood of AI : 0.90
Image errors : The image has some slight aliasing and pixelation, especially in the foliage and the boy’s hair. The lighting is a bit flat, and the shadows could be more defined.
Neon Glow, Intense Focus: A Gamer’s Sanctuary
Immerse yourself in the vibrant world of a dedicated gamer, bathed in pink and blue neon lights. The dramatic lighting highlights their silhouette as they focus intently on the game, capturing the energy and excitement of the gaming experience.
Prompt
poses hands-in-pockets: focused, intense ; A gamer, sitting at a desk with a controller in hand; close-up; gaming; neon lights and computer screens; cinematic
Characteristic
Shot : A young man is sitting in a dimly lit room, wearing headphones and holding a video game controller, with multiple computer monitors in the background. The image is lit in a colorful, neon style.
Aesthetic Score : 0.6
Mood : intense, focused, gaming
Quality
Entropy : 6.74
Noise : 70
Prompt Clip Score : 0.28
AI Evaluation
Likelihood of AI : 0.40
Image errors : The image appears slightly over-processed with a noticeable digital effect on the subject’s skin.
Sunset Stroll: A Moment of Tranquility in a European City
A young woman, bathed in the golden glow of sunset, walks towards a grand building in a European city. Her solitary figure evokes a sense of peace and nostalgia, while the warm light casts a hopeful aura over the scene.
Prompt
poses hands-in-pockets: amazed, happy ; A tourist, admiring a famous landmark; medium shot; tourism; bustling city streets and iconic architecture; cinematic
Characteristic
Shot : A woman is walking away from the camera in a European city square. The woman has a backpack on and is looking towards a grand building in the distance. There are people walking around the square.
Aesthetic Score : 0.7
Mood : tranquil, wanderlust, urban
Quality
Entropy : 6.85
Noise : 67
Prompt Clip Score : 0.23
AI Evaluation
Likelihood of AI : 0.20
Image errors : The image has a slight blur around the edges. The woman’s figure is slightly out of focus. Some of the people in the background are not in focus.
Finding Tranquility in the Wildflower Fields
A lone hiker finds peace amidst the vastness of nature, walking a winding road through a field of vibrant yellow wildflowers. Rolling hills in the distance create a sense of isolation and serenity, capturing the adventurous spirit of exploring the great outdoors.
Prompt
poses hands-in-pockets: free, adventurous ; A backpacker, walking along a scenic road; medium shot; travel; rolling hills and vibrant wildflowers; cinematic
Characteristic
Shot : A lone hiker walks down a winding road in a beautiful mountainous valley, with vibrant yellow flowers blooming on both sides of the road.
Aesthetic Score : 0.8
Mood : serene, peaceful, adventurous
Quality
Entropy : 6.74
Noise : 95
Prompt Clip Score : 0.27
AI Evaluation
Likelihood of AI : 0.20
Image errors : No significant image errors. Some slight pixelation is visible at higher zoom levels, but overall the image is crisp and clear.
Golden Hour Friendships: Silhouettes of Joy at Sunset
A group of friends stand on a beach, their silhouettes painted against the vibrant hues of a setting sun. The scene evokes a sense of joy, carefree abandon, and nostalgia, capturing the warmth and beauty of shared moments under a golden sky.
Prompt
poses hands-in-pockets: relaxed, joyful ; A group of friends, standing on a beach at sunset; wide shot; groups; golden sand and crashing waves; cinematic
Characteristic
Shot : A group of friends is standing on a beach at sunset. The sun is setting behind them, and they are all looking at each other and smiling. The beach is sandy, and there are waves crashing in the background.
Aesthetic Score : 0.7
Mood : joyful, friendly, carefree
Quality
Entropy : 6.82
Noise : 75
Prompt Clip Score : 0.31
AI Evaluation
Likelihood of AI : 0.20
Image errors : The image has slight noise and graininess, and it could be sharper.
Firefighter Faces the Flames: A Moment of Courage
A firefighter, silhouetted against the fiery inferno, stares directly at the camera, embodying heroism and the dramatic intensity of the situation. The stark contrast between the dark figure and the bright flames creates a powerful visual impact.
Prompt
poses hands-in-pockets: brave, determined ; A firefighter, standing in front of a burning building; medium shot; heroism; smoke and flames; cinematic
Characteristic
Shot : A firefighter in full gear stands in front of a burning building. The flames are bright and intense, and the smoke is thick. The firefighter is looking directly at the camera with a serious expression on his face.
Aesthetic Score : 0.6
Mood : intense, dramatic, heroic
Quality
Entropy : 6.82
Noise : 82
Prompt Clip Score : 0.29
AI Evaluation
Likelihood of AI : 0.10
Image errors : The image is slightly blurry, possibly due to motion blur.
Hope Shines Through the Cave
A group of hikers venture deep into a mysterious cave, their path illuminated by a glimmer of light at the end of the tunnel. The scene evokes a sense of adventure, hope, and the promise of what lies beyond the darkness.
Prompt
poses hands-in-pockets: cautious, curious ; A group of explorers, navigating a dark cave; medium shot; adventure; stalactites and stalagmites; cinematic
Characteristic
Shot : A group of hikers is walking through a cave towards a bright light at the end. The cave is lit by the light shining through the entrance, and the hikers are silhouetted against the light.
Aesthetic Score : 0.7
Mood : mysterious, adventurous, hopeful
Quality
Entropy : 6.32
Noise : 93
Prompt Clip Score : 0.27
AI Evaluation
Likelihood of AI : 0.20
Image errors : The image is a bit dark. There is some noise in the shadows.
Victory Dance: A Moment of Pure Joy
A young man in a vibrant red jacket celebrates a hard-earned victory, his arms raised in triumph amidst a shower of confetti. The scene is bursting with energy and joy, capturing the pure exhilaration of success.
Prompt
poses hands-in-pockets: excited, triumphant ; A gamer, celebrating a victory with friends; close-up; gaming; celebratory confetti and flashing lights; cinematic
Characteristic
Shot : A young man is celebrating a victory with his arms raised and confetti falling around him. The man is dressed casually in a blue shirt and a red jacket. The background is blurred, suggesting the presence of a crowd and a celebratory atmosphere.
Aesthetic Score : 0.7
Mood : joyful, celebratory, exciting
Quality
Entropy : 6.72
Noise : 74
Prompt Clip Score : 0.28
AI Evaluation
Likelihood of AI : 0.80
Image errors : The image exhibits a slight blur, particularly on the background, and a few instances of slightly unnatural-looking confetti. The lighting in some areas seems slightly artificial and the color saturation is somewhat high.
Golden Hour Family Memories in the Plaza
A heartwarming scene of a family of five enjoying a sunset in a plaza, bathed in the warm glow of the golden hour. The tall monument in the background adds a sense of grandeur and nostalgia to the moment.
Prompt
poses hands-in-pockets: happy, united ; A family, standing in front of a famous monument; wide shot; tourism; historical landmark and sunny sky; cinematic
Characteristic
Shot : A family of four is standing in a public square. They are standing on a cobblestone path with lush grass and a tall monument behind them. The sun is setting in the background.
Aesthetic Score : 0.6
Mood : happy, friendly, carefree
Quality
Entropy : 6.65
Noise : 70
Prompt Clip Score : 0.32
AI Evaluation
Likelihood of AI : 0.20
Image errors : Slight blurring in the background and slight overexposure in the sky.
Conclusion
The results show that the generative AI model performed well in terms of camera position and shot analysis, but struggled with aesthetic analysis.
Here’s a breakdown:
- Camera Position: The model scored 0.45, which is considered good. The images generally matched the camera position described in the prompt.
- Shot Analysis: The model scored 0.51, also considered good. The model understood the scene described in the prompt and produced images that reflected it.
- Aesthetic Analysis: The model scored 0.07, which is poor. The aesthetic of the generated images deviated significantly from what the prompts called for.
Overall, the model demonstrates a good understanding of camera position and shot composition, but needs improvement in generating images that match the desired aesthetic.
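For a quick view across all ten images, the per-image numbers listed above can be averaged. The sketch below copies the Aesthetic Score, Prompt Clip Score, and Likelihood of AI values from each entry. The helper names `summarize` and `flagged`, and the 0.5 cutoff for "likely AI", are assumptions made for illustration and are not part of the original evaluation.

```python
from statistics import mean

# (title, aesthetic score, prompt CLIP score, likelihood of AI),
# copied from the ten entries in this post.
RESULTS = [
    ("A Moment of Solitude on the Mountaintop", 0.7, 0.24, 0.10),
    ("A Boy's Journey to the Ancient Temple", 0.7, 0.30, 0.90),
    ("Neon Glow, Intense Focus: A Gamer's Sanctuary", 0.6, 0.28, 0.40),
    ("Sunset Stroll", 0.7, 0.23, 0.20),
    ("Finding Tranquility in the Wildflower Fields", 0.8, 0.27, 0.20),
    ("Golden Hour Friendships", 0.7, 0.31, 0.20),
    ("Firefighter Faces the Flames", 0.6, 0.29, 0.10),
    ("Hope Shines Through the Cave", 0.7, 0.27, 0.20),
    ("Victory Dance: A Moment of Pure Joy", 0.7, 0.28, 0.80),
    ("Golden Hour Family Memories in the Plaza", 0.6, 0.32, 0.20),
]


def summarize(results):
    """Average each metric over all images."""
    return {
        "aesthetic": mean(r[1] for r in results),
        "clip": mean(r[2] for r in results),
        "ai_likelihood": mean(r[3] for r in results),
    }


def flagged(results, threshold=0.5):
    """Titles whose Likelihood of AI meets the (assumed) threshold."""
    return [title for title, _, _, ai in results if ai >= threshold]


if __name__ == "__main__":
    for name, value in summarize(RESULTS).items():
        print(f"{name}: {value:.3f}")
    print("flagged:", flagged(RESULTS))
```

With these inputs the aesthetic scores cluster tightly between 0.6 and 0.8, and only the temple and victory images cross the assumed 0.5 cutoff for Likelihood of AI.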