AI's Artistic Eye: A Look at Generative Models and Scene Composition with Stability-ai-ultra
9-minute read · 1,829 words
The world of generative AI is evolving rapidly, with models that can create stunning, realistic images from text prompts. A key aspect of this technology is how well it understands and translates scene composition, including camera positions, shot analysis, and aesthetic style. This blog post examines how one model, stability-ai-ultra, captures these elements across ten prompts, highlighting its strengths and areas for improvement. We’ll also look at the concept of ‘dramatic style’ and how it is used in creative contexts such as film, photography, and digital art.
Created with: stability-ai-ultra
Warrior’s Silhouette Against a Hopeful Sunset
A lone warrior walks into a breathtaking orange sunset, their silhouette stark against the vast sky. A distant city with a towering spire and mountains adds to the epic scale of the scene, while the hopeful mood suggests a journey of purpose and resilience.
Prompt
Stylized: Epic and melancholic ; A lone warrior; wide shot; Heroism; A desolate battlefield with a setting sun; cinematic
Characteristic
Shot : A lone figure walks away from the viewer toward a large, golden setting sun, which renders the figure as a silhouette. The landscape is a barren field with a few small hills in the background.
Aesthetic Score : 0.7
Mood : epic, lonely, dramatic
Quality
Entropy : 6.09
Noise : 79
Prompt Clip Score : 0.30
AI Evaluation
Likelihood of AI : 0.80
Image errors : The image has a few noticeable artifacts, notably in the grass, and an obvious soft glow along the edges of the mountains, which appears to be a technical artifact of the image generation.
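The Quality figures above are reported without a definition. One common reading of “Entropy”, assumed here rather than confirmed by the post, is the Shannon entropy of the image’s 8-bit grayscale histogram, measured in bits: 0 for a single flat tone, 8 when all 256 gray levels occur equally often. Under that assumption, the 6.09 reported for this image points to a broad but far from uniform tonal range. A minimal pure-Python sketch, where `image_entropy` is a hypothetical helper taking a flat list of pixel values from 0 to 255:

```python
import math
from collections import Counter


def image_entropy(pixels):
    """Shannon entropy (in bits) of a sequence of 8-bit grayscale values.

    Assumption: the post's "Entropy" metric is a histogram entropy like
    this one; the actual evaluation pipeline is not documented.
    """
    if not pixels:
        raise ValueError("pixels must be non-empty")
    counts = Counter(pixels)  # histogram: gray level -> number of pixels
    total = len(pixels)
    entropy = 0.0
    for count in counts.values():
        p = count / total  # probability of this gray level
        entropy -= p * math.log2(p)
    return entropy


# A single flat tone carries no information; all 256 levels equally often gives 8 bits.
print(image_entropy([128] * 100))       # 0.0
print(image_entropy(list(range(256))))  # 8.0
```

To run it on a real file, Pillow’s `Image.open(path).convert("L").getdata()` yields grayscale pixel values that can be passed in as a list. The Noise figure is left out here because the post gives no hint of its scale or method.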
Lost Treasure Beckons in a Glowing Cave
A treasure chest overflowing with gold coins lies open in a dark cave, illuminated by a mysterious blue glow. The scene evokes a sense of wonder and excitement, hinting at the discovery of a lost fortune. Will you dare to explore?
Prompt
Stylized: Excitement and wonder ; A treasure chest overflowing with gold; close-up; Adventure; A dark and mysterious cave; cinematic
Characteristic
Shot : A treasure chest overflowing with gold coins, set against a backdrop of a dark cave entrance with a blue glow emanating from the depths.
Aesthetic Score : 0.7
Mood : mysterious, adventurous, wealthy
Quality
Entropy : 6.63
Noise : 94
Prompt Clip Score : 0.28
AI Evaluation
Likelihood of AI : 0.80
Image errors : The image is slightly blurry, especially the gold coins. The texture of the rocks is somewhat repetitive and lacks detail.
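The Prompt Clip Score is most likely a CLIP similarity: the prompt and the image are each encoded into a vector by a CLIP model, and the score is the cosine of the angle between the two vectors. The post does not name the CLIP variant or any rescaling, so treat this as an assumption. The sketch below shows only the similarity step, with hypothetical toy vectors standing in for real embeddings:

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two embedding vectors.

    Assumption: the post's "Prompt Clip Score" is the cosine similarity
    between a CLIP image embedding and a CLIP text embedding of the prompt.
    """
    if len(a) != len(b):
        raise ValueError("embeddings must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("embeddings must be non-zero")
    return dot / (norm_a * norm_b)


# Hypothetical 4-dimensional vectors standing in for real CLIP embeddings,
# which have hundreds of dimensions.
image_embedding = [0.2, 0.1, 0.7, 0.3]
text_embedding = [0.1, 0.4, 0.5, 0.2]
print(round(cosine_similarity(image_embedding, text_embedding), 3))
```

In practice the vectors would come from a CLIP model’s image and text encoders (for example, `get_image_features` and `get_text_features` on Hugging Face Transformers’ `CLIPModel`). Because cosine similarity ignores vector length, the score reflects only how closely the image and the prompt point in the same direction in embedding space.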
A Lone Sentinel in a City of Dreams
A solitary figure in futuristic armor stands on a platform overlooking a sprawling neon-lit city. The setting sun casts a warm glow on the scene, with flying vehicles dotting the sky. This epic and hopeful image evokes a sense of mystery and anticipation, leaving the viewer wondering what lies ahead for this lone sentinel.
Prompt
Stylized: Triumphant and futuristic ; A player’s avatar, a powerful warrior, standing triumphantly; medium shot; Gaming; A vibrant and futuristic cityscape; cinematic
Characteristic
Shot : A futuristic cityscape with a lone figure standing on a platform overlooking the city. The city is illuminated by neon lights and a setting sun, creating a vivid, colorful landscape.
Aesthetic Score : 0.8
Mood : futuristic, cyberpunk, contemplative
Quality
Entropy : 6.90
Noise : 97
Prompt Clip Score : 0.33
AI Evaluation
Likelihood of AI : 0.90
Image errors : There are some minor image artifacts, particularly around the edges of the image. The figure’s pose is a little stiff.
City Lights and Bustling Energy
Capture the vibrant pulse of city life with this energetic scene. Tall buildings adorned with advertising, a bustling street filled with cars and pedestrians, and a breathtaking sunset sky create a dynamic and captivating image.
Prompt
Stylized: Energetic and lively ; A panoramic view of a bustling city; long shot; Tourism; A vibrant and colorful cityscape; cinematic
Characteristic
Shot : A stylized depiction of a city street, possibly Times Square in New York, with tall buildings, vibrant billboards, and a bustling flow of traffic.
Aesthetic Score : 0.6
Mood : energetic, vibrant, urban
Quality
Entropy : 6.84
Noise : 86
Prompt Clip Score : 0.29
AI Evaluation
Likelihood of AI : 1.00
Image errors : The image has a slightly artificial, cartoonish style. The edges of objects are sometimes jagged and the textures are not particularly realistic.
Solitude in the Setting Sun
A lone figure contemplates the vastness of the desert as the sun sets on the horizon, creating a dramatic scene of isolation and beauty.
Prompt
Stylized: Serene and contemplative ; A lone traveler gazing at a breathtaking sunset; medium shot; Travel; A vast desert landscape; cinematic
Characteristic
Shot : A lone figure stands in a vast desert landscape, gazing at a setting sun. The sky is a vibrant orange and red, with a large sun in the distance. The ground is covered in rocks and sand.
Aesthetic Score : 0.7
Mood : melancholy, contemplative, hopeful
Quality
Entropy : 6.75
Noise : 81
Prompt Clip Score : 0.31
AI Evaluation
Likelihood of AI : 0.90
Image errors : Some artifacts are visible in the sky and on the ground. The figure is somewhat pixelated.
Sun-Kissed Smiles: A Family’s Moment of Joy
A heartwarming scene of a family basking in the sunshine, their smiles radiating happiness and love. The vibrant colors and warm atmosphere create a sense of joy and contentment.
Prompt
Stylized: Joyful and heartwarming ; A family laughing and playing in a park; medium shot; Family; A sunny and idyllic park setting; cinematic
Characteristic
Shot : A family of three (a man, a woman, and their baby girl) is sitting together on a grassy patch in a park on a sunny day. The baby is sitting on her mother’s lap, and the parents are looking at her with loving smiles.
Aesthetic Score : 0.8
Mood : happy, loving, joyful
Quality
Entropy : 6.80
Noise : 75
Prompt Clip Score : 0.23
AI Evaluation
Likelihood of AI : 0.10
Image errors : No apparent artifacts or errors.
A Solitary Figure Contemplates the Storm’s Fury
A lone figure stands defiant against the elements, silhouetted against a stormy sky. The crashing waves and dark clouds create a dramatic and melancholic scene, evoking a sense of power and suspense.
Prompt
Stylized: Dramatic and powerful ; A lone figure standing on a cliff overlooking a vast ocean; long shot; Heroism; A stormy sea with dramatic clouds; cinematic
Characteristic
Shot : A lone figure stands on a cliff overlooking a stormy sea with large waves crashing against the rocks.
Aesthetic Score : 0.7
Mood : dramatic, ominous, solitude
Quality
Entropy : 6.84
Noise : 98
Prompt Clip Score : 0.32
AI Evaluation
Likelihood of AI : 0.80
Image errors : The image has some slight artifacts in the clouds and water.
A World of Memories: Vintage Map Lit by Warm Candlelight
A nostalgic scene of a vintage world map adorned with push pins, bathed in the soft glow of a warm lamp and a flickering candle. The ambiance evokes a sense of adventure, cozy comfort, and the longing for faraway places.
Prompt
Stylized: Intriguing and mysterious ; A map with pins marking locations of hidden treasures; close-up; Adventure; A dimly lit room with antique furniture; cinematic
Characteristic
Shot : A world map is spread out on a wooden table, with various pins placed on it. The map is lit by a warm, yellow light from a lamp and a candle. The map has a retro feel and is slightly faded, giving it a sense of history.
Aesthetic Score : 0.7
Mood : nostalgic, warm, cozy
Quality
Entropy : 6.74
Noise : 75
Prompt Clip Score : 0.29
AI Evaluation
Likelihood of AI : 0.10
Image errors : The image is slightly blurry in certain areas.
The Archer’s Focus: A Moment of Tense Anticipation
A hooded figure, likely an archer, stands poised in a misty forest, bow drawn and arrow aimed. The shallow depth of field draws the viewer’s attention to the archer’s hand and face, building suspense for the moment of release. The result is a mysterious, tense scene full of intrigue.
Prompt
Stylized: Intense and focused ; A player’s character, a skilled archer, aiming at a target; close-up; Gaming; A dark and mysterious forest; cinematic
Characteristic
Shot : A hooded archer in a misty forest, aiming with a bow and arrow.
Aesthetic Score : 0.7
Mood : mysterious, dramatic, intense
Quality
Entropy : 6.68
Noise : 72
Prompt Clip Score : 0.32
AI Evaluation
Likelihood of AI : 0.50
Image errors : The image appears to be slightly blurred, and some of the details are not as sharp as they could be.
A Night of Romance and Liveliness in the Heart of the City
Experience the perfect blend of intimacy and vibrancy as you dine with friends at a cozy outdoor restaurant on a bustling city street, where warm string lights create a romantic ambiance and the energetic cityscape adds a lively touch to the evening.
Prompt
Stylized: Social and celebratory ; A group of friends enjoying a meal at a restaurant with a view; medium shot; Tourism; A bustling city street with vibrant lights; cinematic
Characteristic
Shot : A group of four friends are having dinner at an outdoor restaurant, lit by string lights and neon signs. The scene is set in a bustling night market with an Asian aesthetic.
Aesthetic Score : 0.6
Mood : romantic, lively, festive
Quality
Entropy : 6.69
Noise : 76
Prompt Clip Score : 0.35
AI Evaluation
Likelihood of AI : 0.80
Image errors : The image appears to be AI-generated, with some blurring and artifacts present, particularly in the background and on the faces of the people.
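Taken together, the ten reports above can be summarized with simple averages. The sketch below copies each image’s Aesthetic Score, Entropy, Prompt Clip Score, and Likelihood of AI from the entries (titles shortened) and averages each column. It illustrates the arithmetic only; the post does not say whether, or how, these per-image figures feed into the summary scores in the conclusion.

```python
from statistics import mean

# (title, aesthetic score, entropy, prompt CLIP score, likelihood of AI),
# copied from the ten entries in order of appearance.
REPORTS = [
    ("Warrior's Silhouette", 0.7, 6.09, 0.30, 0.80),
    ("Lost Treasure", 0.7, 6.63, 0.28, 0.80),
    ("Lone Sentinel", 0.8, 6.90, 0.33, 0.90),
    ("City Lights", 0.6, 6.84, 0.29, 1.00),
    ("Solitude in the Setting Sun", 0.7, 6.75, 0.31, 0.90),
    ("Sun-Kissed Smiles", 0.8, 6.80, 0.23, 0.10),
    ("Storm's Fury", 0.7, 6.84, 0.32, 0.80),
    ("World of Memories", 0.7, 6.74, 0.29, 0.10),
    ("Archer's Focus", 0.7, 6.68, 0.32, 0.50),
    ("Night of Romance", 0.6, 6.69, 0.35, 0.80),
]


def summarize(reports):
    """Return the mean of each numeric column plus the best and worst CLIP match."""
    summary = {
        "aesthetic": mean(r[1] for r in reports),
        "entropy": mean(r[2] for r in reports),
        "clip": mean(r[3] for r in reports),
        "ai_likelihood": mean(r[4] for r in reports),
    }
    summary["best_clip"] = max(reports, key=lambda r: r[3])[0]
    summary["worst_clip"] = min(reports, key=lambda r: r[3])[0]
    return summary


for key, value in summarize(REPORTS).items():
    print(key, round(value, 3) if isinstance(value, float) else value)
```

On these values the averages are 0.7 (Aesthetic Score), 6.696 (Entropy), 0.302 (Prompt Clip Score), and 0.67 (Likelihood of AI). The family scene has the lowest Prompt Clip Score (0.23) while sharing the top Aesthetic Score (0.8), so a pleasing image is not necessarily a faithful one.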
Conclusion
The results show that the model performed moderately well at understanding and responding to camera positions and scene composition.
Here’s a breakdown:
- Camera Position: The model scored 0.35, which is below the “good” range of 0.5 to 0.75. This suggests that the model didn’t always accurately capture the intended camera positions described in the prompts.
- Shot Analysis: The model scored 0.55, which falls within the “good” range. This indicates that the model generally understood the scene descriptions in the prompts and produced images that reflected those descriptions.
- Aesthetic Analysis: The model scored 0.01, which is within the “very good” range of -0.2 to 0.1. This means that the generated images closely matched the expected aesthetic style.
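Each summary score in the breakdown is judged against a reference range. The post states only two ranges (“good” = 0.5 to 0.75 and “very good” = -0.2 to 0.1) and does not explain how the scores themselves are derived, so the sketch below simply encodes those stated ranges and checks each reported score against them. Inclusive bounds are an assumption, as is Shot Analysis sharing the “good” range given for Camera Position.

```python
# Reference ranges as stated in the conclusion (inclusive bounds assumed).
# Assumption: Shot Analysis uses the same "good" range as Camera Position;
# the post does not give a separate range for it.
RANGES = {
    "camera_position": ("good", 0.5, 0.75),
    "shot_analysis": ("good", 0.5, 0.75),
    "aesthetic_analysis": ("very good", -0.2, 0.1),
}

# Summary scores reported in the conclusion.
SCORES = {
    "camera_position": 0.35,
    "shot_analysis": 0.55,
    "aesthetic_analysis": 0.01,
}


def check(scores, ranges):
    """Return {metric: (band, in_range)} for each reported score."""
    results = {}
    for metric, score in scores.items():
        band, low, high = ranges[metric]
        results[metric] = (band, low <= score <= high)
    return results


for metric, (band, ok) in check(SCORES, RANGES).items():
    status = "within" if ok else "outside"
    print(f"{metric}: {SCORES[metric]} is {status} the '{band}' range")
```

Only Camera Position falls outside its range, which matches the breakdown. The “very good” range straddling zero suggests the aesthetic metric measures deviation from an expected style, with values near 0 being best, though the post does not define it.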
Overall, the model demonstrates a decent ability to understand and translate prompts into images, but it would benefit from more accurate handling of camera positions.
Sources:
- https://heartofnoir.com/knowing-noir/aesthetic-of-noir/
- https://www.yellowbrick.co/blog/film/maximizing-the-visual-impact-unveiling-the-art-of-film-aesthetics
- https://www.questjournals.org/jrhss/papers/vol10-issue8/1008255260.pdf
- https://www.jstor.org/stable/3331672
- https://www.cinepoetics.fu-berlin.de/activities/workshops/2020-12-ws/index.html
- https://resource.download.wjec.co.uk/vtc/2016-17/16-17_1-22/eng/Part%201%20What%20is%20Aesthetics.pdf
- https://stability.ai