AI's Artistic Struggle: Capturing the Scene, Not the Feeling, with Stable Diffusion
In the realm of artificial intelligence, the ability to generate images based on text prompts has become increasingly sophisticated. However, the challenge of capturing the essence of a scene, not just its literal elements, remains a hurdle. This blog post examines an experiment where an AI model was tasked with creating images based on detailed descriptions, revealing its strengths and weaknesses in capturing the desired aesthetic, scene understanding, and camera position.
Created with: stability-ai-core
Silhouetted Against the Sunset: A Moment of Solitude and Vastness
A lone figure stands in stark contrast against a breathtaking sunset over a majestic mountain range. The scene evokes a sense of epic grandeur, serenity, and contemplation, with the silhouette emphasizing the feeling of solitude and the vastness of the landscape.
Prompt
poses profile: Epic, hopeful, determined ; A lone figure, silhouetted against a setting sun; wide shot; Heroism; A vast, mountainous landscape; cinematic
Characteristic
Shot : A lone figure stands on a mountaintop at sunrise, looking out over a vast valley with distant mountains.
Aesthetic Score : 0.7
Mood : serene, contemplative, epic
Quality
Entropy : 6.78
Noise : 61
Prompt Clip Score : 0.27
AI Evaluation
Likelihood of AI : 0.70
Image errors : Some minor aliasing artifacts on the mountains and the figure, slight blurriness, the background has a less realistic and slightly more cartoonish feel
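Every prompt in this post follows the same semicolon-separated pattern. As a minimal sketch, a prompt can be split into its parts as shown here. The field names (`mood`, `subject`, `shot`, `theme`, `setting`, `style`) are labels assumed for illustration; the post only shows the raw prompt strings.

```python
# Hypothetical field names: the post does not name the prompt parts.
FIELDS = ["mood", "subject", "shot", "theme", "setting", "style"]


def parse_prompt(prompt: str) -> dict:
    """Split a 'poses profile: ...; ...' prompt into labelled fields."""
    prefix = "poses profile:"
    body = prompt.strip()
    if body.startswith(prefix):
        body = body[len(prefix):]
    # Fields are separated by semicolons; commas stay inside a field.
    parts = [part.strip() for part in body.split(";")]
    if len(parts) != len(FIELDS):
        raise ValueError(f"expected {len(FIELDS)} fields, got {len(parts)}")
    return dict(zip(FIELDS, parts))


prompt = (
    "poses profile: Epic, hopeful, determined ; A lone figure, "
    "silhouetted against a setting sun; wide shot; Heroism; "
    "A vast, mountainous landscape; cinematic"
)
fields = parse_prompt(prompt)
print(fields["shot"])  # wide shot
```

Separating out the requested shot type this way makes it easier to compare what was asked for (here, a wide shot) with what the generated image actually shows.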
A Hiker’s Perspective: Serenity and Adventure in a Lush Valley
Experience the breathtaking beauty of a deep valley, where a lone hiker stands on a rocky cliff, dwarfed by the vastness of nature. Lush greenery, a winding river, and cascading waterfalls create a serene and inspiring scene, highlighting the adventurous spirit of exploration.
Prompt
poses profile: Adventurous, free-spirited, awe-inspired ; A backpacker standing on a cliff edge, looking out at a breathtaking view; medium shot; Adventure; A sprawling valley with cascading waterfalls; cinematic
Characteristic
Shot : A man standing on a cliff overlooking a valley with a river winding through it and waterfalls cascading down the mountainside. The sky is blue with white clouds.
Aesthetic Score : 0.8
Mood : tranquil, majestic, inspiring
Quality
Entropy : 6.44
Noise : 82
Prompt Clip Score : 0.28
AI Evaluation
Likelihood of AI : 0.10
Image errors : No significant errors are visible in the image.
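The post does not say how the Entropy value is computed. A common definition, assumed here, is the Shannon entropy of the grayscale intensity histogram, which ranges from 0 bits (a flat image) to 8 bits (all 256 levels equally frequent) for 8-bit pixels. The sketch below works on a plain list of pixel values rather than a real image file.

```python
import math
from collections import Counter


def shannon_entropy(pixels):
    """Shannon entropy in bits of a sequence of 8-bit grayscale values."""
    counts = Counter(pixels)
    total = len(pixels)
    # Sum p * log2(1/p) over every intensity level that occurs.
    return sum((n / total) * math.log2(total / n) for n in counts.values())


flat = [128] * 16            # a single gray level everywhere
uniform = list(range(256))   # every gray level exactly once

print(shannon_entropy(flat))     # 0.0
print(shannon_entropy(uniform))  # 8.0
```

Under this assumed definition, the values reported in this post (roughly 5.7 to 6.9) would indicate images with a fairly wide spread of tones.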
Lost in the Game: A Gamer’s World Unveiled
A young man, headphones on, sits completely absorbed in a dimly lit room filled with screens displaying vibrant video game scenes. The intense focus and immersive atmosphere create a sense of mystery and intrigue, drawing you into the world of a dedicated gamer.
Prompt
poses profile: Focused, intense, passionate ; A gamer’s hands, illuminated by the glow of a monitor, holding a controller; close-up; Gaming; A dimly lit room with gaming posters on the walls; cinematic
Characteristic
Shot : A young man wearing headphones is sitting in front of a computer. The room is dark and there are multiple computer screens lit up.
Aesthetic Score : 0.6
Mood : intense, focused, dark
Quality
Entropy : 5.68
Noise : 57
Prompt Clip Score : 0.24
AI Evaluation
Likelihood of AI : 0.20
Image errors : There is a slight color shift in the background screens, and the image appears slightly grainy in areas.
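A prompt CLIP score is typically the cosine similarity between the CLIP embedding of the image and the CLIP embedding of the prompt text. How the scores in this post were produced is not stated, so the sketch below only illustrates the similarity step, with made-up toy vectors standing in for real embeddings.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# Toy 3-dimensional vectors (assumed values, not real CLIP embeddings).
image_vec = [0.2, 0.1, 0.9]
text_vec = [0.8, 0.3, 0.1]

score = cosine_similarity(image_vec, text_vec)
print(round(score, 2))
```

A higher value means the image and the prompt point in a more similar direction in embedding space; a mismatch such as a requested close-up rendered as a wider shot is one plausible reason for a lower score.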
Tranquility Amidst the City: A Cathedral’s Majestic Presence
A peaceful scene unfolds in a cobblestone square, where a man contemplates the imposing architecture of a grand cathedral. The clear blue sky and bustling activity around him create a sense of urban serenity.
Prompt
poses profile: Curious, excited, appreciative ; A tourist gazing up at a majestic cathedral; medium shot; Tourism; A bustling city square with cobblestone streets; cinematic
Characteristic
Shot : A man stands in a cobblestone square in front of a large cathedral, looking at the building with a thoughtful expression. There are other people in the square, some in the background, and some walking past him. The day is bright and sunny.
Aesthetic Score : 0.7
Mood : pensive, urban, majestic
Quality
Entropy : 6.86
Noise : 81
Prompt Clip Score : 0.30
AI Evaluation
Likelihood of AI : 0.10
Image errors : No significant errors or artifacts in this image.
A Journey of Contemplation
A man, lost in thought, gazes out the train window at a passing landscape. His pensive expression evokes a sense of longing and contemplation, capturing the essence of a journey both physical and internal.
Prompt
poses profile: Reflective, contemplative, nostalgic ; A traveler sitting on a train, looking out the window at passing scenery; medium shot; Travel; A scenic train journey through rolling hills and fields; cinematic
Characteristic
Shot : A man sits by a window on a train, looking out at a rural landscape. The train appears to be old and the interior is dark and slightly worn. The man has a thoughtful expression on his face.
Aesthetic Score : 0.7
Mood : pensive, contemplative, melancholic
Quality
Entropy : 6.31
Noise : 72
Prompt Clip Score : 0.32
AI Evaluation
Likelihood of AI : 0.10
Image errors : The image has a slightly grainy texture, and there is some noise in the darker areas. The exposure is also slightly low.
Friends, Laughter, and Good Times: A Celebration Captured
This heartwarming image captures the essence of a joyful gathering. A group of friends, bathed in the warm glow of string lights, share laughter and good times in a cozy, celebratory setting. The scene radiates a sense of friendship, joy, and spontaneity, making it a perfect snapshot of a memorable occasion.
Prompt
poses profile: Joyful, celebratory, connected ; A group of friends laughing and celebrating together; wide shot; Groups; A lively party with colorful decorations and music; cinematic
Characteristic
Shot : A group of friends are gathered around a table, laughing and enjoying each other’s company. There are balloons and string lights in the background, suggesting a party or celebration. The table is set with food and drinks, indicating that they are having a meal together.
Aesthetic Score : 0.7
Mood : happy, festive, joyful
Quality
Entropy : 6.67
Noise : 74
Prompt Clip Score : 0.26
AI Evaluation
Likelihood of AI : 0.20
Image errors : The image is slightly out of focus, particularly in the background. There are also some minor artifacts around the edges of the image.
Superman Stands Tall as the Sun Sets on Metropolis
A dramatic silhouette against the setting sun, Superman stands atop a building, his iconic suit and cape billowing in the wind. The scene evokes a sense of hope and anticipation, as the Man of Steel prepares to face whatever challenges lie ahead.
Prompt
poses profile: Powerful, confident, inspiring ; A superhero standing tall, cape billowing in the wind; medium shot; Heroism; A cityscape with towering skyscrapers; cinematic
Characteristic
Shot : Superman stands on a rooftop overlooking a city skyline with his cape billowing behind him.
Aesthetic Score : 0.7
Mood : heroic, powerful, dramatic
Quality
Entropy : 6.76
Noise : 73
Prompt Clip Score : 0.24
AI Evaluation
Likelihood of AI : 0.70
Image errors : The image has a slight blurriness, especially in the background. The subject’s costume appears to be digitally added, and the seams are not perfectly aligned.
Lost in the Jungle: Exploring an Ancient Temple
A group of adventurers stand before a mysterious stone structure, overgrown with vines and trees. Sunlight filters through the canopy, casting an ethereal glow on the scene. This serene jungle setting evokes a sense of wonder and mystery, inviting you to explore the secrets hidden within.
Prompt
poses profile: Intrigued, adventurous, determined ; A group of explorers navigating a dense jungle; wide shot; Adventure; Lush greenery, ancient ruins, and dappled sunlight; cinematic
Characteristic
Shot : A group of five people are standing in front of an ancient temple overgrown with foliage in a lush tropical jungle setting. The temple appears to be in a state of disrepair, with vines and trees growing over its walls.
Aesthetic Score : 0.7
Mood : mysterious, adventurous, serene
Quality
Entropy : 6.85
Noise : 97
Prompt Clip Score : 0.31
AI Evaluation
Likelihood of AI : 0.20
Image errors : The image appears to be slightly overexposed, and there is some noise in the shadows.
In the Zone: A Hacker’s Focus
A young man, bathed in the glow of multiple computer monitors, sits intently at his keyboard, headphones on, lost in a world of code. The dim lighting and focused expression convey a sense of intense concentration and determination, capturing the essence of a hacker in their element.
Prompt
poses profile: Focused, competitive, determined ; A gamer’s face, lit by the screen, showing intense concentration; close-up; Gaming; A dimly lit room with a gaming setup and neon lights; cinematic
Characteristic
Shot : A young man wearing headphones is sitting in front of a computer, typing on a keyboard. He is looking at the screen in concentration.
Aesthetic Score : 0.6
Mood : focused, serious, intense
Quality
Entropy : 6.00
Noise : 58
Prompt Clip Score : 0.24
AI Evaluation
Likelihood of AI : 0.10
Image errors : The image has some slight noise and grain, especially in the darker areas. Some of the colors also appear slightly overexposed.
Sunset Romance on the Beach
A couple strolls hand-in-hand along a sandy beach as the sun sets, casting a warm glow on their silhouettes. The woman’s flowing blue dress and the man’s casual attire create a picture of effortless romance, while the gentle waves and long shadows add a sense of tranquility and intimacy to the scene.
Prompt
poses profile: Romantic, peaceful, serene ; A couple holding hands, walking along a beach at sunset; medium shot; Tourism; A golden beach with turquoise waters and a vibrant sky; cinematic
Characteristic
Shot : A couple walks hand-in-hand along a sandy beach at sunset.
Aesthetic Score : 0.7
Mood : romantic, peaceful, serene
Quality
Entropy : 6.64
Noise : 62
Prompt Clip Score : 0.30
AI Evaluation
Likelihood of AI : 0.10
Image errors : No visible errors.
Conclusion
The results show that the generative AI model performed well on the aesthetic aspect but struggled with scene understanding and camera position. Here’s a breakdown:
- Camera Position: The model scored 0.35, which is considered below average. This suggests that the model didn’t accurately capture the intended camera position described in the prompt.
- Shot Analysis: The model scored 0.465, which is also below average. This indicates that the model didn’t fully understand the scene described in the prompt and didn’t create an image that accurately reflects it.
- Aesthetic Analysis: The model scored 0.02, which is considered very good. This means that the generated image closely matched the expected aesthetic style described in the prompt.
Overall, the model seems to be better at capturing the desired aesthetic than at understanding the scene and camera position. This suggests that the model might need further training to improve its ability to interpret prompts and translate them into accurate visual representations.
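The conclusion scores (0.35, 0.465, 0.02) are quoted without the calculation behind them, and the sketch below does not attempt to reproduce them. It simply averages the per-image values reported above, as one way to cross-check the summary against the raw numbers. The tuple layout is an assumption made for this example.

```python
from statistics import mean

# (title, aesthetic score, prompt CLIP score, likelihood of AI),
# copied from the per-image results in this post.
results = [
    ("Silhouetted Against the Sunset", 0.7, 0.27, 0.70),
    ("A Hiker's Perspective", 0.8, 0.28, 0.10),
    ("Lost in the Game", 0.6, 0.24, 0.20),
    ("Tranquility Amidst the City", 0.7, 0.30, 0.10),
    ("A Journey of Contemplation", 0.7, 0.32, 0.10),
    ("Friends, Laughter, and Good Times", 0.7, 0.26, 0.20),
    ("Superman Stands Tall", 0.7, 0.24, 0.70),
    ("Lost in the Jungle", 0.7, 0.31, 0.20),
    ("In the Zone", 0.6, 0.24, 0.10),
    ("Sunset Romance on the Beach", 0.7, 0.30, 0.10),
]

avg_aesthetic = mean(r[1] for r in results)
avg_clip = mean(r[2] for r in results)
avg_ai_likelihood = mean(r[3] for r in results)
best_match = max(results, key=lambda r: r[2])[0]

print(f"mean aesthetic score:  {avg_aesthetic:.2f}")       # 0.69
print(f"mean prompt CLIP:      {avg_clip:.3f}")            # 0.276
print(f"mean likelihood of AI: {avg_ai_likelihood:.2f}")   # 0.25
print(f"highest prompt CLIP:   {best_match}")              # A Journey of Contemplation
```

The two lowest aesthetic scores (0.6) and the lowest prompt CLIP scores (0.24) both include the two gaming prompts, which asked for close-ups and received wider shots.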