AI's Artistic Struggle: Capturing the Essence of a Scene with Midjourney
The world of AI image generation is evolving rapidly, with models capable of creating stunning visuals from text prompts. However, balancing technical accuracy with artistic expression remains a challenge. This blog post examines an experiment that tested how well a generative AI model translates textual descriptions into visually compelling images. The model handled shot composition well and camera positions passably, but it struggled to capture the desired aesthetic, highlighting the ongoing quest for AI to understand and replicate human artistic vision.
Created with: Midjourney
Silhouetted Warrior at Sunset’s Edge
A lone figure, cloaked in tradition and wielding a sword, stands against the fiery backdrop of a setting sun. Smoke swirls in the distance, hinting at a story of conflict and loss. This dramatic silhouette evokes a sense of mystery, epic grandeur, and melancholic beauty.
Prompt
staggered-pose staggered-pose: Epic, determined ; A lone warrior; wide shot; Heroism; A desolate battlefield with a setting sun; cinematic
Characteristic
Shot : A lone warrior stands with his sword, silhouetted against a setting sun.
Aesthetic Score : 0.7
Mood : epic, dramatic, melancholic
Quality
Entropy : 6.30
Noise : 106
Prompt Clip Score : 0.25
AI Evaluation
Likelihood of AI : 0.20
Image errors : There are no visible errors in the image.
Unveiling the Secrets of the Mist-Shrouded Temple
Four intrepid explorers ascend a moss-covered stairway towards a colossal ancient temple, its secrets hidden within the swirling mist. The scene evokes a sense of mystery, adventure, and awe, with dramatic lighting and shadows adding to the intrigue.
Prompt
staggered-pose staggered-pose: Curious, adventurous ; A group of explorers; medium shot; Adventure; A dense jungle with ancient ruins in the background; cinematic
Characteristic
Shot : A group of four adventurers are standing on a stone staircase leading to a large, ancient temple that is half-hidden in a lush jungle, with fog and sunlight filtering through the trees.
Aesthetic Score : 0.7
Mood : mysterious, adventurous, eerie
Quality
Entropy : 6.53
Noise : 129
Prompt Clip Score : 0.19
AI Evaluation
Likelihood of AI : 0.80
Image errors : Some of the textures in the temple and the jungle foliage appear slightly blurry or repetitive.
Immersed in the Game: A Gamer’s Focused Intensity
A young man, bathed in pink and blue light, sits in his gaming chair, headphones on, eyes locked on the vibrant action unfolding on his curved monitor. The dramatic lighting and composition capture the intensity and focus of a gamer fully immersed in their digital world.
Prompt
staggered-pose staggered-pose: Focused, intense ; A gamer; close-up; Gaming; A brightly lit gaming setup with a monitor displaying a thrilling game; cinematic
Characteristic
Shot : A young man is sitting in a gaming chair, wearing headphones, and playing a video game on a computer. The room is lit with colorful LED lights.
Aesthetic Score : 0.6
Mood : intense, focused, futuristic
Quality
Entropy : 6.05
Noise : 97
Prompt Clip Score : 0.26
AI Evaluation
Likelihood of AI : 0.10
Image errors : Some blurriness around the edges of the screen. The image is slightly underexposed.
Silhouettes of Hope: A Father and Sons Embrace the Vastness of Nature
A serene sunset paints the sky as a father and his two sons stand on a mountain ridge, their silhouettes stark against the expansive valley below. The scene evokes a sense of awe and wonder, highlighting the tranquility and hopefulness of the moment.
Prompt
staggered-pose staggered-pose: Joyful, relaxed ; A family; medium shot; Tourism; A breathtaking view of a mountain range with a clear blue sky; cinematic
Characteristic
Shot : A father and his two sons are standing on a mountaintop, looking out at a breathtaking vista of distant, snow-capped peaks. The scene is bathed in the soft, warm light of the setting sun.
Aesthetic Score : 0.7
Mood : tranquil, serene, contemplative
Quality
Entropy : 6.24
Noise : 80
Prompt Clip Score : 0.23
AI Evaluation
Likelihood of AI : 0.10
Image errors : No noticeable artifacts or errors in the image.
A Solitary Journey into the Mountains
A woman, dwarfed by the vastness of the landscape, walks a winding road towards the mountains as the sun sets, casting long shadows across the green grass. The scene evokes a sense of tranquility, serenity, and adventure.
Prompt
staggered-pose staggered-pose: Free-spirited, adventurous ; A backpacker; long shot; Travel; A winding road leading to a distant village nestled in a valley; cinematic
Characteristic
Shot : A lone hiker, a woman with long red hair, walks down a paved road in a valley surrounded by green hills and mountains. The sun is shining and the sky is blue. There are some houses and other structures in the distance.
Aesthetic Score : 0.8
Mood : tranquil, adventurous, hopeful
Quality
Entropy : 6.80
Noise : 100
Prompt Clip Score : 0.18
AI Evaluation
Likelihood of AI : 0.30
Image errors : No significant errors, just a slight blur in the background.
Black and White Joy: Capturing the Energy of a Party
A group of friends dance the night away, their laughter and joy palpable even in black and white. The dramatic effect of the monochrome palette highlights the movement and emotion of the scene, creating a timeless and captivating image.
Prompt
staggered-pose staggered-pose: Energetic, celebratory ; A group of friends; medium shot; Groups; A lively party scene with people dancing and laughing; cinematic
Characteristic
Shot : A group of young people are dancing at a party. One girl in the foreground is smiling, her hair flowing in the air.
Aesthetic Score : 0.7
Mood : joyful, lively, carefree
Quality
Entropy : 6.41
Noise : 102
Prompt Clip Score : 0.20
AI Evaluation
Likelihood of AI : 0.20
Image errors : There are some minor artifacts and some details are slightly blurry, but overall the image quality is good.
A Hero Stands Tall, Hopeful Against the Sunset
A lone superhero, bathed in the golden light of a New York City sunset, gazes out at the sprawling cityscape. Their pose, powerful and confident, evokes a sense of anticipation and hope, promising a dramatic and epic story to unfold.
Prompt
staggered-pose staggered-pose: Powerful, confident ; A superhero; close-up; Heroism; A cityscape with towering skyscrapers and a dramatic sky; cinematic
Characteristic
Shot : A superhero stands on a skyscraper rooftop, overlooking a city skyline at sunset. The cityscape is full of tall buildings and the sky is filled with dramatic clouds.
Aesthetic Score : 0.7
Mood : epic, heroic, dramatic
Quality
Entropy : 6.44
Noise : 103
Prompt Clip Score : 0.24
AI Evaluation
Likelihood of AI : 0.80
Image errors : The image has some minor artifacts, particularly around the edges of the buildings and the clouds.
Silhouetted Figures on a Mountaintop, Contemplating the Vast Desert
Three figures stand on a rocky mountainside, their silhouettes stark against the bright blue sky and fluffy clouds. The vast desert landscape stretches out before them, with a shimmering lake in the distance. The scene evokes a sense of epic adventure and desolate beauty, emphasizing the grandeur and scale of the natural world.
Prompt
staggered-pose staggered-pose: Hopeful, determined ; A group of adventurers; wide shot; Adventure; A vast desert landscape with a lone oasis in the distance; cinematic
Characteristic
Shot : Three figures stand on a rocky cliff overlooking a vast desert landscape with a distant mountain range and a small lake in the foreground.
Aesthetic Score : 0.7
Mood : epic, desolate, adventurous
Quality
Entropy : 6.73
Noise : 112
Prompt Clip Score : 0.18
AI Evaluation
Likelihood of AI : 0.70
Image errors : Slight blurriness in the distance and some rough edges in the figures.
Lost in the Game: A Moment of Intense Focus
A solitary figure hunches over a computer screen in a dimly lit room, completely absorbed in a game. The low lighting and his focused expression create a palpable sense of immersion and intensity, capturing the essence of a gamer lost in the digital world.
Prompt
staggered-pose staggered-pose: Focused, strategic ; A gamer; close-up; Gaming; A dimly lit room with a computer screen displaying a complex strategy game; cinematic
Characteristic
Shot : A man sits in a darkened room, looking intently at a computer screen. The screen shows a video game with a galaxy-like scene. A dimly lit keyboard sits in front of him.
Aesthetic Score : 0.4
Mood : focused, intense, mysterious
Quality
Entropy : 5.91
Noise : 87
Prompt Clip Score : 0.21
AI Evaluation
Likelihood of AI : 0.20
Image errors : The image has some noise, and the lighting is not ideal. The background is very dark and there’s a slight color shift at the top of the screen.
Silhouettes of Love at Sunset
A romantic and nostalgic scene of a couple standing in the surf at sunset, their silhouettes framed against the warm glow of the sky. The dramatic effect of the silhouettes and the serene mood create a timeless and beautiful image.
Prompt
staggered-pose staggered-pose: Romantic, peaceful ; A couple; medium shot; Travel; A romantic sunset over a beach with the ocean waves crashing in the background; cinematic
Characteristic
Shot : A couple silhouetted against a dramatic sunset on the beach, waves gently lapping at their feet.
Aesthetic Score : 0.8
Mood : romantic, dreamy, peaceful
Quality
Entropy : 6.57
Noise : 108
Prompt Clip Score : 0.25
AI Evaluation
Likelihood of AI : 0.10
Image errors : The sky has a slight color banding issue.
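To compare the ten images at a glance, the per-image numbers above can be summarized with a few lines of Python. This is only a sketch: the lists are copied by hand from the reports in this post, in order of appearance, and the `summarize` helper is a hypothetical name introduced here, not part of any evaluation tool used in the experiment.

```python
from statistics import mean

# Values copied from the ten image reports above, in order of appearance.
aesthetic_scores = [0.7, 0.7, 0.6, 0.7, 0.8, 0.7, 0.7, 0.7, 0.4, 0.8]
prompt_clip_scores = [0.25, 0.19, 0.26, 0.23, 0.18, 0.20, 0.24, 0.18, 0.21, 0.25]
ai_likelihoods = [0.20, 0.80, 0.10, 0.10, 0.30, 0.20, 0.80, 0.70, 0.20, 0.10]


def summarize(values):
    """Return the minimum, rounded mean, and maximum of a list of scores."""
    return {
        "min": min(values),
        "mean": round(mean(values), 3),
        "max": max(values),
    }


if __name__ == "__main__":
    for name, values in [
        ("Aesthetic Score", aesthetic_scores),
        ("Prompt Clip Score", prompt_clip_scores),
        ("Likelihood of AI", ai_likelihoods),
    ]:
        print(name, summarize(values))
```

A summary like this makes the outliers easy to spot, such as the dimly lit strategy-game image, which has the lowest Aesthetic Score of the set.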
Conclusion
The results show that the generative AI model interpreted shot composition well and camera position passably, but struggled with the aesthetic aspect of the images.
Here’s a breakdown:
- Camera Position: The model scored 0.46, which is slightly below the “good” range of 0.5 to 0.75. This suggests that the model’s ability to interpret and implement camera positions in the generated image is decent, but could be improved.
- Shot Analysis: The model scored 0.6, falling within the “good” range. This indicates that the model was able to understand the scene described in the prompt and translate it into a visually coherent shot.
- Aesthetic Analysis: The model scored 0.09, at the upper edge of the “very good” range of -0.2 to 0.1. Being this close to the boundary suggests that the generated image’s aesthetic drifted from the one expected based on the prompt.
Overall, the model demonstrates a good understanding of camera positions and shot composition, but needs improvement in generating images that align with the desired aesthetic.
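The range comparisons in the breakdown can be checked mechanically. The sketch below uses the scores and ranges exactly as stated in this post; the `position_in_range` helper is a hypothetical name, and applying the 0.5 to 0.75 “good” range to Shot Analysis is an assumption, since the post only states that range explicitly for Camera Position.

```python
def position_in_range(score, low, high):
    """Say whether a score is below, within, or above an inclusive range."""
    if score < low:
        return "below"
    if score > high:
        return "above"
    return "within"


# (score, range low, range high), as reported in the breakdown.
reported = {
    "Camera Position": (0.46, 0.5, 0.75),
    # Assumption: Shot Analysis uses the same "good" range as Camera Position.
    "Shot Analysis": (0.6, 0.5, 0.75),
    "Aesthetic Analysis": (0.09, -0.2, 0.1),
}

results = {
    name: position_in_range(score, low, high)
    for name, (score, low, high) in reported.items()
}

if __name__ == "__main__":
    for name, verdict in results.items():
        print(f"{name}: {verdict} the stated range")
```

Writing the ranges down as data keeps the verdicts honest: each label follows directly from the numbers rather than from how the result is worded.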