AI's Artistic Journey: Capturing Poses, But Missing the Shot with DALL-E 3
This post reports an experiment in which an AI image model was asked to create images from detailed scene descriptions that specified camera position, shot type, and aesthetic style. Each generated image was then described and scored. The model was strong at capturing the requested aesthetic but weak at translating camera position and shot composition into the picture. The ten examples below show where current AI image generation holds up and where further development is needed.
Created with: dall-e-3
A Lone Hiker Contemplates the Majesty of the Mountains
An epic scene unfolds as a solitary hiker stands on a rocky peak, gazing out at a breathtaking panorama of snow-capped mountains and swirling clouds. The dramatic isolation of the figure against the vastness of nature evokes a sense of awe and wonder.
Prompt
poses face-to-face: Determined, awe-inspiring ; A lone adventurer, standing on a mountain peak; wide shot; Adventure; Majestic mountain range with clouds swirling around; cinematic
Characteristic
Shot : A lone hiker stands on a mountain peak overlooking a vast, snow-capped mountain range shrouded in clouds. The hiker is looking out over the landscape with a camera in hand.
Aesthetic Score : 0.8
Mood : serene, majestic, adventurous
Quality
Entropy : 6.76
Noise : 112
Prompt Clip Score : 0.28
AI Evaluation
Likelihood of AI : 0.80
Image errors : The image is slightly over-saturated, giving it a slightly artificial look. The clouds have a bit of an unnatural texture. The textures in the clouds and on the mountains are very smooth, as if they’ve been heavily post-processed. The hiker’s figure has some slight blurring that is not realistic.
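The post does not say how the Quality figures are computed. As a rough sketch, and assuming that Entropy means the Shannon entropy of the image's 8-bit grayscale histogram (which would cap it at 8 bits and fits the 6.4 to 7.0 range reported here), it could be calculated as below. The function name is hypothetical, and Noise is left out because its definition is not given.

```python
import numpy as np


def shannon_entropy(gray: np.ndarray) -> float:
    """Shannon entropy, in bits, of an 8-bit grayscale image.

    Assumption: `gray` is a 2-D uint8 array with values in 0..255.
    """
    # Count how many pixels fall on each of the 256 intensity levels.
    hist = np.bincount(gray.ravel(), minlength=256).astype(np.float64)
    # Turn counts into probabilities and drop empty bins (0 * log 0 is taken as 0).
    p = hist / hist.sum()
    p = p[p > 0]
    return float(-(p * np.log2(p)).sum())


if __name__ == "__main__":
    # Synthetic stand-in for a real image: random 8-bit pixel values.
    rng = np.random.default_rng(0)
    fake_image = rng.integers(0, 256, size=(64, 64), dtype=np.uint8)
    print(f"entropy: {shannon_entropy(fake_image):.2f} bits")
```

Under this reading, a flat single-color image scores 0 and an image that uses all 256 gray levels equally scores 8, so values near 7 would indicate a wide, well-spread tonal range.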
Silhouettes in the Shadowy Forest
A group of figures stand shrouded in the darkness of a dense forest, their forms barely visible against the backdrop of a distant, ethereal light. The scene evokes a sense of mystery and tension, hinting at a fantastical or sci-fi setting.
Prompt
poses face-to-face: Suspenseful, mysterious ; A group of friends, huddled together in a dark forest; medium shot; Adventure; Tall trees casting long shadows, sunlight filtering through the leaves; cinematic
Characteristic
Shot : A group of people stand in a forest, silhouetted against a bright light in the distance. A vehicle is parked in the left foreground.
Aesthetic Score : 0.7
Mood : mysterious, suspenseful, foreboding
Quality
Entropy : 6.39
Noise : 106
Prompt Clip Score : 0.22
AI Evaluation
Likelihood of AI : 0.70
Image errors : The image is slightly overexposed, and the light in the distance is too bright. The silhouettes of the people are not very detailed, and the vehicle in the foreground is a bit blurry.
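The Prompt Clip Score is presumably a CLIP-style similarity between the prompt text and the generated image, which is usually taken as the cosine similarity of the two embedding vectors. The post does not confirm this, so the sketch below only illustrates the cosine step. The vectors are made-up toy values standing in for real CLIP embeddings, and the function name is hypothetical.

```python
import numpy as np


def cosine_similarity(a, b) -> float:
    """Cosine of the angle between two vectors (1.0 = same direction)."""
    a = np.asarray(a, dtype=np.float64)
    b = np.asarray(b, dtype=np.float64)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


if __name__ == "__main__":
    # Toy 4-dimensional vectors, NOT real CLIP outputs.
    # Real embeddings would come from a CLIP text encoder and image encoder.
    text_embedding = np.array([0.2, 0.9, 0.1, 0.4])
    image_embedding = np.array([0.1, 0.8, 0.3, 0.5])
    print(f"similarity: {cosine_similarity(text_embedding, image_embedding):.2f}")
```

If the reported scores are raw cosine similarities of this kind, they are best compared with each other rather than against 1.0: the values in this post sit in a narrow band from 0.22 to 0.35.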
Man vs. Monster: A Roman Soldier Faces His Fate
A weathered Roman soldier, helmet askew, stares in awe and fear at a fire-breathing dragon. The image captures the raw intensity of the moment, highlighting the contrast between human vulnerability and the mythical power of the beast. This dramatic scene evokes a sense of tension and suspense, leaving the viewer breathless.
Prompt
poses face-to-face: Brave, intense ; A seasoned warrior, facing down a fearsome dragon; close-up; Heroism; Fiery dragon with glowing eyes, smoke billowing around; cinematic
Characteristic
Shot : A Roman soldier stands facing a fiery dragon. The dragon is large and ominous, and the soldier is small and vulnerable. The scene is dark and dramatic.
Aesthetic Score : 0.6
Mood : epic, suspenseful, dramatic
Quality
Entropy : 6.76
Noise : 105
Prompt Clip Score : 0.28
AI Evaluation
Likelihood of AI : 0.90
Image errors : The image has a few errors: the dragon’s scales are too smooth, and the fire is not very realistic.
Lost in the Neon Labyrinth: A Glimpse into a Mysterious Future
A young man’s face, illuminated by the vibrant glow of a futuristic cityscape, reveals a captivating blend of intensity and intrigue. The close-up shot and dramatic lighting heighten the sense of mystery, drawing you into a world where the future holds both promise and unknown dangers.
Prompt
poses face-to-face: Focused, determined ; A young gamer, staring intently at a computer screen; close-up; Gaming; Vibrant, futuristic cityscape reflected in the screen; cinematic
Characteristic
Shot : A young man is staring intently at a computer screen, showing a futuristic cityscape with neon lights and a mysterious figure walking down the street.
Aesthetic Score : 0.7
Mood : intense, futuristic, cyberpunk
Quality
Entropy : 6.46
Noise : 91
Prompt Clip Score : 0.31
AI Evaluation
Likelihood of AI : 0.90
Image errors : The image is slightly blurry and the colors are a bit oversaturated. The cityscape appears somewhat generic and repetitive.
Love in the City of Light: A Timeless Moment in Paris
Experience the magic of a young couple’s romantic rendezvous in front of the iconic Eiffel Tower. The man, with camera in hand, gazes lovingly at his partner as she returns his affection with a radiant smile. Bathed in the warm sunlight and soft clouds, this nostalgic scene encapsulates the dreamy essence of Paris, the city of love.
Prompt
poses face-to-face: Romantic, nostalgic ; A couple, gazing at each other in front of the Eiffel Tower; medium shot; Tourism; Romantic Parisian cityscape with the Eiffel Tower in the background; cinematic
Characteristic
Shot : A young couple standing in front of the Eiffel Tower in Paris. The man is holding a camera and looking at the woman. The woman is looking back at the man with a smile on her face.
Aesthetic Score : 0.8
Mood : romantic, nostalgic, hopeful
Quality
Entropy : 6.93
Noise : 100
Prompt Clip Score : 0.26
AI Evaluation
Likelihood of AI : 0.20
Image errors : Slight graininess in the image. No other major errors observed.
Lost in the Labyrinth: A Moment of Intrigue in a Bustling Market
A young woman, her backpack slung over her shoulder, stands amidst the vibrant chaos of an Asian market. Her gaze, direct and unwavering, draws the viewer into her world, hinting at a story waiting to unfold. The bustling background fades into a blur, isolating her in a moment of quiet contemplation, leaving us to wonder what secrets she holds and where her journey will lead.
Prompt
poses face-to-face: Curious, vibrant ; A traveler, standing on a bustling street market; medium shot; Travel; Colorful stalls overflowing with exotic goods, people bustling around; cinematic
Characteristic
Shot : A young woman is walking through a bustling marketplace, her expression is calm and contemplative as she gazes directly at the camera. The marketplace is a vibrant mix of colors, textures, and smells, creating a sense of excitement and energy.
Aesthetic Score : 0.8
Mood : intrigued, vibrant, calm
Quality
Entropy : 6.95
Noise : 103
Prompt Clip Score : 0.26
AI Evaluation
Likelihood of AI : 0.10
Image errors : The image appears to have some minor noise, particularly in the background.
Campfire Tales in the Shadow of the Forest
A group of friends huddle around a crackling campfire, their faces illuminated by the warm glow. The surrounding forest is shrouded in darkness, creating a mysterious and cozy atmosphere. This scene evokes a sense of adventure and the thrill of the unknown.
Prompt
poses face-to-face: Intimate, suspenseful ; A group of explorers, huddled around a campfire; medium shot; Adventure; Dark forest with flickering flames illuminating their faces; cinematic
Characteristic
Shot : A group of friends are gathered around a campfire in a dark, mysterious forest. The fire is casting a warm glow on their faces, and the air is filled with the sound of crackling flames and whispers.
Aesthetic Score : 0.6
Mood : mystical, suspenseful, cozy
Quality
Entropy : 6.40
Noise : 98
Prompt Clip Score : 0.35
AI Evaluation
Likelihood of AI : 0.80
Image errors : The image appears to be artificially generated, with some unnatural textures and lighting.
Tiny Dreamer in a City of Giants
A young girl stands in awe beneath a towering skyscraper, her small figure dwarfed by the futuristic cityscape. The image captures a sense of wonder and the vastness of urban life.
Prompt
poses face-to-face: Awe-inspiring, hopeful ; A young girl, looking up at a towering skyscraper; wide shot; Tourism; Modern cityscape with towering skyscrapers and bustling streets; cinematic
Characteristic
Shot : A young girl stands in a busy city street, looking up at a tall skyscraper. The buildings surrounding her are old and worn, while the skyscraper is sleek and modern. There is a sense of awe and wonder in the scene, as the girl contemplates the vastness of the city around her.
Aesthetic Score : 0.7
Mood : awe, wonder, urban
Quality
Entropy : 6.61
Noise : 107
Prompt Clip Score : 0.30
AI Evaluation
Likelihood of AI : 0.80
Image errors : The image has some minor blurriness in the background, and the edges of the buildings are a bit jagged.
The Joy of Victory: Friends Celebrate a Gaming Triumph
A warm glow illuminates a table where four friends, headphones on and faces alight with excitement, celebrate a video game victory. The image captures the pure joy and camaraderie of shared gaming experiences.
Prompt
poses face-to-face: Joyful, celebratory ; A group of friends, celebrating a victory in a video game; close-up; Gaming; Brightly lit gaming room with controllers and headsets; cinematic
Characteristic
Shot : A group of four friends are playing a video game. They are all wearing headphones and are excited. There are controllers on the table in front of them. The image is lit from the side, creating a dramatic effect.
Aesthetic Score : 0.7
Mood : excited, joyful, competitive
Quality
Entropy : 6.61
Noise : 105
Prompt Clip Score : 0.32
AI Evaluation
Likelihood of AI : 0.10
Image errors : The image is slightly blurry and some of the subjects are not in focus. The lighting is uneven and there is some noise in the image.
Silhouetted Against the Sunset: A Moment of Melancholy on the Beach
A lone figure, backpack in tow, stands on a sandy shore, gazing at the fiery hues of the setting sun. The silhouette against the vibrant sky evokes a sense of contemplation and wanderlust, hinting at a journey both physical and emotional.
Prompt
poses face-to-face: Melancholy, contemplative ; A lone traveler, standing on a deserted beach; wide shot; Travel; Vast ocean stretching out to the horizon, golden sunset; cinematic
Characteristic
Shot : A man with a backpack is standing on a beach at sunset. He is looking towards the horizon. He is holding the backpack strap. The man is in the foreground and the beach and sunset are in the background. The image is framed in a photo.
Aesthetic Score : 0.6
Mood : serene, contemplative, adventurous
Quality
Entropy : 6.44
Noise : 74
Prompt Clip Score : 0.23
AI Evaluation
Likelihood of AI : 0.70
Image errors : The image has a slight blur in the background, possibly an artifact, which could be due to lens distortion or camera shake. The subject has an unnatural glow on his face, possibly due to heavy editing or AI processing, which detracts from the image’s realism. The color grading feels over-saturated, especially the sky and sunset, making it appear unnatural.
Conclusion
The results of the analysis show that the generative AI model performed well on the aesthetic aspect but struggled with camera position and shot composition. Here’s a breakdown:
- Camera Position: The model scored 0.3, which is considered below average. The generated images often did not reflect the camera position described in the prompt.
- Shot Analysis: The model scored 0.47, which is also below average. The model did not fully grasp the shot composition the prompt asked for.
- Aesthetic Analysis: The model scored 0.03, which is considered very good (the ratings imply that a lower score is better here). The generated images closely matched the expected aesthetic style.
Overall, the model seems to be better at capturing the desired aesthetic than understanding the scene and camera position. This suggests that the model might need further training to improve its ability to interpret and translate complex visual instructions.
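The per-image numbers reported above can also be pulled together by shot type. The sketch below is not how the three conclusion scores were derived (the post does not describe that calculation). It only averages the Aesthetic Score, Prompt Clip Score, and Likelihood of AI values listed for the ten images, grouped by the shot type named in each prompt. The short titles and function names are illustrative.

```python
from collections import defaultdict
from statistics import mean

# (short title, shot type from the prompt, aesthetic score,
#  prompt CLIP score, likelihood of AI), copied from the reports above.
RESULTS = [
    ("Lone hiker", "wide", 0.8, 0.28, 0.80),
    ("Shadowy forest", "medium", 0.7, 0.22, 0.70),
    ("Roman soldier", "close-up", 0.6, 0.28, 0.90),
    ("Neon labyrinth", "close-up", 0.7, 0.31, 0.90),
    ("Paris couple", "medium", 0.8, 0.26, 0.20),
    ("Street market", "medium", 0.8, 0.26, 0.10),
    ("Campfire", "medium", 0.6, 0.35, 0.80),
    ("Tiny dreamer", "wide", 0.7, 0.30, 0.80),
    ("Gaming triumph", "close-up", 0.7, 0.32, 0.10),
    ("Beach sunset", "wide", 0.6, 0.23, 0.70),
]


def summarize(rows):
    """Mean aesthetic, CLIP, and AI-likelihood values for a list of rows."""
    return {
        "n": len(rows),
        "aesthetic": mean(r[2] for r in rows),
        "clip": mean(r[3] for r in rows),
        "ai_likelihood": mean(r[4] for r in rows),
    }


def summarize_by_shot(rows):
    """Group rows by requested shot type and summarize each group."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[1]].append(row)
    return {shot: summarize(group) for shot, group in groups.items()}


if __name__ == "__main__":
    for shot, stats in summarize_by_shot(RESULTS).items():
        print(
            f"{shot:9s} n={stats['n']} "
            f"aesthetic={stats['aesthetic']:.2f} "
            f"clip={stats['clip']:.2f} "
            f"ai={stats['ai_likelihood']:.2f}"
        )
    overall = summarize(RESULTS)
    print(f"overall   n={overall['n']} aesthetic={overall['aesthetic']:.2f}")
```

Across all ten images the means are 0.70 for aesthetic score, about 0.28 for prompt CLIP score, and 0.60 for likelihood of AI. With only three or four images per shot type, differences between the groups should be read as anecdotal rather than as a measured effect.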
Sources:
- https://www.writerswrite.co.za/cheat-sheets-for-writing-body-language/
- https://mads3df.wordpress.com/2013/09/04/storytelling-poses/
- https://www.pinterest.com/pegasister890/character-poses/
- https://www.youtube.com/watch?v=udky6ANxWws
- https://maven.com/articles/storytelling-techniques
- https://openai.com/index/dall-e-3/