AI Struggles to Capture the Essence of Poses with Midjourney
In the realm of artificial intelligence, generative models are revolutionizing the way we create images. These models can generate stunning visuals based on text prompts, offering endless possibilities for artists, designers, and content creators. However, capturing the nuances of human poses and translating them into visually compelling images remains a challenge. This blog post explores the results of a recent experiment where a generative AI model was tasked with creating images based on specific poses and scene descriptions. The results reveal both promising progress and areas where the model needs improvement, particularly in understanding and translating the desired aesthetic.
Created with: Midjourney
A Lone Figure Contemplates the Storm
A solitary man in medieval garb stands on a rocky precipice, his gaze fixed on a turbulent sky. The dramatic lighting and his contemplative posture evoke a sense of mystery and isolation.
Prompt
classic-headshot classic-headshot: determined, confident ; A lone adventurer, standing on a mountain peak; close-up; heroism; dramatic sky with clouds; cinematic
Characteristic
Shot : A lone man, dressed in medieval garb, stands on a rocky outcrop, looking up at the cloudy sky.
Aesthetic Score : 0.7
Mood : melancholy, contemplative, solitary
Quality
Entropy : 6.66
Noise : 88
Prompt Clip Score : 0.21
AI Evaluation
Likelihood of AI : 0.20
Image errors : The image is slightly overexposed, and there is some noise in the shadows. The subject’s left boot appears to be slightly blurred, potentially due to motion.
Lost at Sea: A Pirate’s Dramatic Quest
A weathered pirate, his face etched with determination, holds a compass against the backdrop of a raging storm. A lone ship sails on the horizon, adding to the sense of mystery and adventure. This dramatic scene evokes a feeling of intrigue and anticipation, leaving the viewer wondering what secrets lie ahead.
Prompt
classic-headshot classic-headshot: bold, adventurous ; A pirate captain, holding a compass; medium shot; adventure; stormy sea with a ship in the background; cinematic
Characteristic
Shot : A man dressed as a pirate captain stands in front of a stormy sea holding a compass. A sailing ship is in the background.
Aesthetic Score : 0.7
Mood : dramatic, mysterious, adventurous
Quality
Entropy : 6.52
Noise : 99
Prompt Clip Score : 0.27
AI Evaluation
Likelihood of AI : 0.40
Image errors : The image appears to be somewhat grainy. There are some minor artifacts in the background.
Neon Intensity: Gamer’s Focus in a Futuristic World
A young man, eyes locked on the camera, is immersed in a video game. Neon lights illuminate the scene, creating a dramatic contrast and a futuristic atmosphere. His intense focus and the vibrant colors capture the thrill of the game.
Prompt
classic-headshot classic-headshot: focused, intense ; A gamer, holding a controller; close-up; gaming; neon lights and a gaming setup in the background; cinematic
Characteristic
Shot : A young man with curly hair is wearing headphones and playing a video game. He is lit by neon pink and blue lights. The focus is on his face and the game controller.
Aesthetic Score : 0.7
Mood : intense, focused, futuristic
Quality
Entropy : 6.24
Noise : 78
Prompt Clip Score : 0.28
AI Evaluation
Likelihood of AI : 0.20
Image errors : The image has some slight artifacts around the edges, but they are not very noticeable. The focus on the subject is excellent, and the color balance is good.
City Smiles: Capturing Joy in the Urban Landscape
A close-up portrait of a man radiating happiness, his smile the focal point against a backdrop of city buildings and a partly cloudy sky. The shallow depth of field draws your eye to his infectious joy, creating a mood of positivity and lightheartedness.
Prompt
classic-headshot classic-headshot: happy, excited ; A tourist, smiling in front of a famous landmark; medium shot; tourism; bustling city street; cinematic
Characteristic
Shot : A portrait of a young man smiling brightly, wearing glasses and a dark jacket, in an urban environment with blurred buildings in the background
Aesthetic Score : 0.8
Mood : happy, joyful, friendly
Quality
Entropy : 6.11
Noise : 83
Prompt Clip Score : 0.19
AI Evaluation
Likelihood of AI : 0.10
Image errors : No visible errors
Lost in Thought: A Moment of Melancholy on the Train
A young woman gazes out the window of a moving train, her expression a mix of wistfulness and contemplation. The blurred landscape outside adds to the sense of mystery and intrigue, leaving the viewer wondering about her thoughts and destination.
Prompt
classic-headshot classic-headshot: reflective, contemplative ; A traveler, looking out of a train window; close-up; travel; scenic landscape passing by; cinematic
Characteristic
Shot : A young woman is looking out the window of a train, with a blurred landscape of green hills and a cloudy sky in the reflection.
Aesthetic Score : 0.7
Mood : melancholy, contemplative, introspective
Quality
Entropy : 5.81
Noise : 82
Prompt Clip Score : 0.22
AI Evaluation
Likelihood of AI : 0.20
Image errors : The image has some slight blurriness, particularly in the reflection in the window.
Laughter in the Sunlight: Friends Share a Joyful Moment
Three young women bask in the warmth of friendship, their laughter echoing through the sun-dappled trees. The image captures a genuine moment of joy and connection, radiating a carefree and playful mood.
Prompt
classic-headshot classic-headshot: joyful, carefree ; A group of friends, laughing together; medium shot; groups; vibrant outdoor setting; cinematic
Characteristic
Shot : Three young women are laughing together outdoors. They are in a natural setting, likely a park or garden, with greenery and sunlight in the background.
Aesthetic Score : 0.8
Mood : joyful, carefree, happy
Quality
Entropy : 6.56
Noise : 92
Prompt Clip Score : 0.14
AI Evaluation
Likelihood of AI : 0.10
Image errors : No noticeable artifacts or errors in the image.
Heroic Stance Amidst the Flames
A superhero, clad in striking red and black, stands defiant against a fiery explosion, their stoic expression highlighting the dramatic intensity of the scene. The contrast between the hero and the fiery backdrop creates a powerful visual, capturing the essence of their heroic struggle.
Prompt
classic-headshot classic-headshot: brave, heroic ; A superhero, standing in front of a burning building; close-up; heroism; city skyline with smoke and flames; cinematic
Characteristic
Shot : A superhero in a red and black costume stands in front of a burning building. The city skyline is visible in the background.
Aesthetic Score : 0.6
Mood : dramatic, intense, hopeful
Quality
Entropy : 6.68
Noise : 88
Prompt Clip Score : 0.23
AI Evaluation
Likelihood of AI : 0.80
Image errors : The image appears to be somewhat blurry, and the fire seems slightly artificial.
Lost in the Jungle: A Determined Explorer’s Quest
An intrepid explorer, clad in rugged gear, stands amidst the lush greenery of a dense jungle, his gaze fixed on a weathered map. The ruins in the background hint at a forgotten past, while the dramatic lighting and composition create an atmosphere of mystery and intrigue. This image captures the adventurous spirit and unwavering determination of those who seek to uncover the secrets of the unknown.
Prompt
classic-headshot classic-headshot: curious, adventurous ; An explorer, holding a map; medium shot; adventure; dense jungle with ancient ruins in the background; cinematic
Characteristic
Shot : A man in a jungle, looking at a map. He is in front of a stone structure and has a pistol on his hip.
Aesthetic Score : 0.7
Mood : adventurous, mysterious, focused
Quality
Entropy : 6.56
Noise : 116
Prompt Clip Score : 0.24
AI Evaluation
Likelihood of AI : 0.70
Image errors : There are some minor artifacts in the image, particularly in the background.
Lost in the Virtual World: A Moment of Pure Excitement
This image captures the raw emotion of virtual reality immersion. The man’s wide-eyed surprise and open mouth tell a story of wonder and excitement, while the blurry background hints at a vibrant, immersive world beyond the screen. The dramatic lighting and smoky atmosphere add to the sense of intensity and wonder, drawing the viewer into the moment.
Prompt
classic-headshot classic-headshot: immersed, excited ; A gamer, wearing VR headset; close-up; gaming; futuristic virtual reality environment; cinematic
Characteristic
Shot : A man wearing a VR headset and headphones is looking up in excitement with his mouth wide open.
Aesthetic Score : 0.6
Mood : excited, futuristic, vibrant
Quality
Entropy : 6.58
Noise : 86
Prompt Clip Score : 0.27
AI Evaluation
Likelihood of AI : 0.20
Image errors : No significant errors, although the lighting is slightly unnatural and the background is a bit generic.
Sunset Smiles: A Family’s Moment of Joy on the Beach
A family shares a sunset stroll on the beach, full of warmth and happiness. The golden light bathes them in a glow, reflecting their joyful expressions and creating a heartwarming scene.
Prompt
classic-headshot classic-headshot: happy, relaxed ; A family, standing in front of a sunset; medium shot; tourism; beach with golden sand and waves; cinematic
Characteristic
Shot : A family of three stands on a beach at sunset, with the parents holding their young daughter.
Aesthetic Score : 0.7
Mood : happy, loving, family
Quality
Entropy : 6.70
Noise : 92
Prompt Clip Score : 0.23
AI Evaluation
Likelihood of AI : 0.10
Image errors : The image is slightly overexposed.
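The per-image numbers above are easier to compare side by side. The following is a minimal sketch that copies the Aesthetic Score, Prompt Clip Score, and Likelihood of AI values from the ten entries and summarizes them. The shortened titles, the variable names, and the 0.5 cutoff for "likely AI" are assumptions made for illustration, not part of the original evaluation.

```python
from statistics import mean

# (shortened title, aesthetic score, prompt CLIP score, likelihood of AI),
# copied from the ten entries in this post.
images = [
    ("A Lone Figure Contemplates the Storm", 0.7, 0.21, 0.20),
    ("Lost at Sea", 0.7, 0.27, 0.40),
    ("Neon Intensity", 0.7, 0.28, 0.20),
    ("City Smiles", 0.8, 0.19, 0.10),
    ("Lost in Thought", 0.7, 0.22, 0.20),
    ("Laughter in the Sunlight", 0.8, 0.14, 0.10),
    ("Heroic Stance Amidst the Flames", 0.6, 0.23, 0.80),
    ("Lost in the Jungle", 0.7, 0.24, 0.70),
    ("Lost in the Virtual World", 0.6, 0.27, 0.20),
    ("Sunset Smiles", 0.7, 0.23, 0.10),
]

mean_aesthetic = mean(row[1] for row in images)
mean_clip = mean(row[2] for row in images)
mean_ai_likelihood = mean(row[3] for row in images)

# Entry whose image matched its prompt least well according to the CLIP score.
lowest_clip = min(images, key=lambda row: row[2])

# Assumed cutoff: treat a likelihood of 0.5 or more as "flagged as likely AI".
AI_CUTOFF = 0.5
flagged = [row[0] for row in images if row[3] >= AI_CUTOFF]

print(f"mean aesthetic score:  {mean_aesthetic:.2f}")
print(f"mean prompt CLIP score: {mean_clip:.3f}")
print(f"mean likelihood of AI:  {mean_ai_likelihood:.2f}")
print(f"lowest CLIP score:      {lowest_clip[0]} ({lowest_clip[2]})")
print(f"flagged as likely AI:   {flagged}")
```

With these values, the group-of-friends image has the weakest prompt match, and only the superhero and explorer images cross the assumed cutoff.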
Conclusion
The results show that the generative AI model scored below average on camera position and shot analysis, and that it also struggled with aesthetic analysis. Here’s a breakdown:
- Camera Position Analysis: The score of 0.4 indicates that the model’s ability to react to camera positions in the prompt is below average. A score between 0.5 and 0.75 would be considered good, and above 0.75 very good.
- Shot Analysis: The score of 0.45 indicates that the model’s ability to understand the scene in a prompt is below average. A score between 0.5 and 0.75 would be considered good, and above 0.75 very good.
- Aesthetic Analysis: The score of 1.1102230246251566e-17 is zero up to floating-point rounding, which the evaluation reads as the model failing to meet the expected aesthetic of the prompt. The stated band for a very good score is -0.2 to 0.1, which includes zero, so the band and this reading do not agree as written.
Overall: The model needs improvement in its ability to understand and translate camera positions, scene descriptions, and aesthetic expectations from the prompt into the generated image.
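The rating bands used for the camera position and shot scores can be written down as a small helper. This is a minimal sketch: the function name `rate_score` is hypothetical, and treating every score under 0.5 as "below average" is an assumption, since the post only applies that label to 0.4 and 0.45. The aesthetic score is left out because it uses a different scale.

```python
def rate_score(score: float) -> str:
    """Map a camera-position or shot-analysis score to the post's rating bands."""
    if score > 0.75:
        return "very good"
    if score >= 0.5:
        return "good"
    # Assumption: anything under 0.5 counts as below average.
    return "below average"


# Scores reported in the conclusion.
reported = {"camera position": 0.4, "shot": 0.45}
ratings = {name: rate_score(score) for name, score in reported.items()}

for name, rating in ratings.items():
    print(f"{name}: {reported[name]} -> {rating}")
```

Both reported scores fall in the lowest band, which matches the breakdown above.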
Sources:
- https://www.writerswrite.co.za/cheat-sheets-for-writing-body-language/
- https://mads3df.wordpress.com/2013/09/04/storytelling-poses/
- https://www.pinterest.com/pegasister890/character-poses/
- https://www.youtube.com/watch?v=udky6ANxWws
- https://maven.com/articles/storytelling-techniques
- https://midjourney.com