AI's Artistic Journey: Capturing Poses, But Missing the Mood with Flux-dev
In AI image generation, capturing the essence of a scene takes more than reproducing the elements a prompt describes. It also depends on composition, lighting, and, most importantly, the intended aesthetic. This post examines how flux-dev handles prompts that specify a pose, a shot type, and an aesthetic. The model is proficient at following camera positions and shot types, but it falls short on the intended aesthetic. Below are ten generated images with their prompts and evaluation metrics, followed by a summary of the model’s strengths and weaknesses.
Created with: flux-dev
Silhouetted Warrior at Sunset’s Edge
A lone figure, possibly a warrior, stands with their back to the viewer, silhouetted against a large, glowing sun. The figure holds two swords, creating a sense of tension and anticipation against the backdrop of a desolate landscape. The image evokes a mood of epic mystery and contemplation.
Prompt
poses staggered-pose: Epic, determined ; A lone warrior; wide shot; Heroism; A desolate battlefield with a setting sun; cinematic
Characteristic
Shot : A lone figure in silhouette, facing away from the viewer, stands on a barren landscape with a large, setting sun behind them. They hold two swords, one in each hand.
Aesthetic Score : 0.7
Mood : dramatic, solitary, contemplative
Quality
Entropy : 6.57
Noise : 53
Prompt Clip Score : 0.24
AI Evaluation
Likelihood of AI : 0.80
Image errors : The image appears to be rendered in a realistic style, with no visible errors or artifacts.
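Every prompt in this post follows the same semicolon-separated layout as the one above. As a minimal sketch, the layout can be split into named parts for later comparison with the generated image. The field names (`pose_moods`, `subject`, `shot_type`, `theme`, `setting`, `style`) are labels assumed here for illustration; the post does not name them.

```python
from dataclasses import dataclass


@dataclass
class PosePrompt:
    # Field names are hypothetical; they are not defined by the post.
    pose_moods: list[str]
    subject: str
    shot_type: str
    theme: str
    setting: str
    style: str


def parse_pose_prompt(prompt: str) -> PosePrompt:
    """Split one of the post's prompts into its six semicolon-separated parts."""
    head, subject, shot_type, theme, setting, style = (
        part.strip() for part in prompt.split(";")
    )
    # The first part looks like "poses staggered-pose: Epic, determined".
    _, _, moods = head.partition(":")
    return PosePrompt(
        pose_moods=[m.strip() for m in moods.split(",")],
        subject=subject,
        shot_type=shot_type,
        theme=theme,
        setting=setting,
        style=style,
    )
```

Parsing the prompt this way makes it straightforward to check, for example, whether a requested "wide shot" matches the shot described under Characteristic.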
Unveiling the Secrets of the Ancient Temple
A group of adventurers embark on a journey through a misty forest, their destination: a grand, ancient temple shrouded in mystery. The scene evokes a sense of wonder and anticipation, promising a thrilling exploration of the unknown.
Prompt
poses staggered-pose: Curious, adventurous ; A group of explorers; medium shot; Adventure; A dense jungle with ancient ruins in the background; cinematic
Characteristic
Shot : A group of people are walking through a forest, with a large stone structure in the background. The trees are tall and thick, and the air is hazy. The people are dressed in casual clothing and are carrying backpacks.
Aesthetic Score : 0.6
Mood : mysterious, atmospheric, contemplative
Quality
Entropy : 6.83
Noise : 119
Prompt Clip Score : 0.25
AI Evaluation
Likelihood of AI : 0.10
Image errors : The image is slightly blurry, particularly in the background. The colors are a bit washed out and lack vibrancy.
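The post does not say how the Entropy values (6.83 for this image) are computed. One common reading, assumed here, is the Shannon entropy of the pixel-intensity histogram in bits, which ranges from 0 to 8 for 8-bit images; the reported values all fall in that range. A minimal sketch under that assumption:

```python
import math
from collections import Counter


def shannon_entropy(pixels):
    """Shannon entropy, in bits, of a flat sequence of pixel intensities.

    Assumption: this is only one plausible definition of the post's
    "Entropy" metric; the post itself does not specify the formula.
    """
    counts = Counter(pixels)
    total = len(pixels)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())
```

Under this reading, a flat single-color image scores 0 and an image that uses all 256 intensity levels equally often scores 8, so values around 6 to 7 would indicate a fairly wide tonal spread.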
Lost in the Code: A Hacker’s Focus
A young man, bathed in the glow of blue and red lights, sits intently at his desk, headphones on, eyes fixed on the computer screen. The dim lighting and his focused expression create an air of mystery and intrigue, hinting at a story unfolding within the digital realm.
Prompt
poses staggered-pose: Focused, intense ; A gamer; close-up; Gaming; A brightly lit gaming setup with a monitor displaying a thrilling game; cinematic
Characteristic
Shot : A young man in a dimly lit room wearing headphones and looking intently at a computer screen.
Aesthetic Score : 0.6
Mood : focused, techy, mysterious
Quality
Entropy : 6.43
Noise : 59
Prompt Clip Score : 0.25
AI Evaluation
Likelihood of AI : 0.20
Image errors : The image is slightly blurry, particularly the subject’s face.
Family Adventure on the Mountaintop
A heartwarming image of a family of four standing on a mountain peak, enjoying the breathtaking panoramic view of the valley below. Their smiles and relaxed postures radiate happiness and a sense of adventure, capturing the essence of freedom and peace found in nature.
Prompt
poses staggered-pose: Joyful, relaxed ; A family; medium shot; Tourism; A breathtaking view of a mountain range with a clear blue sky; cinematic
Characteristic
Shot : A family of four is hiking in the mountains. They are all wearing backpacks and looking at the view. The sky is blue and the mountains are green. The picture is taken from a low angle, looking up at the family.
Aesthetic Score : 0.6
Mood : serene, adventurous, happy
Quality
Entropy : 6.65
Noise : 60
Prompt Clip Score : 0.28
AI Evaluation
Likelihood of AI : 0.20
Image errors : No visible artifacts or errors
Embracing the Mountain Majesty
A lone hiker stands on a winding mountain road, arms outstretched, capturing the breathtaking panorama. The wide-angle shot evokes a sense of freedom and adventure, reflecting a serene and contemplative mood.
Prompt
poses staggered-pose: Free-spirited, adventurous ; A backpacker; long shot; Travel; A winding road leading to a distant village nestled in a valley; cinematic
Characteristic
Shot : A man with a backpack stands on a mountain road with his arms outstretched, facing a view of distant mountains.
Aesthetic Score : 0.7
Mood : inspirational, adventurous, hopeful
Quality
Entropy : 6.66
Noise : 74
Prompt Clip Score : 0.21
AI Evaluation
Likelihood of AI : 0.20
Image errors : None
Silhouettes of Joy: Dancing Under the Neon Glow
Capture the energy of a vibrant party with this image. Backlighting creates dramatic silhouettes of dancers against a backdrop of red and pink lights, evoking a fun, festive, and energetic mood.
Prompt
poses staggered-pose: Energetic, celebratory ; A group of friends; medium shot; Groups; A lively party scene with people dancing and laughing; cinematic
Characteristic
Shot : A group of young people are dancing at a party or club, lit by red and pink lights. The focus is on a woman in the center who is facing the camera with her arms raised.
Aesthetic Score : 0.6
Mood : energetic, playful, fun
Quality
Entropy : 6.23
Noise : 58
Prompt Clip Score : 0.22
AI Evaluation
Likelihood of AI : 0.10
Image errors : There is some noise in the image, particularly in the shadows, which could indicate that the photo was taken in low light or with a high ISO setting.
Superman Stands Tall, Ready to Face the Challenge
A dramatic image captures Superman, clad in his iconic suit, gazing upwards with a determined expression. The cityscape behind him fades into the bright, cloudy sky, emphasizing his heroic stance and the gravity of the moment.
Prompt
poses staggered-pose: Powerful, confident ; A superhero; close-up; Heroism; A cityscape with towering skyscrapers and a dramatic sky; cinematic
Characteristic
Shot : A man dressed as Superman stands in a cityscape with a cape billowing behind him.
Aesthetic Score : 0.7
Mood : heroic, powerful, dramatic
Quality
Entropy : 6.22
Noise : 66
Prompt Clip Score : 0.24
AI Evaluation
Likelihood of AI : 0.20
Image errors : The image has some minor artifacts, particularly in the shadows and around the subject’s edges, where a slight halo is visible. It is also slightly oversharpened, which gives it an artificial look.
Silhouettes of Hope in the Desert Sunrise
Five figures stand in a line, their backs to the camera, silhouetted against a breathtaking golden sunrise in the desert. The scene evokes a sense of peace, mystery, and hope, with the flowing clothing and dramatic lighting adding to the captivating atmosphere.
Prompt
poses staggered-pose: Hopeful, determined ; A group of adventurers; wide shot; Adventure; A vast desert landscape with a lone oasis in the distance; cinematic
Characteristic
Shot : Five people stand in silhouette against a desert landscape with a sunset in the background.
Aesthetic Score : 0.7
Mood : mysterious, contemplative, hopeful
Quality
Entropy : 6.40
Noise : 59
Prompt Clip Score : 0.25
AI Evaluation
Likelihood of AI : 0.30
Image errors : The image is slightly blurry and the colors are a bit muted.
Lost in the Game: A Moment of Intense Focus
A solitary figure, shrouded in shadow, is completely absorbed in a video game. The glow of the computer screen illuminates their focused face, highlighting the intensity of their concentration. The scene evokes a sense of isolation and immersion, capturing the captivating power of gaming.
Prompt
poses staggered-pose: Focused, strategic ; A gamer; close-up; Gaming; A dimly lit room with a computer screen displaying a complex strategy game; cinematic
Characteristic
Shot : A person wearing a headset is sitting in front of a computer screen. The screen is displaying a game or program. The room is lit with a soft blue and purple light.
Aesthetic Score : 0.4
Mood : focused, serious, intense
Quality
Entropy : 5.90
Noise : 60
Prompt Clip Score : 0.22
AI Evaluation
Likelihood of AI : 0.20
Image errors : The image is slightly grainy and the edges of the screen are not perfectly sharp. There is a little bit of chromatic aberration around the edges of the screen.
Silhouettes of Love at Sunset
A couple stands hand-in-hand, their silhouettes painted against a breathtaking sunset on a serene beach. The scene evokes a sense of intimacy, romance, and tender affection.
Prompt
poses staggered-pose: Romantic, peaceful ; A couple; medium shot; Travel; A romantic sunset over a beach with the ocean waves crashing in the background; cinematic
Characteristic
Shot : A couple silhouetted against a sunset on a beach, kissing
Aesthetic Score : 0.7
Mood : romantic, dreamy, hopeful
Quality
Entropy : 6.80
Noise : 55
Prompt Clip Score : 0.27
AI Evaluation
Likelihood of AI : 0.10
Image errors : No visible errors.
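To compare the ten images at a glance, the per-image values reported above can be collected and averaged. The sketch below copies the Aesthetic Score, Prompt Clip Score, and Likelihood of AI from each entry; the variable names are illustrative, and these averages are not necessarily the same quantities as the scores quoted in the conclusion.

```python
from statistics import mean

# (title, aesthetic score, prompt CLIP score, likelihood of AI), copied from the post
IMAGES = [
    ("Silhouetted Warrior at Sunset's Edge", 0.7, 0.24, 0.80),
    ("Unveiling the Secrets of the Ancient Temple", 0.6, 0.25, 0.10),
    ("Lost in the Code: A Hacker's Focus", 0.6, 0.25, 0.20),
    ("Family Adventure on the Mountaintop", 0.6, 0.28, 0.20),
    ("Embracing the Mountain Majesty", 0.7, 0.21, 0.20),
    ("Silhouettes of Joy: Dancing Under the Neon Glow", 0.6, 0.22, 0.10),
    ("Superman Stands Tall, Ready to Face the Challenge", 0.7, 0.24, 0.20),
    ("Silhouettes of Hope in the Desert Sunrise", 0.7, 0.25, 0.30),
    ("Lost in the Game: A Moment of Intense Focus", 0.4, 0.22, 0.20),
    ("Silhouettes of Love at Sunset", 0.7, 0.27, 0.10),
]

summary = {
    "aesthetic": round(mean(row[1] for row in IMAGES), 3),
    "clip": round(mean(row[2] for row in IMAGES), 3),
    "ai_likelihood": round(mean(row[3] for row in IMAGES), 3),
}

# Image with the lowest aesthetic score.
lowest = min(IMAGES, key=lambda row: row[1])

print(summary)
print("Lowest aesthetic score:", lowest[0], lowest[1])
```

The averages come out to 0.63 for aesthetic score, 0.243 for prompt CLIP score, and 0.24 for likelihood of AI, with "Lost in the Game" as the lowest-rated image at 0.4.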
Conclusion
The results show that the generative AI model performed well in terms of camera position and shot analysis, but struggled with aesthetic analysis.
Here’s a breakdown:
- Camera Position: The model scored 0.51, just inside the “good” range (0.5 to 0.75). The generated images generally matched the camera position described in the prompt.
- Shot Analysis: The model scored 0.6, also within the “good” range. This indicates the model understood the scene described in the prompt and produced an image that reflects it.
- Aesthetic Analysis: The model scored 0.12, which falls outside the “very good” range (-0.2 to 0.1). This suggests that the aesthetic of the generated images deviated from the one described in the prompt.
Overall, the model demonstrates a good understanding of camera position and shot composition, but needs improvement in capturing the desired aesthetic.
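The range checks in the breakdown can be reproduced with a few lines. This is a sketch that assumes the range bounds are inclusive, which the post does not state; the dictionary keys are illustrative names.

```python
def in_range(score, low, high):
    """Return True if score lies within [low, high] (inclusive bounds assumed)."""
    return low <= score <= high


GOOD = (0.5, 0.75)                 # "good" range quoted for camera position and shot
VERY_GOOD_AESTHETIC = (-0.2, 0.1)  # "very good" range quoted for aesthetic

scores = {
    "camera_position": (0.51, GOOD),
    "shot_analysis": (0.6, GOOD),
    "aesthetic_analysis": (0.12, VERY_GOOD_AESTHETIC),
}

verdicts = {name: in_range(score, *bounds) for name, (score, bounds) in scores.items()}
print(verdicts)
```

Camera position and shot analysis both land inside their range, while the aesthetic score of 0.12 lies above the 0.1 upper bound of its range.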
Sources:
- https://www.writerswrite.co.za/cheat-sheets-for-writing-body-language/
- https://mads3df.wordpress.com/2013/09/04/storytelling-poses/
- https://www.pinterest.com/pegasister890/character-poses/
- https://www.youtube.com/watch?v=udky6ANxWws
- https://maven.com/articles/storytelling-techniques
- https://fal.ai/models/fal-ai/flux/dev/api