AI's Artistic Struggle: Capturing the Essence of Poses with Flux-dev
Generating realistic, aesthetically pleasing images remains a constant pursuit in artificial intelligence. One area where models often struggle is capturing the essence of poses, particularly those that convey emotion, action, or a specific aesthetic style. This blog post presents the results of an experiment in which an AI model generated ten images from detailed scene descriptions, each requesting a high-angle view, to examine how well the model understands poses and translates them into visual form.
Created with: flux-dev
Silhouetted Against the Sunset: A Moment of Solitude and Inspiration
A lone figure stands on a mountain peak, silhouetted against a breathtaking sunset. The vibrant sky and vast landscape create a sense of awe and wonder, while the solitude of the figure invites introspection and hope.
Prompt
poses high-angle: epic, triumphant ; A lone figure standing on a mountain peak, silhouetted against the setting sun; wide shot; heroism; vast, rugged mountain range; cinematic
Characteristic
Shot : A lone figure stands on a mountaintop, silhouetted against a vibrant orange sunset.
Aesthetic Score : 0.7
Mood : serene, hopeful, inspiring
Quality
Entropy : 6.34
Noise : 55
Prompt Clip Score : 0.27
AI Evaluation
Likelihood of AI : 0.80
Image errors : The image has some slight artifacts, particularly in the sky. The figure is also a bit blurry.
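Every prompt in this post follows the same semicolon-separated layout: a "poses high-angle" prefix with mood keywords, a scene description, a shot size, a theme, a setting, and a style keyword. The following is a minimal sketch of how such a prompt could be split into its parts. The field names (`prefix`, `moods`, `subject`, `shot`, `theme`, `setting`, `style`) are labels assumed here for illustration; the post does not name them.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class ScenePrompt:
    # Field names are hypothetical labels, not terms used by the post.
    prefix: str        # e.g. "poses high-angle"
    moods: list[str]   # e.g. ["epic", "triumphant"]
    subject: str       # the scene description
    shot: str          # e.g. "wide shot"
    theme: str         # e.g. "heroism"
    setting: str       # e.g. "vast, rugged mountain range"
    style: str         # e.g. "cinematic"


def parse_prompt(prompt: str) -> ScenePrompt:
    """Split a prompt of the form 'prefix: moods ; subject; shot; theme; setting; style'."""
    parts = [part.strip() for part in prompt.split(";")]
    if len(parts) != 6:
        raise ValueError(f"expected 6 semicolon-separated parts, got {len(parts)}")
    head, subject, shot, theme, setting, style = parts
    # The first part holds the prefix and the comma-separated mood keywords.
    prefix, _, mood_text = head.partition(":")
    moods = [m.strip() for m in mood_text.split(",") if m.strip()]
    return ScenePrompt(prefix.strip(), moods, subject, shot, theme, setting, style)


FIRST_PROMPT = (
    "poses high-angle: epic, triumphant ; A lone figure standing on a mountain peak, "
    "silhouetted against the setting sun; wide shot; heroism; "
    "vast, rugged mountain range; cinematic"
)

parsed = parse_prompt(FIRST_PROMPT)
print(parsed.moods, parsed.shot, parsed.style)
```

Splitting the prompt this way makes it easier to compare what was requested (for example a wide shot with an epic mood) against the Shot and Mood values reported for each image.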
Tranquil Forest Hike Bathed in Sunlight
Three friends embark on a peaceful adventure through a lush forest, sunlight filtering through the trees creating a sense of mystery and depth. The scene evokes a tranquil and adventurous mood.
Prompt
poses high-angle: adventurous, suspenseful ; A group of explorers navigating a dense jungle, their path illuminated by the sun filtering through the canopy; medium shot; adventure; lush, green jungle; cinematic
Characteristic
Shot : Three people walking on a path in a forest with sunlight shining through the trees
Aesthetic Score : 0.6
Mood : tranquil, peaceful, adventurous
Quality
Entropy : 6.58
Noise : 119
Prompt Clip Score : 0.28
AI Evaluation
Likelihood of AI : 0.30
Image errors : The image is a bit blurry, and the figures are slightly pixelated.
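The post does not say how the Entropy value under Quality is computed. One common definition, assumed here, is the Shannon entropy of the grayscale intensity histogram, which ranges from 0 bits for a flat image to 8 bits for an 8-bit image that uses every intensity equally often. A minimal sketch under that assumption, using plain lists of pixel intensities rather than a real image loader:

```python
import math
from collections import Counter


def shannon_entropy(pixels):
    """Shannon entropy, in bits, of an iterable of integer pixel intensities.

    Assumption: this is one common way to define image entropy; the post
    does not confirm that its Entropy metric is computed this way.
    """
    counts = Counter(pixels)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no pixels given")
    # H = -sum(p * log2(p)) over the observed intensity values
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


flat = [128] * 100            # a single intensity everywhere
two_tone = [0, 255] * 50      # two intensities, equally frequent
full_range = list(range(256)) # every 8-bit intensity exactly once

print(shannon_entropy(flat), shannon_entropy(two_tone), shannon_entropy(full_range))
```

Under this definition, a higher value means the intensities are spread more evenly across the available range, which tends to go with busier, more detailed images.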
Lost in the Glow: A Gamer’s Focus Under Dimly Lit Lights
A silhouette emerges from the shadows, their hands gripping a gamepad as they navigate a vibrant cityscape on the TV screen. The dimly lit room adds a sense of drama and mystery, highlighting the intensity of their focus and the playful nature of their pursuit.
Prompt
poses high-angle: intense, focused ; A gamer’s hands manipulating a controller, the screen displaying a vibrant, futuristic cityscape; close-up; gaming; a dimly lit room with gaming peripherals; cinematic
Characteristic
Shot : A person is playing video games. The scene is lit by the glow of the monitor and the headphones.
Aesthetic Score : 0.6
Mood : focused, intense, playful
Quality
Entropy : 6.44
Noise : 66
Prompt Clip Score : 0.27
AI Evaluation
Likelihood of AI : 0.10
Image errors : There are no visible image errors.
Sun-Drenched Cityscape: A Vibrant Urban Scene
A wide shot captures the bustling energy of a city street, bathed in warm sunlight. The perspective from above emphasizes the scale and grandeur of the buildings, creating a lively and inviting atmosphere.
Prompt
poses high-angle: lively, energetic ; A bustling city square filled with tourists, capturing the iconic landmarks and vibrant street life; wide shot; tourism; a vibrant, bustling city with historical architecture; cinematic
Characteristic
Shot : A wide shot of a bustling city street lined with buildings. There are many people walking about, and the sun is shining brightly. The buildings are mostly tall and narrow, with some having ornate details.
Aesthetic Score : 0.7
Mood : lively, urban, bright
Quality
Entropy : 6.89
Noise : 105
Prompt Clip Score : 0.24
AI Evaluation
Likelihood of AI : 0.20
Image errors : No noticeable artifacts or errors
Silhouetted Against the Setting Sun: A Moment of Contemplation in the Desert
A lone figure stands in a vast desert landscape, their back to the camera as the sun dips below the horizon. The silhouette against the fiery sky evokes a sense of tranquility and contemplation, hinting at a journey of self-discovery and hope.
Prompt
poses high-angle: reflective, contemplative ; A lone traveler gazing out at a vast desert landscape, the setting sun casting long shadows; medium shot; travel; a vast, desolate desert with sand dunes; cinematic
Characteristic
Shot : A solitary figure stands on a sand dune, facing the sun which is setting behind him. The landscape is a vast expanse of sand, with rolling dunes in the background.
Aesthetic Score : 0.7
Mood : serene, contemplative, hopeful
Quality
Entropy : 6.32
Noise : 47
Prompt Clip Score : 0.29
AI Evaluation
Likelihood of AI : 0.20
Image errors : There are no visible artifacts or errors in the image.
Campfire Tales Under a Starry Sky
A group of young men share stories and laughter around a crackling campfire, the warm glow illuminating their faces against the backdrop of a vast, star-filled night. The scene evokes a sense of relaxed camaraderie and nostalgic memories.
Prompt
poses high-angle: warm, intimate ; A group of friends gathered around a campfire, sharing stories and laughter under a starry night sky; medium shot; groups; a serene campsite with a campfire and a starry sky; cinematic
Characteristic
Shot : A group of young men sitting around a campfire in the woods at night.
Aesthetic Score : 0.7
Mood : cozy, relaxed, friendly
Quality
Entropy : 6.48
Noise : 64
Prompt Clip Score : 0.31
AI Evaluation
Likelihood of AI : 0.10
Image errors : Slight blurriness in some areas.
Silhouetted Hero: A Lone Figure Commands the Cityscape
A solitary figure, cloaked in red, stands defiant atop a towering building. Their cape billows in the wind, creating a dramatic silhouette against the hazy cityscape. The scene evokes a sense of power, epic scale, and undeniable drama.
Prompt
poses high-angle: powerful, awe-inspiring ; A superhero soaring through the air, the city sprawling beneath them; wide shot; heroism; a sprawling cityscape with towering buildings; cinematic
Characteristic
Shot : A superhero, wearing a red cape, stands atop a tall building overlooking a city skyline with a hazy morning atmosphere.
Aesthetic Score : 0.7
Mood : dramatic, heroic, hopeful
Quality
Entropy : 6.43
Noise : 90
Prompt Clip Score : 0.26
AI Evaluation
Likelihood of AI : 0.20
Image errors : Some slight artifacts are present around the edges of the cape and the buildings. The overall sharpness is good, but some details could be sharper.
Rappel into the Mist: A Thrilling Descent into the Unknown
Four adventurers brave a steep canyon wall, disappearing into a veil of mist. The dramatic scene evokes a sense of awe and danger, highlighting the sheer scale of the canyon.
Prompt
poses high-angle: thrilling, dangerous ; A group of adventurers rappelling down a steep cliff face, their ropes dangling against the rock; medium shot; adventure; a dramatic cliff face with a breathtaking view; cinematic
Characteristic
Shot : Four climbers are rappelling down a steep rock face with a hazy background.
Aesthetic Score : 0.7
Mood : dramatic, adventurous, suspenseful
Quality
Entropy : 6.64
Noise : 97
Prompt Clip Score : 0.28
AI Evaluation
Likelihood of AI : 0.10
Image errors : No significant errors.
Lost in the Glow: A Moment of Focus in the Digital Realm
A solitary figure, bathed in the soft light of their computer screen, is completely absorbed in their task. Headphones on, eyes fixed on the display, they embody the essence of concentration and dedication in a world driven by technology. The dimly lit room adds an air of mystery, leaving us to wonder what captivating digital world they’ve entered.
Prompt
poses high-angle: immersive, captivating ; A gamer’s face illuminated by the screen, their eyes focused on the intense action unfolding in the virtual world; close-up; gaming; a dimly lit room with a gaming setup; cinematic
Characteristic
Shot : A young person in a dimly lit room, wearing headphones and looking intently at a computer screen. The room is filled with blue light from the screen and the image has a dark and moody vibe.
Aesthetic Score : 0.6
Mood : dark, focused, intense
Quality
Entropy : 6.05
Noise : 51
Prompt Clip Score : 0.25
AI Evaluation
Likelihood of AI : 0.20
Image errors : The image has slight noise in the background. The red light reflection on the computer monitor is also an error.
Silhouettes of Adventure: A Sunrise Hike on the Mountain Ridge
Four hikers stand on a mountain ridge, their figures outlined against a breathtaking sunrise. The dramatic play of light and shadow creates a sense of mystery and wonder, capturing the serene, hopeful, and adventurous spirit of the moment.
Prompt
poses high-angle: inspiring, hopeful ; A group of travelers standing on a mountaintop, their faces lit by the sunrise, gazing out at the breathtaking panorama; medium shot; travel; a majestic mountain range with a panoramic view; cinematic
Characteristic
Shot : A group of four hikers standing on a mountain ridge silhouetted against a breathtaking sunrise over a vast, misty landscape.
Aesthetic Score : 0.7
Mood : tranquil, hopeful, adventurous
Quality
Entropy : 6.70
Noise : 52
Prompt Clip Score : 0.30
AI Evaluation
Likelihood of AI : 0.20
Image errors : No visible errors.
Conclusion
The results show that the generative AI model only partially captured the scene and camera position, and struggled most with the aesthetic aspect. Here’s a breakdown:
- Camera Position: The model scored 0.35, which is below the “good” range of 0.5 to 0.75. This indicates that the model did not reliably capture the intended camera positions described in the prompts.
- Shot Analysis: The model scored 0.47, also below the “good” range, though only slightly. This suggests that while the model understood the scenes to some degree, it didn’t fully translate the prompts’ descriptions into the final images.
- Aesthetic Analysis: The model scored 0.33, which lies well above the “very good” range of -0.2 to 0.1. This means the generated images’ aesthetic deviated considerably from the aesthetic described in the prompts.
Overall, the model demonstrated some understanding of the scene and camera positions, but its ability to capture the desired aesthetic was lacking.
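To put the numbers side by side, the sketch below averages the per-image metrics reported in this post and checks the three conclusion scores against the ranges quoted above. The per-image values are copied from the sections above. The helper names (`column_mean`, `in_range`) are assumptions made for illustration, and the post does not say how the conclusion scores themselves were derived, so they are treated here as given.

```python
from statistics import mean

# Per-image metrics in order of appearance:
# (aesthetic score, prompt CLIP score, likelihood of AI)
IMAGES = [
    (0.7, 0.27, 0.80),  # mountain peak at sunset
    (0.6, 0.28, 0.30),  # forest hike
    (0.6, 0.27, 0.10),  # gamer with gamepad
    (0.7, 0.24, 0.20),  # city street
    (0.7, 0.29, 0.20),  # desert traveler
    (0.7, 0.31, 0.10),  # campfire
    (0.7, 0.26, 0.20),  # superhero on rooftop
    (0.7, 0.28, 0.10),  # rappelling climbers
    (0.6, 0.25, 0.20),  # gamer at computer
    (0.7, 0.30, 0.20),  # hikers at sunrise
]


def column_mean(rows, index):
    """Mean of one metric across all images, rounded to 3 decimals."""
    return round(mean(row[index] for row in rows), 3)


def in_range(score, bounds):
    """True if score lies within the inclusive (low, high) bounds."""
    low, high = bounds
    return low <= score <= high


# Ranges as quoted in the conclusion
GOOD = (0.5, 0.75)
VERY_GOOD = (-0.2, 0.1)

CONCLUSION = {
    "Camera Position": (0.35, GOOD),
    "Shot Analysis": (0.47, GOOD),
    "Aesthetic Analysis": (0.33, VERY_GOOD),
}

print("mean aesthetic score:", column_mean(IMAGES, 0))
print("mean prompt CLIP score:", column_mean(IMAGES, 1))
print("mean likelihood of AI:", column_mean(IMAGES, 2))
for name, (score, bounds) in CONCLUSION.items():
    print(name, score, "in range" if in_range(score, bounds) else "outside range")
```

Across the ten images the mean aesthetic score is 0.67, the mean prompt CLIP score is 0.275, and the mean likelihood of AI is 0.24. All three conclusion scores fall outside their quoted target ranges.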
Sources:
- https://www.writerswrite.co.za/cheat-sheets-for-writing-body-language/
- https://mads3df.wordpress.com/2013/09/04/storytelling-poses/
- https://www.pinterest.com/pegasister890/character-poses/
- https://www.youtube.com/watch?v=udky6ANxWws
- https://maven.com/articles/storytelling-techniques
- https://fal.ai/models/fal-ai/flux/dev/api