AI's Artistic Journey: Capturing Poses, But Missing the Mood with Flux-dev
Generating images from textual descriptions is a rapidly evolving field of artificial intelligence. This experiment assessed how well flux-dev translates specific poses and scene descriptions into visually compelling images. The model interpreted scene descriptions well and came close to the target on camera positions, but it struggled to capture the intended aesthetic style, revealing a gap in its artistic understanding. This post walks through the results, covering the model’s strengths and weaknesses and the challenges that remain in AI’s artistic journey.
Created with: flux-dev
A Moment of Serenity Amidst the Clouds
A solitary figure stands on a rocky cliff, dwarfed by the endless expanse of white clouds below. The scene evokes a sense of peace and contemplation, with the vastness of the landscape inspiring awe and wonder. The soft blue sky and fluffy clouds create a serene and hopeful mood.
Prompt
poses ankle-cross: Determined, confident, facing the unknown ; A lone adventurer, standing atop a windswept mountain peak; wide shot; Adventure; Dramatic sky with swirling clouds; cinematic
Characteristic
Shot : A lone figure stands on a rocky peak overlooking a sea of clouds. The sky is a soft blue with fluffy white clouds. The figure is silhouetted against the light.
Aesthetic Score : 0.7
Mood : tranquil, contemplative, serene
Quality
Entropy : 6.34
Noise : 69
Prompt Clip Score : 0.24
AI Evaluation
Likelihood of AI : 0.20
Image errors : There are no visible artifacts or errors in the image.
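Every prompt in this post follows the same semicolon-separated layout. The sketch below splits one prompt into its parts so the pieces can be compared against the generated image. The field names (`pose`, `expression`, `subject`, `shot`, `category`, `background`, `style`) are assumptions chosen for illustration; the post itself does not name them.

```python
from dataclasses import dataclass


@dataclass
class ParsedPrompt:
    # Field names are hypothetical labels for the six prompt segments.
    pose: str
    expression: list[str]
    subject: str
    shot: str
    category: str
    background: str
    style: str


def parse_prompt(prompt: str) -> ParsedPrompt:
    """Split a prompt of the form used in this post into labeled parts."""
    parts = [part.strip() for part in prompt.split(";")]
    if len(parts) != 6:
        raise ValueError(f"expected 6 segments, got {len(parts)}")
    head, subject, shot, category, background, style = parts

    # The first segment looks like "poses ankle-cross: Determined, confident, ..."
    pose_part, _, expression_part = head.partition(":")
    pose = pose_part.strip().removeprefix("poses").strip()
    expression = [e.strip() for e in expression_part.split(",") if e.strip()]

    return ParsedPrompt(pose, expression, subject, shot, category, background, style)


EXAMPLE = (
    "poses ankle-cross: Determined, confident, facing the unknown ; "
    "A lone adventurer, standing atop a windswept mountain peak; wide shot; "
    "Adventure; Dramatic sky with swirling clouds; cinematic"
)
parsed = parse_prompt(EXAMPLE)
```

Parsed this way, the first prompt asks for a wide shot with a dramatic, swirling sky, which makes it easy to set the request next to the Shot and Mood lines reported for the image.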
Silhouetted Hope: A Lone Figure Against the Setting Sun
A dramatic and inspiring scene unfolds as a lone figure in a cape stands against a vibrant sunset, silhouetted against a distant cityscape. The image evokes a sense of hope and mystery, leaving the viewer to ponder the figure’s journey and the possibilities that lie ahead.
Prompt
poses ankle-cross: Powerful, heroic, standing tall ; A superhero, silhouetted against a blazing sunset; medium shot; Heroism; City skyline with towering buildings; cinematic
Characteristic
Shot : A lone figure in a cape stands silhouetted against a sunset over a city skyline.
Aesthetic Score : 0.7
Mood : epic, heroic, dramatic
Quality
Entropy : 6.61
Noise : 36
Prompt Clip Score : 0.27
AI Evaluation
Likelihood of AI : 0.20
Image errors : No visible artifacts or errors.
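The post does not say how the Entropy value is computed. One common definition, assumed here, is the Shannon entropy of an 8-bit grayscale histogram, which ranges from 0 (a flat, single-tone image) to 8 bits (every tone equally likely). The values reported in this post would fit that scale, but that is an inference, not something the source confirms. A minimal sketch that works on a plain list of pixel values:

```python
import math
from collections import Counter


def shannon_entropy(pixels: list[int]) -> float:
    """Shannon entropy, in bits, of a sequence of pixel intensities."""
    if not pixels:
        raise ValueError("pixels must not be empty")
    total = len(pixels)
    counts = Counter(pixels)
    entropy = 0.0
    for count in counts.values():
        p = count / total
        entropy -= p * math.log2(p)
    return entropy


# Two extreme cases: a single flat tone and a perfectly even spread of tones.
flat_image = [128] * 1024
even_image = list(range(256)) * 4
```

Under this assumed definition, higher values mean tonal detail is spread more evenly across the image, while lower values point to large uniform regions such as silhouettes or plain skies.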
Lost in the Neon Glow: A Moment of Contemplation in the Digital Age
A young person, enveloped in the embrace of a VR headset, sits alone in a dimly lit room bathed in vibrant neon light. The scene evokes a sense of futuristic isolation, leaving the viewer to ponder the mysteries unfolding within the digital realm.
Prompt
poses ankle-cross: Immersed, concentrated, in the zone ; A gamer, intensely focused on a virtual reality headset; close-up; Gaming; Futuristic, neon-lit gaming room; cinematic
Characteristic
Shot : A young person wearing a VR headset is sitting in a room with a pink and blue neon light. There is an image of a figure in the background, which is lit with blue neon.
Aesthetic Score : 0.6
Mood : futuristic, dreamy, contemplative
Quality
Entropy : 6.63
Noise : 62
Prompt Clip Score : 0.27
AI Evaluation
Likelihood of AI : 0.20
Image errors : No noticeable errors in the image.
A Solitary Figure Contemplates the City’s Vastness
A lone traveler stands at the peak of ancient stone steps, silhouetted against the setting sun. The city sprawls below, a tapestry of life and history. The scene evokes a sense of tranquility, contemplation, and adventure, as the figure’s small size against the grand landscape emphasizes their isolation and the vastness of the world.
Prompt
poses ankle-cross: Awe-struck, contemplative, taking in the beauty ; A tourist, gazing out at a breathtaking vista; medium shot; Tourism; Ancient ruins with a panoramic view; cinematic
Characteristic
Shot : A lone man wearing a backpack stands on a stone staircase overlooking a city and a distant horizon. The man is facing away from the camera. The sky is blue and the sun is shining.
Aesthetic Score : 0.6
Mood : tranquil, contemplative, solitary
Quality
Entropy : 6.80
Noise : 78
Prompt Clip Score : 0.24
AI Evaluation
Likelihood of AI : 0.10
Image errors : No visible artifacts or errors
Finding Freedom in the Desert Sun
A woman stands with arms raised, embracing the vastness of the desert landscape. The setting sun casts a warm glow, creating a sense of serenity and hope. This image captures the feeling of liberation and the promise of new beginnings.
Prompt
poses ankle-cross: Free-spirited, adventurous, embracing the unknown ; A backpacker, standing at the edge of a vast desert; wide shot; Travel; Endless sand dunes stretching into the horizon; cinematic
Characteristic
Shot : A person is standing in a desert with their arms raised in the air. They are wearing a backpack, and the sun is setting in the background.
Aesthetic Score : 0.7
Mood : peaceful, serene, liberating
Quality
Entropy : 6.27
Noise : 34
Prompt Clip Score : 0.29
AI Evaluation
Likelihood of AI : 0.10
Image errors : None
Friendship Glows in the City Lights
Three friends radiate joy and warmth as they stand together in a bustling night scene. The backlighting creates a sense of intimacy, highlighting their connection and playful energy.
Prompt
poses ankle-cross: Joyful, carefree, enjoying each other’s company ; A group of friends, laughing and celebrating; medium shot; Groups; Vibrant, bustling street scene with colorful lights; cinematic
Characteristic
Shot : Three young women are standing together in a brightly lit street at night, likely in a city with festive lights, enjoying themselves.
Aesthetic Score : 0.6
Mood : happy, carefree, vibrant
Quality
Entropy : 6.64
Noise : 76
Prompt Clip Score : 0.24
AI Evaluation
Likelihood of AI : 0.20
Image errors : There are some minor artifacts and noise in the background, but not significantly detracting from the overall image quality.
A Shadowy Figure Before a Majestic Castle
A lone figure in a black robe stands before a grand stone castle, its imposing presence accentuated by a moat and framed by a stone archway. The play of light and shadow, the figure’s isolation, and the castle’s vastness create a mood of mystery and somber drama.
Prompt
poses ankle-cross: Stoic, vigilant, protecting the realm ; A lone warrior, standing guard at a castle gate; medium shot; Heroism; Majestic castle with a moat and drawbridge; cinematic
Characteristic
Shot : A lone figure in a dark cloak stands in the doorway of a stone archway, overlooking a moat leading to a large, imposing castle. The scene is bathed in a soft, ethereal light.
Aesthetic Score : 0.6
Mood : mystical, dramatic, solitary
Quality
Entropy : 6.64
Noise : 84
Prompt Clip Score : 0.30
AI Evaluation
Likelihood of AI : 0.30
Image errors : The image has some minor artifacts in the background, particularly around the edges of the castle walls.
Mystery and Camaraderie Around the Campfire
Four figures huddle around a crackling fire in a dense, foggy forest. The scene evokes a sense of cozy adventure, with the silhouettes of the men against the flames adding a touch of mystery and intrigue.
Prompt
poses ankle-cross: Intrigued, curious, sharing stories ; A group of explorers, huddled around a campfire; close-up; Adventure; Dense forest with flickering flames; cinematic
Characteristic
Shot : Four men are sitting around a campfire in a forest setting. They are dressed warmly and look relaxed. There is a lot of fog or smoke in the air.
Aesthetic Score : 0.7
Mood : cozy, adventurous, relaxed
Quality
Entropy : 6.56
Noise : 75
Prompt Clip Score : 0.30
AI Evaluation
Likelihood of AI : 0.10
Image errors : Some slight blurriness in the background and slight noise in the image. The foreground trees and vegetation might be a little too dark.
Victory Dance in the Digital Arena
A gamer’s silhouette is illuminated against a vibrant screen, their raised fist a testament to their triumph. The scene captures the raw excitement and energy of a hard-fought victory, showcasing the thrill of digital competition.
Prompt
poses ankle-cross: Excited, victorious, celebrating success ; A gamer, triumphantly raising their hands after winning a game; close-up; Gaming; Brightly lit gaming console with flashing lights; cinematic
Characteristic
Shot : A person is sitting in front of a computer, excited and celebrating a win in a video game. The scene is lit with purple and red lights, creating a vibrant and energetic atmosphere.
Aesthetic Score : 0.5
Mood : excitement, joy, celebration
Quality
Entropy : 6.54
Noise : 61
Prompt Clip Score : 0.28
AI Evaluation
Likelihood of AI : 0.20
Image errors : The image has some noise and grain, particularly in the darker areas. The sharpness is also not optimal.
Silhouettes of Love Against the City Lights
A black and white photograph captures the romantic silhouette of a couple standing on a balcony, overlooking a sprawling cityscape at night. The dramatic contrast of their forms against the twinkling lights evokes a sense of peace, nostalgia, and isolation, making for a powerful and evocative image.
Prompt
poses ankle-cross: Intimate, romantic, enjoying the view together ; A couple, standing on a balcony overlooking a bustling city; medium shot; Travel; Romantic cityscape with twinkling lights; cinematic
Characteristic
Shot : A silhouetted couple standing on a rooftop overlooking a city at night.
Aesthetic Score : 0.7
Mood : romantic, serene, mysterious
Quality
Entropy : 6.61
Noise : 77
Prompt Clip Score : 0.29
AI Evaluation
Likelihood of AI : 0.20
Image errors : The image is slightly overexposed, and the shadows are a bit harsh.
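The per-image numbers reported for the ten images can be averaged for a quick overview. The sketch below copies the values from the entries in this post; the short image labels and the `summarize` helper are illustrative names only. These averages are not the same thing as the camera, shot, and aesthetic analysis scores, whose derivation the post does not describe.

```python
from statistics import mean

FIELDS = ("aesthetic", "entropy", "noise", "prompt_clip", "ai_likelihood")

# Values copied from the ten image entries; the keys are shorthand labels.
IMAGES = {
    "clouds":      (0.7, 6.34, 69, 0.24, 0.20),
    "sunset hero": (0.7, 6.61, 36, 0.27, 0.20),
    "vr gamer":    (0.6, 6.63, 62, 0.27, 0.20),
    "tourist":     (0.6, 6.80, 78, 0.24, 0.10),
    "desert":      (0.7, 6.27, 34, 0.29, 0.10),
    "friends":     (0.6, 6.64, 76, 0.24, 0.20),
    "castle":      (0.6, 6.64, 84, 0.30, 0.30),
    "campfire":    (0.7, 6.56, 75, 0.30, 0.10),
    "victory":     (0.5, 6.54, 61, 0.28, 0.20),
    "couple":      (0.7, 6.61, 77, 0.29, 0.20),
}


def summarize(images: dict[str, tuple]) -> dict[str, float]:
    """Return the mean of each metric across all images, rounded to 3 places."""
    rows = list(images.values())
    return {
        field: round(mean(row[i] for row in rows), 3)
        for i, field in enumerate(FIELDS)
    }


summary = summarize(IMAGES)
lowest_aesthetic = min(IMAGES, key=lambda name: IMAGES[name][0])
```

A summary like this makes outliers easier to spot, such as the single image whose Aesthetic Score falls below 0.6.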
Conclusion
The results show that the generative AI model handled shot analysis well, came close to the target on camera position, and struggled with aesthetic analysis. Here’s a breakdown:
- Camera Position: The model scored 0.45, which is slightly below the “good” range of 0.5 to 0.75. This suggests that the model is not perfectly capturing the intended camera positions described in the prompts.
- Shot Analysis: The model scored 0.54, which falls within the “good” range. This indicates that the model is generally able to understand the scene descriptions in the prompts and create images that reflect those descriptions.
- Aesthetic Analysis: The model scored 0.14, which falls outside the “very good” range of -0.2 to 0.1. This suggests that the generated images are not consistently matching the expected aesthetic style described in the prompts.
Overall, the model shows promise in understanding scene descriptions and camera positions, but needs improvement in capturing the desired aesthetic style.
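The comparison of each score against its target range can be checked mechanically. The sketch below uses only the scores and ranges quoted in the breakdown; the `Result` type and `position` helper are hypothetical names, and the code makes no assumption about how the scores themselves were produced.

```python
from typing import NamedTuple


class Result(NamedTuple):
    metric: str
    score: float
    label: str   # name of the target range as quoted in the breakdown
    low: float
    high: float


RESULTS = [
    Result("Camera Position", 0.45, "good", 0.5, 0.75),
    Result("Shot Analysis", 0.54, "good", 0.5, 0.75),
    Result("Aesthetic Analysis", 0.14, "very good", -0.2, 0.1),
]


def position(result: Result) -> str:
    """Say whether a score sits below, inside, or above its target range."""
    if result.score < result.low:
        return "below"
    if result.score > result.high:
        return "above"
    return "inside"


for r in RESULTS:
    print(f"{r.metric}: {r.score} is {position(r)} the "
          f"'{r.label}' range [{r.low}, {r.high}]")
```

Only the shot analysis score lands inside its quoted range; the other two miss by 0.05 and 0.04 respectively.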
Sources:
- https://www.writerswrite.co.za/cheat-sheets-for-writing-body-language/
- https://mads3df.wordpress.com/2013/09/04/storytelling-poses/
- https://www.pinterest.com/pegasister890/character-poses/
- https://www.youtube.com/watch?v=udky6ANxWws
- https://maven.com/articles/storytelling-techniques
- https://fal.ai/models/fal-ai/flux/dev/api