AI's Artistic Journey: Capturing Poses, But Missing the Mood with Flux-dev

In the realm of AI image generation, capturing the essence of a scene goes beyond simply replicating the elements described. It involves understanding the nuances of composition, lighting, and most importantly, the desired aesthetic. This blog post examines the performance of a generative AI model in creating images based on prompts that include specific poses and aesthetics. While the model demonstrates proficiency in understanding camera positions and shot types, it falls short in capturing the intended aesthetic. We delve into the results, analyzing the model’s strengths and weaknesses, and discuss the implications for future AI image generation.

Created with: flux-dev

Silhouetted Warrior at Sunset’s Edge

A lone figure, possibly a warrior, stands with their back to the viewer, silhouetted against a large, glowing sun. The figure holds two swords, creating a sense of tension and anticipation against the backdrop of a desolate landscape. The image evokes a mood of epic mystery and contemplation.

Prompt

poses staggered-pose: Epic, determined ; A lone warrior; wide shot; Heroism; A desolate battlefield with a setting sun; cinematic

Characteristic

Shot : A lone figure in silhouette, facing away from the viewer, stands on a barren landscape with a large, setting sun behind them. They hold two swords, one in each hand.

Aesthetic Score : 0.7

Mood : dramatic, solitary, contemplative

Quality

Entropy : 6.57

Noise : 53

Prompt Clip Score : 0.24

AI Evaluation

Likelihood of AI : 0.80

Image errors : The image appears to be rendered in a realistic style, with no visible errors or artifacts.
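The post does not say how the Entropy figure in each Quality block is computed. One common definition, assumed here, is the Shannon entropy of the image's 8-bit grayscale histogram, which ranges from 0 bits (a flat image) to 8 bits (every intensity equally likely). The values reported in this post would fit that scale. The sketch below is illustrative only, and `shannon_entropy` is a hypothetical helper, not the tool used for these results:

```python
import math
from collections import Counter
from typing import Iterable


def shannon_entropy(pixels: Iterable[int]) -> float:
    """Shannon entropy, in bits, of a flat sequence of grayscale pixel values."""
    counts = Counter(pixels)
    total = sum(counts.values())
    # Sum -p * log2(p) over every intensity that actually occurs.
    return float(sum(-(n / total) * math.log2(n / total) for n in counts.values()))


# Two synthetic extremes (assumed test data, not images from this post):
flat_image = [128] * 4096                                 # one intensity only
uniform_image = [v for v in range(256) for _ in range(16)]  # all 256 intensities equally often

flat_entropy = shannon_entropy(flat_image)
uniform_entropy = shannon_entropy(uniform_image)
```

Under this definition, a higher value means the intensities are spread more evenly across the histogram. It says nothing on its own about whether the image matches the prompt.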

Unveiling the Secrets of the Ancient Temple

A group of adventurers embark on a journey through a misty forest, their destination: a grand, ancient temple shrouded in mystery. The scene evokes a sense of wonder and anticipation, promising a thrilling exploration of the unknown.

Prompt

poses staggered-pose: Curious, adventurous ; A group of explorers; medium shot; Adventure; A dense jungle with ancient ruins in the background; cinematic

Characteristic

Shot : A group of people are walking through a forest, with a large stone structure in the background. The trees are tall and thick, and the air is hazy. The people are dressed in casual clothing and are carrying backpacks.

Aesthetic Score : 0.6

Mood : mysterious, atmospheric, contemplative

Quality

Entropy : 6.83

Noise : 119

Prompt Clip Score : 0.25

AI Evaluation

Likelihood of AI : 0.10

Image errors : The image is slightly blurry, particularly in the background. The colors are a bit washed out and lack vibrancy.

Lost in the Code: A Hacker’s Focus

A young man, bathed in the glow of blue and red lights, sits intently at his desk, headphones on, eyes fixed on the computer screen. The dim lighting and his focused expression create an air of mystery and intrigue, hinting at a story unfolding within the digital realm.

Prompt

poses staggered-pose: Focused, intense ; A gamer; close-up; Gaming; A brightly lit gaming setup with a monitor displaying a thrilling game; cinematic

Characteristic

Shot : A young man in a dimly lit room wearing headphones and looking intently at a computer screen.

Aesthetic Score : 0.6

Mood : focused, techy, mysterious

Quality

Entropy : 6.43

Noise : 59

Prompt Clip Score : 0.25

AI Evaluation

Likelihood of AI : 0.20

Image errors : The image is slightly blurry, particularly the subject’s face.

Family Adventure on the Mountaintop

A heartwarming image of a family of four standing on a mountain peak, enjoying the breathtaking panoramic view of the valley below. Their smiles and relaxed postures radiate happiness and a sense of adventure, capturing the essence of freedom and peace found in nature.

Prompt

poses staggered-pose: Joyful, relaxed ; A family; medium shot; Tourism; A breathtaking view of a mountain range with a clear blue sky; cinematic

Characteristic

Shot : A family of four is hiking in the mountains. They are all wearing backpacks and looking at the view. The sky is blue and the mountains are green. The picture is taken from a low angle, looking up at the family.

Aesthetic Score : 0.6

Mood : serene, adventurous, happy

Quality

Entropy : 6.65

Noise : 60

Prompt Clip Score : 0.28

AI Evaluation

Likelihood of AI : 0.20

Image errors : No visible artifacts or errors

Embracing the Mountain Majesty

A lone hiker stands on a winding mountain road, arms outstretched, capturing the breathtaking panorama. The wide-angle shot evokes a sense of freedom and adventure, reflecting a serene and contemplative mood.

Prompt

poses staggered-pose: Free-spirited, adventurous ; A backpacker; long shot; Travel; A winding road leading to a distant village nestled in a valley; cinematic

Characteristic

Shot : A man with a backpack stands on a mountain road with his arms outstretched, facing a view of distant mountains.

Aesthetic Score : 0.7

Mood : inspirational, adventurous, hopeful

Quality

Entropy : 6.66

Noise : 74

Prompt Clip Score : 0.21

AI Evaluation

Likelihood of AI : 0.20

Image errors : None

Silhouettes of Joy: Dancing Under the Neon Glow

Capture the energy of a vibrant party with this image. Backlighting creates dramatic silhouettes of dancers against a backdrop of red and pink lights, evoking a fun, festive, and energetic mood.

Prompt

poses staggered-pose: Energetic, celebratory ; A group of friends; medium shot; Groups; A lively party scene with people dancing and laughing; cinematic

Characteristic

Shot : A group of young people are dancing at a party or club, lit by red and pink lights. The focus is on a woman in the center who is facing the camera with her arms raised.

Aesthetic Score : 0.6

Mood : energetic, playful, fun

Quality

Entropy : 6.23

Noise : 58

Prompt Clip Score : 0.22

AI Evaluation

Likelihood of AI : 0.10

Image errors : There is some noise in the image, particularly in the shadows, which could indicate that the photo was taken in low light or with a high ISO setting.

Superman Stands Tall, Ready to Face the Challenge

A dramatic image captures Superman, clad in his iconic suit, gazing upwards with a determined expression. The cityscape behind him fades into the bright, cloudy sky, emphasizing his heroic stance and the gravity of the moment.

Prompt

poses staggered-pose: Powerful, confident ; A superhero; close-up; Heroism; A cityscape with towering skyscrapers and a dramatic sky; cinematic

Characteristic

Shot : A man dressed as Superman stands in a cityscape with a cape billowing behind him.

Aesthetic Score : 0.7

Mood : heroic, powerful, dramatic

Quality

Entropy : 6.22

Noise : 66

Prompt Clip Score : 0.24

AI Evaluation

Likelihood of AI : 0.20

Image errors : The image has some minor artifacts, particularly in the shadows and around the subject’s edges, including a slight halo effect. It is also slightly oversharpened, which gives it an artificial look.

Silhouettes of Hope in the Desert Sunrise

Five figures stand in a line, their backs to the camera, silhouetted against a breathtaking golden sunrise in the desert. The scene evokes a sense of peace, mystery, and hope, with the flowing clothing and dramatic lighting adding to the captivating atmosphere.

Prompt

poses staggered-pose: Hopeful, determined ; A group of adventurers; wide shot; Adventure; A vast desert landscape with a lone oasis in the distance; cinematic

Characteristic

Shot : Five people stand in silhouette against a desert landscape with a sunset in the background.

Aesthetic Score : 0.7

Mood : mysterious, contemplative, hopeful

Quality

Entropy : 6.40

Noise : 59

Prompt Clip Score : 0.25

AI Evaluation

Likelihood of AI : 0.30

Image errors : The image is slightly blurry and the colors are a bit muted.

Lost in the Game: A Moment of Intense Focus

A solitary figure, shrouded in shadow, is completely absorbed in a video game. The glow of the computer screen illuminates their focused face, highlighting the intensity of their concentration. The scene evokes a sense of isolation and immersion, capturing the captivating power of gaming.

Prompt

poses staggered-pose: Focused, strategic ; A gamer; close-up; Gaming; A dimly lit room with a computer screen displaying a complex strategy game; cinematic

Characteristic

Shot : A person wearing a headset is sitting in front of a computer screen. The screen is displaying a game or program. The room is lit with a soft blue and purple light.

Aesthetic Score : 0.4

Mood : focused, serious, intense

Quality

Entropy : 5.90

Noise : 60

Prompt Clip Score : 0.22

AI Evaluation

Likelihood of AI : 0.20

Image errors : The image is slightly grainy and the edges of the screen are not perfectly sharp. There is a little bit of chromatic aberration around the edges of the screen.

Silhouettes of Love at Sunset

A couple stands hand-in-hand, their silhouettes painted against a breathtaking sunset on a serene beach. The scene evokes a sense of intimacy, romance, and tender affection.

Prompt

poses staggered-pose: Romantic, peaceful ; A couple; medium shot; Travel; A romantic sunset over a beach with the ocean waves crashing in the background; cinematic

Characteristic

Shot : A couple silhouetted against a sunset on a beach, kissing

Aesthetic Score : 0.7

Mood : romantic, dreamy, hopeful

Quality

Entropy : 6.80

Noise : 55

Prompt Clip Score : 0.27

AI Evaluation

Likelihood of AI : 0.10

Image errors : No visible errors.
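To compare the ten images at a glance, the per-image figures reported above can be averaged. The numbers below are copied from this post. The variable names are illustrative, and the averages are simple arithmetic means rather than a metric the post itself reports:

```python
from statistics import mean

# (title, aesthetic score, prompt CLIP score, likelihood of AI), as listed above
results = [
    ("Silhouetted Warrior at Sunset's Edge", 0.7, 0.24, 0.80),
    ("Unveiling the Secrets of the Ancient Temple", 0.6, 0.25, 0.10),
    ("Lost in the Code: A Hacker's Focus", 0.6, 0.25, 0.20),
    ("Family Adventure on the Mountaintop", 0.6, 0.28, 0.20),
    ("Embracing the Mountain Majesty", 0.7, 0.21, 0.20),
    ("Silhouettes of Joy: Dancing Under the Neon Glow", 0.6, 0.22, 0.10),
    ("Superman Stands Tall, Ready to Face the Challenge", 0.7, 0.24, 0.20),
    ("Silhouettes of Hope in the Desert Sunrise", 0.7, 0.25, 0.30),
    ("Lost in the Game: A Moment of Intense Focus", 0.4, 0.22, 0.20),
    ("Silhouettes of Love at Sunset", 0.7, 0.27, 0.10),
]

avg_aesthetic = mean(r[1] for r in results)       # about 0.63
avg_clip = mean(r[2] for r in results)            # about 0.243
avg_ai_likelihood = mean(r[3] for r in results)   # about 0.24

# Image with the lowest aesthetic score
weakest = min(results, key=lambda r: r[1])[0]

print(f"mean aesthetic score: {avg_aesthetic:.2f}")
print(f"mean prompt CLIP score: {avg_clip:.3f}")
print(f"mean likelihood of AI: {avg_ai_likelihood:.2f}")
print(f"lowest aesthetic score: {weakest}")
```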

Conclusion

The results show that the generative AI model performed well in terms of camera position and shot analysis, but struggled with aesthetic analysis.

Here’s a breakdown:

  • Camera Position: The model scored 0.51, which falls just inside the “good” range (0.5 to 0.75). This suggests the model generally captured the camera position described in the prompt.
  • Shot Analysis: The model scored 0.6, also within the “good” range. This indicates the model understood the scene described in the prompt and created an image that reflects that understanding.
  • Aesthetic Analysis: The model scored 0.12, which falls outside the “very good” range (-0.2 to 0.1), just above its upper bound. This suggests that the aesthetic of the generated images deviated from the aesthetic described in the prompts.
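The range checks in this breakdown can be written out directly. The scores and bounds are the ones quoted in the list above. Only the “good” (0.5 to 0.75) and “very good” (-0.2 to 0.1) bands are stated in the post, so this sketch assumes nothing about any other band, and `in_range` is an illustrative helper:

```python
def in_range(score: float, low: float, high: float) -> bool:
    """Return True when score lies within the inclusive band [low, high]."""
    return low <= score <= high


# metric -> (reported score, band lower bound, band upper bound)
checks = {
    "Camera Position": (0.51, 0.5, 0.75),     # "good" band
    "Shot Analysis": (0.6, 0.5, 0.75),        # "good" band
    "Aesthetic Analysis": (0.12, -0.2, 0.1),  # "very good" band
}

outcome = {name: in_range(*values) for name, values in checks.items()}

for name, inside in outcome.items():
    print(f"{name}: {'inside' if inside else 'outside'} the quoted band")
```

Written this way, the camera position score sits only 0.01 above the lower bound of its band, and the aesthetic score sits 0.02 above the upper bound of its band.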

Overall, the model demonstrates a good understanding of camera position and shot composition, but needs improvement in capturing the desired aesthetic.
