AI Captures the Pose, But Misses the Mood with Flux-dev
AI image generation moves quickly, and each new model promises to change how visual content gets made. This post puts one of them, flux-dev, to the test with ten prompts, each describing a scene, a pose, and a shot type. The model handled camera positions and shot types well, but it struggled to capture the requested aesthetic, which remains a key challenge in AI image generation. The sections below present each generated image with its metrics, followed by a summary of the model’s strengths and weaknesses.
Created with: flux-dev
Silhouetted Solitude: A Moment of Contemplation on the Mountaintop
A lone figure stands silhouetted against the misty sky, their dark jacket and jeans blending with the clouds. The low angle shot emphasizes the vastness of the landscape, creating a sense of isolation and contemplation. This moody image evokes feelings of solitude and introspection.
Prompt
poses hands-in-pockets: determined, confident ; A lone adventurer, standing on a mountain peak; wide shot; heroism; dramatic sky with clouds; cinematic
Characteristic
Shot : A lone figure stands on a mountain overlooking a misty landscape.
Aesthetic Score : 0.6
Mood : solitude, contemplative, atmospheric
Quality
Entropy : 6.20
Noise : 55
Prompt Clip Score : 0.26
AI Evaluation
Likelihood of AI : 0.20
Image errors : There are some minor artifacts in the sky, but they are not very noticeable.
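The prompt for this image, like the others in this post, follows a fixed semicolon-separated layout: a pose with emotions, a subject, a shot type, a theme, a setting, and a style keyword. The sketch below splits a prompt into those parts. The field names (`theme`, `setting`, `style`) and the `parse_prompt` helper are illustrative assumptions, not names used by the author or by flux-dev.

```python
from dataclasses import dataclass


@dataclass
class ScenePrompt:
    """One prompt split into the fields the post's prompts appear to use.

    The field names are assumptions chosen for illustration.
    """
    pose: str
    emotions: list[str]
    subject: str
    shot: str
    theme: str
    setting: str
    style: str


def parse_prompt(text: str) -> ScenePrompt:
    # Fields are separated by semicolons; the first one holds "poses <pose>: <emotions>".
    parts = [p.strip() for p in text.split(";")]
    if len(parts) != 6:
        raise ValueError(f"expected 6 fields, got {len(parts)}")
    head, subject, shot, theme, setting, style = parts
    pose_part, _, emotion_part = head.partition(":")
    pose = pose_part.removeprefix("poses").strip()
    emotions = [e.strip() for e in emotion_part.split(",") if e.strip()]
    return ScenePrompt(pose, emotions, subject, shot, theme, setting, style)


prompt = parse_prompt(
    "poses hands-in-pockets: determined, confident ; A lone adventurer, "
    "standing on a mountain peak; wide shot; heroism; "
    "dramatic sky with clouds; cinematic"
)
print(prompt.shot, "|", prompt.pose, "|", prompt.emotions)
```

Splitting the prompt this way makes it easier to compare what was requested (here a wide shot and a hands-in-pockets pose) with what the Characteristic block reports for the generated image.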
A Boy’s Journey Begins: A Serene Forest Adventure
A young boy, backpack in tow, stands amidst the tranquil forest, his gaze fixed on the distant horizon. The scene evokes a sense of serene contemplation and adventurous spirit, leaving the viewer to wonder what mysteries lie ahead. The solitary figure and the unclear destination create a captivating sense of anticipation and mystery.
Prompt
poses hands-in-pockets: curious, excited ; A young explorer, gazing at a vast jungle; medium shot; adventure; lush green foliage and ancient ruins; cinematic
Characteristic
Shot : A young boy with a backpack standing in a forest, looking away from the camera.
Aesthetic Score : 0.5
Mood : melancholy, contemplative, hopeful
Quality
Entropy : 6.87
Noise : 81
Prompt Clip Score : 0.29
AI Evaluation
Likelihood of AI : 0.20
Image errors : No visible errors or artifacts in the image
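The Entropy values in the Quality blocks (6.20 and 6.87 for the first two images) would be consistent with a Shannon entropy over an 8-bit intensity histogram, which ranges from 0 to 8 bits. The post does not say how the figure was computed, so the following is only a sketch under that assumption. It works on a flat list of grayscale values, and `shannon_entropy` is a hypothetical helper name.

```python
import math
from collections import Counter


def shannon_entropy(pixels):
    """Shannon entropy, in bits, of a sequence of grayscale pixel values."""
    if not pixels:
        raise ValueError("pixels must not be empty")
    counts = Counter(pixels)
    total = len(pixels)
    # Sum p * log2(1/p) over every gray level that actually occurs.
    return sum((c / total) * math.log2(total / c) for c in counts.values())


flat = [128] * 256          # a single gray level: no variation at all
uniform = list(range(256))  # every 8-bit gray level exactly once

print(shannon_entropy(flat))     # 0.0
print(shannon_entropy(uniform))  # 8.0
```

Under this reading, a higher value means the image uses a wider and more even spread of tones; it says nothing by itself about whether the image matches the prompt.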
Immersed in the Game: A Gamer’s Focus Under Neon Lights
A young gamer, bathed in vibrant purple and blue lighting, sits intently in their gaming chair, headphones on, eyes locked on the screen. The low lighting and focused expression create a palpable sense of intensity and immersion in the digital world.
Prompt
poses hands-in-pockets: focused, intense ; A gamer, sitting at a desk with a controller in hand; close-up; gaming; neon lights and computer screens; cinematic
Characteristic
Shot : A young person sits in a gaming chair in a dimly lit room with colorful lighting. They are wearing headphones and holding a gaming controller, likely playing a video game.
Aesthetic Score : 0.6
Mood : focused, intense, gaming
Quality
Entropy : 6.31
Noise : 56
Prompt Clip Score : 0.27
AI Evaluation
Likelihood of AI : 0.10
Image errors : The image has slight noise and artifacts in the shadows and dark areas, especially in the background. The edges of the subject are slightly blurry, particularly around the person’s face and headphones.
City Lights, Open Skies: A Moment of Hope
A young woman finds joy in the simple act of walking, her gaze drawn upwards to the bright sky. The contrast between the bustling city and the open expanse above creates a sense of hope and carefree optimism.
Prompt
poses hands-in-pockets: amazed, happy ; A tourist, admiring a famous landmark; medium shot; tourism; bustling city streets and iconic architecture; cinematic
Characteristic
Shot : A woman is walking on a city street, looking up at the sky, bathed in warm sunlight.
Aesthetic Score : 0.7
Mood : happy, hopeful, carefree
Quality
Entropy : 6.74
Noise : 65
Prompt Clip Score : 0.20
AI Evaluation
Likelihood of AI : 0.30
Image errors : The image is slightly overexposed, resulting in some blown-out highlights in the sky.
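This image has one of the lowest Prompt Clip Scores in the set (0.20), which fits what the listing shows: the prompt asked for a tourist admiring a famous landmark, while the Shot description reports a woman looking up at the sky. A score like this is presumably the cosine similarity between a CLIP embedding of the prompt and a CLIP embedding of the image, but the post does not confirm that. The sketch below shows only the similarity step, using toy three-dimensional vectors in place of real embeddings.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("vectors must be non-zero")
    return dot / (norm_a * norm_b)


# Toy stand-ins for embeddings; real CLIP vectors have hundreds of dimensions.
text_embedding = [1.0, 0.0, 0.0]
image_embedding = [3.0, 4.0, 0.0]

score = cosine_similarity(text_embedding, image_embedding)
print(round(score, 2))  # 0.6
```

If the metric is a raw CLIP cosine similarity, the absolute numbers tend to be small even for well-matched pairs, so the scores are more useful for comparing images within this set than for judging any single image in isolation.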
Finding Tranquility Amidst the Wildflowers
A lone figure, backpack in tow, traverses a path through a field of vibrant yellow wildflowers. The majestic mountain range in the background and the vast blue sky create a sense of calm and solitude, inviting contemplation and adventure.
Prompt
poses hands-in-pockets: free, adventurous ; A backpacker, walking along a scenic road; medium shot; travel; rolling hills and vibrant wildflowers; cinematic
Characteristic
Shot : A person with a backpack is walking on a path in the mountains, with yellow flowers on either side.
Aesthetic Score : 0.6
Mood : tranquil, contemplative, serene
Quality
Entropy : 6.54
Noise : 66
Prompt Clip Score : 0.26
AI Evaluation
Likelihood of AI : 0.20
Image errors : No significant errors detected
Golden Hour Friendships on the Beach
This image captures the joy and carefree spirit of a sunset gathering with friends. The warm glow of the setting sun casts a beautiful light on the scene and evokes a sense of happiness and relaxation.
Prompt
poses hands-in-pockets: relaxed, joyful ; A group of friends, standing on a beach at sunset; wide shot; groups; golden sand and crashing waves; cinematic
Characteristic
Shot : A group of six friends standing on a beach at sunset, looking at the horizon.
Aesthetic Score : 0.6
Mood : happy, relaxed, friendly
Quality
Entropy : 6.50
Noise : 63
Prompt Clip Score : 0.28
AI Evaluation
Likelihood of AI : 0.20
Image errors : The image is slightly underexposed, resulting in a loss of detail in the shadows. The silhouettes are well-defined, but the overall image lacks sharpness.
Silhouetted Hero: Firefighter Braves the Blaze
A firefighter, clad in full gear, stands in stark silhouette against a backdrop of raging flames. The intense fire creates a dramatic and somber atmosphere, highlighting the danger and heroism of the scene. The firefighter’s stoic form evokes a sense of bravery in the face of adversity.
Prompt
poses hands-in-pockets: brave, determined ; A firefighter, standing in front of a burning building; medium shot; heroism; smoke and flames; cinematic
Characteristic
Shot : A firefighter in full gear stands in front of a fire, silhouetted against the flames.
Aesthetic Score : 0.6
Mood : dramatic, intense, heroic
Quality
Entropy : 6.69
Noise : 62
Prompt Clip Score : 0.25
AI Evaluation
Likelihood of AI : 0.10
Image errors : There is some slight noise and grain in the image, but nothing too distracting.
Shadows and Secrets: A Journey into the Unknown
Three figures, shrouded in mystery, navigate a dark, cave-like environment illuminated by an ethereal light source. The interplay of light and shadow creates a sense of suspense and adventure, leaving the viewer to wonder what lies ahead.
Prompt
poses hands-in-pockets: cautious, curious ; A group of explorers, navigating a dark cave; medium shot; adventure; stalactites and stalagmites; cinematic
Characteristic
Shot : Three figures silhouetted against a bright, ethereal glow, walking through a narrow canyon with rock walls on both sides. The figures are in the middle of the frame and the light source is behind them, creating a dramatic backlight effect.
Aesthetic Score : 0.6
Mood : mysterious, ominous, adventurous
Quality
Entropy : 6.05
Noise : 81
Prompt Clip Score : 0.27
AI Evaluation
Likelihood of AI : 0.10
Image errors : No significant image errors observed.
Lost in the Music: Dancing Under a Sea of Lights
A vibrant scene captures the joy and energy of a concert or rave. The person, lost in the music, dances with raised arms under a dazzling display of colorful lights, surrounded by a lively crowd. The image evokes a sense of carefree excitement and the pure thrill of being part of the moment.
Prompt
poses hands-in-pockets: excited, triumphant ; A gamer, celebrating a victory with friends; close-up; gaming; celebratory confetti and flashing lights; cinematic
Characteristic
Shot : A silhouette of a person with headphones raised in the air, surrounded by other people dancing at a party, under soft purple and pink lighting with confetti.
Aesthetic Score : 0.6
Mood : energetic, happy, festive
Quality
Entropy : 6.33
Noise : 51
Prompt Clip Score : 0.20
AI Evaluation
Likelihood of AI : 0.30
Image errors : There are some slight artifacts in the image, but they are not very noticeable.
A Family’s Journey Through Time
A mother, father, and their young daughter walk hand-in-hand towards a mysterious archway, bathed in the golden light of the setting sun. The scene evokes a sense of peace, hope, and the promise of adventure as they embark on a journey together.
Prompt
poses hands-in-pockets: happy, united ; A family, standing in front of a famous monument; wide shot; tourism; historical landmark and sunny sky; cinematic
Characteristic
Shot : A family of three, a couple and their young daughter, stand in front of a large, imposing archway. The scene appears to be taking place in a park or outdoor setting.
Aesthetic Score : 0.7
Mood : peaceful, happy, hopeful
Quality
Entropy : 6.54
Noise : 65
Prompt Clip Score : 0.30
AI Evaluation
Likelihood of AI : 0.10
Image errors : No obvious artifacts or errors
Conclusion
The results show that the model performed well on camera position and shot type but struggled to match the requested aesthetic.
Here’s a breakdown:
- Camera Position: The model scored 0.5, which is considered good. The camera position in the generated images closely matched the prompts’ instructions.
- Shot Analysis: The model scored 0.56, also considered good. The shot composition of the generated images was fairly well aligned with the prompts’ descriptions.
- Aesthetic Analysis: The model scored 0.16, which is poor. The aesthetic of the generated images deviated significantly from what the prompts called for.
Overall, the model can understand and implement camera positions and shot types, but it needs improvement in generating images that match the desired aesthetic.
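For a quick cross-check of the per-image numbers, the sketch below averages three metrics copied from the ten listings above. Note that the per-image Aesthetic Score (around 0.6) appears to be a different measure from the 0.16 Aesthetic Analysis score in the breakdown, which describes how well the aesthetic matched the prompt; that reading is an assumption, since the post does not define either metric.

```python
from statistics import mean

# (aesthetic score, prompt CLIP score, likelihood of AI), copied from the
# ten image listings in this post, in order of appearance.
results = [
    (0.6, 0.26, 0.20),  # mountaintop
    (0.5, 0.29, 0.20),  # forest
    (0.6, 0.27, 0.10),  # gamer under neon lights
    (0.7, 0.20, 0.30),  # city street
    (0.6, 0.26, 0.20),  # wildflowers
    (0.6, 0.28, 0.20),  # beach
    (0.6, 0.25, 0.10),  # firefighter
    (0.6, 0.27, 0.10),  # cave
    (0.6, 0.20, 0.30),  # concert
    (0.7, 0.30, 0.10),  # family at archway
]

mean_aesthetic = mean(r[0] for r in results)
mean_clip = mean(r[1] for r in results)
mean_ai_likelihood = mean(r[2] for r in results)

print(f"Aesthetic Score:   {mean_aesthetic:.2f}")
print(f"Prompt Clip Score: {mean_clip:.3f}")
print(f"Likelihood of AI:  {mean_ai_likelihood:.2f}")
```

The per-image values vary little across the ten images, so these averages mostly confirm that the set is consistent rather than revealing outliers.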
Sources:
- https://www.writerswrite.co.za/cheat-sheets-for-writing-body-language/
- https://mads3df.wordpress.com/2013/09/04/storytelling-poses/
- https://www.pinterest.com/pegasister890/character-poses/
- https://www.youtube.com/watch?v=udky6ANxWws
- https://maven.com/articles/storytelling-techniques
- https://fal.ai/models/fal-ai/flux/dev/api