AI Captures the Pose, But Misses the Mood with Flux-dev
In AI image generation, capturing the essence of a scene takes more than placing objects and characters in the right positions. Dramatic poses, for example, are often used to convey emotion, action, or a specific mood. This post tests flux-dev on ten prompts, each beginning with “poses fighting” and specifying a subject, shot type, theme, and setting, and looks at how well the generated images capture the intended pose, shot, and aesthetic.
Created with: flux-dev
Clash of Titans: Silhouettes Battle at Sunset
Two figures locked in a fierce sword fight, their silhouettes stark against the fiery sunset. The dramatic backlighting and epic composition evoke a sense of heroism and grandeur. A third figure, partially obscured in the distance, adds a layer of mystery to this captivating scene.
Prompt
poses fighting: epic, determined ; A lone warrior; wide shot; heroism; a desolate battlefield with the setting sun in the background; cinematic
Characteristic
Shot : Two silhouetted figures, one with a sword, the other with a shield, stand facing each other in a field with a sunset behind them. Another figure is visible in the background.
Aesthetic Score : 0.7
Mood : epic, dramatic, nostalgic
Quality
Entropy : 6.43
Noise : 52
Prompt Clip Score : 0.26
AI Evaluation
Likelihood of AI : 0.40
Image errors : The image appears to be slightly overexposed, which may be an intentional effect.
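All the prompts in this post share one semicolon-separated layout: a pose label with mood adjectives, then a subject, a shot type, a theme, a setting, and a style keyword. The sketch below splits a prompt into those parts so they can be compared with the analysis fields. The field names (`pose_type`, `pose_moods`, `subject`, `shot`, `theme`, `setting`, `style`) and the `parse_prompt` helper are illustrative assumptions, not something the post or flux-dev defines.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ParsedPrompt:
    # Field names are hypothetical labels chosen for this sketch.
    pose_type: str
    pose_moods: List[str]
    subject: str
    shot: str
    theme: str
    setting: str
    style: str


def parse_prompt(prompt: str) -> ParsedPrompt:
    """Split a 'pose: moods ; subject; shot; theme; setting; style' prompt."""
    parts = [p.strip() for p in prompt.split(";")]
    if len(parts) != 6:
        raise ValueError(f"expected 6 ';'-separated parts, got {len(parts)}")
    head, subject, shot, theme, setting, style = parts
    # The first part looks like "poses fighting: epic, determined".
    pose_type, _, moods = head.partition(":")
    return ParsedPrompt(
        pose_type=pose_type.strip(),
        pose_moods=[m.strip() for m in moods.split(",") if m.strip()],
        subject=subject,
        shot=shot,
        theme=theme,
        setting=setting,
        style=style,
    )


if __name__ == "__main__":
    example = (
        "poses fighting: epic, determined ; A lone warrior; wide shot; heroism; "
        "a desolate battlefield with the setting sun in the background; cinematic"
    )
    print(parse_prompt(example))
```

Parsed this way, the requested moods ("epic", "determined") and shot type ("wide shot") can be set next to the Mood and Shot lines reported for each image.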
Warriors on the Brink of Mystery
A group of warriors, silhouetted against a misty jungle, stand poised before a looming, ancient structure. The scene evokes a sense of mystery and adventure, hinting at a dramatic confrontation or a perilous quest.
Prompt
poses fighting: intense, adventurous ; A group of adventurers; medium shot; adventure; a dense jungle with ancient ruins in the distance; cinematic
Characteristic
Shot : A group of warriors with swords are standing in a forest with a misty background. The scene is set in a fantasy world.
Aesthetic Score : 0.6
Mood : mysterious, epic, dramatic
Quality
Entropy : 6.65
Noise : 104
Prompt Clip Score : 0.28
AI Evaluation
Likelihood of AI : 0.70
Image errors : There are some minor artifacts in the image, particularly in the areas of high contrast. The edges of the warriors and the swords are somewhat jagged.
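The post does not say how the Entropy value for each image is calculated. One common definition, which the reported values (roughly 5.9 to 6.8) would be consistent with, is the Shannon entropy in bits of the 8-bit grayscale intensity distribution, where the maximum possible value is 8. The sketch below assumes that definition and works on a flat list of pixel intensities; `shannon_entropy` is a hypothetical helper, not the author's tool.

```python
import math
from collections import Counter
from typing import Iterable


def shannon_entropy(pixels: Iterable[int]) -> float:
    """Shannon entropy (in bits) of a sequence of pixel intensities."""
    counts = Counter(pixels)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("need at least one pixel")
    entropy = 0.0
    for count in counts.values():
        p = count / total          # probability of this intensity level
        entropy -= p * math.log2(p)
    return entropy


if __name__ == "__main__":
    flat = [128] * 1000            # a single gray level: no variation
    uniform = list(range(256))     # every 8-bit level equally likely
    print(shannon_entropy(flat))     # 0.0
    print(shannon_entropy(uniform))  # 8.0
```

Under this assumption, a lower value such as the 5.92 reported for the desert scene would correspond to an image dominated by a few tones (sand and sky), while busier scenes spread across more intensity levels.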
Neon City Enigma: A Woman’s Determined Walk Through a Cyberpunk World
A young woman with a ponytail strides through a vibrant cyberpunk city, bathed in neon light. Her focused expression and mysterious pose hint at a hidden purpose. A blurry figure trails behind, adding to the intrigue of this futuristic urban scene.
Prompt
poses fighting: dynamic, futuristic ; A player character; close-up; gaming; a neon-lit cityscape with holographic projections; cinematic
Characteristic
Shot : A woman in a black jacket and jeans stands in a futuristic city setting, illuminated by neon lights. Another person, blurry and out of focus, stands behind her in a similar pose.
Aesthetic Score : 0.6
Mood : urban, futuristic, mysterious
Quality
Entropy : 6.82
Noise : 71
Prompt Clip Score : 0.30
AI Evaluation
Likelihood of AI : 0.90
Image errors : The image contains some slight artifacts and compression, particularly noticeable in the blurred background and the woman’s hair.
Friendship in the City Lights
Two young men, clad in casual wear and backpacks, stroll through a bustling Asian city, their laughter echoing through the crowded streets. The contrasting light and shadow, along with the slightly blurred background, add a touch of drama to this heartwarming scene of friendship.
Prompt
poses fighting: chaotic, humorous ; Two tourists; medium shot; tourism; a bustling marketplace with colorful stalls and vibrant crowds; cinematic
Characteristic
Shot : Two men are walking down a crowded street in a city, greeting each other with a handshake.
Aesthetic Score : 0.7
Mood : friendly, urban, casual
Quality
Entropy : 6.67
Noise : 85
Prompt Clip Score : 0.24
AI Evaluation
Likelihood of AI : 0.20
Image errors : The image is slightly overexposed, and there is some noise in the background.
A Solitary Figure Against the Vastness of the Desert
A lone traveler walks across a sand dune, their silhouette stark against the clear blue sky. The scene evokes a sense of serenity, adventure, and hope, with the vastness of the desert emphasizing the individual’s journey.
Prompt
poses fighting: isolated, desperate ; A lone traveler; long shot; travel; a vast desert landscape with a lone sand dune in the foreground; cinematic
Characteristic
Shot : A lone figure in a brown coat walks across a vast expanse of sand dunes in the desert. The figure is walking away from the viewer, looking over their shoulder, with their arm outstretched.
Aesthetic Score : 0.6
Mood : adventurous, solitary, vast
Quality
Entropy : 5.92
Noise : 36
Prompt Clip Score : 0.28
AI Evaluation
Likelihood of AI : 0.30
Image errors : The image has a slight amount of noise and grain, especially in the shadows. There are also a few artifacts in the sky, which may be the result of editing.
Silhouettes of Conflict: A City at Dusk
Two figures stand in silhouette against the backdrop of a city at dusk, their tense embrace hinting at a dramatic confrontation. The interplay of light and shadow creates a sense of mystery and intrigue, leaving the viewer to ponder the story unfolding before them.
Prompt
poses fighting: energetic, playful ; A group of friends; medium shot; groups; a rooftop overlooking a city skyline at night; cinematic
Characteristic
Shot : Two men in silhouette are standing on a rooftop overlooking a city skyline at night. The men appear to be in a tense or confrontational pose, with their arms raised as if they are about to engage in a physical altercation.
Aesthetic Score : 0.4
Mood : tense, dramatic, urban
Quality
Entropy : 6.55
Noise : 56
Prompt Clip Score : 0.27
AI Evaluation
Likelihood of AI : 0.20
Image errors : There are no major errors in the image. However, the quality of the image is somewhat low, which may be due to the low light conditions in which it was captured.
Silhouetted Warrior in a Fiery Landscape
A lone warrior, silhouetted against a backdrop of raging fire and smoke, stands ready with sword in hand. The epic scene evokes a sense of power, drama, and fierce determination.
Prompt
poses fighting: tragic, determined ; A lone warrior; close-up; heroism; a burning village with smoke billowing in the air; cinematic
Characteristic
Shot : A lone warrior stands silhouetted against a fiery backdrop, a sword in hand. The background suggests a battle or a burning city.
Aesthetic Score : 0.7
Mood : epic, dramatic, powerful
Quality
Entropy : 6.46
Noise : 65
Prompt Clip Score : 0.26
AI Evaluation
Likelihood of AI : 0.20
Image errors : There is some noise and graininess in the image. The edges of the silhouetted figures are slightly blurry.
Shadows in the Cave: A Journey into the Unknown
Four figures, cloaked in shadow, navigate a mysterious cave, their silhouettes illuminated by an unseen light source. A sword held aloft hints at adventure and danger ahead. This epic scene evokes a sense of mystery and intrigue, promising a thrilling journey into the unknown.
Prompt
poses fighting: suspenseful, adventurous ; A group of explorers; wide shot; adventure; a dark cave with flickering torches and mysterious shadows; cinematic
Characteristic
Shot : A group of four figures, three standing and one holding a sword, are silhouetted against a bright opening in a cave. The light catches the sword and illuminates the figures in a dramatic way.
Aesthetic Score : 0.6
Mood : mysterious, adventurous, dramatic
Quality
Entropy : 6.16
Noise : 82
Prompt Clip Score : 0.28
AI Evaluation
Likelihood of AI : 0.70
Image errors : The image has some slight blurriness, particularly around the edges of the figures.
VR Worlds Collide: A Futuristic Dance of Light and Shadow
Two figures, silhouetted against a vibrant blue and pink backdrop, engage in a playful interaction within the realm of virtual reality. The contrasting light creates a dramatic effect, highlighting the futuristic and exciting nature of their experience.
Prompt
poses fighting: immersive, intense ; A gamer; close-up; gaming; a virtual reality headset with a pixelated world projected in the background; cinematic
Characteristic
Shot : Two people wearing VR headsets are interacting with each other in a dimly lit room. The background is a blurry wall with a screen displaying a blurry scene.
Aesthetic Score : 0.7
Mood : futuristic, playful, mysterious
Quality
Entropy : 6.29
Noise : 54
Prompt Clip Score : 0.28
AI Evaluation
Likelihood of AI : 0.10
Image errors : The image has some noise and grain, particularly in the shadows.
A Handshake of Secrets: Mystery in the Train Station
Two men meet in a bustling train station, their handshake shrouded in an air of professionalism and intrigue. The lighting and composition create a sense of mystery, leaving the viewer wondering what secrets lie beneath the surface.
Prompt
poses fighting: fast-paced, chaotic ; Two travelers; medium shot; travel; a crowded train station with people rushing in all directions; cinematic
Characteristic
Shot : Two men in suits are shaking hands in a dimly lit hallway or subway station.
Aesthetic Score : 0.6
Mood : serious, professional, formal
Quality
Entropy : 6.37
Noise : 57
Prompt Clip Score : 0.25
AI Evaluation
Likelihood of AI : 0.20
Image errors : There are some minor artifacts around the edges, especially on the right side.
Conclusion
The results suggest that flux-dev handled camera position and shot composition reasonably well but struggled to match the aesthetic the prompts asked for.
Here’s a breakdown:
- Camera Position: The model scored 0.5, which is considered good. The camera position in the generated images was broadly in line with the prompts’ instructions.
- Shot Analysis: The model scored 0.64, also considered good. The shot composition of the generated images was fairly well aligned with the prompts’ descriptions.
- Aesthetic Analysis: The model scored 0.12, which is poor. The aesthetic of the generated images deviated significantly from what the prompts called for.
Overall, the model appears able to understand and apply camera positions and shot types, but it needs improvement in capturing the intended mood. For example, the prompt asking for an “isolated, desperate” lone traveler produced an image whose mood was analyzed as “adventurous, solitary, vast”, and the “fast-paced, chaotic” train station prompt produced a handshake analyzed as “serious, professional, formal”.
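The per-image metrics listed for the ten images can also be averaged to get a rough overall picture. The sketch below copies the Aesthetic Score, Prompt Clip Score, and Likelihood of AI values from this post and computes their means. These are not the Camera Position, Shot Analysis, or Aesthetic Analysis scores from the breakdown, whose calculation the post does not describe; the `RESULTS` layout and `summarize` helper are illustrative assumptions.

```python
from statistics import mean

# (image title, aesthetic score, prompt CLIP score, likelihood of AI),
# copied from the per-image listings in this post.
RESULTS = [
    ("Clash of Titans", 0.7, 0.26, 0.40),
    ("Warriors on the Brink of Mystery", 0.6, 0.28, 0.70),
    ("Neon City Enigma", 0.6, 0.30, 0.90),
    ("Friendship in the City Lights", 0.7, 0.24, 0.20),
    ("A Solitary Figure", 0.6, 0.28, 0.30),
    ("Silhouettes of Conflict", 0.4, 0.27, 0.20),
    ("Silhouetted Warrior", 0.7, 0.26, 0.20),
    ("Shadows in the Cave", 0.6, 0.28, 0.70),
    ("VR Worlds Collide", 0.7, 0.28, 0.10),
    ("A Handshake of Secrets", 0.6, 0.25, 0.20),
]


def summarize(results):
    """Return the mean of each metric plus the lowest-scoring image by aesthetic."""
    lowest = min(results, key=lambda r: r[1])
    return {
        "mean_aesthetic": round(mean(r[1] for r in results), 2),
        "mean_clip": round(mean(r[2] for r in results), 2),
        "mean_ai_likelihood": round(mean(r[3] for r in results), 2),
        "lowest_aesthetic_image": lowest[0],
    }


if __name__ == "__main__":
    for key, value in summarize(RESULTS).items():
        print(f"{key}: {value}")
```

On these numbers the mean Aesthetic Score is 0.62, the mean Prompt Clip Score is 0.27, and the mean Likelihood of AI is 0.39, with “Silhouettes of Conflict” receiving the lowest Aesthetic Score (0.4).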