AI's Artistic Journey: Capturing Poses, But Missing the Mood with Imagen-v2
In the realm of AI image generation, capturing the essence of a scene goes beyond simply replicating the elements. It involves understanding the nuances of composition, camera angles, and most importantly, the intended mood or aesthetic. This blog post delves into an experiment where an AI model was tasked with generating images based on specific prompts, revealing both its strengths and limitations in capturing the desired artistic style.
Created with: imagen-v2
A Lone Warrior at Sunset’s Embrace
A solitary warrior, cloaked in crimson, stands defiant on a rocky outcrop in the heart of a desolate desert. The setting sun casts a warm glow, illuminating the scene with an epic grandeur. The warrior’s pose, the dramatic landscape, and the play of light and shadow evoke a sense of mystery and intrigue, promising a tale of valor and adventure.
Prompt
poses staggered-pose: Epic, determined ; A lone warrior; wide shot; Heroism; A desolate battlefield with a setting sun; cinematic
Characteristic
Shot : A lone warrior, clad in armor and a flowing red cloak, stands on a desert cliff with a sword at his side, looking towards the setting sun. The lighting is dramatic and creates a sense of epicness.
Aesthetic Score : 0.7
Mood : epic, dramatic, powerful
Quality
Entropy : 6.80
Noise : 59
Prompt Clip Score : 0.24
AI Evaluation
Likelihood of AI : 0.80
Image errors : Some minor artifacts in the background and on the warrior’s armor, likely caused by over-sharpening
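Every prompt in this experiment follows the same semicolon-separated pattern, so it can be split into its parts mechanically. The sketch below is only an illustration: the field names (pose_moods, subject, shot, theme, setting, style) and the helper parse_prompt are assumptions chosen for readability, not labels used by the original pipeline.

```python
from dataclasses import dataclass
from typing import List

# Prefix shared by every prompt in this post.
PREFIX = "poses staggered-pose:"


@dataclass
class PosePrompt:
    # Field names are hypothetical; the post does not name the prompt slots.
    pose_moods: List[str]
    subject: str
    shot: str
    theme: str
    setting: str
    style: str


def parse_prompt(prompt: str) -> PosePrompt:
    """Split a 'poses staggered-pose: ...' prompt into its six parts."""
    if not prompt.startswith(PREFIX):
        raise ValueError("unexpected prompt prefix")
    body = prompt[len(PREFIX):]
    parts = [part.strip() for part in body.split(";")]
    if len(parts) != 6:
        raise ValueError(f"expected 6 fields, got {len(parts)}")
    moods, subject, shot, theme, setting, style = parts
    # The first slot holds comma-separated mood words, e.g. "Epic, determined".
    mood_list = [m.strip() for m in moods.split(",")]
    return PosePrompt(mood_list, subject, shot, theme, setting, style)


warrior = parse_prompt(
    "poses staggered-pose: Epic, determined ; A lone warrior; wide shot; "
    "Heroism; A desolate battlefield with a setting sun; cinematic"
)
print(warrior.shot, "|", warrior.setting)
```

Splitting the prompt this way makes it easier to compare what was requested (for example, a wide shot of a battlefield) with what the Shot description reports.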
Lost in the Jungle: Explorers Uncover Ancient Secrets
A group of intrepid explorers navigate a dense, mist-shrouded jungle, their expressions hinting at both excitement and trepidation. The presence of a mysterious stone structure in the distance suggests the possibility of ancient ruins waiting to be discovered. Will they uncover the secrets hidden within?
Prompt
poses staggered-pose: Curious, adventurous ; A group of explorers; medium shot; Adventure; A dense jungle with ancient ruins in the background; cinematic
Characteristic
Shot : A group of adventurers in jungle attire exploring a misty jungle setting, with a stone structure in the background
Aesthetic Score : 0.6
Mood : mysterious, adventurous, suspenseful
Quality
Entropy : 6.84
Noise : 117
Prompt Clip Score : 0.24
AI Evaluation
Likelihood of AI : 0.70
Image errors : Some minor artifacts are present in the foliage and the stone structure, suggesting possible digital manipulation.
Lost in the Neon Glow: A Portrait of Mystery
A young man, shrouded in blue and red neon light, stares intensely into the camera. His headphones amplify the silence, creating an atmosphere of intrigue and unspoken tension. This close-up shot captures a moment of raw emotion, leaving the viewer to decipher the story behind the enigmatic gaze.
Prompt
poses staggered-pose: Focused, intense ; A gamer; close-up; Gaming; A brightly lit gaming setup with a monitor displaying a thrilling game; cinematic
Characteristic
Shot : A close-up portrait of a young man wearing headphones, illuminated by vibrant blue and red lights, creating a striking contrast.
Aesthetic Score : 0.7
Mood : intense, dramatic, mysterious
Quality
Entropy : 6.32
Noise : 94
Prompt Clip Score : 0.23
AI Evaluation
Likelihood of AI : 0.80
Image errors : The image appears to have some minor digital artifacts, particularly around the subject’s hair and the headphones. There’s a slight over-sharpening effect, making the image look slightly unnatural.
Family Adventure: Awe-Inspiring Mountaintop Views
A heartwarming scene of a family basking in the beauty of a snow-capped mountain range. Their joy and wonder at the vast landscape are palpable, making this a truly inspiring image.
Prompt
poses staggered-pose: Joyful, relaxed ; A family; medium shot; Tourism; A breathtaking view of a mountain range with a clear blue sky; cinematic
Characteristic
Shot : A family of three (a man, a woman, and a young girl) sits on a rocky cliff overlooking a valley, with snow-capped mountains in the distance.
Aesthetic Score : 0.6
Mood : happy, adventurous, scenic
Quality
Entropy : 6.78
Noise : 94
Prompt Clip Score : 0.28
AI Evaluation
Likelihood of AI : 0.10
Image errors : Slight overexposure in the sky and some compression artifacts visible in the distance.
A Lone Hiker Finds Tranquility on a Moody Mountaintop
A solitary figure stands on a mountain peak, gazing out at a winding road that disappears into the distance. The overcast sky adds a touch of melancholy, while the vast landscape evokes a sense of adventure and contemplation. This image captures the beauty of isolation and the quiet power of nature.
Prompt
poses staggered-pose: Free-spirited, adventurous ; A backpacker; long shot; Travel; A winding road leading to a distant village nestled in a valley; cinematic
Characteristic
Shot : A lone hiker stands on a mountain overlooking a winding road snaking through a valley. The sky is overcast and the overall mood is one of solitude and contemplation.
Aesthetic Score : 0.7
Mood : solitude, contemplation, wanderlust
Quality
Entropy : 6.47
Noise : 87
Prompt Clip Score : 0.20
AI Evaluation
Likelihood of AI : 0.20
Image errors : Slight blurriness throughout the image, especially in the sky and background.
Friends Celebrate in a Burst of Joy and Energy
Capture the vibrant spirit of a party with this image, showcasing a group of friends dressed to impress and radiating pure joy. The slightly elevated perspective and dynamic poses create a sense of excitement and energy, making this a perfect snapshot of a memorable celebration.
Prompt
poses staggered-pose: Energetic, celebratory ; A group of friends; medium shot; Groups; A lively party scene with people dancing and laughing; cinematic
Characteristic
Shot : A group of friends in party attire celebrating at a party, with a shimmering curtain in the background.
Aesthetic Score : 0.6
Mood : happy, celebratory, energetic
Quality
Entropy : 6.51
Noise : 108
Prompt Clip Score : 0.22
AI Evaluation
Likelihood of AI : 0.10
Image errors : The image has some noise and grain, particularly in the shadows.
Superhero Stands Tall in a City of Tomorrow
A powerful and mysterious superhero, possibly Superman, dominates the skyline of a futuristic cityscape. The dramatic lighting and the subject’s pose create a sense of anticipation and tension, hinting at an epic battle to come.
Prompt
poses staggered-pose: Powerful, confident ; A superhero; close-up; Heroism; A cityscape with towering skyscrapers and a dramatic sky; cinematic
Characteristic
Shot : A costumed superhero stands in a destroyed cityscape. The hero’s determined gaze suggests a battle has just taken place. The cityscape is rendered in a gritty, stylized aesthetic.
Aesthetic Score : 0.7
Mood : dramatic, intense, powerful
Quality
Entropy : 6.47
Noise : 61
Prompt Clip Score : 0.22
AI Evaluation
Likelihood of AI : 0.80
Image errors : There are some minor artifacts and inconsistencies in the image, particularly in the textures and details of the costume. The texture on the subject’s right arm and the cape’s texture appear somewhat unnatural.
Warriors of the Wasteland: A Silhouette of Hope in a Desolate World
Four figures, clad in armor, stand defiant against the backdrop of a barren desert. The stark lighting casts long shadows, emphasizing the strength and determination of the central female figure. This epic scene evokes a sense of mystery and hope in a post-apocalyptic or fantasy world.
Prompt
poses staggered-pose: Hopeful, determined ; A group of adventurers; wide shot; Adventure; A vast desert landscape with a lone oasis in the distance; cinematic
Characteristic
Shot : A group of four figures, likely warriors or survivors, stand in a desolate desert landscape. There are rocky formations in the background and a hazy sky. The scene evokes a post-apocalyptic or fantasy world.
Aesthetic Score : 0.7
Mood : epic, rugged, desolate
Quality
Entropy : 6.70
Noise : 69
Prompt Clip Score : 0.23
AI Evaluation
Likelihood of AI : 0.30
Image errors : The image is generally well-composed, but there are some minor artifacts in the shadows, particularly around the figures.
Lost in the Glow: A Moment of Intense Focus
A young man, bathed in vibrant colored lights, stares intently at something unseen. His headphones isolate him, creating an atmosphere of intense focus and edgy intrigue. The dramatic lighting draws the viewer’s eye to his expression, leaving them wondering what captivating scene lies beyond the frame.
Prompt
poses staggered-pose: Focused, strategic ; A gamer; close-up; Gaming; A dimly lit room with a computer screen displaying a complex strategy game; cinematic
Characteristic
Shot : A man wearing headphones stares intensely at the camera, bathed in a blue and orange light.
Aesthetic Score : 0.7
Mood : intense, focused, dramatic
Quality
Entropy : 6.32
Noise : 71
Prompt Clip Score : 0.20
AI Evaluation
Likelihood of AI : 0.90
Image errors : Some artifacts are visible in the subject’s hair and the background, and there is a slight blur in the image.
Sunset Romance on the Beach
A couple embraces on a sun-drenched beach, the ocean behind them shrouded in a dreamy haze. The scene evokes a sense of romance, tranquility, and mystery, capturing the essence of a perfect sunset moment.
Prompt
poses staggered-pose: Romantic, peaceful ; A couple; medium shot; Travel; A romantic sunset over a beach with the ocean waves crashing in the background; cinematic
Characteristic
Shot : A couple embracing on a beach at sunset, with the ocean behind them.
Aesthetic Score : 0.7
Mood : romantic, serene, nostalgic
Quality
Entropy : 6.78
Noise : 82
Prompt Clip Score : 0.30
AI Evaluation
Likelihood of AI : 0.50
Image errors : The image suffers from some blurriness, particularly in the background. The colors are slightly muted, which could be due to post-processing or low-light conditions.
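The per-image figures reported for the ten images can be summarised with a few lines of Python. This is a minimal sketch: the numbers are copied from the entries in this post, while the short labels, the RESULTS list, and the summarise helper are hypothetical names introduced only for this example.

```python
from statistics import mean

# (label, aesthetic score, prompt CLIP score, likelihood of AI)
# Values are copied from the ten entries; the labels are shorthand.
RESULTS = [
    ("Lone warrior", 0.7, 0.24, 0.80),
    ("Jungle explorers", 0.6, 0.24, 0.70),
    ("Neon portrait", 0.7, 0.23, 0.80),
    ("Family on mountaintop", 0.6, 0.28, 0.10),
    ("Lone hiker", 0.7, 0.20, 0.20),
    ("Party friends", 0.6, 0.22, 0.10),
    ("Superhero", 0.7, 0.22, 0.80),
    ("Wasteland warriors", 0.7, 0.23, 0.30),
    ("Focused gamer", 0.7, 0.20, 0.90),
    ("Beach couple", 0.7, 0.30, 0.50),
]


def summarise(results):
    """Return the mean of each metric, rounded to three decimals."""
    return {
        "aesthetic": round(mean(r[1] for r in results), 3),
        "prompt_clip": round(mean(r[2] for r in results), 3),
        "likelihood_of_ai": round(mean(r[3] for r in results), 3),
    }


summary = summarise(RESULTS)
best_match = max(RESULTS, key=lambda r: r[2])  # highest prompt CLIP score
print(summary)
print("Closest prompt match:", best_match[0])
```

For these ten entries the means come out to 0.67 for the aesthetic score, 0.236 for the Prompt Clip Score, and 0.52 for the likelihood of AI, with the beach couple image matching its prompt most closely.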
Conclusion
The results show that the generative AI model performed well in terms of camera position and shot analysis, but struggled with aesthetic analysis.
Here’s a breakdown:
- Camera Position: The model scored 0.53, which falls within the “good” range (0.5 to 0.75). This indicates that the generated images generally matched the camera positions described in the prompts.
- Shot Analysis: The model scored 0.5, which sits at the lower edge of the “good” range (0.5 to 0.75). This suggests that the model understood the scenes described in the prompts and produced images that reflected that understanding.
- Aesthetic Analysis: The model scored 0.08, which is far below the “good” range (0.5 to 0.75). This indicates that the generated images did not match the expected aesthetic as closely as they matched the camera position and shot descriptions.
Overall, the model demonstrates a good understanding of camera positions and scene descriptions, but needs improvement in generating images that meet the desired aesthetic.
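The comparison against the “good” range can be written as a small check. This sketch assumes the range bounds are inclusive, which the post does not state, and it only encodes the one band (0.5 to 0.75) whose label and limits are unambiguous. The names GOOD_RANGE, SCORES, and in_good_range are hypothetical.

```python
# The "good" band as stated in the conclusion.
GOOD_RANGE = (0.5, 0.75)

# Scores reported in the conclusion.
SCORES = {
    "camera_position": 0.53,
    "shot_analysis": 0.5,
    "aesthetic_analysis": 0.08,
}


def in_good_range(score, bounds=GOOD_RANGE):
    """Return True if the score lies inside the band (bounds assumed inclusive)."""
    low, high = bounds
    return low <= score <= high


for name, score in SCORES.items():
    verdict = "good" if in_good_range(score) else "outside the good range"
    print(f"{name}: {score} -> {verdict}")
```

With inclusive bounds, the shot analysis score of 0.5 only just qualifies as good, while the aesthetic score of 0.08 falls well outside the band.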
Sources:
- https://www.writerswrite.co.za/cheat-sheets-for-writing-body-language/
- https://mads3df.wordpress.com/2013/09/04/storytelling-poses/
- https://www.pinterest.com/pegasister890/character-poses/
- https://www.youtube.com/watch?v=udky6ANxWws
- https://maven.com/articles/storytelling-techniques
- https://deepmind.google/technologies/imagen-2/