AI's Artistic Struggle: Capturing the Essence of Poses with Imagen-v2
- 9 minutes read - 1890 words
In artificial intelligence, generating images from textual descriptions is a rapidly evolving field. This blog post describes an experiment in which an AI model was asked to create images from specific poses and accompanying scene descriptions. While the model demonstrated some success in understanding camera positions and shot composition, it struggled to capture the desired aesthetic, highlighting the ongoing challenges in AI’s artistic capabilities. The experiment shows how hard it is to translate human artistic vision into generated images, and points to the need for further advances in AI’s understanding of aesthetics and visual storytelling.
Created with: imagen-v2
A Hiker’s Solitude Amidst Dramatic Clouds
A lone hiker stands on a rocky mountain peak, dwarfed by the vast landscape and dramatic clouds. The scene evokes a sense of adventure, serenity, and isolation, capturing the beauty and grandeur of nature.
Prompt
poses ankle-cross: Determined, confident, facing the unknown ; A lone adventurer, standing atop a windswept mountain peak; wide shot; Adventure; Dramatic sky with swirling clouds; cinematic
Characteristic
Shot : A lone hiker stands on a rocky mountaintop, looking out at a cloudy, misty landscape. The sky is mostly cloudy, with patches of blue peeking through.
Aesthetic Score : 0.7
Mood : dramatic, contemplative, adventurous
Quality
Entropy : 6.81
Noise : 90
Prompt Clip Score : 0.24
AI Evaluation
Likelihood of AI : 0.10
Image errors : Some slight compression artifacts visible in the sky, especially around the clouds.
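The prompt above follows a layout that recurs in every prompt in this post: a pose label, then six semicolon-separated fields. As a minimal sketch, it could be split into named parts as shown here. The field names (`emotion`, `subject`, `shot`, `category`, `setting`, `style`) and the `parse_pose_prompt` helper are illustrative assumptions, not part of Imagen or of the original experiment.

```python
from dataclasses import dataclass


@dataclass
class PosePrompt:
    # Field names are labels chosen for this sketch, not official terms.
    pose: str
    emotion: str
    subject: str
    shot: str
    category: str
    setting: str
    style: str


def parse_pose_prompt(prompt: str) -> PosePrompt:
    """Split 'poses <pose>: a ; b; c; d; e; f' into named fields."""
    head, _, rest = prompt.partition(":")
    head = head.strip()
    # Drop the leading keyword "poses" to keep only the pose label.
    if head.startswith("poses"):
        head = head[len("poses"):].strip()
    fields = [part.strip() for part in rest.split(";")]
    if len(fields) != 6:
        raise ValueError(f"expected 6 fields after the pose, got {len(fields)}")
    return PosePrompt(head, *fields)


example = parse_pose_prompt(
    "poses ankle-cross: Determined, confident, facing the unknown ; "
    "A lone adventurer, standing atop a windswept mountain peak; wide shot; "
    "Adventure; Dramatic sky with swirling clouds; cinematic"
)
print(example.pose, "|", example.shot, "|", example.style)
```

Splitting the prompt this way makes it easier to check each requested element (for example the shot type) against the generated image's description.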
Superman: A Silhouette of Hope Against the Setting Sun
A powerful image captures Superman standing tall on a rooftop, his cape billowing in the wind as the sun sets over the city. The dramatic lighting and heroic pose evoke a sense of epic grandeur and hope.
Prompt
poses ankle-cross: Powerful, heroic, standing tall ; A superhero, silhouetted against a blazing sunset; medium shot; Heroism; City skyline with towering buildings; cinematic
Characteristic
Shot : Superman standing on a rooftop overlooking a city at sunset.
Aesthetic Score : 0.7
Mood : heroic, powerful, hopeful
Quality
Entropy : 6.53
Noise : 65
Prompt Clip Score : 0.28
AI Evaluation
Likelihood of AI : 0.80
Image errors : Some artifacts are visible in the sky and the cityscape, specifically in the building in the background.
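The post does not say how the Entropy value is computed. One common definition that is consistent with the reported values (between 6 and 7, where 8 would be the maximum for 8-bit images) is the Shannon entropy of the grayscale intensity histogram, in bits. The sketch below assumes that definition; `shannon_entropy` is a hypothetical helper that takes a flat list of pixel intensities.

```python
import math
from collections import Counter


def shannon_entropy(pixels):
    """Shannon entropy (in bits) of a flat sequence of pixel intensities."""
    total = len(pixels)
    if total == 0:
        raise ValueError("pixels must not be empty")
    counts = Counter(pixels)
    # H = -sum(p * log2(p)) over the observed intensity values.
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


# A flat image has no variation; a uniform spread over 256 levels is the maximum.
flat = [128] * 1024
uniform = list(range(256)) * 4
print(shannon_entropy(flat), shannon_entropy(uniform))
```

Under this assumed definition, higher values indicate a wider spread of tones in the image, not higher quality.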
Lost in the Neon Glow: A Glimpse into the Future of VR
A man, enveloped in a futuristic VR experience, is bathed in vibrant neon light. The image evokes a sense of mystery and anticipation, hinting at the immersive and transformative power of virtual reality.
Prompt
poses ankle-cross: Immersed, concentrated, in the zone ; A gamer, intensely focused on a virtual reality headset; close-up; Gaming; Futuristic, neon-lit gaming room; cinematic
Characteristic
Shot : A man wearing a VR headset and headphones is sitting in a chair with pink and blue neon lights illuminating his face.
Aesthetic Score : 0.6
Mood : futuristic, cyberpunk, mysterious
Quality
Entropy : 6.04
Noise : 115
Prompt Clip Score : 0.23
AI Evaluation
Likelihood of AI : 0.20
Image errors : There are some noise and artifacts present in the image, especially visible on the man’s skin and in the background.
Tranquility Amidst Ruins: A Moment of Peace by the Lake
A woman finds solace amidst the remnants of the past, gazing out at a serene lake and distant mountain. The scene evokes a sense of tranquility and contemplation, highlighting the beauty of nature and the power of solitude.
Prompt
poses ankle-cross: Awe-struck, contemplative, taking in the beauty ; A tourist, gazing out at a breathtaking vista; medium shot; Tourism; Ancient ruins with a panoramic view; cinematic
Characteristic
Shot : A woman is sitting on a rocky outcrop overlooking a lake and a mountain range in the distance. She is wearing a white tank top and her hair is pulled back in a ponytail.
Aesthetic Score : 0.6
Mood : calm, contemplative, peaceful
Quality
Entropy : 6.78
Noise : 99
Prompt Clip Score : 0.28
AI Evaluation
Likelihood of AI : 0.20
Image errors : There are some artifacts in the image, particularly in the sky, where there are some unnatural-looking streaks.
Lost in the Vastness: A Moment of Serenity in the Desert
A lone figure hangs their legs over the edge of a towering sand dune, gazing out at the endless expanse of the desert. The image evokes a sense of peace and adventure, highlighting the smallness of humanity against the grandeur of nature.
Prompt
poses ankle-cross: Free-spirited, adventurous, embracing the unknown ; A backpacker, standing at the edge of a vast desert; wide shot; Travel; Endless sand dunes stretching into the horizon; cinematic
Characteristic
Shot : A person’s legs are dangling over the edge of a sand dune. The desert stretches out in the distance, with the sky above it.
Aesthetic Score : 0.6
Mood : tranquil, serene, adventurous
Quality
Entropy : 6.71
Noise : 83
Prompt Clip Score : 0.28
AI Evaluation
Likelihood of AI : 0.30
Image errors : No significant errors detected
Neon Nights: Friends Dance Under the Glow
Four friends capture the energy of a vibrant night out, laughing and dancing under the dazzling glow of neon lights. The blurry background adds a sense of movement and excitement, creating a joyful and playful atmosphere.
Prompt
poses ankle-cross: Joyful, carefree, enjoying each other’s company ; A group of friends, laughing and celebrating; medium shot; Groups; Vibrant, bustling street scene with colorful lights; cinematic
Characteristic
Shot : Four young people are in the middle of a busy city street at night. They are laughing and enjoying each other’s company. The city lights are reflecting off the wet pavement, creating a vibrant and colorful scene.
Aesthetic Score : 0.7
Mood : happy, youthful, energetic
Quality
Entropy : 6.61
Noise : 91
Prompt Clip Score : 0.27
AI Evaluation
Likelihood of AI : 0.80
Image errors : There are some artifacts in the image, particularly in the background. The image also appears to be slightly overexposed, which makes the city lights look very bright.
A Knight’s Somber Vigil: A Dramatic Scene of Anticipation
A lone knight in full plate armor stands resolute on a stone bridge, his gaze fixed on a looming medieval castle. The overcast sky mirrors the somber mood, heightening the sense of drama and anticipation in this epic scene.
Prompt
poses ankle-cross: Stoic, vigilant, protecting the realm ; A lone warrior, standing guard at a castle gate; medium shot; Heroism; Majestic castle with a moat and drawbridge; cinematic
Characteristic
Shot : A lone knight stands on a stone bridge, in front of an imposing medieval castle with cloudy skies above.
Aesthetic Score : 0.6
Mood : dramatic, medieval, heroic
Quality
Entropy : 6.81
Noise : 106
Prompt Clip Score : 0.29
AI Evaluation
Likelihood of AI : 0.70
Image errors : The image appears slightly soft with some blur, particularly on the edges of the image. The knight’s armor exhibits some slight artifacts and the overall texture seems slightly off, possible signs of AI generation.
Whispers in the Smoke: A Night of Mystery by the Fire
Four figures huddle around a crackling campfire, their faces obscured by the swirling smoke. The forest is dark, the air thick with an eerie silence. This image evokes a sense of mystery and intrigue, leaving you wondering what secrets lie hidden in the shadows.
Prompt
poses ankle-cross: Intrigued, curious, sharing stories ; A group of explorers, huddled around a campfire; close-up; Adventure; Dense forest with flickering flames; cinematic
Characteristic
Shot : Four people are gathered around a campfire in a forest at night. The light from the fire illuminates their faces and the surrounding trees, creating a warm and inviting atmosphere.
Aesthetic Score : 0.7
Mood : mysterious, intimate, adventurous
Quality
Entropy : 6.49
Noise : 107
Prompt Clip Score : 0.30
AI Evaluation
Likelihood of AI : 0.20
Image errors : There are no noticeable errors in the image.
Neon Intensity: A Moment of Focus
A young man, bathed in vibrant neon light, sits in a darkened room, his hands held up in a decisive ‘stop’ gesture. The dramatic lighting and intense focus create a powerful image of determination and control.
Prompt
poses ankle-cross: Excited, victorious, celebrating success ; A gamer, triumphantly raising their hands after winning a game; close-up; Gaming; Brightly lit gaming console with flashing lights; cinematic
Characteristic
Shot : A young man with headphones, lit by red and blue neon light, is staring intensely at the camera with his hands crossed in front of his face. The background is blurred and indistinct.
Aesthetic Score : 0.6
Mood : intense, mysterious, edgy
Quality
Entropy : 6.34
Noise : 100
Prompt Clip Score : 0.24
AI Evaluation
Likelihood of AI : 0.70
Image errors : The image has some noise and grain, especially in the shadows. The subject’s skin tone appears slightly unnatural, and the hair looks slightly unrealistic.
City Lights, City Dreams: A Romantic Night on the Balcony
A couple embraces the night, their love story unfolding against a backdrop of twinkling city lights. The blurred cityscape creates a sense of intimacy and distance, capturing the dreamy essence of their moment.
Prompt
poses ankle-cross: Intimate, romantic, enjoying the view together ; A couple, standing on a balcony overlooking a bustling city; medium shot; Travel; Romantic cityscape with twinkling lights; cinematic
Characteristic
Shot : A couple is standing on a rooftop overlooking a city at night. They are looking at each other and the city lights are twinkling in the distance.
Aesthetic Score : 0.8
Mood : romantic, intimate, dreamy
Quality
Entropy : 6.87
Noise : 99
Prompt Clip Score : 0.31
AI Evaluation
Likelihood of AI : 0.20
Image errors : There is a slight blur in the background and the couple is not perfectly in focus.
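To compare the ten images side by side, the per-image values reported above can be averaged. This is a small sketch using only numbers listed in this post; the short labels and the 0.5 cut-off for counting an image as likely AI-generated are assumptions made here for illustration.

```python
from statistics import mean

# (label, Aesthetic Score, Prompt Clip Score, Likelihood of AI), copied from the post.
IMAGES = [
    ("hiker", 0.7, 0.24, 0.10),
    ("superman", 0.7, 0.28, 0.80),
    ("vr", 0.6, 0.23, 0.20),
    ("ruins", 0.6, 0.28, 0.20),
    ("desert", 0.6, 0.28, 0.30),
    ("friends", 0.7, 0.27, 0.80),
    ("knight", 0.6, 0.29, 0.70),
    ("campfire", 0.7, 0.30, 0.20),
    ("neon gamer", 0.6, 0.24, 0.70),
    ("balcony", 0.8, 0.31, 0.20),
]

AI_THRESHOLD = 0.5  # assumed cut-off, not taken from the post


def summarize(images, threshold=AI_THRESHOLD):
    """Average the reported scores and list images at or above the AI threshold."""
    return {
        "mean_aesthetic": round(mean(row[1] for row in images), 2),
        "mean_clip": round(mean(row[2] for row in images), 3),
        "mean_ai_likelihood": round(mean(row[3] for row in images), 2),
        "flagged": [row[0] for row in images if row[3] >= threshold],
    }


summary = summarize(IMAGES)
print(summary)
```

On these values the mean Aesthetic Score is 0.66, the mean Prompt Clip Score is about 0.27, and four of the ten images have a Likelihood of AI at or above 0.5.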
Conclusion
The results show that the generative AI model came close to, but did not reach, the “good” range for camera position and shot analysis, and that it struggled with aesthetic analysis. Here’s a breakdown:
Camera Position:
- Score: 0.45
- Interpretation: This score falls below the “good” range of 0.5 to 0.75. It suggests that the model didn’t perfectly capture the intended camera positions described in the prompt.
Shot Analysis:
- Score: 0.45
- Interpretation: Similar to camera position, this score also falls below the “good” range. It indicates that the model didn’t fully understand the scene described in the prompt and didn’t accurately translate it into the generated image.
Aesthetic Analysis:
- Score: 0.13
- Interpretation: This score lies above the “very good” range of -0.2 to 0.1, exceeding its upper bound by 0.03. It suggests that the generated images’ aesthetic deviated from the expected aesthetic described in the prompt.
Overall:
While the model showed some success in understanding camera positions and shot composition, it struggled to capture the desired aesthetic. This suggests that the model might need further training to better understand and translate aesthetic descriptions into visually appealing images.
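The interpretations above each compare a score with a reference range. As a rough sketch of that comparison, the hypothetical helper below reports whether a score lies below, within, or above a range, and by how much. The ranges are the ones quoted in this post; how the scores themselves were computed is not stated, so nothing here reproduces that step.

```python
def compare_to_range(score, low, high):
    """Return (position, distance) of a score relative to the range [low, high]."""
    if score < low:
        return ("below", round(low - score, 2))
    if score > high:
        return ("above", round(score - high, 2))
    return ("within", 0.0)


# Scores and ranges as quoted in the conclusion.
RESULTS = {
    "Camera Position": (0.45, 0.5, 0.75),
    "Shot Analysis": (0.45, 0.5, 0.75),
    "Aesthetic Analysis": (0.13, -0.2, 0.1),
}

for name, (score, low, high) in RESULTS.items():
    position, distance = compare_to_range(score, low, high)
    print(f"{name}: {score} is {position} [{low}, {high}] by {distance}")
```

Reporting the distance alongside the position makes it clear how close each score is to its reference range, which a bare "below" or "above" would hide.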
Sources:
- https://www.writerswrite.co.za/cheat-sheets-for-writing-body-language/
- https://mads3df.wordpress.com/2013/09/04/storytelling-poses/
- https://www.pinterest.com/pegasister890/character-poses/
- https://www.youtube.com/watch?v=udky6ANxWws
- https://maven.com/articles/storytelling-techniques
- https://deepmind.google/technologies/imagen-2/