AI's Artistic Journey: Capturing Poses, Missing the Essence with Imagen-v3
AI image generation is evolving rapidly, and current models can produce striking visuals from text prompts. Balancing technical accuracy with artistic expression is still difficult. This post examines an experiment in which imagen-v3 generated ten images from text prompts, most of which spell out a pose, a subject, a shot type, a theme, a background, and a style. We look at where the model is strong, chiefly in reading camera angles and shot composition, and where it falls short in capturing the intended mood and artistic style.
Created with: imagen-v3
Finding Serenity Amidst the Peaks
A lone hiker pauses on a rocky mountain path, dwarfed by snow-capped peaks. The scene evokes a sense of serene adventure and contemplation, capturing the vastness and solitude of the wilderness.
Prompt
poses leaning-in: determined, focused ; A lone adventurer; close-up; Adventure; a vast, snow-capped mountain range; cinematic
Characteristic
Shot : A man with a backpack is kneeling on a rocky mountain path, looking down at the ground, with snow-capped mountains in the background
Aesthetic Score : 0.8
Mood : serene, adventurous, contemplative
Quality
Entropy : 6.41
Noise : 88
Prompt Clip Score : 0.32
AI Evaluation
Likelihood of AI : 0.10
Image errors : None
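Most prompts in this post follow the same semicolon-separated layout, so they can be split into fields before comparing them with the generated result. The sketch below is only an illustration: the field names in FIELDS are hypothetical labels chosen here, not names used by imagen-v3 or by the evaluation pipeline.

```python
# Hypothetical field names for the six semicolon-separated parts of a prompt.
FIELDS = ("pose", "subject", "shot", "theme", "background", "style")


def parse_prompt(prompt):
    """Split a structured prompt into named fields.

    Returns None when the prompt does not have exactly six parts,
    which is the case for free-form prompts.
    """
    parts = [part.strip() for part in prompt.split(";")]
    if len(parts) != len(FIELDS):
        return None
    record = dict(zip(FIELDS, parts))
    # The first part looks like "poses leaning-in: determined, focused".
    pose, _, moods = record["pose"].partition(":")
    record["pose"] = pose.strip()
    record["moods"] = [m.strip() for m in moods.split(",") if m.strip()]
    return record


hiker = parse_prompt(
    "poses leaning-in: determined, focused ; A lone adventurer; close-up; "
    "Adventure; a vast, snow-capped mountain range; cinematic"
)
print(hiker["shot"], hiker["moods"])
```

Splitting on semicolons rather than commas keeps backgrounds such as "a vast, snow-capped mountain range" intact as a single field.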
Superman Races Against Time to Save a Burning City
A dramatic scene unfolds as Superman, with a determined expression, flies towards the viewer over a burning cityscape. Two other superheroes can be seen in the background, adding to the sense of urgency and danger. The image captures the intensity of the moment, leaving viewers on the edge of their seats.
Prompt
poses leaning-in: powerful, heroic ; A superhero in mid-flight; dynamic shot; Heroism; a cityscape with a burning building in the background; cinematic
Characteristic
Shot : Superman flying over a burning cityscape with two other superheroes in the background
Aesthetic Score : 0.7
Mood : heroic, dramatic, intense
Quality
Entropy : 6.66
Noise : 102
Prompt Clip Score : 0.32
AI Evaluation
Likelihood of AI : 0.90
Image errors : Slight blurriness in the background, some artifacts around the edges of the characters
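The post does not say how the Entropy figure in each Quality block is computed. One common reading, assumed here, is the Shannon entropy of the image's 8-bit grayscale histogram, which ranges from 0 bits (a single flat tone) to 8 bits (all 256 levels equally likely). The sketch below implements that assumption on toy data; the pixel lists are made up and are not taken from the images in this post.

```python
import math
from collections import Counter


def shannon_entropy(pixels):
    """Shannon entropy, in bits, of a sequence of 8-bit grayscale values."""
    if not pixels:
        raise ValueError("pixels must not be empty")
    total = len(pixels)
    counts = Counter(pixels)
    # Sum p * log2(1 / p) over every gray level that actually occurs.
    return sum((n / total) * math.log2(total / n) for n in counts.values())


# Toy inputs (assumptions for illustration only).
flat = [128] * 64            # one gray level: no variation at all
uniform = list(range(256))   # every gray level exactly once

print(shannon_entropy(flat))
print(shannon_entropy(uniform))
```

Under this reading, most images in the post sit between roughly 6 and 7 bits, with the dark campfire scene lowest at 4.98, which would be consistent with a histogram concentrated in the shadows.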
Focused on the Task at Hand
A close-up shot captures the intensity of concentration as hands fly across the keyboard. The blurred background and cool tones create a sense of digital immersion, highlighting the seriousness of the moment.
Prompt
poses leaning-in: intense, focused ; A gamer’s hands on a keyboard; close-up; Gaming; a brightly lit computer screen displaying a game; cinematic
Characteristic
Shot : A person’s hands are typing on a keyboard, the background is blurry and has a blueish tint.
Aesthetic Score : 0.4
Mood : focused, serious, digital
Quality
Entropy : 6.62
Noise : 71
Prompt Clip Score : 0.29
AI Evaluation
Likelihood of AI : 0.10
Image errors : The image has a lot of noise and the colors are not very vibrant.
Sunset Romance on the Cliffside
A couple embraces the golden hour on a dramatic cliff overlooking the ocean. The warm glow of the sunset paints a picture of intimacy and connection, capturing the essence of a romantic moment.
Prompt
poses leaning-in: romantic, awe-inspired ; A couple gazing at a breathtaking sunset; medium shot; Tourism; a panoramic view of a beach with the sun setting over the ocean; cinematic
Characteristic
Shot : A couple standing on a cliff overlooking the ocean at sunset.
Aesthetic Score : 0.7
Mood : romantic, intimate, peaceful
Quality
Entropy : 6.03
Noise : 90
Prompt Clip Score : 0.31
AI Evaluation
Likelihood of AI : 0.10
Image errors : No visible artifacts or errors in the image.
Lost in the Blur of a Journey
A man, lost in thought, gazes out the window of a moving train. The passing landscape blurs into a green and brown tapestry, reflecting the contemplative mood of the moment. The scene evokes a sense of travel and the quiet introspection that comes with it.
Prompt
poses leaning-in: reflective, adventurous ; A backpacker looking out of a train window; close-up; Travel; a passing landscape of rolling hills and green fields; cinematic
Characteristic
Shot : A man in a beanie and brown jacket is looking out the window of a train, the view outside is a blur of green fields and distant hills.
Aesthetic Score : 0.7
Mood : pensive, contemplative, journey
Quality
Entropy : 6.59
Noise : 82
Prompt Clip Score : 0.33
AI Evaluation
Likelihood of AI : 0.20
Image errors : No visible errors.
Secrets Whispered in the Dark
A group of friends huddle together in a shadowy forest, their faces illuminated by the flickering glow of a single flame. The atmosphere is thick with mystery and suspense, hinting at secrets shared and dangers lurking in the darkness.
Prompt
poses leaning-in: intimate, warm ; A group of friends huddled together around a campfire; medium shot; Groups; a dark forest with the firelight illuminating their faces; cinematic
Characteristic
Shot : A group of friends huddle together in a dark forest, their faces illuminated by the flickering light of a small flame.
Aesthetic Score : 0.6
Mood : mysterious, intimate, suspenseful
Quality
Entropy : 4.98
Noise : 89
Prompt Clip Score : 0.36
AI Evaluation
Likelihood of AI : 0.10
Image errors : Some noise in the dark areas, slight chromatic aberration
The Weight of Focus: A Soldier’s Moment of Truth
A lone soldier, camouflaged and poised, lies in wait behind a concrete barrier. The dark, moody atmosphere and dramatic lighting heighten the tension as he aims his rifle, capturing the intensity and urgency of a battlefield moment.
Prompt
poses leaning-in: intense, focused ; A soldier peering through a sniper scope; close-up; Heroism; a battlefield with smoke and explosions in the distance; cinematic
Characteristic
Shot : A soldier in camouflage gear is aiming a rifle at a target while lying in a prone position behind a concrete barrier. The image has a dark, moody atmosphere with hints of war-torn landscape and a sense of urgency.
Aesthetic Score : 0.7
Mood : intense, focused, dramatic
Quality
Entropy : 6.52
Noise : 94
Prompt Clip Score : 0.32
AI Evaluation
Likelihood of AI : 0.20
Image errors : No notable artifacts or errors.
Lost in the Mist: Explorers Brave the Jungle’s Secrets
A sense of mystery and danger hangs heavy in the air as four figures navigate a muddy path through a dense jungle. A fallen tree, shrouded in mist, blocks their way, adding to the suspense of their adventurous journey into the unknown.
Prompt
poses leaning-in: determined, adventurous ; A group of explorers navigating a dense jungle; wide shot; Adventure; lush green foliage and towering trees; cinematic
Characteristic
Shot : Four figures walk away from the viewer on a muddy path through a dense jungle with a fallen tree spanning the path, shrouded in mist.
Aesthetic Score : 0.7
Mood : mysterious, adventurous, suspenseful
Quality
Entropy : 6.85
Noise : 116
Prompt Clip Score : 0.33
AI Evaluation
Likelihood of AI : 0.20
Image errors : No significant image errors. The mist appears slightly unnatural but fits the overall tone.
Red Hot Focus: Gamer Reacts to a Thrilling Moment
A young man, bathed in red light, sits glued to his computer, headphones on, eyes wide with surprise and excitement. The intensity of the moment is palpable, as he reacts to a thrilling event in the game. The blurred figure behind him suggests a shared experience, adding to the competitive atmosphere.
Prompt
poses leaning-in: excited, immersed ; A gamer’s face lit by the screen; close-up; Gaming; a vibrant, colorful game interface; cinematic
Characteristic
Shot : A young man wearing headphones is seated in front of a computer, looking surprised and excited. Another person is seated behind him, out of focus. The room is dimly lit with a red glow, highlighting the subject.
Aesthetic Score : 0.6
Mood : excited, focused, competitive
Quality
Entropy : 6.13
Noise : 71
Prompt Clip Score : 0.26
AI Evaluation
Likelihood of AI : 0.20
Image errors : The image has some minor artifacts in the background, particularly around the edges of the subject’s chair and the monitor in the background.
Lost in the City Lights
A solitary figure stands silhouetted against the vibrant cityscape, their hooded form a stark contrast to the twinkling lights below. The scene evokes a sense of melancholy and introspection, highlighting the feeling of isolation amidst the urban sprawl.
Prompt
poses leaning-in: Solitude, contemplation ; A lone figure stands on a rooftop, gazing out at the sprawling cityscape, its lights twinkling like scattered diamonds.; cinematic
Characteristic
Shot : A lone figure stands on a rooftop overlooking a city skyline at night. The city lights are twinkling and the sky is dark with a few scattered lights. The figure is wearing a hooded jacket and is silhouetted against the bright city lights.
Aesthetic Score : 0.6
Mood : melancholy, mysterious, contemplative
Quality
Entropy : 5.71
Noise : 68
Prompt Clip Score : 0.34
AI Evaluation
Likelihood of AI : 0.80
Image errors : The image is slightly blurry and the city lights are not very realistic. The figure is also slightly pixelated.
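Across the ten images above, the per-image figures can be summarized with a few lines of Python. The values are transcribed from the cards in order of appearance; the 0.5 cut-off used to flag a high "Likelihood of AI" is an assumption made for this sketch, not a threshold stated by the evaluation.

```python
from statistics import mean

# Values transcribed from the ten image cards, in order of appearance.
titles = [
    "Finding Serenity Amidst the Peaks",
    "Superman Races Against Time to Save a Burning City",
    "Focused on the Task at Hand",
    "Sunset Romance on the Cliffside",
    "Lost in the Blur of a Journey",
    "Secrets Whispered in the Dark",
    "The Weight of Focus",
    "Lost in the Mist",
    "Red Hot Focus",
    "Lost in the City Lights",
]
aesthetic = [0.8, 0.7, 0.4, 0.7, 0.7, 0.6, 0.7, 0.7, 0.6, 0.6]
clip = [0.32, 0.32, 0.29, 0.31, 0.33, 0.36, 0.32, 0.33, 0.26, 0.34]
ai_likelihood = [0.10, 0.90, 0.10, 0.10, 0.20, 0.10, 0.20, 0.20, 0.20, 0.80]


def summarize(values):
    """Return the mean (rounded to 3 places), minimum, and maximum."""
    return {"mean": round(mean(values), 3), "min": min(values), "max": max(values)}


AI_THRESHOLD = 0.5  # assumed cut-off, not defined in the post
flagged = [t for t, p in zip(titles, ai_likelihood) if p >= AI_THRESHOLD]

print("aesthetic:", summarize(aesthetic))
print("clip:", summarize(clip))
print("flagged as likely AI:", flagged)
```

With that cut-off, only the Superman and rooftop images are flagged, which matches the two cards whose listed image errors mention blurriness.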
Conclusion
The analysis shows that the generative AI model handled shot composition well and camera position nearly as well, but struggled with the aesthetic aspect. Here’s a breakdown:
- Camera Position: The model scored 0.45, slightly below the “good” range of 0.5 to 0.75. Its ability to interpret and recreate the camera position specified in the prompt is decent but has room to improve.
- Shot Analysis: The model scored 0.56, within the “good” range. The model understood the scenes described in the prompts and generated shot compositions that align with them.
- Aesthetic Analysis: The model scored 0.14, which is 0.04 above the upper bound of the “very good” range of -0.2 to 0.1. The aesthetic of the generated images therefore deviated from what the prompts called for.
Overall, the model shows a good grasp of shot composition and a near-good grasp of camera position, but needs improvement in generating images that match the desired aesthetic.
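The three comparisons in the conclusion can be checked mechanically. The sketch below assumes inclusive range bounds and assumes that Shot Analysis uses the same 0.5 to 0.75 “good” range quoted for Camera Position, since the post does not state that range separately.

```python
def compare_to_range(score, low, high):
    """Return where a score sits relative to [low, high] and by how much it misses."""
    if score < low:
        return "below", round(low - score, 2)
    if score > high:
        return "above", round(score - high, 2)
    return "within", 0.0


# (metric, score, range bounds, range label) as quoted in the conclusion.
# The Shot Analysis bounds are an assumption (see the note above).
results = [
    ("Camera Position", 0.45, (0.5, 0.75), "good"),
    ("Shot Analysis", 0.56, (0.5, 0.75), "good"),
    ("Aesthetic Analysis", 0.14, (-0.2, 0.1), "very good"),
]

for metric, score, (low, high), label in results:
    position, gap = compare_to_range(score, low, high)
    print(f"{metric}: {score} is {position} the '{label}' range [{low}, {high}] (gap {gap})")
```

Reporting the gap alongside the position makes it easier to see that both misses are small: 0.05 for camera position and 0.04 for aesthetics.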
Sources:
- https://www.writerswrite.co.za/cheat-sheets-for-writing-body-language/
- https://mads3df.wordpress.com/2013/09/04/storytelling-poses/
- https://www.pinterest.com/pegasister890/character-poses/
- https://www.youtube.com/watch?v=udky6ANxWws
- https://maven.com/articles/storytelling-techniques
- https://deepmind.google/technologies/imagen-3/