AI's Artistic Journey: Capturing Poses, Missing the Essence with Imagen-v3

AI's Struggle with Aesthetic: A Look at Poses and Scene Generation with Imagen-v3

AI image generation is evolving rapidly, and current models can produce striking visuals from text prompts. Balancing technical accuracy with artistic expression remains a challenge, however. This blog post examines an experiment in which Imagen-v3 generated ten images from structured prompts, most of them naming a pose, a subject, a shot type, and a setting, and asks how well the results capture the intended aesthetic. We’ll explore the model’s strengths and weaknesses: its progress in understanding camera angles and shot composition, and its limitations in capturing the intended mood and artistic style.

Created with: imagen-v3

Finding Serenity Amidst the Peaks

A lone hiker pauses on a rocky mountain path, dwarfed by snow-capped peaks. The scene evokes a sense of serene adventure and contemplation, capturing the vastness and solitude of the wilderness.

Prompt

poses leaning-in: determined, focused ; A lone adventurer; close-up; Adventure; a vast, snow-capped mountain range; cinematic

Characteristic

Shot : A man with a backpack is kneeling on a rocky mountain path, looking down at the ground, with snow-capped mountains in the background

Aesthetic Score : 0.8

Mood : serene, adventurous, contemplative

Quality

Entropy : 6.41

Noise : 88

Prompt Clip Score : 0.32

AI Evaluation

Likelihood of AI : 0.10

Image errors : None

Superman Races Against Time to Save a Burning City

A dramatic scene unfolds as Superman, with a determined expression, flies towards the viewer over a burning cityscape. Two other superheroes can be seen in the background, adding to the sense of urgency and danger. The image captures the intensity of the moment, leaving viewers on the edge of their seats.

Prompt

poses leaning-in: powerful, heroic ; A superhero in mid-flight; dynamic shot; Heroism; a cityscape with a burning building in the background; cinematic

Characteristic

Shot : Superman flying over a burning cityscape with two other superheroes in the background

Aesthetic Score : 0.7

Mood : heroic, dramatic, intense

Quality

Entropy : 6.66

Noise : 102

Prompt Clip Score : 0.32

AI Evaluation

Likelihood of AI : 0.90

Image errors : Slight blurriness in the background, some artifacts around the edges of the characters

Focused on the Task at Hand

A close-up shot captures the intensity of concentration as hands fly across the keyboard. The blurred background and cool tones create a sense of digital immersion, highlighting the seriousness of the moment.

Prompt

poses leaning-in: intense, focused ; A gamer’s hands on a keyboard; close-up; Gaming; a brightly lit computer screen displaying a game; cinematic

Characteristic

Shot : A person’s hands are typing on a keyboard, the background is blurry and has a blueish tint.

Aesthetic Score : 0.4

Mood : focused, serious, digital

Quality

Entropy : 6.62

Noise : 71

Prompt Clip Score : 0.29

AI Evaluation

Likelihood of AI : 0.10

Image errors : The image has a lot of noise and the colors are not very vibrant.

Sunset Romance on the Cliffside

A couple embraces the golden hour on a dramatic cliff overlooking the ocean. The warm glow of the sunset paints a picture of intimacy and connection, capturing the essence of a romantic moment.

Prompt

poses leaning-in: romantic, awe-inspired ; A couple gazing at a breathtaking sunset; medium shot; Tourism; a panoramic view of a beach with the sun setting over the ocean; cinematic

Characteristic

Shot : A couple standing on a cliff overlooking the ocean at sunset.

Aesthetic Score : 0.7

Mood : romantic, intimate, peaceful

Quality

Entropy : 6.03

Noise : 90

Prompt Clip Score : 0.31

AI Evaluation

Likelihood of AI : 0.10

Image errors : No visible artifacts or errors in the image.

Lost in the Blur of a Journey

A man, lost in thought, gazes out the window of a moving train. The passing landscape blurs into a green and brown tapestry, reflecting the contemplative mood of the moment. The scene evokes a sense of travel and the quiet introspection that comes with it.

Prompt

poses leaning-in: reflective, adventurous ; A backpacker looking out of a train window; close-up; Travel; a passing landscape of rolling hills and green fields; cinematic

Characteristic

Shot : A man in a beanie and brown jacket is looking out the window of a train, the view outside is a blur of green fields and distant hills.

Aesthetic Score : 0.7

Mood : pensive, contemplative, journey

Quality

Entropy : 6.59

Noise : 82

Prompt Clip Score : 0.33

AI Evaluation

Likelihood of AI : 0.20

Image errors : No visible errors.

Secrets Whispered in the Dark

A group of friends huddle together in a shadowy forest, their faces illuminated by the flickering glow of a single flame. The atmosphere is thick with mystery and suspense, hinting at secrets shared and dangers lurking in the darkness.

Prompt

poses leaning-in: intimate, warm ; A group of friends huddled together around a campfire; medium shot; Groups; a dark forest with the firelight illuminating their faces; cinematic

Characteristic

Shot : A group of friends huddle together in a dark forest, their faces illuminated by the flickering light of a small flame.

Aesthetic Score : 0.6

Mood : mysterious, intimate, suspenseful

Quality

Entropy : 4.98

Noise : 89

Prompt Clip Score : 0.36

AI Evaluation

Likelihood of AI : 0.10

Image errors : Some noise in the dark areas, slight chromatic aberration

The Weight of Focus: A Soldier’s Moment of Truth

A lone soldier, camouflaged and poised, lies in wait behind a concrete barrier. The dark, moody atmosphere and dramatic lighting heighten the tension as he aims his rifle, capturing the intensity and urgency of a battlefield moment.

Prompt

poses leaning-in: intense, focused ; A soldier peering through a sniper scope; close-up; Heroism; a battlefield with smoke and explosions in the distance; cinematic

Characteristic

Shot : A soldier in camouflage gear is aiming a rifle at a target while lying in a prone position behind a concrete barrier. The image has a dark, moody atmosphere with hints of war-torn landscape and a sense of urgency.

Aesthetic Score : 0.7

Mood : intense, focused, dramatic

Quality

Entropy : 6.52

Noise : 94

Prompt Clip Score : 0.32

AI Evaluation

Likelihood of AI : 0.20

Image errors : No notable artifacts or errors.

Lost in the Mist: Explorers Brave the Jungle’s Secrets

A sense of mystery and danger hangs heavy in the air as four figures navigate a muddy path through a dense jungle. A fallen tree, shrouded in mist, blocks their way, adding to the suspense of their adventurous journey into the unknown.

Prompt

poses leaning-in: determined, adventurous ; A group of explorers navigating a dense jungle; wide shot; Adventure; lush green foliage and towering trees; cinematic

Characteristic

Shot : Four figures walk away from the viewer on a muddy path through a dense jungle with a fallen tree spanning the path, shrouded in mist.

Aesthetic Score : 0.7

Mood : mysterious, adventurous, suspenseful

Quality

Entropy : 6.85

Noise : 116

Prompt Clip Score : 0.33

AI Evaluation

Likelihood of AI : 0.20

Image errors : No significant image errors. The mist appears slightly unnatural but fits the overall tone.

Red Hot Focus: Gamer Reacts to a Thrilling Moment

A young man, bathed in red light, sits glued to his computer, headphones on, eyes wide with surprise and excitement. The intensity of the moment is palpable, as he reacts to a thrilling event in the game. The blurred figure behind him suggests a shared experience, adding to the competitive atmosphere.

Prompt

poses leaning-in: excited, immersed ; A gamer’s face lit by the screen; close-up; Gaming; a vibrant, colorful game interface; cinematic

Characteristic

Shot : A young man wearing headphones is seated in front of a computer, looking surprised and excited. Another person is seated behind him, out of focus. The room is dimly lit with a red glow, highlighting the subject.

Aesthetic Score : 0.6

Mood : excited, focused, competitive

Quality

Entropy : 6.13

Noise : 71

Prompt Clip Score : 0.26

AI Evaluation

Likelihood of AI : 0.20

Image errors : The image has some minor artifacts in the background, particularly around the edges of the subject’s chair and the monitor in the background.

Lost in the City Lights

A solitary figure stands silhouetted against the vibrant cityscape, their hooded form a stark contrast to the twinkling lights below. The scene evokes a sense of melancholy and introspection, highlighting the feeling of isolation amidst the urban sprawl.

Prompt

poses leaning-in: Solitude, contemplation ; A lone figure stands on a rooftop, gazing out at the sprawling cityscape, its lights twinkling like scattered diamonds.; cinematic

Characteristic

Shot : A lone figure stands on a rooftop overlooking a city skyline at night. The city lights are twinkling and the sky is dark with a few scattered lights. The figure is wearing a hooded jacket and is silhouetted against the bright city lights.

Aesthetic Score : 0.6

Mood : melancholy, mysterious, contemplative

Quality

Entropy : 5.71

Noise : 68

Prompt Clip Score : 0.34

AI Evaluation

Likelihood of AI : 0.80

Image errors : The image is slightly blurry and the city lights are not very realistic. The figure is also slightly pixelated.

Conclusion

The analysis shows that the generative AI model handled shot composition well and came close on camera position, but struggled with the aesthetic aspect. Here’s a breakdown:

  • Camera Position: The model scored 0.45, which is slightly below the “good” range of 0.5 to 0.75. This suggests that the model’s ability to accurately interpret and recreate the camera position specified in the prompt is decent, but could be improved.
  • Shot Analysis: The model scored 0.56, falling within the “good” range. This indicates that the model effectively understood the scene described in the prompt and generated an image with a shot composition that aligns well with the intended scene.
  • Aesthetic Analysis: The model scored 0.14, which lies above the upper bound of the “very good” range of -0.2 to 0.1. This suggests that the aesthetic of the generated images deviated from the aesthetic the prompts called for.
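
The comparison of each score with its target range can be sketched in a few lines of Python. This is only an illustration of the arithmetic in the breakdown, not the evaluation pipeline used for the experiment: the names `ScoreCheck`, `position`, and `margin` are hypothetical, and the “good” range for Shot Analysis is assumed to be the same 0.5 to 0.75 range quoted for Camera Position, since the post does not restate it.

```python
from typing import NamedTuple


class ScoreCheck(NamedTuple):
    """One summary score and the range it is compared against."""
    name: str
    score: float
    range_label: str
    low: float
    high: float


def position(check: ScoreCheck) -> str:
    """Say whether the score is below, within, or above its range."""
    if check.score < check.low:
        return "below"
    if check.score > check.high:
        return "above"
    return "within"


def margin(check: ScoreCheck) -> float:
    """Distance from the score to the nearest range bound (0.0 if inside)."""
    if check.score < check.low:
        return round(check.low - check.score, 2)
    if check.score > check.high:
        return round(check.score - check.high, 2)
    return 0.0


# Scores and ranges as quoted in the breakdown.
# Assumption: Shot Analysis uses the same "good" range as Camera Position.
checks = [
    ScoreCheck("Camera Position", 0.45, "good", 0.5, 0.75),
    ScoreCheck("Shot Analysis", 0.56, "good", 0.5, 0.75),
    ScoreCheck("Aesthetic Analysis", 0.14, "very good", -0.2, 0.1),
]

for check in checks:
    print(
        f"{check.name}: {check.score} is {position(check)} the "
        f"'{check.range_label}' range [{check.low}, {check.high}] "
        f"(margin {margin(check)})"
    )
```

Read this way, Camera Position and Aesthetic Analysis both miss their ranges by a few hundredths, while Shot Analysis sits inside its range.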

Overall, the model demonstrates a good understanding of shot composition and a near-good grasp of camera position, but needs improvement in generating images that match the desired aesthetic.
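
For context, the per-image values listed in this post can also be averaged. The sketch below transcribes the Aesthetic Score and Prompt Clip Score of each of the ten images and computes their means. The variable names are illustrative, and these per-image numbers are a separate measurement from the three summary scores in the conclusion, so the averages should not be read as a recomputation of those scores.

```python
from statistics import mean

# (title, aesthetic score, prompt CLIP score), transcribed from the post.
images = [
    ("Finding Serenity Amidst the Peaks", 0.8, 0.32),
    ("Superman Races Against Time to Save a Burning City", 0.7, 0.32),
    ("Focused on the Task at Hand", 0.4, 0.29),
    ("Sunset Romance on the Cliffside", 0.7, 0.31),
    ("Lost in the Blur of a Journey", 0.7, 0.33),
    ("Secrets Whispered in the Dark", 0.6, 0.36),
    ("The Weight of Focus: A Soldier’s Moment of Truth", 0.7, 0.32),
    ("Lost in the Mist: Explorers Brave the Jungle’s Secrets", 0.7, 0.33),
    ("Red Hot Focus: Gamer Reacts to a Thrilling Moment", 0.6, 0.26),
    ("Lost in the City Lights", 0.6, 0.34),
]

# Average each metric over all ten images.
mean_aesthetic = mean(aesthetic for _, aesthetic, _ in images)
mean_clip = mean(clip for _, _, clip in images)

# Identify the weakest image on each metric.
lowest_aesthetic = min(images, key=lambda row: row[1])
lowest_clip = min(images, key=lambda row: row[2])

print(f"Mean aesthetic score: {mean_aesthetic:.2f}")
print(f"Mean prompt CLIP score: {mean_clip:.3f}")
print(f"Lowest aesthetic score: {lowest_aesthetic[0]} ({lowest_aesthetic[1]})")
print(f"Lowest prompt CLIP score: {lowest_clip[0]} ({lowest_clip[2]})")
```

Both of the lowest values belong to the two gaming prompts.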
