AI Captures the Scene but Struggles with the Shot: Testing Imagen-v3
9-minute read - 1738 words
AI image generation is evolving rapidly, and current models can turn text prompts into striking, realistic images. They still have limits, though. This post looks at the strengths and weaknesses of one such model, Imagen-v3, with a focus on one specific skill: reproducing the camera position a prompt asks for. Using a series of test prompts, most of which name a shot type such as wide shot, medium shot, or close-up, we compare how well the model captures the scene and aesthetic with how well it reproduces the intended camera angle, and discuss what the results suggest for future development.
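All but the last of the test prompts below share one structure: a "poses dancing:" label followed by a list of moods, then the subject, the requested shot type, a theme, the setting, and "cinematic", separated by semicolons. The shot type is the field this post is really testing. The Python sketch below reassembles the first prompt from those parts; the `PromptSpec` class, its field names, and `build_prompt` are hypothetical labels of ours, not part of Imagen-v3 or of any tool the post says it used.

```python
from dataclasses import dataclass


@dataclass
class PromptSpec:
    """One test prompt, split into the fields the prompts in this post share.

    Hypothetical: the field names are our own labels; the post only shows
    the assembled prompt strings.
    """
    moods: list[str]  # e.g. ["triumphant", "powerful"]
    subject: str      # e.g. "A lone warrior"
    shot: str         # requested camera framing, e.g. "wide shot"
    theme: str        # e.g. "heroism"
    setting: str      # e.g. "a battlefield littered with fallen enemies"
    style: str = "cinematic"


def build_prompt(spec: PromptSpec) -> str:
    # Mirror the separators seen in the prompts: "label: moods ; field; field; ...".
    mood_part = "poses dancing: " + ", ".join(spec.moods)
    fields = [spec.subject, spec.shot, spec.theme, spec.setting, spec.style]
    return mood_part + " ; " + "; ".join(fields)


warrior = PromptSpec(
    moods=["triumphant", "powerful"],
    subject="A lone warrior",
    shot="wide shot",
    theme="heroism",
    setting="a battlefield littered with fallen enemies",
)
print(build_prompt(warrior))
# poses dancing: triumphant, powerful ; A lone warrior; wide shot; heroism; a battlefield littered with fallen enemies; cinematic
```

The beach prompt at the end of the gallery replaces the subject, shot, theme, and setting fields with a single free-form sentence, so it does not fit this template and names no shot type.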
Created with: imagen-v3
Triumphant Warrior: A Moment of Victory Amidst the Storm
A lone warrior stands tall, arms raised in victory, amidst a battlefield littered with fallen soldiers. The dramatic pose and the stormy sky create a powerful image of triumph and resilience.
Prompt
poses dancing: triumphant, powerful ; A lone warrior; wide shot; heroism; a battlefield littered with fallen enemies; cinematic
Characteristic
Shot : A warrior stands triumphantly over a battlefield, arms raised in victory, surrounded by fallen soldiers.
Aesthetic Score : 0.7
Mood : epic, dramatic, victorious
Quality
Entropy : 6.82
Noise : 87
Prompt Clip Score : 0.27
AI Evaluation
Likelihood of AI : 0.70
Image errors : The image is slightly blurry, especially around the fallen soldiers, possibly due to excessive post-processing.
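The post doesn’t say how its Entropy figure is computed. One common definition, which we assume here, is the Shannon entropy of an image’s 8-bit grayscale histogram, measured in bits. Under that definition a single flat color scores 0 and an image that uses all 256 gray levels equally often scores the maximum of 8, so values like the gallery’s 5.90 to 6.82 would suggest fairly rich tonal variation. A minimal sketch, with a hypothetical helper `histogram_entropy` that takes a flat list of pixel values:

```python
import math
from collections import Counter


def histogram_entropy(pixels: list[int]) -> float:
    """Shannon entropy, in bits, of 8-bit grayscale pixel values (0-255).

    Assumption: this is one common definition of image entropy; the post
    does not say which definition produced its Entropy figures.
    """
    if not pixels:
        return 0.0
    total = len(pixels)
    counts = Counter(pixels)  # histogram: intensity level -> number of pixels
    # H = sum over occurring levels of p * log2(1 / p), with p = count / total.
    return sum((c / total) * math.log2(total / c) for c in counts.values())


flat = [128] * 1000                 # one flat gray: no tonal variation
two_tone = [0] * 500 + [255] * 500  # black and white in equal shares
uniform = list(range(256)) * 4      # all 256 levels equally common

print(histogram_entropy(flat))      # 0.0
print(histogram_entropy(two_tone))  # 1.0
print(histogram_entropy(uniform))   # 8.0
```

For a real image you would first convert it to grayscale and flatten it into 0-255 values (an image library such as Pillow can do this); the entropy calculation itself stays the same.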
Jungle Rhythms: A Celebration of Life and Laughter
A vibrant scene unfolds in the heart of the jungle, where a group of actors dance with infectious energy. Ancient ruins provide a backdrop to their spontaneous joy, creating a captivating blend of adventure and playful spirit.
Prompt
poses dancing: excited, adventurous ; A group of explorers; medium shot; adventure; a dense jungle with ancient ruins in the background; cinematic
Characteristic
Shot : A group of people, likely actors, are dancing in a jungle setting. There are ruins in the background.
Aesthetic Score : 0.6
Mood : energetic, adventurous, playful
Quality
Entropy : 6.67
Noise : 107
Prompt Clip Score : 0.32
AI Evaluation
Likelihood of AI : 0.20
Image errors : There are no visible noise or artifacts.
Lost in the Digital World: A Gamer’s Intense Focus
A young man, bathed in the vibrant blue and red glow of his computer screen, is completely absorbed in his game. The image captures the intensity and focus of a gamer lost in the digital world, creating a moody and dramatic atmosphere.
Prompt
poses dancing: intense, focused ; A gamer; close-up; gaming; a brightly lit gaming setup with a screen displaying a virtual world; cinematic
Characteristic
Shot : A young man, wearing headphones, is sitting in front of a computer screen, likely playing a video game. The image is lit with blue and red light, creating a moody atmosphere.
Aesthetic Score : 0.7
Mood : intense, focused, digital
Quality
Entropy : 6.06
Noise : 75
Prompt Clip Score : 0.28
AI Evaluation
Likelihood of AI : 0.30
Image errors : There are some minor artifacts around the edges of the image and the background is slightly blurred.
Love in the Hustle: A Dance Amidst the Marketplace
In the heart of a vibrant and bustling marketplace, a couple finds a moment of intimacy and romance. Amidst the colorful chaos, they dance, their connection a stark contrast to the busy surroundings. The woman, in a floral dress, and the man, in a blue shirt and tan pants, create a romantic scene that stands out in the crowd.
Prompt
poses dancing: joyful, romantic ; A couple; medium shot; tourism; a bustling marketplace with vibrant colors and exotic goods; cinematic
Characteristic
Shot : A couple is dancing in a crowded, colorful marketplace. The woman is wearing a floral dress and the man is wearing a blue shirt and tan pants.
Aesthetic Score : 0.7
Mood : romantic, vibrant, bustling
Quality
Entropy : 6.78
Noise : 107
Prompt Clip Score : 0.30
AI Evaluation
Likelihood of AI : 0.10
Image errors : No major errors. Image quality is good.
Finding Peace in the Desert Sunset
A solitary figure silhouetted against a vibrant orange sunset, practicing yoga in the vast desert landscape. The image evokes a sense of tranquility and connection to nature, highlighting the beauty of the moment and the power of mindfulness.
Prompt
poses dancing: reflective, contemplative ; A traveler; long shot; travel; a vast desert landscape with a setting sun; cinematic
Characteristic
Shot : A man strikes a yoga pose in a desert landscape at sunset, silhouetted against the bright orange sky.
Aesthetic Score : 0.7
Mood : peaceful, serene, contemplative
Quality
Entropy : 6.58
Noise : 72
Prompt Clip Score : 0.30
AI Evaluation
Likelihood of AI : 0.20
Image errors : No noticeable errors; the image is sharp and well-exposed.
Rooftop Revelry: Young Friends Dance the Night Away
Friends dance under the city lights, full of the joy and energy of a night out. The image evokes a carefree celebration that captures the spirit of youth and friendship.
Prompt
poses dancing: happy, carefree ; A group of friends; medium shot; groups; a rooftop overlooking a city skyline at night; cinematic
Characteristic
Shot : A group of young people are dancing on a rooftop at night. They are all smiling and having a good time. The city lights are visible in the background.
Aesthetic Score : 0.7
Mood : joyful, carefree, celebratory
Quality
Entropy : 5.90
Noise : 97
Prompt Clip Score : 0.33
AI Evaluation
Likelihood of AI : 0.10
Image errors : There are some minor issues with the lighting and exposure.
A Shadow in the Alley: A Woman’s Intense Gaze Holds a Secret
Crouched in the darkness of a narrow alleyway, a woman in a hooded sweatshirt stares directly at the camera. The low lighting and her intense gaze create a palpable sense of tension and mystery. What secrets does she hold? What is she waiting for?
Prompt
poses dancing: determined, defiant ; A lone dancer; close-up; heroism; a dark alleyway with flickering streetlights; cinematic
Characteristic
Shot : A woman in a hooded sweatshirt is crouched down in a dark alleyway. She is looking directly at the camera, and there is a sense of tension in the air.
Aesthetic Score : 0.7
Mood : intense, mysterious, suspenseful
Quality
Entropy : 6.31
Noise : 89
Prompt Clip Score : 0.28
AI Evaluation
Likelihood of AI : 0.20
Image errors : No visible artifacts or errors.
Embracing the Summit: A Moment of Joy and Freedom
A woman stands triumphantly on a mountaintop, her leg raised in the air, against a breathtaking backdrop of snow-capped peaks. The scene radiates joy, adventure, and carefree abandon, inviting you to share the thrill of reaching new heights.
Prompt
poses dancing: exhilarated, free ; A group of adventurers; wide shot; adventure; a breathtaking mountain range with a clear blue sky; cinematic
Characteristic
Shot : A woman is standing on a mountaintop with a view of snow-capped mountains in the background. She is wearing a brown sweater and shorts, and has her leg raised in the air. The sky is blue and the ground is covered in grass and rocks.
Aesthetic Score : 0.7
Mood : joyful, adventurous, carefree
Quality
Entropy : 6.64
Noise : 94
Prompt Clip Score : 0.31
AI Evaluation
Likelihood of AI : 0.10
Image errors : No noticeable errors in the image.
Lost in Thought: A Moment of Intense Focus
A young man, shrouded in shadow, sits lost in thought, his headphones amplifying the silence. The dramatic lighting casts a mysterious glow on his face, highlighting his intense concentration. The blue surface in the background adds a touch of intrigue, leaving the viewer wondering what secrets lie within this dimly lit room.
Prompt
poses dancing: focused, strategic ; A gamer; close-up; gaming; a dimly lit room with a computer screen displaying a competitive game; cinematic
Characteristic
Shot : A young man wearing headphones and a dark shirt sits in a dimly lit room, looking down and to the left, with his face only partially lit. A blue surface is visible in the background.
Aesthetic Score : 0.6
Mood : serious, focused, intense
Quality
Entropy : 5.90
Noise : 83
Prompt Clip Score : 0.28
AI Evaluation
Likelihood of AI : 0.20
Image errors : There are no visible artifacts or errors in the image.
Silhouette of Serenity: A Dancer Finds Tranquility on the Beach
A lone figure in black dance attire stands poised on a sandy beach, their silhouette stark against the muted blue sky. The scene evokes a sense of serene contemplation, drawing the viewer’s eye to the dancer’s graceful posture and the vastness of the ocean beyond.
Prompt
poses dancing: Solitude, contemplation, longing ; A lone figure, silhouetted against the setting sun, walks along a pristine beach, the turquoise water stretching endlessly before them.; cinematic
Characteristic
Shot : A lone figure in black dance attire performs a graceful pose on a sandy beach with the ocean in the background; the sky is a muted blue with some clouds.
Aesthetic Score : 0.7
Mood : serene, calm, introspective
Quality
Entropy : 6.35
Noise : 102
Prompt Clip Score : 0.31
AI Evaluation
Likelihood of AI : 0.10
Image errors : None
Conclusion
The results show that the generative AI model performed well at understanding the scene and matching the intended aesthetic, but struggled to reproduce the camera position described in the prompt. Here’s a breakdown:
- Camera Position: The model scored 0.4, which is below average. This suggests that the model didn’t accurately capture the intended camera position described in the prompt.
- Shot Analysis: The model scored 0.64, which is considered good. This indicates that the model was able to understand the scene and create a shot that was relatively close to what was described in the prompt.
- Aesthetic Analysis: The model scored 0.1, which is considered very good. Unlike the two scores above, this one reads as a distance, so lower is better: 0.1 means the generated image’s aesthetic was very close to the expected aesthetic described in the prompt.
Overall, the model seems to be better at understanding the scene and creating a visually appealing image than at accurately capturing the intended camera position.
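The post doesn’t derive the three breakdown scores from the per-image numbers in the gallery, nor say how they were computed. As a complementary view, the sketch below tabulates the per-image metrics exactly as reported above and prints each metric’s mean, minimum, and maximum. `RESULTS`, `METRICS`, and `summarize` are our own illustrative names.

```python
from statistics import mean

# Per-image metrics as reported in the gallery, in order of appearance:
# (title, Aesthetic Score, Entropy, Noise, Prompt Clip Score, Likelihood of AI)
RESULTS = [
    ("Triumphant Warrior", 0.7, 6.82, 87, 0.27, 0.70),
    ("Jungle Rhythms", 0.6, 6.67, 107, 0.32, 0.20),
    ("Lost in the Digital World", 0.7, 6.06, 75, 0.28, 0.30),
    ("Love in the Hustle", 0.7, 6.78, 107, 0.30, 0.10),
    ("Finding Peace in the Desert Sunset", 0.7, 6.58, 72, 0.30, 0.20),
    ("Rooftop Revelry", 0.7, 5.90, 97, 0.33, 0.10),
    ("A Shadow in the Alley", 0.7, 6.31, 89, 0.28, 0.20),
    ("Embracing the Summit", 0.7, 6.64, 94, 0.31, 0.10),
    ("Lost in Thought", 0.6, 5.90, 83, 0.28, 0.20),
    ("Silhouette of Serenity", 0.7, 6.35, 102, 0.31, 0.10),
]

METRICS = ["Aesthetic Score", "Entropy", "Noise", "Prompt Clip Score", "Likelihood of AI"]


def summarize(rows):
    """Map each metric name to (mean, min, max) across all rows."""
    summary = {}
    for column, name in enumerate(METRICS, start=1):  # column 0 is the title
        values = [row[column] for row in rows]
        summary[name] = (mean(values), min(values), max(values))
    return summary


for name, (avg, low, high) in summarize(RESULTS).items():
    print(f"{name}: mean={avg:.3f} min={low} max={high}")
```

On these ten images the means work out to about 0.68 for Aesthetic Score, 0.298 for Prompt Clip Score, and 0.22 for Likelihood of AI, with Triumphant Warrior (0.70) the only image above 0.30 on the last. The per-image Aesthetic Score is a different figure from the Aesthetic Analysis score of 0.1 in the breakdown; the post doesn’t state how the two relate.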