AI's Artistic Journey: Capturing Scenes, Missing the Mood with Imagen-v3-fast
AI image generation is evolving rapidly, and current models can produce striking visuals from text prompts. Balancing technical accuracy with artistic expression, however, remains a challenge. This post reviews a recent experiment in which a generative AI model was asked to create images from prompts that specify a scene, a mood, and a camera position. The model shows a reasonable grasp of scene composition and, to a lesser degree, camera position, but it falls short of the intended aesthetic. The sections below walk through ten generated images with their prompts and evaluation scores, then summarize where the model does well and where it struggles.
Created with: imagen-v3-fast
Silhouetted Against the Setting Sun: A Warrior’s Epic Stand
A lone warrior, clad in dark armor, stands defiant against the backdrop of a fiery sunset in a desolate desert landscape. The dramatic silhouette evokes a sense of power and epic struggle, capturing a moment of intense drama.
Prompt
poses staggered-pose: Epic, determined ; A lone warrior; wide shot; Heroism; A desolate battlefield with a setting sun; cinematic
Characteristic
Shot : A lone warrior stands in a desert landscape with a sunset in the background. The warrior is clad in dark armor and looks imposing.
Aesthetic Score : 0.7
Mood : epic, dramatic, powerful
Quality
Entropy : 6.67
Noise : 69
Prompt Clip Score : 0.30
AI Evaluation
Likelihood of AI : 0.90
Image errors : Slight blurring in the background, the sand appears slightly artificial.
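The prompts in this post follow a loose pattern: a "poses staggered-pose:" prefix, a comma-separated list of moods, and then semicolon-separated fields such as subject, shot type, category, scene, and style. Not every prompt carries every field, so the sketch below keeps the remaining fields as an ordered list rather than naming them. The function name `parse_prompt` and this reading of the prompt layout are assumptions for illustration, not part of the experiment's tooling.

```python
PREFIX = "poses staggered-pose:"  # literal prefix seen on every prompt in this post


def parse_prompt(prompt: str) -> dict:
    """Split a prompt into its mood list and the remaining ordered fields."""
    body = prompt.strip()
    if body.startswith(PREFIX):
        body = body[len(PREFIX):]
    # Fields are separated by semicolons; drop empty pieces and stray spaces.
    parts = [p.strip() for p in body.split(";") if p.strip()]
    if not parts:
        return {"moods": [], "fields": []}
    # The first piece holds the moods, separated by commas.
    moods = [m.strip() for m in parts[0].split(",") if m.strip()]
    return {"moods": moods, "fields": parts[1:]}


warrior = parse_prompt(
    "poses staggered-pose: Epic, determined ; A lone warrior; wide shot; "
    "Heroism; A desolate battlefield with a setting sun; cinematic"
)
print(warrior["moods"])
print(warrior["fields"])
```

Splitting the prompt this way makes it easier to compare what was requested (for example "wide shot") against the Shot description the evaluation reports for each image.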
Unveiling the Secrets of the Jungle Temple
A group of intrepid adventurers ventures deep into a mysterious jungle, their path leading them towards an ancient Mayan temple. The lush greenery and low light create an atmosphere of suspense and adventure, drawing the viewer’s eye towards the unknown.
Prompt
poses staggered-pose: Curious, adventurous ; A group of explorers; medium shot; Adventure; A dense jungle with ancient ruins in the background; cinematic
Characteristic
Shot : A group of adventurers are walking through a jungle pathway towards an ancient Mayan temple.
Aesthetic Score : 0.7
Mood : mysterious, adventurous, suspenseful
Quality
Entropy : 6.56
Noise : 95
Prompt Clip Score : 0.28
AI Evaluation
Likelihood of AI : 0.80
Image errors : The image appears slightly over-saturated and has a slight digital graininess. There’s some slight blurring around the edges of the image.
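The post does not say how the Entropy value is computed. One common choice, assumed here, is the Shannon entropy of the grayscale intensity histogram. For an 8-bit image that measure runs from 0 bits (a single flat tone) to 8 bits (all 256 levels used equally), so the values of roughly 6.4 to 6.8 reported in this post would fit that scale. The sketch below works on a plain list of pixel intensities; the function name `shannon_entropy` is hypothetical.

```python
import math
from collections import Counter
from typing import Iterable


def shannon_entropy(pixels: Iterable[int]) -> float:
    """Shannon entropy, in bits, of a sequence of pixel intensities."""
    counts = Counter(pixels)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    entropy = 0.0
    for count in counts.values():
        p = count / total              # share of pixels at this intensity
        entropy += p * math.log2(1 / p)
    return entropy


flat = [128] * 1000            # one tone only: no variation
two_tone = [0, 255] * 500      # two tones in equal shares
full_range = list(range(256))  # every 8-bit level used once

print(shannon_entropy(flat))
print(shannon_entropy(two_tone))
print(shannon_entropy(full_range))
```

Under this assumption, a higher value means the image spreads its pixels across more tonal levels; it says nothing about whether the picture matches the prompt.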
Lost in the Code: A Moment of Intense Focus
A young man, shrouded in darkness, is completely absorbed in his work. The dim lighting and his serious expression create a palpable sense of concentration, highlighting the power of immersion in the digital world.
Prompt
poses staggered-pose: Focused, intense ; A gamer; close-up; Gaming; A brightly lit gaming setup with a monitor displaying a thrilling game; cinematic
Characteristic
Shot : A young man in a black hoodie and headphones is sitting in front of a computer, focused on his screen. The dimly lit room and his serious expression create a sense of immersion and focus.
Aesthetic Score : 0.6
Mood : focused, serious, techy
Quality
Entropy : 6.72
Noise : 48
Prompt Clip Score : 0.30
AI Evaluation
Likelihood of AI : 0.10
Image errors : No visible errors. The image appears to be clean and well-produced.
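Prompt Clip Score is not defined in the post. A common definition, assumed here, is the cosine similarity between the CLIP embedding of the prompt text and the CLIP embedding of the generated image. The sketch below shows only that similarity step, on small placeholder vectors; producing real embeddings requires a CLIP model and is left out. The function name and the example vectors are made up for illustration.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length, non-zero vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("vectors must be non-zero")
    return dot / (norm_a * norm_b)


# Placeholder embeddings; real CLIP vectors have hundreds of dimensions.
text_embedding = [0.2, 0.1, 0.7, 0.3]
image_embedding = [0.6, 0.4, 0.1, 0.5]

print(round(cosine_similarity(text_embedding, image_embedding), 3))
```

If the scores in this post are raw cosine similarities, the narrow spread between 0.25 and 0.33 would mean that small differences matter more than the absolute values suggest.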
A Moment of Serenity Amidst Majestic Peaks
A lone figure stands dwarfed by the grandeur of a snow-capped mountain range, the vibrant blue sky and subtle clouds creating a sense of calm and awe. This image captures the vastness of nature and the human experience, leaving you feeling contemplative and inspired.
Prompt
poses staggered-pose: Awe, contemplation ; A lone figure stands silhouetted against the vast, snow-capped mountain range, the sky a vibrant blue.; cinematic
Characteristic
Shot : A lone figure stands in the foreground of a majestic mountain range with a snowy peak dominating the center. The sky is a vibrant blue with subtle cloud formations, giving the image a sense of calm and vastness.
Aesthetic Score : 0.8
Mood : serene, awe-inspiring, contemplative
Quality
Entropy : 6.79
Noise : 55
Prompt Clip Score : 0.31
AI Evaluation
Likelihood of AI : 0.90
Image errors : The edges of the mountain peaks and the figure exhibit a slight pixelation, which is more noticeable on closer inspection.
A Lone Hiker Embarks on a Serene Mountain Journey
A solitary figure traverses a winding mountain road, their journey leading towards a distant village. The breathtaking scenery, with lush green slopes and a clear blue sky, evokes a sense of adventure and hope. The long shot emphasizes the vastness of the landscape and the smallness of the hiker, creating a dramatic effect that captures the beauty and tranquility of the moment.
Prompt
poses staggered-pose: Free-spirited, adventurous ; A backpacker; long shot; Travel; A winding road leading to a distant village nestled in a valley; cinematic
Characteristic
Shot : A lone hiker walks down a winding road in a mountain valley. The mountains are covered in green grass and the sky is blue. The road leads to a small village in the distance.
Aesthetic Score : 0.8
Mood : serene, adventurous, hopeful
Quality
Entropy : 6.75
Noise : 84
Prompt Clip Score : 0.28
AI Evaluation
Likelihood of AI : 0.70
Image errors : The image shows some signs of being AI-generated. The mountains and clouds lack natural texture and the hiker’s figure is somewhat distorted.
Red Curtain Backdrop Adds Drama to Fun Group Photo
A group of six young adults strike a casual pose in front of a striking red curtain backdrop. The contrast between the formal setting and their relaxed demeanor creates a playful and festive mood.
Prompt
poses staggered-pose: Energetic, celebratory ; A group of friends; medium shot; Groups; A lively party scene with people dancing and laughing; cinematic
Characteristic
Shot : A group of six young adults are standing together in a room with a red curtain backdrop and a window behind them. The group is looking at the camera, with some smiling.
Aesthetic Score : 0.6
Mood : casual, festive, fun
Quality
Entropy : 6.42
Noise : 87
Prompt Clip Score : 0.25
AI Evaluation
Likelihood of AI : 0.20
Image errors : The lighting is a bit flat and there is some noise in the image, especially in the shadows. The composition is a bit uneven and the group is not centered.
City’s Guardian at Sunset
A superhero, clad in blue and gold, stands tall against a breathtaking sunset cityscape. His determined expression and powerful pose radiate heroism and strength, promising a thrilling battle against the forces of darkness.
Prompt
poses staggered-pose: Powerful, confident ; A superhero; close-up; Heroism; A cityscape with towering skyscrapers and a dramatic sky; cinematic
Characteristic
Shot : A superhero in a blue and gold costume stands in front of a city skyline at sunset. He is wearing a mask, and he has a serious expression on his face. The image has a strong, heroic, and dynamic feel.
Aesthetic Score : 0.7
Mood : powerful, heroic, determined
Quality
Entropy : 6.65
Noise : 75
Prompt Clip Score : 0.28
AI Evaluation
Likelihood of AI : 1.00
Image errors : The image appears to be digitally rendered, and while the art style is decent, there are some minor artifacts on the superhero’s costume and the background. The lines are a little too sharp and the textures lack depth.
A Journey into the Unknown: Five Figures Walk Towards the Setting Sun
A group of five figures, clad in futuristic attire, traverse a desolate desert landscape. Monument Valley-like rock formations loom in the distance, while a vibrant sunset paints the sky. The scene evokes a sense of mystery, hope, and adventure, leaving viewers to ponder the figures’ destination and the secrets held within the vast, unknown expanse.
Prompt
poses staggered-pose: Hopeful, determined ; A group of adventurers; wide shot; Adventure; A vast desert landscape with a lone oasis in the distance; cinematic
Characteristic
Shot : Five figures in futuristic/post-apocalyptic clothing walk away from the viewer across a desert landscape. Monument Valley-like rock formations rise in the background, and a setting sun fills the sky.
Aesthetic Score : 0.7
Mood : mysterious, hopeful, adventurous
Quality
Entropy : 6.80
Noise : 71
Prompt Clip Score : 0.32
AI Evaluation
Likelihood of AI : 0.90
Image errors : No noticeable artifacts or errors, but the color grading is somewhat flat and could be more dynamic.
Focused Determination: A Young Man Immersed in Work
A young man sits intently at his computer, headphones on, fingers flying across the keyboard. The lighting highlights his concentration, while a blurred figure in the background suggests a shared workspace or a sense of community. The mood is one of serious focus and determination, capturing the essence of a dedicated individual pursuing their goals.
Prompt
poses staggered-pose: Focused, strategic ; A gamer; close-up; Gaming; A dimly lit room with a computer screen displaying a complex strategy game; cinematic
Characteristic
Shot : A young man is sitting in front of a computer wearing headphones, focused and typing on the keyboard. There is another person in the background, also wearing headphones, but out of focus.
Aesthetic Score : 0.7
Mood : serious, focused, determined
Quality
Entropy : 6.36
Noise : 50
Prompt Clip Score : 0.28
AI Evaluation
Likelihood of AI : 0.20
Image errors : Slight noise in the background and some blur in the out-of-focus person.
Sunset Romance on the Beach
A couple embraces on a golden sunset beach, their love story unfolding in the warm glow. The scene is filled with romantic intimacy and serene beauty.
Prompt
poses staggered-pose: Romantic, peaceful ; medium shot; Travel; A romantic sunset over a beach with the ocean waves crashing in the background; cinematic
Characteristic
Shot : A couple is standing on a beach at sunset, embracing and looking at each other.
Aesthetic Score : 0.8
Mood : romantic, intimate, serene
Quality
Entropy : 6.76
Noise : 81
Prompt Clip Score : 0.33
AI Evaluation
Likelihood of AI : 0.20
Image errors : No significant artifacts or errors visible.
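The ten per-image records above can also be read as a small table. As a rough sketch, the snippet below copies the Aesthetic Score, Prompt Clip Score, and Likelihood of AI values from each record and averages them. The short labels and variable names are introduced here for illustration, and the 0.5 cutoff for "likely AI" is an assumption, not a threshold from the experiment.

```python
from statistics import mean

# (label, aesthetic score, prompt CLIP score, likelihood of AI), in post order
records = [
    ("warrior", 0.7, 0.30, 0.90),
    ("jungle temple", 0.7, 0.28, 0.80),
    ("lost in the code", 0.6, 0.30, 0.10),
    ("majestic peaks", 0.8, 0.31, 0.90),
    ("lone hiker", 0.8, 0.28, 0.70),
    ("red curtain", 0.6, 0.25, 0.20),
    ("city guardian", 0.7, 0.28, 1.00),
    ("five figures", 0.7, 0.32, 0.90),
    ("focused determination", 0.7, 0.28, 0.20),
    ("sunset romance", 0.8, 0.33, 0.20),
]

mean_aesthetic = mean(r[1] for r in records)
mean_clip = mean(r[2] for r in records)
mean_ai = mean(r[3] for r in records)

# Assumed cutoff: treat 0.5 or higher as "likely judged AI-generated".
likely_ai = [r[0] for r in records if r[3] >= 0.5]

print(f"mean aesthetic score:  {mean_aesthetic:.2f}")
print(f"mean prompt CLIP score: {mean_clip:.3f}")
print(f"mean likelihood of AI:  {mean_ai:.2f}")
print(f"rated 0.5 or higher for AI likelihood: {len(likely_ai)} of {len(records)}")
```

For these records the mean aesthetic score is 0.71, the mean prompt CLIP score is about 0.29, and six of the ten images are rated 0.5 or higher for AI likelihood. The four images rated 0.20 or lower are the two computer scenes, the group photo, and the beach couple.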
Conclusion
The results show that the generative AI model understood the scenes described in the prompts reasonably well, was less reliable on camera position, and struggled most with the aesthetic aspect. Here’s a breakdown:
- Camera Position: The model scored 0.41, which is below the “good” range of 0.5 to 0.75. This suggests that the model didn’t perfectly capture the intended camera position described in the prompt.
- Shot Analysis: The model scored 0.62, which falls within the “good” range. This indicates that the model was able to understand the scene described in the prompt and create an image that reflects it reasonably well.
- Aesthetic Analysis: The model scored 0.04. The evaluation treats this as far from the “very good” range, given as -0.2 to 0.1, and reads it as a sign that the aesthetic of the generated images deviated significantly from what the prompts called for.
Overall, the model shows promise in understanding scene composition, is not yet consistent on camera positioning, and needs the most improvement in generating images that match the desired aesthetic.
Sources:
- https://www.writerswrite.co.za/cheat-sheets-for-writing-body-language/
- https://mads3df.wordpress.com/2013/09/04/storytelling-poses/
- https://www.pinterest.com/pegasister890/character-poses/
- https://www.youtube.com/watch?v=udky6ANxWws
- https://maven.com/articles/storytelling-techniques
- https://deepmind.google/technologies/imagen-3/