AI's Artistic Eye: Capturing the 'style-aesthetic' but Missing the Shot with Imagen-v3-fast
The ‘style-aesthetic’ is a powerful tool in visual storytelling that lets artists evoke specific emotions and atmospheres, often through dramatic lighting, bold colors, and striking compositions. In this post, we test how well a generative AI model, imagen-v3-fast, captures this aesthetic when translating scene descriptions into images. The ten examples below show the model’s strengths and weaknesses: it captures the desired aesthetic but is less reliable at following the requested camera position and shot composition. The case study offers a view of where AI stands in visual storytelling and of the gap that remains between human creative intent and machine output.
Created with: imagen-v3-fast
A Lone Warrior in the Setting Sun
A powerful warrior, clad in full armor, stands resolute in a vast, desolate desert landscape. The dramatic lighting of the setting sun casts long shadows, creating a sense of epic grandeur and isolation. This image evokes a mood of strength, determination, and vulnerability in the face of a harsh and unforgiving world.
Prompt
style-aesthetic Stylized: Epic and melancholic ; A lone warrior; wide shot; Heroism; A desolate battlefield with a setting sun; cinematic
Characteristic
Shot : A lone warrior in full armor stands in a vast desert landscape, illuminated by the setting sun. The warrior appears to be strong and resolute, with a determined expression on his face.
Aesthetic Score : 0.7
Mood : epic, dramatic, desolate
Quality
Entropy : 6.75
Noise : 60
Prompt Clip Score : 0.30
AI Evaluation
Likelihood of AI : 0.80
Image errors : The image is somewhat blurry, particularly around the edges of the warrior’s armor. The sand dunes in the background look somewhat artificial and lack detail. The lighting is overly dramatic, resulting in a slightly unrealistic effect.
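Most prompts in this post follow the same semicolon-separated pattern, which makes it easy to pull out the requested shot type and compare it with the image the model returned. Below is a minimal sketch of such a parser in Python. The field names (mood, subject, shot, theme, setting, style) and the `parse_prompt` helper are our own labels for illustration; they are not part of imagen-v3-fast or any official tooling.

```python
from dataclasses import dataclass


@dataclass
class ScenePrompt:
    # Hypothetical field names; the post itself does not name the prompt parts.
    mood: str
    subject: str
    shot: str
    theme: str
    setting: str
    style: str


def parse_prompt(prompt: str) -> ScenePrompt:
    """Split a 'style-aesthetic Stylized: a ; b; c; d; e; f' prompt into fields."""
    # Everything after the first colon is the semicolon-separated body.
    _, _, body = prompt.partition(":")
    fields = [part.strip() for part in body.split(";")]
    if len(fields) != 6:
        raise ValueError(f"expected 6 fields, got {len(fields)}")
    return ScenePrompt(*fields)


example = (
    "style-aesthetic Stylized: Epic and melancholic ; A lone warrior; wide shot; "
    "Heroism; A desolate battlefield with a setting sun; cinematic"
)
parsed = parse_prompt(example)
print(parsed.shot)
```

A prompt that folds the shot type into the subject sentence, like the wildflower example later in this post, has fewer fields and is rejected rather than guessed at.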
Unveiling the Secrets of a Cave: A Treasure Trove Awaits
Venture into a dark and mysterious cave where a treasure chest overflows with golden coins. This fantastical scene evokes a sense of wonder, excitement, and danger, promising an adventurous journey.
Prompt
style-aesthetic Stylized: Excitement and wonder ; A treasure chest overflowing with gold; close-up; Adventure; A dark and mysterious cave; cinematic
Characteristic
Shot : A treasure chest overflowing with gold coins, set against the backdrop of a dark, mysterious cave.
Aesthetic Score : 0.7
Mood : fantasy, adventurous, mysterious
Quality
Entropy : 6.62
Noise : 72
Prompt Clip Score : 0.27
AI Evaluation
Likelihood of AI : 0.90
Image errors : The image contains some minor artifacts, particularly in the gold coins, which appear slightly blurry.
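The post does not say how the Entropy figure is calculated. One common definition is the Shannon entropy of an image's grayscale histogram, and the sketch below assumes that definition; it may not match the tool that produced the numbers reported here. The function name `shannon_entropy` and the toy pixel lists are illustrative only.

```python
import math
from collections import Counter
from typing import Iterable


def shannon_entropy(gray_pixels: Iterable[int]) -> float:
    """Shannon entropy, in bits, of a sequence of grayscale pixel values."""
    counts = Counter(gray_pixels)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no pixels given")
    # Sum p * log2(1/p) over every gray level that actually occurs.
    return sum((n / total) * math.log2(total / n) for n in counts.values())


flat = [128] * 1024                # one gray level only: no variation
uniform = list(range(256)) * 4     # every 8-bit level equally often

print(shannon_entropy(flat))
print(shannon_entropy(uniform))
```

Under this assumed definition an 8-bit image cannot exceed 8 bits of entropy, so values between 6 and 7, like those reported in this post, would indicate a fairly wide spread of tones.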
The Shadow Warrior Awaits
A futuristic, armored figure emerges from the darkness, bathed in warm orange light. Their glowing red eyes hint at a power waiting to be unleashed. This intense and mysterious scene evokes a sense of danger and anticipation.
Prompt
style-aesthetic Stylized: Triumphant and futuristic ; A player’s avatar, a powerful warrior, standing triumphantly; medium shot; Gaming; A vibrant and futuristic cityscape; cinematic
Characteristic
Shot : A futuristic, armored character stands with a dark background, illuminated by warm orange light.
Aesthetic Score : 0.8
Mood : intense, mysterious, powerful
Quality
Entropy : 6.11
Noise : 56
Prompt Clip Score : 0.29
AI Evaluation
Likelihood of AI : 0.90
Image errors : The image is slightly blurry, especially in the background, and the armor details seem a bit flat.
Urban Oasis: A Moment of Calm Amidst the City’s Majesty
A wide shot captures the grandeur of a modern city, with towering skyscrapers casting long shadows over an empty street. The scene evokes a sense of calm and tranquility, offering a momentary escape from the bustling urban landscape.
Prompt
style-aesthetic Stylized: Energetic and lively ; A panoramic view of a bustling city; long shot; Tourism; A vibrant and colorful cityscape; cinematic
Characteristic
Shot : A wide shot of a city street with skyscrapers in the background. The street is empty except for a few cars.
Aesthetic Score : 0.7
Mood : calm, urban, modern
Quality
Entropy : 6.95
Noise : 68
Prompt Clip Score : 0.26
AI Evaluation
Likelihood of AI : 0.10
Image errors : No visible errors.
Silhouetted Against the Setting Sun: A Moment of Contemplation in the Desert
A lone figure finds solace and reflection as the sun dips below the horizon, casting long shadows across the vast expanse of sand. The scene evokes a sense of serenity, contemplation, and hope, with the dramatic silhouette highlighting the figure’s isolation and introspective state.
Prompt
style-aesthetic Stylized: Serene and contemplative ; A lone traveler gazing at a breathtaking sunset; medium shot; Travel; A vast desert landscape; cinematic
Characteristic
Shot : A lone figure sits on a sand dune in a desert landscape, watching the sunset over a vast expanse of sand.
Aesthetic Score : 0.7
Mood : serene, contemplative, hopeful
Quality
Entropy : 6.89
Noise : 56
Prompt Clip Score : 0.35
AI Evaluation
Likelihood of AI : 0.20
Image errors : No obvious artifacts or errors.
Love in Bloom: A Joyful Chase Through Wildflowers
Experience the delightful energy of a young couple lost in a world of their own, as they dash through a vibrant field of wildflowers. Their laughter and love-filled glances create a romantic and playful mood, while the surrounding trees add depth and drama to the scene.
Prompt
style-aesthetic Stylized: Joyful and carefree ; A medium shot of two friends, their laughter echoing through the park as they playfully chase each other through a field of wildflowers.; cinematic
Characteristic
Shot : A young couple is running through a field of wildflowers. The woman is on the left, and the man is on the right. They are both smiling and looking at each other. There are trees in the background.
Aesthetic Score : 0.7
Mood : happy, playful, romantic
Quality
Entropy : 6.78
Noise : 117
Prompt Clip Score : 0.34
AI Evaluation
Likelihood of AI : 0.10
Image errors : There are no noticeable artifacts or errors in the image.
Lost in the Storm’s Embrace
A solitary figure stands on a windswept cliff, gazing out at the turbulent ocean. The dramatic sky, heavy with brooding clouds, mirrors the melancholic mood of the scene, evoking a sense of isolation and contemplation.
Prompt
style-aesthetic Stylized: Dramatic and powerful ; A lone figure standing on a cliff overlooking a vast ocean; long shot; Heroism; A stormy sea with dramatic clouds; cinematic
Characteristic
Shot : A lone figure stands on a grassy cliff overlooking the vast ocean. The sky is filled with dramatic, brooding clouds.
Aesthetic Score : 0.7
Mood : melancholy, contemplative, dramatic
Quality
Entropy : 6.92
Noise : 76
Prompt Clip Score : 0.32
AI Evaluation
Likelihood of AI : 0.30
Image errors : Some minor noise in the image, possibly due to post-processing.
Unveiling the Secrets of the Past: A Vintage Map Beckons
A close-up of a weathered vintage map, adorned with pins marking forgotten destinations, sits on a wooden table bathed in soft, mysterious light. The shallow depth of field draws your eye to the map, hinting at untold stories and adventures waiting to be discovered.
Prompt
style-aesthetic Stylized: Intriguing and mysterious ; A map with pins marking locations of hidden treasures; close-up; Adventure; A dimly lit room with antique furniture; cinematic
Characteristic
Shot : A close-up of a vintage map with pins marking locations, sitting on a wooden table in a dimly lit room. The focus is on the map, with the surrounding objects slightly blurred.
Aesthetic Score : 0.7
Mood : mysterious, adventurous, nostalgic
Quality
Entropy : 6.65
Noise : 43
Prompt Clip Score : 0.32
AI Evaluation
Likelihood of AI : 0.20
Image errors : Slight noise in the darker areas, but otherwise the image is clean.
The Archer’s Focus: A Moment of Intensity in the Dark Forest
A male archer stands poised in a shadowy forest, his bow drawn and arrow aimed. The low light and his intense expression create a palpable sense of anticipation and mystery. This image captures a moment of focused determination, leaving the viewer wondering what lies ahead.
Prompt
style-aesthetic Stylized: Intense and focused ; A player’s character, a skilled archer, aiming at a target; close-up; Gaming; A dark and mysterious forest; cinematic
Characteristic
Shot : A male archer in a dark forest, aiming with his bow and arrow
Aesthetic Score : 0.7
Mood : intense, focused, mysterious
Quality
Entropy : 6.60
Noise : 56
Prompt Clip Score : 0.34
AI Evaluation
Likelihood of AI : 0.90
Image errors : No visible errors.
Friends Celebrate with Laughter and Champagne
A heartwarming scene of four friends enjoying a celebratory dinner at a restaurant, toasting with glasses of champagne. The warm lighting and balanced composition create a joyful and inviting atmosphere.
Prompt
style-aesthetic Stylized: Social and celebratory ; A group of friends enjoying a meal at a restaurant with a view; medium shot; Tourism; A bustling city street with vibrant lights; cinematic
Characteristic
Shot : Four friends are having dinner at a restaurant, toasting with glasses of champagne.
Aesthetic Score : 0.7
Mood : joyful, celebratory, friendly
Quality
Entropy : 6.66
Noise : 61
Prompt Clip Score : 0.32
AI Evaluation
Likelihood of AI : 0.20
Image errors : No noticeable errors in the image.
Conclusion
The results show that the generative AI model captured the requested aesthetic well, but struggled with camera position and shot composition.
Here’s a breakdown:
- Camera Position: The model scored 0.4, which is considered below average. This suggests that the generated images often did not use the camera position requested in the prompt.
- Shot Analysis: The model scored 0.495, which is also below average. This indicates that the model did not fully reproduce the scene described in the prompt or the expected shot composition.
- Aesthetic Analysis: The model scored 0.04, which is considered very good. This means that the generated images closely matched the desired aesthetic style. Since the higher scores above are rated below average, lower appears to be better on this scale.
Overall, the model seems to be better at capturing the desired aesthetic style than understanding the camera positions and shot composition. This suggests that the model might need further training to improve its ability to interpret and translate complex visual instructions.
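The per-image figures reported above can also be summarized in a few lines of Python. The sketch below copies the Aesthetic Score, Prompt Clip Score, and Likelihood of AI values from the ten entries (titles shortened) and averages them. This aggregation is only an illustration; it is not the method behind the Camera Position, Shot Analysis, or Aesthetic Analysis scores in the breakdown.

```python
from statistics import mean

# (title, aesthetic score, prompt CLIP score, likelihood of AI),
# copied from the ten entries in this post.
results = [
    ("A Lone Warrior in the Setting Sun", 0.7, 0.30, 0.80),
    ("Unveiling the Secrets of a Cave", 0.7, 0.27, 0.90),
    ("The Shadow Warrior Awaits", 0.8, 0.29, 0.90),
    ("Urban Oasis", 0.7, 0.26, 0.10),
    ("Silhouetted Against the Setting Sun", 0.7, 0.35, 0.20),
    ("Love in Bloom", 0.7, 0.34, 0.10),
    ("Lost in the Storm's Embrace", 0.7, 0.32, 0.30),
    ("Unveiling the Secrets of the Past", 0.7, 0.32, 0.20),
    ("The Archer's Focus", 0.7, 0.34, 0.90),
    ("Friends Celebrate with Laughter and Champagne", 0.7, 0.32, 0.20),
]

mean_aesthetic = mean(r[1] for r in results)
mean_clip = mean(r[2] for r in results)
mean_ai = mean(r[3] for r in results)
# Entry whose image matched its prompt least, by Prompt Clip Score.
lowest_clip = min(results, key=lambda r: r[2])

print(f"mean aesthetic score:    {mean_aesthetic:.2f}")
print(f"mean prompt CLIP score:  {mean_clip:.3f}")
print(f"mean likelihood of AI:   {mean_ai:.2f}")
print(f"lowest prompt CLIP score: {lowest_clip[0]} ({lowest_clip[2]:.2f})")
```

On these ten images the lowest Prompt Clip Score belongs to the city scene, whose calm, nearly empty street is also the clearest departure from its "energetic" and "bustling" prompt.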