AI's Artistic Struggle: Capturing the Essence of Cinematic Aesthetics with Imagen-v2
AI image generation is evolving rapidly, and current models can produce striking, realistic images. One area where they still struggle is capturing the essence of a specific cinematic aesthetic. This blog post explores a case study in which Imagen-v2 was given ten prompts, each pairing a detailed scene description with a desired aesthetic, here French New Wave. The results highlight the model’s strengths and weaknesses in understanding and replicating that visual style. We’ll look at the specific challenges the model faced, analyze the reasons behind its performance, and discuss the potential for future advancements in this field.
Created with: imagen-v2
A Lone Figure Contemplates the Vastness of the Desert
A solitary figure stands at the peak of a sand dune, silhouetted against the setting sun. The vast desert landscape stretches out before them, evoking a sense of isolation and contemplation. The warm glow of the sunset and the fluffy clouds in the sky offer a glimmer of hope and possibility.
Prompt
French New Wave: epic, melancholic ; A lone figure, silhouetted against a setting sun; long shot; heroism; a vast, empty desert landscape; cinematic
Characteristic
Shot : A lone figure walks through a vast, red desert under a sunset sky.
Aesthetic Score : 0.7
Mood : solitude, adventure, mystery
Quality
Entropy : 6.63
Noise : 68
Prompt Clip Score : 0.27
AI Evaluation
Likelihood of AI : 0.80
Image errors : The clouds have a painted, almost artificial look, and the sand dunes appear a little too smooth. There are some slight blurring issues, especially around the figure.
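The Entropy figure in each Quality block is not defined in the post. One common reading, assumed here, is the Shannon entropy of the image’s grayscale histogram, which cannot exceed 8 bits for an 8-bit image; the values reported in this post (roughly 6.3 to 6.7) would then point to a fairly wide tonal spread. A minimal sketch under that assumption, using synthetic arrays rather than the generated images:

```python
import numpy as np


def shannon_entropy(gray: np.ndarray) -> float:
    """Shannon entropy, in bits, of an 8-bit grayscale image's histogram."""
    # Count how many pixels fall on each of the 256 intensity levels.
    counts = np.bincount(gray.ravel(), minlength=256)
    probs = counts / counts.sum()
    probs = probs[probs > 0]  # empty bins contribute nothing to the sum
    # H = sum over levels of p * log2(1 / p)
    return float((probs * np.log2(1.0 / probs)).sum())


# Synthetic stand-ins (hypothetical data, not the images from this post).
flat = np.full((64, 64), 128, dtype=np.uint8)  # one single intensity
rng = np.random.default_rng(0)
noisy = rng.integers(0, 256, size=(64, 64), dtype=np.uint8)  # near-uniform histogram

print(shannon_entropy(flat))   # no tonal variation, so zero entropy
print(shannon_entropy(noisy))  # close to, but never above, 8 bits
```

Whether the tool behind this post computes entropy this way is not stated, so the numbers are best compared only with each other.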
A Hand Points to the Unknown, Bathed in Warm Light
A vintage scene filled with mystery and suspense. A hand, illuminated by a warm lamp, points to a map or document, leaving the details shrouded in shadow. A camera in the foreground, out of focus, adds to the intrigue and anticipation of the moment.
Prompt
French New Wave: intriguing, suspenseful ; A close-up of a weathered map, with a finger tracing a route; medium shot; adventure; a cluttered, dimly lit room; cinematic
Characteristic
Shot : A person’s hand points to a map on a table with a lamp in the background and a camera in the foreground.
Aesthetic Score : 0.7
Mood : mysterious, suspenseful, intrigue
Quality
Entropy : 6.58
Noise : 112
Prompt Clip Score : 0.25
AI Evaluation
Likelihood of AI : 0.10
Image errors : There are no visible artifacts or errors in the image.
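The post does not say how the Prompt Clip Score is computed. Scores of this kind are usually the cosine similarity between a CLIP embedding of the prompt and a CLIP embedding of the image, and the sketch below assumes that reading. The vectors are hypothetical toy values, not real CLIP embeddings, and no CLIP model is loaded:

```python
import numpy as np


def cosine_similarity(a, b) -> float:
    """Cosine of the angle between two embedding vectors."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


# Hypothetical 4-dimensional "embeddings" for illustration only.
prompt_vec = [0.2, 0.9, 0.1, 0.4]
image_vec = [0.8, 0.1, 0.5, 0.3]

score = cosine_similarity(prompt_vec, image_vec)
print(f"prompt/image similarity: {score:.2f}")
```

Under this reading, a higher score means the image embedding points in a direction closer to the prompt embedding. The score measures agreement with the prompt text as a whole and says nothing specific about style.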
The Focus is on the Game
A hand grips a video game controller, the screen behind a blur of intense action. The lighting and selective focus create a sense of drama and excitement, capturing the player’s focused energy.
Prompt
French New Wave: intense, energetic ; A hand holding a joystick, fingers moving rapidly; close-up; gaming; a neon-lit arcade with flashing screens; cinematic
Characteristic
Shot : A hand holds a black gaming controller with red buttons; the background is blurry and has a purplish hue.
Aesthetic Score : 0.6
Mood : intense, focused, playful
Quality
Entropy : 6.71
Noise : 95
Prompt Clip Score : 0.28
AI Evaluation
Likelihood of AI : 0.20
Image errors : The image appears to be slightly overexposed, causing some of the details to be lost. The lighting is also uneven, making the image look a bit unnatural.
Parisian Dreams: A Moment of Wonder at the Eiffel Tower
A young woman, lost in thought, gazes up at the iconic Eiffel Tower in Paris. Her red beret and brown coat add a touch of vintage charm, while the blurred background evokes a sense of romantic nostalgia. This dreamy scene captures the essence of Parisian wanderlust and the longing for something more.
Prompt
French New Wave: romantic, nostalgic ; A young woman, her face filled with wonder, gazing at the Eiffel Tower; medium shot; tourism; a bustling Parisian street; cinematic
Characteristic
Shot : A young woman in a red beret and a brown coat stands in a Parisian street with the Eiffel Tower in the background. She is looking up and the scene is slightly out of focus, creating a dreamy atmosphere.
Aesthetic Score : 0.8
Mood : dreamy, romantic, nostalgic
Quality
Entropy : 6.64
Noise : 48
Prompt Clip Score : 0.32
AI Evaluation
Likelihood of AI : 0.80
Image errors : The skin tones are slightly unrealistic, especially around the eyes and lips. There is a slight halo effect around the woman’s head.
Lost in the Golden Field
A solitary figure stands amidst a sea of yellow grass, the train tracks stretching into the distance. The scene evokes a sense of solitude, contemplation, and nostalgia, capturing the feeling of being lost in thought and surrounded by the vastness of nature.
Prompt
French New Wave: reflective, contemplative ; A train speeding through a countryside landscape, with a lone figure looking out the window; long shot; travel; a vibrant, sun-drenched field; cinematic
Characteristic
Shot : A solitary figure stands on a dirt path beside a railroad track, looking out over vast fields of golden wheat.
Aesthetic Score : 0.7
Mood : melancholy, contemplative, serene
Quality
Entropy : 6.74
Noise : 107
Prompt Clip Score : 0.31
AI Evaluation
Likelihood of AI : 0.80
Image errors : Some of the details in the image are blurry, particularly the figure and the background.
A Cozy Kitchen Gathering
Three friends share a meal in a warm and inviting kitchen, bathed in natural light. The scene exudes a sense of peace and tranquility, captured in the soft colors and intimate setting.
Prompt
French New Wave: intimate, heartwarming ; A family gathered around a table, sharing a meal, with laughter and conversation; medium shot; family; a warm, inviting kitchen; cinematic
Characteristic
Shot : A family is gathered around a dining table in a kitchen, enjoying a meal. The warm lighting and the food on the table create a cozy and inviting atmosphere.
Aesthetic Score : 0.7
Mood : warm, cozy, intimate
Quality
Entropy : 6.59
Noise : 79
Prompt Clip Score : 0.31
AI Evaluation
Likelihood of AI : 0.80
Image errors : The image has a slightly blurry appearance, especially in the background. The colors are slightly muted and there is some noise in the image.
A Race Against Time: A Man’s Desperate Flight Through a Crowded Street
This black and white photograph captures a moment of intense drama as a man in a long coat races through a bustling cobblestone street. The blurred background and the subject’s dynamic pose create a sense of urgency and suspense, hinting at a thrilling chase unfolding before our eyes.
Prompt
French New Wave: urgent, dramatic ; A young man, his face etched with determination, running through a crowded marketplace; medium shot; heroism; a chaotic, bustling market; cinematic
Characteristic
Shot : A man runs through a crowded street in what appears to be a medieval city. The image is in black and white.
Aesthetic Score : 0.6
Mood : intense, dramatic, suspenseful
Quality
Entropy : 6.72
Noise : 98
Prompt Clip Score : 0.31
AI Evaluation
Likelihood of AI : 0.70
Image errors : The image has some noise and graininess, but it is not overly distracting. There are also some artifacts around the edges of the man’s hair and clothing.
Lost in Time: A Vintage Compass Beckons
A close-up shot of an antique brass compass, its needle frozen in time against a dark, mysterious backdrop. The sharp focus and vintage aesthetic evoke a sense of contemplation and intrigue, inviting you to explore the secrets it holds.
Prompt
French New Wave: mysterious, suspenseful ; A close-up of a compass needle spinning, pointing towards an unknown destination; close-up; adventure; a dimly lit, mysterious room; cinematic
Characteristic
Shot : A close-up of an antique compass with a worn and aged look, sitting on a dark surface.
Aesthetic Score : 0.7
Mood : mysterious, vintage, timeless
Quality
Entropy : 6.38
Noise : 109
Prompt Clip Score : 0.30
AI Evaluation
Likelihood of AI : 0.30
Image errors : The image has a slight graininess and a vintage filter applied, both of which can be considered stylistic choices. There are no major errors.
What Lies in the Shadows? Three Young People Stare into the Unknown
A trio of teenagers, caught in a dimly lit scene with blue and green hues, fix their gazes on something unseen. Their intense focus and the play of shadows create a palpable sense of mystery and anticipation. What secrets are they about to uncover?
Prompt
French New Wave: intense, focused ; A group of friends huddled around a computer screen, their faces illuminated by the glow; medium shot; gaming; a dimly lit, cluttered room; cinematic
Characteristic
Shot : Three teenagers, two boys and a girl, are huddled together, focused on something out of frame. The lighting is dramatic and the mood is intense.
Aesthetic Score : 0.7
Mood : intense, suspenseful, dramatic
Quality
Entropy : 6.31
Noise : 72
Prompt Clip Score : 0.31
AI Evaluation
Likelihood of AI : 0.20
Image errors : The image has a slight amount of graininess and blur, which may be intended for a stylistic effect. The colors are slightly oversaturated, and the contrast is a bit too high.
Sunset Stroll: A Romantic Moment in a European Town
A couple walks hand-in-hand down a cobblestone street, bathed in the warm glow of a setting sun. The backlighting creates a sense of mystery and intrigue, while the peaceful atmosphere evokes a feeling of nostalgia and romance.
Prompt
French New Wave: romantic, nostalgic ; A couple walking hand-in-hand along a cobblestone street, their silhouettes framed by the setting sun; long shot; tourism; a romantic, picturesque town; cinematic
Characteristic
Shot : A couple walks hand-in-hand down a cobblestone street in a European village, with the setting sun casting long shadows.
Aesthetic Score : 0.7
Mood : romantic, nostalgic, warm
Quality
Entropy : 6.53
Noise : 94
Prompt Clip Score : 0.34
AI Evaluation
Likelihood of AI : 0.20
Image errors : No notable artifacts or errors.
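The per-image figures above are easier to compare once they are averaged. The sketch below copies the Aesthetic Score, Prompt Clip Score and Likelihood of AI values from the ten entries and summarizes them; the short labels and the 0.5 cutoff for "likely AI" are assumptions made for illustration, not part of the original evaluation:

```python
from statistics import mean

# (label, aesthetic score, prompt CLIP score, likelihood of AI),
# copied from the ten entries in this post. Labels are shorthand.
images = [
    ("desert", 0.7, 0.27, 0.80),
    ("map", 0.7, 0.25, 0.10),
    ("controller", 0.6, 0.28, 0.20),
    ("Eiffel Tower", 0.8, 0.32, 0.80),
    ("golden field", 0.7, 0.31, 0.80),
    ("kitchen", 0.7, 0.31, 0.80),
    ("chase", 0.6, 0.31, 0.70),
    ("compass", 0.7, 0.30, 0.30),
    ("teenagers", 0.7, 0.31, 0.20),
    ("sunset stroll", 0.7, 0.34, 0.20),
]

mean_aesthetic = round(mean(row[1] for row in images), 2)
mean_clip = round(mean(row[2] for row in images), 2)
mean_ai = round(mean(row[3] for row in images), 2)

# Assumed cutoff: treat a likelihood of 0.5 or more as "flagged as AI".
flagged = [label for label, _, _, p_ai in images if p_ai >= 0.5]

print(f"mean aesthetic score:  {mean_aesthetic:.2f}")
print(f"mean prompt CLIP score: {mean_clip:.2f}")
print(f"mean likelihood of AI:  {mean_ai:.2f}")
print(f"flagged as likely AI:   {len(flagged)} of {len(images)}")
```

With the figures listed in this post, the means come out to 0.69 (aesthetic), 0.30 (prompt CLIP) and 0.49 (likelihood of AI), and five of the ten images sit at or above the assumed 0.5 cutoff.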
Conclusion
The results indicate that the generative AI model understood the scene reasonably well, but fell short on camera position and on the desired aesthetic. Here’s a breakdown:
- Camera Position: The model scored 0.4, which is below the “good” range of 0.5 to 0.75. This suggests that the model didn’t fully capture the intended camera position described in the prompt.
- Shot Analysis: The model scored 0.57, which falls within the “good” range. This indicates that the model was able to understand the scene and create a shot that was generally consistent with the prompt.
- Aesthetic Analysis: The model scored 0.09. Numerically this sits just inside the upper bound of the “very good” range of -0.2 to 0.1, not above it, so the score alone does not show a large deviation. The evaluation nonetheless judged that the generated images’ aesthetic differed from the aesthetic described in the prompt.
Overall, the model shows promise in understanding the scene, but needs improvement in matching the requested camera position and the desired aesthetic.
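The three scores in the breakdown can be checked against their quoted ranges mechanically. The sketch below does that with a small helper. The `Band` class and `position` function are hypothetical names introduced here, and the Shot Analysis range is assumed to be the same 0.5 to 0.75 “good” range quoted for Camera Position, since the post does not restate it:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Band:
    """A labeled, inclusive score range such as "good": 0.5 to 0.75."""
    label: str
    low: float
    high: float


def position(score: float, band: Band) -> str:
    """Say whether a score is below, inside, or above a band."""
    if score < band.low:
        return "below"
    if score > band.high:
        return "above"
    return "inside"


GOOD = Band("good", 0.5, 0.75)
VERY_GOOD = Band("very good", -0.2, 0.1)

# Scores and ranges as quoted in the conclusion of this post.
results = {
    "Camera Position": (0.40, GOOD),
    "Shot Analysis": (0.57, GOOD),  # range assumed, see note above
    "Aesthetic Analysis": (0.09, VERY_GOOD),
}

for name, (score, band) in results.items():
    where = position(score, band)
    print(f"{name}: {score} is {where} the '{band.label}' range "
          f"[{band.low}, {band.high}]")
```

Run as written, the check places 0.40 below its range and both 0.57 and 0.09 inside theirs, so any reading of the aesthetic result has to rest on more than the raw score.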
Sources:
- https://heartofnoir.com/knowing-noir/aesthetic-of-noir/
- https://www.yellowbrick.co/blog/film/maximizing-the-visual-impact-unveiling-the-art-of-film-aesthetics
- https://www.questjournals.org/jrhss/papers/vol10-issue8/1008255260.pdf
- https://www.jstor.org/stable/3331672
- https://www.cinepoetics.fu-berlin.de/activities/workshops/2020-12-ws/index.html
- https://resource.download.wjec.co.uk/vtc/2016-17/16-17_1-22/eng/Part%201%20What%20is%20Aesthetics.pdf
- https://deepmind.google/technologies/imagen-2/