AI's Artistic Eye: Capturing the Essence of a Scene, But Missing the Aesthetic with Imagen-v3
9-minute read, 1,737 words
Generative AI is evolving rapidly, and today's models can create striking images from simple text prompts. Yet while these models excel at the technical side of a scene, such as camera position and shot composition, they often fall short of the requested aesthetic style. This post explores those strengths and weaknesses through ten images generated with Imagen-v3 from prompts that all ask for a French New Wave aesthetic, illustrating the challenges and opportunities that lie ahead.
Created with: imagen-v3
Silhouettes of Hope in the Desert Sunset
A solitary figure walks into the vastness of the desert as the sun sets, casting a long shadow that speaks of both melancholy and a glimmer of hope. The dramatic silhouette against the fiery sky evokes a sense of mystery and isolation, leaving the viewer to ponder the journey ahead.
Prompt
style-aesthetic French New Wave: epic, melancholic ; A lone figure, silhouetted against a setting sun; long shot; heroism; a vast, empty desert landscape; cinematic
Characteristic
Shot : A lone figure walks away from the camera into the desert at sunset.
Aesthetic Score : 0.8
Mood : melancholy, desolate, hopeful
Quality
Entropy : 5.60
Noise : 74
Prompt Clip Score : 0.32
AI Evaluation
Likelihood of AI : 0.10
Image errors : None
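Every prompt in this post follows the same layout: a "style-aesthetic" clause with mood words, closed by " ; ", then subject, shot type, theme, setting and the keyword "cinematic", separated by "; ". The Python sketch below builds and splits that layout. The slot names (subject, shot, theme, setting) are hypothetical labels inferred from the prompts themselves, not fields of any Imagen API.

```python
def build_prompt(style, moods, subject, shot, theme, setting):
    """Assemble a prompt in the layout used throughout this post.

    Slot names are hypothetical labels inferred from the example prompts.
    """
    mood_text = ", ".join(moods)
    # The mood list is closed with " ; " (space before the semicolon, as in
    # the original prompts); the remaining slots are joined with "; ".
    slots = [subject, shot, theme, setting, "cinematic"]
    return f"style-aesthetic {style}: {mood_text} ; " + "; ".join(slots)


def parse_prompt(prompt):
    """Split a prompt in the same layout back into a dict of slots.

    Assumes no slot text itself contains "; " (requires Python 3.9+).
    """
    head, rest = prompt.split(" ; ", 1)
    style_part, mood_text = head.split(": ", 1)
    subject, shot, theme, setting, keyword = rest.split("; ")
    return {
        "style": style_part.removeprefix("style-aesthetic "),
        "moods": mood_text.split(", "),
        "subject": subject,
        "shot": shot,
        "theme": theme,
        "setting": setting,
        "keyword": keyword,
    }


# Rebuild the prompt behind "Silhouettes of Hope in the Desert Sunset".
prompt = build_prompt(
    "French New Wave",
    ["epic", "melancholic"],
    "A lone figure, silhouetted against a setting sun",
    "long shot",
    "heroism",
    "a vast, empty desert landscape",
)
print(prompt)
print(parse_prompt(prompt)["shot"])  # long shot
```

Keeping the slots separate makes it easy to vary one element, such as the shot type, while holding the style clause fixed.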
A Hand Points the Way: Unraveling a Vintage Mystery
A single hand, illuminated by a soft lamplight, points towards a map spread across a table. The vintage setting, complete with an alarm clock and a flickering candle, adds to the air of mystery and adventure. What secrets lie hidden within this map, and where will this journey lead?
Prompt
style-aesthetic French New Wave: intriguing, suspenseful ; A close-up of a weathered map, with a finger tracing a route; medium shot; adventure; a cluttered, dimly lit room; cinematic
Characteristic
Shot : A hand is pointing at a map on a table with a lamp, an alarm clock, and a candle in the background.
Aesthetic Score : 0.7
Mood : mysterious, vintage, adventurous
Quality
Entropy : 6.21
Noise : 68
Prompt Clip Score : 0.26
AI Evaluation
Likelihood of AI : 0.30
Image errors : There is a slight blur in the background, indicating possible limitations in the camera or lighting.
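Each card also reports an Entropy value (6.21 for the map image above), but the post does not say how it is computed. One common definition, assumed here, is the Shannon entropy of the image's 8-bit grayscale histogram, measured in bits; it ranges from 0 for a flat image to 8 when all 256 grey levels occur equally often, so the reported 5.60 to 6.74 would sit in the upper-middle of that range. With Pillow, `list(Image.open(path).convert("L").getdata())` would supply such a pixel sequence.

```python
import math
from collections import Counter


def shannon_entropy(pixels):
    """Shannon entropy, in bits, of a sequence of 8-bit grayscale values.

    Assumption: one common definition of image entropy; the post does not
    state how its Entropy figure was produced.
    """
    total = len(pixels)
    counts = Counter(pixels)  # histogram: grey level -> number of pixels
    entropy = 0.0
    for count in counts.values():
        p = count / total
        entropy -= p * math.log2(p)
    return entropy


flat = [128] * 1000          # a single grey level carries no information
gradient = list(range(256))  # every grey level occurs equally often
print(shannon_entropy(flat))      # 0.0
print(shannon_entropy(gradient))  # 8.0
```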
The Joy of the Joystick: A Nostalgic Close-Up
A low-angle shot captures the essence of retro gaming with a close-up of a hand on an arcade joystick. The blurred background adds a sense of depth and intrigue, transporting you back to a time of pixelated adventures and endless quarters.
Prompt
style-aesthetic French New Wave: intense, energetic ; A hand holding a joystick, fingers moving rapidly; close-up; gaming; a neon-lit arcade with flashing screens; cinematic
Characteristic
Shot : A close-up of a person’s hand on an arcade game joystick. The image is shot from a low angle, and the joystick is in focus while the rest of the scene is blurred.
Aesthetic Score : 0.7
Mood : retro, nostalgic, playful
Quality
Entropy : 6.02
Noise : 79
Prompt Clip Score : 0.31
AI Evaluation
Likelihood of AI : 0.20
Image errors : No visible errors in this image
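The Prompt Clip Score (0.31 for the joystick image) is likewise not defined in the post. A common approach, and the assumption in this sketch, is to embed the image and the prompt with a CLIP model's image and text encoders and take the cosine similarity of the two vectors. The vectors below are made-up toy values standing in for real embeddings.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length embedding vectors."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("embeddings must be non-zero")
    return dot / (norm_a * norm_b)


# Hypothetical toy 4-dimensional vectors; real CLIP embeddings have
# hundreds of dimensions and come from the model's encoders.
image_embedding = [0.12, 0.80, 0.31, 0.05]
text_embedding = [0.40, 0.35, 0.70, 0.10]
print(round(cosine_similarity(image_embedding, text_embedding), 3))
```

A score of 1 would mean the two embeddings point in exactly the same direction; values near 0 mean the image and prompt share little in the embedding space.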
Parisian Dreams: A Moment of Joy at the Eiffel Tower
A young woman captures the magic of Paris, her smile reflecting the wonder of the Eiffel Tower. The blurred background emphasizes the focus on her happiness and the iconic landmark, creating a romantic and nostalgic atmosphere.
Prompt
style-aesthetic French New Wave: romantic, nostalgic ; A young woman, her face filled with wonder, gazing at the Eiffel Tower; medium shot; tourism; a bustling Parisian street; cinematic
Characteristic
Shot : A young woman stands in front of the Eiffel Tower, looking up at it with a smile. The background is blurred, and the focus is on the woman’s face and the iconic tower.
Aesthetic Score : 0.7
Mood : happy, romantic, nostalgic
Quality
Entropy : 6.74
Noise : 81
Prompt Clip Score : 0.30
AI Evaluation
Likelihood of AI : 0.20
Image errors : There are no visible artifacts or errors in the image.
Chasing the Sunset, One Yellow Field at a Time
A tranquil train journey unfolds, with a field of vibrant yellow flowers blurring past the window. The nostalgic feeling of adventure is amplified by the dynamic motion blur, capturing the fleeting beauty of the moment.
Prompt
style-aesthetic French New Wave: reflective, contemplative ; A train speeding through a countryside landscape, with a lone figure looking out the window; long shot; travel; a vibrant, sun-drenched field; cinematic
Characteristic
Shot : A view from the window of a moving train, looking out at a field of yellow flowers in the distance.
Aesthetic Score : 0.7
Mood : tranquil, nostalgic, adventurous
Quality
Entropy : 6.48
Noise : 91
Prompt Clip Score : 0.31
AI Evaluation
Likelihood of AI : 0.20
Image errors : There is slight noise in the image and some minor artifacts in the sky.
Intimate Gathering: A Cozy Kitchen Scene
A group of four friends share a meal and conversation in a dimly lit kitchen, creating a warm and inviting atmosphere. The low lighting adds a sense of intimacy and closeness, highlighting the heartwarming connection between them.
Prompt
style-aesthetic French New Wave: intimate, heartwarming ; A family gathered around a table, sharing a meal, with laughter and conversation; medium shot; family; a warm, inviting kitchen; cinematic
Characteristic
Shot : A group of four people are sitting around a table, eating and talking. The setting is a dimly lit kitchen with a warm, inviting atmosphere.
Aesthetic Score : 0.6
Mood : cozy, intimate, heartwarming
Quality
Entropy : 6.00
Noise : 75
Prompt Clip Score : 0.27
AI Evaluation
Likelihood of AI : 0.10
Image errors : Some graininess in the shadows
Blood-Stained Escape: A Man Flees Through a Crowded Market
A young man, covered in blood, races through a bustling market street, his panicked glance over his shoulder hinting at a desperate escape. The image is charged with intensity and suspense: the shallow depth of field focuses attention on the man’s frantic expression and on the stark contrast of blood against the vibrant market backdrop.
Prompt
style-aesthetic French New Wave: urgent, dramatic ; A young man, his face etched with determination, running through a crowded marketplace; medium shot; heroism; a chaotic, bustling market; cinematic
Characteristic
Shot : A young man, covered in blood, runs through a crowded market street, looking over his shoulder in a panicked way.
Aesthetic Score : 0.7
Mood : intense, urgent, suspenseful
Quality
Entropy : 6.30
Noise : 74
Prompt Clip Score : 0.26
AI Evaluation
Likelihood of AI : 0.10
Image errors : No noticeable image errors.
Lost in the Shadows: A Compass Beckons
A vintage compass, bathed in low-key light, rests on a textured surface, hinting at a journey yet to be taken. The close-up framing and dramatic lighting create a sense of mystery and intrigue, emphasizing the compass as a symbol of exploration and direction.
Prompt
style-aesthetic French New Wave: mysterious, suspenseful ; A close-up of a compass needle spinning, pointing towards an unknown destination; close-up; adventure; a dimly lit, mysterious room; cinematic
Characteristic
Shot : A close-up of a compass on a textured surface, with a dark background.
Aesthetic Score : 0.7
Mood : classic, vintage, adventurous
Quality
Entropy : 5.92
Noise : 52
Prompt Clip Score : 0.27
AI Evaluation
Likelihood of AI : 0.30
Image errors : Slight chromatic aberration visible around the edges of the compass needle.
Intense Focus: Four Men Huddle Around a Computer Screen
In a dimly lit room, four young men gather around a computer screen, their expressions focused and intense. The atmosphere is charged with suspense, leaving the viewer wondering what they are witnessing. The close-up shot amplifies the anticipation, drawing you into the heart of the action.
Prompt
style-aesthetic French New Wave: intense, focused ; A group of friends huddled around a computer screen, their faces illuminated by the glow; medium shot; gaming; a dimly lit, cluttered room; cinematic
Characteristic
Shot : Four young men are gathered around a computer screen, looking intently at something on the display. The lighting is dim and blue, giving the scene a somewhat mysterious or tense atmosphere.
Aesthetic Score : 0.6
Mood : intense, focused, suspenseful
Quality
Entropy : 6.15
Noise : 74
Prompt Clip Score : 0.27
AI Evaluation
Likelihood of AI : 0.30
Image errors : No visible artifacts or errors
Silhouettes of Love Against a Sunset Sky
A romantic couple strolls hand-in-hand down a charming cobblestone alleyway, their silhouettes framed against a breathtaking sunset. The narrow path leads the eye towards the horizon, creating a sense of peace and nostalgia. This captivating scene evokes a feeling of love and longing, capturing the essence of a perfect moment.
Prompt
style-aesthetic French New Wave: romantic, nostalgic ; A couple walking hand-in-hand along a cobblestone street, their silhouettes framed by the setting sun; long shot; tourism; a romantic, picturesque town; cinematic
Characteristic
Shot : A couple walking away from the camera down a cobblestone alleyway towards a beautiful sunset.
Aesthetic Score : 0.7
Mood : romantic, peaceful, nostalgic
Quality
Entropy : 5.87
Noise : 87
Prompt Clip Score : 0.34
AI Evaluation
Likelihood of AI : 0.10
Image errors : No visible artifacts or errors
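The cards report each metric per image. Averaging them gives a quick overview of the whole set; this short Python sketch only restates the numbers shown above and does not reproduce the camera-position, shot, or aesthetic-analysis scores discussed next, whose computation the post does not describe.

```python
from statistics import mean

# Per-image metrics copied from the ten cards above, in the order shown.
metrics = {
    "Aesthetic Score": [0.8, 0.7, 0.7, 0.7, 0.7, 0.6, 0.7, 0.7, 0.6, 0.7],
    "Prompt Clip Score": [0.32, 0.26, 0.31, 0.30, 0.31, 0.27, 0.26, 0.27, 0.27, 0.34],
    "Entropy": [5.60, 6.21, 6.02, 6.74, 6.48, 6.00, 6.30, 5.92, 6.15, 5.87],
    "Noise": [74, 68, 79, 81, 91, 75, 74, 52, 74, 87],
}

for name, values in metrics.items():
    print(f"{name:<18} mean={mean(values):.3f}  min={min(values)}  max={max(values)}")
```

Run as is, it gives means of about 0.69 (Aesthetic Score), 0.29 (Prompt Clip Score), 6.13 (Entropy) and 75.5 (Noise).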
Conclusion
The results show that the generative AI model performed well on camera position and shot composition but struggled to match the requested aesthetic style.
Here’s a breakdown:
- Camera Position: The model scored 0.5, which falls at the lower edge of the “good” range (0.5-0.75). This indicates that the model captured the camera positions described in the prompts reasonably well.
- Shot Analysis: The model scored 0.57, also within the “good” range. This suggests that the model understood the scenes described in the prompts and produced images that reflected that understanding.
- Aesthetic Analysis: The model scored 0.06, far below its camera-position (0.5) and shot-analysis (0.57) scores. This indicates that the generated images did not match the expected aesthetic style nearly as closely as they matched the requested camera positions and shot compositions.
Overall, the model demonstrates a good understanding of camera position and shot composition, but needs improvement in capturing the desired aesthetic style.