AI's Artistic Vision: Capturing the Scene, Not the Shot with Midjourney
In AI-generated imagery, accurately translating camera positions and shot composition is crucial for creating visually compelling scenes. This case study examines how a generative AI model handles ten prompts that each request a point-of-view (POV) shot, and highlights its strengths and weaknesses. The model captures the desired aesthetic well, but it struggles to interpret camera position and shot composition instructions. This discrepancy raises questions about how well current AI models understand and implement these technical aspects of visual storytelling. We look at the evidence for this gap and discuss its implications for the future of AI-generated imagery.
Created with: Midjourney
Silhouetted Against the Sunset: A Moment of Solitude and Hope
A lone figure stands atop a snow-capped mountain, bathed in the golden light of the setting sun. The vast sea of clouds below creates a breathtaking backdrop, while the silhouette of the figure evokes a sense of inspiration, hope, and serene solitude.
Prompt
Point-of-view-POV-shot Point-of-view (POV) shot: Epic, triumphant, awe-inspiring ; A lone figure standing on a mountain peak; wide shot; heroism; dramatic cloudscape; cinematic
Characteristic
Shot : A lone figure stands on a mountain peak, overlooking a vast sea of clouds at sunset.
Aesthetic Score : 0.8
Mood : inspirational, serene, majestic
Quality
Entropy : 6.60
Noise : 84
Prompt Clip Score : 0.27
AI Evaluation
Likelihood of AI : 0.90
Image errors : The image is slightly overexposed in the sky, causing the clouds to appear bleached out. The figure is also somewhat flat and lacks detail.
Will They Find the Treasure? Mysterious Hand Reaches for Chest in Cave
A hand, shrouded in mystery, reaches for a treasure chest nestled amongst rocks. The scene evokes a sense of adventure and suspense, leaving viewers wondering what secrets lie within the chest and what awaits the hand’s owner.
Prompt
Point-of-view-POV-shot Point-of-view (POV) shot: Intriguing, suspenseful, adventurous ; A hand reaching for a treasure chest; close-up; adventure; dark, mysterious cave; cinematic
Characteristic
Shot : A hand reaches out from the darkness towards a chest, placed on the ground in a cave
Aesthetic Score : 0.6
Mood : mysterious, suspenseful, dark
Quality
Entropy : 6.32
Noise : 114
Prompt Clip Score : 0.28
AI Evaluation
Likelihood of AI : 0.20
Image errors : Some noise and graininess, especially in the darker areas, minor artifacts around the hand
Cyberpunk Gamer: Immersed in the Neon Glow
A close-up shot captures the intensity of a gamer, their hands gripping a controller bathed in vibrant red and blue light. The cyberpunk aesthetic creates a dramatic and immersive atmosphere, highlighting the focus and passion of the player.
Prompt
Point-of-view-POV-shot Point-of-view (POV) shot: Focused, intense, exhilarating ; A player’s hands manipulating a controller; close-up; gaming; brightly lit gaming room; cinematic
Characteristic
Shot : Close-up of a person’s hands holding a video game controller in a dimly lit room with red and blue lighting. A computer screen is blurred in the background.
Aesthetic Score : 0.6
Mood : intense, focused, playful
Quality
Entropy : 6.42
Noise : 103
Prompt Clip Score : 0.28
AI Evaluation
Likelihood of AI : 0.10
Image errors : The image is slightly overexposed and the colors are a bit oversaturated.
Times Square’s Towering Majesty: A Fisheye Perspective
Experience the vibrant energy of Times Square through a captivating fisheye lens. The towering buildings seem to reach for the sky, while the bustling street life below adds to the scene’s dynamic energy. This photo captures the iconic urban landscape in all its distorted glory.
Prompt
Point-of-view-POV-shot Point-of-view (POV) shot: Energetic, exciting, overwhelming ; A bustling city street; wide shot; tourism; vibrant, colorful buildings; cinematic
Characteristic
Shot : A fisheye view of Times Square in New York City, with yellow taxis in the foreground and towering buildings and billboards in the background.
Aesthetic Score : 0.6
Mood : urban, busy, vibrant
Quality
Entropy : 6.46
Noise : 89
Prompt Clip Score : 0.22
AI Evaluation
Likelihood of AI : 0.20
Image errors : Some minor artifacts are present, particularly around the edges of the image, which is common with fisheye lenses.
Tranquil Journey Through Rolling Hills
A serene view of rolling hills and fields captured from a train window. The muted colors and cloudy sky create a peaceful atmosphere, while the blur of the landscape evokes a sense of motion and the passage of time.
Prompt
Point-of-view-POV-shot Point-of-view (POV) shot: Tranquil, contemplative, nostalgic ; A train window view of passing landscapes; medium shot; travel; rolling hills and fields; cinematic
Characteristic
Shot : A view of rolling green hills through a train window. The photo is taken from a moving train, resulting in a slight motion blur.
Aesthetic Score : 0.6
Mood : tranquil, calm, serene
Quality
Entropy : 6.17
Noise : 100
Prompt Clip Score : 0.27
AI Evaluation
Likelihood of AI : 0.20
Image errors : Slight motion blur due to the movement of the train. The blur adds a sense of dynamism to the image but also makes some elements less sharp.
Campfire Nights: Warmth, Friendship, and a Sky Full of Stars
A group of friends gather around a crackling campfire, their laughter echoing under a breathtaking night sky. The warmth of the flames and the twinkling stars create a sense of peace and togetherness, capturing the essence of a perfect evening with loved ones.
Prompt
Point-of-view-POV-shot Point-of-view (POV) shot: Warm, intimate, joyful ; A group of friends laughing and talking around a campfire; medium shot; groups; starry night sky; cinematic
Characteristic
Shot : A group of friends are gathered around a campfire under a night sky with the Milky Way visible. They are laughing and enjoying each other’s company.
Aesthetic Score : 0.7
Mood : joyful, cozy, adventurous
Quality
Entropy : 5.21
Noise : 65
Prompt Clip Score : 0.27
AI Evaluation
Likelihood of AI : 0.20
Image errors : The image appears to have slight noise and a minor amount of blur, particularly on the faces of the subjects. This may be due to the low light conditions.
Adrenaline Rush: Capturing the Thrill of Flight
Experience the intensity of takeoff with this captivating image. The blurred runway and dynamic composition evoke a sense of exhilarating speed, leaving you feeling the rush of adrenaline.
Prompt
Point-of-view-POV-shot Point-of-view (POV) shot: Thrilling, exhilarating, powerful ; A pilot’s view of the cockpit during takeoff; close-up; heroism; runway and clouds; cinematic
Characteristic
Shot : The image shows a cockpit of an airplane with a view of the runway from the pilot’s seat. The airplane is accelerating, and the view out of the window is blurry due to the speed. The sky is cloudy with a warm golden light.
Aesthetic Score : 0.7
Mood : dramatic, intense, exciting
Quality
Entropy : 6.44
Noise : 117
Prompt Clip Score : 0.24
AI Evaluation
Likelihood of AI : 0.40
Image errors : The image has some blurriness, especially in the background and along the edges of the frame, suggesting it might have been processed or edited. The motion blur also appears to be a bit exaggerated and unnatural.
Sunlit Passage: A Diver’s Tranquil Journey Through Coral
A scuba diver glides through a narrow underwater passage, bathed in sunlight streaming from above. The vibrant coral reef on either side and the numerous fish swimming around create a serene and mysterious atmosphere. The contrast between the dark passage and the bright light adds a dramatic touch to this tranquil scene.
Prompt
Point-of-view-POV-shot Point-of-view (POV) shot: Peaceful, serene, awe-inspiring ; A diver exploring a coral reef; wide shot; adventure; colorful fish and marine life; cinematic
Characteristic
Shot : A scuba diver swims through a narrow underwater passage with coral walls and fish, sunlight streams down from above creating a dramatic effect.
Aesthetic Score : 0.8
Mood : serene, mysterious, adventurous
Quality
Entropy : 6.45
Noise : 82
Prompt Clip Score : 0.27
AI Evaluation
Likelihood of AI : 0.20
Image errors : The image has some minor artifacts and blur, particularly on the fish and coral. The diver’s silhouette is well-defined but could be sharper.
Lost in the Game: A Moment of Immersive Focus
A player is deeply engrossed in a fantastical video game, the controller held firmly in their hands, while the vibrant, magical forest on the screen blurs into the background. The image captures the immersive and focused nature of gaming, highlighting the contrast between the real and the virtual.
Prompt
Point-of-view-POV-shot Point-of-view (POV) shot: Immersive, engaging, exciting ; A gamer’s screen displaying a virtual world; close-up; gaming; vibrant, fantastical landscape; cinematic
Characteristic
Shot : A person is playing a video game with a controller. The game world is a colorful and fantastical landscape with trees and glowing lights.
Aesthetic Score : 0.6
Mood : immersive, playful, mysterious
Quality
Entropy : 6.80
Noise : 100
Prompt Clip Score : 0.24
AI Evaluation
Likelihood of AI : 0.20
Image errors : There are no noticeable errors in the image.
Golden Hour Serenity: A Sunset Symphony Over the Ocean
Capture the breathtaking beauty of a wide-angle sunset over the ocean, with crashing waves and a serene atmosphere. The golden light creates a sense of grandeur and awe, making this a perfect image for a peaceful and calming mood.
Prompt
Point-of-view-POV-shot Point-of-view (POV) shot: Romantic, peaceful, serene ; A panoramic view of a sunset over a beach; wide shot; travel; golden light and waves; cinematic
Characteristic
Shot : An aerial view of a beach and ocean at sunset with waves crashing on the shore. The sun is setting behind the horizon, casting a golden glow on the water and sky.
Aesthetic Score : 0.8
Mood : tranquil, serene, peaceful
Quality
Entropy : 6.50
Noise : 113
Prompt Clip Score : 0.27
AI Evaluation
Likelihood of AI : 0.20
Image errors : Some noise and artifacts in the image, particularly in the sky and water
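Taken together, the per-image metrics listed for the ten images above can be summarized with a few lines of Python. This is only a minimal sketch: the values are copied from the cards in this article, while the names `RESULTS` and `summarize` are hypothetical and are not part of any tool used to produce the scores.

```python
from statistics import mean

# (title, aesthetic score, prompt CLIP score, likelihood of AI),
# copied by hand from the image cards in this article.
RESULTS = [
    ("Silhouetted Against the Sunset", 0.8, 0.27, 0.90),
    ("Will They Find the Treasure?", 0.6, 0.28, 0.20),
    ("Cyberpunk Gamer", 0.6, 0.28, 0.10),
    ("Times Square's Towering Majesty", 0.6, 0.22, 0.20),
    ("Tranquil Journey Through Rolling Hills", 0.6, 0.27, 0.20),
    ("Campfire Nights", 0.7, 0.27, 0.20),
    ("Adrenaline Rush", 0.7, 0.24, 0.40),
    ("Sunlit Passage", 0.8, 0.27, 0.20),
    ("Lost in the Game", 0.6, 0.24, 0.20),
    ("Golden Hour Serenity", 0.8, 0.27, 0.20),
]


def summarize(results):
    """Return simple aggregates over the per-image metrics."""
    return {
        "mean_aesthetic": round(mean(r[1] for r in results), 3),
        "mean_clip": round(mean(r[2] for r in results), 3),
        # Title of the image whose prompt CLIP score is lowest.
        "lowest_clip": min(results, key=lambda r: r[2])[0],
    }


if __name__ == "__main__":
    print(summarize(RESULTS))
```

With the values listed above, the mean aesthetic score is 0.68 and the mean prompt CLIP score is about 0.26. The Times Square image has the lowest prompt CLIP score (0.22).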
Conclusion
The generative AI model performed well in achieving the desired aesthetic, but struggled with understanding camera positions and shot composition. Here’s a breakdown:
Camera Position:
- Score: 0.36
- Interpretation: This score falls below the “good” range of 0.5 to 0.75. It suggests the model had some difficulty accurately translating the camera positions described in the prompt into the generated image.
Shot Analysis:
- Score: 0.495
- Interpretation: This score falls just below the “good” range of 0.5 to 0.75, indicating the model had some trouble understanding and implementing the shot composition specified in the prompt.
Aesthetic Analysis:
- Score: 0.13
- Interpretation: This score sits close to the “very good” range of -0.2 to 0.1 (strictly, 0.13 lies just above its upper bound). It suggests the generated image’s aesthetic closely matched the expected aesthetic, despite the model’s struggles with camera position and shot composition.
Overall:
The model demonstrated a strong ability to capture the desired aesthetic, but it needs improvement in accurately interpreting camera positions and shot composition instructions.
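The comparison of each score with its quoted range can be checked mechanically. The sketch below is illustrative only: the scores and ranges are the ones quoted in this conclusion, while treating both bounds as inclusive is an assumption, and the names `REPORTED`, `in_range`, and `check_all` are hypothetical.

```python
# Scores and ranges as quoted in the conclusion of this article.
# Each entry maps a metric name to (score, (low, high)).
REPORTED = {
    "camera_position": (0.36, (0.5, 0.75)),      # "good" range
    "shot_analysis": (0.495, (0.5, 0.75)),       # "good" range
    "aesthetic_analysis": (0.13, (-0.2, 0.1)),   # "very good" range
}


def in_range(score, bounds):
    """Return True if score lies within bounds (inclusive; an assumption)."""
    low, high = bounds
    return low <= score <= high


def check_all(reported):
    """Map each metric name to whether its score falls inside its range."""
    return {
        name: in_range(score, bounds)
        for name, (score, bounds) in reported.items()
    }


if __name__ == "__main__":
    for name, ok in check_all(REPORTED).items():
        print(f"{name}: {'inside' if ok else 'outside'} quoted range")
```

Under these assumptions all three scores fall outside their quoted ranges: the first two sit below the “good” range, and the aesthetic score of 0.13 sits slightly above 0.1.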