AI's Artistic Eye: Capturing Scenes, Missing the Shot with Midjourney
In the realm of artificial intelligence, image generation has emerged as a captivating field, pushing the boundaries of creativity and technological prowess. One of the most intriguing aspects of this technology is its ability to translate text descriptions into visually compelling images. This blog post delves into a study that examines the performance of a generative AI model in understanding and recreating scenes based on text prompts. The study focuses on the model’s ability to capture the intended camera position, scene aesthetics, and overall scene comprehension. Through a series of prompts, we explore the model’s strengths and weaknesses, revealing insights into its artistic capabilities and limitations.
Created with: Midjourney
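Every prompt in this post appears to follow the same six-part, semicolon-separated layout: an emotion clause, a subject, a camera position, a character category, a setting detail, and a style keyword. The following is a minimal Python sketch of splitting a prompt into those parts, which makes it easy to pull out the intended camera position for comparison with the generated image. The field names and the `parse_prompt` helper are labels assumed here for illustration; they are not terms used by the study or by Midjourney.

```python
from typing import NamedTuple


class PromptParts(NamedTuple):
    """Hypothetical field names for the six semicolon-separated parts."""
    emotion: str    # e.g. "Sadness Downcast eyes, slight frown: Melancholy, loneliness"
    subject: str    # e.g. "A lone figure"
    camera: str     # e.g. "eye-level" or "close-up"
    category: str   # e.g. "Single Person", "Hero", "Gamer"
    setting: str    # e.g. "Empty park bench with fallen leaves"
    style: str      # e.g. "cinematic"


def parse_prompt(prompt: str) -> PromptParts:
    """Split a prompt on ';' and strip whitespace around each part."""
    fields = [part.strip() for part in prompt.split(";")]
    if len(fields) != 6:
        raise ValueError(f"expected 6 fields, got {len(fields)}")
    return PromptParts(*fields)


first = parse_prompt(
    "Sadness Downcast eyes, slight frown: Melancholy, loneliness ; "
    "A lone figure; eye-level; Single Person; "
    "Empty park bench with fallen leaves; cinematic"
)
print(first.camera)
print(first.subject)
```

Keeping the camera position as its own field is what allows a later check of whether an image described as "facing away from the camera" or "looking at the camera" matches the requested "eye-level" or "close-up" framing.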
Lost in the Autumn Leaves: A Moment of Solitude
A solitary figure finds peace amidst the falling leaves, their small form dwarfed by the vastness of the park. The image captures a sense of melancholy and contemplation, highlighting the beauty and isolation of the moment.
Prompt
Sadness Downcast eyes, slight frown: Melancholy, loneliness ; A lone figure; eye-level; Single Person; Empty park bench with fallen leaves; cinematic
Characteristic
Shot : A lone figure sits on a bench in a park, facing away from the camera. The ground is covered in fallen leaves and the trees in the background are blurred.
Aesthetic Score : 0.7
Mood : melancholy, solitude, contemplative
Quality
Entropy : 6.91
Noise : 110
Prompt Clip Score : 0.28
AI Evaluation
Likelihood of AI : 0.10
Image errors : The image appears to be slightly overexposed. There are also some minor artifacts in the leaves.
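The post does not say how the Entropy value in each record is computed. One common choice is the Shannon entropy of the grayscale histogram, which ranges from 0 bits (a flat, single-tone image) to 8 bits (all 256 gray levels equally frequent). The sketch below assumes that definition. The function name `shannon_entropy` and the toy pixel lists are illustrative only, and the values reported in this post may come from a different method.

```python
import math
from collections import Counter


def shannon_entropy(pixels):
    """Shannon entropy, in bits, of a sequence of 8-bit gray levels (0-255)."""
    total = len(pixels)
    if total == 0:
        raise ValueError("pixels must not be empty")
    counts = Counter(pixels)
    # Sum -p * log2(p) over every gray level that actually occurs.
    return sum(-(n / total) * math.log2(n / total) for n in counts.values())


flat = [128] * 64              # a single gray level everywhere
two_tone = [0] * 32 + [255] * 32   # two levels, equally frequent
full_range = list(range(256))  # every level exactly once

print(shannon_entropy(flat))
print(shannon_entropy(two_tone))
print(shannon_entropy(full_range))
```

Under this assumption, higher values indicate a wider and more even spread of tones, so a detailed outdoor scene would tend to score higher than a dark, low-contrast interior.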
The Dark Knight’s Lonely Vigil
A solitary figure, shrouded in darkness, sits on a rain-soaked rooftop, gazing out at the city below. The scene evokes a sense of brooding mystery, leaving the viewer to wonder about the figure’s identity and purpose.
Prompt
Sadness Tired, defeated: Despair, disillusionment ; A superhero in their costume; eye-level; Hero; City skyline at night, rain falling; cinematic
Characteristic
Shot : A lone figure, possibly Batman, sits on a rooftop overlooking a rainy city at night. The silhouette of the figure and the cityscape are prominent, while the rain adds to the gloomy atmosphere.
Aesthetic Score : 0.7
Mood : gloomy, somber, melancholic
Quality
Entropy : 6.37
Noise : 97
Prompt Clip Score : 0.29
AI Evaluation
Likelihood of AI : 0.70
Image errors : The rain effect is slightly artificial and the city skyline looks somewhat blurry and lacking in detail.
A Moment of Quiet Melancholy
A young woman sits alone at a kitchen table, her head in her hands, a cup of tea untouched before her. The soft lighting and intimate composition evoke a sense of loneliness and contemplation, capturing a moment of quiet sadness.
Prompt
Sadness Tears streaming down her face: Hopelessness, grief ; A woman sitting at a kitchen table; eye-level; Normal People; Empty coffee cup, unwashed dishes; cinematic
Characteristic
Shot : A woman is sitting at a kitchen table, looking down and resting her head on her hands, with a teacup on the table in front of her. The lighting is dim, and the overall mood is somber.
Aesthetic Score : 0.7
Mood : somber, melancholic, introspective
Quality
Entropy : 6.11
Noise : 95
Prompt Clip Score : 0.30
AI Evaluation
Likelihood of AI : 0.20
Image errors : There are no noticeable errors or artifacts in the image.
Lost in the Code: A Young Man’s Intense Focus in a Dimly Lit Room
A young man, shrouded in mystery, sits hunched over a computer screen in a dimly lit room. His intense gaze and the blurred background create an atmosphere of intrigue, leaving the viewer wondering what secrets lie within the code. The pizza and soda scattered on the table hint at a long night of dedication, or perhaps, a hidden agenda.
Prompt
Sadness Blank stare, unblinking eyes: Isolation, withdrawal ; A gamer hunched over their computer; close-up; Gamer; Empty pizza boxes, energy drink cans; cinematic
Characteristic
Shot : A young man in a dark hoodie sits at a desk with a computer screen in the background. He is looking at the camera with a serious expression. There are pizza and soda cans in the foreground.
Aesthetic Score : 0.6
Mood : dark, intense, mysterious
Quality
Entropy : 6.59
Noise : 97
Prompt Clip Score : 0.30
AI Evaluation
Likelihood of AI : 0.20
Image errors : The image appears to be slightly blurry, particularly around the edges. The lighting is also uneven, with some areas being too dark.
A Boy’s Somber Gaze in a Shadowy Hallway
A young boy stands in a doorway, his expression somber as he looks towards the camera. The dimly lit hallway behind him, with its shadowy corners and a single window letting in a sliver of light, creates an atmosphere of melancholy and suspense. The image evokes a sense of mystery, leaving the viewer wondering what secrets lie within the shadows.
Prompt
Sadness Lip trembling, eyes welling up: Loneliness, abandonment ; A child standing in a doorway; eye-level; Single Person; Empty hallway, dim lighting; cinematic
Characteristic
Shot : A young boy stands in a dimly lit doorway, looking out into a hallway with a window at the end. The hallway is empty and the light coming in from the window casts a soft glow on the floor. The boy’s face is partially in shadow, creating a sense of mystery and intrigue.
Aesthetic Score : 0.6
Mood : melancholy, suspense, thoughtful
Quality
Entropy : 5.52
Noise : 95
Prompt Clip Score : 0.23
AI Evaluation
Likelihood of AI : 0.30
Image errors : No visible image errors.
The Weight of War: A Soldier’s Desolation
A lone soldier sits amidst the ruins of a battlefield, smoke and fire swirling in the background. With his head bowed in despair, he embodies the somber reality of war and the crushing weight of loss.
Prompt
Sadness Haunted eyes, clenched jaw: Loss, regret ; A soldier kneeling on a battlefield; eye-level; Hero; Explosions in the distance, smoke filling the air; cinematic
Characteristic
Shot : A soldier in a helmet sits on the ground with his head in his hands. The background is a battlefield with smoke and fire.
Aesthetic Score : 0.7
Mood : sad, somber, despair
Quality
Entropy : 6.37
Noise : 103
Prompt Clip Score : 0.25
AI Evaluation
Likelihood of AI : 0.60
Image errors : The image has a slightly grainy texture and some artifacts around the soldier’s helmet.
What’s Behind the Wall? A Couple’s Terrifying Discovery
A chilling scene unfolds as a couple on a couch reacts to something unseen. The man’s fear and the woman’s awe leave you wondering what lurks beyond the simple blue and white wall. Is it a ghost, a monster, or something even more terrifying? This suspenseful image will keep you guessing.
Prompt
Sadness Both looking away from each other: Silence, unspoken tension ; A couple sitting on a couch; eye-level; Normal People; Empty popcorn bowl, remote control on the floor; cinematic
Characteristic
Shot : A couple sitting on a couch, watching TV. The man is covering his mouth with his hands, seemingly in horror, while the woman stares up in contemplation. The woman is holding a bowl of popcorn.
Aesthetic Score : 0.7
Mood : suspense, drama, anxiety
Quality
Entropy : 6.54
Noise : 114
Prompt Clip Score : 0.29
AI Evaluation
Likelihood of AI : 0.80
Image errors : The edges of the image are slightly blurry and the color palette is a bit muted. The painting style creates an effect of being slightly off-kilter, which may or may not be intentional. This could be considered a technical error.
In the Zone: The Intensity of Gaming
A dimly lit room, a focused gamer, and a screen ablaze with action. This image captures the raw intensity and focus of gaming, with dramatic lighting highlighting the player’s hands on the keyboard and the vibrant game on the monitor.
Prompt
Sadness Not visible, but implied by body language: Frustration, defeat ; A gamer’s hands on a keyboard; close-up; Gamer; Screen displaying a game over message; cinematic
Characteristic
Shot : A person is playing a video game on a computer. The monitor is displaying a game interface, with the word “GAME” highlighted on screen. The person’s hands are on the keyboard, and the scene is lit with colorful LED lights.
Aesthetic Score : 0.6
Mood : intense, focused, digital
Quality
Entropy : 5.49
Noise : 60
Prompt Clip Score : 0.25
AI Evaluation
Likelihood of AI : 0.20
Image errors : There are slight artifacts in the image, primarily visible in the darker areas. Some chromatic aberration is also present, but it’s not particularly distracting.
Lost in the Crowd: A Moment of Melancholy
A solitary figure navigates a bustling street, her expression hinting at introspection. The blurred background emphasizes the anonymity of the crowd, creating a sense of isolation and a fleeting moment of melancholy.
Prompt
Sadness Lost in thought, distant gaze: Alienation, loneliness ; A woman walking down a crowded street; eye-level; Single Person; People passing by, oblivious to her; cinematic
Characteristic
Shot : A woman walks through a busy street. The people in the background are blurred and the scene is dark and moody; only the woman is in focus.
Aesthetic Score : 0.6
Mood : melancholy, somber, lonely
Quality
Entropy : 6.42
Noise : 84
Prompt Clip Score : 0.27
AI Evaluation
Likelihood of AI : 0.40
Image errors : The image appears to be heavily edited, with artificial blur and a vignette effect. There is some noise and grain in the image, especially in the background.
Lost in the City Lights
A solitary figure stands on a rain-soaked rooftop, gazing down at the distant, blurred lights of the city. The heavy rain and the figure’s isolation evoke a sense of melancholy and reflection.
Prompt
Sadness Sad smile, wistful eyes: Reflection, introspection ; A hero standing on a rooftop; eye-level; Hero; City lights twinkling in the distance; cinematic
Characteristic
Shot : A man stands alone on a rooftop in the rain, overlooking a city. The city lights are blurred and create a soft glow in the background. The man’s silhouette is the focal point of the image.
Aesthetic Score : 0.7
Mood : melancholy, lonely, contemplative
Quality
Entropy : 6.26
Noise : 95
Prompt Clip Score : 0.26
AI Evaluation
Likelihood of AI : 0.90
Image errors : There are some slight artifacts in the rain, and some minor imperfections in the man’s silhouette. The lighting is a bit flat.
Conclusion
The results show that the generative AI model performed well in terms of understanding the scene and aesthetics, but struggled with camera positioning. Here’s a breakdown:
- Camera Position: The model scored 0.2, indicating a significant difference between the camera position requested in the prompt and the camera position in the generated image. This suggests the model follows camera position instructions poorly.
- Shot Analysis: The model scored 0.53, which is considered good. This means the model was able to understand the scene in the prompt and create an image that reflects it reasonably well.
- Aesthetic Analysis: The model scored 0.1, which is considered very good. This means the generated images closely matched the aesthetic style described in the prompt.
Overall: The model demonstrates a strong ability to understand the scene and create aesthetically pleasing images, but needs improvement in accurately capturing the intended camera position.
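For a quick view across all ten images, the per-image metrics listed in the records above can be averaged. The sketch below copies those values verbatim and computes the mean of each column. The tuple layout, the short image labels, and the `summarize` helper are assumptions made for illustration. The camera position, shot, and aesthetic analysis scores in the breakdown above come from the study itself and are not reproduced by this code.

```python
from statistics import mean

# (label, aesthetic, entropy, noise, clip_score, ai_likelihood), copied from the records above
RECORDS = [
    ("Autumn leaves",   0.7, 6.91, 110, 0.28, 0.10),
    ("Dark knight",     0.7, 6.37,  97, 0.29, 0.70),
    ("Kitchen table",   0.7, 6.11,  95, 0.30, 0.20),
    ("Lost in code",    0.6, 6.59,  97, 0.30, 0.20),
    ("Shadowy hallway", 0.6, 5.52,  95, 0.23, 0.30),
    ("Weight of war",   0.7, 6.37, 103, 0.25, 0.60),
    ("Behind the wall", 0.7, 6.54, 114, 0.29, 0.80),
    ("In the zone",     0.6, 5.49,  60, 0.25, 0.20),
    ("In the crowd",    0.6, 6.42,  84, 0.27, 0.40),
    ("City lights",     0.7, 6.26,  95, 0.26, 0.90),
]

COLUMNS = ("aesthetic", "entropy", "noise", "clip_score", "ai_likelihood")


def summarize(records):
    """Return the mean of each metric column, keyed by column name."""
    return {
        name: mean(row[i + 1] for row in records)  # +1 skips the label
        for i, name in enumerate(COLUMNS)
    }


for name, value in summarize(RECORDS).items():
    print(f"{name}: {value:.3f}")
```

Averages like these hide per-image variation; the Likelihood of AI column in particular ranges from 0.10 to 0.90 across the ten images, so the individual records remain the better guide.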
Sources:
- https://dramaresource.com/storytelling/
- https://seedsoftellers.eu/resources/the-body-language-for-young-tellers/
- https://digitalcollections.sit.edu/cgi/viewcontent.cgi?article=1288&context=sandanona&filename=1&type=additional
- https://citeseerx.ist.psu.edu/document?doi=7f842882e9bb1fa2c0e96939bc8d2c37e34e17c0&repid=rep1&type=pdf
- https://www.twinkl.co.uk/search?q=drama+facial+expression
- https://midjourney.com