AI's Artistic Eye: Capturing Emotion, Missing the Scene with Freepik
Generative models are pushing the boundaries of creativity: they can produce images, text, and even music from user prompts. Accurately translating a complex description into a picture, however, remains a challenge. This blog post examines how well a generative AI model turns detailed scene descriptions into images, looking at both the aesthetic and the narrative elements of each prompt. We’ll explore the model’s strengths and weaknesses: it is impressively good at capturing the requested aesthetic style, but it often misses scene details and the requested camera position. The analysis gives a picture of what generative AI models can and cannot yet do in visual storytelling.
Created with: Freepik
Silhouetted Solitude: A Moment of Contemplation in the Desert
A lone figure stands in a serene desert landscape, their silhouette stark against the fiery hues of the setting sun. The scene evokes a sense of tranquility and contemplation, capturing the beauty of isolation and the vastness of nature.
Prompt
facial-expressions Curiosity: Melancholy, contemplative ; A lone figure, silhouetted against a setting sun; eye-level; Single Person; vast, empty desert landscape; cinematic
Characteristic
Shot : A lone figure stands in a desert landscape, gazing at the setting sun. The scene is characterized by vast, rolling sand dunes and a warm, golden sky.
Aesthetic Score : 0.7
Mood : serene, contemplative, lonely
Quality
Entropy : 6.71
Noise : 46
Prompt Clip Score : 0.23
AI Evaluation
Likelihood of AI : 0.20
Image errors : The image is slightly overexposed, which is causing some details in the sky to be washed out.
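The prompts in this post appear to follow a fixed, semicolon-separated template: an expression, a subject, a camera position, a character type, a setting, and a style. The sketch below splits a prompt into those six parts so each one can be compared with the generated image. The field names and the `parse_prompt` helper are hypothetical labels chosen for illustration; the post itself does not name the fields.

```python
from dataclasses import dataclass


@dataclass
class ScenePrompt:
    # Field names are assumptions inferred from the prompt layout.
    expression: str
    subject: str
    camera: str
    character_type: str
    setting: str
    style: str


def parse_prompt(prompt: str) -> ScenePrompt:
    """Split a semicolon-separated prompt into its six assumed fields."""
    parts = [part.strip() for part in prompt.split(";")]
    if len(parts) != 6:
        raise ValueError(f"expected 6 fields, got {len(parts)}")
    # The first field looks like "facial-expressions Curiosity: <expression>";
    # keep only the text after the colon when one is present.
    head = parts[0]
    expression = head.split(":", 1)[1].strip() if ":" in head else head
    return ScenePrompt(expression, *parts[1:])


desert = parse_prompt(
    "facial-expressions Curiosity: Melancholy, contemplative ; "
    "A lone figure, silhouetted against a setting sun; eye-level; "
    "Single Person; vast, empty desert landscape; cinematic"
)
print(desert.camera, "|", desert.setting)
```

Splitting the prompt this way makes it easier to check one element at a time, for example whether an "eye-level" request actually shows up in the image.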
A City Under Watch: Hope Rises Above the Skyline
A powerful silhouette against the glittering cityscape, this superhero stands guard, their glowing symbol a beacon of hope in the night. The dramatic lighting and composition evoke a sense of grandeur and power, leaving viewers with a feeling of anticipation and inspiration.
Prompt
facial-expressions Curiosity: Determined, hopeful ; A superhero, standing atop a skyscraper, looking out at the city; eye-level; Hero; bustling cityscape with neon lights; cinematic
Characteristic
Shot : A man wearing a superhero suit stands on a rooftop overlooking a futuristic city at night. He is looking out over the city, and his glowing red Superman symbol is prominent on his back. The city lights create a vibrant backdrop.
Aesthetic Score : 0.8
Mood : dramatic, futuristic, powerful
Quality
Entropy : 6.75
Noise : 54
Prompt Clip Score : 0.20
AI Evaluation
Likelihood of AI : 1.00
Image errors : No visible errors. The image is well-rendered and free of artifacts.
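The Entropy values reported for each image sit between 6 and 7. That would be consistent with the Shannon entropy of an 8-bit grayscale histogram, which has a maximum of 8 bits. The post does not say how the metric is computed, so treat the following as a minimal sketch under that assumption. It works on a flat list of pixel intensities and uses only the standard library.

```python
import math
from collections import Counter


def shannon_entropy(pixels):
    """Shannon entropy (in bits) of a flat sequence of pixel intensities.

    Assumption: the post's "Entropy" metric is histogram entropy; the
    actual tool may compute it differently.
    """
    total = len(pixels)
    counts = Counter(pixels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


# A flat image carries no information; a uniform spread over all 256
# intensity levels reaches the 8-bit maximum.
flat = [128] * 1024
uniform = list(range(256)) * 4
print(shannon_entropy(flat), shannon_entropy(uniform))
```

Under this reading, higher values mean the image uses a wider and more even range of tones.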
Lost in Thought: A Moment of Serenity in the Park
A young woman finds peace amidst the vibrant greenery, her contemplative gaze inviting viewers to share her tranquil moment. The blurred background emphasizes the stillness of the scene, creating a sense of calm and serenity.
Prompt
facial-expressions Curiosity: Peaceful, observant ; A young woman, sitting on a park bench, watching children play; eye-level; Normal People; vibrant park with blooming flowers; cinematic
Characteristic
Shot : A young woman is sitting on a park bench, looking directly at the camera.
Aesthetic Score : 0.7
Mood : calm, contemplative, wistful
Quality
Entropy : 6.69
Noise : 65
Prompt Clip Score : 0.24
AI Evaluation
Likelihood of AI : 0.20
Image errors : No visible errors
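The Prompt Clip Score is presumably a CLIP-style similarity: the prompt and the image are each mapped to an embedding vector, and the score is the cosine similarity between the two. The post does not confirm this, so the sketch below is an assumption. It shows only the similarity step, with made-up toy vectors standing in for real embeddings.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length, non-zero vectors."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same length")
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("embeddings must be non-zero")
    return sum(x * y for x, y in zip(a, b)) / (norm_a * norm_b)


# Toy vectors for illustration only; real embeddings would come from a
# text encoder and an image encoder and have hundreds of dimensions.
text_embedding = [0.2, 0.1, 0.7]
image_embedding = [0.6, 0.5, 0.1]
print(round(cosine_similarity(text_embedding, image_embedding), 2))
```

If the score is computed this way, it measures how closely the image as a whole matches the prompt text. It does not say which part of the prompt was missed.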
Lost in the Code: A Moment of Intense Focus
A young man sits hunched over his computer in a dimly lit room, his expression serious and focused. The cluttered desk and shadowy surroundings create an atmosphere of mystery and intrigue, hinting at the depth of his concentration. This image captures the essence of a mind immersed in the digital world, lost in the pursuit of a solution.
Prompt
facial-expressions Curiosity: Intense, focused ; A gamer, hunched over a computer screen, eyes glued to the monitor; close-up; Gamer; dimly lit room with flashing lights from the screen; cinematic
Characteristic
Shot : A young man is sitting in front of a computer, looking intently at the screen. The room is dimly lit, with only the glow of the monitor and a few lights in the background.
Aesthetic Score : 0.6
Mood : focused, contemplative, serious
Quality
Entropy : 6.35
Noise : 46
Prompt Clip Score : 0.22
AI Evaluation
Likelihood of AI : 0.20
Image errors : The image is slightly blurry, and there is some noise in the shadows.
Lost in the Crowd: A Moment of Introspection
A young man with a piercing gaze stands amidst the bustling chaos of a market, his thoughts seemingly far away. The blurred background adds to the sense of mystery, inviting the viewer to wonder what secrets lie behind his intense expression.
Prompt
facial-expressions Curiosity: Intrigued, observant ; A man, walking through a crowded marketplace, his eyes darting around; eye-level; Single Person; bustling marketplace with colorful stalls and vendors; cinematic
Characteristic
Shot : A young man is standing in a crowded marketplace, looking directly at the camera with a serious expression.
Aesthetic Score : 0.7
Mood : intrigued, pensive, contemplative
Quality
Entropy : 6.82
Noise : 66
Prompt Clip Score : 0.20
AI Evaluation
Likelihood of AI : 0.10
Image errors : There are no noticeable image errors.
One Man Stands Against the Apocalypse
A lone figure in medieval garb stands defiant amidst a ravaged cityscape consumed by fire and smoke. The stark contrast between his composure and the chaotic backdrop creates a sense of dramatic tension, hinting at a story of resilience and survival in the face of overwhelming odds.
Prompt
facial-expressions Curiosity: Brave, resolute ; A hero, standing in the middle of a chaotic battle, looking determined; eye-level; Hero; smoke-filled battlefield with explosions and debris; cinematic
Characteristic
Shot : A man in medieval clothing stands in a war-torn city, surrounded by smoke and flames. He looks determined and ready to fight.
Aesthetic Score : 0.7
Mood : dramatic, intense, hopeless
Quality
Entropy : 6.86
Noise : 71
Prompt Clip Score : 0.23
AI Evaluation
Likelihood of AI : 0.20
Image errors : The image has some slight artifacts in the smoke and flames. The edges of the building in the background look a little pixelated. The composition feels a little static.
Candlelit Laughter: A Cozy Gathering of Friends
A group of friends share a warm and intimate moment, illuminated by candlelight. The scene radiates joy and connection, capturing the essence of friendship and shared laughter.
Prompt
facial-expressions Curiosity: Joyful, connected ; A group of friends, gathered around a table, sharing stories and laughter; eye-level; Normal People; cozy living room with warm lighting; cinematic
Characteristic
Shot : A group of friends are gathered around a table, laughing and enjoying each other’s company. The warm lighting and cozy setting create a sense of intimacy and connection.
Aesthetic Score : 0.7
Mood : happy, cozy, intimate
Quality
Entropy : 6.83
Noise : 55
Prompt Clip Score : 0.25
AI Evaluation
Likelihood of AI : 0.20
Image errors : The image appears slightly overexposed, leading to a loss of detail in some areas.
Caught in the Moment: Gamer’s Surprise Under Neon Lights
A young gamer, eyes wide with surprise, is fully immersed in their game. The close-up framing and vibrant background lights capture the intensity and excitement of the moment.
Prompt
facial-expressions Curiosity: Excited, engaged ; A gamer, holding a controller, eyes wide with excitement; close-up; Gamer; brightly lit gaming room with colorful lights; cinematic
Characteristic
Shot : A young person is playing a video game with a controller in their hands. They are wearing headphones, their face is close to the camera, and they look surprised or excited.
Aesthetic Score : 0.6
Mood : excited, focused, playful
Quality
Entropy : 6.70
Noise : 47
Prompt Clip Score : 0.28
AI Evaluation
Likelihood of AI : 0.20
Image errors : There are no noticeable artifacts or errors in the image.
Lost in the Wind: A Woman’s Melancholy on the Cliffside
A solitary figure stands on a windswept cliff, gazing out at the turbulent sea. The dramatic scene evokes a sense of longing and contemplation, capturing a moment of profound melancholy.
Prompt
facial-expressions Curiosity: Contemplative, introspective ; A woman, standing at the edge of a cliff, gazing out at the vast ocean; eye-level; Single Person; dramatic cliffside with crashing waves; cinematic
Characteristic
Shot : A woman stands on a cliff overlooking a stormy sea. The sky is cloudy and the water is choppy.
Aesthetic Score : 0.7
Mood : pensive, melancholic, dramatic
Quality
Entropy : 6.70
Noise : 58
Prompt Clip Score : 0.24
AI Evaluation
Likelihood of AI : 0.20
Image errors : The image is slightly blurry, and there are some minor artifacts around the edges.
Man Faces Down Burning Building in Dramatic Scene
A man in a green jacket stands defiantly in front of a blazing inferno, his serious expression reflecting the intensity of the moment. The fire’s glow illuminates his face, creating a sense of urgency and danger.
Prompt
facial-expressions Curiosity: Brave, selfless ; A hero, standing in front of a burning building, ready to save people; eye-level; Hero; chaotic scene with smoke and flames; cinematic
Characteristic
Shot : A man in a green jacket stands in front of a burning building. The flames are visible through the windows and there is smoke in the air.
Aesthetic Score : 0.6
Mood : dramatic, tense, intense
Quality
Entropy : 6.72
Noise : 56
Prompt Clip Score : 0.27
AI Evaluation
Likelihood of AI : 0.80
Image errors : The image is slightly blurry, and the flames appear to be somewhat unrealistic. The fire’s texture is too smooth and not dynamic enough. The smoke is too uniform and could have more detail.
Conclusion
The results show that the generative AI model captured the requested aesthetic well but struggled to understand the scene and the camera position. The scoring scale is not documented here; judging by the labels, lower values seem to indicate a closer match. Here’s a breakdown:
- Camera Position: The model scored 0.15, which is considered below average. This suggests that the model didn’t accurately capture the camera position described in the prompts.
- Shot Analysis: The model scored 0.48, which is considered below average. This indicates that the model didn’t fully understand the scenes described in the prompts and didn’t create images that accurately reflect them.
- Aesthetic Analysis: The model scored 0.08, which is considered very good. This means that the generated images closely matched the expected aesthetic style.
Overall, the model seems to be better at capturing the desired aesthetic than understanding the scene and camera position. This suggests that the model might need further training to improve its ability to interpret and translate prompts into accurate visual representations.
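To put the per-image numbers side by side, the sketch below averages the metrics listed for the ten images above. The values are copied from this post. The short titles, the tuple layout, and the `summarize` helper are just illustrative choices.

```python
from statistics import mean

# (title, aesthetic, entropy, noise, prompt_clip, ai_likelihood),
# copied from the per-image sections of this post.
IMAGES = [
    ("Silhouetted Solitude", 0.7, 6.71, 46, 0.23, 0.20),
    ("A City Under Watch", 0.8, 6.75, 54, 0.20, 1.00),
    ("Lost in Thought", 0.7, 6.69, 65, 0.24, 0.20),
    ("Lost in the Code", 0.6, 6.35, 46, 0.22, 0.20),
    ("Lost in the Crowd", 0.7, 6.82, 66, 0.20, 0.10),
    ("One Man Stands Against the Apocalypse", 0.7, 6.86, 71, 0.23, 0.20),
    ("Candlelit Laughter", 0.7, 6.83, 55, 0.25, 0.20),
    ("Caught in the Moment", 0.6, 6.70, 47, 0.28, 0.20),
    ("Lost in the Wind", 0.7, 6.70, 58, 0.24, 0.20),
    ("Man Faces Down Burning Building", 0.6, 6.72, 56, 0.27, 0.80),
]

FIELDS = ("aesthetic", "entropy", "noise", "prompt_clip", "ai_likelihood")


def summarize(images):
    """Return the mean of each metric across all images."""
    return {
        name: mean(row[i + 1] for row in images)
        for i, name in enumerate(FIELDS)
    }


for name, value in summarize(IMAGES).items():
    print(f"{name}: {value:.3f}")
```

The averages describe the gallery as a whole. They are separate from the Camera Position, Shot Analysis, and Aesthetic Analysis scores in the breakdown, whose calculation this post does not describe.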