AI's Artistic Struggle: Capturing Emotion in Images with Titan-g1
Text-to-image generation is evolving quickly. AI models have become good at understanding scene composition and producing visually appealing images, but capturing the subtle nuances of human emotion and artistic expression remains a challenge. This post walks through an experiment in which an AI model generated images from detailed scene descriptions, and highlights where it succeeded and where it fell short in capturing the intended aesthetic and emotional nuances.
Created with: titan-g1
Lost in Thought: A Moment of Autumnal Melancholy
A woman finds solace in the quiet solitude of a park, her back turned to the camera as she contemplates the changing season. The blurred background and fallen leaves evoke a sense of melancholy and introspection, capturing the essence of autumn’s bittersweet beauty.
Prompt
facial-expressions Sadness: Melancholy, loneliness ; A lone figure; eye-level; Single Person; Empty park bench with fallen leaves; cinematic
Characteristic
Shot : A woman sitting on a bench in a park, with fallen leaves in the background.
Aesthetic Score : 0.5
Mood : melancholy, contemplative, autumnal
Quality
Entropy : 6.91
Noise : 95
Prompt Clip Score : 0.25
AI Evaluation
Likelihood of AI : 0.20
Image errors : There are no major errors in the image. The resolution is good and there is no noticeable noise.
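The post does not say how the Entropy value is computed. One common reading, assumed here, is the Shannon entropy of the 8-bit grayscale histogram, which runs from 0 bits (a single flat gray level) to 8 bits (all 256 levels equally frequent); the values reported in this post, roughly 6.4 to 6.9, would fit that scale. A minimal sketch under that assumption, where `shannon_entropy` is a hypothetical helper and the pixel lists are toy data rather than pixels from the generated images:

```python
import math
from collections import Counter

def shannon_entropy(pixels):
    """Shannon entropy, in bits, of a sequence of 8-bit gray levels."""
    counts = Counter(pixels)
    total = len(pixels)
    # Sum p * log2(1/p) over every gray level that actually occurs.
    return sum((n / total) * math.log2(total / n) for n in counts.values())

# Toy data (assumption): not taken from the images in this post.
flat = [128] * 1024             # one gray level only
uniform = list(range(256)) * 4  # every level equally frequent

print(shannon_entropy(flat))     # 0.0
print(shannon_entropy(uniform))  # 8.0
```

On this reading, a higher value means the tones are spread across more gray levels; it says nothing by itself about whether the image matches the prompt.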
Celestial Guardians: Statues Silhouetted Against Vibrant Aurora Borealis
Two statues stand as silent sentinels against a breathtaking backdrop of emerald and violet aurora borealis, their forms starkly outlined against the starry sky and distant land. The mystical and serene scene evokes a sense of awe and wonder, with the celestial display adding a dramatic and ethereal touch.
Prompt
facial-expressions Sadness: Despair, disillusionment ; A lone figure in a vibrant, flowing costume stands atop a colossal, ancient statue, silhouetted against the shimmering aurora borealis. Snow falls softly, creating a delicate veil around them as they gaze out at the vast, frozen landscape below.; cinematic
Characteristic
Shot : A statue of a woman standing under a bright aurora borealis display.
Aesthetic Score : 0.7
Mood : mysterious, serene, hopeful
Quality
Entropy : 6.81
Noise : 111
Prompt Clip Score : 0.23
AI Evaluation
Likelihood of AI : 0.10
Image errors : The image is slightly underexposed, making it difficult to see the details of the statue.
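The Prompt Clip Score is not defined in the post either. CLIP-based scores are typically the cosine similarity between a text embedding of the prompt and an image embedding of the result, so a higher value means a closer match between the two. A minimal sketch of that similarity, assuming the embeddings have already been computed; `cosine_similarity` is a hypothetical helper and the vectors are made-up toy values, not real CLIP outputs:

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Toy embeddings (assumption): real CLIP vectors have hundreds of dimensions.
text_embedding = [1.0, 0.0, 0.0]
image_embedding = [1.0, 2.0, 2.0]

print(cosine_similarity(text_embedding, image_embedding))
```

Here the two toy vectors share only one component, so the similarity is about 0.33; identical directions would give 1.0.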
Lost in Thought: A Moment of Melancholy
A woman sits at a kitchen counter, her gaze lowered, lost in contemplation. The scene evokes a sense of sadness and introspection, capturing a moment of quiet reflection.
Prompt
facial-expressions Sadness: Hopelessness, grief ; A woman sitting at a kitchen table; eye-level; Normal People; Empty coffee cup, unwashed dishes; cinematic
Characteristic
Shot : A woman is sitting at a kitchen counter, looking down and resting her chin on her hands. There is a coffee cup and a silver pitcher on the counter.
Aesthetic Score : 0.4
Mood : pensive, melancholic, introspective
Quality
Entropy : 6.84
Noise : 102
Prompt Clip Score : 0.28
AI Evaluation
Likelihood of AI : 0.20
Image errors : There is some slight blurring in the image, particularly in the background. This could be due to the low lighting or camera shake.
The Weight of the World
A solitary figure sits at a desk, head in hands, surrounded by the remnants of a long day. The computer monitor glows in the background, a silent witness to their struggle. The mood is heavy with sadness, stress, and loneliness, captured in the person’s slumped posture and downcast expression.
Prompt
facial-expressions Sadness: Isolation, withdrawal ; A gamer hunched over their computer; close-up; Gamer; Empty pizza boxes, energy drink cans; cinematic
Characteristic
Shot : A person is sitting in front of a computer, covering their face with their hands. The image is lit by blue light, creating a dark and moody atmosphere.
Aesthetic Score : 0.4
Mood : sad, lonely, stressed
Quality
Entropy : 6.75
Noise : 106
Prompt Clip Score : 0.28
AI Evaluation
Likelihood of AI : 0.10
Image errors : There are no significant image errors. The image appears to be slightly blurry and noisy, but this could be intentional for the mood.
Lost in the Shadows: A Boy’s Melancholy Journey
A young boy stands alone in a dimly lit hallway, his gaze fixed on the distant end. The bare walls and deep shadows create a sense of isolation and mystery, hinting at a story of loneliness and introspection.
Prompt
facial-expressions Sadness: Loneliness, abandonment ; A child standing in a doorway; eye-level; Single Person; Empty hallway, dim lighting; cinematic
Characteristic
Shot : A young boy stands in a hallway, looking away from the camera towards a doorway, in what appears to be a rundown apartment.
Aesthetic Score : 0.4
Mood : sad, lonely, isolated
Quality
Entropy : 6.72
Noise : 103
Prompt Clip Score : 0.21
AI Evaluation
Likelihood of AI : 0.10
Image errors : The image is slightly grainy and the colors are muted. The lighting is uneven and the shadows are harsh.
Golden Hour Serenity: A Hiker Contemplates the City Below
A lone hiker finds peace on a rocky cliff, bathed in the golden light of sunset. The vast cityscape stretches out below, offering a breathtaking perspective and a sense of adventure. This serene moment captures the beauty of nature and the quiet contemplation of the human spirit.
Prompt
facial-expressions Sadness: Loss, regret ; A lone adventurer kneeling on a windswept mountain peak, gazing at a distant city bathed in golden sunlight. The air is filled with the scent of pine and the sound of wind whistling through the rocks.; cinematic
Characteristic
Shot : A lone hiker with a backpack sits on a hill overlooking a city; the setting sun casts a golden glow over the scene.
Aesthetic Score : 0.7
Mood : serene, contemplative, adventurous
Quality
Entropy : 6.84
Noise : 107
Prompt Clip Score : 0.24
AI Evaluation
Likelihood of AI : 0.20
Image errors : No noticeable artifacts or errors in the image.
What’s on the Screen? Couple’s Surprised Reaction Captures the Moment
A man and woman, seated on a couch, share a look of surprise as they gaze upwards. The woman rests her hand on her chin, while the man holds a bowl of popcorn, hinting at a shared moment of unexpected excitement. The scene evokes a sense of anticipation and curiosity, leaving viewers wondering what has caught their attention.
Prompt
facial-expressions Sadness: Silence, unspoken tension ; A couple sitting on a couch; eye-level; Normal People; Empty popcorn bowl, remote control on the floor; cinematic
Characteristic
Shot : Two people are sitting on a couch, watching something on a screen, likely a television. The man is holding a bowl of popcorn and the woman looks up in surprise.
Aesthetic Score : 0.6
Mood : intrigued, suspenseful, casual
Quality
Entropy : 6.95
Noise : 102
Prompt Clip Score : 0.25
AI Evaluation
Likelihood of AI : 0.20
Image errors : No visible image errors.
The Focus Within
A close-up shot captures the intensity of a person typing on a keyboard, their hands in sharp focus against a blurred background. The image evokes a sense of focused concentration and the quiet intensity of a tech-driven world.
Prompt
facial-expressions Sadness: Frustration, defeat ; A gamer’s hands on a keyboard; close-up; Gamer; Screen displaying a game over message; cinematic
Characteristic
Shot : A person’s hands are typing on a keyboard in front of a computer monitor.
Aesthetic Score : 0.5
Mood : focused, techy, digital
Quality
Entropy : 6.69
Noise : 102
Prompt Clip Score : 0.21
AI Evaluation
Likelihood of AI : 0.20
Image errors : No significant errors, but there is some minor noise in the image.
Lost in Thought on a City Street
A young woman contemplates the urban landscape, her pensive expression highlighted by the shallow depth of field. The man behind her and the bustling city fade into the background, leaving her thoughts as the focus.
Prompt
facial-expressions Sadness: Alienation, loneliness ; A woman walking down a crowded street; eye-level; Single Person; People passing by, oblivious to her; cinematic
Characteristic
Shot : A woman is walking down a street, looking off to the side. She is wearing a denim jacket and a tank top. The background is out of focus, but you can see that the street is lined with shops. There is a man walking behind her, slightly out of focus.
Aesthetic Score : 0.6
Mood : pensive, urban, casual
Quality
Entropy : 6.90
Noise : 99
Prompt Clip Score : 0.19
AI Evaluation
Likelihood of AI : 0.10
Image errors : The image has some noise and graininess, particularly in the background. The focus is slightly soft in areas, especially on the man in the background.
Lost in the City Lights
A woman, her gaze lost in the blurry cityscape, contemplates the passing day. The soft hues of dusk paint a melancholic mood, hinting at a longing for something beyond the horizon.
Prompt
facial-expressions Sadness: Reflection, introspection ; A hero standing on a rooftop; eye-level; Hero; City lights twinkling in the distance; cinematic
Characteristic
Shot : A young woman is standing in front of an out-of-focus cityscape at dusk. The background is blurry, and the woman is the main focus of the image.
Aesthetic Score : 0.7
Mood : melancholy, pensive, thoughtful
Quality
Entropy : 6.43
Noise : 98
Prompt Clip Score : 0.17
AI Evaluation
Likelihood of AI : 0.10
Image errors : No significant artifacts or errors.
Conclusion
The results of the analysis show that the generative AI model understood the scene well, but struggled with camera position and with the aesthetic aspect. Here’s a breakdown:
- Camera Position: The model scored 0.2, which is below the “good” range of 0.5 to 0.75. This indicates that the model didn’t accurately capture the intended camera position described in the prompt.
- Shot Analysis: The model scored 0.545, which falls within the “good” range. This suggests that the model was able to understand the scene described in the prompt and create a shot that aligns with it.
- Aesthetic Analysis: The model scored 0.26, which is well above the “very good” range of -0.2 to 0.1. This indicates that the generated image’s aesthetic deviated significantly from the expected aesthetic described in the prompt.
Overall, the model demonstrated a good understanding of the scene and shot composition, but struggled to match the intended camera position and to achieve the desired aesthetic.
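To set the per-image numbers beside these summary scores, the sketch below averages the metrics listed for the ten images and checks two of the summary scores against their stated ranges. The values are copied from the sections above; `in_range` and the variable names are hypothetical, and the post does not say how the three summary scores were derived from the per-image metrics, so the averages are for orientation only.

```python
from statistics import mean

# Per-image values, in the order the images appear in this post.
aesthetic = [0.5, 0.7, 0.4, 0.4, 0.4, 0.7, 0.6, 0.5, 0.6, 0.7]
clip = [0.25, 0.23, 0.28, 0.28, 0.21, 0.24, 0.25, 0.21, 0.19, 0.17]
ai_likelihood = [0.20, 0.10, 0.20, 0.10, 0.10, 0.20, 0.20, 0.20, 0.10, 0.10]

def in_range(score, low, high):
    """True when score lies inside the closed interval [low, high]."""
    return low <= score <= high

print("mean aesthetic score:", round(mean(aesthetic), 3))
print("mean prompt CLIP score:", round(mean(clip), 3))
print("mean likelihood of AI:", round(mean(ai_likelihood), 3))

# Summary scores from the conclusion against the ranges it quotes.
print("camera position in 'good' range:", in_range(0.2, 0.5, 0.75))
print("aesthetic analysis in 'very good' range:", in_range(0.26, -0.2, 0.1))
```

Both range checks come out false, which matches the reading that camera position and aesthetics are the weak points of this run.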