AI's Camera Skills: A Work in Progress with Ideogram-v2-turbo
In the world of visual storytelling, camera position and shot selection are crucial for conveying mood, emotion, and narrative. We wanted to see how well an AI could understand and implement these elements. We provided the AI with a series of scene descriptions, each specifying a camera position and shot type. The results were interesting, revealing both strengths and weaknesses in the AI’s ability to capture the intended visual style.
Created with: ideogram-v2-turbo
Silhouetted Against the Sunset, a Moment of Solitude Atop the Peak
A lone figure stands on a mountain summit, bathed in the golden light of the setting sun. Below, a sea of clouds stretches out, creating a breathtaking and awe-inspiring scene. The image evokes a sense of serenity, contemplation, and the vastness of the natural world.
Prompt
camera-positions Bird’s eye view: Epic, triumphant, inspiring ; A lone figure standing on a mountain peak; wide shot; Heroism; a vast, sprawling landscape with clouds swirling below; cinematic
Characteristic
Shot : A lone figure stands atop a mountain peak, overlooking a vast expanse of clouds below. The clouds are lit by the warm glow of the setting sun, creating a dramatic and awe-inspiring scene.
Aesthetic Score : 0.8
Mood : serene, awe-inspiring, contemplative
Quality
Entropy : 6.61
Noise : 92
Prompt Clip Score : 0.33
AI Evaluation
Likelihood of AI : 0.30
Image errors : No noticeable artifacts or errors.
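The post does not state how the Entropy value in each Quality block is computed. One common choice for image statistics is the Shannon entropy of the grayscale intensity histogram, which for 8-bit images lies between 0 and 8 bits; the values reported here fall within that range. The sketch below assumes that definition. `shannon_entropy` is a hypothetical helper for illustration, not the tool used for this post.

```python
import math
from collections import Counter
from typing import Sequence


def shannon_entropy(pixels: Sequence[int]) -> float:
    """Shannon entropy, in bits, of a sequence of intensity values.

    Assumption: `pixels` is a flat list of grayscale values (e.g. 0-255).
    """
    counts = Counter(pixels)
    total = len(pixels)
    # Sum -p * log2(p) over every intensity value that actually occurs.
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


# A flat image has zero entropy; a perfectly uniform 8-bit histogram has 8 bits.
flat = [128] * 1000
uniform = list(range(256))
print(shannon_entropy(flat), shannon_entropy(uniform))
```

Under this definition, a higher value means the intensities are spread more evenly across the available range, which usually goes along with more visual detail or texture.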
Lost in the Jungle: A Mysterious Journey
A group of figures, cloaked in white, navigate a dense jungle under the watchful gaze of the sun. The high angle shot and dramatic shadows create an eerie atmosphere, hinting at an adventure filled with mystery and danger.
Prompt
camera-positions Bird’s eye view: Intriguing, adventurous, mysterious ; A group of explorers navigating a dense jungle; medium shot; Adventure; lush green foliage, sunlight filtering through the canopy; cinematic
Characteristic
Shot : A group of people dressed in white are walking through a dense jungle. The image is taken from a high angle, looking down on the group.
Aesthetic Score : 0.7
Mood : mysterious, adventurous, eerie
Quality
Entropy : 6.40
Noise : 116
Prompt Clip Score : 0.30
AI Evaluation
Likelihood of AI : 0.30
Image errors : The image has some minor artifacts, particularly around the edges of the leaves. The lighting is also a bit uneven, making some areas of the image darker than others.
Lost in the Neon Jungle: A Cyberpunk Silhouette
A lone figure in a futuristic suit stands on a rooftop, gazing out over a sprawling, neon-drenched city. The composition evokes a sense of isolation and mystery, capturing the essence of cyberpunk urban life.
Prompt
camera-positions Bird’s eye view: Futuristic, vibrant, dynamic ; A player character standing on a rooftop overlooking a bustling city; medium shot; Gaming; neon lights, towering skyscrapers, and holographic displays; cinematic
Characteristic
Shot : A man in a futuristic outfit standing on a rooftop overlooking a futuristic city.
Aesthetic Score : 0.6
Mood : cyberpunk, urban, mysterious
Quality
Entropy : 6.62
Noise : 98
Prompt Clip Score : 0.30
AI Evaluation
Likelihood of AI : 0.90
Image errors : Slight blurring around edges of the image.
A Bird’s Eye View of Bustling Market Life
Capture the vibrant energy of a bustling city market from above. Colorful umbrellas, dense crowds, and a kaleidoscope of goods create a lively scene, enhanced by the dramatic perspective of a high angle view.
Prompt
camera-positions Bird’s eye view: Lively, vibrant, exotic ; A bustling marketplace in a foreign city; wide shot; Tourism; colorful stalls, crowds of people, and traditional architecture; cinematic
Characteristic
Shot : A bustling outdoor market in a city with many people, stalls, and colorful umbrellas.
Aesthetic Score : 0.6
Mood : busy, lively, vibrant
Quality
Entropy : 6.83
Noise : 107
Prompt Clip Score : 0.28
AI Evaluation
Likelihood of AI : 0.20
Image errors : No noticeable artifacts or errors.
Tranquil Journey Through a Verdant Valley
A winding road snakes through a serene valley, flanked by lush green hills and dense forests. The dramatic perspective invites you to explore the vastness of the landscape, offering a sense of isolation and adventure.
Prompt
camera-positions Bird’s eye view: Tranquil, scenic, inspiring ; A winding road leading through a picturesque valley; long shot; Travel; rolling hills, lush meadows, and a clear blue sky; cinematic
Characteristic
Shot : A winding road through a valley with green hills and forests on both sides.
Aesthetic Score : 0.7
Mood : tranquil, serene, adventurous
Quality
Entropy : 6.65
Noise : 97
Prompt Clip Score : 0.29
AI Evaluation
Likelihood of AI : 0.20
Image errors : No noticeable artifacts or errors.
Campfire Nights Under a Starry Sky
A group of friends gather around a crackling campfire, sharing stories and laughter under a breathtaking starry sky. The warmth of the fire and the intimacy of the moment create a cozy and unforgettable experience.
Prompt
camera-positions Bird’s eye view: Warm, intimate, nostalgic ; A group of friends gathered around a campfire; medium shot; Groups; a starry night sky, a crackling fire, and the silhouette of mountains in the distance; cinematic
Characteristic
Shot : A group of friends are gathered around a campfire under a starry sky. They are sitting on the ground or on benches, and some are holding cups or mugs. There is a sense of warmth and intimacy in the scene.
Aesthetic Score : 0.7
Mood : cozy, warm, intimate
Quality
Entropy : 5.95
Noise : 96
Prompt Clip Score : 0.34
AI Evaluation
Likelihood of AI : 0.20
Image errors : No noticeable errors. The image is well-composed and balanced.
Tranquil Sunset Sail: A Tiny Boat Against the Vast Ocean
Capture the serenity of a lone sailboat gliding across the deep blue ocean at sunset. The sky bursts with vibrant pink and blue hues, creating a breathtaking backdrop. The smallness of the boat against the vastness of the water evokes a sense of peace and perspective.
Prompt
camera-positions Bird’s eye view: Serene, adventurous, contemplative ; A lone sailboat navigating a vast ocean; long shot; Adventure; endless blue water, whitecaps, and a setting sun; cinematic
Characteristic
Shot : A lone sailboat sails across the ocean at sunset. The sky is a beautiful blend of pink and blue, and the water is a deep blue.
Aesthetic Score : 0.8
Mood : tranquil, peaceful, serene
Quality
Entropy : 6.27
Noise : 94
Prompt Clip Score : 0.31
AI Evaluation
Likelihood of AI : 0.20
Image errors : No noticeable errors.
Vibrant Dance in a Colorful City Square
A lively scene unfolds in a cobblestone square, where a group of dancers captivate a crowd with their energetic performance. The vibrant colors of the city and the dynamic movement of the dancers create a festive and captivating atmosphere.
Prompt
camera-positions Bird’s eye view: Energetic, festive, celebratory ; A group of dancers performing in a plaza; medium shot; Groups; cobblestone streets, colorful buildings, and a lively crowd; cinematic
Characteristic
Shot : A group of people are performing a dance in a cobblestone square in a colorful city, surrounded by a crowd of onlookers.
Aesthetic Score : 0.7
Mood : energetic, festive, vibrant
Quality
Entropy : 6.95
Noise : 114
Prompt Clip Score : 0.31
AI Evaluation
Likelihood of AI : 0.20
Image errors : The image has slight artifacts and noise, particularly in the shadows and darker areas.
Solitude and Majesty: A Hiker Contemplates the Vast Canyon
A lone hiker stands on a cliff edge, gazing out at a deep canyon with a winding river below. The overcast sky casts a soft, warm light, creating a tranquil and awe-inspiring scene. The vastness of the canyon evokes a sense of wonder, while the solitary figure adds a touch of contemplation and peace.
Prompt
camera-positions Bird’s eye view: Awe-inspiring, majestic, powerful ; A lone hiker standing on a cliff overlooking a breathtaking canyon; wide shot; Heroism; towering rock formations, a river winding through the valley, and a dramatic sky; cinematic
Characteristic
Shot : A lone hiker stands on a cliff edge, looking out at a deep canyon with a winding river flowing through it. The sky is overcast with clouds, and the light is soft and warm.
Aesthetic Score : 0.8
Mood : tranquil, solitary, majestic
Quality
Entropy : 6.70
Noise : 107
Prompt Clip Score : 0.34
AI Evaluation
Likelihood of AI : 0.10
Image errors : No obvious image errors.
Bonfire Nights Under a Starry Sky
A group of friends gather around a crackling bonfire on a beach, bathed in the warm glow of the flames. The night sky above is a canvas of twinkling stars, creating a magical and cozy atmosphere. Palm trees sway gently in the background, adding to the sense of adventure and relaxation.
Prompt
camera-positions Bird’s eye view: Romantic, relaxing, nostalgic ; A group of people gathered around a bonfire on a beach; medium shot; Groups; a starry night sky, crashing waves, and the silhouette of palm trees; cinematic
Characteristic
Shot : A group of friends are gathered around a bonfire on a beach at night, under a starry sky with palm trees in the background.
Aesthetic Score : 0.7
Mood : cozy, relaxing, adventurous
Quality
Entropy : 6.49
Noise : 107
Prompt Clip Score : 0.33
AI Evaluation
Likelihood of AI : 0.30
Image errors : The image appears to be slightly overexposed, resulting in some loss of detail in the darker areas. The stars in the sky look slightly unnatural and possibly AI-generated.
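To compare the ten images side by side, the per-image values listed above can be averaged. The sketch below transcribes the Aesthetic Score, Prompt Clip Score, and Likelihood of AI values in the order the images appear. The variable names are illustrative, and these simple means are not necessarily the same quantities as the analysis scores reported in the conclusion.

```python
from statistics import mean

# Values transcribed from the ten per-image blocks above, in order of appearance.
aesthetic = [0.8, 0.7, 0.6, 0.6, 0.7, 0.7, 0.8, 0.7, 0.8, 0.7]
clip_score = [0.33, 0.30, 0.30, 0.28, 0.29, 0.34, 0.31, 0.31, 0.34, 0.33]
ai_likelihood = [0.30, 0.30, 0.90, 0.20, 0.20, 0.20, 0.20, 0.20, 0.10, 0.30]

summary = {
    "aesthetic": round(mean(aesthetic), 3),
    "clip_score": round(mean(clip_score), 3),
    "ai_likelihood": round(mean(ai_likelihood), 3),
}

# Index (0-based) of the image judged most likely to be AI-generated.
most_ai_like = max(range(len(ai_likelihood)), key=ai_likelihood.__getitem__)

print(summary)
print("Highest Likelihood of AI: image", most_ai_like + 1)
```

The only clear outlier in these three columns is the third image (the cyberpunk rooftop), whose Likelihood of AI of 0.90 is well above the 0.10 to 0.30 range of the other nine.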
Conclusion
The results show that the generative AI model performed below average at understanding and following camera positions and shot descriptions.
Here’s a breakdown:
- Camera Position Analysis: The score of 0.38 indicates that the model’s ability to accurately represent the intended camera position in the generated image is below average. A score between 0.5 and 0.75 would be considered good, and above 0.75 very good.
- Shot Analysis: The score of 0.43 suggests that the model’s understanding of the scene and its ability to create the intended shot is also below average. A score between 0.5 and 0.75 would be considered good, and above 0.75 very good.
- Aesthetic Analysis: The score of 0.29 indicates that the generated image’s aesthetic is slightly different from the expected aesthetic. A score between -0.2 and 0.1 would be considered very good, indicating a close match between the expected and actual aesthetics; 0.29 falls outside that range.
Overall, the model needs improvement in its ability to accurately interpret and implement camera positions and shot descriptions. It also needs to better capture the intended aesthetic of the image.
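The score bands quoted in the breakdown can be written down as a small lookup, which makes it easy to check where each reported score lands. This is a minimal sketch: the function names are hypothetical, and the handling of the exact boundary values (0.5 and 0.75 counted as "good", -0.2 and 0.1 counted as "very good") is an assumption, since the post does not say which side the boundaries fall on.

```python
def rate_match_score(score: float) -> str:
    """Band for the camera position and shot scores, per the breakdown above."""
    if score > 0.75:
        return "very good"
    if score >= 0.5:  # assumption: both 0.5 and 0.75 count as "good"
        return "good"
    return "below average"


def rate_aesthetic_score(score: float) -> str:
    """Band for the aesthetic score, where values near zero mean a close match."""
    if -0.2 <= score <= 0.1:  # assumption: boundaries are inclusive
        return "very good"
    return "outside the very good range"


for name, score in [("Camera Position", 0.38), ("Shot", 0.43)]:
    print(name, score, rate_match_score(score))
print("Aesthetic", 0.29, rate_aesthetic_score(0.29))
```

Applied to the reported numbers, both the camera position score (0.38) and the shot score (0.43) land in the below-average band, and the aesthetic score (0.29) lies outside the very good range.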