Unleash the Power of AI in Visual Storytelling: Harness Midjourney v5 and Stable Diffusion to Create Engaging Visual Content
Dive into the world of prompt engineering, harnessing the power of Midjourney v5 and Stable Diffusion SDXL Beta to create captivating visual content.
Discover how these cutting-edge AI models can help you generate unique and engaging images using a variety of camera positions and cinematic styles.
Target versions for this blog post
Midjourney v5
The prompts in this blog post are tailored for Midjourney, version 5. Midjourney v5 boasts enhanced language understanding capabilities that yield superior images.
Stable Diffusion SDXL beta
For Stable Diffusion, this post uses the SDXL Beta (preview) model in DreamStudio with the Cinematic style.
Developing Prompts with Midjourney Templates
Since April 2023, Midjourney has offered prompt templates, which let you iterate over several variations of a prompt in a single request. Sometimes a camera position confuses the AI and produces a failed image.
Example of a Midjourney Prompt Template
cinematic scene - {fast Steadicam shot, extreme close up, following}: a bird flies over the beach
Midjourney expands this template into three prompts and generates four images for each prompt:
cinematic scene - fast Steadicam shot: a bird flies over the beach
cinematic scene - extreme close up: a bird flies over the beach
cinematic scene - following: a bird flies over the beach
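The expansion above can be reproduced locally to preview which prompts a template will produce. This is a minimal sketch, not Midjourney's own implementation: the function name `expand_template` is hypothetical, and it only handles flat `{a, b, c}` groups (no nesting, no escaped commas).

```python
import itertools
import re


def expand_template(template: str) -> list[str]:
    """Expand every {a, b, c} group into one prompt per combination."""
    # re.split with a capture group alternates literal text (even indices)
    # and the contents of each brace group (odd indices).
    parts = re.split(r"\{([^{}]*)\}", template)
    choices = [
        [option.strip() for option in part.split(",")] if i % 2 else [part]
        for i, part in enumerate(parts)
    ]
    # One prompt for each combination of options across all groups.
    return ["".join(combo) for combo in itertools.product(*choices)]


prompts = expand_template(
    "cinematic scene - {fast Steadicam shot, extreme close up, following}: "
    "a bird flies over the beach"
)
for prompt in prompts:
    print(prompt)
```

A template with two groups yields every pairing of their options, so the number of prompts grows quickly.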
Here you can see that “fast Steadicam shot” might result in failure:
Developing prompts with ChatGPT (GPT-4)
Using GPT-4 to generate prompt ideas is useful because it quickly produces creative and varied suggestions. By drawing on GPT-4’s broad knowledge and language understanding, you can brainstorm writing prompts, storytelling concepts, and ideas for other creative projects.
This method saves time and effort while sparking inspiration, resulting in a more dynamic and innovative creative process.
describe a long shot scene with keywords about a couple at a beach, start the prompt with “cinematic scene - long shot:”, max 15 words
Result:
Cinematic scene - long shot: Couple strolling, beach sunset, gentle waves, seagulls, distant lighthouse, holding hands, footprints in sand.
It is also possible to create multiple prompts:
describe five long shot scenes with keywords about a couple at a beach, start the prompt with “cinematic scene - long shot:”, max 15 words
And you will get 5 long shot prompts:
Cinematic scene - long shot: Couple picnicking, beach, laughter, seashells, ocean waves, warm sand, gentle breeze.
Cinematic scene - long shot: Silhouetted couple, beach sunset, holding hands, splashing water, seagulls flying, horizon.
Cinematic scene - long shot: Couple building sandcastle, beach, teamwork, sun hats, waves crashing, beach umbrellas.
Cinematic scene - long shot: Beach yoga, couple stretching, peaceful, sunrise, calm sea, boats in the distance.
Cinematic scene - long shot: Couple surfing, beach, exhilarating, vibrant waves, seagulls, onlookers, sun-soaked shoreline.
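The request text and the handling of the reply can be sketched in a few lines. This is only an illustration under stated assumptions: `build_request` and `usable_prompts` are hypothetical helper names, the request is meant to be pasted into ChatGPT by hand (no API is called), and the count is written as a digit rather than spelled out.

```python
def build_request(shot: str, subject: str, count: int = 1, max_words: int = 15) -> str:
    """Build the instruction text to paste into ChatGPT."""
    scenes = f"a {shot} scene" if count == 1 else f"{count} {shot} scenes"
    return (
        f"describe {scenes} with keywords about {subject}, "
        f'start the prompt with "cinematic scene - {shot}:", max {max_words} words'
    )


def usable_prompts(reply: str, shot: str) -> list[str]:
    """Keep only the reply lines that start with the requested prefix."""
    # ChatGPT capitalised the prefix in the results above, so compare case-insensitively.
    prefix = f"cinematic scene - {shot}:".lower()
    lines = [line.strip() for line in reply.splitlines()]
    return [line for line in lines if line.lower().startswith(prefix)]


request = build_request("long shot", "a couple at a beach", count=5)
print(request)
```

Filtering by prefix drops any introductory or closing chatter in the reply before the prompts are passed on to the image model.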
By leveraging ChatGPT’s expertise, you can:
- Generate diverse prompt ideas: ChatGPT can help you brainstorm various creative concepts for writing prompts, storytelling ideas, and other artistic projects.
- Iterate and refine prompts: ChatGPT can assist you in iterating and refining your prompts, offering new perspectives and styles that elevate your visual storytelling.
- Save time and effort: Utilizing ChatGPT’s sophisticated language understanding capabilities saves you time and effort while sparking inspiration and fostering a more dynamic and innovative creative process.
Using images as input for prompt engineering
Using images as input for prompt engineering is an innovative approach to generating creative and visually captivating content.
By employing advanced AI models like Midjourney v5 and Stable Diffusion SDXL Beta, you can transform images into rich, detailed prompts that inspire striking visual narratives. This method offers a unique way to explore camera positions, styles, and themes, ultimately enhancing the storytelling process.
There are several ways to utilize images as input for prompt engineering:
- Midjourney’s Describe Feature: This feature, introduced in April 2023, allows you to upload an image and receive prompt suggestions based on the visual content. These prompts can serve as a starting point for creating new images with specific camera positions and styles.
- MM-ReAct: An AI model that provides nuanced and detailed descriptions of images. It offers a more sophisticated analysis than traditional computer vision models like CLIP. By using the descriptions generated by MM-ReAct as prompts, you can create images that are more closely aligned with the original visual input.
Consider the following steps to get the most out of using images as input for prompt engineering:
- Choose an Image: Choose an image that inspires you or fits with the story you want to tell. This image will be the basis for your prompts.
- Examine the image: To generate a detailed description of the image, use AI models such as Midjourney’s Describe Feature or MM-ReAct. Take note of important details such as camera position, style, and themes.
- Make Prompts: Create your prompts using the AI-generated descriptions as a starting point. Include the essential elements as well as any additional details that will enhance your visual narrative.
- Iterate and fine-tune: Experiment with various prompt variations and camera angles. You can discover new perspectives and styles that elevate your visual storytelling by iterating and refining your prompts.
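The last two steps can be sketched as a small helper that combines an AI-generated description with camera positions and optional extra details. This is an assumption-laden illustration: `prompt_variations` is a hypothetical name, and the "cinematic scene - <camera>: <description>" layout simply follows the prompt style used earlier in this post.

```python
from itertools import product


def prompt_variations(
    description: str, camera_positions: list[str], details: list[str]
) -> list[str]:
    """Build one prompt per (camera position, extra detail) pair."""
    prompts = []
    for camera, detail in product(camera_positions, details):
        prompt = f"cinematic scene - {camera}: {description}"
        if detail:  # an empty string means "no extra detail"
            prompt += f", {detail}"
        prompts.append(prompt)
    return prompts


variations = prompt_variations(
    "a building in the desert",
    ["aerial view", "long shot"],
    ["", "a desert road"],
)
for prompt in variations:
    print(prompt)
```

Each variation can then be tried in Midjourney or Stable Diffusion and the weaker ones discarded.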
In conclusion, using images as prompt engineering input provides a unique opportunity to leverage the power of AI models such as Midjourney v5 and Stable Diffusion SDXL Beta. This method allows you to create visually appealing content that complements your storytelling and engages your audience.
Example Image
Midjourney’s describe
Midjourney introduced the describe feature in early April 2023. After you upload an image, it returns four prompt suggestions.
For the Example Image we get:
these drone photos show a massive deserted home or building, in the style of dau al set, 32k uhd, cubo-futurism, green and beige, ndebele art, concept art, kushan empire
two trucks are driving in an empty desert, in the style of elaborate spacecrafts, dimitry roulland, light emerald and beige, grandeur of scale, agfa vista, iconic imagery, neo-concrete art
the sands of the great sahara desert where a desert oasis is standing in a circular rock formation, in the style of industrial brutalist, dark aquamarine and beige, mind-bending murals, 32k uhd, cubo-futurism, modular construction, national geographic photo
desert habitat for alien, in the style of dimitry roulland, modular constructivism, cargopunk, national geographic photo, gustave van de woestijne, kushan empire, symmetrical composition
Prompt used:
these drone photos show a massive deserted home or building, in the style of dau al set, 32k uhd, cubo-futurism, green and beige, ndebele art, concept art, kushan empire
For Midjourney:
With Midjourney, the term “drone photos” can be used to get an aerial view; conceptually, a “drone photo” is mostly an aerial view.
For Stable Diffusion:
For Stable Diffusion “drone photos” does not work well.
Using MM-ReAct
MM-ReAct is an AI model capable of describing images in a more nuanced and detailed manner than traditional computer vision models such as CLIP.
For the example image we get:
This is an aerial view of a building in the desert with a close up of a stone.
For Midjourney:
The MM-ReAct prompt is shorter and misses details of the example image, compared with the prompt created by Midjourney’s describe feature.
For Stable Diffusion:
The prompt does not work well for Stable Diffusion, probably because it is very short and lacks detail. After adding details, the prompt becomes:
This is an aerial view of a building in the desert with a close up of a stone, there is a car in front of the building, and a desert road.
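Extending a short description like this is easy to script when you want to try several sets of details. The helper below is a hypothetical sketch (`add_details` is not part of any tool mentioned here); it appends comma-separated details to the base description.

```python
def add_details(description: str, details: list[str]) -> str:
    """Append extra details to a short description as a comma-separated list."""
    base = description.strip().rstrip(".")  # drop the final period before extending
    return ", ".join([base, *details]) + "."


extended = add_details(
    "This is an aerial view of a building in the desert with a close up of a stone.",
    ["there is a car in front of the building", "and a desert road"],
)
print(extended)
```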
Conclusions
The combination of advanced AI models like Midjourney v5 and Stable Diffusion SDXL Beta opens up a realm of possibilities for visual storytelling.
With prompt engineering and creative experimentation, you can develop stunning imagery that captures the essence of your narrative and captivates your audience.
You can effectively experiment with a wide range of visual storytelling possibilities by incorporating ChatGPT into your prompt engineering workflow, resulting in richer and more engaging content for your audience.