If you are using the Stability AI cloud service, just select "SD Cloud" and enter your API key (you can find it in your Dreamstudio profile).
For an in-house installation, follow the instructions on the GitHub page.
A good NVIDIA GPU is required (RTX 2060 or better, at least 6 GB of VRAM). You cannot run the server part on a Mac or with an AMD card. Mac computers (even the most expensive ones) have very low GPU power for this kind of task, so using a Mac as a server would be more a proof of concept than a usable solution for daily work; it is simply too slow. We are currently working on a server update, so the link will change slightly in the future.
After setup, just put that link into the "SD Local" configuration. You can test the server by opening the link plus /api/version. For example, http://127.0.0.1/api/version should show the current server version.
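If you prefer to script that check instead of opening it in a browser, a minimal Python sketch could look like this. It assumes the endpoint simply returns the version in the response body; the exact format depends on your server build, and the function names here are just illustrative.

```python
import urllib.request

def version_url(base_url: str) -> str:
    # Join the configured server link with the version endpoint,
    # tolerating a trailing slash in the base URL.
    return base_url.rstrip("/") + "/api/version"

def get_server_version(base_url: str, timeout: float = 5.0) -> str:
    # Returns the raw response body; whether that is plain text or JSON
    # depends on the server build.
    with urllib.request.urlopen(version_url(base_url), timeout=timeout) as resp:
        return resp.read().decode("utf-8")
```

For example, `get_server_version("http://127.0.0.1")` would request http://127.0.0.1/api/version and return whatever the server reports.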
Prompt: enter whatever you want to see into the text input (e.g. "a cat in a suit"). We have separated the so-called modifiers from the main prompt input, so you can define modifiers for individual styles below the prompt input. You can also manage your own library of modifiers. Each list of modifiers can be assigned to tags, which can be selected as categories at the top:
Find inspiration for modifiers and prompts here. With this separation you can quickly change the look and style of a prompt without manually typing or copy/pasting a long string again. We have also implemented the ability to comment out a line in the modifier box:
In this example, "pastel colored" will not be part of the full prompt.
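Conceptually, the plugin combines the main prompt with the active modifier lines. A small sketch of that idea, assuming a leading `#` marks a commented-out line (check the plugin UI for the actual comment syntax it uses; the function name is hypothetical):

```python
def build_full_prompt(prompt: str, modifiers: str, comment_prefix: str = "#") -> str:
    # Combine the main prompt with the modifier box, skipping empty
    # and commented-out lines.
    active = [
        line.strip()
        for line in modifiers.splitlines()
        if line.strip() and not line.strip().startswith(comment_prefix)
    ]
    return ", ".join([prompt] + active)
```

With the modifier box containing `oil painting`, `# pastel colored`, and `detailed`, the full prompt becomes "a cat in a suit, oil painting, detailed"; the commented line is dropped.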
This is the second most important parameter for Stable Diffusion, and we recommend you play around with it to get a feeling for it. We have made this easy: you can start with a low step value (e.g. 15), and on the second preview screen you can re-render individual images with higher values to get more detail and better quality. How much this helps depends on the subject/prompt and also on the sampler (see below).
Low step values need less computational time, so results come more quickly. If you are using the cloud configuration, generation also costs less here.
For example, on a 3060 Ti, generating 4 images at a step value of 15 takes only around 8 seconds (2 seconds each), and a single image with 50 steps takes about the same time.
- Number of images: generate between 1 and 4 images
- Seed: a random number the image is tied to. The same seed plus the same other parameters results in the same image. In the preview screen, the seed of each image is shown below it, and you can copy it from there.
- Guidance Scale: a higher value means the result follows the prompt more closely. 7.5 is a good common value.
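The seed determinism described above can be illustrated with a toy model: a fixed seed makes the pseudo-random starting noise (and hence the final image) reproducible. This is only a sketch of the principle, not the actual diffusion code:

```python
import random

def sample_latent(seed: int, n: int = 4) -> list:
    # Stand-in for the diffusion starting noise: the same seed always
    # produces the same pseudo-random values, so the same image.
    rng = random.Random(seed)
    return [rng.random() for _ in range(n)]
```

Running `sample_latent(123)` twice gives identical values, while a different seed gives different ones; this is why copying a seed from the preview screen lets you reproduce or refine a specific image.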
About Sampler (advanced reading):
The ancestral samplers (Euler a and DPM2 a) do better at lower step values, and Euler a tends to diverge at higher steps rather than converge. The other samplers, especially LMS, benefit from additional steps and a slightly higher CFG. They tend to converge on an image, so as the steps go higher the image only changes in small ways; it is the early steps where it can vary more. Euler a works better with a CFG around 5-8, and Euler/LMS do better with 7-12. For simpler/short prompts and fewer steps, Euler at 32 steps with CFG 8 produces the most realistic pictures. We recommend starting with around 15 steps (with Euler a) or 30 with LMS, and trying higher values (like 50) in the preview screen for individual image updates.
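The rules of thumb above can be collected into a small lookup table of starting points. The step values come directly from the text; the CFG values are midpoints of the ranges mentioned, chosen here for illustration, not official presets:

```python
# Starting-point presets distilled from the sampler notes above.
# Rules of thumb only, not hard limits; adjust per subject and prompt.
SAMPLER_PRESETS = {
    "Euler a": {"steps": 15, "cfg_scale": 6.5},  # ancestral: good at low steps, CFG ~5-8
    "Euler":   {"steps": 32, "cfg_scale": 8.0},  # realistic results for short prompts
    "LMS":     {"steps": 30, "cfg_scale": 9.0},  # converges; likes more steps, CFG ~7-12
}
```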
In this mode you can adjust an existing image to redraw it in a new style. This is much more powerful than so-called "style transfer" algorithms, but it also works differently, because it uses the prompt input plus an image.
On the right you see the selected image which will be used for image generation.
Denoising Strength: with this new parameter you can control how strongly the result follows the original image versus the prompt. With a very high value the result might be something completely different from what you expect, similar to Text to Image without an input image. To make it easy to find the right value, the preview screen offers image-generation updates for fine-tuning the result, similar to the steps parameter.
Here again are these three parameters in the same example from above, with the resulting image:
Pretty amazing things can be done with this feature: a children's drawing could be transformed into a more professional one, or vice versa. Please note that in the cloud configuration the strength parameter behaves in the opposite direction, to stay compatible with the Dreamstudio version.
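The exact mapping between the two conventions is not spelled out here, but if "opposite direction" means the value is simply inverted, a conversion sketch would look like the following. Treat the inversion as an assumption and verify it against your own cloud results:

```python
def to_cloud_strength(local_strength: float) -> float:
    # Assumption: the cloud/Dreamstudio convention is simply the
    # inverse of the local denoising strength.
    if not 0.0 <= local_strength <= 1.0:
        raise ValueError("strength must be between 0 and 1")
    return 1.0 - local_strength
```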
With inpainting, part of an existing image is changed. There are three or four known variations of this technique:
- Normal Inpainting: Place a new object in existing scene (or remove one)
- Original Inpainting: slightly change a part of an image
- Outpainting and Uncrop: Add something new outside the image to enlarge it
We do not support outpainting at the moment, though in some situations it works a bit and produces some results. Uncrop does not work at all.
1. Normal Inpainting
First erase a part of the image. Then make a selection big enough that the AI can understand the surroundings, but close enough to keep good resolution. The erased part should also match the prompt; otherwise the inpainted object might be cut off at the edges. The hotel room example from the video shows quite well what is important here:
If the prompt were only "a suitcase", the AI would mostly not deliver an upright one, which would not fit this area. With the additional word "trolley", the result matches the erased area very well.
Two more very important things:
- Please set the so-called Diffuser to "Latent Noise" in the configuration; otherwise you will usually get bad results.
- Also use a very high strength value, much higher than in Image to Image (e.g. 0.95).
2. Original Inpainting
If you do not want a new object in the scene but just want to change the style of an existing part of the image, "Original" inpainting delivers great results.
As you can see, besides the prompt another image is needed: the original one. This means the AI does not only work on one image with a hole in it; it also uses the image data from the area before it was erased. This technique is some kind of "cheating" because it depends on that original data. Our plugin handles it like this: the current selection of the current layer is used for the mask (the image with the erased area, the middle one in the example above). The layer with all the image data is found by its layer name: the plugin looks for a layer containing the same number in its name, and if it cannot find one, it uses the first layer. In the configuration, the Diffuser has been set to "Original" here, and the strength value is not as high (something above 0.5).
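The layer lookup described above can be sketched like this. It is a simplified model that treats layers as plain name strings; the real plugin works on Photoshop layer objects, and the function name is hypothetical:

```python
import re

def find_original_layer(mask_layer_name: str, layer_names: list) -> str:
    # Mirrors the lookup described above: prefer a layer whose name
    # contains the same number as the mask layer; otherwise fall back
    # to the first layer in the document.
    match = re.search(r"\d+", mask_layer_name)
    if match:
        number = match.group()
        for name in layer_names:
            if name != mask_layer_name and number in name:
                return name
    return layer_names[0]
```

So with a mask layer named "Mask 3", a layer named "Original 3" would be picked as the source of the untouched image data; if no numbered match exists, the first layer is used.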
With tiling, a seamless pattern can easily be generated in Text to Image (Image to Image might also work, but we have not tested it yet). In the preview screen you can see the result in a 3x3 pattern by clicking on an image:
Stable Diffusion quite often produces bad results for people's faces, so this option is very useful for improving some of them. For close-ups it is usually better to turn it off, because it softens the result a bit.
Stable Diffusion has been trained on images of size 512x512, so the default selection matches exactly that size. If you change the size in the configuration, please be very careful:
- only numbers divisible by 64 are allowed
- on a local installation, image generation might no longer work (memory error)
- higher values quite often produce images with repeating content
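If you compute sizes programmatically, a small helper can snap any requested dimension to the nearest allowed multiple of 64 (the helper name and the minimum of 64 are assumptions for illustration):

```python
def snap_to_multiple(value: int, multiple: int = 64, minimum: int = 64) -> int:
    # Round a requested dimension to the nearest allowed multiple of 64,
    # never going below a sensible minimum.
    return max(minimum, round(value / multiple) * multiple)
```

For example, a requested width of 500 snaps to 512, and 700 snaps to 704.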