The first time you enter a prompt into an AI art generator and it actually creates something that perfectly matches what you want, it feels like magic. But it turns out AI art generators don't work using magic. They use computers, machine learning, powerful graphics cards, and a whole lot of data to do their thing.

AI art generators take a text prompt and, as best they can, turn it into a matching image. Since your prompt can be anything, the first thing all these apps have to do is attempt to understand what you're asking. To do this, the AI algorithms are trained on hundreds of thousands, millions, or even billions of image-text pairs. This allows them to learn the difference between dogs and cats, Vermeers and Picassos, and everything else. Different art generators have different levels of understanding of complex text, depending on the size of their training dataset.

The next step for the AI is to actually render the resulting image. There are two leading kinds of models:

Diffusion models, like Stable Diffusion, DALL·E 2, Midjourney, and CLIP-Guided Diffusion, which work by starting with a random field of noise and then editing it in a series of steps to match their understanding of the prompt.
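To make the diffusion idea described in this section a little more concrete (start from random noise, then refine it over a series of steps), here is a toy sketch in Python. It is only an illustration and not how any of the named generators is implemented: in a real diffusion model, a trained neural network guided by the prompt decides what to change at each step. Here a hypothetical `target` list of numbers stands in for "what the model thinks the prompt looks like," and the function names `toy_denoise` and `distance` are made up for this example.

```python
import random


def toy_denoise(target, steps=50, seed=0):
    """Start from random noise and nudge it toward `target` over `steps` steps.

    `target` is a stand-in (an assumption of this sketch) for the model's
    understanding of the prompt; a real model never sees the answer directly.
    """
    rng = random.Random(seed)
    # Begin with a random field of noise, one value per "pixel".
    image = [rng.uniform(0.0, 1.0) for _ in target]
    history = [list(image)]
    for step in range(steps):
        remaining = steps - step
        # Close an equal share of the remaining gap at every step,
        # so the last step lands on the target.
        image = [pixel + (goal - pixel) / remaining
                 for pixel, goal in zip(image, target)]
        history.append(list(image))
    return image, history


def distance(a, b):
    """Average absolute difference between two equally sized 'images'."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


# A tiny six-pixel "image" that the noise should gradually turn into.
target = [0.0, 0.2, 0.4, 0.6, 0.8, 1.0]
final, history = toy_denoise(target)
for step in (0, 10, 25, 50):
    print(step, round(distance(history[step], target), 4))
```

Running it prints the distance from the target at a few points along the way, and the numbers shrink as the steps go on. That is the basic shape of the process: the picture starts as static and gets a little closer to the goal with every edit.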