AI image generation primer
I’ve spent the last 6 months learning how to use image gen models after getting a little frustrated with relying only on API services to generate images for a particular use case. Here’s a braindump of my knowledge so far, as someone who is non-technical with AI.
ComfyUI
ComfyUI (node based) is an app that helps you generate AI images. It’s a must-learn if you want to fiddle around with AI image/video gen. There are others like Forge and A1111 (form based), but I’ve found ComfyUI to be the most flexible (and the most difficult to grasp initially). While the others are easier to use and (from what I hear) give slightly better results out of the box, ComfyUI is the only one that can completely automate a workflow from pre-processing > generation > post-processing. That means if you want to productise a workflow, it’s going to be your best option whenever your processing is specific or complex. I’ll talk about how to get your head around ComfyUI later, but first I’ll cover the image models.
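To make the automation point concrete: every workflow you build in the ComfyUI canvas is, under the hood, just a JSON graph of nodes, and a locally running instance exposes an HTTP endpoint (`POST /prompt`, port 8188 by default) that queues such a graph for execution. The sketch below shows a minimal text-to-image graph; the checkpoint filename and sampler settings are placeholder examples and will depend on your install, so treat this as an illustration rather than a drop-in script.

```python
import json
import urllib.request

# A ComfyUI workflow (API format) is a JSON graph: each key is a node id,
# and node inputs reference other nodes as [node_id, output_index].
# Checkpoint/prompt values below are placeholders - swap in your own.
workflow = {
    "1": {"class_type": "CheckpointLoaderSimple",
          "inputs": {"ckpt_name": "sd_xl_base_1.0.safetensors"}},
    "2": {"class_type": "CLIPTextEncode",  # positive prompt
          "inputs": {"clip": ["1", 1], "text": "a lighthouse at dusk"}},
    "3": {"class_type": "CLIPTextEncode",  # negative prompt
          "inputs": {"clip": ["1", 1], "text": "blurry, low quality"}},
    "4": {"class_type": "EmptyLatentImage",
          "inputs": {"width": 1024, "height": 1024, "batch_size": 1}},
    "5": {"class_type": "KSampler",
          "inputs": {"model": ["1", 0], "positive": ["2", 0],
                     "negative": ["3", 0], "latent_image": ["4", 0],
                     "seed": 42, "steps": 25, "cfg": 7.0,
                     "sampler_name": "euler", "scheduler": "normal",
                     "denoise": 1.0}},
    "6": {"class_type": "VAEDecode",
          "inputs": {"samples": ["5", 0], "vae": ["1", 2]}},
    "7": {"class_type": "SaveImage",
          "inputs": {"images": ["6", 0], "filename_prefix": "demo"}},
}

def submit(workflow, host="127.0.0.1:8188"):
    """Queue the workflow on a locally running ComfyUI instance."""
    payload = json.dumps({"prompt": workflow}).encode()
    req = urllib.request.Request(f"http://{host}/prompt", data=payload,
                                 headers={"Content-Type": "application/json"})
    return json.load(urllib.request.urlopen(req))
```

Because the graph is plain JSON, pre- and post-processing steps can be scripted around the `submit()` call, which is what makes ComfyUI workflows productisable in a way the form-based UIs aren’t.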
Image model history
I’ll try to summarise the complex history of open models which’ll help you understand the big picture.
2022
Mid 2022, Stable Diffusion v1 was released. Although Stability AI was the company behind the model, SAI outsourced the training, so all versions prior to v1.5 were released by CompVis, and v1.5 was released by RunwayML. All of these models were uncensored. SD was quickly adopted by the community as the model was open source.
Late 2022, SAI released SD2 with a filtered dataset, removing all the NSFW content. This model was not well received by the community.
2023
Early 2023 saw the release of ControlNet, LoRA, and IP-Adapter for SD1.5, allowing better control of image generation, and SD1.5 exploded in popularity, becoming the go-to model. Two styles were especially popular with this model: realism and anime.
Mid 2023 saw the release of Stable Diffusion XL. While v1.x was trained on 512px content, SDXL was trained on 1024px content. SDXL was well received, but its development got off to a slow start: SD1.5 already had a large installed base, and SDXL’s larger size meant longer training times. Keep in mind that every major version (1, 2, and XL) had a different architecture, so tools like ControlNet, LoRA, and IP-Adapter did not work universally across major models and had to be re-implemented by the community for each one. Eventually, ControlNet, LoRA, and IP-Adapter support followed for SDXL.
Late 2023 was when SDXL Turbo was released: a distilled version of SDXL, accompanied by a change in license. The community did not like the change, as the new license was not true open source.
2024
Early 2024 saw SD3 released by Stability AI. The release caused a lot of backlash from the community as the license was very restrictive, even for personal use. At the same time, Stability AI was running out of money and had no plans to create future SD models. SAI eventually rectified the license when it released SD3.5 in late 2024.
Mid 2024 saw the release of a new image model called Flux from a team called Black Forest Labs, a group of former SAI employees. I’ve seen community members say that Flux should have been what SD3 was at release. Ouch. SD3 was overshadowed by the Flux release, and Flux has become the go-to model for a lot of people. Flux launched in 3 variations: Dev, an open-weights version under a non-commercial license; Schnell, a distilled version that is completely open source; and Pro, a closed-source version only accessible via an API.
Late 2024 saw BFL release the Flux tools: Fill, Redux, Depth, and Canny, the equivalents of inpainting and ControlNets from the Stable Diffusion ecosystem.
And that’s the history of open models. There’s a great video that explains the Stable Diffusion models in detail. Other open models have been released since Flux, which I won’t get into here. I’ve created a table of the pros and cons of the popular models to help you get a grasp on them.
| Model | T2I/I2I | Inpainting/ControlNet/LoRA/IP-Adapter | Non-restricted commercial use | Pros | Cons |
|---|---|---|---|---|---|
| SD1.5 | Yes/Yes | Yes/Yes/Yes/Yes | Yes | Fast generation, fast training, rich and mature ecosystem; can still be the first choice for some use cases, e.g. anime | Base output is not good quality, low prompt adherence, illegible text |
| SDXL 1.0 | Yes/Yes | Yes/Yes/Yes/Yes | Yes (not Turbo) | Realism can be on par with base Flux, faster than Flux, relatively fast training, rich and mature ecosystem | Base output is still not great quality, low prompt adherence, illegible text |
| Flux Dev/Schnell | Yes/Yes | Fill/Depth, Canny/Yes/Redux | Schnell only | High-quality base output, high prompt adherence, great for realism | Very slow generation, slow training, no negative prompt |