Finguard GenAI — Diffusion & multimodal generation

A diffusion model is trained on a strangely simple task: take a noisy image and make it slightly less noisy.

Repeat that thousands of times, starting from pure static, and a picture appears. This course builds that idea from the ground up — through the VAEs and GANs that came before, the math that makes diffusion work, and the systems that turn a text prompt into an image.
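The "denoise a little, then repeat" loop described above fits in a few lines. This is a minimal toy sketch, not the course's code. It assumes NumPy, a common linear noise schedule over 1,000 steps, and a hypothetical `oracle_predict_noise` that stands in for a trained network. Because the "dataset" here is a single known image, the noise hidden in each step can be computed exactly; a real model has to learn to estimate it.

```python
import numpy as np

def linear_beta_schedule(num_steps, beta_start=1e-4, beta_end=0.02):
    # Per-step noise variances beta_t (assumed linear schedule).
    return np.linspace(beta_start, beta_end, num_steps)

def oracle_predict_noise(x_t, t, alpha_bars, target):
    # Hypothetical stand-in for a trained denoiser: with a one-image "dataset",
    # the noise in x_t = sqrt(abar_t) * target + sqrt(1 - abar_t) * eps
    # can be solved for exactly.
    return (x_t - np.sqrt(alpha_bars[t]) * target) / np.sqrt(1.0 - alpha_bars[t])

def ddpm_sample(target, num_steps=1000, seed=0):
    rng = np.random.default_rng(seed)
    betas = linear_beta_schedule(num_steps)
    alphas = 1.0 - betas
    alpha_bars = np.cumprod(alphas)
    x = rng.standard_normal(target.shape)  # start from pure static
    for t in reversed(range(num_steps)):
        eps_hat = oracle_predict_noise(x, t, alpha_bars, target)
        # Remove a little of the predicted noise (DDPM-style reverse-step mean).
        mean = (x - betas[t] / np.sqrt(1.0 - alpha_bars[t]) * eps_hat) / np.sqrt(alphas[t])
        if t > 0:
            # Add a small amount of fresh noise on every step except the last.
            var = betas[t] * (1.0 - alpha_bars[t - 1]) / (1.0 - alpha_bars[t])
            x = mean + np.sqrt(var) * rng.standard_normal(target.shape)
        else:
            x = mean
    return x

target = np.full((4, 4), 0.5)  # a tiny 4x4 "image"
sample = ddpm_sample(target)
print(np.allclose(sample, target))
```

Each pass through the loop only nudges the sample slightly; it is the thousand small steps together that turn static into the picture. Swapping the oracle for a neural network trained to predict the noise is, loosely, what the diffusion sections of the course build toward.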

What you'll learn

From noise to a finished image.

The classics

Autoencoders and VAEs (with a latent space you can explore), then GANs — the adversarial game, mode collapse, and StyleGAN's control.

Diffusion, in depth

The forward and reverse processes, the training objective, samplers (DDPM, DDIM), classifier-free guidance, and the score-based view.

Systems & beyond

Latent diffusion and Stable Diffusion, text conditioning and ControlNet, text-to-image craft, and generation of video, audio, and 3D.

The curriculum

Twelve sections, one craft.

Seven sections are live now (through Latent & Conditional Diffusion); the rest are still being written and will appear in your dashboard as they ship.

Foundations of Generative Modeling

Generative vs discriminative, likelihood and density, latent variables and the data manifold, evaluation (FID), and a map of the families.

Available now

Autoencoders & VAEs

Autoencoders and the bottleneck, the VAE and the ELBO, the reparameterization trick, an interactive latent space, and why VAEs blur.

Available now

Generative Adversarial Networks

The adversarial game, training dynamics, mode collapse, DCGAN and StyleGAN, and conditional GANs.

Available now

Diffusion Models: Core

The big idea, the forward and reverse processes, the training objective, DDPM sampling, and the U-Net backbone.

Available now

Diffusion: Sampling & Guidance

DDIM and deterministic sampling, modern samplers, classifier and classifier-free guidance, and quality vs speed.

Available now

Score-Based Models & Theory

Score functions, denoising score matching, the SDE and probability-flow ODE view, and how that view unifies score-based models with diffusion.

Available now

Latent & Conditional Diffusion

Latent diffusion and Stable Diffusion, text conditioning and cross-attention, ControlNet, img2img, and inpainting.

Available now

Text-to-Image Systems

The full system, prompt craft and negative prompts, guidance and seeds, and upscaling and refinement.

Coming soon

Beyond Images: Video, Audio, 3D

Video generation and temporal consistency, audio and music, 3D generation, and any-to-any multimodal models.

Coming soon

Control, Editing & Personalization

Latent control, DreamBooth and textual inversion, LoRA for diffusion, prompt-based editing, and style transfer.

Coming soon

Evaluation, Ethics & Safety

FID and CLIP score, deepfakes and misuse, watermarking and provenance, data and copyright, and bias.

Coming soon

Capstones & The Frontier

Design a text-to-image pipeline and an image editor, then the frontier: consistency models, flow matching, and real-time generation.

Coming soon

Before you start

Bring some ML basics.

You'll get the most from this if you're comfortable with neural networks and the idea of training by gradient descent. New to that? Start with Finguard ML, then come here for the generative half of the field.


Who it's for

ML learners going generative
Engineers using image models
Artists & technical creatives
Researchers & students

Begin

See how machines imagine.

No account, no install. Progress saves automatically in your browser, separate from your other courses.
