Machine Learning & Deep Learning

Networks That Create

For most of computing history, computers only ever processed or stored things humans made. Then deep learning crossed a new threshold: machines began creating original images, music, voices, and text that did not exist before. This lesson explores how that works, why it is powerful, and what questions it raises.

The Generator-Discriminator Idea

One of the earliest and most influential generative architectures is the Generative Adversarial Network, or GAN. The name describes the mechanism: two networks compete. The generator network starts with random noise (just meaningless numbers) and tries to transform that noise into something that looks like a real image. The discriminator network has seen thousands of real images and tries to tell whether any given image is real or generator-made.

The two networks train together in a loop. Every time the discriminator catches the generator's fakes, the generator adjusts its weights to fool the discriminator better. Every time the generator successfully fools it, the discriminator adjusts to get better at spotting fakes. Over millions of rounds, the generator learns to produce images that are nearly indistinguishable from real photographs.

By 2018, GANs trained on face photographs were generating hyper-realistic human faces of people who had never existed. Websites like 'This Person Does Not Exist' let anyone see this in action.
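The training loop described above can be sketched in a few lines. This is a deliberately tiny illustration, not how production GANs are built: the "images" are single numbers drawn from a bell curve, the generator and discriminator each have only two weights, and the gradients are written out by hand. The function name `train_tiny_gan`, the learning rate, and the target mean of 4.0 are all assumptions chosen for the example.

```python
import math
import random


def sigmoid(s):
    # Numerically stable logistic function: maps any score to (0, 1).
    if s >= 0:
        return 1.0 / (1.0 + math.exp(-s))
    e = math.exp(s)
    return e / (1.0 + e)


def train_tiny_gan(steps=2000, lr=0.01, real_mean=4.0, seed=0):
    """Toy 1-D GAN (hypothetical example): 'real' data are numbers near real_mean."""
    rng = random.Random(seed)
    a, b = 1.0, 0.0  # generator weights: fake = a * noise + b
    w, c = 0.0, 0.0  # discriminator weights: P(real) = sigmoid(w * x + c)

    for _ in range(steps):
        # Discriminator turn: look at one real and one fake sample.
        real = rng.gauss(real_mean, 1.0)
        fake = a * rng.gauss(0.0, 1.0) + b
        d_real = sigmoid(w * real + c)
        d_fake = sigmoid(w * fake + c)
        # Gradient of loss_D = -log(d_real) - log(1 - d_fake).
        grad_w = -(1.0 - d_real) * real + d_fake * fake
        grad_c = -(1.0 - d_real) + d_fake
        w -= lr * grad_w
        c -= lr * grad_c

        # Generator turn: adjust weights so the fake scores as "more real".
        z = rng.gauss(0.0, 1.0)
        fake = a * z + b
        d_fake = sigmoid(w * fake + c)
        # Gradient of loss_G = -log(d_fake) with respect to the fake sample.
        grad_fake = -(1.0 - d_fake) * w
        a -= lr * grad_fake * z
        b -= lr * grad_fake

    return a, b, w, c


if __name__ == "__main__":
    a, b, w, c = train_tiny_gan()
    print("generator weights:", a, b)
    print("discriminator weights:", w, c)
```

Each pass through the loop is one round of the competition: the discriminator updates first using a real and a fake sample, then the generator updates using only the discriminator's verdict on a fresh fake. Real GANs follow the same alternation but use deep networks and an automatic-differentiation library instead of hand-written gradients.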

Definition: Generative Model

A generative model is a deep network that learns the statistical patterns of a dataset so well it can produce new examples that follow the same patterns. Generative models can output images, audio, video, text, or code — anything that can be represented as numbers.

GANs are one approach. Two others now dominate.

Diffusion models are trained by taking real images and gradually adding random noise until each image is pure static. A network is then trained to reverse that process: to take noisy static and peel back the noise step by step until a clean image re-emerges. During inference (actual use), you skip the real image entirely and start from pure noise, letting the model 'denoise' toward whatever category or description you specified. This is how Stable Diffusion and DALL-E 3 work.

Autoregressive language models generate text one token at a time, each token predicted from everything that came before. GPT-4 is an autoregressive model. 'Token' roughly means a word-piece; the model predicts anywhere from dozens to thousands of them in sequence to write a paragraph, a poem, or a program.

All three approaches (GANs, diffusion, autoregressive) can be conditioned on a text prompt. You describe what you want; the model generates something matching that description. This is called text-to-image or text-to-text generation, depending on the kind of output.
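The two ideas above can each be shown in miniature. The sketch below is illustrative only and makes several assumptions: the noising formula and linear schedule follow one common formulation of diffusion (other schedules exist), the "image" is just a short list of pixel values, and the language model is a hypothetical hand-written lookup table that looks only at the previous token, whereas real models condition on the whole preceding text. None of the names here come from a real library.

```python
import math
import random


def alpha_bar_schedule(num_steps=1000, beta_start=1e-4, beta_end=0.02):
    """Fraction of the original signal remaining after each noising step.

    Assumes a linear noise schedule and num_steps >= 2.
    """
    alpha_bars = []
    running = 1.0
    for t in range(num_steps):
        beta = beta_start + (beta_end - beta_start) * t / (num_steps - 1)
        running *= 1.0 - beta  # each step keeps slightly less of the image
        alpha_bars.append(running)
    return alpha_bars


def add_noise(pixels, alpha_bar, rng):
    """Forward diffusion: blend clean pixel values with Gaussian noise."""
    keep = math.sqrt(alpha_bar)
    noise_scale = math.sqrt(1.0 - alpha_bar)
    return [keep * p + noise_scale * rng.gauss(0.0, 1.0) for p in pixels]


# Hypothetical next-token probabilities, written by hand for illustration.
NEXT_TOKEN_PROBS = {
    "<start>": {"the": 0.9, "a": 0.1},
    "the": {"neuron": 0.7, "axon": 0.3},
    "a": {"neuron": 1.0},
    "neuron": {"fires": 0.8, "<end>": 0.2},
    "axon": {"fires": 0.6, "<end>": 0.4},
    "fires": {"<end>": 1.0},
}


def generate_greedy(max_tokens=10):
    """Autoregressive generation: repeatedly append the most likely next token."""
    tokens = ["<start>"]
    while len(tokens) <= max_tokens:
        probs = NEXT_TOKEN_PROBS[tokens[-1]]
        next_token = max(probs, key=probs.get)
        if next_token == "<end>":
            break
        tokens.append(next_token)
    return tokens[1:]  # drop the <start> marker


if __name__ == "__main__":
    bars = alpha_bar_schedule()
    rng = random.Random(0)
    image = [0.8, -0.3, 0.5]
    print("lightly noised:", add_noise(image, bars[10], rng))
    print("almost static: ", add_noise(image, bars[-1], rng))
    print("generated text:", " ".join(generate_greedy()))
```

The diffusion half shows only the noising direction; the hard part, which requires a trained network, is running it backward. The text half shows why generation is sequential: each token cannot be chosen until the previous one is known.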

Real Uses and Real Risks

Generative models are already reshaping creative industries. Graphic designers use diffusion models to prototype concepts in seconds. Musicians use audio generators to create backing tracks. Writers use language models to draft and edit. Game studios generate landscapes, textures, and dialogue at a scale no human team could match.

The same technology creates serious risks. Deepfakes (AI-generated videos of real people saying or doing things they never did) can be used to spread misinformation or to harm reputations. Audio cloning can replicate a specific person's voice from seconds of recording, enabling fraud. Generated images can fabricate evidence.

The creative power and the deception risk come from exactly the same capability: the ability to produce realistic content indistinguishable from real content. Understanding how these tools work is the first step to using them responsibly and spotting when others misuse them.

Synthetic Media Requires Verification

If you see a striking image or video online, it may be AI-generated. Check for artifacts: unnatural hands (models often get finger counts wrong), inconsistent text within images, and strange background blurring. When the stakes are high, verify with a primary source, not just the media itself.

Prompt Challenge

Write a prompt for an image-generation model that would produce a scientifically accurate image of a neuron, suitable for a biology textbook.

Your prompt should…

  • describe the subject clearly using specific visual details
  • specify the artistic style and intended purpose
  • state what should be excluded to keep the image accurate

In a Generative Adversarial Network, what is the role of the discriminator?

Diffusion models generate images by doing what, in reverse?

Design the GAN Arena

  1. On paper, draw two boxes labeled Generator and Discriminator.
  2. Draw arrows showing: (1) random noise going into the Generator, (2) a generated image coming out, (3) that image and a real image both going into the Discriminator, (4) a Real/Fake label coming out of the Discriminator, (5) feedback arrows going back to both networks.
  3. Next to each arrow, write one sentence describing what information is flowing.
  4. Present your diagram to a partner and walk them through one full training round.