Variance schedule



At each step $t$, the image is shifted slightly from its state at the previous step (per the mean), and noise is added to this shifted version (per the variance). The magnitude of both the shift and the added noise is driven by the value of $\beta_t$: as $\beta_t$ increases in accordance with the variance schedule, the rate of diffusion increases with it. Each $\beta_t$ lies strictly between 0 and 1, so for an increasing schedule, $0 < \beta_1 < \beta_2 < \dots < \beta_T < 1$.

Choosing a variance schedule for $\beta_t$ is an important design decision. The schedule is usually set by hand as a hyperparameter, either fixed to a constant value or following a formula with predetermined start and end values. In the DDPM paper, Ho et al. used a linear schedule with 1,000 steps in which $\beta_1 = 10^{-4}$ and $\beta_T = 0.02$. Later research found improvements in performance and efficiency with other types of schedules, such as a cosine schedule,[1] or with making the schedule itself a learned parameter.[2]
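As a rough illustration of the two hand-set schedules mentioned above, the sketch below builds a linear schedule with the DDPM endpoints and a cosine schedule. The function names are hypothetical, and the cosine version assumes the commonly used squared-cosine formulation with an offset `s = 0.008` and a clipping value of `0.999`; neither default is stated in this article.

```python
import numpy as np


def linear_beta_schedule(num_steps=1000, beta_start=1e-4, beta_end=0.02):
    """Evenly spaced betas from beta_start to beta_end (the DDPM endpoints)."""
    return np.linspace(beta_start, beta_end, num_steps)


def cosine_beta_schedule(num_steps=1000, s=0.008, max_beta=0.999):
    """Betas derived from a squared-cosine curve for the cumulative signal level.

    The offset s and the clipping value max_beta are assumed defaults.
    """
    t = np.arange(num_steps + 1) / num_steps
    # Cumulative fraction of signal remaining at each (fractional) timestep.
    alpha_bar = np.cos((t + s) / (1 + s) * np.pi / 2) ** 2
    # Each beta is the fraction of remaining signal removed at that step.
    betas = 1.0 - alpha_bar[1:] / alpha_bar[:-1]
    return np.clip(betas, 0.0, max_beta)


linear_betas = linear_beta_schedule()
cosine_betas = cosine_beta_schedule()
```

Both functions return an array of length `num_steps`, so either one can be swapped into the same training loop when comparing schedules.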

The value of $\beta_t$ determines both the mean and variance of the Gaussian noise added at step $t$.

The mean of the Gaussian noise added at timestep $t$, $\mu_t$, is calculated as $\mu_t = \sqrt{1 - \beta_t}\, x_{t-1}$. In plain language, the mean at each step $t$ is simply a scaled version of the image from the previous step, $x_{t-1}$. The size of $\beta_t$ determines how far this mean deviates from the previous step: when $\beta_t$ is very small, the shift is very minor, because $\sqrt{1 - \beta_t} \approx \sqrt{1 - 0} = 1$, and the noised image will thus closely resemble the image from the previous step. As the value of $\beta_t$ increases, this shift becomes more significant.
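A single forward step can be sketched as follows. This assumes the standard DDPM transition, in which the mean is $\sqrt{1 - \beta_t}\, x_{t-1}$ and the variance is $\beta_t I$; the function name `forward_step` and the 8x8 random array standing in for an image are hypothetical.

```python
import numpy as np


def forward_step(x_prev, beta_t, rng):
    """Sample x_t from a Gaussian with mean sqrt(1 - beta_t) * x_prev and variance beta_t."""
    mean = np.sqrt(1.0 - beta_t) * x_prev        # scaled copy of the previous image
    noise = rng.standard_normal(x_prev.shape)    # unit-variance Gaussian noise
    return mean + np.sqrt(beta_t) * noise        # noise scaled to standard deviation sqrt(beta_t)


rng = np.random.default_rng(0)
x_prev = rng.uniform(-1.0, 1.0, size=(8, 8))     # hypothetical 8x8 "image"

x_small = forward_step(x_prev, beta_t=1e-4, rng=rng)
x_large = forward_step(x_prev, beta_t=0.02, rng=rng)

# Average per-pixel change from the previous step for each beta_t.
shift_small = np.abs(x_small - x_prev).mean()
shift_large = np.abs(x_large - x_prev).mean()
```

Comparing `shift_small` with `shift_large` shows the effect described above: the larger $\beta_t$ moves the sample noticeably further from $x_{t-1}$.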





Because Gaussian noise is added gradually and each noisy sample is centered on a scaled copy of the image from the previous step, the essential qualities of the original image are retained for many steps. This enables the model to meaningfully learn the patterns and structure of the original data distribution during the reverse diffusion process.
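One way to see how slowly the original image fades is to track the cumulative product of $(1 - \beta_s)$ over the steps so far. In the standard DDPM formulation (a result not derived in this section), the square root of that product is the weight on the original image at step $t$. The sketch below assumes the linear schedule described earlier; the variable names are illustrative.

```python
import numpy as np

betas = np.linspace(1e-4, 0.02, 1000)   # linear schedule with the DDPM endpoints
alpha_bar = np.cumprod(1.0 - betas)     # product of (1 - beta_s) for all s up to t
signal_scale = np.sqrt(alpha_bar)       # weight on the original image at step t

# Print the remaining signal weight at a few representative steps.
for t in (1, 50, 250, 500, 1000):
    print(f"t={t:4d}  signal weight={signal_scale[t - 1]:.4f}")
```

The weight stays close to 1 over the early steps and only approaches 0 near the end of the schedule, which is consistent with the image structure surviving for many steps.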