Improving a Variational Autoencoder with Normalizing Flows
In order to fully grasp the concepts explained here, I strongly recommend that you first read my three posts on Variational Autoencoders (in the following order):
- Variational Autoencoders and the Expectation Maximization Algorithm
- Minimalist Variational Autoencoder in Pytorch with CUDA GPU
- Assessing a Variational Autoencoder on MNIST using Pytorch.
Theory of Vanilla VAEs
Recall that in a Vanilla VAE we feed the input image $\mathbf{x}$ to an encoder network, which outputs the mean $\boldsymbol{\mu}$ and the log-variance $\log\boldsymbol{\sigma}^2$ of a diagonal Gaussian approximate posterior
$$q_{\phi}(\mathbf{z}\mid\mathbf{x}) = \mathcal{N}\big(\mathbf{z};\,\boldsymbol{\mu},\,\text{diag}(\boldsymbol{\sigma}^2)\big).$$
To get a latent sample we use the reparametrization trick: we draw $\boldsymbol{\epsilon}\sim\mathcal{N}(\mathbf{0}, I)$ and set
$$\mathbf{z} = \boldsymbol{\mu} + \boldsymbol{\sigma}\odot\boldsymbol{\epsilon}.$$
To learn the parameters of our neural network our aim is to maximize the ELBO
$$\mathcal{L}(\theta, \phi) = \mathbb{E}_{q_{\phi}(\mathbf{z}\mid\mathbf{x})}\big[\log p_{\theta}(\mathbf{x}\mid\mathbf{z})\big] - \text{KL}\big(q_{\phi}(\mathbf{z}\mid\mathbf{x})\,\|\,p(\mathbf{z})\big).$$
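As a reminder of what this looks like in code, here is a minimal sketch of an encoder and of the reparametrization trick. The layer sizes and the names `Encoder` and `reparametrize` are illustrative assumptions, not the exact architecture from the earlier posts.

import torch
import torch.nn as nn

class Encoder(nn.Module):
    """Maps a flattened MNIST image to the parameters of q(z | x)."""
    def __init__(self, latent_dim=2, hidden_dim=256):
        super().__init__()
        self.hidden = nn.Linear(28 * 28, hidden_dim)
        self.mu = nn.Linear(hidden_dim, latent_dim)
        self.logvar = nn.Linear(hidden_dim, latent_dim)

    def forward(self, x):
        h = torch.relu(self.hidden(x.view(-1, 28 * 28)))
        return self.mu(h), self.logvar(h)

def reparametrize(mu, logvar):
    """Sample z = mu + sigma * eps with eps ~ N(0, I)."""
    std = torch.exp(0.5 * logvar)
    eps = torch.randn_like(std)
    return mu + std * eps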
The reconstruction error (the first term) is easy to compute in the Normal and Bernoulli case. In what follows, we will assume that the likelihood is a product of Bernoullis. This is the usual set-up when working with MNIST. The likelihood is then
$$p_{\theta}(\mathbf{x}\mid\mathbf{z}) = \prod_{i=1}^{784} p_i^{x_i}(1 - p_i)^{1 - x_i},$$
where $p_i$ is the $i$-th pixel probability produced by the decoder from the latent sample $\mathbf{z}$. Its logarithm is the negative binary cross-entropy
$$\log p_{\theta}(\mathbf{x}\mid\mathbf{z}) = \sum_{i=1}^{784}\big[x_i\log p_i + (1 - x_i)\log(1 - p_i)\big].$$
The KL divergence (the second term) is available in closed form because both $q_{\phi}(\mathbf{z}\mid\mathbf{x})$ and the prior $p(\mathbf{z}) = \mathcal{N}(\mathbf{0}, I)$ are Gaussian:
$$\text{KL}\big(q_{\phi}(\mathbf{z}\mid\mathbf{x})\,\|\,p(\mathbf{z})\big) = -\frac{1}{2}\sum_{j}\big(1 + \log\sigma_j^2 - \mu_j^2 - \sigma_j^2\big),$$
where $j$ runs over the latent dimensions.
Using Pytorch we can code the resulting loss (the negative ELBO) as follows:
import torch
import torch.nn.functional as F

def vae_loss(image, reconstruction, mu, logvar):
    """Loss (negative ELBO) for the Variational AutoEncoder."""
    # Reconstruction term: binary cross-entropy between the decoder
    # output p (the mean reconstruction) and the true image x.
    recon_loss = F.binary_cross_entropy(
        input=reconstruction.view(-1, 28*28),  # p, the mean reconstruction
        target=image.view(-1, 28*28),          # x, the true image
        reduction='sum'
    )
    # KL divergence between q(z | x) and N(0, I), closed-form formula.
    kl = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())
    return recon_loss + kl
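As a rough usage sketch: `encoder` and `reparametrize` refer to the illustrative pieces above, and `decoder` stands for a network that maps $\mathbf{z}$ to pixel probabilities in $(0, 1)$; these names are assumptions for the example, not code from the previous posts.

image = torch.rand(32, 1, 28, 28)      # dummy batch standing in for MNIST images
mu, logvar = encoder(image)            # parameters of q(z | x)
z = reparametrize(mu, logvar)          # one Monte Carlo sample from q(z | x)
reconstruction = decoder(z)            # pixel probabilities p
loss = vae_loss(image, reconstruction, mu, logvar)
loss.backward()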
VAE with Normalizing Flows
This time, we not only want our encoder to output the mean $\boldsymbol{\mu}$ and log-variance $\log\boldsymbol{\sigma}^2$: we also want to push the initial latent sample $\mathbf{z}_0$ through a series of $K$ invertible mappings $f_1, \ldots, f_K$ (the normalizing flow),
$$\mathbf{z}_K = f_K \circ f_{K-1} \circ \cdots \circ f_1(\mathbf{z}_0).$$
Then we would firstly use the reparametrization trick to obtain $\mathbf{z}_0 = \boldsymbol{\mu} + \boldsymbol{\sigma}\odot\boldsymbol{\epsilon}$ with $\boldsymbol{\epsilon}\sim\mathcal{N}(\mathbf{0}, I)$, then pass $\mathbf{z}_0$ through the flow to obtain $\mathbf{z}_K$, and finally we would feed $\mathbf{z}_K$ to the decoder.
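The post does not fix a particular family of transformations; as one concrete possibility, here is a sketch of a planar flow layer in the style of Rezende and Mohamed (2015). The `PlanarFlow` name and the parametrization are my assumptions for illustration. The forward pass returns both the transformed sample and the log-absolute-determinant of the Jacobian that we will need below.

import torch
import torch.nn as nn

class PlanarFlow(nn.Module):
    """One planar transformation f(z) = z + u * tanh(w^T z + b)."""
    def __init__(self, latent_dim):
        super().__init__()
        # The constraint u^T w >= -1 that guarantees invertibility
        # is omitted here for brevity.
        self.u = nn.Parameter(torch.randn(latent_dim) * 0.01)
        self.w = nn.Parameter(torch.randn(latent_dim) * 0.01)
        self.b = nn.Parameter(torch.zeros(1))

    def forward(self, z):
        # z has shape (batch_size, latent_dim).
        linear = z @ self.w + self.b                              # (batch_size,)
        f_z = z + self.u * torch.tanh(linear).unsqueeze(-1)
        # psi(z) = h'(w^T z + b) * w, with h = tanh.
        psi = (1 - torch.tanh(linear) ** 2).unsqueeze(-1) * self.w
        # log |det df/dz| = log |1 + u^T psi(z)|.
        log_det = torch.log(torch.abs(1 + psi @ self.u) + 1e-8)
        return f_z, log_det

Stacking $K$ such layers and accumulating the `log_det` terms gives exactly the sum appearing in the change of variable formula discussed next.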
This means that our approximating distribution is not $q_{\phi}(\mathbf{z}\mid\mathbf{x}) = \mathcal{N}\big(\boldsymbol{\mu}, \text{diag}(\boldsymbol{\sigma}^2)\big)$ anymore but, rather, it can be found using the usual change of variable formula
$$\log q_K(\mathbf{z}_K) = \log q_0(\mathbf{z}_0) - \sum_{k=1}^{K}\log\left|\det\frac{\partial f_k}{\partial \mathbf{z}_{k-1}}\right|,$$
where the base distribution $q_0(\mathbf{z}_0) = \mathcal{N}\big(\mathbf{z}_0;\,\boldsymbol{\mu},\,\text{diag}(\boldsymbol{\sigma}^2)\big)$ is just the Gaussian output by the encoder.
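As a quick sanity check of the formula (my own illustrative example, not part of the original derivation), take $K = 1$ and a one-dimensional affine map $f(z) = a z + b$ with $a \neq 0$:
$$\log q_1(z_1) = \log q_0\!\left(\frac{z_1 - b}{a}\right) - \log|a|,$$
which is exactly the familiar density of a scaled and shifted random variable.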
Thanks to the law of the unconscious statistician we have
$$\mathbb{E}_{q_K(\mathbf{z}_K)}\big[g(\mathbf{z}_K)\big] = \mathbb{E}_{q_0(\mathbf{z}_0)}\big[g\big(f_K \circ \cdots \circ f_1(\mathbf{z}_0)\big)\big],$$
so every expectation in the ELBO can be rewritten as an expectation with respect to the simple base distribution $q_0$. As usual, we can approximate this using Monte Carlo and generally we only need one sample. By drawing $\boldsymbol{\epsilon}\sim\mathcal{N}(\mathbf{0}, I)$ and setting $\mathbf{z}_0 = \boldsymbol{\mu} + \boldsymbol{\sigma}\odot\boldsymbol{\epsilon}$, we obtain $\mathbf{z}_K$ by passing $\mathbf{z}_0$ through the flow. This means that our objective function is given by
$$\mathcal{L}(\theta, \phi) = \mathbb{E}_{q_0(\mathbf{z}_0)}\Big[\log p_{\theta}(\mathbf{x}\mid\mathbf{z}_K) + \log p(\mathbf{z}_K) - \log q_0(\mathbf{z}_0) + \sum_{k=1}^{K}\log\left|\det\frac{\partial f_k}{\partial \mathbf{z}_{k-1}}\right|\Big],$$
where the Log-Absolute-Determinant-Jacobian is the usual
$$\text{LADJ} = \sum_{k=1}^{K}\log\left|\det\frac{\partial f_k}{\partial \mathbf{z}_{k-1}}\right|.$$
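Putting the pieces together, a sketch of the corresponding loss (the negative of the objective above, with one Monte Carlo sample) could look as follows. Here `z0` is the reparametrized sample, `zk` the output of the flow, and `sum_log_det` the accumulated log-determinants returned by the flow layers (for example by the `PlanarFlow` sketch above); the interface is an assumption for illustration, and the prior is taken to be a standard Normal as in the rest of this post.

import math
import torch
import torch.nn.functional as F

LOG_2PI = math.log(2 * math.pi)

def flow_vae_loss(image, reconstruction, mu, logvar, z0, zk, sum_log_det):
    """Negative ELBO for a VAE whose approximate posterior is pushed through a flow."""
    # Reconstruction term: -log p(x | z_K), the same binary cross-entropy as before.
    recon_loss = F.binary_cross_entropy(
        input=reconstruction.view(-1, 28*28),
        target=image.view(-1, 28*28),
        reduction='sum'
    )
    # log q_0(z_0): Gaussian density with mean mu and log-variance logvar.
    log_q0 = -0.5 * torch.sum(LOG_2PI + logvar + (z0 - mu) ** 2 / logvar.exp())
    # log p(z_K): standard Normal prior density.
    log_prior = -0.5 * torch.sum(LOG_2PI + zk ** 2)
    # Negative objective: recon_loss - log p(z_K) + log q_0(z_0) - LADJ.
    return recon_loss - log_prior + log_q0 - torch.sum(sum_log_det)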