Gaussian with Unknown Variance and Unknown Mean

Finding the Objective Function

Suppose that now we want to generate samples from N(μ,σ) rather than N(μ,1). This makes the calculations a bit more involved than in the previous example because now also σ is unknown, but the general idea is the same.

Once again, we find the expression for p(x) using the change of variable formula. This time the transformation is g(z)=σz+μ leading to p(g1(x))=12log(2π)(xμ)22σ2. Since xg1(x)=σ1 we get logp(x)=12log(2π)(xμ)22σ2logσ.

The average log-likelihood is then given by the following expression: 1ni=1nlogp(x(i))=12log(2π)12nσ2i=1n(x(i)μ)2logσ

Gradient Estimates and Updates

Firstly, we take the derivative with respect to μ: μ1ni=1nlogp(x(i))=1nσ2i=1n(xiμ)=xσ2μσ2

Similarly, the derivative with respect to σ is given by σ1ni=1nlogp(x(i))=1nσ3i=1n(x(i)μ)21σ

This means that our updates will be μt+1μt+γμ(xσt2μtσt2)σt+1σt+γσ(1nσt3i=1n(x(i)μt+1)21σt)

where γμ and γσ represent the (possibly) different step sizes for μ and σ respectively.

Coding

The code below is very similar to the one for example 1. The only differences are that now we have both a true μ and a true σ. In this example, we’ve fixed them at μtrue=2.0 and σtrue=2.0. We start with guesses 3.0 and 1.0 for μ and σ respectively. Notice how we use two different step sizes for μt and σt.

# Import stuff
import math
import numpy as np
import matplotlib.pyplot as plt
import scipy.stats as stats

# Generate data
n = 10000
mu_true = 2.0
sigma_true = 2.0
x = np.random.normal(loc=mu_true, scale=sigma_true, size=n)

# Algorithm Settings
mu = -3.0        # Start with mu = -1.0
sigma = 1.0      # Start with sigma = 1.0
gamma_mu = 0.1      # Learning rate
gamma_sigma = 0.01
n_iter = 500     # Number of iterations

# Loop through and update mu and sigma
mus = [mu]
sigmas = [sigma]
for i in range(n_iter):
    # Compute gradients
    mu_grad = (np.mean(x) - mu) / (sigma**2)
    sigma_grad = (np.mean((x - mu)**2)/sigma**3 - 1 / sigma)
    # Update mu and sigma
    mu, sigma = mu + gamma_mu*mu_grad, sigma + gamma_sigma*sigma_grad
    # Store mu and sigma
    mus.append(mu)
    sigmas.append(sigma)
    
fig, ax = plt.subplots(nrows=1, ncols=2, figsize=(20, 5))
# Plot mu trajectory
ax[0].plot(range(n_iter+1), mus, lw=3)
ax[0].hlines(y=mu_true, xmin=0, xmax=n_iter, 
          color='darkred', linestyle='dashed', lw=3)
ax[0].legend([r'$\mu_t$' + " Trajectory", r'$\mu_{true}$'], prop={'size': 29}, loc='lower right')
# Plot sigma trajectory 
ax[1].plot(range(n_iter+1), sigmas, lw=3)
ax[1].hlines(y=sigma_true, xmin=0, xmax=n_iter, 
          color='darkred', linestyle='dashed', lw=3)
ax[1].legend([r'$\sigma_t$' + " Trajectory", r'$\sigma_{true}$'], prop={'size': 29})
plt.show()

Normalizing Flows Course Example 2

You can find and run a working version of this Google Colab notebook. .

Previous
Next