Gaussian with Unknown Variance and Unknown Mean
Finding the Objective Function
Suppose that we now want to generate samples from $\mathcal{N}(\mu, \sigma^2)$ rather than $\mathcal{N}(\mu, 1)$. This makes the calculations a bit more involved than in the previous example, since $\sigma$ is now unknown as well, but the general idea is the same.
Once again, we find the expression for $p(x)$ using the change of variable formula. This time the transformation is $$ g(z) = \sigma z + \mu, \qquad g^{-1}(x) = \frac{x - \mu}{\sigma}, $$ leading to $$ \log p_z(g^{-1}(x)) = - \frac{1}{2} \log (2\pi) - \frac{(x - \mu)^2}{2\sigma^2}, $$ where $p_z$ is the standard normal density. Since $\partial_{x} g^{-1}(x) = \sigma^{-1}$, we get $$ \log p(x) = \log p_z(g^{-1}(x)) + \log \left| \partial_x g^{-1}(x) \right| = - \frac{1}{2} \log (2\pi) - \frac{(x - \mu)^2}{2\sigma^2} - \log \sigma. $$
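As a quick sanity check, we can compare this expression against scipy.stats.norm.logpdf; the particular values of mu, sigma, and x below are arbitrary and only serve to illustrate the agreement.
# Sanity check: the change-of-variable formula matches scipy's logpdf
import numpy as np
import scipy.stats as stats

mu, sigma = 2.0, 2.0
x = np.array([-1.0, 0.5, 3.7])
log_p = -0.5*np.log(2*np.pi) - (x - mu)**2/(2*sigma**2) - np.log(sigma)
print(np.allclose(log_p, stats.norm.logpdf(x, loc=mu, scale=sigma)))  # True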
The average log-likelihood is then given by the following expression: $$ \frac{1}{n} \sum^n_{i=1} \log p(x^{(i)}) = -\frac{1}{2}\log(2\pi) - \frac{1}{2n\sigma^2}\sum^n_{i=1} (x^{(i)} - \mu)^2 - \log \sigma $$
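In code, this average is a one-liner. The small helper below (avg_log_lik is a name introduced here, not taken from the original code) reuses the imports from the snippet above and will come in handy for checking the gradients in the next section.
# Average log-likelihood of a sample x under N(mu, sigma^2)
def avg_log_lik(x, mu, sigma):
    return -0.5*np.log(2*np.pi) - np.mean((x - mu)**2)/(2*sigma**2) - np.log(sigma)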
Gradient Estimates and Updates
First, we take the derivative with respect to $\mu$: $$ \frac{\partial}{\partial \mu} \frac{1}{n} \sum^n_{i=1} \log p(x^{(i)}) = \frac{1}{n\sigma^2}\sum^n_{i=1} (x^{(i)} - \mu) = \frac{\overline{x}}{\sigma^2} - \frac{\mu}{\sigma^2}, $$ where $\overline{x} = \frac{1}{n}\sum^n_{i=1} x^{(i)}$ is the sample mean.
Similarly, the derivative with respect to $\sigma$ is given by $$ \frac{\partial}{\partial \sigma} \frac{1}{n}\sum^n_{i=1} \log p(x^{(i)}) = \frac{1}{n\sigma^3}\sum^n_{i=1}(x^{(i)} - \mu)^2 - \frac{1}{\sigma} $$
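Before plugging these expressions into the updates, it is worth verifying them against central finite differences of the avg_log_lik helper sketched above; if the analytic and numerical gradients disagree, there is a bug somewhere in the derivation.
# Compare the analytic gradients with central finite differences
x = np.random.normal(loc=2.0, scale=2.0, size=10000)
mu, sigma, eps = -3.0, 1.0, 1e-5
mu_grad = (np.mean(x) - mu) / sigma**2
sigma_grad = np.mean((x - mu)**2)/sigma**3 - 1/sigma
fd_mu = (avg_log_lik(x, mu + eps, sigma) - avg_log_lik(x, mu - eps, sigma)) / (2*eps)
fd_sigma = (avg_log_lik(x, mu, sigma + eps) - avg_log_lik(x, mu, sigma - eps)) / (2*eps)
print(np.allclose([mu_grad, sigma_grad], [fd_mu, fd_sigma]))  # True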
This means that our updates will be $$ \begin{align} \mu_{t+1} &\longleftarrow \mu_t + \gamma_{\mu}\left(\frac{\overline{x}}{\sigma^2_t} - \frac{\mu_t}{\sigma^2_t}\right) \newline \sigma_{t+1} &\longleftarrow \sigma_t + \gamma_{\sigma}\left(\frac{1}{n\sigma^3_t}\sum^n_{i=1}(x^{(i)} - \mu_t)^2 - \frac{1}{\sigma_t}\right) \end{align} $$
where $\gamma_{\mu}$ and $\gamma_{\sigma}$ represent the (possibly) different step sizes for $\mu$ and $\sigma$ respectively.
Coding
The code below is very similar to the code for Example 1. The main difference is that we now have both a true $\mu$ and a true $\sigma$, which we fix at $\mu_{\text{true}} = 2.0$ and $\sigma_{\text{true}} = 2.0$. We start with initial guesses of $-3.0$ for $\mu$ and $1.0$ for $\sigma$. Notice how we use two different step sizes for $\mu_t$ and $\sigma_t$.
# Import libraries
import numpy as np
import matplotlib.pyplot as plt
# Generate data
n = 10000
mu_true = 2.0
sigma_true = 2.0
x = np.random.normal(loc=mu_true, scale=sigma_true, size=n)
# Algorithm Settings
mu = -3.0 # Start with mu = -3.0
sigma = 1.0 # Start with sigma = 1.0
gamma_mu = 0.1 # Learning rate
gamma_sigma = 0.01
n_iter = 500 # Number of iterations
# Loop through and update mu and sigma
mus = [mu]
sigmas = [sigma]
for i in range(n_iter):
    # Compute gradients at the current (mu, sigma)
    mu_grad = (np.mean(x) - mu) / (sigma**2)
    sigma_grad = np.mean((x - mu)**2)/sigma**3 - 1/sigma
    # Update mu and sigma simultaneously
    mu, sigma = mu + gamma_mu*mu_grad, sigma + gamma_sigma*sigma_grad
    # Store the new values
    mus.append(mu)
    sigmas.append(sigma)
fig, ax = plt.subplots(nrows=1, ncols=2, figsize=(20, 5))
# Plot mu trajectory
ax[0].plot(range(n_iter+1), mus, lw=3)
ax[0].hlines(y=mu_true, xmin=0, xmax=n_iter,
             color='darkred', linestyle='dashed', lw=3)
ax[0].legend([r'$\mu_t$' + " Trajectory", r'$\mu_{true}$'], prop={'size': 29}, loc='lower right')
# Plot sigma trajectory
ax[1].plot(range(n_iter+1), sigmas, lw=3)
ax[1].hlines(y=sigma_true, xmin=0, xmax=n_iter,
             color='darkred', linestyle='dashed', lw=3)
ax[1].legend([r'$\sigma_t$' + " Trajectory", r'$\sigma_{true}$'], prop={'size': 29})
plt.show()
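Finally, note that setting both gradients to zero recovers the closed-form maximum likelihood estimates $\hat{\mu} = \overline{x}$ and $\hat{\sigma}^2 = \frac{1}{n}\sum^n_{i=1}(x^{(i)} - \overline{x})^2$, so a quick way to check that the loop has converged is to compare the final iterates against these values:
# The fixed point of gradient ascent should match the closed-form MLE
print(mu, np.mean(x))   # final mu vs sample mean
print(sigma, np.std(x)) # final sigma vs MLE standard deviation (ddof=0)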
You can find and run a working version of this code in a Google Colab notebook.