Objective Function and Update Equations

Data Log-Likelihood

Of course, we already know what $p(x)$ looks like: it is just the pdf of a normal distribution. In practice, however, $p(x)$ could be a very complicated density, and in that case we can use the univariate change-of-variables formula to find its pdf:

$\log p (x) = \log p (g^{- 1} (x)) + \log | \frac{\partial}{\partial x} g^{- 1} (x) | .$
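The formula can be checked numerically. The sketch below assumes a hypothetical helper `log_prob_change_of_variables` that takes the inverse map $g^{-1}$ and approximates its derivative with a central finite difference. The post computes this derivative analytically; the numerical derivative and the second flow $g(z) = 2z$ are illustrative assumptions only.

```python
import math

def log_std_normal(z: float) -> float:
    """Log-pdf of the standard normal base distribution p(z)."""
    return -0.5 * math.log(2 * math.pi) - 0.5 * z * z

def log_prob_change_of_variables(x: float, g_inv, h: float = 1e-5) -> float:
    """log p(x) = log p(g^{-1}(x)) + log |d/dx g^{-1}(x)|.

    Assumption: the derivative of g_inv is approximated with a central
    finite difference instead of being computed exactly."""
    deriv = (g_inv(x + h) - g_inv(x - h)) / (2 * h)
    return log_std_normal(g_inv(x)) + math.log(abs(deriv))

mu = 1.5

def shift_inv(x: float) -> float:
    """Inverse of the shift flow g(z) = z + mu."""
    return x - mu

def scale_inv(x: float) -> float:
    """Inverse of a hypothetical scale flow g(z) = 2z (not from the post)."""
    return x / 2.0

print(log_prob_change_of_variables(0.3, shift_inv))  # log-density of N(mu, 1) at 0.3
print(log_prob_change_of_variables(1.0, scale_inv))  # log-density of N(0, 4) at 1.0
```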

The first term can be found by using the pdf of a standard normal distribution,

$\log p (z) = - \frac{1}{2} \log (2 π) - \frac{z^{2}}{2},$

and plugging in the inverse transformation $g^{- 1} (x) = x - μ$:

$\log p (g^{- 1} (x)) = - \frac{1}{2} \log (2 π) - \frac{(x - μ)^{2}}{2} .$

For the second term, we need to compute the derivative of $g^{- 1} (x)$ with respect to $x$,

$\frac{\partial}{\partial x} g^{- 1} (x) = 1,$

and remember that $\log (1) = 0$ to obtain:

$\begin{aligned} \log p (x) & = - \frac{1}{2} \log (2 π) - \frac{(x - μ)^{2}}{2} . \end{aligned}$
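A minimal sketch of this closed form, assuming the shift flow $x = g(z) = z + μ$ with $z \sim \mathcal{N}(0, 1)$. The function name `log_prob_shift_flow` is hypothetical; the result is cross-checked against the log of the $\mathcal{N}(μ, 1)$ density from Python's standard-library `statistics.NormalDist`.

```python
import math
from statistics import NormalDist

def log_prob_shift_flow(x: float, mu: float) -> float:
    """Closed-form log p(x) for the flow x = g(z) = z + mu, z ~ N(0, 1).

    log p(g^{-1}(x)) = -0.5*log(2*pi) - (x - mu)^2 / 2, and the log-derivative
    term vanishes because d/dx g^{-1}(x) = 1 and log(1) = 0."""
    return -0.5 * math.log(2 * math.pi) - 0.5 * (x - mu) ** 2

# Cross-check against the log of the N(mu, 1) density.
mu, x = 1.5, 0.3
reference = math.log(NormalDist(mu=mu, sigma=1.0).pdf(x))
print(log_prob_shift_flow(x, mu), reference)
```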

Since the samples are i.i.d., the log-likelihood of the whole dataset is simply the sum of the per-sample log-likelihoods:

$\sum_{i = 1}^{n} \log p (x^{(i)}) = - \frac{n}{2} \log (2 π) - \frac{1}{2} \sum_{i = 1}^{n} (x^{(i)} - μ)^{2}$
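The sketch below checks that summing per-sample log-densities matches the closed form above. The toy data are an assumption: they are simulated as $x^{(i)} = z^{(i)} + μ$ with a hypothetical true mean of 2.0, and the helper names are illustrative.

```python
import math
import random

def log_prob_shift_flow(x: float, mu: float) -> float:
    """Per-sample log p(x) for x = z + mu, z ~ N(0, 1)."""
    return -0.5 * math.log(2 * math.pi) - 0.5 * (x - mu) ** 2

def dataset_log_likelihood(xs, mu: float) -> float:
    """Sum of per-sample log-densities (valid because the samples are i.i.d.)."""
    return sum(log_prob_shift_flow(x, mu) for x in xs)

def dataset_log_likelihood_closed_form(xs, mu: float) -> float:
    """-n/2 * log(2*pi) - 1/2 * sum_i (x_i - mu)^2."""
    n = len(xs)
    return -0.5 * n * math.log(2 * math.pi) - 0.5 * sum((x - mu) ** 2 for x in xs)

random.seed(0)
true_mu = 2.0  # hypothetical value used only to simulate toy data
xs = [random.gauss(0.0, 1.0) + true_mu for _ in range(1000)]  # x = z + mu
print(dataset_log_likelihood(xs, 2.0))
print(dataset_log_likelihood_closed_form(xs, 2.0))
```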

Our objective function to minimize is then the negative log-likelihood. Often one uses the negative average log-likelihood instead, because dividing by $n$ keeps the scale of the objective, and therefore of its gradients, consistent across datasets of different sizes. You can read more about it here.

$- \frac{1}{n} \sum_{i = 1}^{n} \log p (x^{(i)}) = \frac{1}{2} \log (2 π) + \frac{1}{2 n} \sum_{i = 1}^{n} (x^{(i)} - μ)^{2}$
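To illustrate why averaging helps, the sketch below duplicates a small toy dataset: the summed negative log-likelihood doubles, while the averaged version is unchanged. The function names and data are hypothetical.

```python
import math

def neg_avg_log_likelihood(xs, mu: float) -> float:
    """-(1/n) * sum_i log p(x_i) = 0.5*log(2*pi) + (1/(2n)) * sum_i (x_i - mu)^2."""
    n = len(xs)
    return 0.5 * math.log(2 * math.pi) + sum((x - mu) ** 2 for x in xs) / (2 * n)

def neg_log_likelihood(xs, mu: float) -> float:
    """Un-averaged objective: -sum_i log p(x_i)."""
    return len(xs) * neg_avg_log_likelihood(xs, mu)

xs = [1.0, 2.0, 4.0]
xs_doubled = xs * 2  # same empirical distribution, twice as many samples
print(neg_log_likelihood(xs, 0.0), neg_log_likelihood(xs_doubled, 0.0))
print(neg_avg_log_likelihood(xs, 0.0), neg_avg_log_likelihood(xs_doubled, 0.0))
```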

Log-Likelihood Gradient Estimates

We can now compute the gradient of the negative average log-likelihood with respect to $μ$:

$\frac{\partial}{\partial μ} \left[ - \frac{1}{n} \sum_{i = 1}^{n} \log p (x^{(i)}) \right] = \frac{1}{n} \sum_{i = 1}^{n} (μ - x^{(i)}) = μ - \frac{1}{n} \sum_{i = 1}^{n} x^{(i)}$

This is easy to compute, since we know $n$ and $μ$, and the samples are obtained as $x^{(i)} = z^{(i)} + μ$.
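As a sanity check on the sign, the sketch below compares the analytic gradient of the negative average log-likelihood, $μ - \frac{1}{n} \sum_{i} x^{(i)}$, against a central finite difference. The helper names and toy data are illustrative assumptions, not part of the original derivation.

```python
import math

def neg_avg_log_likelihood(xs, mu: float) -> float:
    """0.5*log(2*pi) + (1/(2n)) * sum_i (x_i - mu)^2."""
    n = len(xs)
    return 0.5 * math.log(2 * math.pi) + sum((x - mu) ** 2 for x in xs) / (2 * n)

def grad_neg_avg_log_likelihood(xs, mu: float) -> float:
    """Analytic d/dmu of the negative average log-likelihood: mu - mean(xs)."""
    return mu - sum(xs) / len(xs)

def finite_difference(f, mu: float, h: float = 1e-6) -> float:
    """Central-difference approximation of f'(mu), used only as a check."""
    return (f(mu + h) - f(mu - h)) / (2 * h)

xs = [0.5, 1.5, 2.0, 3.0]
mu = 0.0
analytic = grad_neg_avg_log_likelihood(xs, mu)
numeric = finite_difference(lambda m: neg_avg_log_likelihood(xs, m), mu)
print(analytic, numeric)  # negative: increasing mu lowers the objective here
```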

Mean Update Equation

We can then update our mean using gradient descent on the negative average log-likelihood with step size $γ$:

$μ_{t} ⟵ μ_{t - 1} - γ \left[ μ_{t - 1} - \frac{1}{n} \sum_{i = 1}^{n} x^{(i)} \right]$
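A minimal sketch of the update loop, assuming toy data simulated as $x^{(i)} = z^{(i)} + μ$ with a hypothetical true mean of 3.0. The helper `fit_mean` and its default step size and iteration count are illustrative choices. Each iteration applies $μ_{t} ⟵ μ_{t-1} - γ (μ_{t-1} - \bar{x})$, where $\bar{x}$ is the sample mean.

```python
import random

def fit_mean(xs, mu0: float = 0.0, step_size: float = 0.1, num_steps: int = 200) -> float:
    """Gradient descent on the negative average log-likelihood of the shift flow.

    Each step applies mu <- mu - step_size * (mu - mean(xs))."""
    x_bar = sum(xs) / len(xs)  # the gradient depends on the data only through its mean
    mu = mu0
    for _ in range(num_steps):
        grad = mu - x_bar
        mu = mu - step_size * grad
    return mu

random.seed(1)
true_mu = 3.0  # hypothetical ground truth used to simulate data
xs = [random.gauss(0.0, 1.0) + true_mu for _ in range(500)]
mu_hat = fit_mean(xs)
print(mu_hat, sum(xs) / len(xs))  # the estimate converges to the sample mean
```

Because the objective is quadratic in $μ$, the error shrinks by a factor $(1 - γ)$ per step, so any $0 < γ < 2$ converges, and $γ = 1$ reaches the sample mean in a single step.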

Last updated on Dec 10, 2020
