Objective Function and Update Equations

Data Log-Likelihood

Of course, we know what $p(x)$ looks like: it is just the pdf of a normal distribution. In practice, however, it could be a very complicated density, and in that case we can use the univariate change-of-variables formula to find it:

$$\log p(x) = \log p\big(g^{-1}(x)\big) + \log\left|\frac{\partial}{\partial x} g^{-1}(x)\right|.$$

The first term can be found by using the log-pdf of a standard normal distribution,

$$\log p(z) = -\frac{1}{2}\log(2\pi) - \frac{z^2}{2},$$

and plugging in the inverse transformation $g^{-1}(x) = x - \mu$:

$$\log p\big(g^{-1}(x)\big) = -\frac{1}{2}\log(2\pi) - \frac{(x-\mu)^2}{2}.$$

For the second term, we compute the derivative of $g^{-1}(x)$ with respect to $x$,

$$\frac{\partial}{\partial x} g^{-1}(x) = 1,$$

and remember that $\log|1| = 0$ to obtain:

$$\log p(x) = -\frac{1}{2}\log(2\pi) - \frac{(x-\mu)^2}{2}.$$
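As a sanity check of this density, here is a minimal sketch (assuming NumPy and SciPy are available; the value of $\mu$ and the sample points are made up for illustration) comparing the closed form against `scipy.stats.norm.logpdf`:

```python
import numpy as np
from scipy.stats import norm

mu = 1.5                       # assumed shift parameter (illustrative)
x = np.array([0.3, 1.7, 2.4])  # arbitrary sample points

# Closed form derived via change of variables:
# log p(x) = -1/2 log(2*pi) - (x - mu)^2 / 2
log_p = -0.5 * np.log(2 * np.pi) - (x - mu) ** 2 / 2

# Should match the log-pdf of N(mu, 1) evaluated directly
assert np.allclose(log_p, norm.logpdf(x, loc=mu, scale=1.0))
```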

Since the samples are i.i.d., we can compute the log-likelihood of the whole dataset very easily:

$$\sum_{i=1}^{n} \log p\big(x^{(i)}\big) = -\frac{n}{2}\log(2\pi) - \frac{1}{2}\sum_{i=1}^{n}\big(x^{(i)} - \mu\big)^2$$
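A quick numerical check of this identity, reusing the NumPy setup from the sketch above:

```python
# Sum of per-sample log-densities vs. the closed-form expression
n = len(x)
lhs = np.sum(-0.5 * np.log(2 * np.pi) - (x - mu) ** 2 / 2)
rhs = -n / 2 * np.log(2 * np.pi) - 0.5 * np.sum((x - mu) ** 2)
assert np.allclose(lhs, rhs)
```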

Our objective function to minimize is then the negative log-likelihood. Often one uses the negative average log-likelihood instead, because it leads to gradients of consistent scale across different dataset sizes; you can read more about this here:

$$-\frac{1}{n}\sum_{i=1}^{n} \log p\big(x^{(i)}\big) = \frac{1}{2}\log(2\pi) + \frac{1}{2n}\sum_{i=1}^{n}\big(x^{(i)} - \mu\big)^2$$
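In code, the objective might look like the following sketch (the function name `nll` is my choice, not from the original):

```python
def nll(mu, x):
    """Negative average log-likelihood of samples x under N(mu, 1)."""
    return 0.5 * np.log(2 * np.pi) + np.mean((x - mu) ** 2) / 2
```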

Log-Likelihood Gradient Estimates

We can now compute the gradient of the negative average log-likelihood with respect to $\mu$:

$$\nabla_{\mu}\left[-\frac{1}{n}\sum_{i=1}^{n} \log p\big(x^{(i)}\big)\right] = -\frac{1}{n}\sum_{i=1}^{n}\big(x^{(i)} - \mu\big)$$

These gradient estimates are easy to compute, since we know $n$ and $\mu$, and each sample is generated as $x^{(i)} = g\big(z^{(i)}\big) = z^{(i)} + \mu$.
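Here is a sketch of the gradient estimate, with a finite-difference sanity check against the `nll` function sketched earlier (the step size `eps` is an arbitrary choice):

```python
def nll_grad(mu, x):
    """Analytic gradient of the negative average log-likelihood w.r.t. mu."""
    return -np.mean(x - mu)

# Central finite-difference check of the analytic gradient
eps = 1e-6
numeric = (nll(mu + eps, x) - nll(mu - eps, x)) / (2 * eps)
assert np.isclose(nll_grad(mu, x), numeric)
```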

Mean Update Equation

We can then update our mean using gradient descent with step size $\gamma$:

$$\mu_t \leftarrow \mu_{t-1} - \gamma\left[-\frac{1}{n}\sum_{i=1}^{n}\big(x^{(i)} - \mu_{t-1}\big)\right]$$
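Putting it all together, here is a minimal gradient-descent sketch that recovers $\mu$ from synthetic data generated as $x^{(i)} = z^{(i)} + \mu$ (the true value, step size, and iteration count are arbitrary choices for illustration):

```python
rng = np.random.default_rng(0)
mu_true = 2.0
z = rng.standard_normal(1000)
x = z + mu_true    # forward transformation g(z) = z + mu

gamma = 0.1        # step size
mu_hat = 0.0       # initial guess
for _ in range(200):
    # Gradient-descent step: mu <- mu - gamma * grad of the NLL
    mu_hat = mu_hat - gamma * (-np.mean(x - mu_hat))

print(mu_hat)      # should be close to mu_true
```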
