Objective Function and Update Equations
Data Log-Likelihood
Of course, we know what
The first term can be found by using the pdf for a standard normal distribution
Since the samples are i.i.d. we can estimate the log-likelihood very easily.
Our objective function to minimize is then the negative log-likelihood. Often one uses the negative average log-likelihood instead, because this leads to more consistent gradients across different dataset sizes, you can read more about it here
Log-Likelihood Gradient Estimates
We can now compute the gradient of the negative average log-likelihood with respect to
These can be easily computed since we know
Mean Update Equation
We can then update our mean using gradient descent with step size