Multivariate Normal as an Exponential Family Distribution

Exponential Family of Distributions

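A density $f(x)$ belongs to the exponential family of distributions if we can write it as
$$f(x; \theta) = \exp\left\{\langle \theta, \phi(x) \rangle - A(\theta)\right\},$$
where $\phi(x)$ are the sufficient statistics and $A(\theta)$ is the log-partition function. We call $\theta$ its natural parameters and $\mathbb{E}_f[\phi(X)]$ its mean parameters.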

Multivariate Normal Distribution

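A pdf $f$ on $\mathbb{R}^d$ is that of a multivariate normal distribution with mean $\mu$ and covariance matrix $\Sigma$ if
$$f(x) = (2\pi)^{-d/2} \det(\Sigma)^{-1/2} \exp\left\{-\frac{1}{2}(x - \mu)^\top \Sigma^{-1} (x - \mu)\right\}.$$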

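This can be rearranged as
$$f(x) = \exp\left\{x^\top \Sigma^{-1} \mu - \frac{1}{2} x^\top \Sigma^{-1} x - \frac{1}{2}\left[d \log 2\pi + \log|\Sigma| + \mu^\top \Sigma^{-1} \mu\right]\right\}.$$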
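As a brief sketch of the intermediate step (not spelled out in the original), the rearrangement follows from expanding the quadratic form, using the symmetry of $\Sigma^{-1}$, and moving the normalizing constant inside the exponential:
$$(x - \mu)^\top \Sigma^{-1} (x - \mu) = x^\top \Sigma^{-1} x - 2 x^\top \Sigma^{-1} \mu + \mu^\top \Sigma^{-1} \mu,$$
$$(2\pi)^{-d/2} \det(\Sigma)^{-1/2} = \exp\left\{-\frac{1}{2}\left[d \log 2\pi + \log|\Sigma|\right]\right\}.$$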

Frobenius Inner Product

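Notice that we can write the second term as
$$-\frac{1}{2} x^\top \Sigma^{-1} x = -\frac{1}{2} \sum_{k=1}^{d} \sum_{j=1}^{d} x_k \Sigma^{-1}_{kj} x_j,$$
where $\Sigma^{-1}_{kj}$ denotes the $(k, j)$ entry of $\Sigma^{-1}$. Similarly, the following expression can be written in the same way:
$$\begin{aligned}
\operatorname{tr}\left[-\frac{1}{2} \Sigma^{-1} x x^\top\right]
&= -\frac{1}{2} \operatorname{tr}\left[
\begin{pmatrix}
\Sigma^{-1}_{11} & \cdots & \Sigma^{-1}_{1d} \\
\vdots & \ddots & \vdots \\
\Sigma^{-1}_{d1} & \cdots & \Sigma^{-1}_{dd}
\end{pmatrix}
\begin{pmatrix}
x_1^2 & \cdots & x_1 x_d \\
\vdots & \ddots & \vdots \\
x_d x_1 & \cdots & x_d^2
\end{pmatrix}
\right] \\
&= -\frac{1}{2} \operatorname{tr}\left[
\begin{pmatrix}
\sum_{j=1}^{d} \Sigma^{-1}_{1j} x_j x_1 & \cdots & \sum_{j=1}^{d} \Sigma^{-1}_{1j} x_j x_d \\
\vdots & \ddots & \vdots \\
\sum_{j=1}^{d} \Sigma^{-1}_{dj} x_j x_1 & \cdots & \sum_{j=1}^{d} \Sigma^{-1}_{dj} x_j x_d
\end{pmatrix}
\right] \\
&= -\frac{1}{2} \sum_{k=1}^{d} \sum_{j=1}^{d} x_k \Sigma^{-1}_{kj} x_j.
\end{aligned}$$
Now notice that this is nothing but the Frobenius inner product between two real and symmetric matrices, which can be written both in terms of a trace and in terms of vectorizing operations:
$$\left\langle -\frac{1}{2} \Sigma^{-1}, x x^\top \right\rangle_F = \operatorname{tr}\left(-\frac{1}{2} \Sigma^{-1} x x^\top\right) = \operatorname{vec}\left(-\frac{1}{2} \Sigma^{-1}\right)^\top \operatorname{vec}(x x^\top),$$
where the vectorizing operation for an $n \times m$ matrix $A$ simply stacks the rows one at a time to create an $(nm) \times 1$ vector:
$$\operatorname{vec}[A] = \operatorname{vec}\left[
\begin{pmatrix}
a_{11} & \cdots & a_{1m} \\
\vdots & \ddots & \vdots \\
a_{n1} & \cdots & a_{nm}
\end{pmatrix}
\right] =
\begin{pmatrix}
a_{11} \\ \vdots \\ a_{1m} \\ \vdots \\ a_{nm}
\end{pmatrix}.$$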
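The identity above can be checked numerically. The following is a minimal sketch using NumPy; the dimension, the random covariance matrix, and all variable names are illustrative assumptions and do not come from the original post. Row-major flattening (`reshape(-1)`) plays the role of the row-stacking vec operation.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 3

# Hypothetical example values: a random symmetric positive-definite covariance.
B = rng.normal(size=(d, d))
Sigma = B @ B.T + d * np.eye(d)
Sigma_inv = np.linalg.inv(Sigma)
x = rng.normal(size=d)

M = -0.5 * Sigma_inv   # first matrix in the Frobenius inner product
xxT = np.outer(x, x)   # second matrix, x x^T

quad = -0.5 * x @ Sigma_inv @ x            # -1/2 x^T Sigma^{-1} x
via_trace = np.trace(M @ xxT)              # tr(-1/2 Sigma^{-1} x x^T)
via_vec = M.reshape(-1) @ xxT.reshape(-1)  # vec(.)^T vec(.), rows stacked

print(quad, via_trace, via_vec)
```

All three printed numbers should agree up to floating-point error.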

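This allows us to write the pdf as
$$f(x) = \exp\left\{\left\langle \Sigma^{-1} \mu, x \right\rangle + \left\langle \operatorname{vec}\left(-\frac{1}{2} \Sigma^{-1}\right), \operatorname{vec}(x x^\top) \right\rangle - \frac{1}{2}\left[d \log 2\pi + \log|\Sigma| + \mu^\top \Sigma^{-1} \mu\right]\right\}.$$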

Natural Parameters of a Multivariate Normal Distribution

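Since $\Sigma^{-1} \mu$ can be written as
$$\Sigma^{-1} \mu =
\begin{pmatrix}
\sum_{j=1}^{d} \Sigma^{-1}_{1j} \mu_j \\
\vdots \\
\sum_{j=1}^{d} \Sigma^{-1}_{dj} \mu_j
\end{pmatrix},$$
the natural parameters are given by
$$\theta =
\begin{pmatrix}
\Sigma^{-1} \mu \\
\operatorname{vec}\left[-\frac{1}{2} \Sigma^{-1}\right]
\end{pmatrix}
=
\begin{pmatrix}
\sum_{j=1}^{d} \Sigma^{-1}_{1j} \mu_j \\
\vdots \\
\sum_{j=1}^{d} \Sigma^{-1}_{dj} \mu_j \\
-\frac{1}{2} \Sigma^{-1}_{11} \\
\vdots \\
-\frac{1}{2} \Sigma^{-1}_{1d} \\
\vdots \\
-\frac{1}{2} \Sigma^{-1}_{dd}
\end{pmatrix}_{(d + d^2) \times 1}.$$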

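In a similar fashion, we can find the sufficient statistics as
$$\phi(x) =
\begin{pmatrix}
x \\
\operatorname{vec}(x x^\top)
\end{pmatrix}
=
\begin{pmatrix}
x_1 \\
\vdots \\
x_d \\
x_1^2 \\
\vdots \\
x_1 x_d \\
\vdots \\
x_d^2
\end{pmatrix}_{(d + d^2) \times 1}.$$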

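This gives the complete expression for the multivariate normal distribution as a member of the exponential family of distributions:
$$\begin{aligned}
f(x) &= \exp\left\{\langle \theta, \phi(x) \rangle - A(\theta)\right\} \\
&= \exp\left\{
\left(\sum_{j=1}^{d} \Sigma^{-1}_{1j} \mu_j, \ldots, \sum_{j=1}^{d} \Sigma^{-1}_{dj} \mu_j, -\frac{1}{2} \Sigma^{-1}_{11}, \ldots, -\frac{1}{2} \Sigma^{-1}_{1d}, \ldots, -\frac{1}{2} \Sigma^{-1}_{dd}\right)
\begin{pmatrix}
x_1 \\
\vdots \\
x_d \\
x_1^2 \\
\vdots \\
x_1 x_d \\
\vdots \\
x_d^2
\end{pmatrix}
- \frac{1}{2}\left[d \log 2\pi + \log|\Sigma| + \mu^\top \Sigma^{-1} \mu\right]
\right\}.
\end{aligned}$$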
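To make the construction concrete, here is a minimal NumPy sketch that builds $\theta$ and $\phi(x)$ and compares $\langle \theta, \phi(x) \rangle - A(\theta)$ with the log of the usual multivariate normal density. The function names and the example values of $\mu$, $\Sigma$ and $x$ are hypothetical choices for illustration. The log-partition term is computed from $\mu$ and $\Sigma$, as in the expression above, rather than directly from $\theta$.

```python
import numpy as np

def natural_params(mu, Sigma):
    """theta = (Sigma^{-1} mu, vec(-1/2 Sigma^{-1})), rows stacked."""
    Sigma_inv = np.linalg.inv(Sigma)
    return np.concatenate([Sigma_inv @ mu, (-0.5 * Sigma_inv).reshape(-1)])

def sufficient_stats(x):
    """phi(x) = (x, vec(x x^T)), rows stacked."""
    return np.concatenate([x, np.outer(x, x).reshape(-1)])

def log_partition(mu, Sigma):
    """1/2 [d log(2 pi) + log|Sigma| + mu^T Sigma^{-1} mu]."""
    d = mu.shape[0]
    _, logdet = np.linalg.slogdet(Sigma)
    return 0.5 * (d * np.log(2 * np.pi) + logdet + mu @ np.linalg.solve(Sigma, mu))

def log_pdf_direct(x, mu, Sigma):
    """Log of the standard multivariate normal density, for comparison."""
    d = mu.shape[0]
    diff = x - mu
    _, logdet = np.linalg.slogdet(Sigma)
    return -0.5 * (d * np.log(2 * np.pi) + logdet + diff @ np.linalg.solve(Sigma, diff))

# Hypothetical example values (d = 2).
mu = np.array([1.0, -2.0])
Sigma = np.array([[2.0, 0.5], [0.5, 1.0]])
x = np.array([0.3, 0.7])

theta = natural_params(mu, Sigma)
phi = sufficient_stats(x)
log_f_expfam = theta @ phi - log_partition(mu, Sigma)
log_f_direct = log_pdf_direct(x, mu, Sigma)

print(log_f_expfam, log_f_direct)
```

The two printed values should match up to floating-point error, and both `theta` and `phi` have length $d + d^2 = 6$ here.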

Mean Parameters of a Multivariate Normal Distribution

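Remember that the expected value of a matrix or a vector is taken element-wise, so that
$$\left(\mathbb{E}[x x^\top]\right)_{ij} = \mathbb{E}\left[(x x^\top)_{ij}\right] = \mathbb{E}[x_i x_j] = \operatorname{cov}(x_i, x_j) + \mathbb{E}[x_i]\,\mathbb{E}[x_j] = \Sigma_{ij} + \mu_i \mu_j.$$
This means that taking the expected value of our vectorized matrix $\operatorname{vec}(x x^\top)$ is equivalent to vectorizing the expected value of $x x^\top$:
$$\mathbb{E}\left[\operatorname{vec}(x x^\top)\right]
= \mathbb{E}\left[
\begin{pmatrix}
x_1^2 \\ \vdots \\ x_1 x_d \\ \vdots \\ x_d^2
\end{pmatrix}
\right]
=
\begin{pmatrix}
\mathbb{E}[x_1^2] \\ \vdots \\ \mathbb{E}[x_1 x_d] \\ \vdots \\ \mathbb{E}[x_d^2]
\end{pmatrix}
=
\begin{pmatrix}
\Sigma_{11} + \mu_1^2 \\ \vdots \\ \Sigma_{1d} + \mu_1 \mu_d \\ \vdots \\ \Sigma_{dd} + \mu_d^2
\end{pmatrix}
= \operatorname{vec}\left(\mathbb{E}[x x^\top]\right).$$
Finally, the mean parameters are given by
$$\mathbb{E}[\phi(x)] = \mathbb{E}\left[
\begin{pmatrix}
x \\ \operatorname{vec}(x x^\top)
\end{pmatrix}
\right]
=
\begin{pmatrix}
\mu \\ \operatorname{vec}(\Sigma + \mu \mu^\top)
\end{pmatrix}.$$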
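As a rough sanity check of the mean parameters, the sketch below averages $\phi(x)$ over samples from a multivariate normal and compares the result with $(\mu, \operatorname{vec}(\Sigma + \mu\mu^\top))$. The example parameters, sample size and seed are arbitrary assumptions. This is a Monte Carlo estimate, so agreement is only approximate.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical example values (d = 2).
mu = np.array([1.0, -2.0])
Sigma = np.array([[2.0, 0.5], [0.5, 1.0]])

# Draw samples; X has shape (n, d).
X = rng.multivariate_normal(mu, Sigma, size=200_000)

# Build phi(x) = (x, vec(x x^T)) for every sample, rows stacked.
outer = X[:, :, None] * X[:, None, :]  # shape (n, d, d)
Phi = np.concatenate([X, outer.reshape(len(X), -1)], axis=1)

empirical = Phi.mean(axis=0)  # Monte Carlo estimate of E[phi(x)]
exact = np.concatenate([mu, (Sigma + np.outer(mu, mu)).reshape(-1)])

max_err = np.max(np.abs(empirical - exact))
print(empirical)
print(exact)
print(max_err)
```

With this many samples the largest absolute deviation should be small, and it should shrink further as the sample size grows.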

Mauro Camara Escudero
Machine Learning Engineer

My research interests include approximate manifold sampling and generative models.