Multivariate Normal as an Exponential Family Distribution

Exponential Family of Distributions

A density \(f(\boldsymbol{\mathbf{x}})\) belongs to the exponential family of distributions if we can write it as \[ f(\boldsymbol{\mathbf{x}}; \boldsymbol{\mathbf{\theta}}) = \exp\left\{\langle\boldsymbol{\mathbf{\theta}}, \phi(\boldsymbol{\mathbf{x}})\rangle - A(\boldsymbol{\mathbf{\theta}})\right\} \] We call \(\boldsymbol{\mathbf{\theta}}\) the natural parameters, \(\phi(\boldsymbol{\mathbf{x}})\) the sufficient statistics, and \(A(\boldsymbol{\mathbf{\theta}})\) the log-partition function that normalizes the density; the expectation \(\mathbb{E}_f[\phi(\boldsymbol{\mathbf{X}})]\) gives the mean parameters.
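As a quick sanity check, here is a minimal Python sketch of this template (the helper name `exp_family_logpdf` is purely illustrative): the standard univariate normal fits it with \(\boldsymbol{\mathbf{\theta}} = (0, -\tfrac{1}{2})\), \(\phi(x) = (x, x^2)\) and \(A(\boldsymbol{\mathbf{\theta}}) = \tfrac{1}{2}\log 2\pi\).

```python
import numpy as np
from scipy.stats import norm

def exp_family_logpdf(theta, phi_x, log_partition):
    """log f(x; theta) = <theta, phi(x)> - A(theta)."""
    return theta @ phi_x - log_partition

# Standard normal N(0, 1): theta = (0, -1/2), phi(x) = (x, x^2),
# and log-partition A(theta) = (1/2) log(2 pi).
x = 1.3
theta = np.array([0.0, -0.5])
phi_x = np.array([x, x ** 2])
A = 0.5 * np.log(2 * np.pi)

print(np.isclose(exp_family_logpdf(theta, phi_x, A), norm.logpdf(x)))  # True
```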

Multivariate Normal Distribution

A random vector \(\boldsymbol{\mathbf{x}}\in\mathbb{R}^d\) has a multivariate normal distribution with mean \(\boldsymbol{\mathbf{\mu}}\) and covariance matrix \(\boldsymbol{\mathbf{\Sigma}}\) if its pdf is \[ f(\boldsymbol{\mathbf{x}}) = (2\pi)^{-\frac{d}{2}}\text{det}(\boldsymbol{\mathbf{\Sigma}})^{-\frac{1}{2}}\exp\left\{-\frac{1}{2}(\boldsymbol{\mathbf{x}}- \boldsymbol{\mathbf{\mu}})^\top \boldsymbol{\mathbf{\Sigma}}^{-1}(\boldsymbol{\mathbf{x}}- \boldsymbol{\mathbf{\mu}})\right\} \]
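To make this concrete, here is a small numpy sketch (assuming scipy is available as a reference implementation) that evaluates this formula directly on a random mean and covariance:

```python
import numpy as np
from scipy.stats import multivariate_normal

rng = np.random.default_rng(0)
d = 3
mu = rng.normal(size=d)
W = rng.normal(size=(d, d))
Sigma = W @ W.T + d * np.eye(d)   # a random symmetric positive-definite covariance
x = rng.normal(size=d)

# Evaluate the density exactly as written above
diff = x - mu
quad = diff @ np.linalg.solve(Sigma, diff)
pdf = (2 * np.pi) ** (-d / 2) * np.linalg.det(Sigma) ** (-0.5) * np.exp(-0.5 * quad)

print(np.isclose(pdf, multivariate_normal(mu, Sigma).pdf(x)))  # True
```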

Expanding the quadratic form and moving the normalizing constant inside the exponential, the density can be rearranged as \[ f(\boldsymbol{\mathbf{x}}) = \exp\left\{\boldsymbol{\mathbf{x}}^\top\boldsymbol{\mathbf{\Sigma}}^{-1}\boldsymbol{\mathbf{\mu}}-\frac{1}{2}\boldsymbol{\mathbf{x}}^\top\boldsymbol{\mathbf{\Sigma}}^{-1}\boldsymbol{\mathbf{x}}-\frac{1}{2}\left[d\log2\pi + \log|\boldsymbol{\mathbf{\Sigma}}| +\boldsymbol{\mathbf{\mu}}^\top\boldsymbol{\mathbf{\Sigma}}^{-1}\boldsymbol{\mathbf{\mu}}\right]\right\} \]
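The rearrangement is easy to verify numerically; a short numpy check of the two exponents, under the same kind of random \(\boldsymbol{\mathbf{\mu}}\), \(\boldsymbol{\mathbf{\Sigma}}\) and \(\boldsymbol{\mathbf{x}}\) setup as before:

```python
import numpy as np

rng = np.random.default_rng(0)
d = 3
mu = rng.normal(size=d)
W = rng.normal(size=(d, d))
Sigma = W @ W.T + d * np.eye(d)   # random SPD covariance
x = rng.normal(size=d)
P = np.linalg.inv(Sigma)          # precision matrix

const = d * np.log(2 * np.pi) + np.linalg.slogdet(Sigma)[1]
original = -0.5 * (x - mu) @ P @ (x - mu) - 0.5 * const
rearranged = x @ P @ mu - 0.5 * x @ P @ x - 0.5 * (const + mu @ P @ mu)

print(np.isclose(original, rearranged))  # True
```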

Frobenius Inner Product

Notice that we can write the second term as \[ -\frac{1}{2}\boldsymbol{\mathbf{x}}^\top \boldsymbol{\mathbf{\Sigma}}^{-1}\boldsymbol{\mathbf{x}}= -\frac{1}{2}\sum_{k=1}^d \sum_{j=1}^d x_k\Sigma_{kj}^{-1}x_j \] Similarly, the following trace expression expands to the same double sum \[ \begin{align} \text{tr}\left[-\frac{1}{2}\boldsymbol{\mathbf{\Sigma}}^{-1}\boldsymbol{\mathbf{x}}\boldsymbol{\mathbf{x}}^\top\right] &= -\frac{1}{2}\text{tr}\left[ \begin{pmatrix} \Sigma_{11}^{-1} & \cdots & \Sigma_{1d}^{-1} \\ \vdots & \ddots & \vdots \\ \Sigma_{d1}^{-1} & \cdots & \Sigma_{dd}^{-1} \end{pmatrix} \begin{pmatrix} x_1^2 & \cdots & x_1x_d\\ \vdots & \ddots & \vdots\\ x_dx_1 & \cdots & x_d^2 \end{pmatrix} \right]\\ &= -\frac{1}{2}\text{tr}\left[ \begin{pmatrix} \sum_{j=1}^d\Sigma_{1j}^{-1}x_jx_1 & \cdots & \sum_{j=1}^d\Sigma_{1j}^{-1}x_jx_d \\ \vdots & \ddots & \vdots \\ \sum_{j=1}^d \Sigma_{dj}^{-1}x_jx_1 & \cdots & \sum_{j=1}^d \Sigma_{dj}^{-1}x_jx_d \end{pmatrix} \right]\\ &= -\frac{1}{2}\sum_{k=1}^d\sum_{j=1}^d x_{k}\Sigma_{kj}^{-1}x_j \end{align} \] This is nothing but the Frobenius inner product between two real, symmetric matrices, which can be written both in terms of a trace and in terms of \(\text{vec}\)-torized quantities \[ \begin{align} \left\langle -\frac{1}{2}\boldsymbol{\mathbf{\Sigma}}^{-1}, \boldsymbol{\mathbf{x}}\boldsymbol{\mathbf{x}}^\top\right\rangle_F &= \text{tr}\left(-\frac{1}{2}\boldsymbol{\mathbf{\Sigma}}^{-1}\boldsymbol{\mathbf{x}}\boldsymbol{\mathbf{x}}^\top\right) \\ &= \text{vec}\left(-\frac{1}{2}\boldsymbol{\mathbf{\Sigma}}^{-1}\right)^\top\text{vec}\left(\boldsymbol{\mathbf{x}}\boldsymbol{\mathbf{x}}^\top\right) \end{align} \] where the vectorization of an \(n\times m\) matrix \(A\) stacks its rows one at a time into an \((nm)\times 1\) vector \[ \text{vec}[A] = \text{vec}\left[\begin{pmatrix} a_{11} & \cdots & a_{1m} \\ \vdots & \ddots & \vdots \\ a_{n1} & \cdots & a_{nm}\end{pmatrix}\right] = \begin{pmatrix} a_{11} \\ \vdots \\ a_{1m} \\ \vdots \\ a_{nm}\end{pmatrix} \]
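All three expressions (the quadratic form, the trace, and the vec inner product) can be checked in a few lines of numpy; note that `ravel()` flattens row by row, matching the row-stacking vec convention used here (for symmetric matrices the row/column choice makes no difference anyway):

```python
import numpy as np

rng = np.random.default_rng(0)
d = 3
W = rng.normal(size=(d, d))
Sigma = W @ W.T + d * np.eye(d)   # random SPD covariance
x = rng.normal(size=d)
P = np.linalg.inv(Sigma)          # precision matrix

quad = -0.5 * x @ P @ x                                # quadratic form
trace = np.trace(-0.5 * P @ np.outer(x, x))            # trace form
frob = (-0.5 * P).ravel() @ np.outer(x, x).ravel()     # vec inner product

print(np.allclose([quad, trace], frob))  # True
```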

These identities allow us to write the pdf as \[ f(\boldsymbol{\mathbf{x}}) = \exp\left\{\langle \boldsymbol{\mathbf{\Sigma}}^{-1}\boldsymbol{\mathbf{\mu}}, \boldsymbol{\mathbf{x}}\rangle + \left\langle \text{vec}\left(-\frac{1}{2}\boldsymbol{\mathbf{\Sigma}}^{-1}\right), \text{vec}\left(\boldsymbol{\mathbf{x}}\boldsymbol{\mathbf{x}}^\top\right)\right\rangle -\frac{1}{2}\left[d\log2\pi + \log|\boldsymbol{\mathbf{\Sigma}}| +\boldsymbol{\mathbf{\mu}}^\top\boldsymbol{\mathbf{\Sigma}}^{-1}\boldsymbol{\mathbf{\mu}}\right]\right\} \]

Natural Parameters of a Multivariate Normal Distribution

Since \(\boldsymbol{\mathbf{\Sigma}}^{-1}\boldsymbol{\mathbf{\mu}}\) can be written as \[ \boldsymbol{\mathbf{\Sigma}}^{-1}\boldsymbol{\mathbf{\mu}}= \begin{pmatrix} \sum_{j=1}^d \Sigma_{1j}^{-1}\mu_j \\ \vdots \\ \sum_{j=1}^d \Sigma_{dj}^{-1}\mu_j \end{pmatrix} \] the natural parameters are given by \[ \boldsymbol{\mathbf{\theta}}= \begin{pmatrix} \boldsymbol{\mathbf{\Sigma}}^{-1}\boldsymbol{\mathbf{\mu}}\\ \text{vec}\left[-\frac{1}{2}\boldsymbol{\mathbf{\Sigma}}^{-1}\right] \end{pmatrix} = \begin{pmatrix} \sum_{j=1}^d \Sigma_{1j}^{-1}\mu_j \\ \vdots \\ \sum_{j=1}^d \Sigma_{dj}^{-1}\mu_j \\ -\frac{1}{2}\Sigma_{11}^{-1} \\ \vdots \\ -\frac{1}{2}\Sigma_{1d}^{-1} \\ \vdots \\ -\frac{1}{2}\Sigma_{dd}^{-1} \end{pmatrix}_{(d + d^2)\times 1} \]
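Assembling \(\boldsymbol{\mathbf{\theta}}\) in numpy is a one-liner sketch; as before, `ravel()` stacks rows, matching the vec convention above:

```python
import numpy as np

rng = np.random.default_rng(0)
d = 3
mu = rng.normal(size=d)
W = rng.normal(size=(d, d))
Sigma = W @ W.T + d * np.eye(d)   # random SPD covariance
P = np.linalg.inv(Sigma)          # precision matrix

# theta = (Sigma^{-1} mu, vec(-1/2 Sigma^{-1}))
theta = np.concatenate([P @ mu, (-0.5 * P).ravel()])
print(theta.shape)  # (12,) = (d + d^2,)
```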

In a similar fashion, we can find the sufficient statistics as \[ \phi(\boldsymbol{\mathbf{x}}) = \begin{pmatrix} \boldsymbol{\mathbf{x}}\\ \text{vec}(\boldsymbol{\mathbf{x}}\boldsymbol{\mathbf{x}}^\top) \end{pmatrix} = \begin{pmatrix} x_1 \\ \vdots \\ x_d \\ x_1^2 \\ \vdots \\ x_1x_d \\ \vdots \\ x_d^2 \end{pmatrix}_{(d + d^2)\times 1} \]
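The matching sufficient-statistics vector is stacked the same way:

```python
import numpy as np

rng = np.random.default_rng(0)
d = 3
x = rng.normal(size=d)

# phi(x) = (x, vec(x x^T))
phi = np.concatenate([x, np.outer(x, x).ravel()])
print(phi.shape)  # (12,) = (d + d^2,)
```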

This gives the complete expression for the multivariate normal distribution as a member of the exponential family of distributions:

\[\begin{align} f(\boldsymbol{\mathbf{x}}) &= \exp\left\{\langle \boldsymbol{\mathbf{\theta}}, \phi(\boldsymbol{\mathbf{x}})\rangle - A(\boldsymbol{\mathbf{\theta}})\right\} \\ &= \exp\left\{ \left(\sum_{j=1}^d \Sigma_{1j}^{-1}\mu_j, \cdots, \sum_{j=1}^d \Sigma_{dj}^{-1}\mu_j, -\frac{1}{2}\Sigma_{11}^{-1}, \cdots, -\frac{1}{2}\Sigma_{1d}^{-1}, \cdots, -\frac{1}{2}\Sigma_{dd}^{-1} \right) \begin{pmatrix} x_1 \\ \vdots \\ x_d \\ x_1^2 \\ \vdots \\ x_1x_d \\ \vdots \\ x_d^2 \end{pmatrix} -\frac{1}{2}\left[d\log2\pi + \log|\boldsymbol{\mathbf{\Sigma}}| +\boldsymbol{\mathbf{\mu}}^\top\boldsymbol{\mathbf{\Sigma}}^{-1}\boldsymbol{\mathbf{\mu}}\right] \right\} \end{align}\]
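Putting the pieces together, a final numerical check (again leaning on scipy as the reference) confirms that \(\langle\boldsymbol{\mathbf{\theta}}, \phi(\boldsymbol{\mathbf{x}})\rangle - A(\boldsymbol{\mathbf{\theta}})\) reproduces the usual multivariate normal log-density:

```python
import numpy as np
from scipy.stats import multivariate_normal

rng = np.random.default_rng(0)
d = 3
mu = rng.normal(size=d)
W = rng.normal(size=(d, d))
Sigma = W @ W.T + d * np.eye(d)   # random SPD covariance
x = rng.normal(size=d)
P = np.linalg.inv(Sigma)          # precision matrix

theta = np.concatenate([P @ mu, (-0.5 * P).ravel()])   # natural parameters
phi = np.concatenate([x, np.outer(x, x).ravel()])      # sufficient statistics
A = 0.5 * (d * np.log(2 * np.pi) + np.linalg.slogdet(Sigma)[1] + mu @ P @ mu)

print(np.isclose(theta @ phi - A, multivariate_normal(mu, Sigma).logpdf(x)))  # True
```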

Mean Parameters of a Multivariate Normal Distribution

Remember that the expected value of a matrix or a vector is taken element-wise, so that \[ \begin{align} (\mathbb{E}[\boldsymbol{\mathbf{x}}\boldsymbol{\mathbf{x}}^\top])_{ij} &= \mathbb{E}[(\boldsymbol{\mathbf{x}}\boldsymbol{\mathbf{x}}^\top)_{ij}] \\ &= \mathbb{E}[x_ix_j] \\ &= \text{cov}(x_i, x_j) + \mathbb{E}[x_i]\mathbb{E}[x_j] \\ &= \Sigma_{ij} + \mu_i \mu_j \end{align} \] This means that taking the expected value of the vectorized matrix \(\text{vec}(\boldsymbol{\mathbf{x}}\boldsymbol{\mathbf{x}}^\top)\) is equivalent to vectorizing the expected value of \(\boldsymbol{\mathbf{x}}\boldsymbol{\mathbf{x}}^\top\) \[ \begin{align} \mathbb{E}\left[\text{vec}(\boldsymbol{\mathbf{x}}\boldsymbol{\mathbf{x}}^\top)\right] &= \mathbb{E}\left[ \begin{pmatrix} x_1^2 \\ \vdots \\ x_1x_d \\ \vdots \\ x_d^2 \end{pmatrix} \right] \\ &= \begin{pmatrix} \mathbb{E}\left[x_1^2\right] \\ \vdots \\ \mathbb{E}\left[x_1x_d\right] \\ \vdots \\ \mathbb{E}\left[x_d^2\right] \end{pmatrix} \\ &= \begin{pmatrix} \Sigma_{11} + \mu_1^2 \\ \vdots \\ \Sigma_{1d} + \mu_1\mu_d \\ \vdots \\ \Sigma_{dd} + \mu_d^2 \end{pmatrix} \\ &= \text{vec}\left(\mathbb{E}\left[\boldsymbol{\mathbf{x}}\boldsymbol{\mathbf{x}}^\top\right]\right) \end{align} \] Finally, the mean parameters are given by \[ \begin{align} \mathbb{E}\left[\phi(\boldsymbol{\mathbf{x}})\right] &= \mathbb{E}\left[\begin{pmatrix} \boldsymbol{\mathbf{x}}\\ \text{vec}(\boldsymbol{\mathbf{x}}\boldsymbol{\mathbf{x}}^\top) \end{pmatrix}\right] \\ &= \begin{pmatrix} \boldsymbol{\mathbf{\mu}}\\ \text{vec}\left(\boldsymbol{\mathbf{\Sigma}}+ \boldsymbol{\mathbf{\mu}}\boldsymbol{\mathbf{\mu}}^\top\right) \end{pmatrix} \end{align} \]
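A Monte Carlo sketch makes the mean parameters tangible: the empirical average of \(\phi(\boldsymbol{\mathbf{X}})\) over samples from \(N(\boldsymbol{\mathbf{\mu}}, \boldsymbol{\mathbf{\Sigma}})\) should approach \((\boldsymbol{\mathbf{\mu}}, \text{vec}(\boldsymbol{\mathbf{\Sigma}} + \boldsymbol{\mathbf{\mu}}\boldsymbol{\mathbf{\mu}}^\top))\), up to sampling error:

```python
import numpy as np

rng = np.random.default_rng(0)
d = 3
mu = rng.normal(size=d)
W = rng.normal(size=(d, d))
Sigma = W @ W.T + d * np.eye(d)   # random SPD covariance

samples = rng.multivariate_normal(mu, Sigma, size=200_000)     # shape (n, d)
outer = np.einsum('ni,nj->nij', samples, samples)              # per-sample x x^T
phi = np.hstack([samples, outer.reshape(len(samples), -1)])    # per-sample phi(x)

mean_params = np.concatenate([mu, (Sigma + np.outer(mu, mu)).ravel()])
print(np.allclose(phi.mean(axis=0), mean_params, atol=0.1))  # True, up to Monte Carlo error
```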
