Gaussian Expectation Propagation

Last updated on Feb 10, 2024

Multivariate Normal Distribution in the Exponential Family

Recall from a previous blog post that the pdf $p (x) = N (x; μ, Σ)$ can be written as $p (x) = \exp \left\{ ⟨ Σ^{- 1} μ, x ⟩ + ⟨ \operatorname{vec} (- \frac{1}{2} Σ^{- 1}), \operatorname{vec} (x x^{⊤}) ⟩ - \frac{1}{2} [d \log 2 π + \log | Σ | + μ^{⊤} Σ^{- 1} μ] \right\}$ where $d$ is the dimension of $x$.

Following the work of Barthelmé, Chopin, and Cottet (2015), we can drop the terms that do not depend on $x$ and relabel the expression above as $p (x) \propto \exp \left\{ ⟨ r, x ⟩ + ⟨ \operatorname{vec} (- \frac{1}{2} Q), \operatorname{vec} (x x^{⊤}) ⟩ \right\}$ where we have defined $r := Σ^{- 1} μ$ and the precision matrix $Q := Σ^{- 1}$.
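The relabelling above can be checked numerically. The following is a minimal sketch, assuming NumPy; the helper names `to_natural`, `to_moments` and `log_density_natural`, and the example values of `mu`, `Sigma` and `x`, are illustrative choices and not part of the original derivation. It converts between $(μ, Σ)$ and $(r, Q)$ and compares the exponential-family form of the log-density with the usual one.

```python
import numpy as np

def to_natural(mu, Sigma):
    """Map (mu, Sigma) to (r, Q) with Q = Sigma^{-1} and r = Q mu."""
    Q = np.linalg.inv(Sigma)
    return Q @ mu, Q

def to_moments(r, Q):
    """Map (r, Q) back to (mu, Sigma)."""
    Sigma = np.linalg.inv(Q)
    return Sigma @ r, Sigma

def log_density_natural(x, r, Q):
    """log N(x; mu, Sigma) written purely in terms of (r, Q)."""
    d = x.shape[0]
    # <r, x> + <vec(-Q/2), vec(x x^T)>
    inner = r @ x + np.sum((-0.5 * Q) * np.outer(x, x))
    # log|Sigma| = -log|Q| and mu^T Sigma^{-1} mu = r^T Q^{-1} r
    _, logdet_Q = np.linalg.slogdet(Q)
    log_norm = 0.5 * (d * np.log(2 * np.pi) - logdet_Q + r @ np.linalg.solve(Q, r))
    return inner - log_norm

# Illustrative values (assumptions, chosen only for the check).
mu = np.array([1.0, -2.0])
Sigma = np.array([[2.0, 0.3], [0.3, 1.0]])
x = np.array([0.5, 0.25])

r, Q = to_natural(mu, Sigma)
mu_back, Sigma_back = to_moments(r, Q)

# Reference: the usual log-density in terms of (mu, Sigma).
d = mu.shape[0]
diff = x - mu
_, logdet_Sigma = np.linalg.slogdet(Sigma)
log_ref = -0.5 * (d * np.log(2 * np.pi) + logdet_Sigma + diff @ np.linalg.solve(Sigma, diff))
log_nat = log_density_natural(x, r, Q)
```

Both `log_nat` and `log_ref` should agree up to floating-point error, and `to_moments` should undo `to_natural`.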

Target Distribution

We assume the target distribution is intractable, but can be factorized into a product of $K + 1$ terms $f (γ) \propto \prod_{k = 0}^{K} f_{k} (γ_{k})$ where usually

$f (γ) = p (γ ∣ x)$ is a posterior distribution.
$f_{0} (γ_{0}) = p (γ; θ_{0})$ is a prior distribution.
$f_{1} (γ_{1}), \dots, f_{K} (γ_{K})$ are likelihood terms, which are intractable.

From here onwards we assume that the prior distribution $f_{0} (γ)$ is a multivariate Gaussian distribution $N (γ; r_{0}, Q_{0})$ (with natural parameters $θ_{0} = {(r_{0}, - \frac{1}{2} Q_{0})}^{⊤}$): $f_{0} (γ) = p (γ; r_{0}, Q_{0}) \propto \exp \left\{ ⟨ r_{0}, x ⟩ + ⟨ \operatorname{vec} (- \frac{1}{2} Q_{0}), \operatorname{vec} (x x^{⊤}) ⟩ \right\}$ Here, as in the first section, $x$ denotes the argument of the density, which in this case is $γ$.

Note: The parameters $γ_{0}, \dots, γ_{K}$ are not the components of $γ$; they are different parameters.

Global Approximation

The global approximation is defined as $g (θ) \propto \prod_{k = 0}^{K} g_{k} (θ_{k})$ where we set the first term to be equal to the (tractable) prior, which is a multivariate Gaussian distribution $g_{0} (θ) \propto \exp \left\{ ⟨ r_{0}, x ⟩ + ⟨ \operatorname{vec} (- \frac{1}{2} Q_{0}), \operatorname{vec} (x x^{⊤}) ⟩ \right\}$

Furthermore, we assume that each factor $g_{k} (θ_{k})$ for $k = 1, \dots, K$ is also a multivariate Gaussian with natural parameter $θ_{k} = (r_{k}, Q_{k})^{⊤}$. Luckily, the product of Gaussian densities is again proportional to a Gaussian density. In particular we have $\begin{aligned} g (θ) & \propto \prod_{k = 0}^{K} \exp \left\{ r_{k}^{⊤} x + \operatorname{vec} {(- \frac{1}{2} Q_{k})}^{⊤} \operatorname{vec} (x x^{⊤}) \right\} \\ & = \exp \left\{ {(\sum_{k = 0}^{K} r_{k})}^{⊤} x + \operatorname{vec} {(- \frac{1}{2} \sum_{k = 0}^{K} Q_{k})}^{⊤} \operatorname{vec} (x x^{⊤}) \right\} \\ & := \exp \left\{ r^{⊤} x + \operatorname{vec} {(- \frac{1}{2} Q)}^{⊤} \operatorname{vec} (x x^{⊤}) \right\} \end{aligned}$ where we have defined the global natural parameters to be $r := \sum_{k = 0}^{K} r_{k}$ and $Q := \sum_{k = 0}^{K} Q_{k}$.

In other words, the natural parameters of the global approximation are found by summing the natural parameters of all the sites.

Note: The parameters $θ_{0}, \dots, θ_{K}$ are not the components of $θ$; they are different parameters.
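The summation of site parameters described in this section can be illustrated with a small numerical check. This is only a sketch, assuming NumPy; the random site parameters, the helper `random_spd` and the function `log_unnorm` are hypothetical and exist purely for the demonstration. It confirms that the sum of the site exponents equals the exponent built from the summed parameters $r$ and $Q$.

```python
import numpy as np

rng = np.random.default_rng(0)
d, K = 2, 3

def random_spd(rng, d):
    """Hypothetical helper: a random symmetric positive-definite matrix."""
    A = rng.normal(size=(d, d))
    return A @ A.T + d * np.eye(d)

# Sites k = 0..K stored as (r_k, Q_k); k = 0 plays the role of the prior.
site_r = [rng.normal(size=d) for _ in range(K + 1)]
site_Q = [random_spd(rng, d) for _ in range(K + 1)]

# Global natural parameters: plain sums over the sites.
r = np.sum(site_r, axis=0)
Q = np.sum(site_Q, axis=0)

def log_unnorm(x, r, Q):
    """r^T x - x^T Q x / 2, the exponent of an unnormalised Gaussian."""
    return r @ x - 0.5 * x @ Q @ x

x = rng.normal(size=d)
log_product = sum(log_unnorm(x, rk, Qk) for rk, Qk in zip(site_r, site_Q))
log_global = log_unnorm(x, r, Q)

# Mean and covariance of the global approximation.
Sigma = np.linalg.inv(Q)
mu = Sigma @ r
```

Here `log_product` and `log_global` should coincide up to floating-point error for any `x`, which is the numerical counterpart of the derivation above.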

Cavity Distribution

The cavity distribution at the $k^{th}$ site is given by the product of all but the $k^{th}$ multivariate Gaussian. This means that we can write the cavity distribution as $g_{- k} (θ - θ_{k}) \propto \exp \left\{ {(r - r_{k})}^{⊤} x + \operatorname{vec} {(- \frac{1}{2} (Q - Q_{k}))}^{⊤} \operatorname{vec} (x x^{⊤}) \right\}$ In other words, the natural parameters of the cavity distribution are found by taking the difference between the global natural parameters and the natural parameters of the $k^{th}$ site.
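The subtraction described above can be sketched in a few lines. The site values below and the function name `cavity` are assumptions made for illustration, and NumPy is assumed. The sketch also checks that the cavity precision is positive definite, since the subtraction alone does not guarantee that the cavity is a proper Gaussian.

```python
import numpy as np

# Hypothetical site parameters for K + 1 = 3 sites in d = 2 dimensions.
site_r = np.array([[0.5, -1.0], [1.0, 0.2], [-0.3, 0.4]])
site_Q = np.array([
    [[2.0, 0.1], [0.1, 1.5]],
    [[1.0, 0.0], [0.0, 0.5]],
    [[0.8, -0.2], [-0.2, 1.2]],
])

# Global natural parameters: sums over all sites.
r, Q = site_r.sum(axis=0), site_Q.sum(axis=0)

def cavity(k, r, Q, site_r, site_Q):
    """Natural parameters of the cavity at site k: global minus site k."""
    return r - site_r[k], Q - site_Q[k]

k = 1
r_cav, Q_cav = cavity(k, r, Q, site_r, site_Q)

# Same result obtained by summing every site except k.
others = [j for j in range(len(site_r)) if j != k]
r_check, Q_check = site_r[others].sum(axis=0), site_Q[others].sum(axis=0)

# The cavity is a proper Gaussian only if its precision is positive definite.
is_proper = bool(np.all(np.linalg.eigvalsh(Q_cav) > 0))
```

Subtracting from the global parameters avoids re-summing the remaining sites every time a different site is visited.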

Tilted Distribution

The tilted distribution, also called the pseudo-posterior, is found by multiplying the cavity distribution by the $k^{th}$ local likelihood term. In other words, the tilted distribution is a (pseudo-)posterior in which the cavity distribution acts as a (pseudo-)prior and the single local likelihood term acts as the likelihood.

In general, computing moments of this distribution will be intractable; however, we hope that calculating them will be easier than calculating moments of the entire target distribution. $\begin{aligned} g_{∖ k} ({\tilde{γ}}_{k}) & \propto f_{k} (γ_{k}) \exp \left\{ {(r - r_{k})}^{⊤} x + \operatorname{vec} {(- \frac{1}{2} (Q - Q_{k}))}^{⊤} \operatorname{vec} (x x^{⊤}) \right\} \end{aligned}$
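To make the idea of tilted moments concrete, here is a minimal one-dimensional sketch, assuming NumPy. The logistic likelihood term, the grid-based quadrature, and the names `tilted_moments` and `log_logistic_lik` are all assumptions made for illustration; the text above does not prescribe how the moments should be computed, and a grid like this only works in very low dimensions.

```python
import numpy as np

def tilted_moments(log_lik, r_cav, q_cav, half_width=10.0, n=20001):
    """Mean and variance of f_k(x) * exp(r_cav * x - q_cav * x**2 / 2) in 1-D."""
    mu_cav, sd_cav = r_cav / q_cav, 1.0 / np.sqrt(q_cav)
    # Uniform grid centred on the cavity mean, wide enough to cover the mass.
    x = np.linspace(mu_cav - half_width * sd_cav, mu_cav + half_width * sd_cav, n)
    log_t = log_lik(x) + r_cav * x - 0.5 * q_cav * x**2
    w = np.exp(log_t - log_t.max())  # subtract the max for numerical stability
    w /= w.sum()                     # normalised weights on the uniform grid
    mean = np.sum(w * x)
    var = np.sum(w * (x - mean) ** 2)
    return mean, var

def log_logistic_lik(x):
    """Hypothetical likelihood term: log sigmoid(x)."""
    return -np.logaddexp(0.0, -x)

# Tilted moments for a standard-normal cavity and the logistic term.
mean, var = tilted_moments(log_logistic_lik, r_cav=0.0, q_cav=1.0)

# Sanity check: with a constant likelihood the tilted distribution is the cavity.
mean0, var0 = tilted_moments(lambda x: np.zeros_like(x), r_cav=1.0, q_cav=2.0)
```

With a constant likelihood the function should recover the cavity mean `r_cav / q_cav` and variance `1 / q_cav`, which gives a quick way to validate the quadrature before plugging in a real likelihood term.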