Summary of the SNEP algorithm

A minimalist and practical introduction.

At a Glance

Coordinate-wise maximization of

$$
\max_{\mu_0,\,[\mu_i,\nu_i,\theta_i]_{i=1}^n} \left\{ \langle \mu_0, \theta_0 \rangle + \sum_{i=1}^n \langle \nu_i, 1 \rangle - A^*(\mu_0) - \sum_{i=1}^n \beta_i \left[ A^*(\mu_i,\nu_i) + A(\theta_i) - \langle \mu_i, \theta_i \rangle \right] \right\}
$$

$$
\text{s.t.} \quad (\mu_i, \nu_i) \in \mathcal{M}(s(x), \ell_i(x)), \quad \mu_i = \mu_0, \quad \theta_i \in \Theta, \quad i = 1,\ldots,n, \qquad \mu_0 \in \mathcal{M}(s(x)).
$$

Here $A$ is the log-partition function of the base exponential family, $A^*$ denotes convex conjugation, $\mathcal{M}(\cdot)$ is the set of realizable mean parameters of the family with the given sufficient statistics, and $\beta_i > 0$ are power-EP weights. Every ingredient is unpacked in the sections below.

Context

In Hasenclever et al. (2017) the prior belongs to a base exponential family,

$$
p_0(x) = \exp\left( \langle \theta_0, s(x) \rangle - A(\theta_0) \right).
$$

The data are partitioned across nodes $i = 1,\ldots,n$, and each node forms a local log-likelihood from its shard,

$$
\ell_i(x) = \sum_{c \in S_i} \log p(y_c \mid x),
$$

where $S_i$ indexes the data held at node $i$. The posterior can then be written as a member of an extended exponential family,

$$
\tilde p(x) \propto p_0(x) \exp\left( \sum_{i=1}^n \ell_i(x) \right) \propto \exp\left( \langle \tilde\theta, \tilde s(x) \rangle - \tilde A(\tilde\theta) \right),
$$

where

$$
\tilde s(x) = [s(x), \ell_1(x), \ldots, \ell_n(x)] \equiv (\phi(x), \Phi_1(x), \ldots, \Phi_n(x)), \qquad \tilde\theta = [\theta_0, \mathbf{1}_n] \equiv (\theta, \tilde\theta_1, \ldots, \tilde\theta_n).
$$
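To make the setup concrete, here is a simple instance (a hypothetical illustration, not an example from the paper): a univariate Gaussian prior $\mathcal{N}(m_0, \sigma_0^2)$ with unit-variance Gaussian observations gives

$$
s(x) = (x, x^2), \qquad \theta_0 = \left( \frac{m_0}{\sigma_0^2},\, -\frac{1}{2\sigma_0^2} \right), \qquad \ell_i(x) = -\frac{1}{2} \sum_{c \in S_i} (y_c - x)^2 + \text{const}.
$$

In this toy case $\ell_i(x)$ happens to be quadratic in $x$, so everything stays tractable; in the models the paper targets (e.g. Bayesian neural networks), $\ell_i$ is intractable, which is what motivates the machinery below.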

Aim

We want to learn the posterior distribution $\tilde p(x)$. We do this by finding its log-partition function and its mean parameters, which together completely specify the posterior. Both can be found at once by solving the variational problem given by the convex dual. This problem is intractable, so we solve an approximate problem instead.
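Concretely, the variational problem referred to here is the convex-dual (Fenchel) representation of the log-partition function,

$$
\tilde A(\tilde\theta) = \sup_{\tilde\mu \in \mathcal{M}} \left\{ \langle \tilde\mu, \tilde\theta \rangle - \tilde A^*(\tilde\mu) \right\},
$$

where $\tilde A^*$ is the convex conjugate of $\tilde A$ and $\mathcal{M}$ is the set of realizable mean parameters. The value of this problem is the log-partition function and its maximizer is the mean-parameter vector, but both $\tilde A^*$ and $\mathcal{M}$ are intractable in general, hence the approximation of the next section.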

Variational Problem

The approximate problem we solve to find the (approximate) mean parameters and the (approximate) log-partition function is

$$
\max_{\mu_0,\,[\mu_i,\nu_i]_{i=1}^n} \left\{ \langle \mu_0, \theta_0 \rangle + \sum_{i=1}^n \langle \nu_i, 1 \rangle - A^*(\mu_0) - \sum_{i=1}^n \beta_i \left[ A^*(\mu_i, \nu_i) - A^*(\mu_i) \right] \right\}
$$

$$
\text{s.t.} \quad (\mu_i, \nu_i) \in \mathcal{M}(s(x), \ell_i(x)), \quad \mu_i = \mu_0, \quad i = 1,\ldots,n, \qquad \mu_0 \in \mathcal{M}(s(x)).
$$

Solving it gives an approximation $q(x)$ to the posterior that belongs to the base exponential family,

$$
\tilde p(x) \approx q(x) \propto \exp\left( \left\langle \theta_0 + \sum_{i=1}^n \lambda_i,\, s(x) \right\rangle \right).
$$

Since exponential families are closed under multiplication (natural parameters add up), we can interpret this approximation as follows: each $\lambda_i$ is the natural parameter of an exponential-family approximation to the intractable likelihood term $\exp(\ell_i(x))$ at site $i$.
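Spelling out this interpretation, the approximation factorizes over sites exactly as the true posterior factorizes over local likelihoods:

$$
q(x) \propto p_0(x) \prod_{i=1}^n \exp\left( \langle \lambda_i, s(x) \rangle \right), \qquad \exp(\ell_i(x)) \approx \exp\left( \langle \lambda_i, s(x) \rangle \right) \;\text{(up to a constant)}.
$$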

Updates

The tilted distribution at node $i$ is

$$
p_i(x) \propto \exp\left( \left\langle \theta_0 + \sum_{j=1}^n \lambda_j - \frac{\lambda_i}{\beta_i},\, s(x) \right\rangle + \frac{1}{\beta_i}\, \ell_i(x) \right).
$$

At each iteration we therefore compute the mean parameters of the tilted distribution,

$$
\left( \mathbb{E}_{p_i}[s(X)],\, \mathbb{E}_{p_i}[\ell_i(X)] \right) = (\mu_i, \nu_i),
$$

and then perform moment matching by adjusting $\lambda_i$:

$$
\lambda_i^{\text{new}} \leftarrow \nabla A^*(\mu_i) - \theta_0 - \sum_{j \neq i} \lambda_j.
$$

This is exactly the choice that makes the updated global approximation match the tilted moments, since it enforces $\theta_0 + \lambda_i^{\text{new}} + \sum_{j \neq i} \lambda_j = \nabla A^*(\mu_i)$.

In practice, to avoid oscillations, we add some damping:

$$
\lambda_i^{\text{new}} \leftarrow \alpha \lambda_i + (1 - \alpha) \left( \nabla A^*(\mu_i) - \theta_0 - \sum_{j \neq i} \lambda_j \right), \qquad \alpha \in [0, 1].
$$

Notice that here $\nabla A^*(\mu_i)$ is the natural parameter corresponding to the mean parameter $\mu_i$.
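To make the update loop concrete, here is a minimal runnable sketch of the damped moment-matching iteration for a univariate Gaussian base family with $s(x) = (x, x^2)$. The data split, the grid quadrature used to approximate the tilted moments, and the helper names (`nat_from_mean`, etc.) are illustrative assumptions rather than anything from the paper:

```python
import numpy as np

def nat_from_mean(m1, m2):
    """Map Gaussian mean parameters (E[X], E[X^2]) to natural parameters,
    i.e. apply grad A*(mu)."""
    var = m2 - m1 ** 2
    return np.array([m1 / var, -0.5 / var])

# Grid used to approximate the (in general intractable) tilted moments.
xs = np.linspace(-10.0, 10.0, 4001)
s = np.stack([xs, xs ** 2])                  # sufficient statistics s(x) on the grid

rng = np.random.default_rng(0)
n = 4                                        # number of nodes
y = rng.normal(2.0, 1.0, size=(n, 25))       # toy data, 25 points per node
# Local log-likelihoods ell_i(x) = sum_c log N(y_c | x, 1), up to constants.
loglik = [np.sum(-0.5 * (yi[:, None] - xs[None, :]) ** 2, axis=0) for yi in y]

theta0 = np.array([0.0, -0.5])               # N(0, 1) prior in natural parameters
beta, alpha = 1.0, 0.5                       # power beta_i and damping alpha
lam = [np.zeros(2) for _ in range(n)]        # site natural parameters lambda_i

for sweep in range(25):
    for i in range(n):
        # Tilted distribution p_i: cavity plus tempered local likelihood.
        eta_tilted = theta0 + sum(lam) - lam[i] / beta
        logp = eta_tilted @ s + loglik[i] / beta
        w = np.exp(logp - logp.max())
        w /= w.sum()
        m1, m2 = s @ w                       # tilted mean parameters mu_i
        # Damped moment matching: lambda_i <- alpha*lambda_i + (1-alpha)*(...)
        target = nat_from_mean(m1, m2) - theta0 - (sum(lam) - lam[i])
        lam[i] = alpha * lam[i] + (1 - alpha) * target

eta_q = theta0 + sum(lam)                    # natural parameters of q(x)
var_q = -0.5 / eta_q[1]
print("q mean:", eta_q[0] * var_q, "q var:", var_q)
```

With conjugate Gaussian likelihoods the tilted moments are actually available in closed form, so the grid is purely for illustration; in the intractable settings targeted by the paper, this moment computation is exactly the step that must be replaced by stochastic estimation, which is the subject of the next section.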

Stochastic Natural-Gradient Expectation Propagation

The only difficult step is computing the mean parameters of the tilted distribution, $\mathbb{E}_{p_i}[(s(X), \ell_i(X))]$, because the $\ell_i(x)$ are intractable. For this reason the authors derive an alternative to EP and power EP: they introduce an additional auxiliary natural-parameter vector $\theta_i$ for each node, obtaining the new variational objective

$$
\max_{\mu_0,\,[\mu_i,\nu_i,\theta_i]_{i=1}^n} \left\{ \langle \mu_0, \theta_0 \rangle + \sum_{i=1}^n \langle \nu_i, 1 \rangle - A^*(\mu_0) - \sum_{i=1}^n \beta_i \left[ A^*(\mu_i,\nu_i) + A(\theta_i) - \langle \mu_i, \theta_i \rangle \right] \right\}
$$

$$
\text{s.t.} \quad (\mu_i, \nu_i) \in \mathcal{M}(s(x), \ell_i(x)), \quad \mu_i = \mu_0, \quad \theta_i \in \Theta, \quad i = 1,\ldots,n, \qquad \mu_0 \in \mathcal{M}(s(x)).
$$

The key is that maximizing this new objective with respect to the auxiliary variables $\theta_i$ yields the original problem. To solve the new variational problem they introduce Lagrange multipliers for the constraints $\mu_i = \mu_0$ and switch to the dual:

$$
\max_{[\theta_i]_{i=1}^n} \; \min_{[\lambda_i]_{i=1}^n} \; A\!\left( \theta_0 + \sum_{i=1}^n \lambda_i \right) + \sum_{i=1}^n \beta_i \left[ A_i\!\left( \theta_i - \frac{\lambda_i}{\beta_i},\, \frac{1}{\beta_i} \right) - A(\theta_i) \right] \quad \text{s.t.} \quad \theta_i \in \Theta, \; i = 1,\ldots,n,
$$

where $A_i$ denotes the log-partition function of the extended family with sufficient statistics $(s(x), \ell_i(x))$.
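The claim that maximizing over the auxiliary variables recovers the original problem is just the Fenchel-conjugate identity for the log-partition function, stated here for completeness:

$$
\beta_i A^*(\mu_i) = \sup_{\theta_i \in \Theta} \beta_i \left\{ \langle \mu_i, \theta_i \rangle - A(\theta_i) \right\},
$$

so replacing the term $+\beta_i A^*(\mu_i)$ in the previous objective by $\beta_i \left[ \langle \mu_i, \theta_i \rangle - A(\theta_i) \right]$ and maximizing over $\theta_i$ gives back exactly the objective of the previous section. The payoff is that, for fixed $[\theta_i]$, the inner minimization over $[\lambda_i]$ can be carried out with stochastic natural-gradient steps whose expectations are estimated by sampling from the tilted distributions, which is what gives the algorithm its name.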

Bibliography

Hasenclever, Leonard, Stefan Webb, Thibaut Lienart, Sebastian Vollmer, Balaji Lakshminarayanan, Charles Blundell, and Yee Whye Teh. 2017. “Distributed Bayesian Learning with Stochastic Natural Gradient Expectation Propagation and the Posterior Server.” Journal of Machine Learning Research 18 (1): 3744–80.
