At a Glance
Coordinate maximization of an approximate (Power EP) variational objective whose solution yields the posterior's log-partition function and mean parameters.
Context
In Hasenclever et al. (2017) the prior is in the base exponential family

$$
p_0(\theta) = \exp\{\langle \eta_0, s(\theta)\rangle - A(\eta_0)\},
$$

with sufficient statistics $s(\theta)$, natural parameter $\eta_0$, and log-partition function $A$.
Each of the $n$ worker nodes holds one shard $x_i$ of the data, so a local likelihood can be formed at node $i$:

$$
p_i(\theta) = \prod_{c \in \text{shard } i} p(x_c \mid \theta).
$$
The posterior can then be written as a member of an extended exponential family,

$$
p(\theta \mid x) = \exp\Big\{\langle \eta_0, s(\theta)\rangle + \sum_{i=1}^{n} \ell_i(\theta) - A(\eta_0, \mathbf{1})\Big\},
$$

where $\ell_i(\theta) = \log p_i(\theta)$ acts as an extra sufficient statistic whose natural parameter is fixed to $1$, and $A(\eta_0, \mathbf{1})$ is the log-partition function of this extended family.
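As a concrete instance (an illustration using a Gaussian base family, not notation from the paper): with $s(\theta) = (\theta, \theta^2)$ the prior is a Gaussian, and the extended family simply appends the intractable log-likelihoods to the sufficient statistics:

$$
p_0(\theta) = \exp\{\eta_1 \theta + \eta_2 \theta^2 - A(\eta_0)\}, \qquad \eta_0 = \Big(\frac{m_0}{v_0},\, -\frac{1}{2 v_0}\Big),
$$

$$
p(\theta \mid x) \propto \exp\Big\{\eta_1 \theta + \eta_2 \theta^2 + \sum_{i=1}^{n} 1 \cdot \ell_i(\theta)\Big\},
$$

so the posterior sits in the extended family at natural parameters $(\eta_0, \mathbf{1})$.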
Aim
We want to learn the posterior distribution $p(\theta \mid x)$. We do this by finding the log-partition function $A(\eta_0, \mathbf{1})$ and the mean parameters, which together completely specify the posterior. Both can be found at once by solving the variational problem given by the convex dual,

$$
A(\eta_0, \mathbf{1}) = \sup_{\mu,\, \mu'} \big\{\langle \eta_0, \mu\rangle + \langle \mathbf{1}, \mu'\rangle - A^*(\mu, \mu')\big\},
$$

where $A^*$ is the convex conjugate of $A$ and the supremum is attained at the posterior mean parameters $\mu = \mathbb{E}[s(\theta)]$ and $\mu'_i = \mathbb{E}[\ell_i(\theta)]$. This problem is intractable, so we solve an approximated problem instead.
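For intuition about this duality, here is a toy calculation (not from the paper) in the unit-variance Gaussian family with $s(\theta) = \theta$ and base measure $\mathcal{N}(0, 1)$:

$$
A(\eta) = \log \int e^{\eta\theta}\, \mathcal{N}(\theta; 0, 1)\, d\theta = \frac{\eta^2}{2}, \qquad \mu = \nabla A(\eta) = \eta,
$$

$$
A^*(\mu) = \sup_\eta \{\mu\eta - A(\eta)\} = \frac{\mu^2}{2}, \qquad \sup_\mu \{\eta\mu - A^*(\mu)\} = \frac{\eta^2}{2} = A(\eta),
$$

and the supremum on the right is attained exactly at the mean parameter $\mu = \mathbb{E}[s(\theta)] = \eta$.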
Variational Problem
The approximated problem that we need to solve to find the (approximated) mean parameters and the (approximated) log-partition function is

$$
\sup_{\mu,\, \mu'} \Big\{\langle \eta_0, \mu\rangle + \langle \mathbf{1}, \mu'\rangle + \Big(\sum_{i=1}^{n} \tfrac{1}{\beta_i} - 1\Big) A^*(\mu) - \sum_{i=1}^{n} \tfrac{1}{\beta_i}\, A_i^*(\mu, \beta_i \mu'_i)\Big\},
$$

where $A^*$ is the conjugate of the base-family log-partition function, $A_i$ is the log-partition function of the family extended with $\ell_i$ alone, and the $\beta_i > 0$ are Power EP parameters ($\beta_i = 1$ recovers standard EP).
Solving this gives an approximation to the posterior that belongs to the base exponential family,

$$
q(\theta) = \exp\Big\{\big\langle \eta_0 + \sum_{i=1}^{n} \lambda_i,\, s(\theta)\big\rangle - A\Big(\eta_0 + \sum_{i=1}^{n} \lambda_i\Big)\Big\}.
$$

Since the exponential family is closed under multiplication (the natural parameters simply add up), we can interpret this approximation as follows: each $\lambda_i$ is the natural parameter of an exponential-family factor $\exp\{\langle \lambda_i, s(\theta)\rangle\}$ approximating the intractable likelihood term $p_i(\theta)$ at site $i$, as the sketch below illustrates.
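A minimal sketch of this closure property, assuming a one-dimensional Gaussian family with natural parameters $(m/v, -1/(2v))$; the numbers are arbitrary:

```python
import numpy as np

# Minimal sketch (not from the paper): multiplying exponential-family
# factors corresponds to adding their natural parameters. For a 1-D
# Gaussian N(m, v), the natural parameters are (m / v, -1 / (2 v)).

def to_nat(m, v):
    return np.array([m / v, -0.5 / v])

def to_gaussian(eta):
    v = -0.5 / eta[1]
    return eta[0] * v, v

# A broad prior and two "site" factors approximating local likelihoods.
eta0 = to_nat(0.0, 10.0)
lam1 = to_nat(1.0, 2.0)
lam2 = to_nat(-0.5, 4.0)

# The approximate posterior q is the product of all factors, so its
# natural parameter is just the sum of theirs.
mean_q, var_q = to_gaussian(eta0 + lam1 + lam2)
print(mean_q, var_q)
```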
Updates
The tilted distribution at node $i$ is given by

$$
\tilde{p}_i(\theta) \propto q(\theta)\left(\frac{p_i(\theta)}{\exp\{\langle \lambda_i, s(\theta)\rangle\}}\right)^{\beta_i} = \exp\Big\{\big\langle \eta_0 + \sum_{j} \lambda_j - \beta_i \lambda_i,\, s(\theta)\big\rangle + \beta_i\, \ell_i(\theta)\Big\}.
$$
Therefore at each iteration we compute the mean parameters of the tilted distribution,

$$
\tilde{\mu}_i = \mathbb{E}_{\tilde{p}_i}[s(\theta)],
$$
and then perform moment matching by adjusting

$$
\lambda_i \leftarrow \lambda_i + \tfrac{1}{\beta_i}\big(\eta(\tilde{\mu}_i) - \eta_q\big), \qquad \eta_q = \eta_0 + \sum_{j} \lambda_j.
$$
In practice, to avoid oscillations, we add some damping $\delta \in (0, 1]$:

$$
\lambda_i \leftarrow \lambda_i + \tfrac{\delta}{\beta_i}\big(\eta(\tilde{\mu}_i) - \eta_q\big).
$$
Notice that here $\eta(\tilde{\mu}_i) = \nabla A^*(\tilde{\mu}_i)$ is the natural parameter corresponding to the mean parameter $\tilde{\mu}_i$.
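Putting the updates together, here is a hedged toy implementation (my own sketch, not the paper's code) of damped EP moment matching on a conjugate Gaussian model, where the tilted moments happen to be available in closed form:

```python
import numpy as np

# Toy sketch: damped (Power) EP moment matching on a conjugate Gaussian
# model. Natural parameters of N(m, v) are (m / v, -1 / (2 v)).

rng = np.random.default_rng(0)
theta_true = 1.5
shards = [rng.normal(theta_true, 1.0, size=20) for _ in range(4)]

def to_nat(m, v):
    return np.array([m / v, -0.5 / v])

def to_mean(eta):
    v = -0.5 / eta[1]
    return eta[0] * v, v

eta0 = to_nat(0.0, 10.0)               # prior N(0, 10)
lam = [np.zeros(2) for _ in shards]    # site approximations, start flat
delta, beta = 0.5, 1.0                 # damping and Power EP parameter

for sweep in range(50):
    for i, x in enumerate(shards):
        eta_q = eta0 + sum(lam)                       # current q
        cav = eta_q - beta * lam[i]                   # cavity
        tilt = cav + to_nat(x.mean(), 1.0 / len(x))   # tilted (conjugate)
        m_t, v_t = to_mean(tilt)                      # tilted mean params
        eta_match = to_nat(m_t, v_t)                  # matching natural params
        lam[i] = lam[i] + (delta / beta) * (eta_match - eta_q)

print(to_mean(eta0 + sum(lam)))        # approximate posterior (mean, var)
```

In this conjugate setting moment matching is exact, so the loop converges to the true posterior; the point of the paper is precisely the non-conjugate case, handled next.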
Stochastic Natural-Gradient Expectation Propagation
The only difficult step is to compute the mean parameters of the tilted distribution,

$$
\tilde{\mu}_i = \mathbb{E}_{\tilde{p}_i}[s(\theta)],
$$

because the expectations under $\tilde{p}_i$ are intractable (for instance when $p_i(\theta)$ is a neural-network likelihood over a data shard). For this reason, the authors derive an alternative to EP and Power EP: they introduce an additional auxiliary natural-parameter vector for each site, obtaining a new variational objective function.
The key property is that maximizing this new objective with respect to the auxiliary variables recovers the original problem. To solve the new variational problem they introduce Lagrange multipliers and switch to the dual problem, where the required tilted expectations can be estimated from MCMC samples and the parameters updated by stochastic natural-gradient steps, as in the sketch below.
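To make the sampling role concrete, here is a hedged sketch (illustrative names, not the paper's algorithm) of estimating the tilted mean parameters by Monte Carlo with a basic random-walk Metropolis sampler, matching the Gaussian sufficient statistics $s(\theta) = (\theta, \theta^2)$ used above:

```python
import numpy as np

# Hedged sketch: estimate the tilted mean parameters E[s(theta)] by Monte
# Carlo when no closed form exists, via random-walk Metropolis sampling.

rng = np.random.default_rng(1)

def log_tilted(theta, eta_cav, log_lik, beta=1.0):
    # Unnormalized log density of the tilted distribution: Gaussian cavity
    # with natural parameters eta_cav = (a, b), times p_i(theta)^beta.
    return eta_cav[0] * theta + eta_cav[1] * theta**2 + beta * log_lik(theta)

def tilted_mean_params(eta_cav, log_lik, n_samples=5000, step=0.5):
    theta, draws = 0.0, []
    lp = log_tilted(theta, eta_cav, log_lik)
    for _ in range(n_samples):
        prop = theta + step * rng.normal()
        lp_prop = log_tilted(prop, eta_cav, log_lik)
        if np.log(rng.uniform()) < lp_prop - lp:   # Metropolis accept
            theta, lp = prop, lp_prop
        draws.append(theta)
    draws = np.array(draws[n_samples // 2:])        # discard burn-in
    return draws.mean(), (draws**2).mean()          # E[theta], E[theta^2]

# Example with an awkward local likelihood (any callable works here).
mu1, mu2 = tilted_mean_params(np.array([0.0, -0.05]),
                              log_lik=lambda t: -np.abs(t - 1.0))
print(mu1, mu2)
```

A worker would convert such estimates back into an update for its site parameters; SNEP organizes these noisy estimates into convergent stochastic natural-gradient steps on the dual objective, coordinated through the posterior server.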
Bibliography
Hasenclever, Leonard, Stefan Webb, Thibaut Lienart, Sebastian Vollmer, Balaji Lakshminarayanan, Charles Blundell, and Yee Whye Teh. 2017. “Distributed Bayesian Learning with Stochastic Natural Gradient Expectation Propagation and the Posterior Server.” Journal of Machine Learning Research 18 (1): 3744–80.