Towards SMC: Importance Sampling with Sequential Data
Importance Sampling for Sequential Data
Suppose now that instead of having one posterior \(p(\boldsymbol{\mathbf{\theta}}\mid \boldsymbol{\mathbf{y}})\) we have a sequence of posterior distributions. This happens, for instance, when data arrives sequentially and we need to re-estimate the posterior at every time step. In particular, suppose that at time \(t\) we have the data \(\boldsymbol{\mathbf{x}}_{t} = (x_1, \ldots, x_t)^\top\) and the corresponding posterior is \(\gamma_t(\boldsymbol{\mathbf{x}}_t) = \widetilde{\gamma_t}(\boldsymbol{\mathbf{x}}_t)/Z_t\), of which we can only evaluate the unnormalized density \(\widetilde{\gamma_t}\). Then we can approximate the expected value of a function \(h(\boldsymbol{\mathbf{x}}_t)\) in the same way as above, by rewriting it in terms of an importance distribution \(q_t(\boldsymbol{\mathbf{x}}_t)\)
\[\begin{align}
\mathbb{E}_{\gamma_t}\left[h(\boldsymbol{\mathbf{x}}_t)\right]
&= \int h(\boldsymbol{\mathbf{x}}_t)\gamma_t(\boldsymbol{\mathbf{x}}_t) d\boldsymbol{\mathbf{x}}_t \\
&= \frac{1}{Z_t}\int h(\boldsymbol{\mathbf{x}}_t)\frac{\widetilde{\gamma_t}(\boldsymbol{\mathbf{x}}_t)}{q_t(\boldsymbol{\mathbf{x}}_t)} q_t(\boldsymbol{\mathbf{x}}_t) d\boldsymbol{\mathbf{x}}_t \\
&= \frac{\mathbb{E}_{q_t}\left[\frac{\widetilde{\gamma_t}(\boldsymbol{\mathbf{x}}_t)}{q_t(\boldsymbol{\mathbf{x}}_t)}h(\boldsymbol{\mathbf{x}}_t)\right]}{\mathbb{E}_{q_t}\left[\frac{\widetilde{\gamma_t}(\boldsymbol{\mathbf{x}}_t)}{q_t(\boldsymbol{\mathbf{x}}_t)}\right]} \\
&\approx \sum_{i=1}^N w_t(\boldsymbol{\mathbf{x}}_t^{(i)}) h(\boldsymbol{\mathbf{x}}_t^{(i)}) && \boldsymbol{\mathbf{x}}_t^{(i)} \overset{\text{iid}}{\sim} q_t(\, \cdot\,)
\end{align}\]
where the third equality uses \(Z_t = \int \widetilde{\gamma_t}(\boldsymbol{\mathbf{x}}_t)\, d\boldsymbol{\mathbf{x}}_t = \mathbb{E}_{q_t}\left[\widetilde{\gamma_t}(\boldsymbol{\mathbf{x}}_t)/q_t(\boldsymbol{\mathbf{x}}_t)\right]\), and where we have defined the normalized and unnormalized weights as
\[
w_t(\boldsymbol{\mathbf{x}}_t^{(i)}) = \frac{\widetilde{w}_t(\boldsymbol{\mathbf{x}}_t^{(i)})}{\sum_{j=1}^N \widetilde{w}_t(\boldsymbol{\mathbf{x}}_t^{(j)})}
\qquad \text{and} \qquad
\widetilde{w}_t(\boldsymbol{\mathbf{x}}_t^{(i)}) = \frac{\widetilde{\gamma_t}(\boldsymbol{\mathbf{x}}_t^{(i)})}{q_t(\boldsymbol{\mathbf{x}}_t^{(i)})}.
\]
We can also use these weights to obtain an estimate of the normalization constant of the posterior, \(Z_t\), and of the normalized posterior itself. For the latter, recall the definition of the Dirac delta mass: writing the posterior as an integral against a delta mass and approximating that integral with samples from \(q_t\) gives
\[\begin{align}
\gamma_t(\boldsymbol{\mathbf{x}}_t)
&= \int \gamma_t(\boldsymbol{\mathbf{x}}_t')\delta_{\boldsymbol{\mathbf{x}}_t'}(\boldsymbol{\mathbf{x}}_t) d\boldsymbol{\mathbf{x}}_t' \\
&\approx \frac{\sum_{i=1}^N \frac{\widetilde{\gamma_t}(\boldsymbol{\mathbf{x}}_t^{(i)})}{q_t(\boldsymbol{\mathbf{x}}_t^{(i)})}\delta_{\boldsymbol{\mathbf{x}}_t^{(i)}}(\boldsymbol{\mathbf{x}}_t)}{\sum_{j=1}^N \frac{\widetilde{\gamma_t}(\boldsymbol{\mathbf{x}}_t^{(j)})}{q_t(\boldsymbol{\mathbf{x}}_t^{(j)})}} \\
&= \sum_{i=1}^N w_t(\boldsymbol{\mathbf{x}}_t^{(i)}) \delta_{\boldsymbol{\mathbf{x}}_t^{(i)}}(\boldsymbol{\mathbf{x}}_t)
\end{align}\]
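To make the estimator concrete, here is a minimal sketch in Python/NumPy of the self-normalized importance-sampling recipe above for a single, fixed time \(t\). The setup is a toy assumption of mine, not part of the original text: the unnormalized target \(\widetilde{\gamma_t}\) is a product of standard-normal factors over the \(t\) coordinates, the proposal \(q_t\) is a wider iid Gaussian, and all function names (log_gamma_tilde_t, log_q_t, h) are illustrative.

```python
# Minimal sketch of self-normalized importance sampling at a fixed time t.
# Toy assumption: gamma~_t is an (unnormalized) standard normal over x_1..x_t,
# q_t is an iid N(0, 2^2) proposal. Names are illustrative, not canonical.
import numpy as np

rng = np.random.default_rng(0)
t, N = 5, 10_000                      # trajectory length, number of samples

def log_gamma_tilde_t(x):
    # Unnormalized target log-density (normalizing constant Z_t unknown to us).
    return -0.5 * np.sum(x**2, axis=-1)

def log_q_t(x):
    # Proposal log-density: iid N(0, 2^2) over the t coordinates.
    return np.sum(-0.5 * (x / 2.0) ** 2 - np.log(2.0 * np.sqrt(2.0 * np.pi)), axis=-1)

def h(x):
    # Test function, e.g. the last coordinate of the trajectory.
    return x[..., -1]

# Draw x_t^{(i)} ~ q_t for i = 1..N (each row is one trajectory of length t).
x = rng.normal(0.0, 2.0, size=(N, t))

# Unnormalized log-weights: log w~_t^{(i)} = log gamma~_t(x^{(i)}) - log q_t(x^{(i)}).
log_w_tilde = log_gamma_tilde_t(x) - log_q_t(x)

# Normalized weights w_t^{(i)}, computed stably by subtracting the max log-weight.
w = np.exp(log_w_tilde - np.max(log_w_tilde))
w /= w.sum()

# Self-normalized estimate of E_{gamma_t}[h(x_t)]; close to 0 in this toy example.
print(f"E[h(x_t)] ~= {np.sum(w * h(x)):.4f}")
```

The same weights \(w_t(\boldsymbol{\mathbf{x}}_t^{(i)})\) computed here are exactly those appearing in the weighted delta-mass approximation of the posterior above.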
And, similarly, the normalization constant can be estimated as the average of the unnormalized weights,
\[
Z_t = \int \widetilde{\gamma_t}(\boldsymbol{\mathbf{x}}_t) d \boldsymbol{\mathbf{x}}_t = \mathbb{E}_{q_t}\left[\frac{\widetilde{\gamma_t}(\boldsymbol{\mathbf{x}}_t)}{q_t(\boldsymbol{\mathbf{x}}_t)}\right] \approx \frac{1}{N}\sum_{i=1}^N \widetilde{w}_t(\boldsymbol{\mathbf{x}}_t^{(i)}).
\]
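As a sanity check on the \(Z_t\) estimate, here is a small sketch under the same kind of toy assumption as before, reduced to one dimension so the true constant is known: the unnormalized target is \(\exp(-x^2/2)\), whose integral is \(\sqrt{2\pi}\), and the proposal is again a wider Gaussian.

```python
# Minimal sketch of the Z_t estimate (toy 1-D assumption, true Z_t = sqrt(2*pi)).
import numpy as np

rng = np.random.default_rng(1)
N = 100_000

x = rng.normal(0.0, 2.0, size=N)                        # x^{(i)} ~ q_t = N(0, 2^2)
log_gamma_tilde = -0.5 * x**2                           # log gamma~_t(x^{(i)})
log_q = -0.5 * (x / 2.0) ** 2 - np.log(2.0 * np.sqrt(2.0 * np.pi))  # log q_t(x^{(i)})
w_tilde = np.exp(log_gamma_tilde - log_q)               # unnormalized weights w~_t^{(i)}

Z_hat = w_tilde.mean()                                  # (1/N) * sum_i w~_t(x^{(i)})
print(f"Z_t estimate: {Z_hat:.4f}  (true value sqrt(2*pi) = {np.sqrt(2.0 * np.pi):.4f})")
```

In log space one would instead use a log-sum-exp of the unnormalized log-weights minus \(\log N\), which is the numerically preferable form when the weights span many orders of magnitude.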