# Towards SMC: Sequential Importance Sampling

A tutorial on Sequential Importance Sampling, a building block of Sequential Monte Carlo (SMC)

### Review of Importance Sampling for Sequential Data

At time $$t=1$$ we receive data $$x_1$$, and at each subsequent time $$t>1$$ we receive data $$x_t$$. Let $$\boldsymbol{\mathbf{x}}_t=(x_1, \ldots, x_t)$$. Suppose that at each time $$t$$ our aim is to do inference based on the current posterior distribution $$\gamma_t(\boldsymbol{\mathbf{x}}_t)$$, which we can typically evaluate only up to a normalizing constant via $$\widetilde{\gamma}_t(\boldsymbol{\mathbf{x}}_t)$$. Such inference could, for instance, be to approximate the current posterior expectation of a function $$h(\boldsymbol{\mathbf{x}}_t)$$, i.e. $$\mathbb{E}_{\gamma_t(\boldsymbol{\mathbf{x}}_t)}[h(\boldsymbol{\mathbf{x}}_t)]$$. Importance sampling works as follows:

• Sample $$\boldsymbol{\mathbf{x}}_t^{(i)}$$ from an importance distribution $$q_t(\boldsymbol{\mathbf{x}}_t)$$ for $$i=1, \ldots, N$$.
• Compute the unnormalized importance weights and normalize them, to find the normalized importance weights $\widetilde{w}_t(\boldsymbol{\mathbf{x}}_t^{(i)}) = \frac{\widetilde{\gamma}_t(\boldsymbol{\mathbf{x}}_t^{(i)})}{q_t(\boldsymbol{\mathbf{x}}_t^{(i)})}\qquad \qquad \text{and} \qquad\qquad w_t(\boldsymbol{\mathbf{x}}_t^{(i)}) = \frac{\widetilde{w}_t(\boldsymbol{\mathbf{x}}_t^{(i)})}{\sum_{j=1}^N \widetilde{w}_t(\boldsymbol{\mathbf{x}}_t^{(j)})} \quad \text{for } i=1, \ldots, N$
• Use the importance weights to approximate the expectation. $\mathbb{E}_{\gamma_t(\boldsymbol{\mathbf{x}}_t)}[h(\boldsymbol{\mathbf{x}}_t)] \approx \sum_{i=1}^N w_t(\boldsymbol{\mathbf{x}}_t^{(i)}) h(\boldsymbol{\mathbf{x}}_t^{(i)})$
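As a concrete illustration of the three steps above, the sketch below runs importance sampling on a toy model. The model is an assumption made purely for the example: both the unnormalized target $$\widetilde{\gamma}_t$$ and the importance distribution $$q_t$$ factorize into Gaussians, and the test function $$h$$ sums the components.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy setup (an assumption for illustration, not from the text): the
# unnormalized target gamma_t puts an iid standard-normal factor on each
# component of x_t, and the importance distribution q_t is iid N(0, 2^2).
def log_gamma_tilde(x):  # log unnormalized target; x has shape (N, t)
    return np.sum(-0.5 * x**2, axis=1)

def log_q(x):            # log importance density
    return np.sum(-0.5 * (x / 2.0) ** 2 - np.log(2.0), axis=1)

def importance_sample(t, N):
    # Step 1: draw whole trajectories x_t^(i) ~ q_t, for i = 1..N.
    x = rng.normal(0.0, 2.0, size=(N, t))
    # Step 2: unnormalized log-weights, normalized via log-sum-exp for stability.
    logw = log_gamma_tilde(x) - log_q(x)
    w = np.exp(logw - logw.max())
    w /= w.sum()
    return x, w

# Step 3: self-normalized estimate of E[h(x_t)] with h(x) = sum of components,
# whose true value under this toy target is 0.
x, w = importance_sample(t=5, N=10_000)
estimate = np.sum(w * x.sum(axis=1))
```

With $$N = 10{,}000$$ samples the estimate lands close to the true value of $$0$$; the log-sum-exp normalization is a standard safeguard against underflow in the weights.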

### Sequential Importance Sampling

Sequential Importance Sampling differs from Importance Sampling for sequential data in three main ways.

• The importance distribution is autoregressive: $q_t(\boldsymbol{\mathbf{x}}_t) = \underbrace{q_{t-1}(\boldsymbol{\mathbf{x}}_{t-1})}_{\substack{\text{Importance} \\ \text{ Distribution} \\ \text{at time } t-1}} q_t(x_t \mid \boldsymbol{\mathbf{x}}_{t-1})$
• Samples at time $$t$$ are found recursively from the samples at time $$t-1$$. Previously, at each time $$t$$ we sampled $$\boldsymbol{\mathbf{x}}_t^{(1)}, \ldots, \boldsymbol{\mathbf{x}}_t^{(N)}$$ from $$q_t(\boldsymbol{\mathbf{x}}_t)= q_t(x_1, \ldots, x_t)$$; that is, for each sample $$\boldsymbol{\mathbf{x}}_t^{(i)}$$ we sampled every component $$x_1^{(i)}, \ldots, x_t^{(i)}$$ from time $$1$$ to $$t$$. In Sequential Importance Sampling, instead, at each time step $$t$$ we sample $$x_t^{(1)}, \ldots, x_t^{(N)}$$ from $$q_t(x_t \mid \boldsymbol{\mathbf{x}}_{t-1})$$ and append these values to $$\boldsymbol{\mathbf{x}}_{t-1}^{(1)},\ldots, \boldsymbol{\mathbf{x}}_{t-1}^{(N)}$$. In other words, for each sample $$i$$ we sample only the $$t^{\text{th}}$$ component $$x_t^{(i)}$$ rather than the whole history.
• Importance weights are also computed recursively. \begin{align} \widetilde{w}_t(\boldsymbol{\mathbf{x}}_t^{(i)}) &= \frac{\widetilde{\gamma}_t(\boldsymbol{\mathbf{x}}_t^{(i)})}{q_t(\boldsymbol{\mathbf{x}}_t^{(i)})} \\ &= \frac{\widetilde{\gamma}_t(\boldsymbol{\mathbf{x}}_t^{(i)})}{q_{t-1}(\boldsymbol{\mathbf{x}}_{t-1}^{(i)}) q_t(x_t^{(i)}\mid \boldsymbol{\mathbf{x}}_{t-1}^{(i)})} && \text{Autoregressive form of } q_t\\ &= \frac{\widetilde{\gamma}_t(\boldsymbol{\mathbf{x}}_t^{(i)})}{q_{t-1}(\boldsymbol{\mathbf{x}}_{t-1}^{(i)}) q_t(x_t^{(i)}\mid \boldsymbol{\mathbf{x}}_{t-1}^{(i)})} \cdot \frac{\widetilde{\gamma}_{t-1}(\boldsymbol{\mathbf{x}}_{t-1}^{(i)})}{\widetilde{\gamma}_{t-1}(\boldsymbol{\mathbf{x}}_{t-1}^{(i)})} && \text{Multiplying by } 1\\ &= \frac{\widetilde{\gamma}_{t-1}(\boldsymbol{\mathbf{x}}_{t-1}^{(i)})}{q_{t-1}(\boldsymbol{\mathbf{x}}_{t-1}^{(i)})}\frac{\widetilde{\gamma}_t(\boldsymbol{\mathbf{x}}_t^{(i)})}{\widetilde{\gamma}_{t-1}(\boldsymbol{\mathbf{x}}_{t-1}^{(i)})q_t(x_t^{(i)}\mid \boldsymbol{\mathbf{x}}_{t-1}^{(i)})} && \text{Rearranging terms} \\ &= \widetilde{w}_{t-1}(\boldsymbol{\mathbf{x}}_{t-1}^{(i)}) \cdot \frac{\widetilde{\gamma}_t(\boldsymbol{\mathbf{x}}_t^{(i)})}{\widetilde{\gamma}_{t-1}(\boldsymbol{\mathbf{x}}_{t-1}^{(i)})q_t(x_t^{(i)}\mid \boldsymbol{\mathbf{x}}_{t-1}^{(i)})} && \text{Def of } \widetilde{w}_{t-1}(\boldsymbol{\mathbf{x}}_{t-1}^{(i)}) \end{align}
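The derivation above can be checked numerically. The sketch below (reusing as an assumption a factorized-Gaussian target and proposal, chosen only for illustration) computes the log-weights both directly, as $$\log\widetilde{\gamma}_t - \log q_t$$, and via the recursion, and confirms that the two agree.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical toy densities: iid standard-normal unnormalized target and
# iid N(0, 2^2) importance distribution. Since q factorizes, the conditional
# q_t(x_t | x_{1:t-1}) is just the N(0, 2^2) density of the new component.
def log_gamma_tilde(x):          # log unnormalized target of trajectory x_{1:t}
    return np.sum(-0.5 * x**2)

def log_q_joint(x):              # log q_t(x_{1:t})
    return np.sum(-0.5 * (x / 2.0) ** 2 - np.log(2.0))

def log_q_cond(x_t):             # log q_t(x_t | x_{1:t-1})
    return -0.5 * (x_t / 2.0) ** 2 - np.log(2.0)

T = 6
x = rng.normal(0.0, 2.0, size=T)  # one trajectory x_{1:T} drawn from q

# Direct log-weights: log gamma_t(x_{1:t}) - log q_t(x_{1:t}) for t = 1..T.
log_w_direct = [log_gamma_tilde(x[:t]) - log_q_joint(x[:t]) for t in range(1, T + 1)]

# Recursive log-weights:
# log w_t = log w_{t-1} + log gamma_t - log gamma_{t-1} - log q_t(x_t | x_{1:t-1}).
log_w_rec, log_w = [], 0.0
for t in range(1, T + 1):
    log_w += log_gamma_tilde(x[:t]) - log_gamma_tilde(x[:t - 1]) - log_q_cond(x[t - 1])
    log_w_rec.append(log_w)

assert np.allclose(log_w_direct, log_w_rec)  # the two computations coincide
```

Working in log-space mirrors the derivation line by line: each ratio in the recursion becomes a difference of log-densities.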

In short, to obtain the next set of weights $$\widetilde{w}_t(\boldsymbol{\mathbf{x}}_t^{(i)})$$ we “extend” the posterior in the numerator to include $$x_t^{(i)}$$ by multiplying the previous weight by $$\frac{\widetilde{\gamma}_t(\boldsymbol{\mathbf{x}}_t^{(i)})}{\widetilde{\gamma}_{t-1}(\boldsymbol{\mathbf{x}}_{t-1}^{(i)})}$$, and we “move” the importance distribution in the denominator one step ahead by multiplying it by $$q_t(x_t^{(i)}\mid \boldsymbol{\mathbf{x}}_{t-1}^{(i)})$$.
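Putting the pieces together, here is a sketch of the full SIS loop on the same hypothetical factorized-Gaussian model (an assumption for illustration): at each step only the new component is sampled, and the log-weights are updated recursively. The effective sample size $$1/\sum_i w_i^2$$ is a standard diagnostic of how many samples effectively contribute.

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical toy model again: iid standard-normal unnormalized target,
# factorized N(0, 2^2) importance distribution.
def sis(T, N):
    xs = np.empty((N, T))   # trajectories x_{1:T}^(i), filled one column per step
    logw = np.zeros(N)      # log unnormalized weights, updated recursively
    for t in range(T):
        # Sample only the new component x_t^(i) ~ q_t(x_t | x_{1:t-1})
        # and append it to each trajectory.
        x_new = rng.normal(0.0, 2.0, size=N)
        xs[:, t] = x_new
        # Recursive update: add log gamma_t - log gamma_{t-1} - log q_t(x_t | .).
        log_gamma_ratio = -0.5 * x_new**2
        log_q_cond = -0.5 * (x_new / 2.0) ** 2 - np.log(2.0)
        logw += log_gamma_ratio - log_q_cond
    w = np.exp(logw - logw.max())   # normalize with log-sum-exp for stability
    w /= w.sum()
    return xs, w

xs, w = sis(T=50, N=1000)
ess = 1.0 / np.sum(w ** 2)  # effective sample size: near N is good, near 1 is bad
```

Running this with $$T=50$$ steps, the effective sample size collapses to a handful of samples out of $$N=1000$$: the normalized weights concentrate on very few trajectories, which is exactly the failure mode described next.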

SIS Issue: One issue with Sequential Importance Sampling is that, in practice, as $$t$$ grows all normalized weights tend to $$0$$ except for one weight which tends to $$1$$. The approximation then becomes quite poor, because it is essentially based on a single sample (the one carrying the non-degenerate weight). This effect is known as weight degeneracy. This issue is addressed by Sequential Monte Carlo (SMC).

##### Mauro Camara Escudero