Towards SMC: Sequential Importance Sampling

Sequential Importance Sampling tutorial for Sequential Monte Carlo (SMC)

Review of Importance Sampling for Sequential Data

At time $t=1$ we receive data $x_1$, and at each time $t>1$ we receive data $x_t$. Let $x_{1:t} = (x_1, \ldots, x_t)$. Suppose that at each time $t$ our aim is to do inference based on the current posterior distribution $\gamma_t(x_{1:t})$. Such inference could, for instance, be to approximate the current posterior expectation of a function $h(x_{1:t})$, i.e. $\mathbb{E}_{\gamma_t(x_{1:t})}[h(x_{1:t})]$. Importance sampling works as follows:

  • Sample $x_{1:t}^{(i)}$ from an importance distribution $q_t(x_{1:t})$ for $i = 1, \ldots, N$.
  • Compute the unnormalized importance weights, where $\widetilde{\gamma}_t$ is the unnormalized version of the target $\gamma_t$, and normalize them to find the normalized importance weights:
    $$\widetilde{w}_t(x_{1:t}^{(i)}) = \frac{\widetilde{\gamma}_t(x_{1:t}^{(i)})}{q_t(x_{1:t}^{(i)})} \qquad \text{and} \qquad w_t(x_{1:t}^{(i)}) = \frac{\widetilde{w}_t(x_{1:t}^{(i)})}{\sum_{j=1}^N \widetilde{w}_t(x_{1:t}^{(j)})} \qquad \text{for } i = 1, \ldots, N$$
  • Use the normalized importance weights to approximate the expectation:
    $$\mathbb{E}_{\gamma_t(x_{1:t})}[h(x_{1:t})] \approx \sum_{i=1}^N w_t(x_{1:t}^{(i)})\, h(x_{1:t}^{(i)})$$
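
To make the three steps above concrete, here is a minimal NumPy sketch at a single time $t$. Everything in it is a hypothetical choice for illustration: the unnormalized target $\widetilde{\gamma}$ is a standard normal density stripped of its normalizing constant, the proposal $q$ is a wider zero-mean normal, and $h(x) = x^2$, so the estimate should land close to $1$.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 10_000

# Hypothetical example (an assumption, not from the text): unnormalized
# standard-normal target and an N(0, 2^2) proposal.
def gamma_tilde(x):                    # unnormalized target density
    return np.exp(-0.5 * x**2)

def q_pdf(x):                          # proposal density, N(0, 2^2)
    return np.exp(-0.5 * (x / 2.0)**2) / (2.0 * np.sqrt(2.0 * np.pi))

x = rng.normal(0.0, 2.0, size=N)       # 1. sample from the proposal
w_tilde = gamma_tilde(x) / q_pdf(x)    # 2. unnormalized weights
w = w_tilde / w_tilde.sum()            #    normalized weights

h = lambda z: z**2
print(np.sum(w * h(x)))                # 3. estimate of E[h(x)], close to 1
```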

Sequential Importance Sampling

Sequential Importance Sampling differs from plain importance sampling for sequential data in the following ways.

  • The importance distribution is autoregressive:
    $$q_t(x_{1:t}) = \underbrace{q_{t-1}(x_{1:t-1})}_{\substack{\text{importance distribution} \\ \text{at time } t-1}} \, q_t(x_t \mid x_{1:t-1})$$
  • Samples at time $t$ are found recursively from the samples at time $t-1$. Previously, at each time $t$ we sampled $x_{1:t}^{(1)}, \ldots, x_{1:t}^{(N)}$ from $q_t(x_{1:t}) = q_t(x_1, \ldots, x_t)$; that is, to draw $x_{1:t}^{(i)}$ we sampled every component $x_1^{(i)}, \ldots, x_t^{(i)}$ from time $1$ to $t$. In Sequential Importance Sampling, instead, at each time step $t$ we sample $x_t^{(1)}, \ldots, x_t^{(N)}$ from $q_t(x_t \mid x_{1:t-1})$ and append these values to $x_{1:t-1}^{(1)}, \ldots, x_{1:t-1}^{(N)}$. In other words, for each sample $i$ we draw only the $t$-th component $x_t^{(i)}$ rather than the whole history.
  • Importance weights are also computed recursively:
    $$\begin{aligned}
    \widetilde{w}_t(x_{1:t}^{(i)}) &= \frac{\widetilde{\gamma}_t(x_{1:t}^{(i)})}{q_t(x_{1:t}^{(i)})} \\
    &= \frac{\widetilde{\gamma}_t(x_{1:t}^{(i)})}{q_{t-1}(x_{1:t-1}^{(i)})\, q_t(x_t^{(i)} \mid x_{1:t-1}^{(i)})} && \text{def. of conditional probability} \\
    &= \frac{\widetilde{\gamma}_t(x_{1:t}^{(i)})}{q_{t-1}(x_{1:t-1}^{(i)})\, q_t(x_t^{(i)} \mid x_{1:t-1}^{(i)})} \cdot \frac{\widetilde{\gamma}_{t-1}(x_{1:t-1}^{(i)})}{\widetilde{\gamma}_{t-1}(x_{1:t-1}^{(i)})} && \text{multiplying by } 1 \\
    &= \frac{\widetilde{\gamma}_{t-1}(x_{1:t-1}^{(i)})}{q_{t-1}(x_{1:t-1}^{(i)})} \cdot \frac{\widetilde{\gamma}_t(x_{1:t}^{(i)})}{\widetilde{\gamma}_{t-1}(x_{1:t-1}^{(i)})\, q_t(x_t^{(i)} \mid x_{1:t-1}^{(i)})} && \text{rearranging terms} \\
    &= \widetilde{w}_{t-1}(x_{1:t-1}^{(i)}) \cdot \frac{\widetilde{\gamma}_t(x_{1:t}^{(i)})}{\widetilde{\gamma}_{t-1}(x_{1:t-1}^{(i)})\, q_t(x_t^{(i)} \mid x_{1:t-1}^{(i)})} && \text{def. of } \widetilde{w}_{t-1}(x_{1:t-1}^{(i)})
    \end{aligned}$$

Basically, to obtain the next set of weights $\widetilde{w}_t(x_{1:t}^{(i)})$ we “extend” the posterior in the numerator to include $x_t^{(i)}$ by multiplying the previous weight by $\widetilde{\gamma}_t(x_{1:t}^{(i)}) / \widetilde{\gamma}_{t-1}(x_{1:t-1}^{(i)})$, and we “move” the importance distribution in the denominator one step ahead by multiplying it by $q_t(x_t^{(i)} \mid x_{1:t-1}^{(i)})$.
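
Putting the recursive sampling and the recursive weight update together gives the full SIS loop. The sketch below assumes a hypothetical linear-Gaussian state-space model, with $\widetilde{\gamma}_t(x_{1:t}) \propto \prod_{s=1}^t p(x_s \mid x_{s-1})\, p(y_s \mid x_s)$ and the transition $p(x_t \mid x_{t-1})$ used as the proposal $q_t$, so the incremental ratio $\widetilde{\gamma}_t / (\widetilde{\gamma}_{t-1}\, q_t)$ collapses to the likelihood $p(y_t \mid x_t)$; the model, constants, and names are all assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
N, T = 1_000, 50

def transition(x_prev):            # proposal q_t(x_t | x_{1:t-1}) = p(x_t | x_{t-1})
    return 0.9 * x_prev + rng.normal(0.0, 1.0, size=x_prev.shape)

def log_lik(y_t, x_t):             # log p(y_t | x_t) up to an additive constant,
    return -0.5 * (y_t - x_t)**2   # which cancels when the weights are normalized

# Simulate a synthetic observation sequence y_1, ..., y_T from the same model.
true_x, ys = 0.0, []
for _ in range(T):
    true_x = 0.9 * true_x + rng.normal()
    ys.append(true_x + rng.normal())

paths = np.zeros((N, 0))           # row i stores the sampled history x_{1:t}^{(i)}
log_w = np.zeros(N)                # log of the unnormalized weights
for t in range(T):
    x_prev = paths[:, -1] if t > 0 else np.zeros(N)
    x_t = transition(x_prev)                 # sample only the t-th component
    paths = np.column_stack([paths, x_t])    # append it to each history
    log_w += log_lik(ys[t], x_t)             # recursive weight update

w = np.exp(log_w - log_w.max())    # normalize in log-space for stability
w /= w.sum()
print(np.sum(w * paths[:, -1]))    # e.g. posterior mean of the latest state
```

Accumulating log-weights and subtracting the maximum before exponentiating is a standard numerical-stability choice; it only rescales the unnormalized weights by a constant, which cancels in the normalization.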

SIS Issue: One issue with Sequential Importance Sampling is that, in practice, as $t$ grows all normalized weights tend to $0$ except for one large weight, which tends to $1$. The approximation then becomes quite poor because it is essentially based on a single sample (the one with the non-degenerate weight). This effect is known as weight degeneracy. This issue is solved by Sequential Monte Carlo (SMC).
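
A common way to quantify this degeneracy is the effective sample size, $\mathrm{ESS} = 1 / \sum_{i=1}^N w_t(x_{1:t}^{(i)})^2$, which equals $N$ when the weights are perfectly uniform and approaches $1$ when a single weight dominates; SMC methods typically monitor it to decide when to resample. A minimal sketch:

```python
import numpy as np

def effective_sample_size(w):
    # w: normalized importance weights summing to 1
    return 1.0 / np.sum(np.asarray(w)**2)

print(effective_sample_size(np.full(1000, 1e-3)))          # uniform -> 1000.0
degenerate = np.array([0.999] + [0.001 / 999] * 999)       # one dominant weight
print(effective_sample_size(degenerate))                   # close to 1
```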
