Towards SMC: Sequential Importance Sampling

Sequential Importance Sampling tutorial for Sequential Monte Carlo (SMC)

Review of Importance Sampling for Sequential Data

At time $t=1$ we receive data $x_1$, and at each time $t>1$ we receive data $x_t$. Let $x_{1:t} = (x_1, \ldots, x_t)$. Suppose that at each time $t$ our aim is to do inference based on the current posterior distribution $\gamma_t(x_{1:t})$. Such inference could, for instance, be to approximate the current posterior expectation of a function $h(x_{1:t})$, i.e. $\mathbb{E}_{\gamma_t(x_{1:t})}[h(x_{1:t})]$. Importance sampling works as follows:

  • Sample $x_{1:t}^{(i)}$ from an importance distribution $q_t(x_{1:t})$ for $i = 1, \ldots, N$.
  • Compute the unnormalized importance weights, where $\widetilde{\gamma}_t$ is the unnormalized version of the target $\gamma_t$, and normalize them to find the normalized importance weights:
    $$\widetilde{w}_t(x_{1:t}^{(i)}) = \frac{\widetilde{\gamma}_t(x_{1:t}^{(i)})}{q_t(x_{1:t}^{(i)})} \qquad \text{and} \qquad w_t(x_{1:t}^{(i)}) = \frac{\widetilde{w}_t(x_{1:t}^{(i)})}{\sum_{j=1}^N \widetilde{w}_t(x_{1:t}^{(j)})} \qquad \text{for } i = 1, \ldots, N$$
  • Use the normalized importance weights to approximate the expectation:
    $$\mathbb{E}_{\gamma_t(x_{1:t})}[h(x_{1:t})] \approx \sum_{i=1}^N w_t(x_{1:t}^{(i)})\, h(x_{1:t}^{(i)})$$
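
To make the three steps above concrete, here is a minimal NumPy sketch at a single time $t$. Everything in it is a hypothetical choice for illustration: the unnormalized target $\widetilde{\gamma}$ is a standard normal density stripped of its normalizing constant, the proposal $q$ is a wider zero-mean normal, and $h(x) = x^2$, so the estimate should land close to $1$.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 10_000

# Hypothetical example (an assumption, not from the text): unnormalized
# standard-normal target and an N(0, 2^2) proposal.
def gamma_tilde(x):                    # unnormalized target density
    return np.exp(-0.5 * x**2)

def q_pdf(x):                          # proposal density, N(0, 2^2)
    return np.exp(-0.5 * (x / 2.0)**2) / (2.0 * np.sqrt(2.0 * np.pi))

x = rng.normal(0.0, 2.0, size=N)       # 1. sample from the proposal
w_tilde = gamma_tilde(x) / q_pdf(x)    # 2. unnormalized weights
w = w_tilde / w_tilde.sum()            #    normalized weights

h = lambda z: z**2
print(np.sum(w * h(x)))                # 3. estimate of E[h(x)], close to 1
```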

Sequential Importance Sampling

Sequential Importance Sampling differs from plain importance sampling for sequential data in the following ways.

  • The importance distribution is autoregressive:
    $$q_t(x_{1:t}) = \underbrace{q_{t-1}(x_{1:t-1})}_{\substack{\text{importance distribution} \\ \text{at time } t-1}} \, q_t(x_t \mid x_{1:t-1})$$
  • Samples at time $t$ are found recursively from the samples at time $t-1$. Previously, at each time $t$ we sampled $x_{1:t}^{(1)}, \ldots, x_{1:t}^{(N)}$ from $q_t(x_{1:t}) = q_t(x_1, \ldots, x_t)$; that is, to draw $x_{1:t}^{(i)}$ we sampled every component $x_1^{(i)}, \ldots, x_t^{(i)}$ from time $1$ to $t$. In Sequential Importance Sampling, instead, at each time step $t$ we sample $x_t^{(1)}, \ldots, x_t^{(N)}$ from $q_t(x_t \mid x_{1:t-1})$ and append these values to $x_{1:t-1}^{(1)}, \ldots, x_{1:t-1}^{(N)}$. In other words, for each sample $i$ we draw only the $t$-th component $x_t^{(i)}$ rather than the whole history.
  • Importance weights are also computed recursively:
    $$\begin{aligned}
    \widetilde{w}_t(x_{1:t}^{(i)}) &= \frac{\widetilde{\gamma}_t(x_{1:t}^{(i)})}{q_t(x_{1:t}^{(i)})} \\
    &= \frac{\widetilde{\gamma}_t(x_{1:t}^{(i)})}{q_{t-1}(x_{1:t-1}^{(i)})\, q_t(x_t^{(i)} \mid x_{1:t-1}^{(i)})} && \text{def. of conditional probability} \\
    &= \frac{\widetilde{\gamma}_t(x_{1:t}^{(i)})}{q_{t-1}(x_{1:t-1}^{(i)})\, q_t(x_t^{(i)} \mid x_{1:t-1}^{(i)})} \cdot \frac{\widetilde{\gamma}_{t-1}(x_{1:t-1}^{(i)})}{\widetilde{\gamma}_{t-1}(x_{1:t-1}^{(i)})} && \text{multiplying by } 1 \\
    &= \frac{\widetilde{\gamma}_{t-1}(x_{1:t-1}^{(i)})}{q_{t-1}(x_{1:t-1}^{(i)})} \cdot \frac{\widetilde{\gamma}_t(x_{1:t}^{(i)})}{\widetilde{\gamma}_{t-1}(x_{1:t-1}^{(i)})\, q_t(x_t^{(i)} \mid x_{1:t-1}^{(i)})} && \text{rearranging terms} \\
    &= \widetilde{w}_{t-1}(x_{1:t-1}^{(i)}) \cdot \frac{\widetilde{\gamma}_t(x_{1:t}^{(i)})}{\widetilde{\gamma}_{t-1}(x_{1:t-1}^{(i)})\, q_t(x_t^{(i)} \mid x_{1:t-1}^{(i)})} && \text{def. of } \widetilde{w}_{t-1}(x_{1:t-1}^{(i)})
    \end{aligned}$$

Basically, to obtain the next set of weights $\widetilde{w}_t(x_{1:t}^{(i)})$ we “extend” the posterior in the numerator to include $x_t^{(i)}$ by multiplying the previous weight by $\widetilde{\gamma}_t(x_{1:t}^{(i)}) / \widetilde{\gamma}_{t-1}(x_{1:t-1}^{(i)})$, and we “move” the importance distribution in the denominator one step ahead by multiplying it by $q_t(x_t^{(i)} \mid x_{1:t-1}^{(i)})$.
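
Putting the recursive sampling and the recursive weight update together gives the full SIS loop. The sketch below assumes a hypothetical linear-Gaussian state-space model, with $\widetilde{\gamma}_t(x_{1:t}) \propto \prod_{s=1}^t p(x_s \mid x_{s-1})\, p(y_s \mid x_s)$ and the transition $p(x_t \mid x_{t-1})$ used as the proposal $q_t$, so the incremental ratio $\widetilde{\gamma}_t / (\widetilde{\gamma}_{t-1}\, q_t)$ collapses to the likelihood $p(y_t \mid x_t)$; the model, constants, and names are all assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
N, T = 1_000, 50

def transition(x_prev):            # proposal q_t(x_t | x_{1:t-1}) = p(x_t | x_{t-1})
    return 0.9 * x_prev + rng.normal(0.0, 1.0, size=x_prev.shape)

def log_lik(y_t, x_t):             # log p(y_t | x_t) up to an additive constant,
    return -0.5 * (y_t - x_t)**2   # which cancels when the weights are normalized

# Simulate a synthetic observation sequence y_1, ..., y_T from the same model.
true_x, ys = 0.0, []
for _ in range(T):
    true_x = 0.9 * true_x + rng.normal()
    ys.append(true_x + rng.normal())

paths = np.zeros((N, 0))           # row i stores the sampled history x_{1:t}^{(i)}
log_w = np.zeros(N)                # log of the unnormalized weights
for t in range(T):
    x_prev = paths[:, -1] if t > 0 else np.zeros(N)
    x_t = transition(x_prev)                 # sample only the t-th component
    paths = np.column_stack([paths, x_t])    # append it to each history
    log_w += log_lik(ys[t], x_t)             # recursive weight update

w = np.exp(log_w - log_w.max())    # normalize in log-space for stability
w /= w.sum()
print(np.sum(w * paths[:, -1]))    # e.g. posterior mean of the latest state
```

Accumulating log-weights and subtracting the maximum before exponentiating is a standard numerical-stability choice; it only rescales the unnormalized weights by a constant, which cancels in the normalization.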

SIS Issue: One issue with Sequential Importance Sampling is that, in practice, as $t$ grows all normalized weights tend to $0$ except for one large weight, which tends to $1$. The approximation then becomes quite poor because it is essentially based on a single sample (the one with the non-degenerate weight). This effect is known as weight degeneracy. This issue is solved by Sequential Monte Carlo (SMC).
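
A common way to quantify this degeneracy is the effective sample size, $\mathrm{ESS} = 1 / \sum_{i=1}^N w_t(x_{1:t}^{(i)})^2$, which equals $N$ when the weights are perfectly uniform and approaches $1$ when a single weight dominates; SMC methods typically monitor it to decide when to resample. A minimal sketch:

```python
import numpy as np

def effective_sample_size(w):
    # w: normalized importance weights summing to 1
    return 1.0 / np.sum(np.asarray(w)**2)

print(effective_sample_size(np.full(1000, 1e-3)))          # uniform -> 1000.0
degenerate = np.array([0.999] + [0.001 / 999] * 999)       # one dominant weight
print(effective_sample_size(degenerate))                   # close to 1
```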
