Sequential Monte Carlo Samplers

Problem Set-Up

This is all taken from *Sequential Monte Carlo Samplers* (Del Moral, Doucet and Jasra, 2006). We have a collection of target distributions $\pi_n(x) = \frac{\gamma_n(x)}{Z_n}$, where $\gamma_n$ can be evaluated pointwise but the normalizing constant $Z_n$ is unknown, and we would like to sample from them sequentially in order to approximate expectations.

Importance Sampling

We write target expectations using the Importance Sampling (IS) trick for a proposal density $\eta_n$:

$$\mathbb{E}_{\pi_n}[\phi] = \int_E \phi(x)\,\pi_n(x)\,dx = \frac{1}{Z_n}\int_E \phi(x)\,\gamma_n(x)\,dx = \frac{1}{Z_n}\int_E \phi(x)\,w_n(x)\,\eta_n(x)\,dx$$

$$Z_n = \int_E \gamma_n(x)\,dx = \int_E w_n(x)\,\eta_n(x)\,dx$$

where we have defined the importance weights as

$$w_n(x) = \frac{\gamma_n(x)}{\eta_n(x)}.$$

Therefore importance sampling uses the following particle approximation, built from samples $X_n^{(1:N)} \sim \eta_n$:

$$\eta_n^N(dx) = \frac{1}{N}\sum_{i=1}^N \delta_{X_n^{(i)}}(dx).$$

Plugging this into the two expressions above we obtain

$$\mathbb{E}_{\pi_n}[\phi] = \frac{\mathbb{E}_{\eta_n}[\phi\, w_n]}{\mathbb{E}_{\eta_n}[w_n]} \approx \frac{\mathbb{E}_{\eta_n^N}[\phi\, w_n]}{\mathbb{E}_{\eta_n^N}[w_n]} = \frac{\sum_{i=1}^N \phi(X_n^{(i)})\, w_n(X_n^{(i)})}{\sum_{j=1}^N w_n(X_n^{(j)})} = \sum_{i=1}^N \phi(X_n^{(i)})\, W_n(X_n^{(i)})$$

where we have defined the normalized importance weights

$$W_n(X_n^{(i)}) = \frac{w_n(X_n^{(i)})}{\sum_{j=1}^N w_n(X_n^{(j)})}.$$
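To make this concrete, here is a minimal NumPy sketch of the self-normalized IS estimator above. The specific target and proposal (a standard Gaussian proposal and an unnormalized $N(1, 0.5^2)$ target) are illustrative choices of mine, not anything prescribed by the paper:

```python
import numpy as np

rng = np.random.default_rng(0)

# Unnormalized target gamma(x), proportional to N(1, 0.5^2): we can evaluate it
# pointwise but pretend its normalizing constant Z is unknown.
def gamma(x):
    return np.exp(-0.5 * ((x - 1.0) / 0.5) ** 2)

# Proposal density eta(x) = N(0, 1), which we can both sample and evaluate.
def eta(x):
    return np.exp(-0.5 * x**2) / np.sqrt(2 * np.pi)

N = 100_000
x = rng.standard_normal(N)       # X^(i) ~ eta
w = gamma(x) / eta(x)            # unnormalized weights w(X^(i))
W = w / w.sum()                  # normalized weights W(X^(i))

phi = lambda x: x                # test function, here the identity
print(np.sum(phi(x) * W))        # self-normalized estimate of E_pi[phi], approx. 1.0
```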

Sequential Importance Sampling

In importance sampling, for each different target $\pi_n$ we would sample the particles afresh from $\eta_n$. This assumes that we can sample from $\eta_n \approx \pi_n$ and that we can evaluate $\eta_n$ in order to compute the unnormalized IS weights $w_n(x) = \frac{\gamma_n(x)}{\eta_n(x)}$. In Sequential Importance Sampling (SIS) we start by using IS at $n=1$, but then we build $\eta_n$ from the previous iteration. Specifically, we do this:

  • At time $n=1$ our target is $\pi_1$ and we use an IS proposal $\eta_1$ which we choose to approximate $\pi_1$ well (often we choose $\eta_1 = \pi_1$). This means we sample particles $X_1^{(1:N)} \sim \eta_1$ and then compute the unnormalized IS weights $w_1(X_1^{(i)}) = \frac{\gamma_1(X_1^{(i)})}{\eta_1(X_1^{(i)})}$.
  • Suppose that at time $n-1$ we had a set of particles $\{X_{n-1}^{(i)}\}$ sampled from $\eta_{n-1}$. Our target at time $n$ is $\pi_n$. In order to propose a new set of particles we use a Markov kernel $K_n(x' \mid x)$. We call the resulting distribution $\eta_n$. Notice that this distribution can be found using the property that a kernel $K_n$ operates on measures on the left:
$$\eta_n = \eta_{n-1} K_n \qquad\Longleftrightarrow\qquad \eta_n(x') = \int_E \eta_{n-1}(x)\, K_n(x' \mid x)\,dx.$$
Once we have sampled from the kernel to move the particles forward, $X_n^{(i)} \sim K_n(\cdot \mid X_{n-1}^{(i)})$, we need to compute the weights to account for the discrepancy of sampling from $\eta_n$ rather than $\pi_n$:
$$w_n(X_n^{(i)}) = \frac{\gamma_n(X_n^{(i)})}{\eta_n(X_n^{(i)})}.$$
However, notice we can only do this if we can evaluate $\eta_n$.

In general, we cannot evaluate $\eta_n$ because it is defined in terms of an integral with respect to $x_{1:n-1}$. Indeed, consider $\eta_3$:

$$\eta_3(x_3) = \int_E \eta_2(x_2)\, K_3(x_3 \mid x_2)\,dx_2 = \int_E \left[\int_E \eta_1(x_1)\, K_2(x_2 \mid x_1)\,dx_1\right] K_3(x_3 \mid x_2)\,dx_2 = \int_{E^2} \eta_1(x_1)\, K_3(x_3 \mid x_2)\, K_2(x_2 \mid x_1)\,dx_1\,dx_2$$

In general, $\eta_n$ will be
$$\eta_n(x_n) = \int_{E^{n-1}} \eta_1(x_1) \prod_{k=2}^n K_k(x_k \mid x_{k-1})\,dx_{1:n-1}$$
which is clearly intractable.

SMC sampler

Since the problem is integration with respect to $x_{1:n-1}$, we “open up” the integral and instead consider its integrand. Rather than considering $\eta_n(x_n)$, which proposes a new set of particles $\{X_n^{(i)}\}$ from $\{X_{n-1}^{(i)}\}$, we consider the proposal distribution $\eta_n(x_{1:n})$ defined as
$$\eta_n(x_{1:n}) = \eta_1(x_1)\prod_{k=2}^n K_k(x_k \mid x_{k-1}).$$
We would now like to perform importance sampling. To do this, we need to extend the target from $\pi_n(x_n)$ to $\tilde{\pi}_n(x_{1:n})$. We do this by introducing backward kernels $L_{n-1}(x_{n-1} \mid x_n)$:
$$\tilde{\pi}_n(x_{1:n}) = \frac{1}{Z_n}\tilde{\gamma}_n(x_{1:n}) = \frac{1}{Z_n}\gamma_n(x_n)\prod_{k=1}^{n-1} L_k(x_k \mid x_{k+1}).$$
The IS weights then become
$$
\begin{aligned}
w_n(x_{1:n}) &= \frac{\tilde{\gamma}_n(x_{1:n})}{\eta_n(x_{1:n})} = \frac{\gamma_n(x_n)\prod_{k=1}^{n-1} L_k(x_k \mid x_{k+1})}{\eta_1(x_1)\prod_{k=2}^n K_k(x_k \mid x_{k-1})} \\
&= \frac{\gamma_n(x_n)\, L_{n-1}(x_{n-1} \mid x_n)\, L_{n-2}(x_{n-2} \mid x_{n-1}) \cdots L_1(x_1 \mid x_2)}{\eta_1(x_1)\, K_n(x_n \mid x_{n-1})\, K_{n-1}(x_{n-1} \mid x_{n-2}) \cdots K_2(x_2 \mid x_1)} \\
&= \frac{\gamma_n(x_n)\, L_{n-1}(x_{n-1} \mid x_n)}{\gamma_{n-1}(x_{n-1})\, K_n(x_n \mid x_{n-1})} \cdot \frac{\gamma_{n-1}(x_{n-1})\, L_{n-2}(x_{n-2} \mid x_{n-1}) \cdots L_1(x_1 \mid x_2)}{\eta_1(x_1)\, K_{n-1}(x_{n-1} \mid x_{n-2}) \cdots K_2(x_2 \mid x_1)} \\
&= \frac{\gamma_n(x_n)\, L_{n-1}(x_{n-1} \mid x_n)}{\gamma_{n-1}(x_{n-1})\, K_n(x_n \mid x_{n-1})}\, w_{n-1}(x_{1:n-1}) \\
&= \widetilde{w}_n(x_{n-1}, x_n)\, w_{n-1}(x_{1:n-1})
\end{aligned}
$$

where we have defined the incremental weight as
$$\widetilde{w}_n(x_{n-1}, x_n) = \frac{\gamma_n(x_n)\, L_{n-1}(x_{n-1} \mid x_n)}{\gamma_{n-1}(x_{n-1})\, K_n(x_n \mid x_{n-1})}.$$
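In code, the incremental weight is just a ratio of four pointwise evaluations, usually computed in log space for numerical stability. A minimal sketch, where `log_gamma_prev`, `log_gamma_curr`, `log_K`, and `log_L` are user-supplied callables (hypothetical names of mine, not from the paper):

```python
def log_incremental_weight(x_prev, x_curr,
                           log_gamma_prev, log_gamma_curr,
                           log_K, log_L):
    """log w~_n(x_{n-1}, x_n).

    log_K(x_new, x_old) should return log K_n(x_new | x_old), and
    log_L(x_old, x_new) should return log L_{n-1}(x_old | x_new).
    """
    return (log_gamma_curr(x_curr) + log_L(x_prev, x_curr)
            - log_gamma_prev(x_prev) - log_K(x_curr, x_prev))
```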

To summarize:

  • Importance Sampling at time $n$ targets $\pi_n$. It samples particles $\{X_n^{(i)}\}$ afresh from a proposal $\eta_n$ and computes weights afresh as $w_n(X_n^{(i)}) = \gamma_n(X_n^{(i)})/\eta_n(X_n^{(i)})$. For this to work, however, we need to be able to find proposals $\eta_n \approx \pi_n$, which is in general very hard.
  • Sequential Importance Sampling also targets $\pi_n$ at time $n$. It tries to fix the problem of finding $\eta_n$ by using a local Markov kernel $K_n(\cdot \mid X_{n-1}^{(i)})$ to sample a new set of particles $\{X_n^{(i)}\}$ starting from $\{X_{n-1}^{(i)}\}$. This, at time $n$, gives rise to the following proposal distribution:
$$\eta_n(x_n) = \int_{E^{n-1}} \eta_1(x_1)\prod_{k=2}^n K_k(x_k \mid x_{k-1})\,dx_{1:n-1}.$$
We can now sample from $\eta_n(x_n)$ but we cannot evaluate $\eta_n(\cdot)$ due to the integral with respect to $x_{1:n-1}$. Evaluating $\eta_n$ is needed to compute the IS weights $w_n(X_n^{(i)}) = \gamma_n(X_n^{(i)})/\eta_n(X_n^{(i)})$.
  • SMC samplers overcome the problem of integrating over $x_{1:n-1}$ by working with the integrand directly. The proposal and the target distributions are then $\eta_n(x_{1:n})$ and $\tilde{\pi}_n(x_{1:n})$. Notice the difference with respect to IS and SIS: in IS and SIS we get new particles at each time step, that is, at time step $n-1$ we have $X_{n-1}^{(1:N)}$ and at time step $n$ we have $X_n^{(1:N)}$. In an SMC sampler, instead, we extend the particles at time $n-1$, $X_{1:n-1}^{(1:N)}$, by sampling from a kernel, $X_n^{(i)} \sim K_n(\cdot \mid X_{n-1}^{(i)})$, and then appending the result to the current particles to obtain $X_{1:n}^{(1:N)}$. Since we have appended $X_n^{(1:N)}$ to the previous particles, we need to update the weights, and these are updated using the incremental weight
$$\widetilde{w}_n(x_{n-1}, x_n) = \frac{\gamma_n(x_n)\, L_{n-1}(x_{n-1} \mid x_n)}{\gamma_{n-1}(x_{n-1})\, K_n(x_n \mid x_{n-1})}.$$
Importantly, this requires us to introduce backward kernels, which essentially allow us to approach the problem from an auxiliary-variable perspective.

Since the variance of the weights increases as $\eta_n$ and $\tilde{\pi}_n$ become further apart, one often resamples the particles according to
$$\tilde{\pi}_n^N(dx_{1:n}) = \sum_{i=1}^N W_n^{(i)}\, \delta_{X_{1:n}^{(i)}}(dx_{1:n}).$$
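A minimal sketch of multinomial resampling from this particle approximation (one of several standard schemes; systematic or stratified resampling are common lower-variance alternatives). In practice one usually triggers it only when the effective sample size $1/\sum_i (W_n^{(i)})^2$ falls below a threshold:

```python
import numpy as np

def multinomial_resample(particles, W, rng):
    """Draw N ancestor indices with probabilities W and return the
    resampled particles (a NumPy array) with uniform weights 1/N."""
    N = len(W)
    ancestors = rng.choice(N, size=N, p=W)   # index i drawn with probability W[i]
    return particles[ancestors], np.full(N, 1.0 / N)
```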

The algorithm is summarized below.

[Figure: the SMC sampler algorithm of Del Moral et al. (2006).]
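Putting the pieces together, here is a self-contained sketch of the whole loop on a toy tempering problem. Every concrete choice below is an illustrative assumption of mine, not the paper's prescription: Gaussian tempering targets, a symmetric random-walk $K_n$ with $L_{n-1} = K_n$ (so the kernel densities cancel and the incremental weight reduces to $\gamma_n(x_n)/\gamma_{n-1}(x_{n-1})$), and adaptive multinomial resampling.

```python
import numpy as np

rng = np.random.default_rng(0)

# Tempering from N(0, 3^2) towards N(0, 1):
# log gamma_n(x) = (1 - beta_n) * log N(x; 0, 3^2) + beta_n * log N(x; 0, 1), unnormalized.
betas = np.linspace(0.0, 1.0, 11)

def log_gamma(x, beta):
    return (1 - beta) * (-0.5 * x**2 / 9.0) + beta * (-0.5 * x**2)

N, tau = 5_000, 1.0                       # number of particles, random-walk step size

x = 3.0 * rng.standard_normal(N)          # X_1^(i) ~ eta_1 = pi_1 = N(0, 3^2)
logw = np.zeros(N)                        # eta_1 = pi_1, so the initial weights are 1

for n in range(1, len(betas)):
    # Move: X_n ~ K_n(. | X_{n-1}), a symmetric Gaussian random walk.
    x_new = x + tau * rng.standard_normal(N)
    # Incremental weight with L_{n-1} = K_n symmetric:
    # w~_n = gamma_n(x_n) / gamma_{n-1}(x_{n-1}).
    logw += log_gamma(x_new, betas[n]) - log_gamma(x, betas[n - 1])
    x = x_new
    # Normalize and resample (multinomial) when the effective sample size is low.
    W = np.exp(logw - logw.max()); W /= W.sum()
    if 1.0 / np.sum(W**2) < N / 2:
        x = x[rng.choice(N, size=N, p=W)]
        logw = np.zeros(N)

W = np.exp(logw - logw.max()); W /= W.sum()
print("E_pi[x^2] approx.", np.sum(W * x**2))   # final target is N(0, 1), so approx. 1
```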

A few notes:

  • The particle estimate of the $n$th target is
$$\pi_n^N(dx) = \sum_{i=1}^N W_n(X_{1:n}^{(i)})\, \delta_{X_n^{(i)}}(dx).$$
  • It is helpful to remember the distributions of $X_n^{(i)}$ and $X_{1:n}^{(i)}$ (using sloppy notation):
$$X_n^{(i)} \sim \int_{E^{n-1}} \eta_1(x_1) \prod_{k=2}^n K_k(x_k \mid x_{k-1})\,dx_{1:n-1}, \qquad X_{1:n}^{(i)} \sim \eta_1(x_1) \prod_{k=2}^n K_k(x_k \mid x_{k-1}).$$
  • The optimal backward kernel takes us back to IS on $E$:
$$L_{n-1}^{\mathrm{opt}}(x_{n-1} \mid x_n) = \frac{\eta_{n-1}(x_{n-1})\, K_n(x_n \mid x_{n-1})}{\eta_n(x_n)}.$$
It is difficult to use this kernel as it relies on $\eta_{n-1}$ and $\eta_n$, which are intractable (indeed, this intractability is the reason why we went from SIS to SMC samplers).
  • Sub-optimal kernel: substitute $\pi_{n-1}$ for $\eta_{n-1}$. This is motivated by the fact that, if $\eta_{n-1}$ is a good proposal for $\pi_{n-1}$, then the two should be sufficiently close. First, rewrite the optimal kernel:
$$L_{n-1}^{\mathrm{opt}}(x_{n-1} \mid x_n) = \frac{\eta_{n-1}(x_{n-1})\, K_n(x_n \mid x_{n-1})}{\eta_n(x_n)} = \frac{\eta_{n-1}(x_{n-1})\, K_n(x_n \mid x_{n-1})}{\int_E \eta_{n-1}(x_{n-1}')\, K_n(x_n \mid x_{n-1}')\,dx_{n-1}'}.$$
Now substitute $\pi_{n-1}$ for $\eta_{n-1}$:
$$L_{n-1}(x_{n-1} \mid x_n) = \frac{\pi_{n-1}(x_{n-1})\, K_n(x_n \mid x_{n-1})}{\int_E \pi_{n-1}(x_{n-1}')\, K_n(x_n \mid x_{n-1}')\,dx_{n-1}'}.$$
The incremental weights become
$$\widetilde{w}_n(x_{n-1}, x_n) = \frac{\gamma_n(x_n)\, L_{n-1}(x_{n-1} \mid x_n)}{\gamma_{n-1}(x_{n-1})\, K_n(x_n \mid x_{n-1})} = \frac{\gamma_n(x_n)\, \pi_{n-1}(x_{n-1})}{\gamma_{n-1}(x_{n-1}) \int_E \pi_{n-1}(x_{n-1}')\, K_n(x_n \mid x_{n-1}')\,dx_{n-1}'},$$
where $K_n(x_n \mid x_{n-1})$ has cancelled between numerator and denominator. A commonly used special case is recorded just after this list.
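A further standard simplification from Del Moral et al. (2006): if $K_n$ is an MCMC kernel with invariant distribution $\pi_n$ and the backward kernel is chosen as its time reversal with respect to $\pi_n$, then the incremental weight no longer depends on the proposed point $x_n$, so it can be computed before the particles are even moved:
$$L_{n-1}(x_{n-1} \mid x_n) = \frac{\pi_n(x_{n-1})\, K_n(x_n \mid x_{n-1})}{\pi_n(x_n)} \quad\Longrightarrow\quad \widetilde{w}_n(x_{n-1}, x_n) = \frac{\gamma_n(x_n)\,\pi_n(x_{n-1})\,K_n(x_n \mid x_{n-1})}{\gamma_{n-1}(x_{n-1})\,K_n(x_n \mid x_{n-1})\,\pi_n(x_n)} = \frac{\gamma_n(x_{n-1})}{\gamma_{n-1}(x_{n-1})}.$$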

IS Measure Theory

Suppose $\pi_1 : \mathcal{E} \to [0,1]$ is our target probability distribution and $\eta_1 : \mathcal{E} \to [0,1]$ is our IS proposal probability distribution. Suppose that they admit densities with respect to the Lebesgue measure $dx$ on $(E, \mathcal{E})$:
$$\frac{d\pi_1}{dx}(x) = \frac{\tilde{p}_{\pi_1}(x)}{\int_E \tilde{p}_{\pi_1}(y)\,dy} = p_{\pi_1}(x), \qquad \frac{d\eta_1}{dx}(x) = \frac{\tilde{p}_{\eta_1}(x)}{\int_E \tilde{p}_{\eta_1}(y)\,dy} = p_{\eta_1}(x).$$
Suppose that $\pi_1 \ll \eta_1$; then the Radon-Nikodym derivative of $\pi_1$ with respect to $\eta_1$ exists and, writing $w_1(x) := \tilde{p}_{\pi_1}(x)/\tilde{p}_{\eta_1}(x)$ for the unnormalized weight, it equals
$$\frac{d\pi_1}{d\eta_1}(x) = \frac{w_1(x)}{\int_E w_1(y)\,d\eta_1(y)}.$$
We can then approximate the expectation as follows:
$$
\begin{aligned}
\mathbb{E}_{\pi_1}[\phi] &= \int_E \phi(x)\,d\pi_1(x) = \int_E \phi(x)\,\frac{d\pi_1}{d\eta_1}(x)\,d\eta_1(x) = \int_E \phi(x)\,\frac{d\pi_1}{d\eta_1}(x)\,\frac{d\eta_1}{dx}(x)\,dx \\
&= \frac{\int_E \phi(x)\,w_1(x)\,p_{\eta_1}(x)\,dx}{\int_E w_1(y)\,d\eta_1(y)} = \frac{\int_E \phi(x)\,w_1(x)\,p_{\eta_1}(x)\,dx}{\int_E w_1(y)\,\frac{d\eta_1}{dy}(y)\,dy} = \frac{\int_E \phi(x)\,w_1(x)\,p_{\eta_1}(x)\,dx}{\int_E w_1(y)\,p_{\eta_1}(y)\,dy}.
\end{aligned}
$$

Now suppose you have samples $X_1^{(1:N)} \sim \eta_1$; then we can construct the particle approximation
$$\eta_1^N(dx) = \frac{1}{N}\sum_{i=1}^N \delta_{X_1^{(i)}}(dx).$$
Substituting this into the expression for the expectation we get
$$
\mathbb{E}_{\pi_1}[\phi] = \frac{\mathbb{E}_{\eta_1}[\phi\, w_1]}{\mathbb{E}_{\eta_1}[w_1]} \approx \frac{\mathbb{E}_{\eta_1^N}[\phi\, w_1]}{\mathbb{E}_{\eta_1^N}[w_1]} = \frac{\frac{1}{N}\sum_{i=1}^N \int_E \phi(x)\,w_1(x)\,\delta_{X_1^{(i)}}(dx)}{\frac{1}{N}\sum_{i=1}^N \int_E w_1(x)\,\delta_{X_1^{(i)}}(dx)} = \frac{\sum_{i=1}^N \phi(X_1^{(i)})\,w_1(X_1^{(i)})}{\sum_{j=1}^N w_1(X_1^{(j)})} = \sum_{i=1}^N \phi(X_1^{(i)})\,W_1(X_1^{(i)})
$$
where
$$W_1(X_1^{(i)}) = \frac{w_1(X_1^{(i)})}{\sum_{j=1}^N w_1(X_1^{(j)})}.$$

SIS Proposal

Now let $K_n : E \times \mathcal{E} \to [0,1]$ be a Markov kernel. Such kernels operate on the left with measures:
$$\eta_n = \eta_{n-1} K_n \qquad\Longleftrightarrow\qquad \eta_n(A) = \int_E K_n(x, A)\,d\eta_{n-1}(x) = \int_E d\eta_{n-1}(x) \int_A K_n(x, dy).$$
Denote by $K_{n,x} : \mathcal{E} \to [0,1]$ the probability measure in the second argument of the kernel, i.e. $K_{n,x}(A) = K_n(x, A)$. Suppose that $K_{n,x}$ admits a density with respect to the Lebesgue measure:
$$\frac{dK_{n,x}}{dy}(y) = k_n(y \mid x).$$
Then the new measure can be written as
$$
\begin{aligned}
\eta_n(A) &= \int_E \frac{d\eta_{n-1}}{dx}(x)\left[\int_A K_n(x, dy)\right]dx &&\eta_{n-1} \ll dx \\
&= \int_E \frac{d\eta_{n-1}}{dx}(x)\left[\int_A dK_{n,x}(y)\right]dx &&\text{definition of } K_{n,x} \\
&= \int_E p_{\eta_{n-1}}(x)\left[\int_A \frac{dK_{n,x}}{dy}(y)\,dy\right]dx &&K_{n,x} \ll dy \\
&= \int_E p_{\eta_{n-1}}(x)\left[\int_A k_n(y \mid x)\,dy\right]dx \\
&= \int_E \int_A p_{\eta_{n-1}}(x)\,k_n(y \mid x)\,dy\,dx \\
&= \int_A \left[\int_E p_{\eta_{n-1}}(x)\,k_n(y \mid x)\,dx\right]dy.
\end{aligned}
$$
Then, by definition, the expression in brackets must be the density of $\eta_n$ with respect to $dy$:
$$\frac{d\eta_n}{dy}(y) = \int_E p_{\eta_{n-1}}(x)\,k_n(y \mid x)\,dx.$$
Or, more precisely, identifying $x = x_{n-1}$ and $y = x_n$, we have
$$p_{\eta_n}(x_n) = \frac{d\eta_n}{dx_n}(x_n) = \int_E p_{\eta_{n-1}}(x_{n-1})\,k_n(x_n \mid x_{n-1})\,dx_{n-1}.$$
Indeed we have
$$\eta_n(A) = \int_A d\eta_n(x_n) = \int_A \frac{d\eta_n}{dx_n}(x_n)\,dx_n = \int_A p_{\eta_n}(x_n)\,dx_n.$$
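A quick numerical sanity check of $p_{\eta_n}(x_n) = \int_E p_{\eta_{n-1}}(x_{n-1})\,k_n(x_n \mid x_{n-1})\,dx_{n-1}$, using Gaussians (my choice, purely so the convolution is available in closed form): pushing $\eta_{n-1} = N(0, 1)$ through the random-walk kernel $k_n(y \mid x) = N(y;\, x, \tau^2)$ gives $\eta_n = N(0, 1 + \tau^2)$.

```python
import numpy as np

rng = np.random.default_rng(0)
tau = 0.7

# Sample X ~ eta_{n-1} = N(0, 1), then Y ~ K_n(. | X) = N(X, tau^2).
x = rng.standard_normal(200_000)
y = x + tau * rng.standard_normal(x.size)

# The density of eta_n = eta_{n-1} K_n should be N(0, 1 + tau^2).
print(y.mean(), y.var())   # approx. 0 and approx. 1 + tau^2 = 1.49
```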

SMC Proposal

The proposal in the SMC sampler is not given by $\eta_n = \eta_{n-1} K_n$. Indeed, in SMC we don't simply propose $X_n^{(i)}$: we propose $X_n^{(i)}$ and then append it to $X_{1:n-1}^{(i)}$, which was generated according to $\eta_{n-1}$, and so on.

SMC Steps

  • Step $n=1$: Our target is $\pi_1$ and the proposal $\eta_1$ is given. Sample $X_1^{(1:N)} \sim \eta_1$. The weights are the RN-derivative $w_1 \propto \frac{d\pi_1}{d\eta_1}$.
  • Step $n=2$: Move particles forward using a kernel $K_2 : E \times \mathcal{E} \to [0,1]$. We sample $X_2^{(i)} \sim K_2(X_1^{(i)}, \cdot)$. Marginally, each new particle $X_2^{(i)}$ is distributed as $X_2^{(i)} \sim \eta_2 = \eta_1 K_2$, which can be written as
$$
\begin{aligned}
\eta_2(A) &= \int_A d\eta_2 = \int_A d(\eta_1 K_2) = \int_E K_2(x_1, A)\,d\eta_1(x_1) = \int_E \int_A K_2(x_1, dx_2)\,d\eta_1(x_1) \\
&= \int_E \left[\int_A \frac{dK_2(x_1, \cdot)}{dx_2}(x_2)\,dx_2\right]\frac{d\eta_1}{dx_1}(x_1)\,dx_1 = \int_E \left[\int_A k_2(x_2 \mid x_1)\,dx_2\right] p_{\eta_1}(x_1)\,dx_1 \\
&= \int_E \int_A k_2(x_2 \mid x_1)\,p_{\eta_1}(x_1)\,dx_2\,dx_1 = \int_A \left[\int_E k_2(x_2 \mid x_1)\,p_{\eta_1}(x_1)\,dx_1\right]dx_2
\end{aligned}
$$
from which we can see that the density of $\eta_2$ is $\frac{d\eta_2}{dx_2}(x_2) = \int_E k_2(x_2 \mid x_1)\,p_{\eta_1}(x_1)\,dx_1$. In SMC we then append $X_2^{(i)}$ to $X_1^{(i)}$ to get $X_{1:2}^{(i)}$, so our aim is now to find a measure for it.

Define $\eta_{1:2} := \eta_1 \otimes K_2$ to be the following measure on $E \times E$:
$$\eta_{1:2}(A \times B) := (\eta_1 \otimes K_2)(A \times B) = \int_A K_2(x_1, B)\,d\eta_1(x_1), \qquad A \times B \in \mathcal{E} \otimes \mathcal{E}.$$
Since $\eta_1 \ll dx_1$ and $K_2(x_1, \cdot) \ll dx_2$, by a standard result on product measures we have that $\eta_{1:2} \ll d(x_1 \times x_2)$ and
$$\frac{d\eta_{1:2}}{d(x_1 \times x_2)}(x_{1:2}) = \frac{d\eta_1}{dx_1}(x_1)\,\frac{dK_2(x_1, \cdot)}{dx_2}(x_2) = p_{\eta_1}(x_1)\,k_2(x_2 \mid x_1).$$
If we also define the extended target as the measure
$$\pi_{1:2}(A \times B) := (\pi_2 \otimes L_1)(B \times A) = \int_B L_1(x_2, A)\,d\pi_2(x_2)$$
then by the same arguments as above its Radon-Nikodym derivative will be given by
$$\frac{d\pi_{1:2}}{d(x_1 \times x_2)}(x_{1:2}) = \frac{dL_1(x_2, \cdot)}{dx_1}(x_1)\,\frac{d\pi_2}{dx_2}(x_2) = \ell_1(x_1 \mid x_2)\,p_{\pi_2}(x_2)$$
where $\ell_1$ is the density of $L_1(x_2, \cdot)$ with respect to $dx_1$. As long as $\pi_{1:2} \ll \eta_{1:2}$, the weights are given by
$$
\begin{aligned}
w_{1:2}(x_{1:2}) &= \frac{d\pi_{1:2}}{d\eta_{1:2}}(x_{1:2}) \\
&= \frac{d\pi_{1:2}}{d(x_1 \times x_2)}\,\frac{d(x_1 \times x_2)}{d\eta_{1:2}}(x_{1:2}) &&\text{chain rule} \\
&= \frac{d\pi_{1:2}}{d(x_1 \times x_2)}\left(\frac{d\eta_{1:2}}{d(x_1 \times x_2)}\right)^{-1}(x_{1:2}) &&\text{see below} \\
&= \frac{p_{\pi_2}(x_2)\,\ell_1(x_1 \mid x_2)}{p_{\eta_1}(x_1)\,k_2(x_2 \mid x_1)} \\
&= \frac{p_{\pi_2}(x_2)\,\ell_1(x_1 \mid x_2)}{p_{\pi_1}(x_1)\,k_2(x_2 \mid x_1)}\cdot\frac{p_{\pi_1}(x_1)}{p_{\eta_1}(x_1)} \\
&\propto \frac{\tilde{p}_{\pi_2}(x_2)\,\ell_1(x_1 \mid x_2)}{\tilde{p}_{\pi_1}(x_1)\,k_2(x_2 \mid x_1)}\cdot\frac{\tilde{p}_{\pi_1}(x_1)}{\tilde{p}_{\eta_1}(x_1)}
\end{aligned}
$$
where on line 3 we have used the fact that $\frac{d\mu}{d\nu} = \left(\frac{d\nu}{d\mu}\right)^{-1}$ when both derivatives exist, and on line 5 we have basically multiplied by $1 = \frac{d\pi_1}{d\pi_1} = \frac{d\pi_1}{dx_1}\left(\frac{d\pi_1}{dx_1}\right)^{-1}$.

  • Step $n$: Target is $\pi_{1:n}$. Perform importance sampling using $\eta_{1:n}$, which is built up from $\eta_1$ and the forward kernels,
$$\eta_{1:n} = \eta_1 \otimes K_2 \otimes \cdots \otimes K_n,$$
with density given by the product rule
$$\frac{d\eta_{1:n}}{dx_{1:n}} = p_{\eta_1}(x_1)\prod_{k=2}^n k_k(x_k \mid x_{k-1}).$$
Similarly, the extended target is built up from $\pi_n$ and the backward kernels,
$$\pi_{1:n} = \pi_n \otimes L_{n-1} \otimes \cdots \otimes L_1,$$
with density
$$\frac{d\pi_{1:n}}{dx_{1:n}} = p_{\pi_n}(x_n)\prod_{k=1}^{n-1} \ell_k(x_k \mid x_{k+1}).$$
The weights are then given by
$$w_{1:n} = \frac{d\pi_{1:n}}{d\eta_{1:n}} \propto \frac{\tilde{p}_{\pi_n}(x_n)\,\ell_{n-1}(x_{n-1} \mid x_n)}{\tilde{p}_{\pi_{n-1}}(x_{n-1})\,k_n(x_n \mid x_{n-1})}\, w_{1:n-1}.$$
A small numerical sanity check of this path-space weight follows the list.
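Here is a small sketch of one extend-and-reweight step on path space. All concrete choices are mine for illustration: $\eta_1 = \pi_1 = N(0,1)$, a symmetric random-walk $K_2$, the backward kernel $L_1 = K_2$, and target $\pi_2 = N(1, 0.5^2)$. With symmetric kernels the $\ell_1 / k_2$ ratio cancels, leaving $w_{1:2} \propto \tilde{p}_{\pi_2}(x_2)/\tilde{p}_{\pi_1}(x_1)$:

```python
import numpy as np

rng = np.random.default_rng(0)
N, tau = 200_000, 1.0

log_p1 = lambda x: -0.5 * x**2                    # unnormalized log pi_1 = N(0, 1)
log_p2 = lambda x: -0.5 * ((x - 1.0) / 0.5) ** 2  # unnormalized log pi_2 = N(1, 0.5^2)

x1 = rng.standard_normal(N)               # X_1 ~ eta_1 = pi_1, so w_1 = 1
x2 = x1 + tau * rng.standard_normal(N)    # X_2 ~ K_2(. | X_1), appended to the path

# Path-space weight with L_1 = K_2 symmetric: the kernel densities cancel.
logw = log_p2(x2) - log_p1(x1)
W = np.exp(logw - logw.max()); W /= W.sum()
print(np.sum(W * x2))                     # approx. E_{pi_2}[x] = 1
```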