Theory

Measure-Theoretic Probability

Probability Spaces

Intuitively, a probability is a quantitative measure of the likelihood of an event happening. But what is an event? We will consider as events only sets that are well behaved, so that we can assign probabilities to them in a coherent fashion. All events under consideration in a given setup can be grouped together into a set called a “sigma algebra”.

Sigma Algebra

Let $X$ be a non-empty set. We call $\mathcal{X} \subseteq \mathcal{P}(X)$ a sigma algebra on $X$ if it satisfies:

  • $X \in \mathcal{X}$.
  • It is closed under complementation: $A \in \mathcal{X} \implies A^c := X \setminus A \in \mathcal{X}$.
  • It is closed under countable unions: $A_1, A_2, \ldots \in \mathcal{X} \implies \bigcup_i A_i \in \mathcal{X}$.

For any set $A \in \mathcal{X}$ we say that $A$ is $\mathcal{X}$-measurable. If $X$ is the set of all possible outcomes, we say that $\mathcal{X}$ is the space of possible events or the set of measurable events. In other words, a sigma algebra contains the sets to which we can assign a size or, more precisely in our case, a probability.
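To make the definition concrete, here is a minimal Python sketch (the two-coin-flip outcome space and the helper name power_set are illustrative assumptions, not from the text) that builds the largest possible sigma algebra on a small finite set, namely its power set, and checks the three defining properties. On a finite set, closure under countable unions reduces to closure under finite unions.

```python
from itertools import chain, combinations

# A small outcome space: two coin flips.
X = frozenset({"HH", "HT", "TH", "TT"})

def power_set(s):
    """All subsets of s, i.e. the largest possible sigma algebra on s."""
    items = list(s)
    return {frozenset(c) for c in chain.from_iterable(
        combinations(items, r) for r in range(len(items) + 1))}

sigma_algebra = power_set(X)

# Check the three defining properties on this finite example.
assert X in sigma_algebra                                  # contains X itself
assert all(X - A in sigma_algebra for A in sigma_algebra)  # closed under complements
assert all(A | B in sigma_algebra                          # closed under (finite) unions
           for A in sigma_algebra for B in sigma_algebra)
print(f"{len(sigma_algebra)} measurable sets")             # 2^4 = 16
```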

We can now proceed to define how one might go about assigning a probability to a set. The most obvious way to do this is with a function endowed with some additional structure.

Measure

Let $\mathcal{X}$ be a sigma algebra over the set $X$. A function $\mu: \mathcal{X} \to \mathbb{R} \cup \{\pm\infty\}$ is called a measure if it satisfies:

  • Non-negativity: $\mu(E) \ge 0$ for every $E \in \mathcal{X}$.
  • Null empty set: $\mu(\emptyset) = 0$.
  • Countable additivity: if $\{E_i\}$ is a countable collection of pairwise disjoint sets in $\mathcal{X}$, then $\mu\left(\bigcup_i E_i\right) = \sum_i \mu(E_i)$.

Hence $\mu$ assigns a “size” to elements of a sigma algebra. In the particular case in which $\mu(X) = 1$ we say that $\mu$ is a probability measure. Indeed, if $X$ is the space of all outcomes and $\mu$ is defined on $(X, \mathcal{X})$, then $\mu(X) = 1$ simply means that the probability of anything happening at all is $1$, which is in accordance with our usual understanding of probability.

Notice that in the definition of a measure we allow $\mu$ to take the values $\pm\infty$. This would cause trouble in the world of probability, so we give a name to those measures that assign finite size to $X$. If $\mu(X) < \infty$ then we say that $\mu$ is a finite measure (the weaker notion of a sigma-finite measure only requires $X$ to be a countable union of sets of finite measure). Importantly, any non-zero finite measure $\mu$ is essentially equivalent (up to rescaling) to a probability measure, since one can simply define $\mathring{\mu} = \frac{1}{\mu(X)}\mu$, which is obviously a probability measure.

The pair $(X, \mathcal{X})$ is called a measurable space, while the triplet $(X, \mathcal{X}, \mu)$ is called a measure space. When $\mu$ is a probability measure defined on $(X, \mathcal{X})$, we say that $(X, \mathcal{X}, \mu)$ is a probability space.

We can write the size of a set $A \in \mathcal{X}$ as an integral as follows $$\mu(A) = \int_A d\mu$$
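As an illustration, the Python sketch below (the fair-die example and the names weights, mu and mu_prob are illustrative choices) defines a finite measure on the power set of $\{1, \ldots, 6\}$ and rescales it into a probability measure, exactly as in $\mathring{\mu} = \frac{1}{\mu(X)}\mu$.

```python
from fractions import Fraction

# Outcomes of one fair six-sided die; taking the power set as sigma algebra,
# a measure is determined by its values on the singletons.
weights = {face: Fraction(1) for face in range(1, 7)}   # an (unnormalised) finite measure

def mu(A):
    """Measure of a subset A of {1,...,6}: additivity reduces to a finite sum."""
    return sum(weights[x] for x in A)

total = mu(range(1, 7))                                  # mu(X) = 6 < infinity

def mu_prob(A):
    """The rescaled measure mu / mu(X): a probability measure."""
    return mu(A) / total

assert mu_prob(range(1, 7)) == 1                         # probability of anything at all is 1
assert mu_prob(set()) == 0                               # null empty set
print(mu_prob({2, 4, 6}))                                # 1/2
```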

Densities and Distributions

Right now we can measure sets, but what we need is to be able to measure sets according to the problem at hand, which will be different every time. This is where the concept of a random variable comes to the rescue. A random variable is essentially a well-behaved function that takes an outcome $x \in X$ and maps it to a value $y \in Y$ that is meaningful for our problem. For instance, if our outcomes are either heads or tails, $X = \{\text{Head}, \text{Tail}\}$, then a random variable might map each outcome to a number, e.g. $Z(\text{Head}) = 1$ and $Z(\text{Tail}) = 0$.

Random Variable

Let $(X, \mathcal{X})$ and $(Y, \mathcal{Y})$ be two measurable spaces. A function $Z: X \to Y$ is a random variable if for every $\mathcal{Y}$-measurable set $B \in \mathcal{Y}$, its pre-image $Z^{-1}(B)$ is $\mathcal{X}$-measurable, that is $$Z^{-1}(B) \in \mathcal{X} \quad \forall B \in \mathcal{Y}$$ The pre-image is defined as the set of all outcomes $x \in X$ that are mapped onto the set $B \in \mathcal{Y}$ by the random variable $Z$ $$Z^{-1}(B) := \{x \in X : Z(x) \in B\}$$ A random variable is also called a measurable function.
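For instance, the short Python sketch below (the two-coin-flip space and the name preimage are illustrative assumptions) computes pre-images $Z^{-1}(B)$ for the random variable counting the number of heads; taking the power set as the sigma algebra on $X$, every pre-image is measurable, so $Z$ is a random variable.

```python
# Two coin flips mapped to the number of heads: Z: X -> {0, 1, 2}.
X = {"HH", "HT", "TH", "TT"}
Z = {"HH": 2, "HT": 1, "TH": 1, "TT": 0}

def preimage(B):
    """Z^{-1}(B): all outcomes x in X whose value Z(x) lands in B."""
    return {x for x in X if Z[x] in B}

print(preimage({2}))       # {'HH'}
print(preimage({1, 2}))    # {'HH', 'HT', 'TH'} -- "at least one head"
# With the power-set sigma algebra on X every pre-image is measurable.
```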

Distribution

Let $(X, \mathcal{X}, \mu)$ be a probability space and $(Y, \mathcal{Y})$ a measurable space. Let $Z: X \to Y$ be a random variable between $(X, \mathcal{X})$ and $(Y, \mathcal{Y})$. The probability distribution of $Z$, denoted by $P_Z$, is the pushforward measure of $\mu$ by $Z$ $$P_Z = \mu \circ Z^{-1}, \qquad P_Z: \mathcal{Y} \to [0, 1]$$ To specify which measure is being pushed forward and by which measurable function, one can write $P_Z = Z_*\mu$. The distribution $P_Z$ is a probability measure on $(Y, \mathcal{Y})$.

It is important to notice how this distribution is constructed. Originally we were given a probability measure $\mu$ that could only assign probabilities to sets $A \in \mathcal{X}$. We then introduced a random variable $Z$ to map outcomes to values according to our application. Because of this, we would now like to assign probabilities to sets $B \in \mathcal{Y}$, since these are the events we are actually interested in. To measure a set $B \in \mathcal{Y}$ we simply find its pre-image under the random variable (which by definition is $\mathcal{X}$-measurable) and assign to $B$ the same probability that $Z^{-1}(B)$ has according to $\mu$ $$P_Z(B) = \mu(Z^{-1}(B))$$
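The Python sketch below (again using the illustrative two-coin example, with a fair base measure $\mu$ assumed for concreteness) implements exactly this construction: $P_Z(B)$ is computed by measuring the pre-image $Z^{-1}(B)$ with $\mu$.

```python
from fractions import Fraction

# Fair two-coin-flip experiment: base probability measure mu on X.
X = {"HH", "HT", "TH", "TT"}
mu_point = {x: Fraction(1, 4) for x in X}
Z = {"HH": 2, "HT": 1, "TH": 1, "TT": 0}        # number of heads

def mu(A):
    return sum(mu_point[x] for x in A)

def pushforward(B):
    """P_Z(B) = mu(Z^{-1}(B)): measure B by measuring its pre-image."""
    return mu({x for x in X if Z[x] in B})

print(pushforward({0}))        # 1/4
print(pushforward({1}))        # 1/2
print(pushforward({0, 1, 2}))  # 1 -- P_Z is itself a probability measure on Y
```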

We have seen that the distribution of a random variable is a measure. However, in practice we often don’t work directly with distributions but with probability density functions. The best way to develop an intuition about the relationship between a distribution and a probability density function is through a physical analogy. One could interpret the “base” probability measure $\mu$ as somehow giving us an indication of volume, while the probability measure $P_Z$ gives an indication of “mass”. In this analogy, the probability density function is basically giving us the density of the random variable $Z$ with respect to these two measures $$\text{density} = \frac{\text{mass}}{\text{volume}}$$

Density

Let $(X, \mathcal{X}, \mu)$ be a probability space and $(Y, \mathcal{Y})$ be a measurable space. Let $Z: X \to Y$ be a random variable between $(X, \mathcal{X})$ and $(Y, \mathcal{Y})$ with probability distribution $P_Z$ given by the pushforward measure $P_Z = Z_*\mu : \mathcal{Y} \to [0, 1]$. Suppose also that $\lambda$ is another measure, called the reference measure, on $(Y, \mathcal{Y})$ and that both $P_Z$ and $\lambda$ are sigma-finite measures. Finally, suppose that the probability distribution is absolutely continuous with respect to the reference measure, $P_Z \ll \lambda$. Then the probability density function of $Z$ is defined as the Radon-Nikodym derivative of $P_Z$ with respect to $\lambda$ $$p_z = \frac{dP_Z}{d\lambda}$$ We can then use the probability density function to write the measure of a set $A \in \mathcal{Y}$ according to $P_Z$ $$P_Z(A) = \int_A dP_Z = \int_A \frac{dP_Z}{d\lambda} \, d\lambda = \int_A p_z \, d\lambda$$ Most often $\lambda$ will be the Lebesgue measure (here denoted $dz$) and therefore we can write the probability of an event $A$ according to the distribution $P_Z$ in the more familiar form $$P_Z(A) = \int_A p_z(z) \, dz$$
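As a concrete, if simplistic, illustration, the Python sketch below (the two-coin example and the choice of the counting measure as reference measure $\lambda$ are assumptions made only for this example) computes a density as a ratio of point masses and checks that integrating the density against $\lambda$ recovers $P_Z$.

```python
from fractions import Fraction

# Continuing the two-coin example: P_Z lives on Y = {0, 1, 2}.
P_Z = {0: Fraction(1, 4), 1: Fraction(1, 2), 2: Fraction(1, 4)}

# Take the reference measure lambda to be the counting measure on Y,
# i.e. lambda({y}) = 1 for each point y.
lam = {y: Fraction(1) for y in P_Z}

# On a countable space the Radon-Nikodym derivative is just the ratio of
# point masses: p_z(y) = P_Z({y}) / lambda({y}).
p_z = {y: P_Z[y] / lam[y] for y in P_Z}

def P(A):
    """P_Z(A) recovered as the integral of the density against lambda."""
    return sum(p_z[y] * lam[y] for y in A)

assert P({0, 1, 2}) == 1
print(P({1, 2}))   # 3/4 -- probability of at least one head
```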

Expectations

Expected Value

Let $(X, \mathcal{X}, \mu)$ be a probability space and $(Y, \mathcal{Y})$ be a measurable space. Let $Z: X \to Y$ be a random variable. The expected value of $Z$ is $$\mathbb{E}[Z] = \int_X Z \, d\mu = \int_X Z(x) \, d\mu(x)$$ Basically, it is the average value of the measurable function $Z$ according to $\mu$, the measure defined on the “input” measurable space $(X, \mathcal{X})$.
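On a finite probability space the integral reduces to a weighted sum over outcomes, as in the following minimal sketch (the same illustrative two-coin example as before).

```python
from fractions import Fraction

# E[Z] = integral of Z over X with respect to mu; on a finite space this is
# simply a weighted sum over the outcomes.
X = {"HH", "HT", "TH", "TT"}
mu_point = {x: Fraction(1, 4) for x in X}
Z = {"HH": 2, "HT": 1, "TH": 1, "TT": 0}

E_Z = sum(Z[x] * mu_point[x] for x in X)
print(E_Z)   # 1 -- on average one head in two fair flips
```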

The following is an extremely useful trick that is often used to manipulate integrals in measure theory, and it will allow us to write the expected value in a much more familiar form.

Change of Variables

Let $(X, \mathcal{X}, \mu)$ be a probability space and $(Y, \mathcal{Y})$ be a measurable space. Let $Z: X \to Y$ be a random variable with distribution $P_Z = Z_*\mu$. Let $f: Y \to [0, +\infty]$ be a measurable function. Then $$\int_Y f \, dP_Z = \int_Y f \, d(Z_*\mu) = \int_{Z^{-1}(Y)} f \circ Z \, d\mu = \int_X f \circ Z \, d\mu$$ If you look at the second and at the last integral, you can notice that the change of variables formula essentially amounts to a change of probability spaces between $(Y, \mathcal{Y}, Z_*\mu)$ and $(X, \mathcal{X}, \mu)$.

Notice also that the change of variables formula can be written as $$\mathbb{E}_{Z_*\mu}[f] = \mathbb{E}_\mu[f \circ Z]$$ Indeed, the change of variables formula allows us to write down the definition of the expected value in a more useful way. First of all, let $\text{Id}: Y \to Y$ be the identity function on $Y$, mapping $\text{Id}(y) = y$ for every $y \in Y$. Then the composition $\text{Id} \circ Z : X \to Y$ is measurable, so it is a random variable, and it is identical to the random variable $Z$. We can now write the expectation as follows $$\begin{aligned} \mathbb{E}[Z] &= \mathbb{E}[\text{Id} \circ Z] && \text{Id} \circ Z = Z \\ &= \int_X \text{Id} \circ Z \, d\mu && \text{Def. of expectation} \\ &= \int_{Z(X)} \text{Id} \, d(Z_*\mu) && \text{Change of variables} \\ &= \int_Y \text{Id}(y) \, dP_Z(y) && \text{Def. of } P_Z \\ &= \int_Y y \, \frac{dP_Z}{dy} \, dy && \text{Def. of Id and } P_Z \ll dy \\ &= \int_Y y \, p_z(y) \, dy && \text{Radon-Nikodym} \end{aligned}$$

Here we have assumed that we have access to a measure $dy$ on $(Y, \mathcal{Y})$ such that $P_Z \ll dy$, i.e. $P_Z$ is absolutely continuous with respect to $dy$. Alternatively, we say that $dy$ dominates, or is a dominating measure for, $P_Z$.
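The identity $\mathbb{E}_{Z_*\mu}[f] = \mathbb{E}_\mu[f \circ Z]$ can also be checked numerically; the sketch below (the illustrative two-coin example again, with $f(y) = y^2$ chosen arbitrarily) computes both sides and verifies that they agree.

```python
from fractions import Fraction

# Check E_{Z_*mu}[f] = E_mu[f o Z] on the two-coin example, with f(y) = y**2.
X = {"HH", "HT", "TH", "TT"}
mu_point = {x: Fraction(1, 4) for x in X}
Z = {"HH": 2, "HT": 1, "TH": 1, "TT": 0}
P_Z = {0: Fraction(1, 4), 1: Fraction(1, 2), 2: Fraction(1, 4)}  # pushforward of mu by Z

f = lambda y: y ** 2

lhs = sum(f(y) * P_Z[y] for y in P_Z)          # integrate f against the pushforward
rhs = sum(f(Z[x]) * mu_point[x] for x in X)    # integrate f o Z against mu
assert lhs == rhs
print(lhs)   # 3/2
```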

One can generalize the result above to any measurable function. This result is sometimes called the Law of the Unconscious Statistician.

Expectation of a Function

Let $(X, \mathcal{X}, \mu)$ be a probability space, $(Y, \mathcal{Y})$ a measurable space, $Z: X \to Y$ a random variable with distribution $P_Z = Z_*\mu$, and $f: Y \to [0, +\infty]$ a measurable function. Then $$\mathbb{E}[f(Z)] = \int_X f \circ Z \, d\mu = \int_Y f \, dP_Z = \int_Y f(y) \, p_z(y) \, dy$$ where the last equality holds whenever $P_Z$ admits a density $p_z$ with respect to $dy$.
