Measure-Theoretic Probability
Probability Spaces
Intuitively, a probability is a quantitative measure of the likelihood of an event happening. But what is an event? We will consider as events only sets that are well behaved, so that we can assign probabilities to them in a coherent fashion. All events under consideration in a given setup can be grouped together into a collection called a “sigma algebra”.
Sigma Algebra
Let $\Omega$ be a non-empty set. We call $\mathcal{F} \subseteq 2^{\Omega}$ a sigma algebra on $\Omega$ if $\mathcal{F}$ satisfies:
- $\Omega \in \mathcal{F}$.
- It is closed under complementation: if $A \in \mathcal{F}$ then $A^{c} = \Omega \setminus A \in \mathcal{F}$.
- It is closed under countable unions: if $A_1, A_2, \ldots \in \mathcal{F}$ then $\bigcup_{i=1}^{\infty} A_i \in \mathcal{F}$.
For any set $A \in \mathcal{F}$ we say that $A$ is $\mathcal{F}$-measurable. If $\Omega$ is the set of all possible outcomes, we say that $\mathcal{F}$ is the space of possible events or the set of measurable events. In other words, a sigma algebra contains the sets to which we can assign a size or, more precisely in our case, a probability.
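On a finite set the axioms can be checked mechanically, since closure under countable unions reduces to closure under finite unions. The following is a minimal sketch; the helper name `is_sigma_algebra` and the example collections are ours, not standard library functions:

```python
def is_sigma_algebra(omega, F):
    """Check the sigma algebra axioms for a collection F of subsets of a finite omega."""
    F = {frozenset(A) for A in F}
    omega = frozenset(omega)
    if omega not in F:                      # must contain the whole space
        return False
    for A in F:
        if omega - A not in F:              # closed under complementation
            return False
    for A in F:
        for B in F:
            if A | B not in F:              # closed under (finite) unions
                return False
    return True

omega = {"a", "b", "c"}
trivial = [set(), omega]                          # the smallest sigma algebra
generated = [set(), {"a"}, {"b", "c"}, omega]     # generated by the set {"a"}
not_closed = [set(), {"a"}, omega]                # missing the complement {"b", "c"}

print(is_sigma_algebra(omega, trivial))     # True
print(is_sigma_algebra(omega, generated))   # True
print(is_sigma_algebra(omega, not_closed))  # False
```

The third collection fails precisely because the complement of $\{a\}$ is not included, illustrating why closure under complementation is part of the definition.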
We can now proceed to defining how one might go about assigning a probability to a set. The most obvious way to do this is with a function with some additional structure.
Measure
Let $\mathcal{F}$ be a sigma-algebra over the set $\Omega$. A function $\mu : \mathcal{F} \to [0, \infty]$ is called a measure if it satisfies:
- Non-negativity: $\mu(A) \geq 0$ for every $A \in \mathcal{F}$.
- Null empty set: $\mu(\emptyset) = 0$.
- Countable additivity: If $\{A_i\}_{i \in \mathbb{N}}$ is a countable collection of pairwise disjoint sets in $\mathcal{F}$ then
$$\mu\left( \bigcup_{i=1}^{\infty} A_i \right) = \sum_{i=1}^{\infty} \mu(A_i).$$
Hence $\mu$ assigns a “size” to elements of a sigma algebra. In the particular case in which $\mu(\Omega) = 1$ we say that $\mu$ is a probability measure. Indeed, if $\Omega$ is the space of all outcomes and $\mu$ is defined on a sigma algebra $\mathcal{F}$ over $\Omega$, then $\mu(\Omega) = 1$ simply means that the probability of anything happening at all is $1$, which is in accordance with our usual understanding of probability.
Notice that in the definition of measure we allow $\mu$ to take the value $+\infty$. This would cause trouble in the world of probability, so we give a name to those measures that assign finite size to $\Omega$. If $\mu(\Omega) < \infty$ we say that $\mu$ is a finite measure (if, more weakly, $\Omega$ is a countable union of sets of finite measure, $\mu$ is called sigma-finite). Importantly, any non-zero finite measure is essentially equivalent (up to rescaling) to a probability measure since one can simply define
$$\tilde{\mu}(A) = \frac{\mu(A)}{\mu(\Omega)} \quad \forall A \in \mathcal{F},$$
which is obviously a probability measure.
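The rescaling argument is easy to see concretely on a finite set, where a measure is determined by the mass it puts on each outcome. A small sketch (the helper `normalize` is ours):

```python
def normalize(mu, omega):
    """Rescale a finite measure on a finite set into a probability measure.

    mu maps each outcome to its non-negative, finite mass; by countable
    additivity this determines the measure of every subset.
    """
    total = sum(mu[w] for w in omega)       # mu(Omega), assumed finite and > 0
    return {w: mu[w] / total for w in omega}

# A finite measure assigning sizes 2, 3 and 5 to three outcomes.
omega = ["a", "b", "c"]
mu = {"a": 2.0, "b": 3.0, "c": 5.0}
P = normalize(mu, omega)

print(P)                # {'a': 0.2, 'b': 0.3, 'c': 0.5}
print(sum(P.values()))  # ~1.0: a probability measure
```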
The pair $(\Omega, \mathcal{F})$ is called a measurable space, while the triplet $(\Omega, \mathcal{F}, \mu)$ is called a measure space. When $P$ is a probability measure defined on $\mathcal{F}$ we say that $(\Omega, \mathcal{F}, P)$ is a probability space.
We can write the size of a set $A \in \mathcal{F}$ as an integral as follows
$$\mu(A) = \int_{A} \mathrm{d}\mu = \int_{\Omega} \mathbb{1}_{A}(\omega) \,\mathrm{d}\mu(\omega),$$
where $\mathbb{1}_{A}$ is the indicator function of $A$.
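On a finite space this integral reduces to a sum of point masses weighted by the indicator of the set. A minimal sketch, with names of our choosing:

```python
def measure(A, mu):
    """mu(A) as the integral over Omega of the indicator of A with respect to mu.

    On a finite space the integral is a sum over point masses, and the
    indicator 1_A(w) selects exactly the outcomes lying in A.
    """
    indicator = lambda w: 1.0 if w in A else 0.0
    return sum(indicator(w) * mu[w] for w in mu)

mu = {"a": 0.2, "b": 0.3, "c": 0.5}   # point masses on Omega = {a, b, c}
print(measure({"a", "c"}, mu))        # mass of {"a", "c"}: 0.2 + 0.5
```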
Densities and Distributions
Right now we can measure sets, but what we need is to be able to measure sets according to the problem at hand, which will be different every time. This is where the concept of a random variable comes to the rescue. A random variable is essentially a well-behaved function that takes an outcome and maps it to a value that is coherent with our problem. For instance, if our outcomes are either heads or tails, then the random variable might map each outcome to a number, say $X(\text{heads}) = 1$ and $X(\text{tails}) = 0$.
Random Variable
Let $(\Omega, \mathcal{F})$ and $(E, \mathcal{E})$ be two measurable spaces. A function $X : \Omega \to E$ is a random variable if for every $\mathcal{E}$-measurable set $B \in \mathcal{E}$, its pre-image is $\mathcal{F}$-measurable, that is $X^{-1}(B) \in \mathcal{F}$.
The pre-image is then defined as the set of all outcomes that are mapped onto the set $B$ by the random variable
$$X^{-1}(B) = \{\omega \in \Omega \,:\, X(\omega) \in B\}.$$
A random variable is also called a measurable function.
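On finite spaces measurability can again be checked exhaustively. A sketch, with hypothetical helpers `preimage` and `is_measurable` of our own naming:

```python
def preimage(X, B, omega):
    """X^{-1}(B): the set of all outcomes that X maps into B."""
    return {w for w in omega if X[w] in B}

def is_measurable(X, omega, F, E_sets):
    """X is a random variable if every pre-image of a set in the target
    sigma algebra lands in the source sigma algebra F."""
    F = {frozenset(A) for A in F}
    return all(frozenset(preimage(X, B, omega)) in F for B in E_sets)

omega = {"heads", "tails"}
F = [set(), {"heads"}, {"tails"}, omega]   # the power set of Omega
X = {"heads": 1, "tails": 0}               # numeric encoding of the outcomes
E_sets = [set(), {0}, {1}, {0, 1}]         # sigma algebra on the target {0, 1}

print(preimage(X, {1}, omega))             # {'heads'}
print(is_measurable(X, omega, F, E_sets))  # True
```

Since $\mathcal{F}$ here is the full power set, every function out of $\Omega$ is measurable; with a coarser $\mathcal{F}$ the check could fail.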
Distribution
Let $(\Omega, \mathcal{F}, P)$ be a probability space and $(E, \mathcal{E})$ a measurable space. Let $X : \Omega \to E$ be a random variable between $\Omega$ and $E$. The probability distribution of $X$, denoted by $P_X$, is the pushforward measure of $P$ by $X$
$$P_X(B) = P(X^{-1}(B)) \quad \forall B \in \mathcal{E}.$$
To specify which measure is being pushed forward and by which measurable function one can write $P_X = X_{*}P$. The distribution $P_X$ is a probability measure on $(E, \mathcal{E})$.
It is important to notice how this distribution is constructed. Originally we were given a probability measure $P$ that could only assign probabilities to sets in $\mathcal{F}$. We have then introduced a random variable to be able to map outcomes $\omega \in \Omega$ to values $X(\omega) \in E$ according to our application. Because of this, we would now like to give a probability to sets $B \in \mathcal{E}$, since these are the events we are actually interested in. To give a measure to the set $B$ we basically find its pre-image $X^{-1}(B)$ using the random variable (which by definition is $\mathcal{F}$-measurable) and then we assign to $B$ the same probability that $X^{-1}(B)$ has according to $P$
$$P_X(B) = P(X^{-1}(B)).$$
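This pull-back-then-measure construction can be spelled out for the coin toss; the helper name `pushforward` below is ours:

```python
def pushforward(P, X, B):
    """P_X(B) = P(X^{-1}(B)): pull B back through X, then measure with P."""
    pre = {w for w in P if X[w] in B}   # the pre-image X^{-1}(B)
    return sum(P[w] for w in pre)

P = {"heads": 0.5, "tails": 0.5}   # fair coin: probability measure on Omega
X = {"heads": 1, "tails": 0}       # random variable into E = {0, 1}

print(pushforward(P, X, {1}))      # 0.5  = P({heads})
print(pushforward(P, X, {0, 1}))   # 1.0  = P(Omega)
```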
We have seen that the distribution of a random variable is a measure. However, in practice we often don't work directly with distributions but with probability density functions. The best way to develop an intuition about the relationship between a distribution and a probability density function is through a physical analogy. One could interpret a “base” reference measure as somehow giving us an indication of volume, while we could interpret the probability measure $P_X$ as giving an indication of “mass”. In this analogy, the probability density function is basically giving us the density of the random variable with respect to the reference measure, and mass is recovered by integrating density over volume.
Density
Let $(\Omega, \mathcal{F}, P)$ be a probability space and $(E, \mathcal{E})$ be a measurable space. Let $X : \Omega \to E$ be a random variable between $\Omega$ and $E$ with probability distribution $P_X$ as the pushforward measure $X_{*}P$. Suppose also that $\lambda$ is another measure, called the reference measure, on $(E, \mathcal{E})$ and that both $P_X$ and $\lambda$ are sigma-finite measures. Finally, suppose that the probability distribution $P_X$ is absolutely continuous with respect to the reference measure $\lambda$, written $P_X \ll \lambda$. Then the probability density function $p_X$ is defined as the Radon-Nikodym derivative of $P_X$ with respect to $\lambda$
$$p_X = \frac{\mathrm{d}P_X}{\mathrm{d}\lambda}.$$
Then we can use the probability density function to write the measure of a set $B \in \mathcal{E}$ according to $P_X$ based on $\lambda$
$$P_X(B) = \int_{B} p_X \,\mathrm{d}\lambda.$$
Most often $\lambda$ will be the Lebesgue measure and therefore we can write the probability of an event $B$ according to the distribution $P_X$ in the more familiar form
$$P_X(B) = \int_{B} p_X(x) \,\mathrm{d}x.$$
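As a concrete check, one can recover a probability from a density by numerically integrating it against the Lebesgue measure. Below we use the standard normal density and compare a simple midpoint rule against the closed form in terms of `math.erf` (the helper names are ours):

```python
import math

def normal_pdf(x):
    """Density of the standard normal with respect to the Lebesgue measure."""
    return math.exp(-x * x / 2.0) / math.sqrt(2.0 * math.pi)

def prob(a, b, pdf, n=100_000):
    """P_X([a, b]) as the integral of the density over [a, b] (midpoint rule)."""
    h = (b - a) / n
    return sum(pdf(a + (i + 0.5) * h) for i in range(n)) * h

# Probability that X lies in [-1, 1], compared against the closed form.
approx = prob(-1.0, 1.0, normal_pdf)
exact = math.erf(1.0 / math.sqrt(2.0))
print(approx, exact)   # both ~ 0.6827, the familiar one-sigma probability
```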
Expectations
Expected Value
Let $(\Omega, \mathcal{F}, P)$ be a probability space and $(E, \mathcal{E})$ be a measurable space. Let $X : \Omega \to E$ be a random variable. The expected value of $X$ is
$$\mathbb{E}[X] = \int_{\Omega} X(\omega) \,\mathrm{d}P(\omega).$$
Basically it is the average value of the measurable function $X$ according to $P$, the measure defined on the “input” measurable space $(\Omega, \mathcal{F})$.
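On a finite sample space the integral over $\Omega$ is just a probability-weighted sum of the values of $X$. For the fair coin encoded as $0/1$:

```python
# Expected value of X as an integral over the sample space Omega:
# on a finite space, E[X] = sum over outcomes of X(w) * P({w}).
P = {"heads": 0.5, "tails": 0.5}   # fair coin
X = {"heads": 1.0, "tails": 0.0}   # numeric encoding of the outcomes

EX = sum(X[w] * P[w] for w in P)
print(EX)   # 0.5
```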
The following is an extremely useful trick that is often used to manipulate integrals in measure theory and will allow us to write the expected value in a much more familiar form.
Change of Variables
Let $(\Omega, \mathcal{F}, P)$ be a probability space and $(E, \mathcal{E})$ be a measurable space. Let $X : \Omega \to E$ be a random variable with distribution $P_X$. Let $g : E \to \mathbb{R}$ be a measurable function. Then
$$\mathbb{E}_P[g \circ X] = \int_{\Omega} g(X(\omega)) \,\mathrm{d}P(\omega) = \int_{E} g(x) \,\mathrm{d}P_X(x).$$
If you compare the integral over $\Omega$ with the integral over $E$, you can notice that the change of variables formula essentially amounts to a change of probability spaces between $(\Omega, \mathcal{F}, P)$ and $(E, \mathcal{E}, P_X)$.
Notice also that the change of variables formula can be written as
$$\mathbb{E}_P[g \circ X] = \mathbb{E}_{P_X}[g].$$
Indeed the change of variables formula allows us to write down the definition of expected value in a more useful way. First of all, let $g = \mathrm{id}_E$ be the identity function on $E$, mapping $x \mapsto x$ for every $x \in E$. Then the function $g \circ X$ is measurable, so it is a random variable, and indeed it is identical to the random variable $X$ as $(g \circ X)(\omega) = X(\omega)$. We can now write the expectation as follows
$$\mathbb{E}[X] = \int_{\Omega} X(\omega) \,\mathrm{d}P(\omega) = \int_{E} x \,\mathrm{d}P_X(x) = \int_{E} x \, p_X(x) \,\mathrm{d}\lambda(x).$$
Here we have assumed that we have access to a measure $\lambda$ on $(E, \mathcal{E})$ such that $P_X \ll \lambda$, i.e. $P_X$ is absolutely continuous with respect to $\lambda$. Alternatively we can also say that $\lambda$ dominates $P_X$, or that $\lambda$ is a dominating measure for $P_X$.
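The change of variables formula can be verified numerically on a finite space: integrating $g \circ X$ over $\Omega$ against $P$ and integrating $g$ over $E$ against the pushforward $P_X$ give the same number. A sketch (all names are ours):

```python
P = {"a": 0.2, "b": 0.3, "c": 0.5}   # probability measure on Omega
X = {"a": 1, "b": 2, "c": 2}         # random variable into E = {1, 2}
g = lambda x: x * x                  # a measurable function on E

# Left-hand side: integral over Omega of g(X(w)) dP(w).
lhs = sum(g(X[w]) * P[w] for w in P)

# Right-hand side: build the pushforward P_X, then integrate g against it.
P_X = {}
for w in P:
    P_X[X[w]] = P_X.get(X[w], 0.0) + P[w]
rhs = sum(g(x) * P_X[x] for x in P_X)

print(lhs, rhs)   # the two sums agree (both equal 1*0.2 + 4*0.8)
```

Note that the two outcomes "b" and "c" collapse onto the same value $2 \in E$, so $P_X(\{2\}) = 0.8$; the formula holds regardless.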
One can generalize the result above to any measurable function $g$. This result is sometimes called the Law of the Unconscious Statistician.
Expectation of a Function