Summary

By now, we hope that you have become familiar with our slogan: "probability is a measure of the size of a set." Let us summarize:

Probability = a probability law \(\Pb\). You can also view it as the value returned by \(\Pb\).
Measure = a ruler, a scale, a stopwatch, or another measuring device. It is a tool that tells you how large or small a set is. The measure has to be compatible with the set. If a set is finite, then the measure can be a counter. If a set is a continuous interval, then the measure can be the length of the interval.
Size = the weight of the set relative to the sample space. Measuring the size is done by using a weighting function. Think of a fair coin versus a biased coin. The former has a uniform weight, whereas the latter has a nonuniform weight.
Set = an event. An event is a subset of the sample space. A probability law \(\Pb\) always maps a set to a number. This is different from a typical function that maps a number to another number.
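To make the four ingredients concrete, here is a minimal Python sketch. The helper name `make_law` and the specific coin weights are our own illustrative assumptions, not something defined earlier in the chapter. The point is only that a probability law takes a set (an event) as its input and returns a number, and that the weighting function determines what that number is.

```python
from fractions import Fraction

def make_law(weights):
    """Build a probability law P from a weighting function (hypothetical helper).

    `weights` maps each outcome in a finite sample space to a nonnegative weight.
    The returned P takes an event (a set of outcomes) and returns a number.
    """
    total = sum(weights.values())

    def P(event):
        # Measure the event by adding up the weights of its outcomes,
        # then normalize by the weight of the whole sample space.
        return sum(weights[x] for x in event) / total

    return P

# Fair coin: uniform weights. Biased coin: nonuniform weights (assumed values).
fair = make_law({"H": Fraction(1), "T": Fraction(1)})
biased = make_law({"H": Fraction(3), "T": Fraction(1)})

print(fair({"H"}))      # 1/2
print(biased({"H"}))    # 3/4
print(fair({"H", "T"})) # 1, the whole sample space
```

Note that `P` is called with a set such as `{"H"}`, never with a bare outcome. This mirrors the statement that a probability law maps a set to a number.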

If you understand what this slogan means, you will understand why probability can be applied to discrete events, continuous events, events in \(n\)-D spaces, etc. You will also understand the notion of measure zero and the notion of almost sure. These concepts lie at the foundation of modern data science, in particular, theoretical machine learning.

The second half of this chapter discusses the concept of conditional probability. Conditional probability is a metaconcept that can be applied to any measure you use. The motivation of conditional probability is to restrict the probability to a subevent happening in the sample space. If \(B\) has happened and \(\Pb[B] > 0\), the probability for \(A\) to also happen is \(\Pb[A\cap B]/\Pb[B]\). If two events do not influence each other, in the sense that \(\Pb[A\cap B] = \Pb[A]\,\Pb[B]\), then we say that \(A\) and \(B\) are independent. Bayes' theorem lets us switch between \(A\) given \(B\) and \(B\) given \(A\). Finally, the law of total probability gives us a way to decompose events into subevents.
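The four ideas in this paragraph can be checked numerically on a small example. The sketch below assumes a fair six-sided die and two events of our own choosing; it is an illustration rather than an example taken from the chapter. Exact fractions are used so that the identities hold without rounding error.

```python
from fractions import Fraction

omega = {1, 2, 3, 4, 5, 6}   # sample space of a fair die (assumed example)

def P(event):
    """Counting measure, normalized by the size of the sample space."""
    return Fraction(len(event & omega), len(omega))

A = {2, 4, 6}        # the roll is even
B = {1, 2, 3, 4}     # the roll is at most 4
B_c = omega - B      # complement of B

# Conditional probability: P[A | B] = P[A and B] / P[B]
p_A_given_B = P(A & B) / P(B)                      # 1/2

# Independence: P[A and B] == P[A] * P[B]
independent = P(A & B) == P(A) * P(B)              # True for this choice of A, B

# Bayes' theorem: P[B | A] = P[A | B] * P[B] / P[A]
p_B_given_A_bayes = p_A_given_B * P(B) / P(A)      # 2/3

# Law of total probability: P[A] = P[A | B] P[B] + P[A | B^c] P[B^c]
p_A_given_B_c = P(A & B_c) / P(B_c)
p_A_total = p_A_given_B * P(B) + p_A_given_B_c * P(B_c)

print(p_A_given_B, independent, p_B_given_A_bayes, p_A_total)
```

Here \(\Pb[A\,|\,B] = \Pb[A] = 1/2\), which is another way of seeing that knowing \(B\) tells us nothing about \(A\) for this particular pair of events.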

We end this chapter by mentioning a few terms related to conditional probabilities that will become useful later. Let us use the tennis tournament as an example:

\(\Pb[W\,|\,A]\) = conditional probability = Given that you played against player \(A\), what is the probability that you will win?
\(\Pb[A]\) = prior probability = Without even entering the game, what is the chance that you will face player \(A\)?
\(\Pb[A\,|\,W]\) = posterior probability = After you have won the game, what is the probability that you actually played against \(A\)?
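As a rough illustration of how these three quantities fit together, the sketch below computes the posterior from the prior and the conditional probability. All numbers are hypothetical, and the opponents \(B\) and \(C\) are assumed here only so that the priors sum to one; they should not be read as the values used in the tournament example.

```python
from fractions import Fraction

# Hypothetical priors: chance of facing each opponent before the game.
prior = {"A": Fraction(1, 2), "B": Fraction(1, 4), "C": Fraction(1, 4)}

# Hypothetical conditional probabilities: chance of winning against each opponent.
win_given = {"A": Fraction(3, 10), "B": Fraction(2, 5), "C": Fraction(1, 2)}

# Law of total probability: P[W] = sum over opponents of P[W | opponent] * P[opponent]
p_win = sum(win_given[k] * prior[k] for k in prior)

# Bayes' theorem: P[opponent | W] = P[W | opponent] * P[opponent] / P[W]
posterior = {k: win_given[k] * prior[k] / p_win for k in prior}

print(p_win)            # 3/8
print(posterior["A"])   # 2/5
```

With these assumed numbers, winning lowers the probability that the opponent was \(A\) from the prior \(1/2\) to the posterior \(2/5\), because \(A\) is the opponent you are least likely to beat.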

In many practical engineering problems, the question of interest is often the last one. That is, supposing that you have observed something, what is the most likely cause of that event? For example, supposing we have observed this particular dataset, what is the best Gaussian model that would fit the dataset? Questions like these require some analysis of conditional probability, prior probability, and posterior probability.