Conditional PMF and PDF
Whenever we have a pair of random variables \(X\) and \(Y\) that are correlated, we can define their conditional distributions, which quantify the probability of \(X = x\) given \(Y = y\). In this section, we discuss the concepts of conditional PMF and PDF.
5.3.1 Conditional PMF
We start by defining the conditional PMF for a pair of discrete random variables.
Let \(X\) and \(Y\) be two discrete random variables. The conditional PMF of \(X\) given \(Y\) is $$p_{X|Y}(x|y) = \frac{p_{X,Y}(x,y)}{p_Y(y)}.$$
The simplest way to understand this is to view \(p_{X|Y}(x|y)\) as \(\Pb[X = x \,|\, Y = y]\). That is, given that \(Y = y\), what is the probability for \(X = x\)? To see why this perspective makes sense, let us recall the definition of a conditional probability: $$p_{X|Y}(x|y) = \frac{p_{X,Y}(x,y)}{p_Y(y)} = \frac{\Pb[X = x \,\cap\, Y = y]}{\Pb[Y = y]} = \Pb[X = x \,|\, Y = y].$$
As we can see, the last two equalities are essentially the definitions of conditional probability and the joint PMF.
How should we understand the notation \(p_{X|Y}(x|y)\)? Is it a one-variable function in \(x\) or a two-variable function in \((x,y)\)? What does \(p_{X|Y}(x|y)\) tell us? To answer these questions, let us first try to understand the randomness exhibited in a conditional PMF. In \(p_{X|Y}(x|y)\), the random variable \(Y\) is fixed to a specific value \(Y = y\). Therefore, there is nothing random about \(Y\). All the possibilities of \(Y\) have already been taken care of by the denominator \(p_Y(y)\). Only the variable \(x\) in \(p_{X|Y}(x|y)\) has randomness. What do we mean by “fixed at a value \(Y = y\)”? Consider the following example.
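Suppose there are two coins, each taking the value 0 or 1. Let $$X = \text{the sum of the values of the two coins}, \qquad Y = \text{the value of the first coin}.$$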
Clearly, \(X\) has three states: 0, 1, 2, and \(Y\) has two states: either 0 or 1. When we say \(p_{X|Y}(x|1)\), we refer to the probability mass function of \(X\) when fixing \(Y = 1\). If we do not impose this condition, the probability mass of \(X\) is simple: $$p_X(x) = \left[\frac{1}{4}, \frac{1}{2}, \frac{1}{4}\right].$$ However, if we include the conditioning, then $$p_{X|Y}(x|1) = \frac{p_{X,Y}(x,1)}{p_Y(1)} = \frac{\left[0, \frac{1}{4}, \frac{1}{4}\right]}{\frac{1}{2}} = \left[0, \frac{1}{2}, \frac{1}{2}\right].$$
To put this in plain words, when \(Y = 1\), there is no way for \(X\) to take the state 0. The chance for \(X\) to take the state 1 is \(1/2\) because only \((1,0)\) can give \(X = 1\) when the first coin is fixed at 1. The chance for \(X\) to take the state 2 is \(1/2\) because it has to be \((1,1)\) in order to give \(X = 2\). Therefore, when we say “conditioned on \(Y = 1\)”, we mean that we limit our observations to cases where \(Y = 1\). Since \(Y\) is already fixed at \(Y = 1\), there is nothing random about \(Y\). The only variable is \(X\). This example is illustrated in Figure 5.9.
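The restriction to cases where \(Y = 1\) can be reproduced by brute-force enumeration. The following is a minimal sketch, assuming two fair and independent coins with values 0 and 1; the function and variable names are illustrative and not part of the text.

```python
from fractions import Fraction
from itertools import product

# All four outcomes (first coin, second coin); assumed equally likely.
outcomes = list(product([0, 1], repeat=2))
prob = Fraction(1, 4)

def conditional_pmf_x_given_y(y):
    """Return [p(0|y), p(1|y), p(2|y)] where X = sum of coins, Y = first coin."""
    kept = [(a, b) for (a, b) in outcomes if a == y]   # keep only cases with Y = y
    p_y = prob * len(kept)                             # this is p_Y(y)
    return [sum(prob for (a, b) in kept if a + b == x) / p_y for x in range(3)]

# Unconditional PMF of X, for comparison.
marginal_x = [sum(prob for (a, b) in outcomes if a + b == x) for x in range(3)]
cond_given_1 = conditional_pmf_x_given_y(1)

print("p_X(x)       =", marginal_x)
print("p_{X|Y}(x|1) =", cond_given_1)
```

Dividing by \(p_Y(y)\) inside the function is what renormalizes the surviving outcomes so that they again sum to one.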

Since \(Y\) is already fixed at a particular value \(Y = y\), \(p_{X|Y}(x|y)\) is a probability mass function of \(x\) (we want to emphasize again that it is \(x\) and not \(y\)). So \(p_{X|Y}(x|y)\) is a one-variable function in \(x\). It is not the same as the usual PMF \(p_X(x)\). \(p_{X|Y}(x|y)\) is conditioned on \(Y = y\). For example, \(p_{X|Y}(x|1)\) is the PMF of \(X\) restricted to the condition that \(Y = 1\). In fact, it follows that $$\sum_{x \in \Omega_X} p_{X|Y}(x|y) = \sum_{x \in \Omega_X} \frac{p_{X,Y}(x,y)}{p_Y(y)} = \frac{\sum_{x \in \Omega_X} p_{X,Y}(x,y)}{p_Y(y)} = \frac{p_Y(y)}{p_Y(y)} = 1,$$
and this tells us that \(p_{X|Y}(x|y)\) is a legitimate probability mass function of \(X\). If we sum over the \(y\)'s instead, then we will hit a bump, because in general $$\sum_{y \in \Omega_Y} p_{X|Y}(x|y) = \sum_{y \in \Omega_Y} \frac{p_{X,Y}(x,y)}{p_Y(y)} \neq 1.$$ Therefore, while \(p_{X|Y}(x|y)\) is a legitimate probability mass function of \(X\), it is not a probability mass function of \(Y\).
Consider a joint PMF given in the following table. Find the conditional PMF \(p_{X|Y}(x|1)\) and the marginal PMF \(p_X(x)\).
| | \(Y = 1\) | \(Y = 2\) | \(Y = 3\) | \(Y = 4\) |
| \(X = 1\) | \(\frac{1}{20}\) | \(\frac{1}{20}\) | \(\frac{1}{20}\) | \(\frac{0}{20}\) |
| \(X = 2\) | \(\frac{1}{20}\) | \(\frac{2}{20}\) | \(\frac{3}{20}\) | \(\frac{1}{20}\) |
| \(X = 3\) | \(\frac{1}{20}\) | \(\frac{2}{20}\) | \(\frac{3}{20}\) | \(\frac{1}{20}\) |
| \(X = 4\) | \(\frac{0}{20}\) | \(\frac{1}{20}\) | \(\frac{1}{20}\) | \(\frac{1}{20}\) |
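To find the marginal PMF, we sum over all the \(y\)'s for every \(x\): $$\begin{aligned} x = 1: \quad & \sum_{y=1}^{4} p_{X,Y}(1,y) = \frac{1}{20}+\frac{1}{20}+\frac{1}{20}+\frac{0}{20} = \frac{3}{20}, \\ x = 2: \quad & \sum_{y=1}^{4} p_{X,Y}(2,y) = \frac{1}{20}+\frac{2}{20}+\frac{3}{20}+\frac{1}{20} = \frac{7}{20}, \\ x = 3: \quad & \sum_{y=1}^{4} p_{X,Y}(3,y) = \frac{1}{20}+\frac{2}{20}+\frac{3}{20}+\frac{1}{20} = \frac{7}{20}, \\ x = 4: \quad & \sum_{y=1}^{4} p_{X,Y}(4,y) = \frac{0}{20}+\frac{1}{20}+\frac{1}{20}+\frac{1}{20} = \frac{3}{20}. \end{aligned}$$
Hence, the marginal PMF is $$p_X(x) = \left[\frac{3}{20}, \frac{7}{20}, \frac{7}{20}, \frac{3}{20}\right].$$
The conditional PMF \(p_{X|Y}(x|1)\) is $$p_{X|Y}(x|1) = \frac{p_{X,Y}(x,1)}{p_Y(1)} = \frac{\left[\frac{1}{20}, \frac{1}{20}, \frac{1}{20}, \frac{0}{20}\right]}{\frac{3}{20}} = \left[\frac{1}{3}, \frac{1}{3}, \frac{1}{3}, 0\right].$$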
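The table computations can be checked mechanically with exact rational arithmetic. The sketch below stores the joint PMF from the table as a nested list (rows are \(x = 1,\ldots,4\), columns are \(y = 1,\ldots,4\)); the names `joint`, `p_x`, `p_y` and `cond_x_given_y` are illustrative choices. It also confirms the earlier observation that summing a conditional PMF over \(x\) gives 1, whereas summing it over \(y\) generally does not.

```python
from fractions import Fraction

# Joint PMF from the table in units of 1/20: rows x = 1..4, columns y = 1..4.
counts = [
    [1, 1, 1, 0],
    [1, 2, 3, 1],
    [1, 2, 3, 1],
    [0, 1, 1, 1],
]
joint = [[Fraction(c, 20) for c in row] for row in counts]

p_x = [sum(row) for row in joint]          # marginal of X: sum over y
p_y = [sum(col) for col in zip(*joint)]    # marginal of Y: sum over x

def cond_x_given_y(y):
    """Conditional PMF of X given Y = y, with y counted from 1."""
    return [row[y - 1] / p_y[y - 1] for row in joint]

cond_1 = cond_x_given_y(1)
sum_over_x = sum(cond_1)                                          # sums to 1
sum_over_y_at_x1 = sum(joint[0][j] / p_y[j] for j in range(4))    # not 1 here

print("p_X(x)       =", p_x)
print("p_{X|Y}(x|1) =", cond_1)
print("sum over x   =", sum_over_x)
print("sum over y at x = 1 =", sum_over_y_at_x1)
```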
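Consider two random variables \(X\) and \(Y\) defined as follows. $$Y = \begin{cases} 10^2, & \text{with prob. } 5/6, \\ 10^4, & \text{with prob. } 1/6, \end{cases} \qquad X \,|\, Y = \begin{cases} 10^{-4}Y, & \text{with prob. } 1/2, \\ 10^{-3}Y, & \text{with prob. } 1/3, \\ 10^{-2}Y, & \text{with prob. } 1/6. \end{cases}$$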
Find \(p_{X|Y}(x|y)\), \(p_X(x)\) and \(p_{X,Y}(x,y)\).
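Since \(Y\) takes two different states, we can enumerate \(Y = 10^2\) and \(Y = 10^4\). This gives us $$p_{X|Y}(x|10^2) = \begin{cases} 1/2, & x = 0.01, \\ 1/3, & x = 0.1, \\ 1/6, & x = 1, \end{cases} \qquad p_{X|Y}(x|10^4) = \begin{cases} 1/2, & x = 1, \\ 1/3, & x = 10, \\ 1/6, & x = 100. \end{cases}$$
The joint PMF \(p_{X,Y}(x,y)\) is $$p_{X,Y}(x,10^2) = p_{X|Y}(x|10^2)\,p_Y(10^2) = \begin{cases} \frac{1}{2}\cdot\frac{5}{6} = \frac{5}{12}, & x = 0.01, \\ \frac{1}{3}\cdot\frac{5}{6} = \frac{5}{18}, & x = 0.1, \\ \frac{1}{6}\cdot\frac{5}{6} = \frac{5}{36}, & x = 1, \end{cases} \qquad p_{X,Y}(x,10^4) = p_{X|Y}(x|10^4)\,p_Y(10^4) = \begin{cases} \frac{1}{2}\cdot\frac{1}{6} = \frac{1}{12}, & x = 1, \\ \frac{1}{3}\cdot\frac{1}{6} = \frac{1}{18}, & x = 10, \\ \frac{1}{6}\cdot\frac{1}{6} = \frac{1}{36}, & x = 100. \end{cases}$$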
Therefore, the joint PMF is given by the following table.
| | \(x = 0.01\) | \(x = 0.1\) | \(x = 1\) | \(x = 10\) | \(x = 100\) |
| \(y = 10^4\) | 0 | 0 | \(\frac{1}{12}\) | \(\frac{1}{18}\) | \(\frac{1}{36}\) |
| \(y = 10^2\) | \(\frac{5}{12}\) | \(\frac{5}{18}\) | \(\frac{5}{36}\) | 0 | 0 |
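The marginal PMF \(p_X(x)\) is thus $$p_X(x) = \sum_{y} p_{X,Y}(x,y) = \left[\frac{5}{12}, \; \frac{5}{18}, \; \frac{5}{36}+\frac{1}{12}, \; \frac{1}{18}, \; \frac{1}{36}\right] = \left[\frac{5}{12}, \; \frac{5}{18}, \; \frac{2}{9}, \; \frac{1}{18}, \; \frac{1}{36}\right],$$ where the entries correspond to \(x = 0.01, 0.1, 1, 10, 100\).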
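As a cross-check, one can start from the joint table and recover each ingredient, then confirm the factorization \(p_{X,Y}(x,y) = p_{X|Y}(x|y)\,p_Y(y)\). The following sketch uses exact fractions; the dictionary layout keyed by the value of \(y\) is an illustrative choice, not something prescribed by the text.

```python
from fractions import Fraction as F

# Joint PMF from the table; columns correspond to x = 0.01, 0.1, 1, 10, 100.
x_values = [F(1, 100), F(1, 10), F(1), F(10), F(100)]
joint = {
    10**2: [F(5, 12), F(5, 18), F(5, 36), F(0), F(0)],
    10**4: [F(0), F(0), F(1, 12), F(1, 18), F(1, 36)],
}

# Marginal of Y: sum each row over x.
p_y = {y: sum(row) for y, row in joint.items()}

# Conditional PMF of X given Y = y: divide each row by p_Y(y).
cond = {y: [p / p_y[y] for p in row] for y, row in joint.items()}

# Marginal of X: sum each column over y.
p_x = [sum(col) for col in zip(*joint.values())]

# Multiplying the conditional by the marginal must give back the joint.
rebuilt = {y: [c * p_y[y] for c in cond[y]] for y in joint}

print("p_Y        =", p_y)
print("p_X|Y(.|1e2) =", cond[10**2])
print("p_X        =", p_x)
```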
In the previous two examples, what is the probability \(\Pb[X \in A \,|\, Y = y]\) or the probability \(\Pb[X \in A]\) for some events \(A\)? The answers are given by the following theorem.
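Let \(X\) and \(Y\) be two discrete random variables. Let \(A\) be an event. Then $$\Pb[X \in A \,|\, Y = y] = \sum_{x \in A} p_{X|Y}(x|y),$$
and $$\Pb[X \in A] = \sum_{x \in A} \sum_{y \in \Omega_Y} p_{X|Y}(x|y)\,p_Y(y) = \sum_{y \in \Omega_Y} \Pb[X \in A \,|\, Y = y]\,p_Y(y).$$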
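Proof. The first statement is based on the fact that if \(A\) contains a finite number of elements, then \(\Pb[X \in A]\) is equivalent to the sum \(\sum_{x \in A}\Pb[X = x]\). Thus, $$\Pb[X \in A \,|\, Y = y] = \frac{\Pb[X \in A \,\cap\, Y = y]}{\Pb[Y = y]} = \frac{\sum_{x \in A} \Pb[X = x \,\cap\, Y = y]}{\Pb[Y = y]} = \sum_{x \in A} \frac{p_{X,Y}(x,y)}{p_Y(y)} = \sum_{x \in A} p_{X|Y}(x|y).$$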
The second statement holds because the inner summation \(\sum_{y \in \Omega_Y} p_{X|Y}(x|y)p_Y(y)\) is just the marginal PMF \(p_X(x)\). Thus the outer summation yields the probability. ■
Let us follow up on Example 5.17. What is the probability \(\Pb[X > 2 \,|\, Y = 1]\)? What is the probability \(\Pb[X > 2]\)?
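Since the problem asks about the conditional probability, we know that it can be computed by using the conditional PMF. This gives us $$\Pb[X > 2 \,|\, Y = 1] = \sum_{x > 2} p_{X|Y}(x|1) = p_{X|Y}(3|1) + p_{X|Y}(4|1) = \frac{1}{3} + 0 = \frac{1}{3}.$$
The other probability is $$\Pb[X > 2] = \sum_{x > 2} p_X(x) = p_X(3) + p_X(4) = \frac{7}{20} + \frac{3}{20} = \frac{1}{2}.$$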
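Both statements of the theorem can be exercised on the joint PMF table of Example 5.17. The sketch below evaluates \(\Pb[X > 2 \,|\, Y = 1]\) from the conditional PMF and then computes \(\Pb[X > 2]\) in two ways: directly from the marginal PMF, and through the sum \(\sum_{y} \Pb[X \in A \,|\, Y = y]\,p_Y(y)\). The helper name `prob_A_given_y` is illustrative.

```python
from fractions import Fraction

# Joint PMF of Example 5.17 in units of 1/20: rows x = 1..4, columns y = 1..4.
counts = [
    [1, 1, 1, 0],
    [1, 2, 3, 1],
    [1, 2, 3, 1],
    [0, 1, 1, 1],
]
joint = [[Fraction(c, 20) for c in row] for row in counts]
p_y = [sum(col) for col in zip(*joint)]

# The event A = {X > 2}, expressed as a set of x values.
A = {x for x in range(1, 5) if x > 2}

def prob_A_given_y(y):
    """P[X in A | Y = y]: sum the conditional PMF over x in A."""
    return sum(joint[x - 1][y - 1] for x in A) / p_y[y - 1]

# P[X in A] from the marginal PMF of X.
p_A_direct = sum(sum(joint[x - 1]) for x in A)

# P[X in A] from the conditional probabilities weighted by p_Y(y).
p_A_total = sum(prob_A_given_y(y) * p_y[y - 1] for y in range(1, 5))

print("P[X > 2 | Y = 1] =", prob_A_given_y(1))
print("P[X > 2]         =", p_A_direct, "=", p_A_total)
```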
- The PMF/PDF should match with the probability you are finding.
- If you want to find the conditional probability \(\Pb[X \in A|Y=y]\), use the conditional PMF \(p_{X|Y}(x|y)\).
- If you want to find the probability \(\Pb[X \in A]\), use the marginal PMF \(p_{X}(x)\).
Finally, we define the conditional CDF for discrete random variables.
Let \(X\) and \(Y\) be discrete random variables. Then the conditional CDF of \(X\) given \(Y = y\) is $$F_{X|Y}(x|y) = \Pb[X \le x \,|\, Y = y] = \sum_{x' \le x} p_{X|Y}(x'|y).$$
5.3.2 Conditional PDF
We now discuss the conditioning of a continuous random variable.
Let \(X\) and \(Y\) be two continuous random variables. The conditional PDF of \(X\) given \(Y\) is $$f_{X|Y}(x|y) = \frac{f_{X,Y}(x,y)}{f_Y(y)}.$$
Let \(X\) and \(Y\) be two continuous random variables with a joint PDF
Find the conditional PDFs \(f_{X|Y}(x|y)\) and \(f_{Y|X}(y|x)\).
We first find the marginal PDFs.
Thus, the conditional PDFs are
Where does the conditional PDF come from? We cannot duplicate the argument we used for the discrete case because the denominator of a conditional PMF becomes \({\Pb[Y = y] = 0}\) when \(Y\) is continuous. To answer this question, we first define the conditional CDF for continuous random variables.
Let \(X\) and \(Y\) be continuous random variables. Then the conditional CDF of \(X\) given \(Y = y\) is $$F_{X|Y}(x|y) = \frac{\int_{-\infty}^{x} f_{X,Y}(x',y)\,dx'}{f_Y(y)}.$$
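Why should the conditional CDF of a continuous random variable be defined in this way? One way to interpret \(F_{X|Y}(x|y)\) is through a limiting argument. We can define the conditional CDF as $$F_{X|Y}(x|y) = \lim_{h \rightarrow 0} \Pb[X \le x \,|\, y \le Y \le y+h] = \lim_{h \rightarrow 0} \frac{\Pb[X \le x \,\cap\, y \le Y \le y+h]}{\Pb[y \le Y \le y+h]}.$$
With some calculations, we have that $$\lim_{h \rightarrow 0} \frac{\Pb[X \le x \,\cap\, y \le Y \le y+h]}{\Pb[y \le Y \le y+h]} = \lim_{h \rightarrow 0} \frac{\int_{-\infty}^{x}\int_{y}^{y+h} f_{X,Y}(x',y')\,dy'\,dx'}{\int_{y}^{y+h} f_Y(y')\,dy'} = \lim_{h \rightarrow 0} \frac{\int_{-\infty}^{x} f_{X,Y}(x',y)\,dx' \cdot h}{f_Y(y) \cdot h} = \frac{\int_{-\infty}^{x} f_{X,Y}(x',y)\,dx'}{f_Y(y)}.$$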
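The key here is that the small step size \(h\) in the numerator and the denominator will cancel each other out. Now, given the conditional CDF, we can verify the definition of the conditional PDF. It holds that $$f_{X|Y}(x|y) = \frac{d}{dx} F_{X|Y}(x|y) = \frac{d}{dx} \left\{ \frac{\int_{-\infty}^{x} f_{X,Y}(x',y)\,dx'}{f_Y(y)} \right\} \overset{(a)}{=} \frac{f_{X,Y}(x,y)}{f_Y(y)},$$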
where (a) follows from the fundamental theorem of calculus.
Just like the conditional PMF, we can calculate the probabilities using the conditional PDFs. In particular, if we evaluate the probability where \(X \in A\) given that \(Y\) takes a particular value \(Y = y\), then we can integrate the conditional PDF \(f_{X|Y}(x|y)\), with respect to \(x\).
Let \(X\) and \(Y\) be continuous random variables, and let \(A\) be an event.
- (i) \(\Pb[X \in A \;|\; Y = y] = \int_{A} f_{X|Y}(x|y) \;dx\),
- (ii) \(\Pb[X \in A] = \int_{\Omega_Y} \Pb[X \in A \,|\, Y=y] f_Y(y) \;dy\).
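Let \(X\) be a random bit such that $$X = \begin{cases} +1, & \text{with prob. } 1/2, \\ -1, & \text{with prob. } 1/2. \end{cases}$$
Suppose that \(X\) is transmitted over a noisy channel so that the observed signal is $$Y = X + N,$$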
where \(N \sim \text{Gaussian}(0,1)\) is the noise, which is independent of the signal \(X\). Find the probabilities \(\Pb[X = +1 \,|\, Y > 0]\) and \(\Pb[X = -1 \,|\, Y > 0]\).
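First, we know that $$f_{Y|X}(y\,|\,{+1}) = \frac{1}{\sqrt{2\pi}} e^{-\frac{(y-1)^2}{2}},$$ because given \(X = +1\), the observation \(Y = 1 + N\) is a Gaussian with mean 1 and unit variance.
Therefore, integrating \(y\) from \(0\) to \(\infty\) gives us $$\Pb[Y > 0 \,|\, X = +1] = \int_{0}^{\infty} \frac{1}{\sqrt{2\pi}} e^{-\frac{(y-1)^2}{2}}\,dy = 1 - \Phi\left(\frac{0-1}{1}\right) = 1 - \Phi(-1) = \Phi(+1).$$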
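Similarly, we have \(\Pb[Y > 0 \,|\, X = -1] = 1-\Phi(+1)\). The probability we want to find is \(\Pb[X = +1 \,|\, Y > 0]\), which can be determined using Bayes' theorem: $$\Pb[X = +1 \,|\, Y > 0] = \frac{\Pb[Y > 0 \,|\, X = +1]\,\Pb[X = +1]}{\Pb[Y > 0]}.$$
The denominator can be found by using the law of total probability: $$\Pb[Y > 0] = \Pb[Y > 0 \,|\, X = +1]\,\Pb[X = +1] + \Pb[Y > 0 \,|\, X = -1]\,\Pb[X = -1] = \frac{1}{2}\Big(\Phi(+1) + \Phi(-1)\Big) = \frac{1}{2},$$
since \(\Phi(+1)+\Phi(-1) = \Phi(+1)+1-\Phi(+1) = 1\). Therefore, $$\Pb[X = +1 \,|\, Y > 0] = \frac{\Phi(+1) \cdot \frac{1}{2}}{\frac{1}{2}} = \Phi(+1) \approx 0.8413.$$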
The implication is that when we observe \(Y > 0\), the transmitted bit is \(+1\) with probability \(\Pb[X = +1 \,|\, Y > 0] = 0.8413\). The complement of this result gives \(\Pb[X = -1 \,|\, Y > 0] = 1-0.8413 = 0.1587\).
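The value 0.8413 can be reproduced numerically. The sketch below assumes equiprobable bits \(X = \pm 1\) and the additive model \(Y = X + N\), which is the setting consistent with the result \(\Phi(+1)\). It evaluates Bayes' theorem using the standard Gaussian CDF written in terms of `math.erf`, and compares the answer with a small Monte Carlo simulation. The function names, the seed, and the number of trials are illustrative choices.

```python
import math
import random

def std_normal_cdf(z):
    """Phi(z) for the standard Gaussian, via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

prior = 0.5                                   # assumed P[X = +1] = P[X = -1]
like_plus = 1.0 - std_normal_cdf(-1.0)        # P[Y > 0 | X = +1] = Phi(+1)
like_minus = 1.0 - std_normal_cdf(1.0)        # P[Y > 0 | X = -1] = 1 - Phi(+1)

# Law of total probability for the denominator, then Bayes' theorem.
evidence = like_plus * prior + like_minus * (1.0 - prior)
posterior_plus = like_plus * prior / evidence

def simulate(n_trials, seed=0):
    """Estimate P[X = +1 | Y > 0] by sampling X and N and keeping Y > 0."""
    rng = random.Random(seed)
    positives = 0
    plus_and_positive = 0
    for _ in range(n_trials):
        x = rng.choice([+1, -1])
        y = x + rng.gauss(0.0, 1.0)
        if y > 0:
            positives += 1
            if x == +1:
                plus_and_positive += 1
    return plus_and_positive / positives

estimate = simulate(100_000)
print("Bayes:      ", round(posterior_plus, 4))
print("Monte Carlo:", round(estimate, 4))
```

The simulation conditions on \(Y > 0\) simply by discarding the trials where \(Y \le 0\), which mirrors the idea of limiting our observations to the conditioning event.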
Find \(\Pb[Y > y]\), where $$X \sim \mathrm{Uniform}[1,2], \quad Y \,|\, X \sim \mathrm{Exponential}(x).$$
The tricky part of this problem is the tendency to confuse the two variables \(X\) and \(Y\). Once you understand their roles, the problem becomes easy. First notice that \(Y \,|\, X \sim \mathrm{Exponential}(x)\) is a conditional distribution. It says that given \(X = x\), the probability distribution of \(Y\) is exponential, with the parameter \(x\). Thus, we have that $$f_{Y|X}(y|x) = x e^{-xy}, \qquad y \ge 0.$$
Why? Recall that if \(Y \sim \text{Exponential}(\lambda)\) then \(f_Y(y) = \lambda e^{-\lambda y}\). Now if we replace \(\lambda\) with \(x\), we have \(xe^{-xy}\). So the role of \(x\) in this conditional density function is as a parameter.
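Given this property, we can compute the conditional probability: $$\Pb[Y > y \,|\, X = x] = \int_{y}^{\infty} f_{Y|X}(y'|x)\,dy' = \int_{y}^{\infty} x e^{-xy'}\,dy' = e^{-xy}.$$
Finally, we can compute the marginal probability. Since \(f_X(x) = 1\) for \(1 \le x \le 2\), for \(y > 0\) we have $$\Pb[Y > y] = \int_{\Omega_X} \Pb[Y > y \,|\, X = x]\, f_X(x)\,dx = \int_{1}^{2} e^{-xy}\,dx = \frac{1}{y}\left(e^{-y} - e^{-2y}\right).$$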
We can double-check this result by noting that the problem asks about the probability \(\Pb[Y > y]\). Thus, the answer must be a function of \(y\) but not of \(x\).
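A numerical check is also possible. The sketch below averages the conditional tail probability \(e^{-xy}\) over \(X \sim \mathrm{Uniform}[1,2]\) with a midpoint rule, and compares it with the closed form \(\left(e^{-y} - e^{-2y}\right)/y\) obtained by integrating \(e^{-xy}\) over \([1,2]\). The function names and the number of grid points are illustrative choices.

```python
import math

def tail_given_x(y, x):
    """P[Y > y | X = x] when Y | X = x is Exponential(x)."""
    return math.exp(-x * y)

def tail_numeric(y, n=10_000):
    """Midpoint-rule approximation of the integral over x in [1, 2] with f_X(x) = 1."""
    h = 1.0 / n
    return sum(tail_given_x(y, 1.0 + (k + 0.5) * h) for k in range(n)) * h

def tail_closed_form(y):
    """Closed form (e^{-y} - e^{-2y}) / y, valid for y > 0."""
    return (math.exp(-y) - math.exp(-2.0 * y)) / y

for y in (0.5, 1.0, 3.0):
    print(y, tail_numeric(y), tail_closed_form(y))
```

Note that `tail_numeric` takes only \(y\) as its argument once the integration is done, in line with the observation that the answer must be a function of \(y\) but not of \(x\).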