Joint PMF and Joint PDF
Probability is a measure of the size of a set. This principle applies to discrete random variables, continuous random variables, single random variables, and multiple random variables. In situations with a pair of random variables, the measure should be applied to the coordinate \((X,Y)\) represented by the random variables \(X\) and \(Y\). Consequently, when measuring the probability, we either count these coordinates or integrate the area covered by these coordinates. In this section, we formalize this notion of measuring 2D events.
5.1.1 Probability measure in 2D
Consider two random variables \(X\) and \(Y\). Let the sample space of \(X\) and \(Y\) be \(\Omega_X\) and \(\Omega_Y\), respectively. Define the Cartesian product of \(\Omega_X\) and \(\Omega_Y\) as \(\Omega_X \times \Omega_Y = \{(x,y) \;|\; x \in \Omega_X \;\text{and}\; y \in \Omega_Y\}\). That is, \(\Omega_X \times \Omega_Y\) contains all possible pairs \((X,Y)\).
If \(\Omega_X = \{1,2\}\) and \(\Omega_Y = \{4,5\}\), then \(\Omega_X \times \Omega_Y = \{(1,4),(1,5),\) \((2,4),(2,5)\}.\)
If \(\Omega_X = [3,4]\) and \(\Omega_Y = [1,2]\), then \(\Omega_X \times \Omega_Y =\) a rectangle with two diagonal vertices as \((3,1)\) and \((4,2)\).
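The discrete Cartesian product can be enumerated directly. The following is a minimal Python sketch of the first example above; the variable names `omega_x`, `omega_y`, and `omega_xy` are chosen here for illustration only.

```python
from itertools import product

# Sample spaces from the discrete example: Omega_X = {1, 2}, Omega_Y = {4, 5}.
omega_x = [1, 2]
omega_y = [4, 5]

# The Cartesian product lists every pair (x, y) with x in Omega_X and y in Omega_Y.
omega_xy = list(product(omega_x, omega_y))
print(omega_xy)  # [(1, 4), (1, 5), (2, 4), (2, 5)]
```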
Random variables are mappings from the sample space to the real line. If \(\omega \in \Omega_X\) is mapped to \(X(\omega) \in \R\), and \(\xi \in \Omega_Y\) is mapped to \(Y(\xi) \in \R\), then a coordinate \(\vomega = (\omega,\xi)\) in the sample space \(\Omega_X \times \Omega_Y\) should be mapped to a coordinate \((X(\omega),Y(\xi))\) in the 2D plane.
We denote such a vector-to-vector mapping as \(\mX(\cdot): \Omega_X \times \Omega_Y \rightarrow \R \times \R\), as illustrated in Figure 5.1.

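Therefore, if we have an event \(\calA \subseteq \R^2\), the probability that \(\calA\) happens is $$\Pb[\calA] = \Pb[\{\vomega \;|\; \mX(\vomega) \in \calA\}] = \Pb[\{(\omega,\xi) \;|\; (X(\omega),Y(\xi)) \in \calA\}] = \Pb[\mX^{-1}(\calA)].$$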
In other words, we take the coordinate \(\mX(\vomega)\) and find its inverse image \(\mX^{-1}(\calA)\). The size of this inverse image \(\mX^{-1}(\calA)\) in the sample space \(\Omega_X \times \Omega_Y\) is then the probability. We summarize this general principle as follows.
For a pair of random variables \(\mX = (X,Y)\), the probability of an event \(\calA\) is measured in the product space \(\Omega_X \times \Omega_Y\) with the size $$\Pb[\{\vomega \;|\; \mX(\vomega) \in \calA\}] = \Pb[\mX^{-1}(\calA)].$$
This definition is quite abstract. To make it more concrete, we will look at discrete and continuous random variables.
5.1.2 Discrete random variables
Suppose that the random variables \(X\) and \(Y\) are discrete. Let \(\calA = \{X(\omega) = x, \; Y(\xi) = y\}\) be a discrete event. Then the above definition tells us that the probability of \(\calA\) is $$\Pb[\calA] = \Pb[\{(\omega,\xi) \;|\; X(\omega) = x \;\text{and}\; Y(\xi) = y\}].$$
We define this probability as the joint probability mass function (joint PMF) \(p_{X,Y}(x,y)\).
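Let \(X\) and \(Y\) be two discrete random variables. The joint PMF of \(X\) and \(Y\) is defined as $$p_{X,Y}(x,y) = \Pb[X = x \;\text{and}\; Y = y] = \Pb[\{(\omega,\xi) \;|\; X(\omega) = x \;\text{and}\; Y(\xi) = y\}].$$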
We sometimes write the joint PMF as \(p_{X,Y}(x,y) = \Pb[X = x,\; Y = y]\). Figure 5.2 shows a graphical portrayal of the joint PMF. In a nutshell, \(p_{X,Y}(x,y)\) can be considered as a 2D extension of a single variable PMF. The probabilities are still represented by the impulses, but the domain of these impulses is now a 2D plane. If we have an event \(\calA\), then the size of the event is $$\Pb[\calA] = \sum_{(x,y) \in \calA} p_{X,Y}(x,y).$$

Let \(X\) be a coin flip and \(Y\) be a die. The sample space of \(X\) is \(\{0,1\}\), whereas the sample space of \(Y\) is \(\{1,2,3,4,5,6\}\). The joint PMF, according to our definition, is the probability \(\Pb[X = x \text{ and } Y = y]\), where \(x\) takes one of 2 states and \(y\) takes one of 6 states. The following table summarizes all 12 states of the joint distribution.
| | Y = 1 | Y = 2 | Y = 3 | Y = 4 | Y = 5 | Y = 6 |
|---|---|---|---|---|---|---|
| X = 0 | \(\frac{1}{12}\) | \(\frac{1}{12}\) | \(\frac{1}{12}\) | \(\frac{1}{12}\) | \(\frac{1}{12}\) | \(\frac{1}{12}\) |
| X = 1 | \(\frac{1}{12}\) | \(\frac{1}{12}\) | \(\frac{1}{12}\) | \(\frac{1}{12}\) | \(\frac{1}{12}\) | \(\frac{1}{12}\) |
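In this table, since there are 12 coordinates, and each coordinate has an equal chance of appearing, the probability for each coordinate becomes \(1/12\). Therefore, the joint PMF of \(X\) and \(Y\) is $$p_{X,Y}(x,y) = \frac{1}{2} \times \frac{1}{6} = \frac{1}{12}, \qquad x = 0,1, \quad y = 1,2,3,4,5,6.$$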
In this example, we observe that if \(X\) and \(Y\) are not interacting with each other (formally, independent), the joint PMF is the product of the two individual probabilities.
In the previous example, if we define \(\calA = \{X+Y = 3\}\), the probability \(\Pb[\calA]\) is $$\Pb[\calA] = \sum_{(x,y) \in \calA} p_{X,Y}(x,y) = p_{X,Y}(0,3) + p_{X,Y}(1,2) = \frac{2}{12}.$$
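If \(\calB = \{\min(X,Y) = 1\}\), the probability \(\Pb[\calB]\) is $$\Pb[\calB] = \sum_{(x,y) \in \calB} p_{X,Y}(x,y) = p_{X,Y}(1,1) + p_{X,Y}(1,2) + \cdots + p_{X,Y}(1,6) = \frac{6}{12}.$$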
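The two event probabilities can be checked by summing the joint PMF over the coordinates that belong to each event. The sketch below assumes the fair coin and fair die of the table above; the helper `prob` and the dictionary `joint_pmf` are names introduced here for illustration.

```python
from fractions import Fraction
from itertools import product

# Joint PMF of a fair coin X in {0, 1} and a fair die Y in {1, ..., 6}:
# every coordinate (x, y) has probability 1/12.
joint_pmf = {(x, y): Fraction(1, 12) for x, y in product([0, 1], range(1, 7))}

def prob(event):
    """Sum the joint PMF over every coordinate (x, y) for which event(x, y) is true."""
    return sum((p for (x, y), p in joint_pmf.items() if event(x, y)), Fraction(0))

prob_a = prob(lambda x, y: x + y == 3)       # A = {X + Y = 3}
prob_b = prob(lambda x, y: min(x, y) == 1)   # B = {min(X, Y) = 1}
print(prob_a, prob_b)  # 1/6 1/2
```

Exact fractions are used so that the results can be compared with the hand calculation without rounding error.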
5.1.3 Continuous random variables
The continuous version of the joint PMF is called the joint probability density function (joint PDF), denoted by \(f_{X,Y}(x,y)\). A joint PDF is analogous to a joint PMF. For example, integrating it will give us the probability.
Let \(X\) and \(Y\) be two continuous random variables. The joint PDF of \(X\) and \(Y\) is a function \(f_{X,Y}(x,y)\) that can be integrated to yield a probability $$\Pb[\calA] = \int_{\calA} f_{X,Y}(x,y) \, dx \, dy$$
for any event \(\calA \subseteq \Omega_X \times \Omega_Y\).
Pictorially, we can view \(f_{X,Y}\) as a 2D function where the height at a coordinate \((x,y)\) is \(f_{X,Y}(x,y)\), as can be seen from Figure 5.3. To compute the probability that \((X,Y) \in \calA\), we integrate the function \(f_{X,Y}\) with respect to the area covered by the set \(\calA\). For example, if the set \(\calA\) is a rectangular box \(\calA = [a,b] \times [c,d]\), then the integration becomes $$\Pb[\calA] = \Pb[a \le X \le b, \; c \le Y \le d] = \int_c^d \int_a^b f_{X,Y}(x,y) \, dx \, dy.$$

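Consider a uniform joint PDF \(f_{X,Y}(x,y)\) defined on \([0,2]^2\) with \(f_{X,Y}(x,y) = \frac{1}{4}\). Let \(\calA = [a,b] \times [c,d]\). Find \(\Pb[\calA]\). Assuming \(\calA \subseteq [0,2]^2\), we have $$\Pb[\calA] = \int_c^d \int_a^b \frac{1}{4} \, dx \, dy = \frac{(b-a)(d-c)}{4}.$$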
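In the previous example, let \(\calB = \{X + Y \le 2\}\). Find \(\Pb[\calB]\). $$\Pb[\calB] = \int_0^2 \int_0^{2-y} \frac{1}{4} \, dx \, dy = \int_0^2 \frac{2-y}{4} \, dy = \frac{1}{2}.$$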
Here, the limits of the integration can be determined from Figure 5.4. The inner integration (with respect to \(x\)) should start from 0 and end at \(2-y\), which is the line defining the set \(x+y\le 2\). Since the inner integration is performed for every \(y\), we need to enumerate all the possible \(y\)'s to complete the outer integration. This leads to the outer limit from 0 to 2.
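The integration limits can be sanity-checked numerically. The following is a rough sketch, not a definitive implementation: it approximates the double integral of the uniform PDF \(\frac{1}{4}\) on \([0,2]^2\) by a midpoint Riemann sum. The function name `prob_uniform_event` and the grid size `n` are assumptions made for this illustration.

```python
def prob_uniform_event(event, n=400):
    """Approximate P[(X, Y) in event] for the uniform PDF 1/4 on [0, 2]^2.

    The square is split into n-by-n cells; each cell whose midpoint satisfies
    event(x, y) contributes (density) * (cell area) to the total.
    """
    h = 2.0 / n                       # side length of one grid cell
    total = 0.0
    for i in range(n):
        x = (i + 0.5) * h             # midpoint of the cell along x
        for j in range(n):
            y = (j + 0.5) * h         # midpoint of the cell along y
            if event(x, y):
                total += 0.25 * h * h
    return total

# B = {X + Y <= 2}: the triangle below the line x + y = 2.
prob_b = prob_uniform_event(lambda x, y: x + y <= 2)

# A = [0.5, 1.5] x [0, 1]: a rectangle with (b - a)(d - c) / 4 = 0.25.
prob_a = prob_uniform_event(lambda x, y: 0.5 <= x <= 1.5 and 0.0 <= y <= 1.0)

print(prob_b, prob_a)  # close to 0.5 and 0.25
```

The triangle estimate carries a small discretization error along the boundary line, which shrinks as `n` grows.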

5.1.4 Normalization
The normalization property of a two-dimensional PMF and PDF is the property that, when we enumerate all outcomes of the sample space, we obtain 1.
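Let \(\Omega = \Omega_X \times \Omega_Y\). All joint PMFs and joint PDFs satisfy $$\sum_{(x,y) \in \Omega} p_{X,Y}(x,y) = 1 \qquad\text{or}\qquad \int_{\Omega} f_{X,Y}(x,y) \, dx \, dy = 1.$$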
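Consider a joint uniform PDF defined in the shaded area \([0,3]\times[0,3]\) with PDF $$f_{X,Y}(x,y) = \begin{cases} c, & \text{if } (x,y) \in [0,3]\times[0,3], \\ 0, & \text{otherwise}. \end{cases}$$ Find the constant \(c\).
To find the constant \(c\), we note that $$1 = \int_0^3 \int_0^3 c \, dx \, dy = 9c.$$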
Equating the two sides gives us \(c = \frac{1}{9}\).
Consider a joint PDF
Find the constant \(c\). Tip: Consider the area of integration as shown in Figure 5.5.
There are two ways to take the integration shown in Figure 5.5. We choose the inner integration w.r.t. \(y\) first.
Therefore, \(c = 2\).

5.1.5 Marginal PMF and marginal PDF
If we only sum / integrate for one random variable, we obtain the PMF / PDF of the other random variable. The resulting PMF / PDF is called the marginal PMF / PDF.
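The marginal PMF is defined as $$p_X(x) = \sum_{y \in \Omega_Y} p_{X,Y}(x,y) \qquad\text{and}\qquad p_Y(y) = \sum_{x \in \Omega_X} p_{X,Y}(x,y),$$
and the marginal PDF is defined as $$f_X(x) = \int_{\Omega_Y} f_{X,Y}(x,y) \, dy \qquad\text{and}\qquad f_Y(y) = \int_{\Omega_X} f_{X,Y}(x,y) \, dx.$$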
Since \(f_{X,Y}(x,y)\) is a two-dimensional function, when integrating over \(y\) from \(-\infty\) to \(\infty\), we project \(f_{X,Y}(x,y)\) onto the \(x\)-axis. Therefore, the resulting function depends on \(x\) only.
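For the discrete case, marginalization amounts to summing the joint table along one axis. The sketch below reuses the fair coin and die joint PMF from the earlier example; the function `marginals` and the dictionary layout are choices made here for illustration.

```python
from collections import defaultdict
from fractions import Fraction
from itertools import product

# Joint PMF of the fair coin X and fair die Y from the earlier example.
joint_pmf = {(x, y): Fraction(1, 12) for x, y in product([0, 1], range(1, 7))}

def marginals(joint):
    """Return (p_X, p_Y) by summing the joint PMF over the other variable."""
    p_x, p_y = defaultdict(Fraction), defaultdict(Fraction)
    for (x, y), p in joint.items():
        p_x[x] += p   # accumulate over y for each fixed x
        p_y[y] += p   # accumulate over x for each fixed y
    return dict(p_x), dict(p_y)

p_x, p_y = marginals(joint_pmf)
print(p_x[0], p_y[3])  # 1/2 1/6
```

Each marginal is itself a valid PMF, so its values sum to 1.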
Consider the joint PDF \(f_{X,Y}(x,y) = \frac{1}{4}\) shown below. Find the marginal PDFs.

If we integrate over \(x\) and \(y\), we have
So the marginal PDFs are the projection of the joint PDFs onto the \(x\)- and \(y\)-axes.
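A joint Gaussian random variable \((X,Y)\) has a joint PDF given by $$f_{X,Y}(x,y) = \frac{1}{2\pi\sigma^2} \exp\left\{-\frac{(x-\mu_X)^2 + (y-\mu_Y)^2}{2\sigma^2}\right\}.$$
Find the marginal PDFs \(f_X(x)\) and \(f_Y(y)\). Integrating over \(y\) gives $$f_X(x) = \int_{-\infty}^{\infty} f_{X,Y}(x,y) \, dy = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left\{-\frac{(x-\mu_X)^2}{2\sigma^2}\right\} \cdot \int_{-\infty}^{\infty} \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left\{-\frac{(y-\mu_Y)^2}{2\sigma^2}\right\} dy.$$
Recognizing that the last integral is equal to unity because it integrates a Gaussian PDF over the real line, it follows that $$f_X(x) = \frac{1}{\sqrt{2\pi \sigma^2}} \exp\left\{-\frac{(x-\mu_X)^2}{2\sigma^2}\right\}.$$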
Similarly, we have $$f_Y(y) = \frac{1}{\sqrt{2\pi \sigma^2}} \exp\left\{-\frac{(y-\mu_Y)^2}{2\sigma^2}\right\}.$$
5.1.6 Independent random variables
Two random variables are said to be independent if and only if the joint PMF or PDF can be factorized as a product of the marginal PMF / PDFs.
Random variables \(X\) and \(Y\) are independent if and only if $$p_{X,Y}(x,y) = p_X(x) \, p_Y(y) \qquad\text{or}\qquad f_{X,Y}(x,y) = f_X(x) \, f_Y(y).$$
This definition is consistent with the definition of independence of two events. Recall that two events \(A\) and \(B\) are independent if and only if \(\Pb[A \cap B] = \Pb[A]\Pb[B]\). Letting \(A = \{X = x\}\) and \(B = \{Y = y\}\), we see that if \(A\) and \(B\) are independent then \(\Pb[X = x \cap Y = y]\) is the product \(\Pb[X = x]\Pb[Y = y]\). This is precisely the relationship \(p_{X,Y}(x,y) = p_X(x)\;p_Y(y)\).
Consider two random variables with a joint PDF given by
Are \(X\) and \(Y\) independent?
We know that
Therefore, the random variables \(X\) and \(Y\) are independent.
Let \(X\) be a coin and \(Y\) be a die. Then the joint PMF is given by the table below.
| | Y = 1 | Y = 2 | Y = 3 | Y = 4 | Y = 5 | Y = 6 |
|---|---|---|---|---|---|---|
| X = 0 | \(\frac{1}{12}\) | \(\frac{1}{12}\) | \(\frac{1}{12}\) | \(\frac{1}{12}\) | \(\frac{1}{12}\) | \(\frac{1}{12}\) |
| X = 1 | \(\frac{1}{12}\) | \(\frac{1}{12}\) | \(\frac{1}{12}\) | \(\frac{1}{12}\) | \(\frac{1}{12}\) | \(\frac{1}{12}\) |
Are \(X\) and \(Y\) independent?
For any \(x\) and \(y\), we have that $$p_{X,Y}(x,y) = \frac{1}{12} = \frac{1}{2} \times \frac{1}{6} = p_X(x) \, p_Y(y).$$
Therefore, the random variables \(X\) and \(Y\) are independent.
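The factorization test can be automated for any finite joint PMF. The sketch below is one possible check, under the assumption that the joint PMF is stored as a dictionary keyed by \((x,y)\); the function name `is_independent` and the second, perfectly dependent PMF are hypothetical and serve only as a contrast.

```python
from fractions import Fraction
from itertools import product

def is_independent(joint):
    """Check whether p_{X,Y}(x, y) == p_X(x) * p_Y(y) for every pair (x, y)."""
    xs = sorted({x for x, _ in joint})
    ys = sorted({y for _, y in joint})
    zero = Fraction(0)
    # Marginals: sum the joint PMF over the other variable.
    p_x = {x: sum(joint.get((x, y), zero) for y in ys) for x in xs}
    p_y = {y: sum(joint.get((x, y), zero) for x in xs) for y in ys}
    return all(joint.get((x, y), zero) == p_x[x] * p_y[y]
               for x, y in product(xs, ys))

# Fair coin and fair die from the table: every entry is 1/12.
coin_die = {(x, y): Fraction(1, 12) for x, y in product([0, 1], range(1, 7))}

# Hypothetical contrast: Y always equals X, so the pair is dependent.
copy_coin = {(0, 0): Fraction(1, 2), (1, 1): Fraction(1, 2)}

print(is_independent(coin_die), is_independent(copy_coin))  # True False
```

Missing keys are treated as probability zero, which is why \((0,1)\) in the second PMF fails the product test: the joint value is 0 while the product of the marginals is \(\frac{1}{4}\).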
Consider two random variables \(X\) and \(Y\) with a joint PDF given by (We use the notation “\(\propto\)” to denote “proportional to”. It implies that the normalization constant is omitted.)
This PDF cannot be factorized into a product of two marginal PDFs. Therefore, the random variables are dependent.
We can extrapolate the definition of independence to multiple random variables. If there are many random variables \(X_1,X_2,\ldots,X_N\), they will have a joint PDF $$f_{X_1,\ldots,X_N}(x_1,\ldots,x_N).$$
If these random variables \(X_1,X_2,\ldots,X_N\) are independent, then the joint PDF can be factorized as $$f_{X_1,\ldots,X_N}(x_1,\ldots,x_N) = \prod_{n=1}^{N} f_{X_n}(x_n).$$
This gives us the definition of independence for \(N\) random variables.
A sequence of random variables \(X_1, \ldots, X_N\) is independent if and only if their joint PDF (or joint PMF) can be factorized.
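Throw a die 4 times. Let \(X_1\), \(X_2\), \(X_3\) and \(X_4\) be the outcomes. Then, since these four throws are independent, the probability mass function of any quadruple \((x_1,x_2,x_3,x_4)\) is $$p_{X_1,X_2,X_3,X_4}(x_1,x_2,x_3,x_4) = p_{X_1}(x_1) \, p_{X_2}(x_2) \, p_{X_3}(x_3) \, p_{X_4}(x_4).$$
For example, for a fair die the probability of getting \((1,5,2,6)\) is $$p_{X_1,X_2,X_3,X_4}(1,5,2,6) = p_{X_1}(1) \, p_{X_2}(5) \, p_{X_3}(2) \, p_{X_4}(6) = \left(\frac{1}{6}\right)^4 = \frac{1}{1296}.$$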
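The product rule for independent throws is easy to evaluate exactly. The following sketch assumes a fair six-sided die; `die_pmf` and `joint_pmf` are illustrative names, not part of the original text.

```python
from fractions import Fraction
from math import prod

def die_pmf(k):
    """PMF of one fair die: 1/6 on {1, ..., 6}, zero elsewhere."""
    return Fraction(1, 6) if k in range(1, 7) else Fraction(0)

def joint_pmf(outcomes):
    """Joint PMF of independent throws: the product of the individual PMFs."""
    return prod(die_pmf(k) for k in outcomes)

p = joint_pmf((1, 5, 2, 6))
print(p)  # 1/1296
```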
The example above demonstrates an interesting phenomenon. If the \(N\) random variables are independent, and if they all have the same distribution, then the joint PDF/PMF is a product of \(N\) copies of the same individual PDF/PMF, one evaluated at each \(x_n\). Random variables satisfying this property are known as independent and identically distributed random variables.
A collection of random variables \(X_1,\ldots,X_N\) is called independent and identically distributed (i.i.d.) if
- All \(X_1,\ldots,X_N\) are independent; and
- All \(X_1,\ldots,X_N\) have the same distribution, i.e., \(f_{X_1}(x) = \cdots = f_{X_N}(x)\).
If \(X_1,\ldots,X_N\) are i.i.d., we have that $$f_{X_1,\ldots,X_N}(x_1,\ldots,x_N) = \prod_{n=1}^{N} f_{X_1}(x_n),$$
where the particular choice of \(X_1\) is unimportant because \(f_{X_1}(x) = \cdots = f_{X_N}(x)\).
- If a set of random variables are i.i.d., then the joint PDF can be written as a product of PDFs.
- Integrating a joint PDF is difficult. Integrating a product of PDFs is much easier.
Let \(X_1,X_2,\ldots,X_N\) be a sequence of i.i.d. Gaussian random variables where each \(X_i\) has a PDF
The joint PDF of \(X_1,X_2,\ldots,X_N\) is
which is a function depending not on the individual values of \(x_1,x_2,\ldots,x_N\) but on the sum \(\sum_{i=1}^N x_i^2\). So we have “compressed” an \(N\)-dimensional function into a 1D function.
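Let \(\theta\) be a deterministic number that was sent through a noisy channel. We model the noise as an additive Gaussian random variable with mean \(0\) and variance \(\sigma^2\). Supposing we have observed measurements \(X_i = \theta + W_i\), for \(i = 1,\ldots,N\), where the \(W_i \sim \text{Gaussian}(0,\sigma^2)\) are independent, then the PDF of each \(X_i\) is $$f_{X_i}(x) = \frac{1}{\sqrt{2\pi\sigma^2}}\exp\left\{-\frac{(x-\theta)^2}{2\sigma^2}\right\}.$$ Thus the joint PDF of \((X_1,X_2,\ldots,X_N)\) is $$f_{X_1,\ldots,X_N}(x_1,\ldots,x_N) = \prod_{i=1}^{N} \frac{1}{\sqrt{2\pi\sigma^2}}\exp\left\{-\frac{(x_i-\theta)^2}{2\sigma^2}\right\} = \left(\frac{1}{\sqrt{2\pi\sigma^2}}\right)^N \exp\left\{-\sum_{i=1}^{N}\frac{(x_i-\theta)^2}{2\sigma^2}\right\}.$$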
Essentially, this joint PDF tells us the probability density of seeing sample data \(x_1,\ldots,x_N\).
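To make the joint density concrete, the sketch below evaluates it in two ways, as a product of the individual Gaussian PDFs and as a single exponential of the summed squared errors, and confirms that they agree. It assumes independent noise samples; the sample values in `xs`, the choices `theta = 1.0` and `var = 0.25`, and all function names are hypothetical.

```python
import math

def gaussian_pdf(x, mean, var):
    """PDF of Gaussian(mean, var) evaluated at x."""
    return math.exp(-(x - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

def joint_pdf_product(xs, theta, var):
    """Joint PDF of independent X_i = theta + W_i as a product of marginals."""
    return math.prod(gaussian_pdf(x, theta, var) for x in xs)

def joint_pdf_closed_form(xs, theta, var):
    """Same joint PDF, written with the sum of squared errors in one exponent."""
    n = len(xs)
    sq_err = sum((x - theta) ** 2 for x in xs)
    return (2 * math.pi * var) ** (-n / 2) * math.exp(-sq_err / (2 * var))

xs = [0.9, 1.2, 1.1]          # hypothetical observations
theta, var = 1.0, 0.25        # hypothetical signal value and noise variance
d_product = joint_pdf_product(xs, theta, var)
d_closed = joint_pdf_closed_form(xs, theta, var)
print(d_product, d_closed)
```

For large \(N\) the product of many small densities can underflow, so in practice one often works with the logarithm of the joint PDF instead.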
5.1.7 Joint CDF
We now introduce the cumulative distribution function (CDF) for multiple variables.
Let \(X\) and \(Y\) be two random variables. The joint CDF of \(X\) and \(Y\) is the function \(F_{X,Y}(x,y)\) such that $$F_{X,Y}(x,y) = \Pb[X \le x \,\cap\, Y \le y].$$
This definition can be more explicitly written as follows.
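If \(X\) and \(Y\) are discrete, then $$F_{X,Y}(x,y) = \sum_{y' \le y} \sum_{x' \le x} p_{X,Y}(x',y').$$
If \(X\) and \(Y\) are continuous, then $$F_{X,Y}(x,y) = \int_{-\infty}^{y} \int_{-\infty}^{x} f_{X,Y}(x',y') \, dx' \, dy'.$$
If the two random variables are independent, then we have $$F_{X,Y}(x,y) = \int_{-\infty}^{x} f_X(x') \, dx' \int_{-\infty}^{y} f_Y(y') \, dy' = F_X(x) \, F_Y(y).$$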
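Let \(X\) and \(Y\) be two independent uniform random variables \(\text{Uniform}(0,1)\). Find the joint CDF. For \(0 \le x \le 1\) and \(0 \le y \le 1\), $$F_{X,Y}(x,y) = F_X(x) \, F_Y(y) = \int_0^x 1 \, dx' \int_0^y 1 \, dy' = xy.$$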
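Let \(X\) and \(Y\) be two independent Gaussian random variables \(\text{Gaussian}(\mu,\sigma^2)\). Find the joint CDF.
Let \(\Phi(\cdot)\) be the CDF of the standard Gaussian. Then $$F_{X,Y}(x,y) = F_X(x) \, F_Y(y) = \Phi\left(\frac{x-\mu}{\sigma}\right) \Phi\left(\frac{y-\mu}{\sigma}\right).$$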
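Since the joint CDF of independent random variables is the product of the marginal CDFs, the Gaussian case can be evaluated with the error function. This is a minimal sketch; `std_normal_cdf` and `joint_cdf_gaussian` are names introduced here, and the identity \(\Phi(z) = \frac{1}{2}\left(1 + \mathrm{erf}(z/\sqrt{2})\right)\) is the standard relation between \(\Phi\) and the error function.

```python
import math

def std_normal_cdf(z):
    """CDF of the standard Gaussian, written in terms of the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def joint_cdf_gaussian(x, y, mu, sigma):
    """Joint CDF of two independent Gaussian(mu, sigma^2) random variables."""
    return std_normal_cdf((x - mu) / sigma) * std_normal_cdf((y - mu) / sigma)

# At (mu, mu) each marginal CDF equals 1/2, so the joint CDF is 1/4.
value_at_mean = joint_cdf_gaussian(1.0, 1.0, mu=1.0, sigma=2.0)
print(value_at_mean)  # 0.25
```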
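Here are a few properties of the CDF: $$F_{X,Y}(x,-\infty) = 0, \qquad F_{X,Y}(-\infty,y) = 0, \qquad F_{X,Y}(-\infty,-\infty) = 0, \qquad F_{X,Y}(\infty,\infty) = 1.$$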
In addition, we can obtain the marginal CDF as follows.
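Let \(X\) and \(Y\) be two random variables. The marginal CDF is $$F_X(x) = F_{X,Y}(x,\infty), \qquad F_Y(y) = F_{X,Y}(\infty,y).$$
Proof. We prove only the first case. The second case is similar. $$F_{X,Y}(x,\infty) = \Pb[X \le x \,\cap\, Y \le \infty] = \Pb[X \le x] = F_X(x).$$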
By the fundamental theorem of calculus, we can derive the PDF from the CDF.
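Let \(F_{X,Y}(x,y)\) be the joint CDF of \(X\) and \(Y\). Then, the joint PDF is $$f_{X,Y}(x,y) = \frac{\partial^2}{\partial y \, \partial x} F_{X,Y}(x,y).$$
The order of the partial derivatives can be switched, yielding a symmetric result: $$f_{X,Y}(x,y) = \frac{\partial^2}{\partial x \, \partial y} F_{X,Y}(x,y).$$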
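Let \(X\) and \(Y\) be two uniform random variables with joint CDF \(F_{X,Y}(x,y) = xy\) for \(0 \le x \le 1\) and \(0 \le y \le 1\). Find the joint PDF. Differentiating with respect to both variables, $$f_{X,Y}(x,y) = \frac{\partial^2}{\partial x \, \partial y} \, xy = 1, \qquad 0 \le x \le 1, \quad 0 \le y \le 1,$$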
which is consistent with the definition of a joint uniform random variable.
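The mixed partial derivative can also be approximated numerically, which is a handy check when the CDF is available only as a function. The sketch below uses a central finite difference; the function name `mixed_partial`, the step size `h`, and the evaluation point \((0.3, 0.7)\) are arbitrary choices for illustration.

```python
def mixed_partial(F, x, y, h=1e-4):
    """Central finite-difference estimate of the mixed partial d^2 F / (dx dy)."""
    return (F(x + h, y + h) - F(x + h, y - h)
            - F(x - h, y + h) + F(x - h, y - h)) / (4 * h * h)

def uniform_joint_cdf(x, y):
    """Joint CDF F(x, y) = x * y, valid for 0 <= x <= 1 and 0 <= y <= 1."""
    return x * y

# Evaluate at an interior point so that all four stencil points stay in [0, 1]^2.
pdf_estimate = mixed_partial(uniform_joint_cdf, 0.3, 0.7)
print(pdf_estimate)  # close to 1
```

The stencil must stay inside the region where the CDF formula holds; near the boundary of \([0,1]^2\) the piecewise definition of the CDF would have to be used instead.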
Let \(X\) and \(Y\) be two exponential random variables with joint CDF
Find the joint PDF.
which is consistent with the definition of a joint exponential random variable.