Summary

As you were reading this chapter, you may have felt that the first and second parts discuss distinctly different subjects, and in fact many books treat them as separate topics. We take a different approach. We think that they are essentially the same thing if you understand the following chain of distributions:

\begin{aligned} \underset{\text{\textcolor{myblue}{one variable}}}{\underbrace{f_X(x)}} \Longrightarrow \underset{\text{\textcolor{myblue}{two variables}}}{\underbrace{f_{X_1,X_2}(x_1,x_2)}} \Longrightarrow \cdots \Longrightarrow \underset{\text{\textcolor{myblue}{$N$ variables}}}{\underbrace{f_{X_1,\ldots,X_N}(x_1,\ldots,x_N)}}. \end{aligned}

The first part exclusively deals with two variables. The generalization from two variables to \(N\) variables is straightforward for PDFs and CDFs:

PDF: \(f_{X_1,X_2}(x_1,x_2) \Longrightarrow f_{X_1,\ldots,X_N}(x_1,\ldots,x_N)\).
CDF: \(F_{X_1,X_2}(x_1,x_2) \Longrightarrow F_{X_1,\ldots,X_N}(x_1,\ldots,x_N)\).

The joint expectation also generalizes from two variables to \(N\) variables, with the variances and pairwise covariances collected in the covariance matrix:

\begin{aligned} \begin{bmatrix} \text{Var}[X_1] & \text{Cov}(X_1,X_2)\\ \text{Cov}(X_2,X_1) & \text{Var}[X_2] \end{bmatrix} \Longrightarrow \begin{bmatrix} \text{Var}[X_1] & \cdots & \text{Cov}(X_1,X_N) \\ \vdots & \ddots & \vdots \\ \text{Cov}(X_N,X_1) & \cdots & \text{Var}[X_N] \end{bmatrix}. \end{aligned}
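As a rough illustration of how the \(N \times N\) covariance matrix is assembled entry by entry, the sketch below estimates it from samples. The three-variable model, the sample size, and the helper name `covariance_matrix` are assumptions made for this example only; the code uses just the standard library so that each step stays visible.

```python
import random

def covariance_matrix(samples):
    """Sample mean vector and sample covariance matrix of N-dimensional data.

    Entry (i, j) estimates Cov(X_i, X_j); the diagonal estimates Var[X_i].
    """
    n = len(samples)
    dim = len(samples[0])
    mean = [sum(s[i] for s in samples) / n for i in range(dim)]
    cov = [[sum((s[i] - mean[i]) * (s[j] - mean[j]) for s in samples) / (n - 1)
            for j in range(dim)]
           for i in range(dim)]
    return mean, cov

# Hypothetical model with N = 3 (an assumption, not taken from the text):
#   X1 ~ Gaussian(0, 1)
#   X2 = 0.5 * X1 + independent Gaussian(0, 1), so Cov(X1, X2) = 0.5, Var[X2] = 1.25
#   X3 ~ Gaussian(0, 4), independent of X1 and X2
rng = random.Random(0)
samples = []
for _ in range(20000):
    x1 = rng.gauss(0.0, 1.0)
    x2 = 0.5 * x1 + rng.gauss(0.0, 1.0)
    x3 = rng.gauss(0.0, 2.0)  # second argument is the standard deviation
    samples.append((x1, x2, x3))

mean, cov = covariance_matrix(samples)
for row in cov:
    print(["%.3f" % v for v in row])
```

The printed matrix should be symmetric, with diagonal entries close to the variances and off-diagonal entries close to the covariances of the assumed model.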

Conditional PDFs and conditional expectations are powerful tools for decomposing complex events into simpler events. Specifically, the law of total expectation,

\E[X] = \int \E[X|Y = y] f_Y(y) \;dy = \E_{Y}[\E_{X|Y}[X|Y]],

is instrumental for evaluating variables defined through conditional relationships. The idea is also extendable to more random variables, such as

\E[X_1] = \int \int \E[X_1| X_2 = x_2, X_3 = x_3] f_{X_2,X_3}(x_2, x_3) \;dx_2 \;dx_3,

where \(\E[X_1| X_2 = x_2, X_3 = x_3]\) can be evaluated through

\E[X_1| X_2 = x_2, X_3 = x_3] = \int x_1 f_{X_1 | X_2,X_3}(x_1 \;|\; x_2, x_3) \;dx_1.

This type of chain relationship generalizes to higher-order cases in the same way.
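The law of total expectation can be checked numerically on a small example. The sketch below assumes a hypothetical model that does not appear in the text: \(Y\) is uniform on \([0,1]\) and, given \(Y = y\), \(X\) is Gaussian with mean \(2y\) and unit variance, so that \(\E[X|Y=y] = 2y\) and \(\E[X] = \int_0^1 2y \;dy = 1\). It compares a direct Monte Carlo estimate of \(\E[X]\) with a midpoint-rule approximation of the integral.

```python
import random

def cond_mean(y):
    # E[X | Y = y] under the assumed model X | Y = y ~ Gaussian(2y, 1)
    return 2.0 * y

rng = random.Random(1)
n = 100000

# Left-hand side: estimate E[X] directly by sampling Y first, then X given Y.
total = 0.0
for _ in range(n):
    y = rng.random()                   # Y ~ Uniform(0, 1)
    x = rng.gauss(cond_mean(y), 1.0)   # X | Y = y ~ Gaussian(2y, 1)
    total += x
direct = total / n

# Right-hand side: integrate E[X | Y = y] f_Y(y) over y with the midpoint rule.
# Here f_Y(y) = 1 on [0, 1].
m = 1000
integral = sum(cond_mean((k + 0.5) / m) * 1.0 for k in range(m)) / m

print(direct, integral)
```

Both numbers should be close to 1; the first fluctuates with the random seed, while the second is deterministic.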

It is important to remember that any high-dimensional random variable is always characterized by its PDF \(f_{\mX}(\vx)\) (or its CDF). We did not go into the details of analyzing \(f_{\mX}(\vx)\) but have only discussed the mean vector \(\E[\mX] = \vmu\) and the covariance matrix \(\text{Cov}(\mX) = \mSigma\). We focused exclusively on \(d\)-dimensional Gaussian random variables

f_{\mX}(\vx) = \frac{1}{\sqrt{(2\pi)^d |\mSigma|}} \exp\left\{-\frac{1}{2}(\vx-\vmu)^T\mSigma^{-1}(\vx-\vmu)\right\},

because they are ubiquitous in data science today. We discussed the linear transformations from a zero-mean unit-variance Gaussian to another Gaussian, and vice versa.
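The two directions of the linear transformation can be sketched in a few lines for \(d = 2\). The sketch assumes that any matrix \(\mathbf{L}\) with \(\mathbf{L}\mathbf{L}^T = \mSigma\) can serve as the transformation and uses the Cholesky factor as one convenient choice, which may differ from the factorization used in the chapter. The values of \(\vmu\) and \(\mSigma\) and the function names `color`, `whiten`, and `gaussian_pdf` are hypothetical and chosen only for illustration.

```python
import math
import random

# Assumed example parameters (not from the text): a 2-D Gaussian.
mu = (1.0, -2.0)
Sigma = ((4.0, 1.2),
         (1.2, 1.0))

def cholesky_2x2(S):
    """Return (a, b, c) such that L = [[a, 0], [b, c]] satisfies L L^T = S."""
    a = math.sqrt(S[0][0])
    b = S[1][0] / a
    c = math.sqrt(S[1][1] - b * b)
    return a, b, c

def color(z, mu, L):
    """Map white noise z ~ Gaussian(0, I) to x = mu + L z ~ Gaussian(mu, Sigma)."""
    a, b, c = L
    return (mu[0] + a * z[0], mu[1] + b * z[0] + c * z[1])

def whiten(x, mu, L):
    """Invert the map above: z = L^{-1} (x - mu), by forward substitution."""
    a, b, c = L
    z0 = (x[0] - mu[0]) / a
    z1 = (x[1] - mu[1] - b * z0) / c
    return (z0, z1)

def gaussian_pdf(x, mu, L):
    """Evaluate the 2-D Gaussian PDF, using |Sigma| = (a c)^2 and
    (x - mu)^T Sigma^{-1} (x - mu) = ||z||^2 with z = whiten(x)."""
    a, b, c = L
    z = whiten(x, mu, L)
    quad = z[0] ** 2 + z[1] ** 2
    return math.exp(-0.5 * quad) / (2.0 * math.pi * a * c)

L = cholesky_2x2(Sigma)

# Forward direction: zero-mean unit-variance samples to Gaussian(mu, Sigma).
rng = random.Random(2)
white = [(rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)) for _ in range(50000)]
colored = [color(z, mu, L) for z in white]

# Empirical mean, variances, and covariance of the transformed samples.
n = len(colored)
m0 = sum(x[0] for x in colored) / n
m1 = sum(x[1] for x in colored) / n
var0 = sum((x[0] - m0) ** 2 for x in colored) / (n - 1)
var1 = sum((x[1] - m1) ** 2 for x in colored) / (n - 1)
cov01 = sum((x[0] - m0) * (x[1] - m1) for x in colored) / (n - 1)

# Reverse direction: whitening a transformed sample recovers the original noise.
roundtrip = whiten(colored[0], mu, L)

print(m0, m1, var0, var1, cov01)
print(gaussian_pdf(mu, mu, L))
```

The empirical mean and covariance of `colored` should be close to the assumed \(\vmu\) and \(\mSigma\), and `roundtrip` should match `white[0]` up to floating-point error.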