Moment-Generating and Characteristic Functions
Consider two independent random variables \(X\) and \(Y\) with PDFs \(f_X(x)\) and \(f_Y(y)\), respectively. Let \(Z = X+Y\) be the sum of the two random variables. We know from Chapter 5 that the PDF of \(Z\), \(f_Z\), is the convolution of \(f_X\) and \(f_Y\). However, we think you will agree that convolutions are not easy to compute, and when the sum involves more than two random variables, computing the repeated convolutions becomes tedious. So how should we proceed in this case? One approach is to use a “frequency domain” method that transforms the PDFs to another domain, where convolution becomes multiplication, which makes the calculations easy or at least easier. Moment-generating functions and characteristic functions are designed for this purpose.
6.1.1 Moment-generating function
For any random variable \(X\), the moment-generating function (MGF) \(M_X(s)\) is
\[ M_X(s) = \E\left[e^{sX}\right]. \]
The definition says that the moment-generating function (MGF) is the expectation of the random variable \(e^{sX}\), that is, \(e\) raised to the power \(sX\), for some \(s\). Effectively, it is the expectation of a function of a random variable. The meaning of the expectation can be seen by writing out the definition. For the discrete case, the MGF is
\[ M_X(s) = \sum_{x} e^{sx}\, p_X(x), \]
whereas in the continuous case, the MGF is
\[ M_X(s) = \int_{-\infty}^{\infty} e^{sx} f_X(x)\, dx. \]
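The continuous case should remind us of the definition of a Laplace transform. For any function \(f(t)\), the (two-sided) Laplace transform, written here with the kernel \(e^{st}\) so that it matches the MGF (many textbooks use \(e^{-st}\) instead), is
\[ \mathcal{L}(f)(s) = \int_{-\infty}^{\infty} f(t)\, e^{st}\, dt. \]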
From this perspective, we can interpret the MGF as the Laplace transform of the PDF. The argument \(s\) of the output can be regarded as the coordinate in the Laplace space. If \(s = -j\omega\), then \(M_X(-j\omega)\) becomes the Fourier transform of the PDF.
Consider a random variable \(X\) with three states \(0, 1, 2\) and with probability masses \(\frac{2}{6}, \frac{3}{6}, \frac{1}{6}\) respectively. Find the MGF.
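The moment-generating function is
\[ M_X(s) = \E\left[e^{sX}\right] = \frac{2}{6}e^{0} + \frac{3}{6}e^{s} + \frac{1}{6}e^{2s} = \frac{1}{3} + \frac{1}{2}e^{s} + \frac{1}{6}e^{2s}. \]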
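As a quick check, here is a minimal Python sketch (standard library only) that evaluates \(M_X(s) = \sum_x e^{sx}p_X(x)\) directly for this PMF and compares it with the closed form \(\frac{1}{3} + \frac{1}{2}e^{s} + \frac{1}{6}e^{2s}\). The helper names `mgf_discrete` and `mgf_closed_form` are our own, chosen for illustration.

```python
import math

# PMF of the example: states 0, 1, 2 with masses 2/6, 3/6, 1/6
values = [0, 1, 2]
probs = [2 / 6, 3 / 6, 1 / 6]


def mgf_discrete(values, probs, s):
    """M_X(s) = sum over states x of e^{s x} p_X(x) for a finite PMF."""
    return sum(p * math.exp(s * x) for x, p in zip(values, probs))


def mgf_closed_form(s):
    """Closed form 1/3 + (1/2) e^s + (1/6) e^{2s}."""
    return 1 / 3 + 0.5 * math.exp(s) + math.exp(2 * s) / 6


for s in [-1.0, 0.0, 0.5, 2.0]:
    print(f"s = {s:4.1f}: direct sum {mgf_discrete(values, probs, s):.6f}, "
          f"closed form {mgf_closed_form(s):.6f}")
```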
Find the MGF for a Poisson random variable.
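The MGF of a Poisson random variable can be found as
\[ M_X(s) = \sum_{k=0}^{\infty} e^{sk}\,\frac{\lambda^k e^{-\lambda}}{k!} = e^{-\lambda}\sum_{k=0}^{\infty} \frac{(\lambda e^{s})^k}{k!} = e^{-\lambda}\, e^{\lambda e^{s}} = e^{\lambda(e^{s}-1)}. \]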
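The series manipulation above can be checked numerically. The sketch below, an illustration assuming Python with only the standard library, truncates the sum at a hypothetical cutoff `kmax` (harmless here because the terms decay factorially) and compares it with \(e^{\lambda(e^s-1)}\).

```python
import math


def poisson_mgf_series(lam, s, kmax=200):
    """Truncated series sum_{k=0}^{kmax} e^{sk} lam^k e^{-lam} / k!."""
    total = 0.0
    term = math.exp(-lam)  # k = 0 term
    for k in range(kmax + 1):
        total += term
        # ratio of consecutive terms: (lam e^s) / (k + 1)
        term *= lam * math.exp(s) / (k + 1)
    return total


def poisson_mgf_closed(lam, s):
    """Closed form e^{lam (e^s - 1)}."""
    return math.exp(lam * (math.exp(s) - 1))


lam = 3.0
for s in [-0.5, 0.0, 0.5, 1.0]:
    print(s, poisson_mgf_series(lam, s), poisson_mgf_closed(lam, s))
```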
Find the MGF for an exponential random variable.
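The MGF of an exponential random variable with PDF \(f_X(x) = \lambda e^{-\lambda x}\), \(x \ge 0\), can be found as
\[ M_X(s) = \int_{0}^{\infty} e^{sx}\,\lambda e^{-\lambda x}\, dx = \lambda\int_{0}^{\infty} e^{-(\lambda - s)x}\, dx = \frac{\lambda}{\lambda - s}, \qquad s < \lambda. \]
For \(s \ge \lambda\) the integral diverges, so this MGF is defined only for \(s < \lambda\).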
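The integral can also be approximated numerically. This is a rough sketch using a plain midpoint rule on a truncated interval \([0, 60]\); the cutoff `upper` and step count `n` are arbitrary illustrative choices, and the comparison only makes sense for \(s < \lambda\).

```python
import math


def exp_mgf_numeric(lam, s, upper=60.0, n=100_000):
    """Midpoint-rule approximation of int_0^upper e^{sx} lam e^{-lam x} dx.

    Only meaningful for s < lam; the tail beyond `upper` is ignored, which is
    harmless when (lam - s) * upper is large.
    """
    h = upper / n
    total = 0.0
    for i in range(n):
        x = (i + 0.5) * h
        total += lam * math.exp((s - lam) * x)  # e^{sx} * lam e^{-lam x}
    return total * h


lam = 2.0
for s in [-1.0, 0.0, 0.5, 1.0]:
    print(s, exp_mgf_numeric(lam, s), lam / (lam - s))
```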
Why are moment-generating functions so called? The following theorem reveals the reason.
The MGF has the properties that
- \(M_X(0) = 1\),
- \(\frac{d}{ds} M_X(s)|_{s=0} = \E[X]\), \(\frac{d^2}{ds^2} M_X(s)|_{s=0} = \E[X^2]\),
- \(\frac{d^k}{ds^k} M_X(s)|_{s=0} = \E[X^k]\), for any positive integer \(k\).
Proof. The first property can be proved by noting that
\[ M_X(0) = \E\left[e^{0\cdot X}\right] = \E[1] = 1. \]
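The third property holds because, assuming we may interchange differentiation and integration (this is justified when \(M_X(s)\) is finite for all \(s\) in a neighborhood of \(0\)),
\[ \frac{d^k}{ds^k} M_X(s) = \frac{d^k}{ds^k}\int_{-\infty}^{\infty} e^{sx} f_X(x)\, dx = \int_{-\infty}^{\infty} \frac{d^k}{ds^k}e^{sx}\, f_X(x)\, dx = \int_{-\infty}^{\infty} x^k e^{sx} f_X(x)\, dx. \]
Setting \(s = 0\) yields
\[ \frac{d^k}{ds^k} M_X(s)\Big|_{s=0} = \int_{-\infty}^{\infty} x^k f_X(x)\, dx = \E[X^k]. \]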
The second property is a special case of the third property.
■
The theorem tells us that if we take the derivative of the MGF and set \(s = 0\), we will obtain the moment. The order of the moment depends on the order of the derivative. As a result, the MGF can “generate moments” by taking derivatives. This happens because of the exponential function \(e^{sx}\). Since \(\frac{d}{ds}e^{sx} = xe^{sx}\), the variable \(x\) appears whenever we take the derivative.
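To see the "generating" property in action, the sketch below approximates the first two derivatives of the MGF of the three-state example at \(s = 0\) by central finite differences and compares them with \(\E[X] = \frac{5}{6}\) and \(\E[X^2] = \frac{7}{6}\) computed directly from the PMF. Finite differences are our stand-in for symbolic differentiation here; the step size `h` is an illustrative choice.

```python
import math

# Three-state example: states 0, 1, 2 with masses 2/6, 3/6, 1/6
values = [0, 1, 2]
probs = [2 / 6, 3 / 6, 1 / 6]


def mgf(s):
    """M_X(s) = 1/3 + (1/2) e^s + (1/6) e^{2s} for the three-state example."""
    return sum(p * math.exp(s * x) for x, p in zip(values, probs))


h = 1e-4
d1 = (mgf(h) - mgf(-h)) / (2 * h)                 # central difference for M'(0)
d2 = (mgf(h) - 2 * mgf(0.0) + mgf(-h)) / h ** 2   # second difference for M''(0)

# Moments computed directly from the PMF, for comparison
m1 = sum(p * x for x, p in zip(values, probs))       # E[X]   = 5/6
m2 = sum(p * x ** 2 for x, p in zip(values, probs))  # E[X^2] = 7/6
print(d1, m1)
print(d2, m2)
```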
Let \(X\) be a Bernoulli random variable with parameter \(p\). Find the first two moments using MGF.
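The MGF of a Bernoulli random variable is
\[ M_X(s) = (1-p)\,e^{s\cdot 0} + p\,e^{s\cdot 1} = 1 - p + pe^{s}. \]
The first and second moments, using the derivative approach, are
\[ \E[X] = \frac{d}{ds}M_X(s)\Big|_{s=0} = pe^{s}\Big|_{s=0} = p, \qquad \E[X^2] = \frac{d^2}{ds^2}M_X(s)\Big|_{s=0} = pe^{s}\Big|_{s=0} = p. \]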
To facilitate our discussions of MGF, we summarize a few MGFs in the table below.
| Distribution | PMF / PDF | \(\E[X]\) | \(\Var[X]\) | \(M_X(s)\) |
|---|---|---|---|---|
| Bernoulli | \(p_X(1) = p\) and \(p_X(0) = 1-p\) | \(p\) | \(p(1-p)\) | \(1-p+pe^s\) |
| Binomial | \(p_X(k) = {n \choose k}p^k(1-p)^{n-k}\), \(k = 0,1,\ldots,n\) | \(np\) | \(np(1-p)\) | \((1-p+pe^s)^n\) |
| Geometric | \(p_X(k) = p(1-p)^{k-1}\), \(k = 1,2,\ldots\) | \(\displaystyle \frac{1}{p}\) | \(\displaystyle \frac{1-p}{p^2}\) | \(\displaystyle \frac{pe^s}{1-(1-p)e^s}\), \((1-p)e^s < 1\) |
| Poisson | \(\displaystyle p_X(k) = \frac{\lambda^k e^{-\lambda}}{k!}\), \(k = 0,1,\ldots\) | \(\lambda\) | \(\lambda\) | \(\displaystyle e^{\lambda(e^s-1)}\) |
| Gaussian | \(\displaystyle f_X(x) = \frac{1}{\sqrt{2\pi\sigma^2}}\exp\left\{-\frac{(x-\mu)^2}{2\sigma^2}\right\}\) | \(\mu\) | \(\sigma^2\) | \(\displaystyle \exp\left\{\mu s + \frac{\sigma^2s^2}{2}\right\}\) |
| Exponential | \(f_X(x) = \lambda\exp\left\{-\lambda x\right\}\), \(x \ge 0\) | \(\displaystyle \frac{1}{\lambda}\) | \(\displaystyle \frac{1}{\lambda^2}\) | \(\displaystyle \frac{\lambda}{\lambda-s}\), \(s < \lambda\) |
| Uniform | \(\displaystyle f_X(x) = \frac{1}{b-a}\), \(a \le x \le b\) | \(\displaystyle \frac{a+b}{2}\) | \(\displaystyle \frac{(b-a)^2}{12}\) | \(\displaystyle \frac{e^{sb}-e^{sa}}{s(b-a)}\), \(s \neq 0\) |
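The finite-difference idea can be used to sanity-check the table: \(\E[X] = M_X'(0)\) and \(\Var[X] = M_X''(0) - (M_X'(0))^2\). The sketch below plugs in arbitrary parameter values (our choice, not from the text) and compares the numerical estimates with the mean and variance columns. For the uniform row the formula is undefined at \(s = 0\), so the code uses the limit \(M_X(0) = 1\) there.

```python
import math


def moments_from_mgf(M, h=1e-3):
    """Approximate E[X] = M'(0) and Var[X] = M''(0) - M'(0)^2 by central differences."""
    d1 = (M(h) - M(-h)) / (2 * h)
    d2 = (M(h) - 2 * M(0.0) + M(-h)) / h ** 2
    return d1, d2 - d1 ** 2


# Illustrative parameter values
p, n, lam, mu, sigma, a, b = 0.3, 10, 2.5, 1.0, 2.0, -1.0, 3.0

# (name, M_X(s) from the table, table mean, table variance)
rows = [
    ("Bernoulli", lambda s: 1 - p + p * math.exp(s), p, p * (1 - p)),
    ("Binomial", lambda s: (1 - p + p * math.exp(s)) ** n, n * p, n * p * (1 - p)),
    ("Geometric", lambda s: p * math.exp(s) / (1 - (1 - p) * math.exp(s)), 1 / p, (1 - p) / p ** 2),
    ("Poisson", lambda s: math.exp(lam * (math.exp(s) - 1)), lam, lam),
    ("Gaussian", lambda s: math.exp(mu * s + sigma ** 2 * s ** 2 / 2), mu, sigma ** 2),
    ("Exponential", lambda s: lam / (lam - s), 1 / lam, 1 / lam ** 2),
    ("Uniform", lambda s: (math.exp(s * b) - math.exp(s * a)) / (s * (b - a)) if s != 0 else 1.0,
     (a + b) / 2, (b - a) ** 2 / 12),
]

for name, M, mean, var in rows:
    m, v = moments_from_mgf(M)
    print(f"{name:12s} mean {m:.4f} vs {mean:.4f}   var {v:.4f} vs {var:.4f}")
```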
6.1.2 Sum of independent variables via MGF
MGFs are most useful when analyzing the PDF of a sum of two random variables. The following theorem highlights the result.
Let \(X\) and \(Y\) be independent random variables. Let \(Z = X+Y\). Then
\[ M_Z(s) = M_X(s)\,M_Y(s). \]
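Proof. By the definition of MGF, we have that
\[ M_Z(s) = \E\left[e^{s(X+Y)}\right] = \E\left[e^{sX}e^{sY}\right] \overset{(a)}{=} \E\left[e^{sX}\right]\E\left[e^{sY}\right] = M_X(s)\,M_Y(s), \]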
where (a) is valid because \(X\) and \(Y\) are independent.
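■
Consider independent random variables \(X_1,\ldots,X_N\). Let \(Z = \sum_{n=1}^N X_n\) be the sum of random variables. Then the MGF of \(Z\) is
\[ M_Z(s) = \prod_{n=1}^{N} M_{X_n}(s). \]
If these random variables are further assumed to be identically distributed, the MGF is
\[ M_Z(s) = \left(M_{X_1}(s)\right)^N. \]
Proof. This follows from independence, exactly as in the previous theorem:
\[ M_Z(s) = \E\left[e^{s(X_1+\cdots+X_N)}\right] = \E\left[e^{sX_1}\right]\cdots\E\left[e^{sX_N}\right] = \prod_{n=1}^{N} M_{X_n}(s). \]
If the random variables \(X_1,\ldots,X_N\) are i.i.d., then the product simplifies to
\[ \prod_{n=1}^{N} M_{X_n}(s) = \prod_{n=1}^{N} M_{X_1}(s) = \left(M_{X_1}(s)\right)^N. \]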
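The theorem says that convolution of PMFs becomes multiplication of MGFs. A minimal sketch for PMFs supported on \(\{0, 1, 2, \ldots\}\), assuming Python with only the standard library: `convolve` computes the PMF of the sum by brute force, and `mgf` evaluates \(\sum_k e^{sk}p(k)\). The second PMF `p_Y` is arbitrary and only for illustration.

```python
import math


def convolve(p, q):
    """PMF of X + Y for independent X ~ p, Y ~ q supported on 0, 1, 2, ..."""
    out = [0.0] * (len(p) + len(q) - 1)
    for i, px in enumerate(p):
        for j, qy in enumerate(q):
            out[i + j] += px * qy
    return out


def mgf(pmf, s):
    """M(s) = sum_k e^{sk} pmf[k] for a PMF on 0, 1, ..., len(pmf) - 1."""
    return sum(pk * math.exp(s * k) for k, pk in enumerate(pmf))


p_X = [2 / 6, 3 / 6, 1 / 6]     # three-state example
p_Y = [0.5, 0.0, 0.25, 0.25]    # arbitrary PMF, chosen only for illustration
p_Z = convolve(p_X, p_Y)        # "time domain": convolution

for s in [-1.0, 0.3, 1.2]:
    # "transform domain": product of MGFs
    print(s, mgf(p_Z, s), mgf(p_X, s) * mgf(p_Y, s))

# i.i.d. case: N-fold convolution versus the N-th power of one MGF
N = 5
p_sum = [1.0]                   # PMF of the empty sum, Z = 0
for _ in range(N):
    p_sum = convolve(p_sum, p_X)
print(mgf(p_sum, 0.3), mgf(p_X, 0.3) ** N)
```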
Let \(X_1\), …, \(X_N\) be a sequence of i.i.d. Bernoulli random variables with parameter \(p\). Let \(Z = X_1+\cdots+X_N\) be the sum. Then \(Z\) is a binomial random variable with parameters \((N,p)\).
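Before the MGF-based proof, a brute-force check in the original "convolution domain" may be reassuring: convolving the Bernoulli PMF with itself \(N\) times should reproduce the binomial PMF. The values of `N` and `p` below are illustrative, and `convolve` is a small helper of our own.

```python
import math


def convolve(p, q):
    """PMF of the sum of two independent variables supported on 0, 1, 2, ..."""
    out = [0.0] * (len(p) + len(q) - 1)
    for i, px in enumerate(p):
        for j, qy in enumerate(q):
            out[i + j] += px * qy
    return out


N, p = 8, 0.35
bernoulli = [1 - p, p]   # p_X(0) = 1 - p, p_X(1) = p

pmf_sum = [1.0]          # PMF of the empty sum (Z = 0 with probability 1)
for _ in range(N):
    pmf_sum = convolve(pmf_sum, bernoulli)

binomial = [math.comb(N, k) * p ** k * (1 - p) ** (N - k) for k in range(N + 1)]

for k in range(N + 1):
    print(k, round(pmf_sum[k], 6), round(binomial[k], 6))
```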
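Proof. Let us consider a sequence of i.i.d. Bernoulli random variables \(X_n \sim \mathrm{Bernoulli}(p)\) for \(n = 1,\ldots, N\). Let \(Z = X_1 + \cdots + X_N\). The moment-generating function of \(Z\) is
\[ M_Z(s) = \prod_{n=1}^{N} M_{X_n}(s) = \left(1-p+pe^{s}\right)^N. \]
Now, let us check the moment-generating function of a binomial random variable: If \(Z \sim \mathrm{Binomial}(N,p)\), then
\[ M_Z(s) = \sum_{k=0}^{N} e^{sk}{N \choose k}p^k(1-p)^{N-k} = \sum_{k=0}^{N} {N \choose k}\left(pe^{s}\right)^k(1-p)^{N-k} = \left(pe^{s} + 1 - p\right)^N, \]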
where the last equality holds because \(\sum_{k=0}^N {N \choose k} a^k b^{N-k} = (a+b)^N\). Therefore, the two moment-generating functions are identical.
■
Let \(X_1\), …, \(X_N\) be a sequence of i.i.d. binomial random variables with parameters \((n,p)\). Let \(Z = X_1+\cdots+X_N\) be the sum. Then \(Z\) is a binomial random variable with parameters \((Nn,p)\).
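Proof. The MGF of a binomial random variable with parameters \((n,p)\) is
\[ M_{X_1}(s) = \left(1-p+pe^{s}\right)^{n}. \]
If we have \(N\) of these random variables, then \(Z = X_1 + \cdots + X_N\) will have the MGF
\[ M_Z(s) = \left(M_{X_1}(s)\right)^N = \left[\left(1-p+pe^{s}\right)^{n}\right]^N = \left(1-p+pe^{s}\right)^{Nn}. \]
Note that this is just the MGF of another binomial random variable with parameters \((Nn,p)\).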
■
Let \(X_1\), …, \(X_N\) be a sequence of i.i.d. Poisson random variables with parameter \(\lambda\). Let \(Z = X_1+\cdots+X_N\) be the sum. Then \(Z\) is a Poisson random variable with parameter \(N\lambda\).
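A similar brute-force check works for the Poisson case, with one caveat: a Poisson PMF has infinite support, so the sketch truncates each PMF at a hypothetical cutoff `kmax`. Entries \(0, \ldots, k_{\max}\) of the convolution are still exact, because each term \(p(i)\,q(j)\) with \(i + j = k \le k_{\max}\) has \(i, j \le k_{\max}\). The values of `N` and `lam` are illustrative.

```python
import math


def convolve(p, q):
    """PMF of the sum of two independent variables supported on 0, 1, 2, ..."""
    out = [0.0] * (len(p) + len(q) - 1)
    for i, px in enumerate(p):
        for j, qy in enumerate(q):
            out[i + j] += px * qy
    return out


def poisson_pmf(lam, kmax):
    """Poisson PMF lam^k e^{-lam} / k! for k = 0, ..., kmax (truncated)."""
    return [math.exp(-lam) * lam ** k / math.factorial(k) for k in range(kmax + 1)]


N, lam, kmax = 4, 1.5, 60
pmf_sum = [1.0]
for _ in range(N):
    pmf_sum = convolve(pmf_sum, poisson_pmf(lam, kmax))

target = poisson_pmf(N * lam, kmax)
# Only entries 0..kmax are compared; higher entries are affected by the truncation
max_err = max(abs(pmf_sum[k] - target[k]) for k in range(kmax + 1))
print("largest difference on 0..kmax:", max_err)
```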
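Proof. The MGF of a Poisson random variable is
\[ M_{X_1}(s) = e^{\lambda(e^{s}-1)}. \]
Assume that we have a sum of \(N\) i.i.d. Poisson random variables. Then, by the main theorem, we have that
\[ M_Z(s) = \left(M_{X_1}(s)\right)^N = \left(e^{\lambda(e^{s}-1)}\right)^N = e^{N\lambda(e^{s}-1)}. \]
Therefore, the resulting random variable \(Z\) is a Poisson random variable with parameter \(N\lambda\).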
■
Let \(X_1\), …, \(X_N\) be a sequence of independent Gaussian random variables with parameters \((\mu_1,\sigma_1^2)\), …, \((\mu_N,\sigma_N^2)\). Let \(Z = X_1+\cdots+X_N\) be the sum. Then \(Z\) is a Gaussian random variable:
\[ Z \sim \mathcal{N}\left(\sum_{n=1}^{N}\mu_n,\ \sum_{n=1}^{N}\sigma_n^2\right). \]
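Proof. We skip the derivation of the MGF of a Gaussian. It can be shown that
\[ M_{X_n}(s) = \exp\left\{\mu_n s + \frac{\sigma_n^2 s^2}{2}\right\}. \]
When we have a sequence of independent Gaussian random variables, then
\[ M_Z(s) = \prod_{n=1}^{N}\exp\left\{\mu_n s + \frac{\sigma_n^2 s^2}{2}\right\} = \exp\left\{\left(\sum_{n=1}^{N}\mu_n\right)s + \frac{\left(\sum_{n=1}^{N}\sigma_n^2\right)s^2}{2}\right\}. \]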
Therefore, the resulting random variable \(Z\) is also a Gaussian. The mean and variance of \(Z\) are \(\sum_{n=1}^N \mu_n\) and \(\sum_{n=1}^N \sigma_n^2\), respectively.
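As a Monte Carlo sanity check (not a proof), the sketch below draws sums of three independent Gaussians with illustrative parameters and compares the sample mean, the sample variance, and the fraction of samples within one standard deviation with \(\sum_n \mu_n\), \(\sum_n \sigma_n^2\), and the Gaussian value \(\mathrm{erf}(1/\sqrt{2}) \approx 0.6827\). Note that Python's `random.gauss` takes the standard deviation, not the variance.

```python
import math
import random

random.seed(0)
# (mu_n, sigma_n^2) pairs chosen only for illustration
params = [(1.0, 0.5), (-2.0, 1.5), (0.5, 2.0)]
trials = 200_000

# random.gauss takes the standard deviation, hence math.sqrt(var)
samples = [sum(random.gauss(mu, math.sqrt(var)) for mu, var in params) for _ in range(trials)]

mean_hat = sum(samples) / trials
var_hat = sum((z - mean_hat) ** 2 for z in samples) / trials
mu_Z = sum(mu for mu, _ in params)     # -0.5
var_Z = sum(var for _, var in params)  # 4.0

# For a Gaussian, about 68.27% of the mass lies within one standard deviation of the mean
within = sum(abs(z - mu_Z) < math.sqrt(var_Z) for z in samples) / trials
print(mean_hat, mu_Z)
print(var_hat, var_Z)
print(within, math.erf(1 / math.sqrt(2)))
```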
■
6.1.3 Characteristic functions
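Moment-generating functions are the Laplace transforms of the PDFs. However, the integral defining a Laplace transform converges only for \(s\) in some region of the complex plane, and for some PDFs (the Cauchy PDF below is an example) this region contains no \(s\) off the imaginary axis, so not every PDF has a useful MGF. One way to mitigate this problem is to restrict \(s\) to the imaginary axis, \(s = j\omega\), where \(|e^{j\omega x}| = 1\) guarantees convergence. This gives us the characteristic function.
The characteristic function of a random variable \(X\) is
\[ \Phi_X(j\omega) = \E\left[e^{j\omega X}\right]. \]
However, since \(\omega\) can take any value in \((-\infty,\infty)\), it does not matter whether we consider \(\E[e^{-j\omega X}]\) or \(\E[e^{j\omega X}]\): one is the other evaluated at \(-\omega\), so both carry the same information. This leads to the following equivalent definition of the characteristic function:
The characteristic function of a random variable \(X\) is
\[ \Phi_X(j\omega) = \E\left[e^{-j\omega X}\right]. \]
If we follow this definition, we see that the characteristic function can be written as
\[ \Phi_X(j\omega) = \int_{-\infty}^{\infty} f_X(x)\, e^{-j\omega x}\, dx. \]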
This is exactly the Fourier transform of the PDF. The reason for introducing this alternative characteristic function is that \(\E[e^{-j\omega X}]\) is the Fourier transform of \(f_X(x)\) but \(\E[e^{j\omega X}]\) is the inverse Fourier transform of \(f_X(x)\). The former is more convenient (in terms of notation) for students who have taken a course in signals and systems. However, we should stress that the usual way of defining the characteristic function is \(\E[e^{j\omega X}]\).
A list of common Fourier transforms is shown in the table below. Additional identities can be found in standard signals and systems textbooks.
| No. | Fourier transform pair \(f(t) \longleftrightarrow F(\omega)\) | No. | Fourier transform pair \(f(t) \longleftrightarrow F(\omega)\) |
|---|---|---|---|
| 1. | \(e^{-at}u(t) \longleftrightarrow \frac{1}{a+j\omega}\), \(a > 0\) | 10. | \(\mathrm{sinc}^2(\frac{Wt}{2}) \longleftrightarrow \frac{2\pi}{W}\Delta(\frac{\omega}{2W})\) |
| 2. | \(e^{at}u(-t) \longleftrightarrow \frac{1}{a-j\omega}\), \(a > 0\) | 11. | \(e^{-at}\sin(\omega_0 t)u(t) \longleftrightarrow \frac{\omega_0}{(a+j\omega)^2+\omega_0^2}\) |
| 3. | \(e^{-a\lvert t\rvert} \longleftrightarrow \frac{2a}{a^2+\omega^2}\), \(a > 0\) | 12. | \(e^{-at}\cos(\omega_0 t)u(t) \longleftrightarrow \frac{a+j\omega}{(a+j\omega)^2+\omega_0^2}\) |
| 4. | \(\frac{a^2}{a^2+t^2} \longleftrightarrow \pi a e^{-a\lvert\omega\rvert}\), \(a>0\) | 13. | \(e^{-\frac{t^2}{2\sigma^2}} \longleftrightarrow \sqrt{2\pi}\sigma e^{-\frac{\sigma^2\omega^2}{2}}\) |
| 5. | \(te^{-at}u(t) \longleftrightarrow \frac{1}{(a+j\omega)^2}\), \(a>0\) | 14. | \(\delta(t) \longleftrightarrow 1\) |
| 6. | \(t^n e^{-at}u(t) \longleftrightarrow \frac{n!}{(a+j\omega)^{n+1}}\), \(a>0\) | 15. | \(1 \longleftrightarrow 2\pi \delta(\omega)\) |
| 7. | \(\mathrm{rect}(\frac{t}{\tau}) \longleftrightarrow \tau \mathrm{sinc}(\frac{\omega\tau}{2})\) | 16. | \(\delta(t-t_0) \longleftrightarrow e^{-j\omega t_0}\) |
| 8. | \(\mathrm{sinc}(Wt) \longleftrightarrow \frac{\pi}{W}\mathrm{rect}(\frac{\omega}{2W})\) | 17. | \(e^{j\omega_0t} \longleftrightarrow 2\pi \delta(\omega-\omega_0)\) |
| 9. | \(\Delta(\frac{t}{\tau}) \longleftrightarrow \frac{\tau}{2}\mathrm{sinc}^2(\frac{\omega\tau}{4})\) | 18. | \(f(t)e^{j\omega_0t} \longleftrightarrow F(\omega-\omega_0)\) |
In this table, \(\mathrm{sinc}(x) = \frac{\sin x}{x}\), \(\mathrm{rect}(x) = 1\) for \(|x| < \frac{1}{2}\) and 0 otherwise, and \(\Delta(x) = 1 - 2|x|\) for \(|x| < \frac{1}{2}\) and 0 otherwise.
Let \(X\) be a random variable with PDF \(f_X(x) = \lambda e^{-\lambda x}\) for \(x \ge 0\). Find the characteristic function.
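The Fourier transform pair (entry 1 of the table with \(a = \lambda\), multiplied by \(\lambda\)) is
\[ \lambda e^{-\lambda x}u(x) \longleftrightarrow \frac{\lambda}{\lambda + j\omega}. \]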
Therefore, the characteristic function is \(\Phi_X(j\omega) = \frac{\lambda}{\lambda+j\omega}\).
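The transform pair can be verified numerically by approximating \(\int_0^\infty \lambda e^{-\lambda x}e^{-j\omega x}\,dx\) with a midpoint rule, using the \(e^{-j\omega x}\) convention of this section. A sketch, with the cutoff `upper`, the step count `n`, and the value of `lam` as illustrative choices:

```python
import cmath
import math


def cf_numeric(lam, omega, upper=60.0, n=100_000):
    """Midpoint rule for Phi_X(j omega) = int_0^upper lam e^{-lam x} e^{-j omega x} dx.

    Uses the e^{-j omega x} (Fourier-transform) convention of this section.
    """
    h = upper / n
    total = 0j
    for i in range(n):
        x = (i + 0.5) * h
        total += lam * math.exp(-lam * x) * cmath.exp(-1j * omega * x)
    return total * h


lam = 2.0
for omega in [0.0, 1.0, 5.0]:
    print(omega, cf_numeric(lam, omega), lam / (lam + 1j * omega))
```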
Let \(X\) and \(Y\) be independent, and let
Find the PDF of \(Z = X+Y\).
The characteristic functions of \(X\) and \(Y\) can be found from the Fourier table:
Therefore, the characteristic function of \(Z\) is
By inverse Fourier transform, we have that
Why \(\Phi_X(j\omega)\) but not \(M_X(s)\)? As we said, the MGF is not always defined. Recall that an expectation \(\E[g(X)]\) exists only when \(g(x)f_X(x)\) is absolutely integrable, i.e., \(\E[|g(X)|] < \infty\); for example, \(\E[X]\) exists only when \(\E[|X|] < \infty\). For a characteristic function, the expectation is always valid because \(|e^{j\omega X}| = 1\), so \(\E[|e^{j\omega X}|] = \E[1] = 1\). However, for a moment-generating function, \(\E[|e^{sX}|]\) could be unbounded. To see a counterexample, we consider the Cauchy distribution.
Consider the Cauchy distribution with PDF
\[ f_X(x) = \frac{1}{\pi(x^2+1)}. \]
The MGF of \(X\) is undefined but the characteristic function is well defined.
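Proof. The MGF is
\[ M_X(s) = \E\left[e^{sX}\right] = \int_{-\infty}^{\infty} \frac{e^{sx}}{\pi(x^2+1)}\, dx. \]
For \(s > 0\) and \(x \ge 1\), we have \(e^{sx} \ge sx\) and \(x^2 + 1 \le 2x^2\), so
\[ M_X(s) \ge \int_{1}^{\infty} \frac{sx}{2\pi x^2}\, dx = \frac{s}{2\pi}\int_{1}^{\infty} \frac{dx}{x} = \infty. \]
The case \(s < 0\) is symmetric (integrate over \(x \le -1\) instead). Therefore, the MGF is undefined for every \(s \neq 0\). On the other hand, by entry 4 of the Fourier table with \(a = 1\), we know that
\[ \Phi_X(j\omega) = \int_{-\infty}^{\infty} \frac{1}{\pi(x^2+1)}\, e^{-j\omega x}\, dx = e^{-|\omega|}. \]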
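Both halves of this claim can be observed numerically. In the sketch below (Python standard library; helper names are ours), the truncated integral \(\int_{-R}^{R} e^{sx} f_X(x)\,dx\) with \(s = 0.5\) keeps growing as \(R\) increases, whereas the empirical characteristic function from Cauchy samples, generated by the inverse-CDF transform \(X = \tan(\pi(U - \frac{1}{2}))\) with \(U\) uniform on \([0, 1)\), stays close to \(e^{-|\omega|}\). Only the real part \(\E[\cos(\omega X)]\) is estimated, since the imaginary part vanishes for a symmetric PDF.

```python
import math
import random


def truncated_mgf(s, R, n=100_000):
    """Midpoint rule for int_{-R}^{R} e^{sx} / (pi (1 + x^2)) dx."""
    h = 2 * R / n
    total = 0.0
    for i in range(n):
        x = -R + (i + 0.5) * h
        total += math.exp(s * x) / (math.pi * (1 + x * x))
    return total * h


# The truncated MGF integral keeps growing with R (it diverges as R -> infinity)
for R in [10, 20, 40]:
    print(f"R = {R:2d}: {truncated_mgf(0.5, R):.4g}")

# Empirical characteristic function from standard Cauchy samples
random.seed(1)
xs = [math.tan(math.pi * (random.random() - 0.5)) for _ in range(200_000)]
emp_cf = {}
for omega in [0.5, 1.0, 2.0]:
    # real part E[cos(omega X)]; the imaginary part averages to 0 by symmetry
    emp_cf[omega] = sum(math.cos(omega * x) for x in xs) / len(xs)
    print(f"omega = {omega}: empirical {emp_cf[omega]:.4f}, "
          f"e^(-|omega|) = {math.exp(-abs(omega)):.4f}")
```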
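Let \(X_0,X_1,\ldots\) be a sequence of independent random variables with PDF
\[ f_{X_k}(x) = \frac{a_k}{\pi(a_k^2 + x^2)}, \qquad a_k = \frac{1}{2^{k+1}}, \quad k = 0, 1, 2, \ldots \]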
Find the PDF of \(Y\), where \(Y = \sum_{k=0}^{\infty}X_k.\)
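From the Fourier transform table (entry 4 divided by \(\pi a_k\)), we know that
\[ \frac{a_k}{\pi(a_k^2 + x^2)} \longleftrightarrow e^{-a_k|\omega|}. \]
The characteristic function of \(Y\) is
\[ \Phi_Y(j\omega) = \prod_{k=0}^{\infty} \Phi_{X_k}(j\omega) = \prod_{k=0}^{\infty} e^{-a_k|\omega|} = \exp\left\{-\left(\sum_{k=0}^{\infty} a_k\right)|\omega|\right\}. \]
Since \(\sum_{k=0}^{\infty} a_k = \sum_{k=0}^{\infty} \frac{1}{2^{k+1}} = \frac{1}{2}+\frac{1}{4} + \cdots = 1\), the characteristic function becomes \(\Phi_Y(j\omega) = e^{-|\omega|}\). The inverse Fourier transform gives us
\[ e^{-|\omega|} \longleftrightarrow \frac{1}{\pi(1+y^2)}. \]
Therefore the PDF of \(Y\) is
\[ f_Y(y) = \frac{1}{\pi(1+y^2)}, \]
which is again a Cauchy distribution.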
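A Monte Carlo sketch of this result, under two simplifications of our own: the infinite sum is truncated after `K` terms (the neglected scales add up to \(2^{-K}\)), and each \(X_k\) is sampled as \(a_k\) times a standard Cauchy variable, which has the PDF \(\frac{a_k}{\pi(a_k^2 + x^2)}\). For a standard Cauchy \(Y\), \(P(|Y| \le 1) = \frac{1}{2}\) and \(\E[\cos Y] = e^{-1}\).

```python
import math
import random

random.seed(2)
K = 20            # keep X_0, ..., X_{K-1}; the dropped scales sum to 2^{-K}
trials = 50_000


def sample_Y():
    """One draw of X_0 + ... + X_{K-1}, with X_k = a_k * (standard Cauchy), a_k = 1/2^{k+1}."""
    return sum(math.tan(math.pi * (random.random() - 0.5)) / 2 ** (k + 1) for k in range(K))


ys = [sample_Y() for _ in range(trials)]

# Standard Cauchy: P(|Y| <= 1) = 2 arctan(1) / pi = 1/2 and Phi_Y(j) = e^{-1}
frac_within_1 = sum(abs(y) <= 1 for y in ys) / trials
cf_at_1 = sum(math.cos(y) for y in ys) / trials
print(frac_within_1, 0.5)
print(cf_at_1, math.exp(-1))
```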
Two random variables \(X\) and \(Y\) have the PDFs
Find the PDF of \(Z = \max(X,Y) - \min(X,Y)\).
We first show that
Suppose \(X > Y\). Then \(\max(X,Y) = X\) and \(\min(X,Y) = Y\), so \(Z = X - Y\). If \(X < Y\), then \(\max(X,Y) = Y\) and \(\min(X,Y) = X\), so \(Z = Y - X\). Combining the two cases gives us \(Z = |X-Y|\). Now, consider the Fourier transform of the PDFs:
Let \(U = X-Y\), and let \(Z = |U|\). The characteristic function is
With the PDF of \(U\), we can find the CDF of \(Z\):
Hence, the PDF is
Closing remark. Moment-generating functions and characteristic functions are useful tools. In this section, we have confined our discussion to using them to find the distribution of a sum of independent random variables. Later sections and chapters will explain further uses for these functions. For example, we use the MGFs when proving Chernoff's bound and proving the Central Limit Theorem.