# Table of Contents

## Chapter 1 Mathematical Background

• 1.1 Infinite series

• 1.1.1 Geometric series

• 1.1.2 Binomial Series

• 1.2 Approximation

• 1.2.1 Taylor approximation

• 1.2.2 Exponential series

• 1.2.3 Logarithmic approximation

• 1.3 Integration

• 1.3.1 Odd and even functions

• 1.3.2 Fundamental theorem of calculus

• 1.4 Linear Algebra

• 1.4.1 Why do we need linear algebra in data science?

• 1.4.2 Everything you need to know about linear algebra

• 1.4.3 Inner products and norms

• 1.4.4 Matrix calculus

• 1.5 Basic Combinatorics

• 1.5.2 Permutation

• 1.5.3 Combination

## Chapter 2 Probability

• 2.1 Set Theory

• 2.1.1 Why study set theory?

• 2.1.2 Basic concepts of a set

• 2.1.3 Subsets

• 2.1.4 Empty set and universal set

• 2.1.5 Union

• 2.1.6 Intersection

• 2.1.7 Complement and difference

• 2.1.8 Disjoint and partition

• 2.1.9 Set operations

• 2.1.10 Closing Remark

• 2.2 Probability Space

• 2.2.1 Sample space

• 2.2.2 Event space

• 2.2.3 Probability law

• 2.2.4 Measure zero sets

• 2.2.5 Summary of the probability space

• 2.3 Axioms of Probability

• 2.3.1 Why these three probability axioms?

• 2.3.2 Axioms through the lens of measure

• 2.3.3 Corollaries derived from axioms

• 2.4 Conditional Probability

• 2.4.1 Definition of conditional probability

• 2.4.2 Independence

• 2.4.3 Bayes’ theorem and law of total probability

• 2.4.4 Prisoner's dilemma

## Chapter 3 Discrete Random Variables

• 3.1 Random Variables

• 3.1.1 A motivating example

• 3.1.2 Definition of a random variable

• 3.1.3 Probability measure on random variables

• 3.2 Probability Mass Function

• 3.2.1 Definition

• 3.2.2 PMF and probability measure

• 3.2.3 Normalization Property

• 3.2.4 PMF vs Histogram

• 3.2.5 Estimating histograms from real data

• 3.3 Cumulative Distribution Function

• 3.3.1 Definition

• 3.3.2 Properties of CDF

• 3.3.3 Converting between PMF and CDF

• 3.4 Expectation

• 3.4.1 Definition

• 3.4.2 Existence of Expectation

• 3.4.3 Properties of Expectation

• 3.4.4 Moments and Variance

• 3.5 Common Discrete Random Variables

• 3.5.1 Bernoulli Random Variable

• 3.5.2 Binomial random variable

• 3.5.3 Geometric random variable

• 3.5.4 Poisson random variable

## Chapter 4 Continuous Random Variables

• 4.1 Probability Density Function

• 4.1.1 Some intuition about probability density functions

• 4.1.2 More in-depth discussion about PDFs

• 4.1.3 Connecting with PMF

• 4.2 Expectation, Moment, and Variance

• 4.2.1 Definition and properties

• 4.2.2 Existence of Expectation

• 4.2.3 Moment and Variance

• 4.3 Cumulative Distribution Function

• 4.3.1 CDF for continuous random variables

• 4.3.2 Properties of CDF

• 4.3.3 Retrieving PDF from CDF

• 4.3.4 CDF: Unifying discrete and continuous random variables

• 4.4 Median, Mode, and Mean

• 4.4.1 Median

• 4.4.2 Mode

• 4.4.3 Mean

• 4.5 Uniform and Exponential Random Variables

• 4.5.1 Uniform Random Variable

• 4.5.2 Exponential Random Variable

• 4.5.3 Origin of exponential random variable

• 4.5.4 Applications of exponential random variables

• 4.6 Gaussian Random Variables

• 4.6.1 Definition of a Gaussian random variable

• 4.6.2 Standard Gaussian

• 4.6.3 Skewness and Kurtosis

• 4.6.4 Origin of Gaussian random variables

• 4.7 Functions of Random Variables

• 4.7.1 General principle

• 4.7.2 Worked examples

• 4.8 Generating Random Numbers

• 4.8.1 Principle

• 4.8.2 Examples

## Chapter 5 Joint Distributions

• 5.1 Joint PMF and Joint PDF

• 5.1.1 Probability measure in 2D

• 5.1.2 Discrete random variables

• 5.1.3 Continuous random variables

• 5.1.4 Normalization

• 5.1.5 Marginal PMF and marginal PDF

• 5.1.6 Independent random variables

• 5.1.7 Joint CDF

• 5.2 Joint Expectation

• 5.2.1 Definition and interpretation

• 5.2.2 Covariance and correlation coefficient

• 5.2.3 Independence and correlation

• 5.2.4 Computing correlation from data

• 5.3 Conditional PMF and PDF

• 5.3.1 Conditional PMF

• 5.3.2 Conditional PDF

• 5.4 Conditional Expectation

• 5.4.1 Definition

• 5.4.2 Law of total expectation

• 5.5 Sum of Two Random Variables

• 5.5.1 Intuition through convolution

• 5.5.2 Main result

• 5.5.3 Sum of common distributions

• 5.6 Random Vector and Covariance Matrices

• 5.6.1 PDF of random vectors

• 5.6.2 Expectation of random vectors

• 5.6.3 Covariance matrix

• 5.6.4 Multi-dimensional Gaussian

• 5.7 Transformation of Multi-dimensional Gaussian

• 5.7.1 Linear transformation of mean and covariance

• 5.7.2 Eigenvalues and eigenvectors

• 5.7.3 Covariance matrices are always positive semi-definite

• 5.7.4 Gaussian whitening

• 5.8 Principal Component Analysis

• 5.8.1 The main idea: Eigen-decomposition

• 5.8.2 The Eigenface problem

• 5.8.3 What cannot be analyzed by PCA?

## Chapter 6 Sample Statistics

• 6.1 Moment Generating and Characteristic Functions

• 6.1.1 Moment Generating Function

• 6.1.2 Sum of independent variables via MGF

• 6.1.3 Characteristic Functions

• 6.2 Probability Inequalities

• 6.2.1 Union bound

• 6.2.2 Cauchy-Schwarz's inequality

• 6.2.3 Jensen's inequality

• 6.2.4 Markov's inequality

• 6.2.5 Chebyshev's inequality

• 6.2.6 Chernoff's bound

• 6.2.7 Comparing Chernoff and Chebyshev

• 6.2.8 Hoeffding's inequality

• 6.3 Law of Large Numbers

• 6.3.1 Sample average

• 6.3.2 Weak law of large numbers (WLLN)

• 6.3.3 Convergence in probability

• 6.3.4 Can we prove WLLN using Chernoff's bound?

• 6.3.5 Does the weak law of large numbers always hold?

• 6.3.6 Strong law of large numbers

• 6.3.7 Almost sure convergence

• 6.3.8 Proof of strong law of large numbers

• 6.4 Central Limit Theorem

• 6.4.1 Convergence in distribution

• 6.4.2 Central Limit Theorem

• 6.4.3 Examples

• 6.4.4 Limitation of the Central Limit Theorem

## Chapter 7 Regression

• 7.1 Principles of Regression

• 7.1.1 Intuition: how to fit a straight line?

• 7.1.2 Solving the linear regression problem

• 7.1.3 Extension: Beyond a straight line

• 7.1.4 Over-determined and under-determined systems

• 7.1.5 Robust linear regression

• 7.2 Overfitting

• 7.2.1 Overview of overfitting

• 7.2.2 Analysis of the linear case

• 7.2.3 Interpreting the linear analysis results

• 7.3 Bias and Variance Trade-Off

• 7.3.1 Decomposing the testing error

• 7.3.2 Analysis of the bias

• 7.3.3 Variance

• 7.3.4 Bias and variance on the learning curve

• 7.4 Regularization

• 7.4.1 Ridge regularization

• 7.4.2 LASSO regularization

## Chapter 8 Estimation

• 8.1 Maximum-Likelihood Estimation

• 8.1.1 Likelihood function

• 8.1.2 Maximum-likelihood estimate

• 8.1.3 Application 1: Social network analysis

• 8.1.4 Application 2: Reconstructing images

• 8.1.5 More examples on ML estimation

• 8.1.6 Regression vs ML estimation

• 8.2 Properties of ML Estimates

• 8.2.1 Estimators

• 8.2.2 Unbiased estimators

• 8.2.3 Consistent estimators

• 8.2.4 Invariance principle

• 8.3 Maximum-A-Posteriori Estimation

• 8.3.1 The trio of likelihood, prior, and posterior

• 8.3.2 Understanding the priors

• 8.3.3 MAP formulation and solution

• 8.3.4 Analyzing the MAP solution

• 8.3.5 Analysis of the posterior distribution

• 8.3.6 Conjugate Prior

• 8.3.7 Linking MAP with regression

• 8.4 Mean-Square Error Estimation

• 8.4.1 Positioning the mean square error estimation

• 8.4.2 Mean square error

• 8.4.3 MMSE solution = conditional expectation

• 8.4.4 MMSE estimator for multi-dimensional Gaussian

• 8.4.5 Linking MMSE and neural networks

## Chapter 9 Confidence and Hypothesis

• 9.1 Confidence Interval

• 9.1.1 The randomness of an estimator

• 9.1.2 Understanding confidence intervals

• 9.1.3 Constructing a confidence interval

• 9.1.4 Properties about the confidence interval

• 9.1.5 Student's t-distribution

• 9.1.6 Comparing Student's t-distribution and Gaussian

• 9.2 Bootstrap

• 9.2.1 A brute force approach

• 9.2.2 Bootstrap

• 9.3 Hypothesis Testing

• 9.3.1 What is a hypothesis?

• 9.3.2 Critical-value test

• 9.3.3 p-value test

• 9.3.4 Z-test and T-test

• 9.4 Neyman-Pearson Test

• 9.4.1 Null and alternative distributions

• 9.4.2 Type 1 and type 2 error

• 9.4.3 Neyman-Pearson decision

• 9.5 ROC and Precision-Recall Curve

• 9.5.1 Receiver Operating Characteristic (ROC)

• 9.5.2 Comparing ROC curves

• 9.5.3 ROC curve in practice

• 9.5.4 Precision-Recall (PR) curve

## Chapter 10 Random Processes

• 10.1 Basic Concepts

• 10.1.1 Everything you need to know about a random process

• 10.1.2 Statistical and temporal perspectives

• 10.2 Mean and Correlation Functions

• 10.2.1 Mean function

• 10.2.2 Autocorrelation function

• 10.2.3 Independent Processes

• 10.3 Wide Sense Stationary Processes

• 10.3.1 Definition of a WSS process

• 10.3.2 Properties of

• 10.3.3 Physical Interpretation of

• 10.4 Power Spectral Density

• 10.4.1 Basic concepts

• 10.4.2 Origin of the power spectral density

• 10.5 WSS Process through LTI Systems

• 10.5.1 Review of a linear time-invariant (LTI) system

• 10.5.2 Mean and autocorrelation through LTI Systems

• 10.5.3 Power spectral density through LTI systems

• 10.5.4 Cross-correlation through LTI Systems

• 10.6 Optimal Linear Filter

• 10.6.1 Discrete-time random processes

• 10.6.2 Problem formulation

• 10.6.3 Yule-Walker equation

• 10.6.4 Linear prediction

• 10.6.5 Wiener Filter