Table of Contents
1.1 Infinite Series
1.1.1 Geometric series
1.1.2 Binomial series
1.2 Approximation
1.3 Integration
1.4 Linear Algebra
1.4.1 Why do we need linear algebra in data science?
1.4.2 Everything you need to know about linear algebra
1.4.3 Inner products and norms
1.4.4 Matrix calculus
1.5 Basic Combinatorics
1.5.1 Birthday paradox
1.5.2 Permutation
1.5.3 Combination
3.1 Random Variables
3.1.1 A motivating example
3.1.2 Definition of a random variable
3.1.3 Probability measure on random variables
3.2 Probability Mass Function
3.2.1 Definition
3.2.2 PMF and probability measure
3.2.3 Normalization property
3.2.4 PMF vs histogram
3.2.5 Estimating histograms from real data
3.3 Cumulative Distribution Function
3.4 Expectation
3.4.1 Definition
3.4.2 Existence of expectation
3.4.3 Properties of expectation
3.4.4 Moments and variance
3.5 Common Discrete Random Variables
3.5.1 Bernoulli random variable
3.5.2 Binomial random variable
3.5.3 Geometric random variable
3.5.4 Poisson random variable
4.1 Probability Density Function
4.1.1 Some intuition about probability density functions
4.1.2 More in-depth discussion about PDFs
4.1.3 Connecting with PMF
4.2 Expectation, Moment, and Variance
4.2.1 Definition and properties
4.2.2 Existence of expectation
4.2.3 Moment and variance
4.3 Cumulative Distribution Function
4.3.1 CDF for continuous random variables
4.3.2 Properties of CDF
4.3.3 Retrieving PDF from CDF
4.3.4 CDF: Unifying discrete and continuous random variables
4.4 Median, Mode, and Mean
4.4.1 Median
4.4.2 Mode
4.4.3 Mean
4.5 Uniform and Exponential Random Variables
4.5.1 Uniform random variable
4.5.2 Exponential random variable
4.5.3 Origin of exponential random variable
4.5.4 Applications of exponential random variables
4.6 Gaussian Random Variables
4.6.1 Definition of a Gaussian random variable
4.6.2 Standard Gaussian
4.6.3 Skewness and kurtosis
4.6.4 Origin of Gaussian random variables
4.7 Functions of Random Variables
4.7.1 General principle
4.7.2 Worked examples
4.8 Generating Random Numbers
4.8.1 Principle
4.8.2 Examples
5.1 Joint PMF and Joint PDF
5.1.1 Probability measure in 2D
5.1.2 Discrete random variables
5.1.3 Continuous random variables
5.1.4 Normalization
5.1.5 Marginal PMF and marginal PDF
5.1.6 Independent random variables
5.1.7 Joint CDF
5.2 Joint Expectation
5.2.1 Definition and interpretation
5.2.2 Covariance and correlation coefficient
5.2.3 Independence and correlation
5.2.4 Computing correlation from data
5.3 Conditional PMF and PDF
5.3.1 Conditional PMF
5.3.2 Conditional PDF
5.4 Conditional Expectation
5.5 Sum of Two Random Variables
5.6 Random Vector and Covariance Matrices
5.6.1 PDF of random vectors
5.6.2 Expectation of random vectors
5.6.3 Covariance matrix
5.6.4 Multi-dimensional Gaussian
5.7 Transformation of Multi-dimensional Gaussian
5.7.1 Linear transformation of mean and covariance
5.7.2 Eigenvalues and eigenvectors
5.7.3 Covariance matrices are always positive semi-definite
5.7.4 Gaussian whitening
5.8 Principal Component Analysis
5.8.1 The main idea: Eigen-decomposition
5.8.2 The Eigenface problem
5.8.3 What cannot be analyzed by PCA?
6.1 Moment Generating and Characteristic Functions
6.1.1 Moment generating function
6.1.2 Sum of independent variables via MGF
6.1.3 Characteristic functions
6.2 Probability Inequalities
6.2.1 Union bound
6.2.2 Cauchy-Schwarz inequality
6.2.3 Jensen's inequality
6.2.4 Markov's inequality
6.2.5 Chebyshev's inequality
6.2.6 Chernoff's bound
6.2.7 Comparing Chernoff and Chebyshev
6.2.8 Hoeffding's inequality
6.3 Law of Large Numbers
6.3.1 Sample average
6.3.2 Weak law of large numbers (WLLN)
6.3.3 Convergence in probability
6.3.4 Can we prove WLLN using Chernoff's bound?
6.3.5 Does the weak law of large numbers always hold?
6.3.6 Strong law of large numbers
6.3.7 Almost sure convergence
6.3.8 Proof of strong law of large numbers
6.4 Central Limit Theorem
6.4.1 Convergence in distribution
6.4.2 Central Limit Theorem
6.4.3 Examples
6.4.4 Limitation of the Central Limit Theorem
8.1 Maximum-Likelihood Estimation
8.1.1 Likelihood function
8.1.2 Maximum-likelihood estimate
8.1.3 Application 1: Social network analysis
8.1.4 Application 2: Reconstructing images
8.1.5 More examples on ML estimation
8.1.6 Regression vs ML estimation
8.2 Properties of ML Estimates
8.2.1 Estimators
8.2.2 Unbiased estimators
8.2.3 Consistent estimators
8.2.4 Invariance principle
8.3 Maximum-A-Posteriori Estimation
8.3.1 The trio of likelihood, prior, and posterior
8.3.2 Understanding the priors
8.3.3 MAP formulation and solution
8.3.4 Analyzing the MAP solution
8.3.5 Analysis of the posterior distribution
8.3.6 Conjugate prior
8.3.7 Linking MAP with regression
8.4 Mean-Square Error Estimation
8.4.1 Positioning the mean-square error estimation
8.4.2 Mean-square error
8.4.3 MMSE solution = conditional expectation
8.4.4 MMSE estimator for multi-dimensional Gaussian
8.4.5 Linking MMSE and neural networks
10.1 Basic Concepts
10.2 Mean and Correlation Functions
10.3 Wide-Sense Stationary Processes
10.4 Power Spectral Density
10.5 WSS Processes through LTI Systems
10.5.1 Review of a linear time-invariant (LTI) system
10.5.2 Mean and autocorrelation through LTI systems
10.5.3 Power spectral density through LTI systems
10.5.4 Cross-correlation through LTI systems
10.6 Optimal Linear Filter
10.6.1 Discrete-time random processes
10.6.2 Problem formulation
10.6.3 Yule-Walker equation
10.6.4 Linear prediction
10.6.5 Wiener filter