Why Another Probability Textbook?

Data Science = Probability + Programming

While writing this book, I did a fairly exhaustive search of the available textbooks on this subject. Judging from the tsunami of data science books, there are essentially two categories:

The first type of book is written for programmers. These books put a strong emphasis on how to process data by calling standard and/or customized libraries for various statistical tasks. Theories are usually explained at a high level, with the goal of giving readers a reference point rather than diving deep into the material. I am not overlooking the importance of these books, but many people, especially college students, need more solid mathematical training so that they can solve more difficult problems.

The other type of book is the classical probability textbook written for mathematicians. Its audience includes all college students who study science and engineering. Every institution and every instructor has their own preferred book for teaching probability. However, while these textbooks are mathematically rigorous, the majority of students are not interested in reading them page by page. (At least I was not enthusiastic about these books when I was an undergraduate student.) Why? They are boring. It is easy to get lost in the theorems. The theorems are not directly connected to practical engineering problems. To be more precise, after I became a faculty member at Purdue, I heard these questions from time to time:

  • Why should we learn probability?

  • How can flipping a coin be useful in modern data science?

  • Why is correlation the expectation of a product of random variables?

  • Why does the Gaussian have a bell shape?

  • How do we fit data with a line?

  • How do we tell whether a change is statistically significant?

When you look at the two ends of this spectrum, I hope you can see the gap. We need a book that balances theory and practice. We need a book that provides insights and not just theorems and proofs. We need a book that motivates students by telling them why probability is so essential to their work. We need a book that highlights the impact of the subject. From more than half a decade of teaching the course, I have distilled what I believe to be the core of probabilistic methods. I put the book in the context of data science to emphasize the inseparability of data (computing) and probability (theory) in our time.

Unique features of this book

  • Broad coverage from classical probability theory to modern data analytic techniques

  • Geometric and pictorial explanations of concepts

  • Tight integration with MATLAB / Python

  • Practical applications in machine learning

Brief history of this book

I joined Purdue University in 2014 as an assistant professor of ECE and Statistics. On the first day I walked into the classroom, I started to teach the undergraduate probability class. Over the years, the course has constantly improved and the collection of materials has expanded. With the additional opportunities to create and teach several other courses, I decided to put the materials together into a textbook. The book is therefore a reflection of several courses I have been teaching:

  • ECE 20875 (formerly ECE 295) Introduction to Data Science with Python (Sophomore)

  • ECE 302 Probabilistic Methods for Electrical and Computer Engineering (Junior-Senior)

  • ECE 595ML Machine Learning (Graduate)

  • ECE 645 Estimation Theory (Graduate)

Why free textbooks?

This book is FREE for everyone around the world. You may download a free PDF copy from the download page. If you want a hard copy, you may purchase one from the publisher at a discounted price (to cover the printing cost).

Some people ask how much money I can make from this book. The answer is ZERO. Not a single penny goes into my pocket. Why do I do that? Textbooks today are just ridiculously expensive. Below is a news article published in 2015. I want to slow down the trend. Education should be accessible to as many people as possible, especially to underprivileged families. To accomplish this goal, I have minimized all expensive editorial and marketing services. So I need your help: please promote the book, and tell me if you see an editorial error.