This book is free for everyone around the world.
You may download a free PDF copy from the download page. If you want a hard copy, you can purchase one from the publisher at a discounted price to cover the printing cost.
Some people ask how much money I make from this book. The answer is zero — not a single penny goes to my pocket. Why? Textbooks today are just ridiculously expensive, and I want to slow down the trend. Education should be accessible to as many people as possible, especially to those from underprivileged families. To accomplish this, I have minimized all expensive editorial and marketing services — so I need your help to promote the book, and to tell me if you spot an editorial error.
As I wrote this book, I did a fairly exhaustive search of the available textbooks on the subject. Among the tsunami of data science books, there are essentially two categories.
Books of the first type are written for programmers. They emphasize processing data by calling standard or customized libraries for various statistical tasks. Theories are explained at a high level, as a reference point rather than a deep dive. These books matter, but many people, especially college students, need more solid mathematical training to solve harder problems.
Books of the other type are classical probability textbooks written for mathematicians. Although these books are mathematically rigorous, most students are not interested in reading them page by page. Why? They can be boring: it is easy to get lost in the theorems, and the theorems are not directly connected to practical engineering problems. As a faculty member at Purdue, I have heard these comments time and again.
Between the two ends of this spectrum lies a gap. We need a book that balances theory and practice — one that provides insights, not just theorems and proofs; that motivates students by telling them why probability is essential to their work; and that highlights the impact of the subject. I put the book in the context of data science to emphasize the inseparability of data (computing) and probability (theory) in our time.
This is one of the best introductory books on probability that I have seen. It is rigorous, yet intuitive. It is full of beautiful illustrations and easy-to-understand code samples (in Python and Matlab). Before introducing each new theoretical concept, the author gives reasons for why the material is important in practice, thus providing motivation for learning it. The title focuses on "Data Science" but in fact this book could be used to provide a thorough introduction to probability for any STEM student.
— Kevin Murphy, Research Scientist at Google DeepMind, Author of Probabilistic Machine Learning
This is an excellent textbook for undergraduate EE and CS students, with thorough coverage of a wide range of topics, including fundamentals such as probability spaces, random variables, and sample statistics, as well as more applied problems such as regression, estimation, and hypothesis testing. New concepts are introduced with clear, intuitive explanations, followed by more rigorous theory. The book is beautifully illustrated with numerous diagrams, plots, and other visual illustrations, and the frequent computational examples play a valuable role in connecting theory and practice. It is also worth noting that the author has made this book available at no cost, despite the enormous effort that was clearly dedicated to writing it.
— Brendt Wohlberg, Scientist at Los Alamos National Laboratory, Former Editor-in-Chief, IEEE Transactions on Computational Imaging