Probability for Data Science
Chapter 7 · Regression

Summary

Regression is one of the most widely used techniques in data science. The formulation of the regression problem is as simple as setting up a system of linear equations:

$$\underset{\boldsymbol{\theta} \in \mathbb{R}^d}{\text{minimize}} \;\; \|\mathbf{X}\boldsymbol{\theta}-\mathbf{y}\|^2,$$

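which has a closed-form solution, $\hat{\boldsymbol{\theta}} = (\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^T\mathbf{y}$, whenever $\mathbf{X}^T\mathbf{X}$ is invertible. In practice, the biggest problems are outliers, too few training samples, and a poorly chosen regression model.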
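
As a rough illustration of the least-squares problem above, the following sketch solves the normal equations with NumPy and cross-checks the result against `np.linalg.lstsq`. The helper name `fit_least_squares`, the synthetic line $y = 2x + 1$, the noise level, and the sample size are all assumptions made for this example and do not come from the chapter. The sketch also assumes the columns of the data matrix are linearly independent.

```python
import numpy as np


def fit_least_squares(X, y):
    """Return theta minimizing ||X @ theta - y||^2 via the normal equations.

    Hypothetical helper for illustration; assumes X.T @ X is invertible.
    """
    # Solve (X^T X) theta = X^T y without forming an explicit inverse.
    return np.linalg.solve(X.T @ X, X.T @ y)


# Synthetic data (assumed for this sketch): y = 2x + 1 plus small Gaussian noise.
rng = np.random.default_rng(0)
x = rng.uniform(-1.0, 1.0, size=50)
X = np.column_stack([x, np.ones_like(x)])  # columns: x and a constant for the intercept
y = 2.0 * x + 1.0 + 0.01 * rng.standard_normal(50)

theta_hat = fit_least_squares(X, y)

# Cross-check against NumPy's built-in least-squares solver.
theta_ref, *_ = np.linalg.lstsq(X, y, rcond=None)

print("normal equations:", theta_hat)
print("np.linalg.lstsq: ", theta_ref)
```

Solving the linear system with `np.linalg.solve` is generally preferable to inverting the matrix explicitly. When the data matrix is ill-conditioned or rank-deficient, `np.linalg.lstsq` is usually the safer choice.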