ISLR - Linear Regression - Simple Linear Regression


Linear regression is a useful tool for predicting a quantitative response. It also serves as a good jumping-off point for newer approaches: as we will see in later chapters, many fancier statistical learning approaches can be seen as generalizations or extensions of linear regression.

Y ≈ β₀ + β₁X - we describe this by saying that we are regressing Y on X (or Y onto X)

ŷ = β̂₀ + β̂₁x - the hat symbol, ˆ, denotes the estimated value of an unknown parameter or coefficient, or the predicted value of the response

1 Estimating the Coefficients
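
ISLR derives the least squares estimates β̂₁ = Σ(xᵢ − x̄)(yᵢ − ȳ) / Σ(xᵢ − x̄)² and β̂₀ = ȳ − β̂₁x̄. A minimal sketch of these formulas in NumPy, on synthetic data invented here purely for illustration (true line y = 2 + 3x plus noise):

```python
import numpy as np

# Synthetic data for illustration only (not from the book):
# true relationship y = 2 + 3x, plus Gaussian noise.
rng = np.random.default_rng(0)
x = rng.uniform(0, 10, 50)
y = 2.0 + 3.0 * x + rng.normal(0, 1.0, 50)

# Least squares estimates for simple linear regression:
#   beta1_hat = sum((x_i - x_bar)(y_i - y_bar)) / sum((x_i - x_bar)^2)
#   beta0_hat = y_bar - beta1_hat * x_bar
x_bar, y_bar = x.mean(), y.mean()
beta1_hat = np.sum((x - x_bar) * (y - y_bar)) / np.sum((x - x_bar) ** 2)
beta0_hat = y_bar - beta1_hat * x_bar

# Fitted (predicted) values: y_hat = beta0_hat + beta1_hat * x
y_hat = beta0_hat + beta1_hat * x
```

With this seed, the estimates land close to the true coefficients (β₀ = 2, β₁ = 3); they are never exactly equal, because the estimates are computed from a noisy sample.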

2 Assessing the Accuracy of the Coefficient Estimates

3 Assessing the Accuracy of the Model

Residual Standard Error

Associated with each observation is an error term ε. Due to the presence of these error terms, even if we knew the true regression line (i.e. even if β₀ and β₁ were known), we would not be able to perfectly predict Y from X. The RSE is an estimate of the standard deviation of ε. Roughly speaking, it is the average amount that the response will deviate from the true regression line. It is computed using the formula

RSE = √(RSS / (n − 2)), where RSS = Σᵢ (yᵢ − ŷᵢ)² is the residual sum of squares.
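
A minimal numerical sketch of RSE = √(RSS / (n − 2)), again on synthetic data invented for illustration (noise standard deviation set to 1.5, so the RSE should come out near 1.5):

```python
import numpy as np

# Synthetic data for illustration only: true noise std is 1.5.
rng = np.random.default_rng(1)
x = rng.uniform(0, 10, 100)
y = 2.0 + 3.0 * x + rng.normal(0, 1.5, 100)

# Fit by least squares; np.polyfit returns [slope, intercept] for degree 1.
beta1_hat, beta0_hat = np.polyfit(x, y, 1)
y_hat = beta0_hat + beta1_hat * x

# Residual sum of squares, then the residual standard error.
# The divisor is n - 2 because two parameters were estimated from the data.
rss = np.sum((y - y_hat) ** 2)
rse = np.sqrt(rss / (len(y) - 2))
```

Since RSE is in the units of y, it is directly interpretable here: predictions from the fitted line deviate from the truth by about 1.5 on average.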

R2 Statistic

The RSE provides an absolute measure of lack of fit of the model to the data. But since it is measured in the units of Y, it is not always clear what constitutes a good RSE. The R² statistic provides an alternative measure of fit. It takes the form of a proportion—the proportion of variance explained—and so it always takes on a value between 0 and 1, and is independent of the scale of Y.

An R² statistic that is close to 1 indicates that a large proportion of the variability in the response has been explained by the regression. A number near 0 indicates that the regression did not explain much of the variability in the response; this might occur because the linear model is wrong, or the inherent error σ² is high, or both.
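
A short sketch of R² = (TSS − RSS) / TSS = 1 − RSS / TSS, where TSS = Σ(yᵢ − ȳ)² is the total sum of squares; the data are synthetic and invented for illustration, with a strong linear signal so R² should be close to 1:

```python
import numpy as np

# Synthetic data for illustration only: strong linear signal, small noise.
rng = np.random.default_rng(2)
x = rng.uniform(0, 10, 100)
y = 2.0 + 3.0 * x + rng.normal(0, 1.0, 100)

# Fit by least squares; np.polyfit returns [slope, intercept] for degree 1.
beta1_hat, beta0_hat = np.polyfit(x, y, 1)
y_hat = beta0_hat + beta1_hat * x

# R^2 = 1 - RSS / TSS: the proportion of the variance in y
# explained by the regression, always between 0 and 1.
tss = np.sum((y - y.mean()) ** 2)
rss = np.sum((y - y_hat) ** 2)
r2 = 1 - rss / tss
```

Shrinking the slope toward 0 or inflating the noise standard deviation drives r2 toward 0, matching the two failure modes described above (wrong model or high inherent error).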