Regression Analysis
The regression model is used to answer the question whether two variables are related. An example is the relation between high blood pressure of the mother and the birth weight of the child: is high blood pressure a prognostic factor for low birth weight? If y = birth weight and x = blood pressure, the regression line is y = a + bx, where a is the intercept and b is the slope of the regression line. This relation does not hold exactly for every woman; the outcome y is the average birth weight of the women with the same blood pressure. To complete the model we need an error term e, which represents the deviations from the regression line. This error term is normally distributed with mean zero and variance sigma^2. The variance is a measure of the quality of the regression line: large values of sigma^2 indicate more scatter around the regression line.

An extension is multiple regression analysis, where three or more variables are considered. In the study of birth weight, the cholesterol level of the mother could be considered in addition to high blood pressure.

A regression model is useful for several purposes. It is the most direct approach, offering quantities such as the multiple correlation coefficient. The standard error of the estimate is the square root of the residual mean square given in the ANOVA table; another name for it is the standard deviation of the residuals. This quantity is useful in the calculation of confidence intervals for new values. The multiple correlation coefficient describes how well the model fits: the square of this measure is the proportion of variance explained by the model.
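The quantities described above can be computed directly from the least-squares formulas. The sketch below fits a simple regression line y = a + bx and reports R^2 and the standard error of the estimate; the data are made up for illustration (they are not from an actual birth-weight study).

```python
import math

# Hypothetical data: maternal blood pressure x and birth weight y (kg).
x = [110, 120, 125, 130, 140, 150]
y = [3.6, 3.4, 3.3, 3.1, 2.9, 2.7]

n = len(x)
mx = sum(x) / n
my = sum(y) / n

# Least-squares estimates of the slope b and intercept a in y = a + b*x.
sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
sxx = sum((xi - mx) ** 2 for xi in x)
b = sxy / sxx
a = my - b * mx

# Residuals e = y - (a + b*x): the deviations from the regression line.
resid = [yi - (a + b * xi) for xi, yi in zip(x, y)]
ss_res = sum(e ** 2 for e in resid)
ss_tot = sum((yi - my) ** 2 for yi in y)

# Proportion of variance explained, and the standard error of the estimate
# (the standard deviation of the residuals, with n - 2 degrees of freedom).
r_squared = 1 - ss_res / ss_tot
se_est = math.sqrt(ss_res / (n - 2))

print(f"y = {a:.3f} + {b:.4f} x")
print(f"R^2 = {r_squared:.3f}, standard error of the estimate = {se_est:.3f}")
```

The standard error of the estimate computed here is exactly the square root of the residual mean square that appears in the ANOVA table below.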
ANOVA Table
Observation = Model + Residual
This decomposition of the data also applies to the sum of squares of the deviations from the mean.
If we subtract the average of the observations we have
SS_Total = SS_regression + SS_residual
where SS is the abbreviation of sum of squares. A similar equation holds for the degrees of freedom
df_Total = df_regression + df_residual
The other quantities in the ANOVA table follow from the equations given above. The mean square is the
sum of squares divided by the degrees of freedom. The F statistic is the mean square due to regression divided by the mean square due
to residual, and tests the hypothesis that all coefficients (except the intercept) are zero.
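The decomposition and the F statistic can be checked numerically. This is a minimal sketch with made-up data and a single predictor (so df_regression = 1 and df_residual = n - 2); it builds each row of the ANOVA table from the equations above.

```python
# Hypothetical data with one predictor.
x = [110, 120, 125, 130, 140, 150]
y = [3.6, 3.4, 3.3, 3.1, 2.9, 2.7]
n = len(x)

# Fit the least-squares line y = a + b*x.
mx, my = sum(x) / n, sum(y) / n
b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / \
    sum((xi - mx) ** 2 for xi in x)
a = my - b * mx
fitted = [a + b * xi for xi in x]

# Sum-of-squares decomposition: SS_Total = SS_regression + SS_residual.
ss_total = sum((yi - my) ** 2 for yi in y)
ss_reg = sum((fi - my) ** 2 for fi in fitted)
ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))

# Degrees of freedom add up the same way: (n - 1) = 1 + (n - 2).
df_reg, df_res, df_total = 1, n - 2, n - 1

# Mean square = sum of squares / degrees of freedom; F tests whether
# all coefficients except the intercept are zero.
ms_reg = ss_reg / df_reg
ms_res = ss_res / df_res
f_stat = ms_reg / ms_res

print(f"{'Source':<12}{'SS':>10}{'df':>5}{'MS':>10}{'F':>10}")
print(f"{'Regression':<12}{ss_reg:>10.4f}{df_reg:>5}{ms_reg:>10.4f}{f_stat:>10.2f}")
print(f"{'Residual':<12}{ss_res:>10.4f}{df_res:>5}{ms_res:>10.4f}")
print(f"{'Total':<12}{ss_total:>10.4f}{df_total:>5}")
```

Printing the table confirms that both the sums of squares and the degrees of freedom in the Regression and Residual rows add up to the Total row.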