Regression Analysis
The regression model is used to answer the question whether two variables are related. An example is the relation between high blood pressure of the mother and the birth weight of her child. Is high blood pressure a prognostic factor for low birth weight? With y = birth weight and x = blood pressure, the regression line is

y = a + bx

where a is the intercept and b is the slope of the regression line. This relation does not hold exactly for every woman: the outcome y is the average birth weight of all women with the same blood pressure. To complete the model we need an error term e, which represents the deviations from the regression line. This error term is normally distributed with mean zero and variance sigma^2. The variance is a measure of the quality of the regression line: large values of sigma^2 indicate more scatter around the line.
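As an illustration, the line and the residual variance can be estimated by least squares. The data below are hypothetical, and the variable names are my own:

```python
import numpy as np

# Hypothetical measurements: x = blood pressure of the mother, y = birth weight.
x = np.array([110.0, 120.0, 125.0, 130.0, 140.0, 150.0])
y = np.array([3.6, 3.5, 3.3, 3.2, 3.0, 2.8])

# np.polyfit with degree 1 returns the slope b and intercept a of the
# least-squares line y = a + b*x.
b, a = np.polyfit(x, y, 1)

# The residuals e = y - (a + b*x) estimate the error term; their variance
# estimates sigma^2 (dividing by n - 2 because two parameters were fitted).
e = y - (a + b * x)
sigma2 = np.sum(e**2) / (len(x) - 2)
```

A large value of sigma2 would indicate much scatter around the fitted line.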

An extension is multiple regression analysis, where three or more variables are considered. In the study of birth weight, the cholesterol level of the mother would be considered in addition to her blood pressure.
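A minimal sketch of such a model with two explanatory variables, again with hypothetical data and variable names of my own choosing:

```python
import numpy as np

# Hypothetical data: blood pressure and cholesterol level of the mother.
blood_pressure = np.array([110.0, 120.0, 125.0, 130.0, 140.0, 150.0])
cholesterol = np.array([4.8, 5.2, 5.0, 5.6, 6.1, 6.4])
birth_weight = np.array([3.6, 3.5, 3.3, 3.2, 3.0, 2.8])

# Design matrix: a column of ones for the intercept plus one column
# per explanatory variable.
X = np.column_stack([np.ones_like(blood_pressure), blood_pressure, cholesterol])

# Least-squares estimates: intercept a and one slope per explanatory variable.
coef, _, _, _ = np.linalg.lstsq(X, birth_weight, rcond=None)
a, b1, b2 = coef
```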

A regression model is useful for the following purposes:

  • to describe how the response changes when one of the explanatory variables is changed;
  • to describe the relation between the response and one or more explanatory variables;
  • or to predict the value of the response for a new observation with known values of the explanatory variables.

The regression model is the most direct approach, offering quantities such as the multiple correlation coefficient. The standard error of the estimate is the square root of the residual mean square given in the ANOVA table; another name for it is the standard deviation of the residuals. This quantity is useful in the calculation of confidence intervals for new values. The multiple correlation coefficient describes how well the model fits; the square of this measure is the proportion of variance explained by the model.
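These quantities can be computed from the observed and fitted values. A sketch with hypothetical numbers, where y_hat stands for the fitted values of some regression model with p explanatory variables:

```python
import numpy as np

# Hypothetical observed responses and fitted values from a regression model.
y = np.array([3.6, 3.5, 3.3, 3.2, 3.0, 2.8])
y_hat = np.array([3.58, 3.45, 3.38, 3.25, 3.05, 2.79])
p = 1  # number of explanatory variables (assumed)
n = len(y)

ss_total = np.sum((y - y.mean())**2)
ss_residual = np.sum((y - y_hat)**2)

# Proportion of variance explained, and the multiple correlation coefficient.
r_squared = 1 - ss_residual / ss_total
r = np.sqrt(r_squared)

# Standard error of the estimate: square root of the residual mean square,
# i.e. the standard deviation of the residuals.
std_error = np.sqrt(ss_residual / (n - p - 1))
```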

ANOVA Table


The ANOVA table summarizes the results from a regression analysis or an ANOVA. The basis is the equation

Observation = Model + Residual

This decomposition of the data also applies to the sums of squares of the deviations from the mean. If we subtract the average of the observations from each observation, we have

SS_total = SS_regression + SS_residual

where SS is the abbreviation of sum of squares. A similar equation holds for the degrees of freedom:

df_total = df_regression + df_residual

The other quantities in the ANOVA table follow from the equations given above. The mean square is the sum of squares divided by its degrees of freedom. The F statistic is the mean square due to regression divided by the mean square due to residual, and it tests the hypothesis that all coefficients (except the intercept) are zero.
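Putting the pieces together, the whole table can be computed from data. The sketch below fits a simple regression to hypothetical data and checks the sum-of-squares decomposition; scipy is assumed to be available for the p-value:

```python
import numpy as np
from scipy import stats

# Hypothetical data for a simple regression.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0])
y = np.array([2.1, 2.9, 3.2, 4.8, 5.1, 5.9, 7.2, 7.8])

b, a = np.polyfit(x, y, 1)
y_hat = a + b * x

# Sums of squares: SS_total = SS_regression + SS_residual.
ss_total = np.sum((y - y.mean())**2)
ss_regression = np.sum((y_hat - y.mean())**2)
ss_residual = np.sum((y - y_hat)**2)

# Degrees of freedom and mean squares.
df_regression = 1              # one explanatory variable
df_residual = len(y) - 2
ms_regression = ss_regression / df_regression
ms_residual = ss_residual / df_residual

# F statistic and its p-value under the hypothesis that the slope is zero.
f_stat = ms_regression / ms_residual
p_value = stats.f.sf(f_stat, df_regression, df_residual)
```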

ANOVA Table    Sum of Squares    df    Mean Square    F Statistic    p-value
Regression           20.89698     1       20.89698        1.67514    0.21809
Residual            162.17235    13        12.4748
Total               183.06933    14
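The entries of this table can be checked against the relations above (mean square = sum of squares divided by df, F = the ratio of the two mean squares):

```python
# Numbers copied from the ANOVA table above.
ss_regression, df_regression = 20.89698, 1
ss_residual, df_residual = 162.17235, 13

ms_regression = ss_regression / df_regression   # 20.89698
ms_residual = ss_residual / df_residual         # approximately 12.4748

f_stat = ms_regression / ms_residual            # approximately 1.67514
ss_total = ss_regression + ss_residual          # 183.06933
```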