Multiple Linear Regression Example

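Multiple linear regression is the standard model in many statistical problems. The data set contains several predictors and one or more responses; each response is modeled as a linear function of the predictors plus an error term. The usual regression model is as follows: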

y = b0 + b1 x1 + b2 x2 + b3 x3 + ... + bp xp + Error

The regression parameters are estimated by least squares. Fitted values and residuals are obtained in the usual way: for case i, the observed response yi is the fitted value plus the residual ri:

yi = b0 + b1 xi1 +...+ bp xip + ri
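
As a brief worked statement of what "estimated by least squares" means here, the criterion and the resulting fitted values and residuals can be written out. The symbols below are assumptions introduced for illustration and do not appear in the original notes: n is the number of cases, hats mark estimates, and X is the n x (p+1) matrix whose first column is all ones and whose remaining columns hold the predictors. The closed form assumes X has full column rank.

```latex
\begin{aligned}
(\hat b_0,\dots,\hat b_p) &= \arg\min_{b_0,\dots,b_p}
   \sum_{i=1}^{n} \bigl( y_i - b_0 - b_1 x_{i1} - \dots - b_p x_{ip} \bigr)^2, \\
\hat b &= (X^{\top} X)^{-1} X^{\top} y, \\
\hat y_i &= \hat b_0 + \hat b_1 x_{i1} + \dots + \hat b_p x_{ip}, \\
r_i &= y_i - \hat y_i .
\end{aligned}
```

Because the criterion squares the residuals, a single case with a large residual or an extreme predictor value can pull the estimates noticeably, which is why outliers and leverage points matter in the example that follows.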

Example: This example comes from Urban Housing in a US city in 1997. The cases are urban units, each the size of a neighborhood. For each unit we observe a response giving the increase in property taxes for the residential housing units in that neighborhood. The predictors are several demographic variables describing the type of unit, plus four variables giving expenditures on transportation and roads. The question is to assess the impact of expenditures on transportation and roads on property values. This data set, like many others, contains a number of outliers and leverage points.

/* Read the SPSS portable file through the SPSS library engine */
libname mylib spss 'Reg.por';
options linesize=70 pagesize=55;
data a;

format  IN80BO IN80C IN90C INBBID INBP90 INCENP INCODE INCR80 INCR90
INEMP INMAIN INMAIN2 INMAIN3 INNV INO INREST INT79 INT80 INT90 INV90
INVAL SEV79 SEV80 SEVR80 SFAMV VAPT79 VAPT80 VCOM79 VCOM80 VFRM79
VFRM80 VIND79 VIND80 VRES79 VRES80 VTOT79 VTOT80 VVAL79 VVAL80 comma8.
NAMEV1 $CHAR200.;
set mylib._first_;   /* first member of the SPSS library */
if _N_ < 100;        /* keep only the first 99 cases */
run;
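/* Scatter plots of the response against each expenditure variable */
proc plot data=a;
plot ravlchm*(roadacc roadcap transacc transcap);
run;
/* Full model: P prints predicted values, R requests residual analysis,
   COLLIN requests collinearity diagnostics */
proc reg data=a;
model  ravlchm = hhinc black hslds setr vac indexr pstu
           roadacc roadcap transacc transcap/P R COLLIN;
run;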

Files: Reg.sas; output file from "reg.sas"; data file.

 

We find that observations 12 and 13 are outliers, so we omit them for now.
Repeating the regression, we find that observation 2 is also an outlier.
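
The notes do not show how the flagged cases are omitted. One possible way, given here only as a minimal sketch, is to drop them by case number in a DATA step and refit. The data set name a2 and the variable obsno are hypothetical names introduced for this illustration; obsno keeps the original case number, because observations are renumbered once rows are deleted.

```sas
/* Sketch only: a2 and obsno are hypothetical names, not from the original program */
data a2;
  set a;
  obsno = _N_;                        /* original case number in data set a */
  if obsno in (12, 13) then delete;   /* drop the two flagged outliers */
run;

/* Refit the full model on the reduced data; ID labels each case by obsno */
proc reg data=a2;
  id obsno;
  model ravlchm = hhinc black hslds setr vac indexr pstu
                  roadacc roadcap transacc transcap / P R COLLIN;
run;
```

Further cases flagged in a later pass can be added to the IN list, using the obsno values rather than the renumbered positions in a2.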

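/* Refit, saving predicted values and residuals to data set b */
proc reg data=a;
model  ravlchm = hhinc black hslds setr vac indexr pstu
           roadacc roadcap transacc transcap/P R COLLIN;
output out=b p=pred r=res;
run;
/* Residuals against fitted values and against each expenditure variable */
proc plot data=b;
plot res*pred res*(roadacc roadcap transacc transcap);
run;
/* Compare subsets of predictors by adjusted R-square */
proc reg data=a;
model  ravlchm = hhinc black hslds setr vac indexr pstu
           roadacc roadcap transacc transcap/selection=adjrsq;
run;
/* Reduced model without setr and pstu */
proc reg data=a;
model  ravlchm = hhinc black hslds vac indexr roadacc
                  roadcap transacc transcap;
run;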

Output file from "reg1.sas"

Then we repeat the procedure and find new outliers, and so on.

Output file from "reg2.sas"

On the other hand, the robust procedure yields the following output:

  Intercept    -72.54527
  HHINC          0.00780025
  BLACK         -0.1025825
  HSLDS          0.1485328
  SETR         -36.75587
  VAC            0.1651357
  INDEXR        -0.001151889
  PSTU           4.031584
  ROADACC        5.032116
  ROADCAP       48.80187
  TRANSACC       1.165922
  TRANSCAP     166.9255
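
The notes do not say which robust procedure or software produced the estimates above. As one possible way to obtain a robust fit in SAS, the sketch below uses PROC ROBUSTREG with least trimmed squares. This is a substitute technique chosen for illustration, not necessarily the author's method, and its estimates need not match the values shown.

```sas
/* Sketch only: PROC ROBUSTREG with LTS is an assumed stand-in for the
   unnamed robust procedure in the notes */
proc robustreg data=a method=lts;
  model ravlchm = hhinc black hslds setr vac indexr pstu
                  roadacc roadcap transacc transcap
                  / diagnostics leverage;   /* flag outliers and leverage points */
run;
```

A robust fit of this kind downweights or trims aberrant cases in one pass, which avoids the cycle of deleting outliers, refitting, and finding new ones.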