This is the standard model in many statistical problems. The data set contains
several predictors and one or more responses. The usual regression model is as
follows:
y = b0 + b1 x1 + b2 x2 + b3 x3 + ... + bp xp + Error
The regression parameters are estimated by least squares. Fitted values and
residuals are obtained in the usual way: for each case i, the residual ri is the
difference between the observed response yi and its fitted value, so that
yi = b0 + b1 xi1 + ... + bp xip + ri
Example: This example comes from Urban Housing in a US city in 1997. The cases
correspond to urban units, each the size of a neighborhood. In each unit we
observe a response that gives the increase in property taxes for the residential
housing units in that neighborhood. The predictors are several demographic
variables describing the type of unit, plus four variables giving expenditures
in transportation and in roads. The question is to assess the impact of
expenditures in transportation and roads on property values. This data set, like
many others, contains a number of outliers and leverage points.
/* Read the SPSS portable file through the SPSS engine */
libname mylib spss 'Reg.por';
options linesize=70 pagesize=55;

/* Keep the first 99 cases and attach display formats */
data a;
  format IN80BO IN80C IN90C INBBID INBP90 INCENP INCODE INCR80 INCR90
         INEMP INMAIN INMAIN2 INMAIN3 INNV INO INREST INT79 INT80 INT90
         INV90 INVAL SEV79 SEV80 SEVR80 SFAMV VAPT79 VAPT80 VCOM79 VCOM80
         VFRM79 VFRM80 VIND79 VIND80 VRES79 VRES80 VTOT79 VTOT80 VVAL79
         VVAL80 comma8.
         NAMEV1 $CHAR200.;
  set mylib._first_;
  if _N_ < 100;
run;

/* Scatterplots of the response against the four expenditure variables */
proc plot data=a;
  plot ravlchm*(roadacc roadcap transacc transcap);
run;
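/* Full model: P prints predicted values, R requests residual analysis,
   COLLIN prints collinearity diagnostics */
proc reg data=a;
  model ravlchm = hhinc black hslds setr vac indexr pstu
                  roadacc roadcap transacc transcap / P R COLLIN;
run;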
Files: Reg.sas (program), output file from "reg.sas", data file.
We find that observations 12 and 13 are outliers, so we omit them for now.
Repeating the regression, we find that observation 2 is also an outlier.
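The programs in these notes do not show how the flagged cases are dropped. A minimal sketch of one way to do it follows, assuming the case numbers refer to the order of the observations in data set a. The data set name a2 is hypothetical and does not appear in the original program.

```sas
/* Drop the cases flagged as outliers (a2 is a hypothetical name) */
data a2;
  set a;
  /* _N_ counts the observations read so far, so here it equals
     the case number within data set a */
  if _N_ in (12, 13) then delete;
run;

/* Refit the full model on the reduced data set */
proc reg data=a2;
  model ravlchm = hhinc black hslds setr vac indexr pstu
                  roadacc roadcap transacc transcap / P R COLLIN;
run;
```

Once observation 2 is also flagged, it can be added to the list, `if _N_ in (2, 12, 13) then delete;`, still reading from the original data set a so that the case numbers do not shift.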
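/* Refit, saving predicted values and residuals for plotting */
proc reg data=a;
  model ravlchm = hhinc black hslds setr vac indexr pstu
                  roadacc roadcap transacc transcap / P R COLLIN;
  output out=b p=pred r=res;
run;

/* Residuals against fitted values and against the expenditure variables */
proc plot data=b;
  plot res*pred res*(roadacc roadcap transacc transcap);
run;

/* Search for the subsets with the best adjusted R-square */
proc reg data=a;
  model ravlchm = hhinc black hslds setr vac indexr pstu
                  roadacc roadcap transacc transcap / selection=adjrsq;
run;

/* Reduced model without SETR and PSTU */
proc reg data=a;
  model ravlchm = hhinc black hslds vac indexr
                  roadacc roadcap transacc transcap;
run;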
We then repeat the process: each new fit reveals new outliers, and so on.
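Each pass of this search can be made more systematic by saving case diagnostics and listing the cases that exceed a cutoff. The sketch below is an illustration, not the procedure used in these notes. The data set names diag and flagged, the variable names rstud, lev, cook and caseno, and both cutoffs are assumptions. The leverage cutoff is the common 2p/n rule of thumb with p = 12 parameters and n = 99 cases, which presumes that all 99 cases are complete.

```sas
/* Save studentized residuals, leverages and Cook's distances */
proc reg data=a;
  model ravlchm = hhinc black hslds setr vac indexr pstu
                  roadacc roadcap transacc transcap;
  output out=diag rstudent=rstud h=lev cookd=cook;
run;

/* Keep only the cases that look like outliers or leverage points */
data flagged;
  set diag;
  caseno = _N_;                             /* case number within data set a */
  if abs(rstud) > 2 or lev > 2*12/99;       /* rule-of-thumb cutoffs (assumed) */
run;

proc print data=flagged;
  var caseno ravlchm rstud lev cook;
run;
```

Because masking can hide some outliers behind others, the listing may change after the flagged cases are removed and the model is refit, which is the repeated cycle described here.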
On the other hand, the robust procedure yields the following output:
Intercept      HHINC      BLACK     HSLDS      SETR
-72.54527 0.00780025 -0.1025825 0.1485328 -36.75587

      VAC       INDEXR
0.1651357 -0.001151889

    PSTU  ROADACC  ROADCAP TRANSACC TRANSCAP
4.031584 5.032116 48.80187 1.165922 166.9255
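
The notes do not say which robust procedure or software produced these estimates. As a hedged sketch of how a robust fit could be requested in SAS, PROC ROBUSTREG with MM estimation is one option. The choice of METHOD=MM is an assumption, and its estimates need not match the values printed above.

```sas
/* Robust fit on all cases, with no manual deletion of outliers.
   METHOD=MM is an assumed choice; the source does not name the estimator. */
proc robustreg data=a method=mm;
  model ravlchm = hhinc black hslds setr vac indexr pstu
                  roadacc roadcap transacc transcap
                  / diagnostics leverage;
run;
```

The DIAGNOSTICS and LEVERAGE options ask the procedure to list the cases it treats as outliers and leverage points, which can be compared with the cases found by repeated least squares fits.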