thr3ads.net - R help - [R] lm() and dffits [Aug 2008]

Ranney, Steven

2008-Aug-29 22:20 UTC

[R] lm() and dffits

All -

My question is a bit involved, so bear with me.

I have some data that looks like:
Lake	LL	LW
81	2.176091259	1.342422681
81	2.176091259	1.414973348
81	2.176091259	1.447158031
81	2.181843588	1.414973348
81	2.181843588	1.447158031
81	2.184691431	1.462397998
81	2.187520721	1.447158031
81	2.187520721	1.477121255
81	2.187520721	1.505149978
...
[truncated]

I'm trying to:
1) fit a simple lm(LW~LL)
2) calculate the dffits for those data points
3) remove those data points whose absolute dffits exceed 2*sqrt(p/n) (where 
p=the number of parameters and n=number of data points; p=3 in a linear 
model, correct?  Intercept, slope, and error term?)
4) rerun the model MINUS those data points
5) compare the two lm()

Now, each of these steps I can do separately, but only by outputting the 
dffits to a .csv, removing the large dffits by hand, reading the .csv 
back into R, rerunning the lm(), and comparing the first lm() to the second 
lm().  I would imagine that there is a better (easier, I hope!) way of doing 
all of this.  Any ideas?  

My programming knowledge of R is rather limited but getting better all the 
time thanks to this board and the R-help archive.

Thanks, 

SR 

Steven H. Ranney
 
Graduate Research Assistant (Ph.D) 
USGS Montana Cooperative Fishery Research Unit 
Montana State University 
P.O. Box 173460 
Bozeman, MT 59717-3460 

phone: (406) 994-6643 
fax: (406) 994-7479



Dieter Menne

2008-Aug-31 20:07 UTC

[R] lm() and dffits

Ranney, Steven <steven.ranney <at> montana.edu> writes:
> 1) fit a simple lm(LW~LL)
> 2) calculate the dffits for those data points
> 3) remove those data points whose absolute dffits exceed 2*sqrt(p/n) (where 
> p=the number of parameters and n=number of data points; p=3 in a linear 
> model, correct?  Intercept, slope, and error term?)
> 4) rerun the model MINUS those data points
> 5) compare the two lm()
> 
> Now, each of these steps I can do separately, but only by outputting the 
> dffits to a .csv, removing the large dffits by hand, reading the .csv 
> back into R, rerunning the lm(), and comparing the first lm() to the second
> lm().  I would imagine that there is a better (easier, I hope!) way of
> doing all of this.  Any ideas?  
> 
You could do the following:

# --------------------
x = rnorm(100)
y = rnorm(100)
y[40] = y[40] + 30 # generate an outlier
df = data.frame(x=x, y=y)
lmfit1 = lm(y~x, data=df) # fit all data
thresh = 3 # Choose any data-dependent threshold
nice = abs(dffits(lmfit1)) < thresh # TRUE for the points to keep
# note that nice[40] should be the only FALSE
df2 = df[nice,]
lmfit2 = lm(y~x, data=df2) # refit without the flagged points

summary(lmfit1)
summary(lmfit2)
# --------------------
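
If you want the 2*sqrt(p/n) rule from the original question rather than a fixed threshold, the same idea can be sketched as below. This is only an illustration on simulated data: the names dat, fit_all, fit_trim, cutoff and keep are made up for the example, and p is taken as the number of estimated coefficients (intercept and slope, so 2 here), which is the usual convention for this cutoff; the error variance is not counted. With the real data the model would be lm(LW ~ LL).

```r
# Sketch: drop points flagged by the 2*sqrt(p/n) DFFITS rule, then refit.
set.seed(1)                        # reproducible simulated data (assumption)
dat <- data.frame(x = rnorm(100))
dat$y <- 1 + 2 * dat$x + rnorm(100)
dat$y[40] <- dat$y[40] + 30        # plant one gross outlier

fit_all <- lm(y ~ x, data = dat)   # fit on all data

p <- length(coef(fit_all))         # number of coefficients: intercept + slope
n <- nobs(fit_all)                 # number of observations used in the fit
cutoff <- 2 * sqrt(p / n)

keep <- abs(dffits(fit_all)) <= cutoff      # TRUE for points to retain
fit_trim <- lm(y ~ x, data = dat[keep, ])   # refit without flagged points

# Compare coefficients and residual standard errors of the two fits
rbind(all = coef(fit_all), trimmed = coef(fit_trim))
c(all = sigma(fit_all), trimmed = sigma(fit_trim))
which(!keep)                       # row indices that were dropped
```

Everything stays inside R, so there is no need for the round trip through a .csv file. Note that this cutoff is fairly strict and will usually flag more than just the planted outlier.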

However, this is a bit Denver-Style Home-Brewery. Instead of using this 
ad hoc method, you are probably better off using one of the robust methods, for
example rlm() in MASS.
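
As a rough sketch of what the robust route could look like, the following fits the same kind of simulated data with rlm() from MASS, which by default uses a Huber M-estimator. The data and the names dat, fit_ols and fit_rob are assumptions for illustration only; rather than deleting points, the fit downweights observations with large residuals.

```r
library(MASS)                      # recommended package, ships with standard R

set.seed(1)                        # reproducible simulated data (assumption)
dat <- data.frame(x = rnorm(100))
dat$y <- 1 + 2 * dat$x + rnorm(100)
dat$y[40] <- dat$y[40] + 30        # plant one gross outlier

fit_ols <- lm(y ~ x, data = dat)   # ordinary least squares
fit_rob <- rlm(y ~ x, data = dat)  # robust fit, Huber psi by default

# Compare the two sets of coefficients
rbind(ols = coef(fit_ols), robust = coef(fit_rob))

# Final weight given to the planted outlier; well-behaved points get weight 1
fit_rob$w[40]
```

The weights in fit_rob$w show which points the fit treated as suspicious, which can serve a diagnostic purpose similar to the dffits.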

Dieter
