John McHenry
2009-Feb-04 02:10 UTC
[R] Generic functions with models? 'predict', 'residuals', 'fitted', ...?
Hi WizaRds!
I don't know if my understanding of models is screwed up or
if there's a fundamental limitation to the way they are
used in R.
Here's what I'd like to do: create a model and then test data
against that model using generics like 'predict' and
'residuals'.
The problem is that generics don't work for new data sets, except
of course 'predict':
set.seed(1)
df<- data.frame(matrix(rnorm(10*4), nc=4))
cn<- c("y", "x1", "x2", "x3")
colnames(df)<- cn
model<- lm(y ~ x1 + x2 + x3 + 1, data=df)
u<- residuals(model, data=df)
test<- data.frame(matrix(rnorm(10*4), nc=4))
colnames(test)<- cn
y.hat<- predict(model, newdata=test)
u.new<- residuals(model, newdata=test)
# I would expect to get the residuals to the 'test'
# data fit to 'model' with 'residuals(model, newdata=test)',
# alas no:
cbind(u, u.new)
u u.new
1 0.032996295 0.032996295
2 0.575261650 0.575261650
3 -0.702501425 -0.702501425
4 0.482075538 0.482075538
5 0.300600839 0.300600839
6 -0.980832577 -0.980832577
7 0.265239680 0.265239680
8 -0.341625071 -0.341625071
9 0.363224527 0.363224527
10 0.005560545 0.005560545
# I would expect to get the 'test' fit to 'model'
# with 'fitted(model, newdata=test)' but again no:
cbind(fitted(model), fitted(model, newdata=test))
[,1] [,2]
1 -0.65945011 -0.65945011
2 -0.39161833 -0.39161833
3 -0.13312719 -0.13312719
4 1.11320526 1.11320526
5 0.02890693 0.02890693
6 0.16036419 0.16036419
7 0.22218937 0.22218937
8 1.07994978 1.07994978
9 0.21255683 0.21255683
10 -0.31094893 -0.31094893
I have to do:
y.hat<- predict(model, newdata=test)
# get test residuals:
u.new<- test$y - y.hat
u.new
1 2 3 4 5 6 7
1.3661751 -0.4077999 1.1740108 0.4492772 -1.5865168 -0.7633108 -0.8800892
8 9 10
1.7474658 -0.1016104 2.1081361
to get the residuals. Maybe this is what one is supposed to do?
To be fair, the help pages for e.g. 'residuals.lm'
don't say that you can do things like 'newdata=test' in:
'residuals(model, newdata=test)'
or 'data=test' in:
'residuals(model, data=test)'
but that is exactly what I would like to do!
Am I missing something here, or is this just the way it is?
Thanks guys!
Jack.