John McHenry
2009-Feb-04 02:10 UTC
[R] Generic functions with models? 'predict', 'residuals', 'fitted', ...?
Hi WizaRds! I don't know if my understanding of models is screwed up or if there's a fundamental limitation to the way they are used in R. Here's what I'd like to do: create a model and then test data against that model using generics like 'predict' and 'residuals'. The problem is that generics don't work for new data sets, except of course 'predict': set.seed(1) df<- data.frame(matrix(rnorm(10*4), nc=4)) cn<- c("y", "x1", "x2", "x3") colnames(df)<- cn model<- lm(y ~ x1 + x2 + x3 + 1, data=df) u<- residuals(model, data=df) test<- data.frame(matrix(rnorm(10*4), nc=4)) colnames(test)<- cn y.hat<- predict(model, newdata=test) u.new<- residuals(model, newdata=test) # I would expect to get the residuals to the 'test' # data fit to 'model' with 'residuals(model, newdata=test)', # alas no: cbind(u, u.new) u u.new 1 0.032996295 0.032996295 2 0.575261650 0.575261650 3 -0.702501425 -0.702501425 4 0.482075538 0.482075538 5 0.300600839 0.300600839 6 -0.980832577 -0.980832577 7 0.265239680 0.265239680 8 -0.341625071 -0.341625071 9 0.363224527 0.363224527 10 0.005560545 0.005560545 # I would expect to get the 'test' fit to 'model' # with 'fitted(model, newdata=test)' but again no: cbind(fitted(model), fitted(model, newdata=test)) [,1] [,2] 1 -0.65945011 -0.65945011 2 -0.39161833 -0.39161833 3 -0.13312719 -0.13312719 4 1.11320526 1.11320526 5 0.02890693 0.02890693 6 0.16036419 0.16036419 7 0.22218937 0.22218937 8 1.07994978 1.07994978 9 0.21255683 0.21255683 10 -0.31094893 -0.31094893 I have to do: y.hat<- predict(model, newdata=test) # get test residuals: u.new<- test$y - y.hat u.new 1 2 3 4 5 6 7 1.3661751 -0.4077999 1.1740108 0.4492772 -1.5865168 -0.7633108 -0.8800892 8 9 10 1.7474658 -0.1016104 2.1081361 to get the residuals. Maybe this is what one is supposed to do? To be fair, the help pages for e.g. 'residuals.lm' don't say that you can do things like 'newdata=test' in: 'residuals(model, newdata=test)' or 'data=test' in: 'residuals(model, data=test)' but that is exactly what I would like to do! Am I missing something here, or is this just the way it is? Thanks guys! Jack.