Hi,

I am using R to fit statistical models to data where the observations are means of the original data. R is used to calculate the means before fitting the model. My problem is: when R calculates the means using tapply, the class of the means differs from the class of the original data, which gives me trouble when I want to use the original data to calculate model predictions. Here is a simple example that demonstrates the problem:

> data.in<-read.table('example.dat',header=TRUE)
>
> #Here are the data:
> data.in
  location    x      y
1        A 17.2  28.46
2        A 91.7 143.33
3        A 93.6 148.05
4        B 95.8 150.28
5        B 54.9  89.49
6        B 51.1  82.51
7        C 53.9  88.46
8        C 40.3  63.62
9        C 38.5  64.46
>
> attach(data.in)
>
> #Calculate means by variable "location":
> data.mn<-data.frame(xm = tapply(x,location,mean), ym = tapply(y,location,mean))
> detach(data.in)
>
> #Here are the means:
> data.mn
        xm       ym
A 67.50000 106.6133
B 67.26667 107.4267
C 44.23333  72.1800
>
> #Fit the model:
> mod1<-lm(ym ~ xm, data.mn)
>
> mod1

Call:
lm(formula = ym ~ xm, data = data.mn)

Coefficients:
(Intercept)           xm
      5.633        1.505

> #R will make "predictions" using the data.mn data frame:
> predict(mod1,newdata = data.mn)
        A         B         C
107.19260 106.84153  72.18587
>
> #But, even if new variables are created in the original data
> #with names that match those names used in the regression:
> data.in$xm<-data.in$x
> data.in$ym<-data.in$y
> data.in
  location    x      y   xm     ym
1        A 17.2  28.46 17.2  28.46
2        A 91.7 143.33 91.7 143.33
3        A 93.6 148.05 93.6 148.05
4        B 95.8 150.28 95.8 150.28
5        B 54.9  89.49 54.9  89.49
6        B 51.1  82.51 51.1  82.51
7        C 53.9  88.46 53.9  88.46
8        C 40.3  63.62 40.3  63.62
9        C 38.5  64.46 38.5  64.46
>
> #R will not use data.in to make predictions:
> predict(mod1,newdata = data.in)
Error: variable 'xm' was fitted with class "other" but class "numeric" was supplied
>
> data.in$xm
[1] 17.2 91.7 93.6 95.8 54.9 51.1 53.9 40.3 38.5
> data.mn$xm
       A        B        C
67.50000 67.26667 44.23333
>

Is there a way to make these variables have the same class? Or, is there something other than "tapply" that will work better for this?

Thanks!
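
One way to see the mismatch described above is to compare the structure of the two objects directly. The following is only a minimal sketch: the data frame is typed in by hand from the listing in the question (an assumption standing in for 'example.dat', which is not available), and the name xm.arr is made up for illustration.

```r
# Data re-entered by hand from the listing in the question
# (assumption: this stands in for example.dat).
data.in <- data.frame(
  location = factor(rep(c("A", "B", "C"), each = 3)),
  x = c(17.2, 91.7, 93.6, 95.8, 54.9, 51.1, 53.9, 40.3, 38.5),
  y = c(28.46, 143.33, 148.05, 150.28, 89.49, 82.51, 88.46, 63.62, 64.46)
)

# Group means computed the same way as in the question.
xm.arr <- tapply(data.in$x, data.in$location, mean)

# The result of tapply() carries a dim attribute (a one-dimensional array),
# while the original column is a plain numeric vector with no dim.
is.array(xm.arr)
dim(xm.arr)
dim(data.in$x)

# str() shows the structural difference side by side.
str(xm.arr)
str(data.in$x)
```

The thread dates from 2007, so whether predict() raises this exact error may depend on the R version in use; the structural difference between the two objects is what the checks above show.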
Prof Brian Ripley
2007-Jun-19 16:53 UTC
[R] Linear model predictions, differences in class
tapply gives an array: you want to use as.vector() on its result.

On Tue, 19 Jun 2007, John Phillips wrote:

> Is there a way to make these variables have the same class? Or, is there
> something other than "tapply" that will work better for this?

--
Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595
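
The as.vector() suggestion in this reply could be applied roughly as follows. This is a sketch under stated assumptions, not code from the thread: the data are typed in by hand from the question's listing in place of 'example.dat', and the aggregate() variant at the end is an added illustration of one possible tapply alternative (the names data.ag and mod2 are hypothetical).

```r
# Data re-entered by hand from the listing in the question
# (assumption: this stands in for example.dat).
data.in <- data.frame(
  location = factor(rep(c("A", "B", "C"), each = 3)),
  x = c(17.2, 91.7, 93.6, 95.8, 54.9, 51.1, 53.9, 40.3, 38.5),
  y = c(28.46, 143.33, 148.05, 150.28, 89.49, 82.51, 88.46, 63.62, 64.46)
)

# as.vector() strips the dim and dimnames that tapply() attaches,
# so the group labels are kept in their own column instead.
data.mn <- data.frame(
  location = levels(data.in$location),
  xm = as.vector(tapply(data.in$x, data.in$location, mean)),
  ym = as.vector(tapply(data.in$y, data.in$location, mean))
)

# Fit the model on the group means, as in the question.
mod1 <- lm(ym ~ xm, data = data.mn)

# Predict for the original observations; the column name in newdata
# must match the predictor name used in the formula.
pred <- predict(mod1, newdata = data.frame(xm = data.in$x))

# Alternative (an assumption, not suggested in the thread): aggregate()
# returns a data frame of group means with plain numeric columns.
data.ag <- aggregate(cbind(x, y) ~ location, data = data.in, FUN = mean)
names(data.ag) <- c("location", "xm", "ym")
mod2 <- lm(ym ~ xm, data = data.ag)
```

Both fits use the same three group means, so mod1 and mod2 should have the same coefficients. Passing a fresh data frame as newdata avoids adding xm and ym columns to data.in itself.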