dear R wizards: X is factor with 20,000*20=800,000 observations of 20,000 factors. I.e., each factor has 20 observations. y is 800,000 normally distributed data points. I want to see how much R^2 the X factors can provide. Easy, right?> lm ( y ~ X)and> aov( y ~ X)Error: cannot allocate vector of size 3125000 Kb is this computationally infeasible? (I am not an expert, but off-hand, I thought this can work as long as the X's are just factors = fitted means.)> help.search("fixed.effects");fixed.effects(nlme) Extract Fixed Effects fixed.effects.lmList(nlme) Extract lmList Fixed Effects lme(nlme) Linear Mixed-Effects Models lmeStruct(nlme) Linear Mixed-Effects Structure nlme(nlme) Nonlinear Mixed-Effects Models nlmeStruct(nlme) Nonlinear Mixed-Effects Structure Type 'help(FOO, package = PKG)' to inspect entry 'FOO(PKG) TITLE'. ok---I want to read the fixed.effects help. could the help system tell me how to inspect these entries?> help("fixed.effects")wrong> help.search("fixed.effects")two entries, as above. nothing new.> ?fixed.effectswrong eventually, it dawned on me that nlme in parens was not a function argument, but the name of the package within which fixed.effects lives. Suggestion: maybe a different notation to name packages would be more intuitive in the help system. yes, I know it now, but other novices may not. even a colon instead of a () may be more intuitive in this context.> library(nlme); ?lmeand then> lme(y ~ X)Error in getGroups.data.frame(dataMix, groups) : Invalid formula for groups now I have to beg for some help. ok, blatant free-riding. the lme docs tells me it does the Laird and Ware model, but I do not know this model. the only two examples given at the end of the lme help file seem to be similar to what I just specified. so, how do I execute a simple fixed-effects model? (later, I may want to add a variable Z that is a continuous random variable.) could someone please give me one quick example? help is, as always, highly appreciated. sincerely, /ivo welch
I guess you meant X has 20,000 levels (and 40 observations each)? In that case lm() will attempt to create a design matrix that's 8e5 by 2e4, and that's unlikely to fit in the RAM. It is very easy to compute by hand. I'm using a smaller data size first to check the result against summary(lm()), then the size you specified (hopefully with more correct arithmetics...) to show that it's quite doable even on a modest laptop:> set.seed(1) > y <- rnorm(80) > x <- factor(rep(paste("x", 1:20, sep=""), 4)) > summary(lm(y ~ x))$r.squared[1] 0.2555885> rsq <- function(x, y) {+ sse <- sum(ave(y, x, FUN=function(x) x - mean(x))^2) + sstotal <- var(y) * (length(y) - 1) + 1 - sse / sstotal + }> rsq(x, y)[1] 0.2555885> set.seed(1) > y <- rnorm(8e5) > x <- factor(rep(paste("x", 1:2e4, sep=""), 40)) > system.time(rsq(x, y))[1] 1.99 0.03 2.06 NA NA Andy From: ivo welch> > dear R wizards: > > X is factor with 20,000*20=800,000 observations of 20,000 factors. > I.e., each factor has 20 observations. y is 800,000 normally > distributed data points. I want to see how much R^2 the X > factors can provide. Easy, right? > > > lm ( y ~ X) > and > > aov( y ~ X) > Error: cannot allocate vector of size 3125000 Kb > > is this computationally infeasible? (I am not an expert, but > off-hand, I thought this can work as long as the X's are just > factors = fitted means.) > > > help.search("fixed.effects"); > > fixed.effects(nlme) Extract Fixed Effects > fixed.effects.lmList(nlme) Extract lmList Fixed Effects > lme(nlme) Linear Mixed-Effects Models > lmeStruct(nlme) Linear Mixed-Effects Structure > nlme(nlme) Nonlinear Mixed-Effects Models > nlmeStruct(nlme) Nonlinear Mixed-Effects Structure > > Type 'help(FOO, package = PKG)' to inspect entry 'FOO(PKG) TITLE'. > > ok---I want to read the fixed.effects help. could the help > system tell me how to inspect these entries? > > > help("fixed.effects") > wrong > > help.search("fixed.effects") > two entries, as above. nothing new. > > ?fixed.effects > wrong > > eventually, it dawned on me that nlme in parens was not a > function argument, but the name of the package within which > fixed.effects lives. Suggestion: maybe a different notation > to name packages would be more intuitive in the help system. > yes, I know it now, but other novices may not. even a colon > instead of a () may be more intuitive in this context. > > > library(nlme); ?lme > and then > > lme(y ~ X) > Error in getGroups.data.frame(dataMix, groups) : > Invalid formula for groups > > > now I have to beg for some help. ok, blatant free-riding. > the lme docs tells me it does the Laird and Ware model, but I > do not know this model. the only two examples given at the > end of the lme help file seem to be similar to what I just > specified. so, how do I execute a simple fixed-effects > model? (later, I may want to add a variable Z that is a > continuous random variable.) could someone please give me > one quick example? help is, as always, highly appreciated. > > sincerely, > > /ivo welch > > ______________________________________________ > R-help at stat.math.ethz.ch mailing list > stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide! > R-project.org/posting-guide.html > >
Hmm... didn't read the whole post before I hit `send'... I think you basically will have to fit the model `by hand', which is not that hard, given the simple structure of the model(s). The formulae for the quantities of interests are quite straightforward and easy to code in R (similar to the rsq function I showed). Adding a continuous variable to the model simply requires regressing the residuals; i.e., ave(y, x, function(x) x - mean(x)), on the new variable z (easy for lm()), and take one more degree of freedom from SS(total). Andy From: Liaw, Andy> > I guess you meant X has 20,000 levels (and 40 observations > each)? In that case lm() will attempt to create a design > matrix that's 8e5 by 2e4, and that's unlikely to fit in the RAM. > > It is very easy to compute by hand. I'm using a smaller data > size first to check the result against summary(lm()), then > the size you specified (hopefully with more correct > arithmetics...) to show that it's quite doable even on a > modest laptop: > > > set.seed(1) > > y <- rnorm(80) > > x <- factor(rep(paste("x", 1:20, sep=""), 4)) > > summary(lm(y ~ x))$r.squared > [1] 0.2555885 > > rsq <- function(x, y) { > + sse <- sum(ave(y, x, FUN=function(x) x - mean(x))^2) > + sstotal <- var(y) * (length(y) - 1) > + 1 - sse / sstotal > + } > > rsq(x, y) > [1] 0.2555885 > > set.seed(1) > > y <- rnorm(8e5) > > x <- factor(rep(paste("x", 1:2e4, sep=""), 40)) > system.time(rsq(x, y)) > [1] 1.99 0.03 2.06 NA NA > > Andy > > > From: ivo welch > > > > dear R wizards: > > > > X is factor with 20,000*20=800,000 observations of 20,000 factors. > > I.e., each factor has 20 observations. y is 800,000 normally > > distributed data points. I want to see how much R^2 the X > > factors can provide. Easy, right? > > > > > lm ( y ~ X) > > and > > > aov( y ~ X) > > Error: cannot allocate vector of size 3125000 Kb > > > > is this computationally infeasible? (I am not an expert, but > > off-hand, I thought this can work as long as the X's are just > > factors = fitted means.) > > > > > help.search("fixed.effects"); > > > > fixed.effects(nlme) Extract Fixed Effects > > fixed.effects.lmList(nlme) Extract lmList Fixed Effects > > lme(nlme) Linear Mixed-Effects Models > > lmeStruct(nlme) Linear Mixed-Effects Structure > > nlme(nlme) Nonlinear Mixed-Effects Models > > nlmeStruct(nlme) Nonlinear Mixed-Effects > Structure > > > > Type 'help(FOO, package = PKG)' to inspect entry 'FOO(PKG) TITLE'. > > > > ok---I want to read the fixed.effects help. could the help > > system tell me how to inspect these entries? > > > > > help("fixed.effects") > > wrong > > > help.search("fixed.effects") > > two entries, as above. nothing new. > > > ?fixed.effects > > wrong > > > > eventually, it dawned on me that nlme in parens was not a > > function argument, but the name of the package within which > > fixed.effects lives. Suggestion: maybe a different notation > > to name packages would be more intuitive in the help system. > > yes, I know it now, but other novices may not. even a colon > > instead of a () may be more intuitive in this context. > > > > > library(nlme); ?lme > > and then > > > lme(y ~ X) > > Error in getGroups.data.frame(dataMix, groups) : > > Invalid formula for groups > > > > > > now I have to beg for some help. ok, blatant free-riding. > > the lme docs tells me it does the Laird and Ware model, but I > > do not know this model. the only two examples given at the > > end of the lme help file seem to be similar to what I just > > specified. so, how do I execute a simple fixed-effects > > model? (later, I may want to add a variable Z that is a > > continuous random variable.) could someone please give me > > one quick example? help is, as always, highly appreciated. > > > > sincerely, > > > > /ivo welch > > > > ______________________________________________ > > R-help at stat.math.ethz.ch mailing list > > stat.ethz.ch/mailman/listinfo/r-help > > PLEASE do read the posting guide! > > R-project.org/posting-guide.html > > > > > > ______________________________________________ > R-help at stat.math.ethz.ch mailing list > stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide! > R-project.org/posting-guide.html > > -------------------------------------------------------------- > ---------------- > Notice: This e-mail message, together with any attachments, > contains information of Merck & Co., Inc. (One Merck Drive, > Whitehouse Station, New Jersey, USA 08889), and/or its > affiliates (which may be known outside the United States as > Merck Frosst, Merck Sharp & Dohme or MSD and in Japan, as > Banyu) that may be confidential, proprietary copyrighted > and/or legally privileged. It is intended solely for the use > of the individual or entity named on this message. If you > are not the intended recipient, and have received this > message in error, please notify us immediately by reply > e-mail and then delete it from your system. > -------------------------------------------------------------- > ---------------- >
Have you tried the following: lme(y~1, random=~1|X, data=DF) where DF = a data.frame with columns y and X. The authoritative reference on library(nlme) is Pinheiro and Bates (2000) Mixed-Effects Models in S and S-Plus (Springer). I've learned a lot from Bates and from this book in particular. hope this helps, spencer graves ivo welch wrote:> dear R wizards: > > X is factor with 20,000*20=800,000 observations of 20,000 factors. > I.e., each factor has 20 observations. y is 800,000 normally > distributed data points. I want to see how much R^2 the X factors can > provide. Easy, right? > > >>lm ( y ~ X) > > and > >> aov( y ~ X) > > Error: cannot allocate vector of size 3125000 Kb > > is this computationally infeasible? (I am not an expert, but > off-hand, I thought this can work as long as the X's are just factors > = fitted means.) > > >>help.search("fixed.effects"); > > > fixed.effects(nlme) Extract Fixed Effects > fixed.effects.lmList(nlme) Extract lmList Fixed Effects > lme(nlme) Linear Mixed-Effects Models > lmeStruct(nlme) Linear Mixed-Effects Structure > nlme(nlme) Nonlinear Mixed-Effects Models > nlmeStruct(nlme) Nonlinear Mixed-Effects Structure > > Type 'help(FOO, package = PKG)' to inspect entry 'FOO(PKG) TITLE'. > > ok---I want to read the fixed.effects help. could the help system > tell me how to inspect these entries? > > >>help("fixed.effects") > > wrong > >>help.search("fixed.effects") > > two entries, as above. nothing new. > >>?fixed.effects > > wrong > > eventually, it dawned on me that nlme in parens was not a function > argument, but the name of the package within which fixed.effects > lives. Suggestion: maybe a different notation to name packages would > be more intuitive in the help system. yes, I know it now, but other > novices may not. even a colon instead of a () may be more intuitive > in this context. > > >>library(nlme); ?lme > > and then > >>lme(y ~ X) > > Error in getGroups.data.frame(dataMix, groups) : > Invalid formula for groups > > > now I have to beg for some help. ok, blatant free-riding. the lme > docs tells me it does the Laird and Ware model, but I do not know this > model. the only two examples given at the end of the lme help file > seem to be similar to what I just specified. so, how do I execute a > simple fixed-effects model? (later, I may want to add a variable Z > that is a continuous random variable.) could someone please give me > one quick example? help is, as always, highly appreciated. > > sincerely, > > /ivo welch > > ______________________________________________ > R-help at stat.math.ethz.ch mailing list > stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide! R-project.org/posting-guide.html