Sergii Ivakhno
2009-Feb-12 15:45 UTC
[R] repost: problems with lm for nested fixed-factor Anova (ANOVA I)
Dear R users,

I posted this question several days ago and did not receive a single suggestion. I believe I provided sufficient information for at least some help. Here I repost the question with several modifications.

I want to run a nested fixed-factor ANOVA in R on different experiments. I have 48 levels of the main factor x1 and 242 levels of the nested factor z1, and a continuous response variable y1 with around 15,000 data points. There is no interaction between specification of the main factor and nested factor levels. When I run lm using the nested design

    anov1=lm(c(as.vector(y1))~as.factor(x1)+ as.factor(x1)/as.factor(z1))

I get a warning in summary(anov1):

    Coefficients: (.. not defined because of singularities)

I obtain interactions between factors that should not be included because of the nested design. Furthermore, the running time is more than 7 hours!

Why does R include interactions that are not part of the model? Is it because R solves ANOVA via least squares, which requires matrix inversion? Is there any way to specify the model so that only matrices for each level of the first factor are inverted so that no computations

Many thanks for your help in advance!

Extract:

    Coefficients: (... not defined because of singularities)
                                   Estimate Std. Error t value Pr(>|t|)
    (Intercept)                     -0.1759     0.3226  -0.545   0.5869
    as.factor(x1)2                  -0.1276     0.4473  -0.285   0.7762
    as.factor(x1)3                   0.6461     0.4785   1.350   0.1802
    as.factor(x1)1:as.factor(z1)2    0.1049     0.4473   0.234   0.8151
    as.factor(x1)2:as.factor(z1)2        NA         NA      NA       NA
    as.factor(x1)3:as.factor(z1)2        NA         NA      NA       NA
    as.factor(x1)1:as.factor(z1)3        NA         NA      NA       NA
    as.factor(x1)2:as.factor(z1)3    1.1520     0.4473   2.575   0.0116 *
    as.factor(x1)3:as.factor(z1)3        NA         NA      NA       NA
    as.factor(x1)1:as.factor(z1)4        NA         NA      NA       NA
    as.factor(x1)2:as.factor(z1)4        NA         NA      NA       NA
    as.factor(x1)3:as.factor(z1)4        NA         NA      NA       NA
    as.factor(x1)1:as.factor(z1)5        NA         NA      NA       NA
    as.factor(x1)2:as.factor(z1)5        NA         NA      NA       NA

----------------------------------------------
Sergii Ivakhno
PhD student
Computational Biology Group
Cancer Research UK Cambridge Research Institute
Li Ka Shing Centre
Robinson Way
Cambridge CB2 0RE
England
+44 (0)1223 404293 (O)
+44 (0)1223 404128 (F)
http://www.compbio.group.cam.ac.uk

This communication is from Cancer Research UK. Our website is at www.cancerresearchuk.org. We are a charity registered under number 1089464 and a company limited by guarantee registered in England & Wales under number 4325234. Our registered address is 61 Lincoln's Inn Fields, London WC2A 3PX. Our central telephone number is 020 7242 0200. This communication and any attachments contain information which is confidential and may also be privileged. It is for the exclusive use of the intended recipient(s). If you are not the intended recipient(s) please note that any form of disclosure, distribution, copying or use of this communication or the information in it or in any attachments is strictly prohibited and may be unlawful. If you have received this communication in error, please notify the sender and delete the email and destroy any copies of it. E-mail communications cannot be guaranteed to be secure or error free, as information could be intercepted, corrupted, amended, lost, destroyed, arrive late or incomplete, or contain viruses. We do not accept liability for any such matters or their consequences. Anyone who communicates with us by e-mail is taken to accept the risks in doing so.
Sergii Ivakhno
2009-Feb-12 16:29 UTC
[R] repost: problems with lm for nested fixed-factor Anova (ANOVA I)
Dear All,

I am terribly sorry for forgetting to provide a toy example. Obviously it cannot reproduce the problem of the running time.

    x1=c(rep(1,25),rep(2,25),rep(3,50))
    z1=c(rep(1,12),rep(2,13),rep(3,12),rep(4,13),rep(5,12),rep(6,13),rep(7,25))
    y1 = rnorm(100,0,1)
    anov1=lm(c(as.vector(y1))~as.factor(x1)+ as.factor(x1)/as.factor(z1))
    summary(anov1)

    Call:
    lm(formula = c(as.vector(y1)) ~ as.factor(x1) + as.factor(x1)/as.factor(z1))

    Residuals:
         Min       1Q   Median       3Q      Max
    -2.48430 -0.64636  0.02539  0.60281  2.11850

    Coefficients: (14 not defined because of singularities)
                                   Estimate Std. Error t value Pr(>|t|)
    (Intercept)                     0.05042    0.27660   0.182    0.856
    as.factor(x1)2                 -0.17378    0.38357  -0.453    0.652
    as.factor(x1)3                  0.08791    0.33650   0.261    0.794
    as.factor(x1)1:as.factor(z1)2   0.13258    0.38357   0.346    0.730
    as.factor(x1)2:as.factor(z1)2        NA         NA      NA       NA
    as.factor(x1)3:as.factor(z1)2        NA         NA      NA       NA
    as.factor(x1)1:as.factor(z1)3        NA         NA      NA       NA
    as.factor(x1)2:as.factor(z1)3  -0.22248    0.38357  -0.580    0.563
    as.factor(x1)3:as.factor(z1)3        NA         NA      NA       NA
    as.factor(x1)1:as.factor(z1)4        NA         NA      NA       NA
    as.factor(x1)2:as.factor(z1)4        NA         NA      NA       NA
    as.factor(x1)3:as.factor(z1)4        NA         NA      NA       NA
    as.factor(x1)1:as.factor(z1)5        NA         NA      NA       NA
    as.factor(x1)2:as.factor(z1)5        NA         NA      NA       NA
    as.factor(x1)3:as.factor(z1)5  -0.07494    0.33650  -0.223    0.824
    as.factor(x1)1:as.factor(z1)6        NA         NA      NA       NA
    as.factor(x1)2:as.factor(z1)6        NA         NA      NA       NA
    as.factor(x1)3:as.factor(z1)6  -0.38572    0.32763  -1.177    0.242
    as.factor(x1)1:as.factor(z1)7        NA         NA      NA       NA
    as.factor(x1)2:as.factor(z1)7        NA         NA      NA       NA
    as.factor(x1)3:as.factor(z1)7        NA         NA      NA       NA

    Residual standard error: 0.9582 on 93 degrees of freedom
    Multiple R-Squared: 0.03819, Adjusted R-squared: -0.02386
    F-statistic: 0.6154 on 6 and 93 DF, p-value: 0.7174
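The count of 14 undefined coefficients in the toy example can be checked directly from the design matrix. The formula x1/z1 expands to x1 + x1:z1, so R creates one column for every (x1, z1) combination, including combinations that never occur in nested data. The following is a minimal sketch, assuming the toy data is collected into a data frame; the names toy, X, n_cols, n_rank and n_empty are illustrative and do not come from the original post.

```r
set.seed(1)

# Toy data from the post, stored in a data frame (the name `toy` is hypothetical)
toy <- data.frame(
  x1 = factor(c(rep(1, 25), rep(2, 25), rep(3, 50))),
  z1 = factor(c(rep(1, 12), rep(2, 13), rep(3, 12), rep(4, 13),
                rep(5, 12), rep(6, 13), rep(7, 25))),
  y  = rnorm(100)
)

# Each z1 level occurs under exactly one x1 level, so most cells are empty
print(with(toy, table(x1, z1)))

# Design matrix that lm() builds for the nested formula
X <- model.matrix(~ x1/z1, data = toy)

n_cols  <- ncol(X)                     # columns requested by the formula
n_rank  <- qr(X)$rank                  # columns that are linearly independent
n_empty <- sum(colSums(X != 0) == 0)   # all-zero columns from empty cells

cat("columns:", n_cols, " rank:", n_rank,
    " undefined:", n_cols - n_rank, " all-zero:", n_empty, "\n")
```

The difference between the number of columns and the rank is the number of coefficients reported as NA. Most of them come from all-zero columns for cells that do not exist, and the rest are aliased with the x1 main effect.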
Richard M. Heiberger
2009-Feb-12 17:18 UTC
[R] repost: problems with lm for nested fixed-factor Anova (ANOVA I)
    tmp <- data.frame(y=rnorm(15000),
                      x1=factor(sample(48, 15000, replace=TRUE)),
                      z1=factor(sample(242, 15000, replace=TRUE)))
    system.time(
      tmp.aov <- aov(y ~ x1/z1, data=tmp)
    )
    ## exceeds memory

    tmp2 <- data.frame(y=rnorm(15000),
                       x1=factor(sample(48, 15000, replace=TRUE)),
                       z1=factor(sample(5, 15000, replace=TRUE)))
    system.time(
      tmp2.aov <- aov(y ~ x1/z1, data=tmp2)
    )
    anova(tmp2.aov)
    ## about 5 seconds

Use data.frames. They make it easier to read.

Use aov() instead of lm(). It is the same arithmetic, but the unneeded columns of X are handled more gracefully.

My guess is that your data has hundreds of distinct values for z1. Therefore excess space was allocated. It is easier to understand with distinct values of z1, but as you see it is costly in computer resources.

You can force the actual numerical values of the second term to be distinct across levels of x1 with the interaction() function. Then use the simpler model and let the linear dependencies work in your favor.

    system.time(
      tmp.aov <- aov(y ~ x1 + interaction(x1, z1), data=tmp)
    )
    anova(tmp.aov)
    ## about 6 seconds

Rich
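The interaction() suggestion can be tried on data that is actually nested at the size described in the question. This is a minimal sketch under stated assumptions: the data is simulated so that each of the 242 z1 levels belongs to exactly one of the 48 x1 levels, and the names nested, x_of_z, cell, fit and tab are hypothetical. Note that interaction() keeps every level combination unless drop = TRUE is given, so the sketch passes drop = TRUE to keep only the cells that occur.

```r
set.seed(42)
n <- 15000

# Assign each of the 242 z1 levels to exactly one of the 48 x1 levels (nested design)
x_of_z <- rep_len(1:48, 242)

# Cycle through the z1 levels so that every level is observed
z <- rep_len(1:242, n)

nested <- data.frame(
  x1 = factor(x_of_z[z]),
  z1 = factor(z),
  y  = rnorm(n)
)

# One factor level per observed (x1, z1) cell; drop = TRUE discards empty cells
nested$cell <- interaction(nested$x1, nested$z1, drop = TRUE)

# Additive model: the columns of `cell` that duplicate x1 are aliased and dropped
print(system.time(
  fit <- aov(y ~ x1 + cell, data = nested)
))

tab <- anova(fit)
print(tab)
```

With 242 observed cells the design matrix has a few hundred columns instead of one column per possible (x1, z1) pair, and the degrees of freedom split into 47 for x1 and 194 for cells within x1.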