Sergii Ivakhno
2009-Feb-12 15:45 UTC
[R] repost: problems with lm for nested fixed-factor Anova (ANOVA I)
Dear R users,
I posted this question several days ago and did not receive a single
suggestion. I believe I have provided sufficient information for at
least some help. Here I repost the question with several modifications.
I want to run a nested fixed-factor ANOVA in R on different experiments.
I have 48 levels of the main factor x1 and 242 levels of the nested
factor z1, and a continuous response variable y1 with around 15,000 data
points. There is no interaction between specification of the main factor
and nested factor levels.
When I run lm using a nested design
anov1=lm(c(as.vector(y1))~as.factor(x1)+ as.factor(x1)/as.factor(z1))
I get a warning
summary(anov1)
Coefficients: (.. not defined because of singularities). I obtain
interaction terms between factors that should not be included because of
the nested design. Furthermore, the run takes more than 7 hours!
Why does R include interactions that are not part of the model? Is it
because R solves ANOVA via least squares, which requires matrix inversion?
Is there any way to specify the model so that only matrices for each
level of the first factor are inverted so that no computations
Many thanks for your help in advance!
Extract:
Coefficients: (... not defined because of singularities)
Estimate Std. Error t value Pr(>|t|)
(Intercept) -0.1759 0.3226 -0.545 0.5869
as.factor(x1)2 -0.1276 0.4473 -0.285 0.7762
as.factor(x1)3 0.6461 0.4785 1.350 0.1802
as.factor(x1)1:as.factor(z1)2 0.1049 0.4473 0.234 0.8151
as.factor(x1)2:as.factor(z1)2 NA NA NA NA
as.factor(x1)3:as.factor(z1)2 NA NA NA NA
as.factor(x1)1:as.factor(z1)3 NA NA NA NA
as.factor(x1)2:as.factor(z1)3 1.1520 0.4473 2.575 0.0116 *
as.factor(x1)3:as.factor(z1)3 NA NA NA NA
as.factor(x1)1:as.factor(z1)4 NA NA NA NA
as.factor(x1)2:as.factor(z1)4 NA NA NA NA
as.factor(x1)3:as.factor(z1)4 NA NA NA NA
as.factor(x1)1:as.factor(z1)5 NA NA NA NA
as.factor(x1)2:as.factor(z1)5 NA NA NA NA
----------------------------------------------
Sergii Ivakhno
PhD student
Computational Biology Group
Cancer Research UK Cambridge Research Institute
Li Ka Shing Centre
Robinson Way
Cambridge CB2 0RE
England
+44 (0)1223 404293 (O)
+44 (0)1223 404128 (F)
http://www.compbio.group.cam.ac.uk
Sergii Ivakhno
2009-Feb-12 16:29 UTC
[R] repost: problems with lm for nested fixed-factor Anova (ANOVA I)
Dear All,
I am terribly sorry for forgetting to provide a toy example. Obviously
it cannot reproduce the problem with the running time.
x1=c(rep(1,25),rep(2,25),rep(3,50))
z1=c(rep(1,12),rep(2,13),rep(3,12),rep(4,13),rep(5,12),rep(6,13),
rep(7,25))
y1 = rnorm(100,0,1)
anov1=lm(c(as.vector(y1))~as.factor(x1)+ as.factor(x1)/as.factor(z1))
summary(anov1)
Call:
lm(formula = c(as.vector(y1)) ~ as.factor(x1) +
as.factor(x1)/as.factor(z1))
Residuals:
Min 1Q Median 3Q Max
-2.48430 -0.64636 0.02539 0.60281 2.11850
Coefficients: (14 not defined because of singularities)
Estimate Std. Error t value Pr(>|t|)
(Intercept) 0.05042 0.27660 0.182 0.856
as.factor(x1)2 -0.17378 0.38357 -0.453 0.652
as.factor(x1)3 0.08791 0.33650 0.261 0.794
as.factor(x1)1:as.factor(z1)2 0.13258 0.38357 0.346 0.730
as.factor(x1)2:as.factor(z1)2 NA NA NA NA
as.factor(x1)3:as.factor(z1)2 NA NA NA NA
as.factor(x1)1:as.factor(z1)3 NA NA NA NA
as.factor(x1)2:as.factor(z1)3 -0.22248 0.38357 -0.580 0.563
as.factor(x1)3:as.factor(z1)3 NA NA NA NA
as.factor(x1)1:as.factor(z1)4 NA NA NA NA
as.factor(x1)2:as.factor(z1)4 NA NA NA NA
as.factor(x1)3:as.factor(z1)4 NA NA NA NA
as.factor(x1)1:as.factor(z1)5 NA NA NA NA
as.factor(x1)2:as.factor(z1)5 NA NA NA NA
as.factor(x1)3:as.factor(z1)5 -0.07494 0.33650 -0.223 0.824
as.factor(x1)1:as.factor(z1)6 NA NA NA NA
as.factor(x1)2:as.factor(z1)6 NA NA NA NA
as.factor(x1)3:as.factor(z1)6 -0.38572 0.32763 -1.177 0.242
as.factor(x1)1:as.factor(z1)7 NA NA NA NA
as.factor(x1)2:as.factor(z1)7 NA NA NA NA
as.factor(x1)3:as.factor(z1)7 NA NA NA NA
Residual standard error: 0.9582 on 93 degrees of freedom
Multiple R-Squared: 0.03819, Adjusted R-squared: -0.02386
F-statistic: 0.6154 on 6 and 93 DF, p-value: 0.7174
Richard M. Heiberger
2009-Feb-12 17:18 UTC
[R] repost: problems with lm for nested fixed-factor Anova (ANOVA I)
## name the columns with "=" (not "<-") so x1 and z1 are stored in the data frame
tmp <- data.frame(y=rnorm(15000),
                  x1=factor(sample(48, 15000, replace=TRUE)),
                  z1=factor(sample(242, 15000, replace=TRUE)))
system.time(
tmp.aov <- aov(y ~ x1/z1, data=tmp)
)
## exceeds memory
tmp2 <- data.frame(y=rnorm(15000),
                   x1=factor(sample(48, 15000, replace=TRUE)),
                   z1=factor(sample(5, 15000, replace=TRUE)))
system.time(
tmp2.aov <- aov(y ~ x1/z1, data=tmp2)
)
anova(tmp2.aov)
## about 5 seconds
Use data.frames. They make it easier to read.
Use aov() instead of lm(). It is the same arithmetic,
but the unneeded columns of X are handled more gracefully.
My guess is that your data has 100s of distinct values for z1.
Therefore excess space was allocated. It is easier to understand with
distinct values of z1, but as you see it is costly in computer
resources.
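The cost of the nested formula can be sketched on the toy data from the
earlier message. This is only an illustration, not part of the original
reply; X, n_cols and n_rank are names introduced here. The formula
~ x1/z1 expands to x1 + x1:z1, so the design matrix gets one column for
every x1-by-z1 combination, including combinations that never occur when
z1 is nested in x1.

```r
# Toy factors from the earlier message: 3 levels of x1, 7 levels of z1,
# each z1 level occurring under exactly one x1 level.
x1 <- factor(c(rep(1, 25), rep(2, 25), rep(3, 50)))
z1 <- factor(c(rep(1, 12), rep(2, 13), rep(3, 12), rep(4, 13),
               rep(5, 12), rep(6, 13), rep(7, 25)))

# Design matrix that lm() would build for the nested formula.
X <- model.matrix(~ x1/z1)

n_cols <- ncol(X)      # columns allocated: 1 + 2 + 3 * (7 - 1)
n_rank <- qr(X)$rank   # columns that can actually be estimated

n_cols
n_rank

# Interaction columns for cells that never occur are all zero.
sum(colSums(X != 0) == 0)
```

By the same expansion, 48 and 242 levels give 1 + 47 + 48 * 241 = 11616
columns for 15000 rows, roughly 1.4e9 bytes of doubles for the design
matrix alone. If each z1 level occurs under exactly one x1 level, only
242 of those columns' worth of information is estimable.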
You can force the actual numerical values of the second term to be
distinct across levels of x1 with the interaction() function. Then
use the simpler model and let the linear dependencies work in your
favor.
system.time(
tmp.aov <- aov(y ~ x1 + interaction(x1, z1), data=tmp)
)
anova(tmp.aov)
## about 6 seconds
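A minimal sketch of the interaction() idea on the toy data from the
earlier message, assuming z1 really is nested in x1. The names toy, cell,
fit_nested and fit_cell are introduced here for illustration, and
drop = TRUE is an explicit choice made in this sketch so that only the
x1-by-z1 combinations present in the data become factor levels.

```r
set.seed(1)  # arbitrary seed, only to make the sketch repeatable

toy <- data.frame(
  y  = rnorm(100),
  x1 = factor(c(rep(1, 25), rep(2, 25), rep(3, 50))),
  z1 = factor(c(rep(1, 12), rep(2, 13), rep(3, 12), rep(4, 13),
                rep(5, 12), rep(6, 13), rep(7, 25)))
)

# One factor level per observed (x1, z1) cell; empty cells are dropped.
toy$cell <- interaction(toy$x1, toy$z1, drop = TRUE)

fit_nested <- lm(y ~ x1/z1, data = toy)      # many all-zero columns
fit_cell   <- lm(y ~ x1 + cell, data = toy)  # far fewer columns

length(coef(fit_nested))   # coefficients requested by the nested formula
length(coef(fit_cell))     # coefficients requested with the cell factor

# Both fits span the same cell means, so the fitted values agree.
all.equal(fitted(fit_nested), fitted(fit_cell))

anova(fit_cell)
```

The cell version still reports a few NA coefficients, because the cell
indicators within each x1 level sum to that level's indicator, but the
design matrix no longer grows with the product of the two level counts.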
Rich