On Thursday 16 October 2003 17:59, Alexander Sirotkin [at Yahoo] wrote:
> Thanks for all the help on my previous questions.
>
> One more (hopefully last one): I've been very
> surprised when I tried to fit a model (using aov())
> for a sample of size 200 and 10 variables and their
> interactions.

That doesn't really say much. How many of these variables are factors? How many levels do they have? And what is the order of the interaction? (Note that for 10 numeric variables, if you allow all interactions, there will be about 1000 terms (2^10 - 1 = 1023) in your model. This increases for factors.)

In other words, how big is your model matrix? (See ?model.matrix)

Deepayan
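The term count in the reply above can be checked directly. The following is a minimal sketch, not part of the original exchange: the simulated data frame `d` and its columns `V1` to `V4` and `y` are made up for illustration, and only base R functions are used.

```r
# Every non-empty subset of the p variables is one term when all
# interactions are allowed, so the count is 2^p - 1.
p <- 10
n_terms <- sum(choose(p, 1:p))

# Small check with 4 simulated numeric variables (hypothetical data):
# y ~ V1 * V2 * V3 * V4 expands to all main effects and interactions.
set.seed(1)
d <- as.data.frame(matrix(rnorm(20 * 4), ncol = 4))  # columns V1..V4
d$y <- rnorm(20)
X <- model.matrix(y ~ V1 * V2 * V3 * V4, data = d)

n_terms   # 1023
dim(X)    # 20 rows, 16 columns (15 terms plus the intercept)
```

Calling dim() on the result of model.matrix() for a reduced formula is a cheap way to see how large the full design would be before handing it to aov().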
Thanks for all the help on my previous questions.

One more (hopefully last one): I've been very surprised when I tried to fit a model (using aov()) for a sample of size 200 and 10 variables and their interactions.

It turns out that even 2GB of RAM is not enough for aov() with this sample size, which does not seem so big to me. Am I doing something wrong, or is this considered a normal memory requirement? Frankly, I just don't have access to a machine with more than 2GB of RAM, so I'm not sure how I should attack this problem.

10x.

P.S. When I reduced the sample size to 50, 2GB of RAM was enough, but aov() kept running all night and has not finished yet.
--- Deepayan Sarkar <deepayan at stat.wisc.edu> wrote:
> On Thursday 16 October 2003 19:03, Alexander Sirotkin [at Yahoo] wrote:
> > > > Thanks for all the help on my previous questions.
> > > >
> > > > One more (hopefully last one): I've been very
> > > > surprised when I tried to fit a model (using aov())
> > > > for a sample of size 200 and 10 variables and
> > > > their interactions.
> > >
> > > That doesn't really say much. How many of these
> > > variables are factors? How many levels do they have?
> > > And what is the order of the interaction? (Note
> > > that for 10 numeric variables, if you allow all
> > > interactions, there will be about 1000 terms
> > > (2^10 - 1 = 1023) in your model. This increases for
> > > factors.)
> > >
> > > In other words, how big is your model matrix?
> > > (See ?model.matrix)
> > >
> > > Deepayan
> >
> > I see...
> >
> > Unfortunately, model.matrix() ran out of memory :)
> > I have 10 variables, 6 of which are factors, 2 of which
> > have quite a lot of levels (about 40). And I would
> > like to allow all interactions.
> >
> > I understand your point about categorical variables,
> > but still - this does not seem like too much data to me.
>
> That's one way to look at it. You don't have enough
> data for the model you are trying to fit. The usual
> approach under these circumstances is to try
> 'simpler' models.
>
> Please try to understand what you are trying to do
> (in this case by reading an introductory linear model
> text) before blindly applying a methodology.
>
> Deepayan

I did study ANOVA and I do have enough observations. 200 was only a random sample of more than 5000, which I think should be enough. However, I'm afraid to even think about the amount of RAM I will need with R to fit a model for this data.
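To put rough numbers on the model described in this message (10 variables, 6 of them factors, two with about 40 levels, all interactions), here is a back-of-the-envelope sketch. The thread does not give the level counts of the other four factors, so the value of 2 levels each used below is purely an assumption and is the smallest possible choice; the real design would be at least this large.

```r
# Level counts: two factors with 40 levels (from the message); the
# remaining four factors are ASSUMED to have 2 levels each.
factor_levels <- c(40, 40, 2, 2, 2, 2)
n_numeric <- 4

# With all interactions and treatment contrasts, the number of columns
# (including the intercept) is the product over variables of
# (1 + degrees of freedom): k for a k-level factor, 2 for a numeric one.
n_cols <- prod(factor_levels) * 2^n_numeric

# Size in GiB of one dense double-precision copy of the model matrix.
gib <- function(n_rows) n_rows * n_cols * 8 / 2^30

n_cols      # 409600 columns
gib(200)    # roughly 0.6 GiB for 200 rows
gib(5000)   # roughly 15 GiB for 5000 rows
```

This counts a single copy of the matrix only; fitting usually needs several times that, which would be consistent with 2GB running out at 200 rows. It also shows far more columns than observations, which is the point made in the replies.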
A couple of comments:

o Methods such as decision trees do not need to expand factors into columns of 1-df contrasts, so the memory requirement is vastly different. The models produced are also very, very different.

o Why would you want "all possible interactions" of 10 variables, 6 of which are factors? How do you intend to interpret, e.g., the 6-factor interaction? What can you conclude about a significant 10-variable interaction? What is your ultimate goal for this exercise? The answer to that should help you decide on more reasonable models to fit.

o One thing to try is to fit the ANOVA model "by hand" by computing cell means and examining them. This avoids creating the huge design matrix that's mostly 0s.

HTH,
Andy

> -----Original Message-----
> From: Alexander Sirotkin [at Yahoo] [mailto:alex_s_42 at yahoo.com]
> Sent: Friday, October 17, 2003 4:30 AM
> To: John Fox
> Cc: r-help at stat.math.ethz.ch
> Subject: Re: [R] R memory and CPU requirements
>
> I agree completely.
>
> In fact, I have about 5000 observations, which should
> be enough.
> I was using 200 samples because of RAM limitations and
> I'm afraid to think about what amount of RAM I'll
> need to fit an aov() for such data.
>
> --- John Fox <jfox at mcmaster.ca> wrote:
> > Dear Alexander,
> >
> > If I understand you correctly, you have a sample of
> > 200 observations. Even if you had only two factors
> > with 40 levels each, the main effects and
> > interactions of these factors would require about
> > 1600 degrees of freedom -- that is, more than the
> > number of observations. This doesn't make a whole
> > lot of sense.
> >
> > I hope that this helps,
> > John
> >
> > At 05:03 PM 10/16/2003 -0700, Alexander Sirotkin [at Yahoo] wrote:
> > >--- Deepayan Sarkar <deepayan at stat.wisc.edu> wrote:
> > > > On Thursday 16 October 2003 17:59, Alexander Sirotkin [at Yahoo] wrote:
> > > > > Thanks for all the help on my previous questions.
> > > > >
> > > > > One more (hopefully last one): I've been very
> > > > > surprised when I tried to fit a model (using aov())
> > > > > for a sample of size 200 and 10 variables and
> > > > > their interactions.
> > > >
> > > > That doesn't really say much. How many of these
> > > > variables are factors? How many levels do they have?
> > > > And what is the order of the interaction? (Note
> > > > that for 10 numeric variables, if you allow all
> > > > interactions, there will be about 1000 terms
> > > > (2^10 - 1 = 1023) in your model. This increases for
> > > > factors.)
> > > >
> > > > In other words, how big is your model matrix?
> > > > (See ?model.matrix)
> > > >
> > > > Deepayan
> > >
> > >I see...
> > >
> > >Unfortunately, model.matrix() ran out of memory :)
> > >I have 10 variables, 6 of which are factors, 2 of which
> > >have quite a lot of levels (about 40). And I would
> > >like to allow all interactions.
> > >
> > >I understand your point about categorical variables,
> > >but still - this does not seem like too much data to me.
> > >
> > >I remember fitting all kinds of models (mostly decision
> > >trees) for much, much larger data sets.
> > >
> > >______________________________________________
> > >R-help at stat.math.ethz.ch mailing list
> > >https://www.stat.math.ethz.ch/mailman/listinfo/r-help
> >
> > -----------------------------------------------------
> > John Fox
> > Department of Sociology
> > McMaster University
> > Hamilton, Ontario, Canada L8S 4M4
> > email: jfox at mcmaster.ca
> > phone: 905-525-9140x23604
> > web: www.socsci.mcmaster.ca/jfox
> > -----------------------------------------------------
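The "by hand" suggestion in Andy's reply can be sketched with base R alone. This is an illustration rather than anything posted in the thread: the data frame `d`, its factors `A` and `B`, and the response `y` are simulated stand-ins, and only two small factors are used to keep the output readable.

```r
# Simulated stand-in data: two factors and a numeric response.
set.seed(42)
n <- 200
d <- data.frame(
  A = factor(sample(letters[1:4], n, replace = TRUE), levels = letters[1:4]),
  B = factor(sample(LETTERS[1:3], n, replace = TRUE), levels = LETTERS[1:3]),
  y = rnorm(n)
)

# Cell means and counts for the A x B layout; no design matrix is built.
cell_mean <- tapply(d$y, list(A = d$A, B = d$B), mean)
cell_n    <- table(A = d$A, B = d$B)

# Marginal means for each factor, for comparison with the cell means.
mean_A <- tapply(d$y, d$A, mean)
mean_B <- tapply(d$y, d$B, mean)

# Sanity check: the count-weighted average of the cell means
# reproduces the grand mean of y.
grand_from_cells <- sum(cell_mean * cell_n, na.rm = TRUE) / n

print(round(cell_mean, 2))
print(cell_n)
```

Storage here grows with the number of cells rather than with rows times columns of a design matrix, and the table of counts shows at a glance which cells are empty or nearly so.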
On 17 Oct 2003 at 1:33, Alexander Sirotkin [at Yahoo] wrote:

You mentioned in an earlier post that at least one of your factors has 40 levels. If you use the default contrast, contr.treatment, the design matrix for this factor will be dominated by zeros. Maybe you should look at the CRAN package SparseM, which has the function slm for linear models with sparse matrices? (I didn't try this, but it could be worthwhile.)

Still, I don't think it makes much sense to start with a model with all the interactions in!

Kjetil Halvorsen

> --- Deepayan Sarkar <deepayan at stat.wisc.edu> wrote:
> > On Thursday 16 October 2003 19:03, Alexander Sirotkin [at Yahoo] wrote:
> > > > > Thanks for all the help on my previous questions.
> > > > >
> > > > > One more (hopefully last one): I've been very
> > > > > surprised when I tried to fit a model (using aov())
> > > > > for a sample of size 200 and 10 variables and
> > > > > their interactions.
> > > >
> > > > That doesn't really say much. How many of these
> > > > variables are factors? How many levels do they have?
> > > > And what is the order of the interaction? (Note
> > > > that for 10 numeric variables, if you allow all
> > > > interactions, there will be about 1000 terms
> > > > (2^10 - 1 = 1023) in your model. This increases for
> > > > factors.)
> > > >
> > > > In other words, how big is your model matrix?
> > > > (See ?model.matrix)
> > > >
> > > > Deepayan
> > >
> > > I see...
> > >
> > > Unfortunately, model.matrix() ran out of memory :)
> > > I have 10 variables, 6 of which are factors, 2 of which
> > > have quite a lot of levels (about 40). And I would
> > > like to allow all interactions.
> > >
> > > I understand your point about categorical variables,
> > > but still - this does not seem like too much data to me.
> >
> > That's one way to look at it. You don't have enough
> > data for the model you are trying to fit. The usual
> > approach under these circumstances is to try
> > 'simpler' models.
> >
> > Please try to understand what you are trying to do
> > (in this case by reading an introductory linear model
> > text) before blindly applying a methodology.
> >
> > Deepayan
>
> I did study ANOVA and I do have enough observations.
> 200 was only a random sample of more than 5000, which I
> think should be enough. However, I'm afraid to even
> think about the amount of RAM I will need with R to fit a
> model for this data.
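The remark about the design matrix being dominated by zeros is easy to verify for a single 40-level factor. The sketch below uses base R only and a made-up balanced factor `f`; it does not call SparseM, and slm is not exercised here.

```r
# Hypothetical balanced factor: 40 levels, 5 observations each (200 rows).
f <- gl(40, 5)

# Treatment contrasts (the default for unordered factors), requested
# explicitly: an intercept column plus 39 indicator columns.
X <- model.matrix(~ f, contrasts.arg = list(f = "contr.treatment"))

dim(X)                         # 200 rows, 40 columns
zero_fraction <- mean(X == 0)  # share of entries that are exactly zero
zero_fraction                  # about 0.95 for this layout
```

Each row has a 1 in the intercept column and at most one other 1, so the share of zeros rises further as levels are added or factors are crossed, which is what makes a sparse representation attractive for this kind of design.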