Torsten Hothorn
2011-Oct-17 14:53 UTC
[R] Party package: varimp(..., conditional=TRUE) error: term 1 would require 9e+12 columns (fwd)
> I would like to build a forest of regression trees to see how well some
> covariates predict a response variable and to examine the importance of
> the covariates. I have a small number of covariates (8) and large number
> of records (27368). The response and all of the covariates are
> continuous variables.
>
> A cursory examination of the covariates does not suggest they are
> correlated in a simple fashion (e.g. the variance inflation factors are
> all fairly low) but common sense suggests there should be some
> relationship: one of them is the day of the year and some of the others
> are environmental parameters such as water temperature. For this reason
> I would like to follow the advice of Strobl et al. (2008) and try the
> authors' conditional variable importance measure. This is implemented in
> the party package by calling varimp(..., conditional=TRUE).
> Unfortunately, when I call that on my forest I receive the error:
>
> > varimp(myforest, conditional=TRUE)
> Error in model.matrix.default(as.formula(f), data = blocks) :
>   term 1 would require 9e+12 columns
>
> Does anyone know what is wrong?

Hi Jason,

the particular feature doesn't scale well in its current implementation.
Anyway, thanks for looking up previous reports closely. I can offer to
have a look at your data if you send them along with the code to
reproduce the problem.

Best,

Torsten

> I noticed a post in June 2011 where a user reported this message and the
> ultimate problem was that the importance measure was being conditioned
> on too many variables (47). I have only a small number of variables here
> so I guessed that was not the problem.
>
> Another suggestion was that there could be a factor with too many
> levels. In my case, all of the variables are continuous. Term 1 (x1
> below) is the day of the year, which does happen to be integers
> 1 ... 366.
> But the variable is class numeric, not integer, so I don't believe
> cforest would treat it as a factor, although I do not know how to tell
> whether cforest is treating something as continuous or as a factor.
>
> Thank you for any help you can provide. I am running R 2.13.1 with party
> 0.9-99994. You can download the data from
> http://www.duke.edu/~jjr8/data.rdata (512 KB). Here is the complete
> code:
>
> > load("\\Temp\\data.rdata")
> > nrow(df)
> [1] 27368
> > summary(df)
>        y                x1              x2               x3               x4
>  Min.   :  0.000   Min.   :  1.0   Min.   :0.0000   Min.   :  1.00   Min.   :  52
>  1st Qu.:  0.000   1st Qu.:105.0   1st Qu.:0.0000   1st Qu.: 30.00   1st Qu.:1290
>  Median :  1.282   Median :169.0   Median :0.2353   Median : 38.00   Median :1857
>  Mean   :  5.651   Mean   :178.7   Mean   :0.2555   Mean   : 55.03   Mean   :1907
>  3rd Qu.:  5.353   3rd Qu.:262.0   3rd Qu.:0.4315   3rd Qu.: 47.00   3rd Qu.:2594
>  Max.   :195.238   Max.   :366.0   Max.   :1.0000   Max.   :400.00   Max.   :3832
>        x5                  x6              x7                  x8
>  Min.   : 0.008184   Min.   :16.71   Min.   :0.0000000   Min.   : 0.02727
>  1st Qu.: 6.747035   1st Qu.:23.92   1st Qu.:0.0000000   1st Qu.: 0.11850
>  Median :11.310277   Median :26.35   Median :0.0001569   Median : 0.14625
>  Mean   :12.889021   Mean   :26.31   Mean   :0.0162043   Mean   : 0.20684
>  3rd Qu.:18.427410   3rd Qu.:28.95   3rd Qu.:0.0144660   3rd Qu.: 0.20095
>  Max.   :29.492380   Max.   :31.73   Max.   :0.3157486   Max.   :11.76877
> > library(HH)
> <output deleted>
> > vif(y ~ ., data=df)
>       x1       x2       x3       x4       x5       x6       x7       x8
> 1.374583 1.252250 1.021672 1.218801 1.015124 1.439868 1.075546 1.060580
> > library(party)
> <output deleted>
> > mycontrols <- cforest_unbiased(ntree=50, mtry=3)  # Small forest but requires a few minutes
> > myforest <- cforest(y ~ ., data=df, controls=mycontrols)
> > varimp(myforest)
>         x1         x2         x3         x4         x5         x6         x7         x8
>  11.924498 103.180195  16.228864  30.658946   5.053500  12.820551   2.113394   6.911377
> > varimp(myforest, conditional=TRUE)
> Error in model.matrix.default(as.formula(f), data = blocks) :
>   term 1 would require 9e+12 columns
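
For readers who hit the same message: the error comes from model.matrix(), which refuses to build a design matrix whose interaction term would need that many columns. The sketch below uses base R only and does not call party. It assumes, without having checked the party source, that the conditioning grid is expressed as an interaction of binned covariates. The toy data, the bin counts, and the helper interaction_columns() are all hypothetical and are not taken from the poster's forest. It shows that an interaction of factors needs one column per combination of levels, so the column count is the product of the level counts and grows very quickly.

```r
# Toy data: three continuous variables, each cut into a few bins (factors).
set.seed(1)
n <- 200
toy <- data.frame(
  a = cut(runif(n), breaks = seq(0, 1, length.out = 5), include.lowest = TRUE), # 4 bins
  b = cut(runif(n), breaks = seq(0, 1, length.out = 4), include.lowest = TRUE), # 3 bins
  c = cut(runif(n), breaks = seq(0, 1, length.out = 6), include.lowest = TRUE)  # 5 bins
)

# A design matrix containing only the three-way interaction has one
# indicator column per combination of levels: 4 * 3 * 5 columns.
mm <- model.matrix(~ -1 + a:b:c, data = toy)
ncol(mm)

# Hypothetical helper: count the columns such a term would need
# without actually building the matrix.
interaction_columns <- function(levels_per_factor) prod(levels_per_factor)

interaction_columns(sapply(toy, nlevels))  # same count as ncol(mm)

# Made-up numbers for illustration only: seven conditioning variables,
# each split into 70 bins, already need more than 1e12 columns.
interaction_columns(rep(70, 7))
```

If something like this is what happens inside varimp(..., conditional=TRUE), then a small number of covariates is no protection: continuous covariates that are split at many distinct cutpoints across the forest can still produce an unmanageable grid.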