This is related to my earlier question, to which I received very helpful replies. When I provide a subsetting method that automatically drops unused levels of a factor variable, I get into a bit of trouble with model.frame.default. I know that model.frame.default has its own mechanism for dropping unused levels. However, I prefer to handle this at a more basic level, in [.factor, and not to specify drop.unused.levels=TRUE to model.frame.default. That way, subsetting operations that are not carried out by model.frame also work the way I want, especially [.data.frame when I attach or otherwise reference a subset of a data frame.

Inside model.frame.default, a 'variables' list is constructed, and for factor variables it contains all the original levels. Then .Internal(model.frame()) is invoked. This calls [.data.frame, which in turn calls my local [.factor, which drops unused levels. model.frame is affected by the mismatch between the levels held in 'variables' and the levels returned by [.data.frame. As a result, it returns an invalid factor variable in which the levels are shifted and some real levels at the end have zero frequencies. (I am leaving drop.unused.levels=FALSE when running model.frame.)

Is model.frame doing this by design? If not, can it be fixed? It seems to me that, to be general, .Internal(model.frame()) should not depend on the levels staying unchanged when [.data.frame is executed. If model.frame really needs to operate this way, does anyone see a workaround?

Thanks again. I'll put in one more plug for [.factor to be modified so that, if a system option 'drop.unused.levels' is TRUE (it would NOT be by default), drop=TRUE is assumed unless the user explicitly states drop=FALSE. Then I can dispose of my local [.factor once and for all.

Frank
-- 
Frank E Harrell Jr
Prof. of Biostatistics & Statistics
Div. of Biostatistics & Epidem.
Dept. of Health Evaluation Sciences
U. Virginia School of Medicine
http://hesweb1.med.virginia.edu/biostat

-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-
r-help mailing list -- Read http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html
Send "info", "help", or "[un]subscribe"
(in the "body", not the subject !)  To: r-help-request at stat.math.ethz.ch
_._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._
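For readers hitting the same problem in current R, one possible workaround is to leave base [.factor alone. Levels are then dropped either through model.frame's own drop.unused.levels argument or on the subset before the model frame is built. This is a sketch, not something proposed in this thread, and the second variant relies on droplevels(), which was added to R well after these messages were written. The data frame d and its columns y and g are made-up illustration data.

```r
# Made-up example data: factor g has a level "c" that the subset removes
d <- data.frame(y = c(1.2, 2.3, 3.1, 4.8, 5.0),
                g = factor(c("a", "a", "b", "b", "c")))

# Variant 1: let model.frame.default drop the unused levels itself,
# after it has subsetted the variables
mf1 <- model.frame(y ~ g, data = d, subset = g != "c",
                   drop.unused.levels = TRUE)

# Variant 2: subset and drop levels first, so model.frame never sees
# the level set change during its own subsetting
sub <- droplevels(d[d$g != "c", ])
mf2 <- model.frame(y ~ g, data = sub)

levels(mf1$g)  # "a" "b"
levels(mf2$g)  # "a" "b"
```

Either way the level set is fixed before or after the internal model-frame step, rather than changing underneath it.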
I sent this note last Friday, just before the weekend, and didn't get any replies. I'm sending it again in the hope that someone will offer some insight.

-Frank
Frank E Harrell Jr writes:
> Thanks again, and I'll put in one more plug for [.factor to be modified so
> that if a system option 'drop.unused.levels' is TRUE (i.e., NOT by default)
> drop=TRUE is assumed unless drop=FALSE is explicitly stated by the user.
> Then I can dispose of my local [.factor once and for all.

I'll second this notion. Although a wiser man than I would hesitate to comment on the relationship between R-core and Professor Harrell, it would seem a nice gesture for R-core to "Welcome Aboard" Professor Harrell to the land of R, given his many contributions (Hmisc, Design, et cetera) to the S language over the years.

Of course, there is probably some deep design reason why this is hard to do, or some (obvious) statistical reason why one would not want R to give unsophisticated users this much "rope". If so, I would enjoy being educated about the issues involved.

In my own small corner of the world, I would use such an option. One of the biggest complaints my colleagues have about data frames is precisely this behavior: a subset keeps factor levels that no longer occur in it. It was also one of my own biggest confusions when I started with S+.

Just my 2 cents,

Dave Kane
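The option both posters ask for could be prototyped at user level roughly as follows. This is a hypothetical sketch: 'drop.unused.levels' is not an existing global R option, and the function only follows the general shape of base R's [.factor (subset with NextMethod, restore levels, contrasts and class, then optionally re-factor). It is not a change to base R.

```r
# Hypothetical user-level [.factor: drop unused levels when the
# (hypothetical) option "drop.unused.levels" is TRUE, unless the
# caller gives drop= explicitly
`[.factor` <- function(x, ..., drop = FALSE) {
  # Consult the option only when drop= was not supplied
  if (missing(drop)) drop <- isTRUE(getOption("drop.unused.levels"))
  y <- NextMethod("[")
  # Restore the factor attributes lost by default subsetting
  attr(y, "contrasts") <- attr(x, "contrasts")
  attr(y, "levels") <- attr(x, "levels")
  class(y) <- oldClass(x)
  # factor() on a factor keeps only levels that occur, in original order
  if (drop) factor(y) else y
}

f <- factor(c("a", "b", "c"))
options(drop.unused.levels = TRUE)
levels(f[1:2])                # "a" "b"
levels(f[1:2, drop = FALSE])  # "a" "b" "c"
```

Because the option is read only when drop= is missing, an explicit drop=FALSE still keeps every level, matching the behavior described in the quoted proposal.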