I was using summarize() in a data set in which one of the levels of the by variable was "". The summary statistic was consistently off by one level and the "" level was not in the output data frame. I tried to report it as a bug, but I could not log into the Hmisc bug reporting website to do so. I searched for this in the email archives. If it's there, I failed to find it. Should I try to pursue this as a bug, or am I using summarize incorrectly? Here is my example along with the output:

> tst1 <- data.frame(a=factor(c("", "A", "B", "C")),
+                    x=1:4)
> tst1
  a x
1   1
2 A 2
3 B 3
4 C 4
> with(tst1, summarize(x, by=llist(a), FUN=mean))
  a x
1 A 1
2 B 2
3 C 3
> with(tst1, aggregate(x, by=list(a), FUN=mean))
  Group.1 x
1         1
2       A 2
3       B 3
4       C 4

> sessionInfo()
R version 2.9.0 (2009-04-17)
i486-pc-linux-gnu

locale:
LC_CTYPE=en_US;LC_NUMERIC=C;LC_TIME=en_US;LC_COLLATE=en_US;LC_MONETARY=C;LC_MESSAGES=en_US;LC_PAPER=en_US;LC_NAME=C;LC_ADDRESS=C;LC_TELEPHONE=C;LC_MEASUREMENT=en_US;LC_IDENTIFICATION=C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] Hmisc_3.6-0

loaded via a namespace (and not attached):
[1] cluster_1.11.13 grid_2.9.0      lattice_0.17-22

Michael
Frank E Harrell Jr
2009-Jun-13 13:46 UTC
[R] Hmisc summarize() with level "" in by variable
Sorry about the bug, which is now fixed. Until we update the package, you can get the fix by entering

source('http://biostat.mc.vanderbilt.edu/cgi-bin/viewvc.cgi/*checkout*/Hmisc/trunk/R/summary.formula.s?rev=661')

Frank

Michael Erickson wrote:
> I was using summarize() in a data set in which one of the levels of
> the by variable was "". The summary statistic was consistently off by
> one level and the "" level was not in the output data frame. I tried
> to report it as a bug, but I could not log into the Hmisc bug
> reporting website to do so. I searched for this in the email
> archives. If it's there, I failed to find it. Should I try to pursue
> this as a bug, or am I using summarize incorrectly? Here is my
> example along with the output:
>
>> tst1 <- data.frame(a=factor(c("", "A", "B", "C")),
> +                    x=1:4)
>> tst1
>   a x
> 1   1
> 2 A 2
> 3 B 3
> 4 C 4
>> with(tst1, summarize(x, by=llist(a), FUN=mean))
>   a x
> 1 A 1
> 2 B 2
> 3 C 3
>> with(tst1, aggregate(x, by=list(a), FUN=mean))
>   Group.1 x
> 1         1
> 2       A 2
> 3       B 3
> 4       C 4
>
>> sessionInfo()
> R version 2.9.0 (2009-04-17)
> i486-pc-linux-gnu
>
> locale:
> LC_CTYPE=en_US;LC_NUMERIC=C;LC_TIME=en_US;LC_COLLATE=en_US;LC_MONETARY=C;LC_MESSAGES=en_US;LC_PAPER=en_US;LC_NAME=C;LC_ADDRESS=C;LC_TELEPHONE=C;LC_MEASUREMENT=en_US;LC_IDENTIFICATION=C
>
> attached base packages:
> [1] stats     graphics  grDevices utils     datasets  methods   base
>
> other attached packages:
> [1] Hmisc_3.6-0
>
> loaded via a namespace (and not attached):
> [1] cluster_1.11.13 grid_2.9.0      lattice_0.17-22
>
>
> Michael
>

-- 
Frank E Harrell Jr
Professor and Chair
School of Medicine
Department of Biostatistics
Vanderbilt University
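
For readers who cannot apply the patched file, one possible stopgap is to give the empty level an explicit label before grouping and to check the result against base R's aggregate(), which handled the "" level correctly in the example above. This is only a sketch: the label "(blank)" and the names tst2 and ref are arbitrary choices made here, not anything from the thread, and whether relabelling avoids the problem in an unpatched Hmisc 3.6-0 has not been verified.

```r
# Example data from the original post: one factor level is the empty string.
tst1 <- data.frame(a = factor(c("", "A", "B", "C")), x = 1:4)

# Work on a copy and give the empty level an explicit label.
# "(blank)" is an arbitrary placeholder chosen for this sketch.
tst2 <- tst1
levels(tst2$a)[levels(tst2$a) == ""] <- "(blank)"

# Base R reference result: one row per level, including the relabelled one.
ref <- with(tst2, aggregate(x, by = list(a = a), FUN = mean))
print(ref)

# With Hmisc attached, the same grouping can then be compared by eye:
# library(Hmisc)
# with(tst2, summarize(x, by = llist(a), FUN = mean))
```

If the two results disagree in the number of rows or in which mean is attached to which level, the installed Hmisc is probably still affected and the aggregate() result is the one to trust.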