Kwok, Heemun
2011-Mar-27 06:09 UTC
[R] Hmisc summary.formula formats for binary and continuous variables
Hello,

I am using Hmisc summary.formula, latex and Sweave to produce tables for publication. Is it possible to change the formats for binary and continuous variables? I would prefer to show 35 (10%) and 1.5 (1.2-1.8) rather than 10% (35) and 1.2 / 1.5 / 1.8. Here is a simple example:

library(Hmisc)

sex <- factor(sample(c("m","f"), 500, rep=TRUE))
age <- rnorm(500, 50, 5)
treatment <- factor(sample(c("Drug","Placebo"), 500, rep=TRUE))

s1 <- summary(~sex + age)
s2 <- summary(treatment ~ sex + age, method="reverse")
print(s1); print(s2)

Descriptive Statistics  (N=500)

+-------+-----------------+
|       |                 |
+-------+-----------------+
|sex : m|    46% (232)    |
+-------+-----------------+
|age    |47.22/50.31/53.37|
+-------+-----------------+

Descriptive Statistics by treatment

+-------+-----------------+-----------------+
|       |Drug             |Placebo          |
|       |(N=257)          |(N=243)          |
+-------+-----------------+-----------------+
|sex : m|    47% (122)    |    45% (110)    |
+-------+-----------------+-----------------+
|age    |47.35/50.00/52.68|46.78/50.92/53.97|
+-------+-----------------+-----------------+

Thanks,
Heemun

-------------------------------------------------
Heemun Kwok, M.D.
Research Fellow
Harbor-UCLA Department of Emergency Medicine
1000 West Carson Street, Box 21
Torrance, CA 90509-2910
office 310-222-3501, fax 310-212-6101
Joshua Wiley
2011-Mar-27 09:44 UTC
[R] Hmisc summary.formula formats for binary and continuous variables
I played around with this for a while and did not get very far. I did not see any arguments in summary.formula or its print methods to reorder the output (happy to be corrected). Another approach I toyed with was to create a custom function to pass to summary.formula() that would itself create (something like) the desired output.

foo <- function(x) {
  n <- length(x)
  # percentage of the full sample; n/5 is n/500 * 100, so this is hard-coded for N = 500
  pct <- n/5
  c(FOO = paste(n, "(", round(pct, digits = 0), "%)", sep = ''))
}

> summary(treatment ~ sex + age, fun = foo, method = "response")
treatment    N=500

+-------+-----------+---+---------+
|       |           |N  |FOO      |
+-------+-----------+---+---------+
|sex    |f          |273|273(55%) |
|       |m          |227|227(45%) |
+-------+-----------+---+---------+
|age    |[36.8,46.7)|125|125(25%) |
|       |[46.7,50.0)|125|125(25%) |
|       |[50.0,53.3)|125|125(25%) |
|       |[53.3,67.5]|125|125(25%) |
+-------+-----------+---+---------+
|Overall|           |500|500(100%)|
+-------+-----------+---+---------+

However, it does not work with method = "reverse". Also, this approach would seem to require either defining one very flexible function or a separate function for each situation you come across.

Looking at print.summary.formula.reverse, the magic seems to happen on lines 47-50:

cs <- formatCats(stats[[i]], nam, tr, type[i],
                 if (length(x$group.freq)) x$group.freq else x$n[i],
                 npct, pctdig, exclude1, long, prtest,
                 pdig = pdig, eps = eps)

which led me to explore formatCats().
A small tweak in the order of the paste() call on lines 25-33 (plus creating copies of the altered formatCats() and of print.summary.formula.reverse in the global environment) got me:

print.summary.formula.reverse(summary(treatment ~ sex + age, method="reverse"))

Descriptive Statistics by treatment

+-------+--------------+--------------+
|       |Drug          |Placebo       |
|       |(N=262)       |(N=238)       |
+-------+--------------+--------------+
|sex : m|  (118) 45%   |  (114) 48%   |
+-------+--------------+--------------+
|age    |46.5/50.0/53.8|46.6/49.5/52.6|
+-------+--------------+--------------+

which has the percentage info on the right side, though I did not take the time to get the parentheses moved over. Still, it seems like adding an argument that just flips the order might not take that much work/code.

Cheers,

Josh

(Though I cannot help but wonder if in response to "I want to cross the street" I just said "we could start building a two-lane, underground tunnel with...." and someone is probably going to come along and point out the crosswalk 10 feet down the street.)

--
Joshua Wiley
Ph.D. Student, Health Psychology
University of California, Los Angeles
http://www.joshuawiley.com/

______________________________________________
R-help at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
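As a rough illustration of the reordering discussed above, the cell strings could also be rearranged after they have been formatted, rather than by editing formatCats(). This is only a sketch: swap_pct_count() and quartiles_to_median_iqr() are hypothetical helper names, not Hmisc functions, and they assume the cells look exactly like "47% (122)" and "47.35/50.00/52.68" as in the printed tables. Strings that do not match are returned unchanged.

```r
# Hypothetical helpers (not part of Hmisc): reorder already-formatted cells.

# "47% (122)" -> "122 (47%)"
swap_pct_count <- function(x) {
  sub("^\\s*([0-9.]+)%\\s*\\(\\s*([0-9]+)\\s*\\)\\s*$", "\\2 (\\1%)", x)
}

# "47.35/50.00/52.68" -> "50.00 (47.35-52.68)"
quartiles_to_median_iqr <- function(x) {
  sub("^\\s*([-0-9.]+)\\s*/\\s*([-0-9.]+)\\s*/\\s*([-0-9.]+)\\s*$",
      "\\2 (\\1-\\3)", x)
}

# Example cells copied from the tables in this thread
swap_pct_count(c("    47% (122)    ", "45% (110)"))
quartiles_to_median_iqr(c("47.35/50.00/52.68", "46.78/50.92/53.97"))
```

Working on the strings keeps the Hmisc code untouched, at the cost of depending on the exact printed layout; whether this is preferable to patching a local copy of formatCats() is a judgment call.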
Frank Harrell
2011-Mar-27 17:20 UTC
[R] Hmisc summary.formula formats for binary and continuous variables
If by 35 (10%) you mean that 35 is the numerator, this is not such a good idea. That's because it emphasizes something that is not a scientific quantity. A scientific quantity is something that has a meaning outside the current sample. The numerator is dependent on the denominator.

Regarding the other formatting issue, summary.formula with method='reverse' is not flexible enough to allow that.

Frank

-----
Frank Harrell
Department of Biostatistics, Vanderbilt University
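Since method='reverse' does not offer the median (Q1-Q3) layout, one possible workaround is to build the cells directly from the raw data in base R. The sketch below is not Hmisc's method: fmt_n_pct() and fmt_median_iqr() are hypothetical helper names, the data are simulated as in the original post, and the seed is an arbitrary choice made here. It puts the count first only to match the format that was asked for; the point above about the numerator still applies.

```r
# Hypothetical helpers, base R only; these are not Hmisc functions.

# Count and percentage of one level, e.g. "2 (50%)"
fmt_n_pct <- function(x, level, digits = 0) {
  x <- x[!is.na(x)]
  n <- sum(x == level)
  pct <- formatC(100 * n / length(x), format = "f", digits = digits)
  sprintf("%d (%s%%)", n, pct)
}

# Median with quartiles, e.g. "3.0 (2.0-4.0)"
fmt_median_iqr <- function(x, digits = 1) {
  q <- quantile(x, c(0.25, 0.5, 0.75), na.rm = TRUE, names = FALSE)
  q <- formatC(q, format = "f", digits = digits)
  sprintf("%s (%s-%s)", q[2], q[1], q[3])
}

# Simulated data as in the original post (seed chosen arbitrarily)
set.seed(1)
sex <- factor(sample(c("m", "f"), 500, replace = TRUE))
age <- rnorm(500, 50, 5)
treatment <- factor(sample(c("Drug", "Placebo"), 500, replace = TRUE))

# One row per variable, one column per treatment group
tab <- rbind(
  "sex : m" = sapply(split(sex, treatment), fmt_n_pct, level = "m"),
  "age"     = sapply(split(age, treatment), fmt_median_iqr, digits = 2)
)
print(tab, quote = FALSE)
```

The resulting character matrix would still need to be passed to a table-writing function for LaTeX output; that step is left out here.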