Keith Wong
2007-May-14 01:56 UTC
[R] Nicely formatted summary table with mean, standard deviation or number and proportion
Dear all,

The incredibly useful Hmisc package provides a method to generate summary tables that can be typeset in latex. The Alzola and Harrell book "An introduction to S and the Hmisc and Design libraries" provides an example that generates mean and quartiles for continuous variables, and numbers and percentages for count variables: summary() with method = 'reverse'.

I wonder if there is a way to change it so the mean and standard deviation are reported instead for continuous variables.

I illustrate my question below using an example from the book.

Thank you.

Keith

> ####
> library(Hmisc)
>
> set.seed(173)
> sex = factor(sample(c("m", "f"), 500, rep = T))
> age = rnorm(500, 50, 5)
> treatment = factor(sample(c("Drug", "Placebo"), 500, rep = T))
> summary(sex ~ treatment, fun = table)
sex    N=500

+---------+-------+---+---+---+
|         |       |N  |f  |m  |
+---------+-------+---+---+---+
|treatment|Drug   |263|140|123|
|         |Placebo|237|133|104|
+---------+-------+---+---+---+
|Overall  |       |500|273|227|
+---------+-------+---+---+---+
>
> (x = summary(treatment ~ age + sex, method = "reverse"))
> # generates quartiles for continuous variables

Descriptive Statistics by treatment

+-------+--------------+--------------+
|       |Drug          |Placebo       |
|       |(N=263)       |(N=237)       |
+-------+--------------+--------------+
|age    |46.5/49.9/53.2|46.7/50.0/53.4|
+-------+--------------+--------------+
|sex : m|  47% (123)   |  44% (104)   |
+-------+--------------+--------------+
>
> # latex(x) generates a very nicely formatted table
> # but I'd like "mean (standard deviation)" instead of quartiles.
> # this function from http://tolstoy.newcastle.edu.au/R/e2/help/06/11/4713.html
> g <- function(y) {
+   # y is a matrix; return N, mean and SD for each column
+   s <- apply(y, 2,
+              function(z) {
+                z <- z[!is.na(z)]
+                n <- length(z)
+                # always return three named values so apply() yields a matrix
+                if(n==0) c(N=0, Mean=NA, SD=NA) else
+                if(n==1) c(N=1, Mean=z, SD=NA) else {
+                  m <- mean(z)
+                  s <- sd(z)
+                  c(N=n, Mean=m, SD=s)
+                }
+              })
+   w <- as.vector(s)
+   names(w) <- as.vector(outer(rownames(s), colnames(s), paste, sep=''))
+   w
+ }
>
> summary(treatment ~ age + sex, method = "reverse", fun = g)
> # does not work, the 'fun' or 'FUN' argument is ignored.

Descriptive Statistics by treatment

+-------+--------------+--------------+
|       |Drug          |Placebo       |
|       |(N=263)       |(N=237)       |
+-------+--------------+--------------+
|age    |46.5/49.9/53.2|46.7/50.0/53.4|
+-------+--------------+--------------+
|sex : m|  47% (123)   |  44% (104)   |
+-------+--------------+--------------+
>
> (x1 = summarize(cbind(age), llist(treatment), FUN = g, stat.name=c("n", "mean", "sd")))
  treatment   n mean   sd
1      Drug 263 49.9 4.94
2   Placebo 237 50.1 4.97
>
> # this works but the table is rotated, and count data has to be
> # treated separately.

--
Keith Wong
PhD candidate
Sleep & Circadian Research Group
Woolcock Institute of Medical Research
email keithw at med.usyd.edu.au
Phone +61 2 9515 8981
Fax +61 2 9515 7070
Mail PO Box M77, Missenden Road NSW 2050, Australia
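For comparison, the layout asked for above (variables in rows, treatment groups in columns, "mean (SD)" for continuous variables and "n (%)" for count variables) can be sketched in base R without Hmisc. This is only an illustrative sketch, not the Hmisc approach: the helper names mean_sd and n_pct and the data frame d are hypothetical, and the result is a plain character matrix rather than something latex() can typeset directly.

```r
# Simulated data as in the example above, collected in a data frame
set.seed(173)
d <- data.frame(
  sex       = factor(sample(c("m", "f"), 500, replace = TRUE)),
  age       = rnorm(500, 50, 5),
  treatment = factor(sample(c("Drug", "Placebo"), 500, replace = TRUE))
)

# Hypothetical helper: format a numeric vector as "mean (SD)"
mean_sd <- function(z) {
  z <- z[!is.na(z)]
  sprintf("%.1f (%.1f)", mean(z), sd(z))
}

# Hypothetical helper: format the count and share of one level as "n (pct%)"
n_pct <- function(f, level) {
  f <- f[!is.na(f)]
  n <- sum(f == level)
  sprintf("%d (%.0f%%)", n, 100 * n / length(f))
}

# One column per treatment group, one row per variable
groups <- split(d, d$treatment)
tab <- rbind(
  "age, mean (SD)" = sapply(groups, function(g) mean_sd(g$age)),
  "sex: m, n (%)"  = sapply(groups, function(g) n_pct(g$sex, "m"))
)
colnames(tab) <- sprintf("%s (N=%d)", names(groups), sapply(groups, nrow))

print(noquote(tab))
```

The matrix keeps the same orientation as the method = "reverse" table, so continuous and count variables sit in one table instead of being summarized separately.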
Frank E Harrell Jr
2007-May-14 02:11 UTC
[R] Nicely formatted summary table with mean, standard deviation or number and proportion
Keith Wong wrote:
> Dear all,
>
> The incredibly useful Hmisc package provides a method to generate
> summary tables that can be typeset in latex. The Alzola and Harrell book
> "An introduction to S and the Hmisc and Design libraries" provides an
> example that generates mean and quartiles for continuous variables, and
> numbers and percentages for count variables: summary() with method =
> 'reverse'.
>
> I wonder if there is a way to change it so the mean and standard
> deviation are reported instead for continuous variables.
>
> I illustrate my question below using an example from the book.
>
> Thank you.
>
> Keith

Newer versions of Hmisc have an option to add mean and SD for method='reverse'. Quartiles are always there.

Frank

--
Frank E Harrell Jr
Professor and Chair
School of Medicine
Department of Biostatistics
Vanderbilt University
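The reply does not name the option. In Hmisc releases I am aware of, it appears to be the prmsd argument of the print and latex methods for method = 'reverse' summaries, so a minimal sketch, assuming such a release is installed, might look like the following. The data frame name d is an assumption for illustration, and the exact layout of the output may differ between versions.

```r
library(Hmisc)

# Same simulated data as in the original question
set.seed(173)
d <- data.frame(
  sex       = factor(sample(c("m", "f"), 500, replace = TRUE)),
  age       = rnorm(500, 50, 5),
  treatment = factor(sample(c("Drug", "Placebo"), 500, replace = TRUE))
)

x <- summary(treatment ~ age + sex, data = d, method = "reverse")

# prmsd = TRUE asks for mean and SD to be shown in addition to the quartiles
# (assumed to be the option referred to in the reply)
print(x, prmsd = TRUE)

# The same argument for the LaTeX table; file = "" writes the LaTeX source
# to the console instead of creating a .tex file
latex(x, prmsd = TRUE, file = "")
```

As the reply notes, the quartiles are still printed; the mean and SD are added alongside them rather than replacing them.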