If I use latex(summary(X)), where X is a data frame with four variables, I get something like

    Rainfall       Education         Popden        Nonwhite
 Min.   :10.00   Min.   : 9.00   Min.   :1441   Min.   : 0.80
 1st Qu.:32.75   1st Qu.:10.40   1st Qu.:3104   1st Qu.: 4.95
 Median :38.00   Median :11.05   Median :3567   Median :10.40
 Mean   :37.37   Mean   :10.97   Mean   :3866   Mean   :11.87
 3rd Qu.:43.25   3rd Qu.:11.50   3rd Qu.:4520   3rd Qu.:15.65
 Max.   :60.00   Max.   :12.30   Max.   :9699   Max.   :38.50

where the row headings are repeated four times. Is there an easy way to get a nicely formatted table, something like

         Rainfall  Education  Popden  Nonwhite
Min.        10.00       9.00    1441      0.80
1st Qu.     32.75      10.40    3104      4.95
Median      38.00      11.05    3567     10.40
Mean        37.37      10.97    3866     11.87
3rd Qu.     43.25      11.50    4520     15.65
Max.        60.00      12.30    9699     38.50

Steve
steve wrote:
> If I use latex(summary(X)) where X is a data frame with four
> variables I get something like
<snip>
> Is there an easy way to get a nicely formatted table,
<snip>

Hmm, no. Not without further ado. The function summary.data.frame produces a table with character entries like "Min. : 1.00 ".

To do better, you first have to note that it can only possibly work for purely numeric data frames. If you have one of those, then you might base something off sapply(X, summary), except that it won't work if only some columns have NA's. Here's an idea:

> my.summary <- function(x){s <- summary(x); if (length(s)==6) c(s,"NA's"=0) else s}
> sapply(airquality,my.summary)
         Ozone Solar.R   Wind  Temp Month  Day
Min.      1.00     7.0  1.700 56.00 5.000  1.0
1st Qu.  18.00   115.8  7.400 72.00 6.000  8.0
Median   31.50   205.0  9.700 79.00 7.000 16.0
Mean     42.13   185.9  9.958 77.88 6.993 15.8
3rd Qu.  63.25   258.8 11.500 85.00 8.000 23.0
Max.    168.00   334.0 20.700 97.00 9.000 31.0
NA's     37.00     7.0  0.000  0.00 0.000  0.0

However, there's an issue with the NA count getting displayed to three decimal places...
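The idea above can be pushed a little further so that every column always gets an NA row, and the NA count is formatted separately from the other statistics. This is only a rough sketch: num_summary is a made-up helper name, not a function from the thread or from any package, and it assumes a purely numeric data frame.

```r
# Hypothetical helper (not from the thread): numeric summary matrix that
# always carries an "NA's" row, so columns with and without NAs line up.
num_summary <- function(df) {
  # Only meaningful for purely numeric data frames
  stopifnot(all(vapply(df, is.numeric, logical(1))))
  sapply(df, function(x) {
    q <- quantile(x, c(0, 0.25, 0.5, 0.75, 1), na.rm = TRUE, names = FALSE)
    c("Min." = q[1], "1st Qu." = q[2], "Median" = q[3],
      "Mean" = mean(x, na.rm = TRUE),
      "3rd Qu." = q[4], "Max." = q[5],
      "NA's" = sum(is.na(x)))
  })
}

tab <- num_summary(airquality)

# Format the six statistics and the NA counts separately, so the counts
# are not forced to share the decimal places of the other rows.
stats  <- format(round(tab[1:6, ], 2), nsmall = 2)
nas    <- format(tab["NA's", ], nsmall = 0)
pretty <- rbind(stats, "NA's" = nas)
print(pretty, quote = FALSE)
```

The result is a character matrix, so it could be handed to latex() in the same way as any other matrix. Note that the statistics here are not rounded to four significant digits the way summary() prints them.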
On Thu, 2006-12-14 at 16:37 -0500, steve wrote:
> If I use latex(summary(X)) where X is a data frame with four
> variables I get something like
<snip>
> Is there an easy way to get a nicely formatted table,
<snip>

The problem is that summary(), as above, returns a character-based table/matrix. For example, using the 'iris' data set:

> summary(iris[, 1:4])
  Sepal.Length    Sepal.Width     Petal.Length    Petal.Width
 Min.   :4.300   Min.   :2.000   Min.   :1.000   Min.   :0.100
 1st Qu.:5.100   1st Qu.:2.800   1st Qu.:1.600   1st Qu.:0.300
 Median :5.800   Median :3.000   Median :4.350   Median :1.300
 Mean   :5.843   Mean   :3.057   Mean   :3.758   Mean   :1.199
 3rd Qu.:6.400   3rd Qu.:3.300   3rd Qu.:5.100   3rd Qu.:1.800
 Max.   :7.900   Max.   :4.400   Max.   :6.900   Max.   :2.500

> str(summary(iris[, 1:4]))
 'table' chr [1:6, 1:4] "Min. :4.300 " "1st Qu.:5.100 " ...
 - attr(*, "dimnames")=List of 2
  ..$ : chr [1:6] "" "" "" "" ...
  ..$ : chr [1:4] " Sepal.Length" " Sepal.Width" " Petal.Length" " Petal.Width"

Hence, the numbers are not separate from the labels, but part of the table elements.

I might be tempted to construct a better underlying function that just returned the summary statistics as unformatted numbers in a matrix. It seems to me that there are such functions, for example in the Hmisc and the doBy packages, on CRAN. Since you are using latex(), you already have Hmisc.

That being said, you could brute force something like this:

# See ?strsplit and ?sapply
mat <- matrix(sapply(strsplit(summary(iris[, 1:4]), ":"), "[[", 2), ncol = 4)

> mat
     [,1]     [,2]     [,3]     [,4]
[1,] "4.300 " "2.000 " "1.000 " "0.100 "
[2,] "5.100 " "2.800 " "1.600 " "0.300 "
[3,] "5.800 " "3.000 " "4.350 " "1.300 "
[4,] "5.843 " "3.057 " "3.758 " "1.199 "
[5,] "6.400 " "3.300 " "5.100 " "1.800 "
[6,] "7.900 " "4.400 " "6.900 " "2.500 "

Then add the row and column titles:

rownames(mat) <- c("Min", "1st Qu", "Median", "Mean", "3rd Qu", "Max")
colnames(mat) <- colnames(iris[1:4])

> mat
       Sepal.Length Sepal.Width Petal.Length Petal.Width
Min    "4.300 "     "2.000 "    "1.000 "     "0.100 "
1st Qu "5.100 "     "2.800 "    "1.600 "     "0.300 "
Median "5.800 "     "3.000 "    "4.350 "     "1.300 "
Mean   "5.843 "     "3.057 "    "3.758 "     "1.199 "
3rd Qu "6.400 "     "3.300 "    "5.100 "     "1.800 "
Max    "7.900 "     "4.400 "    "6.900 "     "2.500 "

> latex(mat, file = "")
% latex.default(mat, file = "")
%
\begin{table}[!tbp]
 \begin{center}
 \begin{tabular}{lllll}\hline\hline
\multicolumn{1}{l}{mat}&
\multicolumn{1}{c}{Sepal.Length}&
\multicolumn{1}{c}{Sepal.Width}&
\multicolumn{1}{c}{Petal.Length}&
\multicolumn{1}{c}{Petal.Width}
\\ \hline
Min&4.300 &2.000 &1.000 &0.100 \\
1st Qu&5.100 &2.800 &1.600 &0.300 \\
Median&5.800 &3.000 &4.350 &1.300 \\
Mean&5.843 &3.057 &3.758 &1.199 \\
3rd Qu&6.400 &3.300 &5.100 &1.800 \\
Max&7.900 &4.400 &6.900 &2.500 \\
\hline
\end{tabular}
\end{center}
\end{table}

HTH,

Marc Schwartz
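If the brute-force route is taken, it may be worth converting the extracted strings back to numbers, so that rounding can be controlled later with format() or round(). The following is a sketch of one way to do that, under the assumption that every cell of the summary has the "label:value" form (true for iris, not for data frames where only some columns contain NA's). The names s, vals, labs and num are just illustrative.

```r
# Sketch: recover a numeric matrix from the character table that
# summary() returns for a purely numeric data frame.
s <- summary(iris[, 1:4])

# Drop everything up to and including the first colon, then convert;
# as.numeric() tolerates the trailing blanks in the cells.
vals <- as.numeric(sub("^[^:]*:", "", as.vector(s)))

# Row labels are the text before the colon in the first column;
# the column names of the table carry leading blanks.
labs <- trimws(sub(":.*$", "", unclass(s)[, 1]))
num  <- matrix(vals, nrow = nrow(s),
               dimnames = list(labs, trimws(colnames(s))))
num
```

Because num is numeric rather than character, something like format(num, digits = 2) can then be applied before the matrix is passed on to latex().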
How about:

> apply(iris[, 1:4], 2, summary)
        Sepal.Length Sepal.Width Petal.Length Petal.Width
Min.           4.300       2.000        1.000       0.100
1st Qu.        5.100       2.800        1.600       0.300
Median         5.800       3.000        4.350       1.300
Mean           5.843       3.057        3.758       1.199
3rd Qu.        6.400       3.300        5.100       1.800
Max.           7.900       4.400        6.900       2.500

Max

-----Original Message-----
From: r-help-bounces at stat.math.ethz.ch [mailto:r-help-bounces at stat.math.ethz.ch] On Behalf Of Marc Schwartz
Sent: Thursday, December 14, 2006 5:02 PM
To: steve
Cc: r-help at stat.math.ethz.ch
Subject: Re: [R] Nicely formatted tables

<snip>

______________________________________________
R-help at stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
On Thu, 2006-12-14 at 17:09 -0500, Kuhn, Max wrote:
> How about:
>
> > apply(iris[, 1:4], 2, summary)
>         Sepal.Length Sepal.Width Petal.Length Petal.Width
> Min.           4.300       2.000        1.000       0.100
> 1st Qu.        5.100       2.800        1.600       0.300
> Median         5.800       3.000        4.350       1.300
> Mean           5.843       3.057        3.758       1.199
> 3rd Qu.        6.400       3.300        5.100       1.800
> Max.           7.900       4.400        6.900       2.500
>
> Max

<snip>

Yep, that will do it too Max. :-)

Thanks for pointing it out. Clearly, in need of more oxygen to the old cranium...

Regards,

Marc
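One caveat with the apply() call quoted above: apply() first coerces the data frame to a matrix, so it only gives numeric summaries when every selected column is numeric (hence the iris[, 1:4]). A possible variation, offered as a sketch rather than as anyone's recommended method, is to pick out the numeric columns programmatically and use sapply(). The names is_num and tab are illustrative.

```r
# Sketch: summarise only the numeric columns of a mixed data frame.
# iris has four numeric columns plus the factor Species.
is_num <- vapply(iris, is.numeric, logical(1))

# sapply() works column by column, so no coercion of the whole
# data frame to a single matrix type takes place.
tab <- sapply(iris[is_num], summary)
tab
```

This still assumes that no column has NA's; if only some do, the individual summaries differ in length and sapply() can no longer simplify them to a matrix.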
> apply(iris[, 1:4], 2, summary)

Nice solution! However,

latex(apply(iris[, 1:4], 2, summary))

has the odd effect that the upper left corner is "apply". This is the "title", so to produce a file "abc.tex" and have an empty upper left corner you need

latex(apply(iris[, 1:4], 2, summary), title = "", file = "abc.tex")

And, since I wanted a more compact table, the following works just as expected:

latex(format(apply(iris[, 1:4], 2, summary), digits = 2), title = "", file = "abc.tex")

Thank you!

Steve
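For anyone who wants the same kind of table without Hmisc, the rows of a tabular environment can be pasted together in base R. This is a minimal sketch only: to_tabular is a hypothetical helper, not part of Hmisc or base R, it does no escaping of special LaTeX characters (an underscore in a column name would need handling), and it is not meant to reproduce the layout that latex() generates.

```r
# Hypothetical base-R helper: turn a numeric matrix with row and column
# names into the lines of a LaTeX tabular environment.
to_tabular <- function(m, digits = 2) {
  body   <- format(m, digits = digits)   # character matrix, dimnames kept
  header <- paste(c("", colnames(body)), collapse = " & ")
  cells  <- apply(body, 1, paste, collapse = " & ")
  rows   <- paste(rownames(body), cells, sep = " & ")
  c(paste0("\\begin{tabular}{l", strrep("r", ncol(body)), "}"),
    "\\hline",
    paste(header, "\\\\"),
    "\\hline",
    paste(rows, "\\\\"),
    "\\hline",
    "\\end{tabular}")
}

tab <- apply(iris[, 1:4], 2, summary)
writeLines(to_tabular(tab))
```

The first header cell is left empty, which corresponds to the title = "" setting used above; writeLines(to_tabular(tab), "abc.tex") would write the lines to a file instead of the console.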