Juan Telleria Ruiz de Aguirre
2018-Jul-30 19:27 UTC
[Rd] Code Optimization: print.data.frame + as.data.frame(head(x, n = options("max.print")))
Dear R Developers,

I would like to propose a simple optimization for the base function print.data.frame, namely adding:

    x <- as.data.frame(head(x, n = options("max.print")))

If we have, for example, a 10 GB data.frame (e.g. instead of a data.table) and print it by accident, this would prevent the R session from "collapsing" and forcing us to press ESC or kill the R session.

    function (x, ..., digits = NULL, quote = FALSE, right = TRUE,
        row.names = TRUE)
    {
        n <- length(row.names(x))
        if (length(x) == 0L) {
            cat(sprintf(ngettext(n, "data frame with 0 columns and %d row",
                "data frame with 0 columns and %d rows"), n), "\n",
                sep = "")
        }
        else if (n == 0L) {
            print.default(names(x), quote = FALSE)
            cat(gettext("<0 rows> (or 0-length row.names)\n"))
        }
        else {
            x <- as.data.frame(head(x, n = options("max.print")))
            m <- as.matrix(format.data.frame(x, digits = digits,
                na.encode = FALSE))
            if (!isTRUE(row.names))
                dimnames(m)[[1L]] <- if (isFALSE(row.names))
                    rep.int("", n)
                else row.names
            print(m, ..., quote = quote, right = right)
        }
        invisible(x)
    }

Thank you.

Best,
Juan
Juan Telleria Ruiz de Aguirre
2018-Jul-31 06:19 UTC
[Rd] Code Optimization: print.data.frame + as.data.frame(head(x, n = options("max.print")))
I polished the function a little more:

* Used: getOption("max.print")
* Added a note at the end: cat('[ reached getOption("max.print") -- omitted ', omitted,' rows ]')

    function (x, ..., digits = NULL, quote = FALSE, right = TRUE,
        row.names = TRUE)
    {
        n <- length(row.names(x))
        if (length(x) == 0L) {
            cat(sprintf(ngettext(n, "data frame with 0 columns and %d row",
                "data frame with 0 columns and %d rows"), n), "\n",
                sep = "")
        }
        else if (n == 0L) {
            print.default(names(x), quote = FALSE)
            cat(gettext("<0 rows> (or 0-length row.names)\n"))
        }
        else {
            # Count the omitted rows before 'x' is truncated
            omitted <- nrow(x) - getOption("max.print")
            x <- as.data.frame(head(x, n = getOption("max.print")))
            m <- as.matrix(format.data.frame(x, digits = digits,
                na.encode = FALSE))
            if (!isTRUE(row.names))
                dimnames(m)[[1L]] <- if (isFALSE(row.names))
                    rep.int("", n)
                else row.names
            print(m, ..., quote = quote, right = right)
            # Test 'omitted', not nrow(x): 'x' has already been truncated here
            if (omitted > 0) {
                cat('[ reached getOption("max.print") -- omitted ', omitted,' rows ]')
            }
        }
        invisible(x)
    }
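The switch from options("max.print") to getOption("max.print") matters because the two calls return different kinds of object. The lines below are a minimal sketch of that difference; the names opt_list, opt_value, df and first_rows are illustrative only and do not come from the thread.

```r
# options("name") returns a named list, even when a single option is requested
opt_list <- options("max.print")

# getOption("name", default) returns the bare value, with an optional fallback
opt_value <- getOption("max.print", 99999L)

# The list element and the getOption() result are the same value
same_value <- identical(opt_list[["max.print"]], getOption("max.print"))

# head() expects a single number for 'n', which is what getOption() supplies
df <- data.frame(a = 1:10, b = letters[1:10])
first_rows <- head(df, n = 3L)
```

Here opt_list is a list of length one, so it has to be unwrapped with [["max.print"]] before it can serve as a row count, whereas opt_value can be passed to head() directly.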
Martin Maechler
2018-Jul-31 07:33 UTC
[Rd] Code Optimization: print.data.frame + as.data.frame(head(x, n = options("max.print")))
>>>>> Juan Telleria Ruiz de Aguirre
>>>>>     on Tue, 31 Jul 2018 08:19:33 +0200 writes:

    > I polished a little bit more the function:
    > * Used: getOption("max.print")
    > * Added comment at the end: cat('[ reached getOption("max.print") --
    > omitted ', omitted,' rows ]')

and before

    > I would like to propose a simple optimization for print.data.frame
    > base function:
    >
    > To add: x <- as.data.frame(head(x, n = options("max.print")))
    >
    > This would prevent that, if for example, we have a 10GB data.frame
    > (e.g.: Instead of a data.table), and we accidentally print it, the R
    > Session does not "collapse", forcing us to press ESC or kill the
    > RSession.

Thank you, Juan.

You are right: the whole idea of introducing the 'max.print' option (and the corresponding 'max' argument in print.default() {and print.Date() currently}) was that print()ing should not use too many resources.

You are also right to use 'max.print', but R should be as functional a language as sensible, and hence print(<data.frame>) should get an argument 'max' which by default is equal to the "max.print" option.

Also, any good citizen print() method *must* return its argument invisibly.
==> you are not supposed to change 'x' here.

But I entirely agree with your basic intuition for the problem resolution. Very good, thank you, indeed!
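The two requirements just stated (a 'max' argument that defaults to the "max.print" option, and returning the unmodified argument invisibly) can be illustrated on a toy S3 class. This is only a sketch: the class "toy_tbl" and the names obj and res are hypothetical and are not part of base R or of the patch discussed in this thread.

```r
# Hypothetical S3 class "toy_tbl", used only for illustration
print.toy_tbl <- function(x, ..., max = NULL) {
    # 'max' falls back to the global option when the caller does not set it
    if (is.null(max)) max <- getOption("max.print", 99999L)
    vals <- unclass(x)
    n <- length(vals)
    # Truncate a copy for display; 'x' itself is never modified
    shown <- vals[seq_len(min(n, max))]
    print(shown, ...)
    if (n > max)
        cat(" [ omitted", n - max, "entries ]\n")
    # Return the original argument, invisibly
    invisible(x)
}

obj <- structure(1:20, class = "toy_tbl")
# withVisible() reports both the returned value and its visibility flag
res <- withVisible(print(obj, max = 5))
```

Because the method truncates a local copy, res$value is still the full 20-element object, so a call such as y <- print(obj) does not silently lose data.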
I'm currently running 'make check-all' with the following change to the source code (aka "patch"):

===================================================================
--- src/library/base/R/dataframe.R  (revision 75016)
+++ src/library/base/R/dataframe.R  (working copy)
@@ -1477,7 +1477,7 @@

 print.data.frame <-
     function(x, ..., digits = NULL, quote = FALSE, right = TRUE,
-             row.names = TRUE)
+             row.names = TRUE, max = NULL)
 {
     n <- length(row.names(x))
     if(length(x) == 0L) {
@@ -1489,12 +1489,19 @@
         print.default(names(x), quote = FALSE)
         cat(gettext("<0 rows> (or 0-length row.names)\n"))
     } else {
+        if(is.null(max)) max <- getOption("max.print", 99999L)
         ## format.<*>() : avoiding picking up e.g. format.AsIs
-        m <- as.matrix(format.data.frame(x, digits = digits, na.encode = FALSE))
+        omit <- (n0 <- max %/% length(x)) < n
+        m <- as.matrix(
+            format.data.frame(if(omit) x[seq_len(n0), , drop=FALSE] else x,
+                              digits = digits, na.encode = FALSE))
         if(!isTRUE(row.names))
             dimnames(m)[[1L]] <-
                 if(isFALSE(row.names)) rep.int("", n) else row.names
         print(m, ..., quote = quote, right = right)
+        if(omit)
+            cat(" [ reached 'max' / getOption(\"max.print\") -- omitted",
+                n - n0, "rows ]\n")
     }
     invisible(x)
 }
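The patch treats 'max' as a budget of printed entries rather than of rows, so the number of rows shown is the integer quotient of 'max' by the number of columns. The helper below is a stand-alone sketch of that arithmetic. The name rows_to_print and the returned list are hypothetical and only mirror the 'n0' and 'omit' variables in the patch; the helper assumes a data frame with at least one column.

```r
# Hypothetical helper mirroring the row-budget arithmetic of the patch
rows_to_print <- function(x, max = getOption("max.print", 99999L)) {
    n  <- nrow(x)
    # length() of a data frame is its number of columns
    n0 <- max %/% length(x)
    omit <- n0 < n
    list(shown   = if (omit) n0 else n,     # rows that would be printed
         omit    = omit,                    # TRUE if any rows are dropped
         omitted = if (omit) n - n0 else 0) # rows left out of the output
}

df <- data.frame(a = 1:100, b = 101:200, c = 201:300)
# A budget of 30 entries over 3 columns allows 10 full rows
r <- rows_to_print(df, max = 30)
```

With the 100-row, 3-column data frame above, r$shown is 10 and r$omitted is 90, which is the count the patched method would report in its closing note.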