Henrik Bengtsson
2014-Feb-04 03:22 UTC
[Rd] capture.output(): Using a rawConnection() [linear] instead of textConnection() [exponential]?
I've noticed that the processing time for the default capture.output() grows exponentially in the number of characters outputted/captured. The default settings sinks to a temporary textConnection(). When instead sinking to a rawConnection(), the processing time becomes linear. See below example and attached PNG figure [also at http://xfer.aroma-project.org/tmp/20140203/capture.output_text-vs-raw.png]. I know little about text encoding, but I wouldn't be surprised I'm overseeing encoding issues when sinking to raw followed by rawToChar(). According to the below example, both approaches seem to work with my current setup (sessionInfo() below), but on the other hand it's only writing an ASCII "a" character multiple times. Q. Is it possible to utilize rawConnection() instead of textConnection() and still get the same output regardless of encodings? Q. What causes textConnection() to be exponential when rawConnection() is linear? REPRODUCIBLE EXAMPLE: captureViaText <- function(..., local=TRUE, envir=parent.frame()) { eval({ # Basically what capture.output() does by default file <- textConnection("rval", open="w", local=local) capture.output(..., file=file) close(file) rval }, envir=envir) } # captureViaText() captureViaRaw <- function(..., local=TRUE, envir=parent.frame()) { eval({ # Basically what capture.output() does by default file <- rawConnection(raw(0L), open="w") capture.output(..., file=file) res <- rawConnectionValue(file) close(file) res <- rawToChar(res) res <- unlist(strsplit(res, split="\n", fixed=TRUE)) res }, envir=envir) } # captureViaRaw() data <- data.frame(size=10^seq(from=3, to=7.9, by=0.1), captureViaText=NA, captureViaRaw=NA) names <- colnames(data)[-1] for (kk in which(is.na(data[[ncol(data)]]))) { # String to output x <- rep("a", times=data$size[kk]) x[seq(from=1L, to=length(x), by=1e3)] <- "\n" x <- paste(x, collapse="") cat(sprintf("Output size: %d characters\n", nchar(x))) # Benchmark values <- list() for (name in names) { print(name) fcn <- get(name, mode="function") t <- system.time({ res <- fcn(cat(x)) }, gcFirst=TRUE) n <- sum(nchar(res)) values[[name]] <- res rm(list="res") cat(sprintf("%s: %.2g seconds to capture %d characters\n", name, t[3], n)) data[kk,name] <- t[3] } # for (name ...) # Assert same captures for (cc in seq_along(values)) stopifnot(identical(values[[cc]], values[[1]])) rm(list="values") } # for (size ...) local({ png("capture.output_text-vs-raw.png", width=840, height=840) on.exit(dev.off()) plot(data[,c(1,2)], type="n", xlab="Number of characters", ylab="Time (s)") for (cc in 2:ncol(data)) lines(data[,c(1,cc)], lwd=2, col=cc) legend("topleft", lwd=2, col=2:ncol(data), names, bty="n") })> sessionInfo()R Under development (unstable) (2014-01-30 r64901) Platform: x86_64-w64-mingw32/x64 (64-bit) locale: [1] LC_COLLATE=English_United States.1252 [2] LC_CTYPE=English_United States.1252 [3] LC_MONETARY=English_United States.1252 [4] LC_NUMERIC=C [5] LC_TIME=English_United States.1252 attached base packages: [1] stats graphics grDevices utils datasets methods base loaded via a namespace (and not attached): [1] tools_3.1.0 Thanks, Henrik -------------- next part -------------- A non-text attachment was scrubbed... Name: capture.output_text-vs-raw.png Type: image/png Size: 6579 bytes Desc: not available URL: <https://stat.ethz.ch/pipermail/r-devel/attachments/20140203/ee20d00f/attachment.png>