Hadley Wickham
2013-Oct-30 22:22 UTC
[Rd] Huge performance difference between implicit and explicit print
Hi all, Can anyone help me understand why an implicit print (i.e. just typing df at the console), is so much slower than an explicit print (i.e. print(df)) in the example below? I see the difference in both Rstudio and in a terminal. # Construct large df as quickly as possible dummy <- 1:18e6 df <- lapply(1:10, function(x) dummy) names(df) <- letters[1:10] class(df) <- c("myobj", "data.frame") attr(df, "row.names") <- .set_row_names(18e6) print.myobj <- function(x, ...) { print.data.frame(head(x, 2)) } start <- proc.time(); df; flush.console(); proc.time() - start # user system elapsed # 0.408 0.557 0.965 start <- proc.time(); print(df); flush.console(); proc.time() - start # user system elapsed # 0.019 0.002 0.020 sessionInfo() # R version 3.0.2 (2013-09-25) # Platform: x86_64-apple-darwin10.8.0 (64-bit) # # locale: # [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8 # # attached base packages: # [1] stats graphics grDevices utils datasets methods base Thanks! Hadley -- Chief Scientist, RStudio http://had.co.nz/
Gabor Grothendieck
2013-Oct-30 23:14 UTC
[Rd] Huge performance difference between implicit and explicit print
On Wed, Oct 30, 2013 at 6:22 PM, Hadley Wickham <h.wickham at gmail.com> wrote:> Hi all, > > Can anyone help me understand why an implicit print (i.e. just typing > df at the console), is so much slower than an explicit print (i.e. > print(df)) in the example below? I see the difference in both Rstudio > and in a terminal. > > # Construct large df as quickly as possible > dummy <- 1:18e6 > df <- lapply(1:10, function(x) dummy) > names(df) <- letters[1:10] > class(df) <- c("myobj", "data.frame") > attr(df, "row.names") <- .set_row_names(18e6) > > print.myobj <- function(x, ...) { > print.data.frame(head(x, 2)) > } > > start <- proc.time(); df; flush.console(); proc.time() - start > # user system elapsed > # 0.408 0.557 0.965 > start <- proc.time(); print(df); flush.console(); proc.time() - start > # user system elapsed > # 0.019 0.002 0.020If I change print(df) to print.data.frame(df) it hangs. R version 3.0.2 Patched (2013-10-06 r64031) -- "Frisbee Sailing" Copyright (C) 2013 The R Foundation for Statistical Computing Platform: x86_64-w64-mingw32/x64 (64-bit)
Gabriel Becker
2013-Oct-30 23:32 UTC
[Rd] Huge performance difference between implicit and explicit print
Hadley, As far as I can tell from a quick look, it is because implicit printing uses a different mechanism which does a fair bit more work.>From comments in print.c in the R sources:* print.default() -> do_printdefault (with call tree below) * * auto-printing -> PrintValueEnv * -> PrintValueRec * -> call print() for objects * Note that auto-printing does not call print.default. * PrintValue, R_PV are similar to auto-printing. PrintValueEnv includes, among other things, checks for functions, S4 objects, and s3 objects before constructing (in C code) an R call to print for S3 objects and show for S4 objects and evaluating it using Rf_eval. So there is an extra trip to the R evaluator. I imagine that extra work is where the hangup is but that is a slightly-informed guess as I haven't done any detailed timings or checks. Basically my understanding of the processes is as follows: print(df) print call is evaluated, S3 dispatch happens, print.default in C is called, result printed to terminal, print call returns df expression "df" evaluated, auto-print initiated, type of object returned by expression is determined, print call is constructed in C code, print call is evaluated in C code, THEN all the stuff above happens. I dunno if that helps or not as I can't speak to how to change/fix it atm. ~G On Wed, Oct 30, 2013 at 3:22 PM, Hadley Wickham <h.wickham@gmail.com> wrote:> Hi all, > > Can anyone help me understand why an implicit print (i.e. just typing > df at the console), is so much slower than an explicit print (i.e. > print(df)) in the example below? I see the difference in both Rstudio > and in a terminal. > > # Construct large df as quickly as possible > dummy <- 1:18e6 > df <- lapply(1:10, function(x) dummy) > names(df) <- letters[1:10] > class(df) <- c("myobj", "data.frame") > attr(df, "row.names") <- .set_row_names(18e6) > > print.myobj <- function(x, ...) { > print.data.frame(head(x, 2)) > } > > start <- proc.time(); df; flush.console(); proc.time() - start > # user system elapsed > # 0.408 0.557 0.965 > start <- proc.time(); print(df); flush.console(); proc.time() - start > # user system elapsed > # 0.019 0.002 0.020 > > sessionInfo() > # R version 3.0.2 (2013-09-25) > # Platform: x86_64-apple-darwin10.8.0 (64-bit) > # > # locale: > # [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8 > # > # attached base packages: > # [1] stats graphics grDevices utils datasets methods base > > Thanks! > > Hadley > > -- > Chief Scientist, RStudio > http://had.co.nz/ > > ______________________________________________ > R-devel@r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-devel >-- Gabriel Becker Graduate Student Statistics Department University of California, Davis [[alternative HTML version deleted]]
Reasonably Related Threads
- Previously extended schema not working in 4.4.0
- Previously extended schema not working in 4.4.0
- Reference classes: accessor functions via 'getRefClass(...)$accessors(...)'
- "registering" a generic method for a class
- Help! Big problem when using "browser()" to do R debugging?