garvin@cs.rice.edu
2004-May-13 23:21 UTC
[Rd] Rprof ignores top-level computation (PR#6883)
Full_Name: John Garvin Version: 1.9.0 OS: Linux Submission from: (NULL) (128.42.129.78) This may or may not technically be a bug, but it's certainly an annoyance. Rprof only takes into account computation that occurs inside functions. If a time-consuming operation occurs outside a function, it doesn't record the time it takes. Consider this program 'array.r': Rprof() foo <- array(1:1000000, dim=c(100,50,200)) bar <- c(1:1000, dim=c(1000000,3)) for (i in 1:100) { foo[bar>42] <- foo[bar>42] } Rprof(NULL) Here's a comparison of the running time measured through the OS and the time measured by Rprof: $ time R --slave < array.r real 0m16.068s user 0m14.060s sys 0m1.550s $ R CMD Rprof Rprof.out Each sample represents 0.02 seconds. Total run time: 0.12 seconds. Total seconds: time spent in function and callees. Self seconds: time spent in function alone. % total % self total seconds self seconds name 66.67 0.08 66.67 0.08 "as.vector" 66.67 0.08 0.00 0.00 "array" 33.33 0.04 33.33 0.04 ">" % self % total self seconds total seconds name 66.67 0.08 66.67 0.08 "as.vector" 33.33 0.04 33.33 0.04 ">" Rprof gives a much more sensible result when the same code is wrapped in a function definition, like this: Rprof() f <- function() { foo <- array(1:1000000, dim=c(100,50,200)) bar <- c(1:1000, dim=c(1000000,3)) for (i in 1:100) { foo[bar>42] <- foo[bar>42] } } f() Rprof(NULL) $ time R --slave < array.r real 0m16.436s user 0m14.110s sys 0m1.490s $ R CMD Rprof Rprof.out Each sample represents 0.02 seconds. Total run time: 14.4799999999998 seconds. Total seconds: time spent in function and callees. Self seconds: time spent in function alone. % total % self total seconds self seconds name 100.00 14.48 99.17 14.36 "f" 0.55 0.08 0.55 0.08 "as.vector" 0.55 0.08 0.00 0.00 "array" 0.28 0.04 0.28 0.04 ">" % self % total self seconds total seconds name 99.17 14.36 100.00 14.48 "f" 0.55 0.08 0.55 0.08 "as.vector" 0.28 0.04 0.28 0.04 ">" All that time disappears when the code is not in a function. Rprof may have been written to work that way--I don't know--but in any case its results with certain inputs are pretty misleading.