I have some trouble interpreting the output from profiling. I have read the help pages Rprof, summaryRprof and consult the R extensions manual, but I still have problems understanding the output. Basically the output consist of self.time and total.time. I have the understanding that total.time is the time spent in a given function including any subcalls or child functions or whatever the technical term for that activity is. In contrasts self.time is the time spent in the function excluding subcalls. Now, in my understanding basically everything in R is functions. I would then guess that for almost any function (except the "atomic ones") the self.time would be very small as it would spend most of its time calling other functions (again, since almost everything is a function). So how do R determine when a subfunction is called? Looking at a practical example, which basically consists of a model search, with each model being evaluated by a nnet fit, I get (using the default value of interval=0.02 - the output is basically similar for interval = 0.01)> prof02.out$by.total[1:10,]total.time total.pct self.time self.pct cv.risk.dsa2 2373.50 100.0 0.00 0.0 validation.risk.dsa2 2373.50 100.0 0.00 0.0 avg.pred.nnet 2370.98 99.9 4.36 1.5 dsa.2 2337.32 98.5 0.00 0.0 best.move.dsa2 2335.22 98.4 0.06 0.0 avg.nnet 2335.10 98.4 0.04 0.0 nnet 2188.22 92.2 1.74 0.6 nnet.default 2186.48 92.1 33.32 11.2 sub.dsa2 1466.32 61.8 0.18 0.1 del.dsa2 560.36 23.6 0.02 0.0 This tells me that the majority (92%) is spent in the nnet fit. Therefore it seems that the only real improvement to the code would be to change the algorithm in order to call nnet fewer times. The actual search code only amounts to a little amout of the total time. Looking at sampling.time, the fourth column (total.time in percent) is the third column (total.time) divided by sampling.time (which makes sense). My trouble really begins when I look at the self.time column.> prof02.out$by.self[1:10,]self.time self.pct total.time total.pct nnet.default 33.32 11.2 2186.48 92.1 FUN 23.42 7.8 46.34 2.0 as.integer 22.28 7.5 30.92 1.3 mean.default 16.14 5.4 22.92 1.0 as.double.default 11.52 3.9 11.52 0.5 apply 10.22 3.4 60.80 2.6 as.double 10.14 3.4 30.24 1.3 rep.default 9.98 3.3 19.96 0.8 matrix 9.62 3.2 75.86 3.2 add.net 8.80 2.9 10.16 0.4 First of, I would guess that the sum of the self.time column _ought_??? to be equal to the sampling time, which it is not> sum(prof02.out$by.self[,1])[1] 298.4> prof02.out$sampling.time[1] 2373.5 Why does such a large discrepancy happen? Secondly, looking at the output for as.integer, there is a big difference between total.time and self.time - but I would _guess_ as.integer to be a pretty basic function, which does not call any subfunctions, and in that case I would expect total.time to be equal to self.time. In short : I think I grasp the total time concept, but I have a hard time understanding self.time. Specifically - Why is the sampling time so different from the sum of the self.time - Which function calls (or operations) are considered to belong to a given function's self.time and which are not. - Why is total.time != self.time for the as.integer function. I guess some of the answers would be clear if I had a firm grasp of the inner workings of R :) Thanks in advance, Kasper -- Kasper Daniel Hansen, Research Assistant Department of Biostatistics, University of Copenhagen
Kasper Daniel Hansen <k.hansen at biostat.ku.dk> writes:> I have some trouble interpreting the output from profiling. I have > read the help pages Rprof, summaryRprof and consult the R extensions > manual, but I still have problems understanding the output. > > Basically the output consist of self.time and total.time. I have the > understanding that total.time is the time spent in a given function > including any subcalls or child functions or whatever the technical > term for that activity is. In contrasts self.time is the time spent in > the function excluding subcalls. > > Now, in my understanding basically everything in R is functions. I > would then guess that for almost any function (except the "atomic > ones") the self.time would be very small as it would spend most of its > time calling other functions (again, since almost everything is a > function). So how do R determine when a subfunction is called?...brutal snippage...> I guess some of the answers would be clear if I had a firm grasp of > the inner workings of R :)In a word, yes... The profiling keeps track of R's *context stack* which is not quite the same as function calls. Essentially, it only counts R functions that are actually written in R, but not .Internal, .Primitive, etc. So "self" counts the amount of time that a function was at the top of the stack. This is done by a periodic poll which dumps out the context stack at regular intervals. Seeing cases where the self percentages don't add to 100% is, I believe, simply due to truncation of the tails -- that is, there is a large number of different functions which each are counted a few times, and these are not shown in the summary output. [Sorry about the late reply, but I was out of town, and noone seems to have answered this.] -- O__ ---- Peter Dalgaard Blegdamsvej 3 c/ /'_ --- Dept. of Biostatistics 2200 Cph. N (*) \(*) -- University of Copenhagen Denmark Ph: (+45) 35327918 ~~~~~~~~~~ - (p.dalgaard at biostat.ku.dk) FAX: (+45) 35327907
On Tue, Jul 27, 2004 at 09:12:56AM +0200, Peter Dalgaard wrote:> [What did ess-help have to do with this?? Snipped from CC:]This happens when posting in a hurry using a new mail client. Sorry. See below btw.> Kasper Daniel Hansen <k.hansen at biostat.ku.dk> writes: > > > Thanks to both of you for considering my question. I still have some > > problems however. Consider the nnet.default function which uses the .C > > interface. As > > > typeof(.C) > > [1] "builtin" > > I would guess - from my current understanding - that the time spent in > > the C functions of nn.default are added to its self.time. Now look at > > the output > > > > > prof02.out$by.self[1:10,] > > self.time self.pct total.time total.pct > > nnet.default 33.32 11.2 2186.48 92.1 > > > > So nnet.default only used 33.3 on its C calls, but 2186 in total > > time. From this I would guess that nnet.default - in my case - spends > > its majority of time setting up the fit and postprocessing it (as this > > is done by R functions which does not add to its self time). The > > actual fit (which is done by the C functions) only takes very little > > time. > > > > Looking further down the table I see > > > > self.time self.pct total.time total.pct > > as.integer 22.28 7.5 30.92 1.3 > > > > as.interger is basically a .Internal call - which ought not to be > > recorded. But why does it then have such a "high" self.time? > > > > And finally, if a program (such as mine) have a high discrepancy > > between self.time and sampling.time, does this imply that the far > > majority of time is spent in .Primitives which are of type special? > > Hmm... You're using summaryRprof(), obviously. Do the results match up > with what you get from R CMD Rprof? Looking at the source for > summaryRprof, I see that it uses this construction > > selft <- sapply(ls(envir = self), function(f) get(f, envir > self)) > > which I suspect does not count function names that start with a dot...It does not. Runnig everything through R CMD Rprof gives a much moe sensible output. I have no further questions. Do I file a bug report on summaryRprof - it is certainly unintentional that it does not compute eg. .C calls when computing the self time? -- Kasper Daniel Hansen, Research Assistant Department of Biostatistics, University of Copenhagen
> From: Kasper Daniel Hansen > > On Tue, Jul 27, 2004 at 09:12:56AM +0200, Peter Dalgaard wrote: > > > > Hmm... You're using summaryRprof(), obviously. Do the > results match up > > with what you get from R CMD Rprof? Looking at the source for > > summaryRprof, I see that it uses this construction > > > > selft <- sapply(ls(envir = self), function(f) get(f, envir > > self)) > > > > which I suspect does not count function names that start > with a dot... > > It does not. Runnig everything through R CMD Rprof gives a much moe > sensible output. I have no further questions. > > Do I file a bug report on summaryRprof - it is certainly > unintentional > that it does not compute eg. .C calls when computing the self time?I thought I did see .C in the output of summaryRprof() at one time... Andy> -- > Kasper Daniel Hansen, Research Assistant > Department of Biostatistics, University of Copenhagen > > ______________________________________________ > R-help at stat.math.ethz.ch mailing list > https://www.stat.math.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide! > http://www.R-project.org/posting-guide.html > >