I am using Rprof() to help find ways to improve performance.
I found a function whose total seconds and self seconds were large. I 
replaced it with something else. The something else had a small 
number of total seconds and self seconds. But the total time did not 
decrease.
I don't understand how that could be, and would appreciate any suggestions.
Thanks
-Don
Details, unfortunately lengthy, are here:
The snippet of code I'm profiling is given below. 'inmat' is a 
numeric matrix with 128347 rows; I need to apply a function, sfun(), 
to each of 4191 subsets of rows, indexed by indat$grpi. sfun() is 
fairly complex, and requires information from all of the columns of 
inmat, so simple uses of, for example, aggregate() don't do the job.
Rprof('Rpf3n')
for (i in unique(indat$grpi)) {
  ssnew[i,] <- sfun(inmat[indat$grpi==i,,drop=FALSE])
}
Rprof(NULL)
----------------- Rprof output, v1 ----------------
Each sample represents 0.02 seconds.
Total run time: 404.999999999922 seconds.
Total seconds: time spent in function and callees.
Self seconds: time spent in function alone.
    %       total       %       self
  total    seconds     self    seconds    name
100.00    405.00      0.00      0.00     "source"
100.00    405.00      0.11      0.46     "eval.with.vis"
  99.88    404.50      4.70     19.04     "sfun"
  83.98    340.10     41.25    167.06     "nrow"
  42.68    172.84     42.68    172.84     "=="
   3.08     12.48      0.50      2.04     "lm.fit"
   1.72      6.96      0.23      0.94     "median"
(remainder omitted)
nrow() has 340 total and 41 self seconds, a large portion of the 
total run time.
There is only one call to nrow() in my code,
   ntot <- nrow(dati)
where dati is a subset of the numeric matrix, indat.
I tried replacing nrow(dati) with length(dati[,1])
The profile is then:
----------------- Rprof output, v2 ----------------
Each sample represents 0.02 seconds.
Total run time: 403.699999999923 seconds.
Total seconds: time spent in function and callees.
Self seconds: time spent in function alone.
    %       total       %       self
  total    seconds     self    seconds    name
100.00    403.70      0.11      0.46     "eval.with.vis"
100.00    403.70      0.00      0.00     "source"
  99.88    403.20     46.03    185.82     "sfun"
  42.37    171.06     42.37    171.06     "=="
   3.00     12.10      0.59      2.40     "lm.fit"
   1.70      6.88      0.22      0.90     "median"
(skip many lines until)
   0.01      0.04      0.01      0.04     "length"
(remainder omitted)
This is where I'm puzzled.
nrow() has disappeared as expected
length() has not taken its place
Yet the total run time is about the same.
As an aside, I had previously been using sapply() and split(), and 
that was far slower. Rprof() helped me find out why it was slow. 
Also, the matrix ssnew is created before the loop, with the right 
dimensions, so it's not being grown during the loop.
-- 
--------------------------------------
Don MacQueen
Environmental Protection Department
Lawrence Livermore National Laboratory
Livermore, CA, USA
--------------------------------------
-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-
r-help mailing list -- Read http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html
Send "info", "help", or "[un]subscribe"
(in the "body", not the subject !)  To: r-help-request at
stat.math.ethz.ch
_._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._
On Tue, 11 Jun 2002, Don MacQueen wrote:> I am using Rprof() to help find ways to improve performance. > > I found a function whose total seconds and self seconds were large. I > replaced it with something else. The something else had a small > number of total seconds and self seconds. But the total time did not > decrease.This is strange. Are you sure that nrow isn't used by any of the functions you call? It's possible that your explicit use of nrow() took lots of total seconds but few self seconds, and that another hidden use did the reverse. -thomas -.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.- r-help mailing list -- Read http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html Send "info", "help", or "[un]subscribe" (in the "body", not the subject !) To: r-help-request at stat.math.ethz.ch _._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._
Don't know if it is an issue in this case but lazy evaluation can sometimes lead to difficulties in interpreting Rprof ouptut since the time used to evaluate arguments is attributed by the profiler to the function that forces the argument evaluation. luke On Tue, Jun 11, 2002 at 10:00:37AM -0700, Don MacQueen wrote:> I am using Rprof() to help find ways to improve performance. > > I found a function whose total seconds and self seconds were large. I > replaced it with something else. The something else had a small > number of total seconds and self seconds. But the total time did not > decrease. > > I don't understand how that could be, and would appreciate any suggestions. > > Thanks > -Don > > > Details, unfortunately lengthy, are here: > > The snippet of code I'm profiling is given below. 'inmat' is a > numeric matrix with 128347 rows; I need to apply a function, sfun(), > to each of 4191 subsets of rows, indexed by indat$grpi. sfun() is > fairly complex, and requires information from all of the columns of > inmat, so simple uses of, for example, aggregate() don't do the job. > > Rprof('Rpf3n') > for (i in unique(indat$grpi)) { > ssnew[i,] <- sfun(inmat[indat$grpi==i,,drop=FALSE]) > } > Rprof(NULL) > > > ----------------- Rprof output, v1 ---------------- > Each sample represents 0.02 seconds. > Total run time: 404.999999999922 seconds. > > Total seconds: time spent in function and callees. > Self seconds: time spent in function alone. > > % total % self > total seconds self seconds name > 100.00 405.00 0.00 0.00 "source" > 100.00 405.00 0.11 0.46 "eval.with.vis" > 99.88 404.50 4.70 19.04 "sfun" > 83.98 340.10 41.25 167.06 "nrow" > 42.68 172.84 42.68 172.84 "==" > 3.08 12.48 0.50 2.04 "lm.fit" > 1.72 6.96 0.23 0.94 "median" > (remainder omitted) > > nrow() has 340 total and 41 self seconds, a large portion of the > total run time. > > There is only one call to nrow() in my code, > ntot <- nrow(dati) > where dati is a subset of the numeric matrix, indat. > > I tried replacing nrow(dati) with length(dati[,1]) > > The profile is then: > > ----------------- Rprof output, v2 ---------------- > Each sample represents 0.02 seconds. > Total run time: 403.699999999923 seconds. > > Total seconds: time spent in function and callees. > Self seconds: time spent in function alone. > > % total % self > total seconds self seconds name > 100.00 403.70 0.11 0.46 "eval.with.vis" > 100.00 403.70 0.00 0.00 "source" > 99.88 403.20 46.03 185.82 "sfun" > 42.37 171.06 42.37 171.06 "==" > 3.00 12.10 0.59 2.40 "lm.fit" > 1.70 6.88 0.22 0.90 "median" > (skip many lines until) > 0.01 0.04 0.01 0.04 "length" > (remainder omitted) > > This is where I'm puzzled. > nrow() has disappeared as expected > length() has not taken its place > Yet the total run time is about the same. > > > As an aside, I had previously been using sapply() and split(), and > that was far slower. Rprof() helped me find out why it was slow. > Also, the matrix ssnew is created before the loop, with the right > dimensions, so it's not being grown during the loop. > > -- > -------------------------------------- > Don MacQueen > Environmental Protection Department > Lawrence Livermore National Laboratory > Livermore, CA, USA > -------------------------------------- > -.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.- > r-help mailing list -- Read http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html > Send "info", "help", or "[un]subscribe" > (in the "body", not the subject !) To: r-help-request at stat.math.ethz.ch > _._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._-- Luke Tierney University of Minnesota Phone: 612-625-7843 School of Statistics Fax: 612-624-8868 313 Ford Hall, 224 Church St. S.E. email: luke at stat.umn.edu Minneapolis, MN 55455 USA WWW: http://www.stat.umn.edu -.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.- r-help mailing list -- Read http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html Send "info", "help", or "[un]subscribe" (in the "body", not the subject !) To: r-help-request at stat.math.ethz.ch _._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._