thr3ads.net - R devel - [Rd] profiler and loops [Mar 2009]

If this information is useful, please help other people find it:
Share via:

Romain Francois

2009-Mar-03 12:33 UTC

[Rd] profiler and loops

Hello,

(This is follow up from this thread: 
nabble.com/execution-time-of-.packages-td22304833.html but 
with a different focus)

I am often confused by the result of the profiler, when a loop is 
involved. Consider these two scripts:

script1:

Rprof( )
x <- numeric( )
   for( i in 1:10000){
     x <- c( x, rnorm(10) )
   }
Rprof( NULL )
print( summaryRprof( ) )


script2:

Rprof( )
ffor <- function(){
   x <- numeric( )
   for( i in 1:10000){
     x <- c( x, rnorm(10) )
   }
}
ffor()
Rprof( NULL )
print( summaryRprof( ) )


[]$ time Rscript --vanilla script1.R
$by.self
       self.time self.pct total.time total.pct
"rnorm"      0.22      100       0.22       100

$by.total
       total.time total.pct self.time self.pct
"rnorm"       0.22       100      0.22      100

$sampling.time
[1] 0.22

real    0m7.786s
user    0m5.192s
sys     0m0.735s

[]$$ time Rscript --vanilla script2.R
$by.self
       self.time self.pct total.time total.pct
"ffor"       4.94     92.5       5.34     100.0
"rnorm"      0.40      7.5       0.40       7.5

$by.total
       total.time total.pct self.time self.pct
"ffor"        5.34     100.0      4.94     92.5
"rnorm"       0.40       7.5      0.40      7.5

$sampling.time
[1] 5.34


real    0m7.841s
user    0m5.152s
sys     0m0.712s



In the first one, I call a for loop from the top level and in the second 
one, the loop is wrapped in a function call. This shows the inability of 
the profiler to point loops as responsible for bottlenecks. The coder of 
script1 would not know what to do to improve on the script.

I have had a quick look in the code, and here are a few thoughts:

in the function "doprof" in eval.c,  this loop write the call stack on
the profiler file:

for (cptr = R_GlobalContext; cptr; cptr = cptr->nextcontext) {
   if ((cptr->callflag & (CTXT_FUNCTION | CTXT_BUILTIN))
       && TYPEOF(cptr->call) == LANGSXP) {
       SEXP fun = CAR(cptr->call);
       if (!newline) newline = 1;
       fprintf(R_ProfileOutfile, "\"%s\" ",
           TYPEOF(fun) == SYMSXP ? CHAR(PRINTNAME(fun)) :
           "<Anonymous>");
   }
   }
  so we can see it only cares about context CTXT_FUNCTION and 
CTXT_BUILTIN, when for loops play with CTXT_LOOP (this is again in 
eval.c within the do_for function)

begincontext(&cntxt, CTXT_LOOP, R_NilValue, rho, R_BaseEnv, R_NilValue,
        R_NilValue);
 
which as the name implies, begins the context of the for loop. The 
begincontext function looks like this :

void begincontext(RCNTXT * cptr, int flags,
         SEXP syscall, SEXP env, SEXP sysp,
         SEXP promargs, SEXP callfun)
{
   cptr->nextcontext = R_GlobalContext;
   cptr->cstacktop = R_PPStackTop;
   cptr->evaldepth = R_EvalDepth;
   cptr->callflag = flags;
   cptr->call = syscall;
   cptr->cloenv = env;
   cptr->sysparent = sysp;
   cptr->conexit = R_NilValue;
   cptr->cend = NULL;
   cptr->promargs = promargs;
   cptr->callfun = callfun;
   cptr->vmax = vmaxget();
   cptr->intsusp = R_interrupts_suspended;
   cptr->handlerstack = R_HandlerStack;
   cptr->restartstack = R_RestartStack;
   cptr->prstack = R_PendingPromises;
#ifdef BYTECODE
   cptr->nodestack = R_BCNodeStackTop;
# ifdef BC_INT_STACK
   cptr->intstack = R_BCIntStackTop;
# endif
#endif
   R_GlobalContext = cptr;
}


So it could be possible to set the last argument of the begincontext 
function to "for" and use this code in the doprof function:


for (cptr = R_GlobalContext; cptr; cptr = cptr->nextcontext) {
 if ( ( cptr->callflag & (CTXT_FUNCTION | CTXT_BUILTIN ) )
       && TYPEOF(cptr->call) == LANGSXP) {
       SEXP fun = CAR(cptr->call);
       if (!newline) newline = 1;
       fprintf(R_ProfileOutfile, "\"%s\" ",
           TYPEOF(fun) == SYMSXP ? CHAR(PRINTNAME(fun)) :
           "<Anonymous>");
   } else if( cptr->callflag & CTXT_LOOP){
     SEXP fun = CAR(cptr->syscall);
     if (!newline) newline = 1;
     fprintf(R_ProfileOutfile, "\"%s\" ",
CHAR(PRINTNAME(fun)) );
   }
}

so that we see for in the list of "functions" that appear in the 
profiler file.

Obviously I am taking some shortcuts here, because of the other loops, 
but I would like to make a formal patch with this. Before I do that, I'd 
like to know :
- is this has a chance of breaking something else (does the CTXT_LOOP 
being R_NilValue is used elsewhere)
- would this feature be welcome.
- Should I differentiate real functions with loops in the output file, 
maybe I can write "[for]" instead of for to emphacize this is not a 
function.

Romain

-- 
Romain Francois
Independent R Consultant
+33(0) 6 28 91 30 30
romainfrancois.blog.free.fr


-- 
Romain Francois
Independent R Consultant
+33(0) 6 28 91 30 30
romainfrancois.blog.free.fr

Romain Francois

2009-Mar-03 16:23 UTC

head link

[Rd] profiler and loops

Hello,

Please find attached a patch against svn implementing this proposal.

The part I don't fully understand is the part involving the function  
loopWithContect, so I've put "[loop]" in there instead of
"[for]",
"[while]" or "[repeat]" because I don't really know how
to extract the
information.

With the script1 from my previous post, summaryRprof produces this:

[]$ /home/romain/workspace/R-trunk/bin/Rscript script1.R
$by.self
        self.time self.pct total.time total.pct
"[for]"      5.32     98.9       5.38     100.0
"rnorm"      0.06      1.1       0.06       1.1

$by.total
        total.time total.pct self.time self.pct
"[for]"       5.38     100.0      5.32     98.9
"rnorm"       0.06       1.1      0.06      1.1

$sampling.time
[1] 5.38

Romain


Romain Francois wrote:> Hello,
>
> (This is follow up from this thread: 
> nabble.com/execution-time-of-.packages-td22304833.html but 
> with a different focus)
>
> I am often confused by the result of the profiler, when a loop is 
> involved. Consider these two scripts:
>
> script1:
>
> Rprof( )
> x <- numeric( )
>   for( i in 1:10000){
>     x <- c( x, rnorm(10) )
>   }
> Rprof( NULL )
> print( summaryRprof( ) )
>
>
> script2:
>
> Rprof( )
> ffor <- function(){
>   x <- numeric( )
>   for( i in 1:10000){
>     x <- c( x, rnorm(10) )
>   }
> }
> ffor()
> Rprof( NULL )
> print( summaryRprof( ) )
>
>
> []$ time Rscript --vanilla script1.R
> $by.self
>       self.time self.pct total.time total.pct
> "rnorm"      0.22      100       0.22       100
>
> $by.total
>       total.time total.pct self.time self.pct
> "rnorm"       0.22       100      0.22      100
>
> $sampling.time
> [1] 0.22
>
> real    0m7.786s
> user    0m5.192s
> sys     0m0.735s
>
> []$$ time Rscript --vanilla script2.R
> $by.self
>       self.time self.pct total.time total.pct
> "ffor"       4.94     92.5       5.34     100.0
> "rnorm"      0.40      7.5       0.40       7.5
>
> $by.total
>       total.time total.pct self.time self.pct
> "ffor"        5.34     100.0      4.94     92.5
> "rnorm"       0.40       7.5      0.40      7.5
>
> $sampling.time
> [1] 5.34
>
>
> real    0m7.841s
> user    0m5.152s
> sys     0m0.712s
>
>
>
> In the first one, I call a for loop from the top level and in the 
> second one, the loop is wrapped in a function call. This shows the 
> inability of the profiler to point loops as responsible for 
> bottlenecks. The coder of script1 would not know what to do to improve 
> on the script.
>
> I have had a quick look in the code, and here are a few thoughts:
>
> in the function "doprof" in eval.c,  this loop write the call
stack on
> the profiler file:
>
> for (cptr = R_GlobalContext; cptr; cptr = cptr->nextcontext) {
>   if ((cptr->callflag & (CTXT_FUNCTION | CTXT_BUILTIN))
>       && TYPEOF(cptr->call) == LANGSXP) {
>       SEXP fun = CAR(cptr->call);
>       if (!newline) newline = 1;
>       fprintf(R_ProfileOutfile, "\"%s\" ",
>           TYPEOF(fun) == SYMSXP ? CHAR(PRINTNAME(fun)) :
>           "<Anonymous>");
>   }
>   }
>  so we can see it only cares about context CTXT_FUNCTION and 
> CTXT_BUILTIN, when for loops play with CTXT_LOOP (this is again in 
> eval.c within the do_for function)
>
> begincontext(&cntxt, CTXT_LOOP, R_NilValue, rho, R_BaseEnv, R_NilValue,
>        R_NilValue);
>
> which as the name implies, begins the context of the for loop. The 
> begincontext function looks like this :
>
> void begincontext(RCNTXT * cptr, int flags,
>         SEXP syscall, SEXP env, SEXP sysp,
>         SEXP promargs, SEXP callfun)
> {
>   cptr->nextcontext = R_GlobalContext;
>   cptr->cstacktop = R_PPStackTop;
>   cptr->evaldepth = R_EvalDepth;
>   cptr->callflag = flags;
>   cptr->call = syscall;
>   cptr->cloenv = env;
>   cptr->sysparent = sysp;
>   cptr->conexit = R_NilValue;
>   cptr->cend = NULL;
>   cptr->promargs = promargs;
>   cptr->callfun = callfun;
>   cptr->vmax = vmaxget();
>   cptr->intsusp = R_interrupts_suspended;
>   cptr->handlerstack = R_HandlerStack;
>   cptr->restartstack = R_RestartStack;
>   cptr->prstack = R_PendingPromises;
> #ifdef BYTECODE
>   cptr->nodestack = R_BCNodeStackTop;
> # ifdef BC_INT_STACK
>   cptr->intstack = R_BCIntStackTop;
> # endif
> #endif
>   R_GlobalContext = cptr;
> }
>
>
> So it could be possible to set the last argument of the begincontext 
> function to "for" and use this code in the doprof function:
>
>
> for (cptr = R_GlobalContext; cptr; cptr = cptr->nextcontext) {
> if ( ( cptr->callflag & (CTXT_FUNCTION | CTXT_BUILTIN ) )
>       && TYPEOF(cptr->call) == LANGSXP) {
>       SEXP fun = CAR(cptr->call);
>       if (!newline) newline = 1;
>       fprintf(R_ProfileOutfile, "\"%s\" ",
>           TYPEOF(fun) == SYMSXP ? CHAR(PRINTNAME(fun)) :
>           "<Anonymous>");
>   } else if( cptr->callflag & CTXT_LOOP){
>     SEXP fun = CAR(cptr->syscall);
>     if (!newline) newline = 1;
>     fprintf(R_ProfileOutfile, "\"%s\" ",
CHAR(PRINTNAME(fun)) );
>   }
> }
>
> so that we see for in the list of "functions" that appear in the 
> profiler file.
>
> Obviously I am taking some shortcuts here, because of the other loops, 
> but I would like to make a formal patch with this. Before I do that, 
> I'd like to know :
> - is this has a chance of breaking something else (does the CTXT_LOOP 
> being R_NilValue is used elsewhere)
> - would this feature be welcome.
> - Should I differentiate real functions with loops in the output file, 
> maybe I can write "[for]" instead of for to emphacize this is not
a
> function.
>
> Romain
>

-- 
Romain Francois
Independent R Consultant
+33(0) 6 28 91 30 30
romainfrancois.blog.free.fr


-------------- next part --------------
A non-text attachment was scrubbed...
Name: profilerWithLoop.diff
Type: text/x-patch
Size: 1715 bytes
Desc: not available
URL:
<stat.ethz.ch/pipermail/r-devel/attachments/20090303/fcc638a4/attachment.bin>

Maybe Matching Threads

Search for more possibly parallel threads

R devel - Mar 2009 - profiler and loops

[Rd] profiler and loops

[Rd] profiler and loops

Maybe Matching Threads