Joel Reymont
2009-Sep-09 19:47 UTC
[dtrace-discuss] dtrace overhead and proper measuring technique
I''m trying to time a function within Firefoxwith a pid$target probe [2]. e.g dtrace -Zqw -x dynvarsize=64m -x evaltime=preinit -p 13954 -s menu- construct.d -s sigcont.d elapsed: 12.7942481ms cpu : 7.7911194ms count : 40 times I''m also measuring Firefox startup time by running it arguments like this: ... file:///Users/joelr/work/mozilla/startup/startup.html#`python -c ''import time; print int(time.time() * 1000);''` startup.html [3] prints the time delta. I''m seing 0.9s without tracing with menu-construct.d and 17.9s when tracing with it. Now the question... Should I be looking at the elapsed time above in the context of 0.9s or 17.9s? Thanks, Joel --- [1] sigcont.d BEGIN { system("kill -CONT %d &\n", $target); } [2] menu-construct.d BEGIN { self->ts = 0; self->vts = 0; } pid$target::nsMenuX??MenuConstruct*:entry { self->ts = timestamp; self->vts = vtimestamp; } pid$target::nsMenuX??MenuConstruct*:return /self->ts/ { this->ts = timestamp - self->ts; this->vts = vtimestamp - self->vts; @tsint = sum(this->ts / 1000000); @tsfrac = sum(this->ts % 1000000); @vtsint = sum(this->vts / 1000000); @vtsfrac = sum(this->vts % 1000000); @n = count(); self->ts = 0; self->vts = 0; /*ustack();*/ } END { t = timestamp; printa("elapsed: %@u.%@06ums\n", @tsint, @tsfrac); printa("cpu : %@u.%@06ums\n", @vtsint, @vtsfrac); printa("count : %@u times\n", @n); } [3] startup.html <!DOCTYPE HTML> <html> <body> <script> var t = (new Date()).getTime(); dump("TIME " + t + "\n"); var url = document.location.toString(); if (url.indexOf("#") != -1) { var h = url.substring(url.indexOf("#")+1); dump("ELAPSED " + (t - parseInt(h)) + "\n"); document.write("<span>ELAPSED " + (t - parseInt(h)) + "</span>\n"); } dump("-----------\n"); dump("KILL\n"); window.close(); </script> </body> </html> --- fastest mac firefox! http://wagerlabs.com
Joel Reymont
2009-Sep-09 20:56 UTC
[dtrace-discuss] dtrace overhead and proper measuring technique (simpler case)
This is another variation, a simpler one... ./d firefox-bin menu-construct.d startup.d stop-js.d dtrace -Zqw -x dynvarsize=64m -x evaltime=preinit -p 26656 -s menu- construct.d -s startup.d -s stop-js.d -s sigcont.d elapsed: 6.4350621ms cpu : 3.4218030ms count : 20 times Total: 19000.464337ms Total is printed by startup.d [1] whereas the rest is from menu- construct.d [2]. Should I evaluate elapsed time printed by menu-construct.d as a percentage of total elapsed time printed by startup.d? Thanks, Joel --- [1] startup.d BEGIN { start = timestamp; } END { this->total = timestamp - start; printf("Total: %u.%06ums\n", this->total / 1000000, this->total % 1000000); } [2] menu-construct.d BEGIN { self->ts = 0; self->vts = 0; } pid$target::nsMenuX??MenuConstruct*:entry { self->ts = timestamp; self->vts = vtimestamp; } pid$target::nsMenuX??MenuConstruct*:return /self->ts/ { this->ts = timestamp - self->ts; this->vts = vtimestamp - self->vts; @tsint = sum(this->ts / 1000000); @tsfrac = sum(this->ts % 1000000); @vtsint = sum(this->vts / 1000000); @vtsfrac = sum(this->vts % 1000000); @n = count(); self->ts = 0; self->vts = 0; /*ustack();*/ } END { t = timestamp; printa("elapsed: %@u.%@06ums\n", @tsint, @tsfrac); printa("cpu : %@u.%@06ums\n", @vtsint, @vtsfrac); printa("count : %@u times\n", @n); } --- fastest mac firefox! http://wagerlabs.com
James McIlree
2009-Sep-09 20:58 UTC
[dtrace-discuss] dtrace overhead and proper measuring technique
On Sep 9, 2009, at 12:47 PM, Joel Reymont wrote:> I''m trying to time a function within Firefoxwith a pid$target probe > [2]. e.g > > dtrace -Zqw -x dynvarsize=64m -x evaltime=preinit -p 13954 -s menu- > construct.d -s sigcont.d > elapsed: 12.7942481ms > cpu : 7.7911194ms > count : 40 times > > I''m also measuring Firefox startup time by running it arguments like > this: > > ... file:///Users/joelr/work/mozilla/startup/startup.html#`python -c > ''import time; print int(time.time() * 1000);''` > > startup.html [3] prints the time delta. > > I''m seing 0.9s without tracing with menu-construct.d and 17.9s when > tracing with it. > > Now the question... Should I be looking at the elapsed time above in > the context of 0.9s or 17.9s? > > Thanks, JoelJoel, Given that you have only 40 hits, my guess is you should use the .9s context. As Firefox starts up, each dyld load is "paused" until a handshake with dtrace has taken place. This allows dtrace to place probes in the newly loaded library before any code has executed. I think the extra startup time you are seeing is related to dtrace processing the newly loaded libraries. If those libraries include debug info (for example, libraries you might have built yourself) it can take longer to process them. James M
Joel Reymont
2009-Sep-09 21:02 UTC
[dtrace-discuss] dtrace overhead and proper measuring technique
James, On Sep 9, 2009, at 9:58 PM, James McIlree wrote:> I think the extra startup time you are seeing is related to dtrace > processing > the newly loaded libraries.Can this be worked around by using evaltime=postinit or, perhaps, evaltime=main? I''m using preinit at the moment. What do you think of the exclusively dtrace use case that I posted? Does that change your answer? Thanks, Joel --- fastest mac firefox! http://wagerlabs.com
Joel Reymont
2009-Sep-09 21:39 UTC
[dtrace-discuss] dtrace overhead and proper measuring technique (simpler case)
This is my overwhelming concern: ./d firefox-bin startup.d stop-js.d dtrace -Zqw -x dynvarsize=64m -x evaltime=preinit -p 26713 -s startup.d -s stop-js.d -s sigcont.d Total: 5080.276518ms The only difference between the timing above and the one below is that I''m not using menu-construct.d. My apologies for repeating the question but should I view 6.4350621ms/ 3.4218030ms as a percentage of startup time without the overhead of the measuring script or as a percentage of total time using it? The difference to whether or not I should optimize the timed piece of code is, obviously, huge! Thanks, Joel On Sep 9, 2009, at 9:56 PM, Joel Reymont wrote:> This is another variation, a simpler one... > > ./d firefox-bin menu-construct.d startup.d stop-js.d > dtrace -Zqw -x dynvarsize=64m -x evaltime=preinit -p 26656 -s menu- > construct.d -s startup.d -s stop-js.d -s sigcont.d > elapsed: 6.4350621ms > cpu : 3.4218030ms > count : 20 times > Total: 19000.464337ms > > Total is printed by startup.d [1] whereas the rest is from menu- > construct.d [2]. > > Should I evaluate elapsed time printed by menu-construct.d as a > percentage of total elapsed time printed by startup.d?--- fastest mac firefox! http://wagerlabs.com
James McIlree
2009-Sep-09 22:03 UTC
[dtrace-discuss] dtrace overhead and proper measuring technique
On Sep 9, 2009, at 2:02 PM, Joel Reymont wrote:> James, > > On Sep 9, 2009, at 9:58 PM, James McIlree wrote: > >> I think the extra startup time you are seeing is related to dtrace >> processing >> the newly loaded libraries. > > Can this be worked around by using evaltime=postinit or, perhaps, > evaltime=main?If your probes are invoked at a later time, then yes, this might be a partial solution. The fundamental issue is that looking at a library to see if it contains a function named "foo*" is expensive. I''m guessing that most OS X gui apps run 100+ libraries, so there is a basic overhead of "Get a list of functions from at least 100 libraries, and do a glob style string match on all of them" with any pid probe. You can pay this cost all at once (let dtrace do all the work up front, with the standard evaltime), or in small pieces as individual libraries are loaded (- xevaltime=preinit). But you''ve got to pay the cost either way. In this case, USDT probes are much much less expensive. USDT probes are "precompiled" into a fairly small blob of data which is baked into the library itself. The cost of finding and turning on usdt$target:::myfunc is much lower than the overhead of finding and turning on pid$target::myfunc:entry Of course, you pay a flexibility price, moving/changing a USDT probe is quite time consuming. James M
Joel Reymont
2009-Sep-09 22:18 UTC
[dtrace-discuss] dtrace overhead and proper measuring technique
James, I thought preinit was the standard evaltime. Am I wrong? Is it different on Snow Leoapard? Thanks, Joel> You can pay this cost all at once (let dtrace do all the work up > front, with the standard evaltime), > or in small pieces as individual libraries are loaded (- > xevaltime=preinit). But you''ve got to pay the cost either way.--- fastest mac firefox! http://wagerlabs.com
James McIlree
2009-Sep-10 02:52 UTC
[dtrace-discuss] dtrace overhead and proper measuring technique
On Sep 9, 2009, at 3:18 PM, Joel Reymont wrote:> James, > > I thought preinit was the standard evaltime. Am I wrong? > > Is it different on Snow Leoapard?I believe it is postinit on SnowLeopard. James M
Joel Reymont
2009-Sep-10 10:52 UTC
[dtrace-discuss] dtrace overhead and proper measuring technique
On Sep 9, 2009, at 11:03 PM, James McIlree wrote:> You can pay this cost all at once (let dtrace do all the work up > front, with the standard evaltime), > or in small pieces as individual libraries are loaded (- > xevaltime=preinit). But you''ve got to pay the cost either way.Why is using evaltime=postinit or main a partial solution? Assuming that the probes are involved at a later time, wouldn''t this exlcude library initialization time from elapsed time, as measured by timestamp? Thanks, Joel --- fastest mac firefox! http://wagerlabs.com
Joel Reymont
2009-Sep-10 11:14 UTC
[dtrace-discuss] dtrace overhead and proper measuring technique
On Sep 9, 2009, at 9:58 PM, James McIlree wrote:> Given that you have only 40 hits, my guess is you should use the .9s > context.May I ask why? Can this be generalized into saying that function run time as measured by timestamp should _always_ be evaluated against total eval time measured by excluding the function probe? I still don''t understand whether nsMenuX::MenuConstruct really takes 12ms of elapsed time as DTrace is claiming. I will try to measure this using mach_absolute_time, without involving dtrace, to see if it makes a difference. Thanks, Joel --- fastest mac firefox! http://wagerlabs.com
Joel Reymont
2009-Sep-10 14:07 UTC
[dtrace-discuss] dtrace overhead and proper measuring technique
I decided to dig deeper, with the help of a totally contrived C program [6] that uses mach_absolute_time to measure elapsed time. It''s the same method used by DTrace on Mac OSX, I believe. My conclusion is that elapsed time has to be taken in the context of the startup time that DOES include dtrace overhead, not in the context that excludes it. My pipeline consists of a driver shell script that allows me to put several dtrace scripts together (d) [1] and the following dtrace scripts: b.d : times C function b [2] main.d : times main [3] trace.d : times all functions using function entry/return probes [4] sigcont.d : sends SIGCONT to $target [5] I compile foo.c using ''gcc foo.c -std=c99 -O0 -o foo'' and run dtrace and foo in different windows, starting foo after dtrace. Without dtrace --- ./foo elapsed time in b: 2468196181ns elapsed time in main 2961539846ns --- Timing both main() and b() using 2 scripts --- elapsed time in b: 2499433877ns elapsed time in main 2998021720ns ./d foo b.d main.d dtrace -Zqw -x dynvarsize=64m -x evaltime=postinit -p 37583 -s b.d -s main.d -s sigcont.d b: elapsed: 2491.8609689ms b: cpu : 2457.9077335ms b: count : 20 times main: 2998.110291ms --- vs timing just main() using dtrace --- elapsed time in b: 2471918169ns elapsed time in main 2967909333ns ./d foo main.d dtrace -Zqw -x dynvarsize=64m -x evaltime=postinit -p 37592 -s main.d -s sigcont.d main: 2967.973019ms --- Times are very comparable so there''s not a lot of overhead in main.d and b.d. Now the same but with a VERY expensive trace.d which causes a kernel trap for every user function --- elapsed time in b: 4048801437ns elapsed time in main 4861803951ns ./d foo main.d b.d trace.d dtrace -Zqw -x dynvarsize=64m -x evaltime=postinit -p 37622 -s main.d -s b.d -s trace.d -s sigcont.d b: elapsed: 4037.12010634ms b: cpu : 3064.10182894ms b: count : 20 times main: 4863.032899ms --- Elapsed time is proportionally increased both in the app (foo) and in the output from dtrace scripts. This tells me that you should always view the timings in the context of the same dtrace run (b.d + main.d), not in some other context. Is this conclusive proof? Let me know! Thanks, Joel --- [1] d #!/bin/bash progname=$1 shift scripts="" for i in $*; do scripts="$scripts -s $i"; done scripts="$scripts -s sigcont.d" opts="-Zqw -x dynvarsize=64m -x evaltime=postinit" dtrace='' inline string progname="''$progname''"; inline string scripts="''$scripts''"; inline string opts="''$opts''"; proc:::exec-success /execname == progname/ { stop(); printf("dtrace %s -p %d %s\n", opts, pid, scripts); system("dtrace %s -p %d %s\n", opts, pid, scripts); exit(0); } '' sync && purge && dtrace $opts -n "$dtrace" [2] b.d BEGIN { self->t = 0; self->vt = 0; } pid$target::b:entry { self->t = timestamp; self->vt = vtimestamp; } pid$target::b:return /self->t/ { this->t = timestamp - self->t; this->vt = vtimestamp - self->vt; @tsint = sum(this->t / 1000000); @tsfrac = sum(this->t % 1000000); @vtsint = sum(this->vt / 1000000); @vtsfrac = sum(this->vt % 1000000); @n = count(); self->t = 0; self->vt = 0; } END { t = timestamp; printa("b: elapsed: %@u.%@06ums\n", @tsint, @tsfrac); printa("b: cpu : %@u.%@06ums\n", @vtsint, @vtsfrac); printa("b: count : %@u times\n", @n); } [3] main.d pid$target::main:entry { start = timestamp; } pid$target::main:return { this->total = timestamp - start; printf("main: %u.%06ums\n", this->total / 1000000, this->total % 1000000); exit(0); } [4] trace.d BEGIN { trace_start = timestamp; } pid$target:::entry { self->trace_ts = timestamp; } pid$target:::return /self->trace_ts/ { this->trace_delta = timestamp - self->trace_ts; @trace_elapsed1[probefunc] = sum(this->trace_delta / 1000000); @trace_elapsed2[probefunc] = sum(this->trace_delta % 1000000); @trace_total1 = sum(this->trace_delta / 1000000); @trace_total2 = sum(this->trace_delta % 1000000); @trace_count[probefunc] = count(); self->trace_ts = 0; } END { this->trace_t = timestamp; trunc(@trace_elapsed1, 25); trunc(@trace_elapsed2, 25); trunc(@trace_count, 25); this->trace_total = this->trace_t - trace_start; printf("Top user functions by elapsed time:\n"); printa("%@ 10u.%@06ums for %s\n", @trace_elapsed1, @trace_elapsed2); printf("---------------\n"); printa("= %@u.%@06ums\n", @trace_total1, @trace_total2); printf("Total: %u.%06ums\n", this->trace_total / 1000000, this- >trace_total % 1000000); } [5] sigcont.d BEGIN { system("kill -CONT %d &\n", $target); } [6] foo.c #include <stdio.h> #include <stdlib.h> #include <stdarg.h> #include <mach/mach.h> #include <mach/mach_time.h> #include <unistd.h> uint64_t nanoseconds(uint64_t elapsed) { mach_timebase_info_data_t info; mach_timebase_info(&info); uint64_t nanoSeconds = elapsed * info.numer / info.denom; return nanoSeconds; } uint64_t a(int y) { FILE* fp; uint32_t x; if((fp = fopen("/dev/urandom", "r")) == NULL) { printf("bummer!\n"); exit(1); } for (int i = 0; i < 512; i++) fread(&x, sizeof(uint32_t), 1, fp); fclose(fp); return x * x * x * y; } uint64_t b() { uint64_t start = mach_absolute_time(); for(int i = 0; i < 20; i++) a(i); uint64_t end = mach_absolute_time(); return end - start; } void c() { for(int i = 0; i < 40; i++) a(i); } int main() { uint64_t elapsed = 0; uint64_t start = mach_absolute_time(); c(); for (int i = 0; i < 20; i++) elapsed += b(); c(); //and here where we take the 2nd time measurement... uint64_t end = mach_absolute_time(); printf ("elapsed time in b: %lldns\n", nanoseconds(elapsed)); printf ("elapsed time in main %lldns\n", nanoseconds(end - start)); return 0; } --- fastest mac firefox! http://wagerlabs.com