G'Day Folks,

There have been a number of items on my DTrace ToDo list that have been sitting there for a while. For each of them I've reached the point where I'm reading through kernel/dtrace source to sort them out, which is not in itself a bad thing; however, it may be dumb if I've missed something obvious. I'll post them gradually to this list to see if anyone has knowledge of these issues.

-------------------------------------------
1. Measuring Ecache hit rate for user funcs
-------------------------------------------

I think this was first suggested to me by Tony Shoumack of Adelaide, Australia. It's an interesting idea.

It would require initialising the PICs on the CPUs to measure the Ecache events, and then reading the PIC registers on the entry and exit of user functions.

Ok, so initialising the PICs can be done by running cpustat,

   # cpustat -c pic0=EC_ref,pic1=EC_hit 1 5
     time cpu event      pic0      pic1
    1.012   0  tick  15610037  14302458
    2.012   0  tick  12827455  11731318
    3.012   0  tick  14063838  13054689
    4.012   0  tick  14706825  13509602
    5.012   0  tick  14708909  13530250
    5.012   1 total  71917064  66128317

Wheee. This is from an UltraSPARC-IIi CPU.

So, let's say we cheat and run cpustat > /dev/null via a system() in a destructive dtrace script to initialise those counters; then it's only a matter of reading those registers on the entry and return of user functions.

DTrace lets us read plenty of registers through uregs[], but I have not found the PIC data registers via uregs[] (I've run scripts to print them all out). I think they either haven't been mapped, or something far more insurmountable - such as this saved set of user-mode registers simply doesn't contain the PIC registers, and calls to uregs[] are reading this saved set, not the registers directly.
Of course, to even get this close assumes that reading the counters is simple and doesn't require any more PIC control register (PCR) writes, which should be the case if there were 2 PIC data registers and 1 PIC control register.

This appears to be the case for an UltraSPARC-II anyway, as the following is from the User's Manual - 802-7220-02.pdf,

   "Up to two performance events can be measured simultaneously in
   UltraSPARC. The Performance Control Register (PCR) controls event
   selection and filtering (that is, counting user and/or system level
   events) for a pair of 32-bit Performance Instrumentation Counters
   (PICs)."

A pair of PICs. So, let's find the routine that actually reads them (much dtracing later*),

   # mdb -k
   [...]
   > ultra_getpic::dis
   ultra_getpic:           retl
   ultra_getpic+4:         rd      %pic, %o0
   >

The routine only reads one PIC counter, but this turns out to be a 64-bit register that contains both 32-bit PIC data registers joined at the hip.

(* before source was released, I had to find ultra_getpic using dtrace alone - not too hard, just do 17 PIC reads, aggregate everything that looks vaguely familiar, and see what fired 17 times)

ultra_getpic is written in assembly in usr/src/uts/sun4/ml/cpc_hwreg.s, and is called from the us_pcbe_sample() C function - whose stack backtrace is,

   # dtrace -n 'fbt::us_pcbe_sample:entry { stack(); }'
   dtrace: description 'fbt::us_pcbe_sample:entry ' matched 1 probe
   CPU     ID                    FUNCTION:NAME
     0  36657            us_pcbe_sample:entry
                 genunix`kcpc_sample+0xf0
                 cpc`kcpc_ioctl+0x270
                 genunix`ioctl+0x184
                 unix`syscall_trap32+0xcc

Ok, so it's safe to say on UltraSPARC that there is one PIC data register to read. I haven't found it in uregs[] (and don't really expect it to be there). A DTrace function to return the ultra_getpic value as arg0 would do the trick here, and could be called on user func entry and return.

But we should try to accommodate x86 as well (in all its variants).
:) Digging around my x86 server, the function appears to be rdmsr,

   > rdmsr::dis
   rdmsr:          movl   0x4(%esp),%ecx
   rdmsr+4:        rdmsr
   rdmsr+6:        movl   0x8(%esp),%ecx
   rdmsr+0xa:      movl   %eax,(%ecx)
   rdmsr+0xc:      movl   %edx,0x4(%ecx)
   rdmsr+0xf:      ret
   >

and it is called from ptm_pcbe_sample(), which is in usr/src/uts/intel/pcbe/p123_pcbe.c,

   # dtrace -n 'fbt::ptm_pcbe_sample:entry { stack(); }'
   dtrace: description 'fbt::ptm_pcbe_sample:entry ' matched 1 probe
   CPU     ID                    FUNCTION:NAME
     0  35917           ptm_pcbe_sample:entry
                 genunix`kcpc_sample+0xaf
                 cpc`kcpc_ioctl+0xf5
                 genunix`cdev_ioctl+0x2b
                 specfs`spec_ioctl+0x62
                 genunix`fop_ioctl+0x1e
                 genunix`ioctl+0x199
                 unix`sys_sysenter+0xdc

Holy smokes Batman! Someone has been here before - the following is from usr/src/uts/intel/pcbe/p123_pcbe.c,

   DTRACE_PROBE1(ptm__curpic0, uint64_t, curpic[0]);
   DTRACE_PROBE1(ptm__curpic1, uint64_t, curpic[1]);

Hmmm. And there are others that probe the PCRs. That's handy. So if I run cpustat through a destructive dtrace system(), then I can read PICs at 100 Hz - the limit of cpustat - and drag the counters into DTrace using the above sdt probes (which let me write a wonky E$ by PID tool, which only gives results if the processes spend a loooong time on the CPU. Unless I think of a more elegant way, that tool won't see the light of day).

The stack backtraces did show something useful - kcpc_sample is common to all architectures, so it may be easier to call this from DTrace...

...

So, that's where I'm at. I don't have a solution yet, still reading up on things... It's nice to have source, I should add :)

Brendan
Hi Brendan,

There is something along these lines in progress at the moment - stitching a performance-counter provider into DTrace. The last update I saw was a few months back, saying to keep an eye open for a prototype in the next few months, so it should be getting close now. For CPC on sparc you need access to the individual implementation supplements for each chip variant - they've not been opened up (hopefully they will be).

Cheers

Gavin

On 07/01/05 08:13, Brendan Gregg wrote:
> G'Day Folks,
>
> There have been a number of items on my DTrace ToDo list that have been
> sitting there for a while. For each of them I've reached the point where
> I'm reading through kernel/dtrace source to sort them out, which is not
> in itself a bad thing; however, it may be dumb if I've missed something
> obvious. I'll post them gradually to this list to see if anyone has
> knowledge of these issues.
>
> -------------------------------------------
> 1. Measuring Ecache hit rate for user funcs
> -------------------------------------------
[...]

_______________________________________________
dtrace-discuss mailing list
dtrace-discuss at opensolaris.org
G'Day Gavin,

On Fri, 1 Jul 2005, Gavin Maltby wrote:
> Hi Brendan,
>
> There is something along these lines in progress at the moment - stitching
> a performance-counter provider into DTrace. The last update I saw was a few
> months back, saying to keep an eye open for a prototype in the next few
> months, so it should be getting close now. For CPC on sparc you need
> access to the individual implementation supplements for each chip
> variant - they've not been opened up (hopefully they will be).

Thanks, yes, I should have said that I was looking for a workaround in the meantime - often there are workarounds with DTrace, but probably not this time.

cheers,

Brendan

> Cheers
>
> Gavin
>
> On 07/01/05 08:13, Brendan Gregg wrote:
> > G'Day Folks,
> >
> > There have been a number of items on my DTrace ToDo list that have been
> > sitting there for a while.
[...]