G'Day Folks,
There have been a number of items on my DTrace ToDo list that have been
sitting there for a while. For each of them I've reached the point where
I'm reading through kernel/dtrace source to sort them out, which is not
in itself a bad thing; however, it may be dumb if I've missed something
obvious. I'll post them gradually to this list to see if anyone has
knowledge of these issues.
-------------------------------------------
1. Measuring Ecache hit rate for user funcs
-------------------------------------------
I think this was first suggested to me by Tony Shoumack of Adelaide,
Australia. It's an interesting idea.
It would require initialising the PICs on the CPUs to measure the Ecache
events, and then reading the PIC registers on the entry and exit of user
functions.
Ok, so initialising the PICs can be done by running cpustat,
   # cpustat -c pic0=EC_ref,pic1=EC_hit 1 5
      time cpu event      pic0      pic1
     1.012   0  tick  15610037  14302458
     2.012   0  tick  12827455  11731318
     3.012   0  tick  14063838  13054689
     4.012   0  tick  14706825  13509602
     5.012   0  tick  14708909  13530250
     5.012   1 total  71917064  66128317
Wheee. This is from an UltraSPARC-IIi CPU.
So, let's say we cheat and run cpustat > /dev/null via a system() in a
destructive dtrace script to initialise those counters; then it's only a
matter of reading those registers on the entry and return of user
functions.
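A rough sketch of that cheat (destructive actions enabled with -w; the
cpustat line is the same one from above, backgrounded so the script can
keep running):

   #!/usr/sbin/dtrace -ws

   /*
    * Sketch only: program the PCR by leaving cpustat running in the
    * background, discarding its output.
    */
   BEGIN
   {
           system("cpustat -c pic0=EC_ref,pic1=EC_hit 1 > /dev/null &");
   }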
DTrace lets us read plenty of registers through uregs[]. But I have not
found the PIC data registers via uregs[] (I've run scripts to print them
all out). I think they either haven't been mapped, or something far more
insurmountable is going on - such as this saved set of user-mode
registers simply not containing the PIC registers at all, with calls to
uregs[] reading that saved set rather than the registers directly.
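(For the record, the kind of script I mean is nothing fancy - dump a
handful of the documented uregs[] members at a user function entry and
eyeball the values, roughly:)

   /* rough sketch: print a few uregs[] members on entry to a user func */
   pid$target::main:entry
   {
           printf("g1=%x o0=%x sp=%x pc=%x\n",
               uregs[R_G1], uregs[R_O0], uregs[R_SP], uregs[R_PC]);
   }

Run it against a target with -c or -p; nothing resembling a PIC value
shows up.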
Of course, even getting this close assumes that reading the counters is
simple and doesn't require any further PIC control register (PCR) writes,
which should be the case if there are only two PIC data registers and one
PCR.
This appears to be the case for an UltraSPARC-II anyway, as the
following is from the User Manual - 802-7220-02.pdf,
   "Up to two performance events can be measured simultaneously in
   UltraSPARC. The Performance Control Register (PCR) controls event
   selection and filtering (that is, counting user and/or system level
   events) for a pair of 32-bit Performance Instrumentation Counters
   (PICs)."
A pair of PICs. So, let's find the routine that actually reads them
(much dtracing later*),
   # mdb -k
   [...]
   > ultra_getpic::dis
   ultra_getpic:                   retl
   ultra_getpic+4:                 rd        %pic, %o0
   >
The routine only reads one PIC counter, but this turns out to be a 64-bit
register that contains both 32-bit PIC data registers joined at the hip.
(* before source was released, I had to find ultra_getpic using dtrace
alone - not too hard, just do 17 PIC reads, aggregate everything that
looks vaguely familiar, and see what was fired 17 times)
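(The search itself was along these lines - count everything, then look
through the aggregation for functions fired exactly 17 times:)

   # dtrace -n 'fbt:::entry { @[probefunc] = count(); }'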
ultra_getpic is written in assembly in usr/src/uts/sun4/ml/cpc_hwreg.s,
and is called from the us_pcbe_sample() C function - whose stack backtrace
is,
   # dtrace -n 'fbt::us_pcbe_sample:entry { stack(); }'
   dtrace: description 'fbt::us_pcbe_sample:entry ' matched 1 probe
   CPU     ID                    FUNCTION:NAME
     0  36657             us_pcbe_sample:entry
                 genunix`kcpc_sample+0xf0
                 cpc`kcpc_ioctl+0x270
                 genunix`ioctl+0x184
                 unix`syscall_trap32+0xcc
Ok, so it's safe to say that on UltraSPARC there is one PIC data register
to read. I haven't found it in uregs[] (and don't really expect it to be
there). A DTrace function to return the ultra_getpic value as arg0 would
do the trick here, and could be called on user func entry and return.
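To illustrate the end goal, here's the sort of script I have in mind.
getpic() is entirely hypothetical - no such function exists in DTrace
today - and I'm assuming pic0 lands in the low 32 bits of the value and
pic1 in the high 32 bits:

   /* hypothetical sketch: getpic() does NOT exist; pic0 = EC_ref (low   */
   /* 32 bits), pic1 = EC_hit (high 32 bits) per the cpustat line above  */
   pid$target:::entry
   {
           self->ref = getpic() & 0xffffffff;
           self->hit = getpic() >> 32;
   }

   pid$target:::return
   /self->ref/
   {
           @ref[probefunc] = sum((getpic() & 0xffffffff) - self->ref);
           @hit[probefunc] = sum((getpic() >> 32) - self->hit);
           self->ref = 0;
           self->hit = 0;
   }

(Ignoring CPU migration and everything else that makes this hard in
practice.)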
But we should try to accommodate x86 as well (in all its variants). :)
Digging around my x86 server, the function appears to be rdmsr,
   > rdmsr::dis
   rdmsr:                          movl   0x4(%esp),%ecx
   rdmsr+4:                        rdmsr
   rdmsr+6:                        movl   0x8(%esp),%ecx
   rdmsr+0xa:                      movl   %eax,(%ecx)
   rdmsr+0xc:                      movl   %edx,0x4(%ecx)
   rdmsr+0xf:                      ret
   >
and is called from ptm_pcbe_sample() which is in
usr/src/uts/intel/pcbe/p123_pcbe.c,
   # dtrace -n 'fbt::ptm_pcbe_sample:entry { stack(); }'
   dtrace: description 'fbt::ptm_pcbe_sample:entry ' matched 1 probe
   CPU     ID                    FUNCTION:NAME
     0  35917            ptm_pcbe_sample:entry
                 genunix`kcpc_sample+0xaf
                 cpc`kcpc_ioctl+0xf5
                 genunix`cdev_ioctl+0x2b
                 specfs`spec_ioctl+0x62
                 genunix`fop_ioctl+0x1e
                 genunix`ioctl+0x199
                 unix`sys_sysenter+0xdc
Holy smokes Batman! Someone has been here before - the following is from
usr/src/uts/intel/pcbe/p123_pcbe.c,
        DTRACE_PROBE1(ptm__curpic0, uint64_t, curpic[0]);
        DTRACE_PROBE1(ptm__curpic1, uint64_t, curpic[1]);
Hmmm. And there are others that probe the PCRs. That's handy. So if I run
cpustat through a destructive dtrace system(), then I can read the PICs at
100 Hz - the limit of cpustat - and drag the counters into DTrace using the
above sdt probes (which let me write a wonky E$-by-PID tool that only
gives results if the processes spend a loooong time on the CPU; unless I
think of a more elegant way, that tool won't see the light of day).
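For what it's worth, pulling values out of those probes while cpustat
drives the sampling is simple enough - the double underscores in the macro
names should come through as hyphens in the sdt probe names (worth
confirming with dtrace -l):

   /* sketch: snoop the PIC values as cpustat's 100 Hz sampling fires the */
   /* sdt probes; probe names assumed from the DTRACE_PROBE1 macros above */
   sdt:::ptm-curpic0
   {
           @pic0[cpu] = max(arg0);
   }

   sdt:::ptm-curpic1
   {
           @pic1[cpu] = max(arg0);
   }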
The stack backtraces did show something useful - kcpc_sample is common to
all architectures, so it may be easier to call this from DTrace...
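For example, watching who drives it is the same one-liner on both:

   # dtrace -n 'fbt::kcpc_sample:entry { @[execname] = count(); }'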
...
So, that's where I'm at. I don't have a solution yet, still reading up on
things... It's nice to have source, I should add :)
Brendan
Hi Brendan,

There is something along these lines in progress at the moment - stitching
a performance-counter provider into DTrace. The last update I saw was a
few months back, saying to keep an eye open for a prototype in the next
few months, so it should be getting close now. For CPC on sparc you need
access to the individual implementation supplements for each chip variant
- they've not been opened up (hopefully they will be).

Cheers

Gavin

On 07/01/05 08:13, Brendan Gregg wrote:
> G'Day Folks,
> [...]
G'Day Gavin,

On Fri, 1 Jul 2005, Gavin Maltby wrote:
> There is something along these lines in progress at the moment - stitching
> a performance-counter provider into DTrace. [...]

Thanks, yes, I should have said that I was looking for a workaround in the
meantime - often there are workarounds with DTrace, but probably not this
time.

cheers,

Brendan