iMx
2010-Jun-19 09:36 UTC
[dtrace-discuss] Using dtrace to pin point a kernel driver problem
Good Morning,
I'm somewhat new to the dtrace world, but very keen to learn more! I have a
small home NAS box that uses a Silicon Image 3124 RAID card purely as an
interface card; however, I've been noticing some undesirable side effects when
it is under relatively heavy I/O - there is an acknowledged bug in the si3124
kernel driver. I'm running OpenSolaris b134, and from checking the source code
there hasn't been a change in the si3124 driver for some time, so this likely
affects a lot of Solaris releases.
There are 5 disks in 1 zpool: 3 are on the si3124 card and 2 are on the
motherboard. Of the 3 disks on the si3124, under load (i.e. a scrub) one disk
will hit 100 %w when monitored with 'iostat -xcn 1' and block for 4-5 seconds
- iostat output is at the bottom of this message. This has been confirmed as a
bug on the Oracle bug site (I can add myself as an interested party, but I
don't seem to be able to add comments?) and also kindly confirmed by Mark Logan
at Oracle on the storage-discuss mailing list.
http://bugs.opensolaris.org/bugdatabase/view_bug.do;jsessionid=622a4828714cd0382000b9764e83?bug_id=6796802
However, can someone help me put together some dtrace statements/scripts to
pinpoint this further? I've been playing around with a few in particular, but
what I ideally need is to capture what the driver is doing when it hits
100 %w; otherwise the data I gather is clogged up with information from when
the card is running OK, rather than from the 4-5 seconds of blocking.
root at boss:~# dtrace -n si_\*:entry'{@[probefunc]=count();}'
dtrace: description 'si_*:entry' matched 71 probes
dtrace: aggregation size lowered to 2m
^C
si_watchdog_handler 8
si_claim_free_slot 57
si_deliver_satapkt 57
si_intr 57
si_intr_command_complete 57
si_tran_start 57
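For what it's worth, below is a rough sketch of the direction I've been poking
at - it just dumps per-second si_* call counts with a timestamp, so the quiet
or blocked seconds should stand out next to the iostat output. I'm assuming
the si_* functions are the fbt probes for the si3124 module (as in the
one-liner above), and I'm not at all sure this is the right approach:

#!/usr/sbin/dtrace -qs

/* Count si3124 driver function calls, printing and clearing the
   aggregation once a second so quiet (blocked) intervals stand out. */

fbt:si3124:si_*:entry
{
        @calls[probefunc] = count();
}

tick-1sec
{
        printf("%Y\n", walltimestamp);
        printa(@calls);
        trunc(@calls);
}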
Anyone have any ideas for how to move forward? I'm keen to be able to help
others with this si3124 card who also see the problem, and pinpointing what's
causing it would be a start...
Many thanks,
iMx
#####
cpu
us sy wt id
20 12 0 68
extended device statistics
r/s w/s kr/s kw/s wait actv wsvc_t asvc_t %w %b device
32.9 56.8 1085.1 6107.6 0.3 0.2 3.3 1.8 7 10 c0d0
22.9 98.7 765.9 6011.8 0.1 0.1 0.6 1.2 3 10 c4d0
0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0 c3t0d0
25.9 17.0 829.7 1735.3 0.6 0.1 14.1 2.8 9 12 c3t1d0
27.9 18.0 925.5 1703.4 0.6 0.1 13.5 2.9 10 13 c3t2d0
20.9 19.9 765.9 1898.8 0.7 0.2 16.4 4.2 12 17 c3t3d0
cpu
us sy wt id
3 27 0 70
extended device statistics
r/s w/s kr/s kw/s wait actv wsvc_t asvc_t %w %b device
2.0 287.8 64.2 23927.6 0.3 0.3 1.2 1.1 12 18 c0d0
2.0 268.8 64.2 24024.2 0.5 0.3 1.7 1.3 14 19 c4d0
0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0 c3t0d0
0.0 25.1 0.0 3100.7 9.6 0.2 381.2 6.0 100 15 c3t1d0
0.0 23.1 0.0 2844.0 9.6 0.1 417.1 6.3 100 14 c3t2d0
1.0 229.6 64.2 28160.5 4.3 0.5 18.5 2.3 53 52 c3t3d0
cpu
us sy wt id
0 1 0 99
extended device statistics
r/s w/s kr/s kw/s wait actv wsvc_t asvc_t %w %b device
0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0 c0d0
0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0 c4d0
0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0 c3t0d0
0.0 0.0 0.0 0.0 10.0 0.0 0.0 0.0 100 0 c3t1d0
0.0 0.0 0.0 0.0 10.0 0.0 0.0 0.0 100 0 c3t2d0
0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0 c3t3d0
cpu
us sy wt id
0 0 0 100
extended device statistics
r/s w/s kr/s kw/s wait actv wsvc_t asvc_t %w %b device
0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0 c0d0
0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0 c4d0
0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0 c3t0d0
0.0 0.0 0.0 0.0 10.0 0.0 0.0 0.0 100 0 c3t1d0
0.0 0.0 0.0 0.0 10.0 0.0 0.0 0.0 100 0 c3t2d0
0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0 c3t3d0
cpu
us sy wt id
0 1 0 99
extended device statistics
r/s w/s kr/s kw/s wait actv wsvc_t asvc_t %w %b device
0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0 c0d0
0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0 c4d0
0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0 c3t0d0
0.0 0.0 0.0 0.0 10.0 0.0 0.0 0.0 100 0 c3t1d0
0.0 0.0 0.0 0.0 10.0 0.0 0.0 0.0 100 0 c3t2d0
0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0 c3t3d0
cpu
us sy wt id
1 1 0 98
extended device statistics
r/s w/s kr/s kw/s wait actv wsvc_t asvc_t %w %b device
0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0 c0d0
0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0 c4d0
0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0 c3t0d0
0.0 0.0 0.0 0.0 10.0 0.0 0.0 0.0 100 0 c3t1d0
0.0 0.0 0.0 0.0 10.0 0.0 0.0 0.0 100 0 c3t2d0
0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0 c3t3d0
cpu
us sy wt id
1 6 0 93
extended device statistics
r/s w/s kr/s kw/s wait actv wsvc_t asvc_t %w %b device
1.0 19.0 64.0 63.5 0.0 0.0 0.1 1.1 0 2 c0d0
1.0 17.0 64.0 63.0 0.0 0.0 0.2 1.1 0 2 c4d0
0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0 c3t0d0
1.0 222.0 64.0 25216.9 8.4 0.8 37.5 3.7 95 81 c3t1d0
1.0 225.0 64.0 25504.9 8.4 0.8 36.9 3.6 93 81 c3t2d0
0.0 17.0 0.0 63.0 0.0 0.0 0.4 1.8 0 3 c3t3d0
cpu
us sy wt id
21 4 0 75
extended device statistics
r/s w/s kr/s kw/s wait actv wsvc_t asvc_t %w %b device
38.0 77.0 1216.1 274.0 0.2 0.1 1.7 1.2 4 8 c0d0
37.0 57.0 1216.1 274.0 0.2 0.1 2.6 1.1 4 7 c4d0
0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0 c3t0d0
21.0 66.0 672.0 274.0 0.2 0.1 2.6 0.9 3 8 c3t1d0
16.0 60.0 544.0 274.0 0.3 0.1 3.4 1.5 6 11 c3t2d0
39.0 59.0 1280.1 274.0 0.2 0.1 2.3 0.9 6 9 c3t3d0
Ryan Johnson
2010-Jun-21 08:25 UTC
[dtrace-discuss] Using dtrace to pin point a kernel driver problem
On 6/19/2010 11:36 AM, iMx wrote:
> I've been playing around with a few in particular, but what I ideally need
> is to capture what the driver is doing when it hits 100 %w; otherwise the
> data I gather is clogged up with information from when the card is running
> OK, rather than from the 4-5 seconds of blocking.

As a starting point, it seems like you'd want to know which underlying
operation is taking so much time. There are three possibilities:

1. A single function could be taking forever to complete
2. A pair of functions (acquire/release, start/finish) could take a long time
   to complete
3. A set of functions is being called too many times, and the cumulative time
   is long

Normal profiling should identify #3 fairly quickly (count how often a profile
probe catches each function, and output/reset the result every few seconds).
#1 (and #2, if you know which pair to look for) can be identified by saving a
timestamp at the start and spamming stdout whenever finish minus start is over
some threshold. If you've got a #2 case on your hands, there's almost certainly
some #1 which contains it (even if in userspace), so you could start with that
and drill down.

Start with that and see what suggests itself from there. You could also imagine
all kinds of mixing and matching: track profile information whenever a
suspected function is in progress, or use speculative tracing to print info
only when it turns out that your threshold conditions have been met, etc.

Hope that helps,
Ryan
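P.S. Concretely, a minimal sketch of the #1 approach might look something like
the following. The module name (si3124) and the 100ms threshold are guesses on
my part - adjust both to whatever makes sense on your box:

#!/usr/sbin/dtrace -qs

/* Sketch of approach #1: stamp each si_* entry and complain whenever the
   matching return arrives more than 100ms later.  The module name (si3124)
   and the 100ms threshold are assumptions -- tune as needed. */

fbt:si3124:si_*:entry
{
        self->ts[probefunc] = timestamp;
}

fbt:si3124:si_*:return
/self->ts[probefunc] && timestamp - self->ts[probefunc] > 100000000/
{
        printf("%Y: %s took %d ms\n", walltimestamp, probefunc,
            (timestamp - self->ts[probefunc]) / 1000000);
}

fbt:si3124:si_*:return
/self->ts[probefunc]/
{
        self->ts[probefunc] = 0;
}

The same skeleton extends to the speculative-tracing idea: speculate() at
entry, record whatever detail you like, then commit() or discard() at return
depending on the elapsed time.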