iMx
2010-Jun-19 09:36 UTC
[dtrace-discuss] Using dtrace to pinpoint a kernel driver problem
Good morning, I'm somewhat new to the DTrace world, but very keen to learn more! I have a small home NAS box that uses a Silicon Image 3124 RAID card purely as an interface card, and I've been noticing some undesirable side effects under relatively heavy I/O - there is an acknowledged bug in the si3124 kernel driver. I'm running OpenSolaris b134, and from checking the source code there hasn't been a change to the si3124 driver for some time, so this likely affects many Solaris releases.

There are 5 disks in 1 zpool: 3 on the si3124 card and 2 on the motherboard. Of the 3 disks on the si3124, under load (e.g. a scrub) one disk will hit 100 %w when monitored with 'iostat -xcn 1' and block for 4-5 seconds - iostat output at the bottom of this thread. This has been confirmed as a bug on the Oracle bug site (I can add myself as an interested party, but I don't seem to be able to add comments?) and also kindly confirmed by Mark Logan at Oracle on the storage-discuss mailing list.

http://bugs.opensolaris.org/bugdatabase/view_bug.do;jsessionid=622a4828714cd0382000b9764e83?bug_id=6796802

However, can someone help me put together some DTrace statements/scripts to help pinpoint this further? I've been playing around with a few; what I ideally need is to capture what the driver is doing when it hits 100 %w, otherwise the data I gather is clogged up with information from when the card is running fine rather than from the 4-5 seconds of blocking.

root at boss:~# dtrace -n si_\*:entry'{@[probefunc]=count();}'
dtrace: description 'si_*:entry' matched 71 probes
dtrace: aggregation size lowered to 2m
^C

  si_watchdog_handler                                               8
  si_claim_free_slot                                               57
  si_deliver_satapkt                                                57
  si_intr                                                           57
  si_intr_command_complete                                          57
  si_tran_start                                                     57

Does anyone have any ideas for how to move forward? I'm keen to be able to help others who have this si3124 card and also see the problem, and pinpointing what's causing it would be a start.
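As a next step I was thinking of something along these lines - completely untested, and the 5-second interval is just a guess - to get a latency distribution for each si_* function rather than a plain call count:

#!/usr/sbin/dtrace -s

/*
 * Untested sketch: time each si_* function from entry to return and
 * print a per-function latency distribution every 5 seconds.
 */

fbt::si_*:entry
{
        self->ts[probefunc] = timestamp;
}

fbt::si_*:return
/self->ts[probefunc]/
{
        /* nanoseconds spent inside this driver function */
        @lat[probefunc] = quantize(timestamp - self->ts[probefunc]);
        self->ts[probefunc] = 0;
}

tick-5s
{
        printa(@lat);
        trunc(@lat);
}

If the 4-5 second stall happens inside a single si_* call, it should show up in the top buckets of one of these distributions; if nothing stands out, I assume the time is being lost between calls rather than inside one.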
Many thanks,
iMx

#####

     cpu
 us sy wt id
 20 12  0 68
                    extended device statistics
    r/s    w/s    kr/s    kw/s wait actv wsvc_t asvc_t  %w  %b device
   32.9   56.8  1085.1  6107.6  0.3  0.2    3.3    1.8   7  10 c0d0
   22.9   98.7   765.9  6011.8  0.1  0.1    0.6    1.2   3  10 c4d0
    0.0    0.0     0.0     0.0  0.0  0.0    0.0    0.0   0   0 c3t0d0
   25.9   17.0   829.7  1735.3  0.6  0.1   14.1    2.8   9  12 c3t1d0
   27.9   18.0   925.5  1703.4  0.6  0.1   13.5    2.9  10  13 c3t2d0
   20.9   19.9   765.9  1898.8  0.7  0.2   16.4    4.2  12  17 c3t3d0
     cpu
 us sy wt id
  3 27  0 70
                    extended device statistics
    r/s    w/s    kr/s    kw/s wait actv wsvc_t asvc_t  %w  %b device
    2.0  287.8    64.2 23927.6  0.3  0.3    1.2    1.1  12  18 c0d0
    2.0  268.8    64.2 24024.2  0.5  0.3    1.7    1.3  14  19 c4d0
    0.0    0.0     0.0     0.0  0.0  0.0    0.0    0.0   0   0 c3t0d0
    0.0   25.1     0.0  3100.7  9.6  0.2  381.2    6.0 100  15 c3t1d0
    0.0   23.1     0.0  2844.0  9.6  0.1  417.1    6.3 100  14 c3t2d0
    1.0  229.6    64.2 28160.5  4.3  0.5   18.5    2.3  53  52 c3t3d0
     cpu
 us sy wt id
  0  1  0 99
                    extended device statistics
    r/s    w/s    kr/s    kw/s wait actv wsvc_t asvc_t  %w  %b device
    0.0    0.0     0.0     0.0  0.0  0.0    0.0    0.0   0   0 c0d0
    0.0    0.0     0.0     0.0  0.0  0.0    0.0    0.0   0   0 c4d0
    0.0    0.0     0.0     0.0  0.0  0.0    0.0    0.0   0   0 c3t0d0
    0.0    0.0     0.0     0.0 10.0  0.0    0.0    0.0 100   0 c3t1d0
    0.0    0.0     0.0     0.0 10.0  0.0    0.0    0.0 100   0 c3t2d0
    0.0    0.0     0.0     0.0  0.0  0.0    0.0    0.0   0   0 c3t3d0
     cpu
 us sy wt id
  0  0  0 100
                    extended device statistics
    r/s    w/s    kr/s    kw/s wait actv wsvc_t asvc_t  %w  %b device
    0.0    0.0     0.0     0.0  0.0  0.0    0.0    0.0   0   0 c0d0
    0.0    0.0     0.0     0.0  0.0  0.0    0.0    0.0   0   0 c4d0
    0.0    0.0     0.0     0.0  0.0  0.0    0.0    0.0   0   0 c3t0d0
    0.0    0.0     0.0     0.0 10.0  0.0    0.0    0.0 100   0 c3t1d0
    0.0    0.0     0.0     0.0 10.0  0.0    0.0    0.0 100   0 c3t2d0
    0.0    0.0     0.0     0.0  0.0  0.0    0.0    0.0   0   0 c3t3d0
     cpu
 us sy wt id
  0  1  0 99
                    extended device statistics
    r/s    w/s    kr/s    kw/s wait actv wsvc_t asvc_t  %w  %b device
    0.0    0.0     0.0     0.0  0.0  0.0    0.0    0.0   0   0 c0d0
    0.0    0.0     0.0     0.0  0.0  0.0    0.0    0.0   0   0 c4d0
    0.0    0.0     0.0     0.0  0.0  0.0    0.0    0.0   0   0 c3t0d0
    0.0    0.0     0.0     0.0 10.0  0.0    0.0    0.0 100   0 c3t1d0
    0.0    0.0     0.0     0.0 10.0  0.0    0.0    0.0 100   0 c3t2d0
    0.0    0.0     0.0     0.0  0.0  0.0    0.0    0.0   0   0 c3t3d0
     cpu
 us sy wt id
  1  1  0 98
                    extended device statistics
    r/s    w/s    kr/s    kw/s wait actv wsvc_t asvc_t  %w  %b device
    0.0    0.0     0.0     0.0  0.0  0.0    0.0    0.0   0   0 c0d0
    0.0    0.0     0.0     0.0  0.0  0.0    0.0    0.0   0   0 c4d0
    0.0    0.0     0.0     0.0  0.0  0.0    0.0    0.0   0   0 c3t0d0
    0.0    0.0     0.0     0.0 10.0  0.0    0.0    0.0 100   0 c3t1d0
    0.0    0.0     0.0     0.0 10.0  0.0    0.0    0.0 100   0 c3t2d0
    0.0    0.0     0.0     0.0  0.0  0.0    0.0    0.0   0   0 c3t3d0
     cpu
 us sy wt id
  1  6  0 93
                    extended device statistics
    r/s    w/s    kr/s    kw/s wait actv wsvc_t asvc_t  %w  %b device
    1.0   19.0    64.0    63.5  0.0  0.0    0.1    1.1   0   2 c0d0
    1.0   17.0    64.0    63.0  0.0  0.0    0.2    1.1   0   2 c4d0
    0.0    0.0     0.0     0.0  0.0  0.0    0.0    0.0   0   0 c3t0d0
    1.0  222.0    64.0 25216.9  8.4  0.8   37.5    3.7  95  81 c3t1d0
    1.0  225.0    64.0 25504.9  8.4  0.8   36.9    3.6  93  81 c3t2d0
    0.0   17.0     0.0    63.0  0.0  0.0    0.4    1.8   0   3 c3t3d0
     cpu
 us sy wt id
 21  4  0 75
                    extended device statistics
    r/s    w/s    kr/s    kw/s wait actv wsvc_t asvc_t  %w  %b device
   38.0   77.0  1216.1   274.0  0.2  0.1    1.7    1.2   4   8 c0d0
   37.0   57.0  1216.1   274.0  0.2  0.1    2.6    1.1   4   7 c4d0
    0.0    0.0     0.0     0.0  0.0  0.0    0.0    0.0   0   0 c3t0d0
   21.0   66.0   672.0   274.0  0.2  0.1    2.6    0.9   3   8 c3t1d0
   16.0   60.0   544.0   274.0  0.3  0.1    3.4    1.5   6  11 c3t2d0
   39.0   59.0  1280.1   274.0  0.2  0.1    2.3    0.9   6   9 c3t3d0
Ryan Johnson
2010-Jun-21 08:25 UTC
[dtrace-discuss] Using dtrace to pinpoint a kernel driver problem
On 6/19/2010 11:36 AM, iMx wrote:
> I've been playing around with a few; what I ideally need is to capture what
> the driver is doing when it hits 100 %w, otherwise the data I gather is
> clogged up with information from when the card is running fine rather than
> from the 4-5 seconds of blocking.

As a starting point, it seems like you'd want to know which underlying operation is taking so much time. There are three possibilities:

1. A single function could be taking forever to complete.
2. A pair of functions (acquire/release, start/finish) could take a long time between them.
3. A set of functions is being called too many times, and the cumulative time is long.

Normal profiling should identify #3 fairly quickly: count how often a profile probe catches each function and output/reset the result every few seconds.

#1 (and #2, if you know which pair to look for...) can be identified by saving a timestamp at the start and printing to stdout whenever finish minus start is over some threshold. If you've got a #2 case on your hands, there's almost certainly some #1 which contains it (even if in userspace), so you could start with that and drill down.

Start with that and see what suggests itself from there. You could also imagine all kinds of mixing and matching... track profile information whenever a suspected function is in progress... use speculative tracing to print info only when it turns out that your threshold conditions have been met, etc.

Hope that helps,
Ryan
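P.S. Something along these lines is what I mean by the speculative-tracing variant. It is completely untested, and both si_deliver_satapkt and the 1-second threshold are just placeholders - substitute whichever function and cutoff your profiling points at:

#!/usr/sbin/dtrace -s

/* Allow a few concurrent speculations, roughly one per in-flight call. */
#pragma D option nspec=8

fbt::si_deliver_satapkt:entry
{
        self->ts = timestamp;
        self->spec = speculation();
        speculate(self->spec);
        /* Speculatively record when we entered and how we got here. */
        printf("%Y entered %s\n", walltimestamp, probefunc);
        stack();
}

fbt::si_deliver_satapkt:return
/self->spec && timestamp - self->ts > 1000000000/
{
        /* The call took longer than 1 second - keep the speculated data. */
        commit(self->spec);
        self->spec = 0;
        self->ts = 0;
}

fbt::si_deliver_satapkt:return
/self->spec/
{
        /* Fast enough - throw the speculated data away. */
        discard(self->spec);
        self->spec = 0;
        self->ts = 0;
}

Run that during a scrub and the only output you should see is from calls that actually blocked, which keeps the noise from the healthy periods out of the capture. The same skeleton can be adapted to the #2 (pair of functions) case, provided both ends of the pair fire on the same thread; if they don't, the thread-local variables would need to become globals keyed on something like the packet pointer.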