Robert Alatalo
2008-Dec-05 21:45 UTC
[dtrace-discuss] Trying to identify writer and/or reason for iowrite.
Hello,

Running the iotop DTrace script, we see many entries like:

  UID    PID   PPID CMD          DEVICE  MAJ MIN D     BYTES
    0      3      0 fsflush      md3      85   3 W       512
  ...
  ...
    0      0      0 sched        ssd22   118 176 W  80538112
    0      0      0 sched        ssd18   118 144 W  80585728
    0      0      0 sched        ssd24   118 192 W  80730624
    0      0      0 sched        ssd19   118 152 W  80762880
    0      0      0 sched        ssd23   118 184 W  80764928
    0      0      0 sched        ssd25   118 200 W  80965632
    0      0      0 sched        ssd20   118 160 W  81029120
    0      0      0 sched        ssd21   118 168 W  81132032

In iostat we see things like:

  device    r/s    w/s   kr/s    kw/s wait actv  svc_t  %w  %b
  ...
  ...
  ssd18     0.0  157.2    0.0 17914.5  9.2 13.3  142.9  63  75
  ssd19     0.0  161.4    0.0 17887.7  9.6 13.7  144.6  65  76
  ssd20     0.0  166.4    0.0 17922.0  8.3 12.7  126.1  58  74
  ssd21     0.0  157.6    0.0 17880.0  9.3 13.4  144.1  64  75
  ssd22     0.0  153.3    0.0 17867.7  9.2 13.4  147.3  63  75
  ssd23     0.0  154.3    0.0 17905.6  8.8 13.0  141.5  61  74
  ssd24     0.0  160.4    2.1 17915.4  9.2 13.4  141.3  63  75
  ssd25     0.0  160.7    0.0 17934.5  9.7 13.8  145.8  66  76

Can anyone suggest a different script, or help modify the iotop script as
included in the toolkit, so that it either prints stack traces or, better
yet, prints stack traces only when %busy goes above some threshold, or
otherwise limits the amount of data to go through?

Thanks in advance,
Robert
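One way to limit what the io provider reports, sketched below (not from the thread), is to record kernel stacks only for large writes to a single device. The "ssd18" name and the 64 KB cut-off are illustrative assumptions, and because these IOs are issued asynchronously the stacks will show the kernel write path rather than the originating process, as the replies below explain.

#!/usr/sbin/dtrace -s

/*
 * Sketch only: count kernel stacks for writes larger than 64 KB to a
 * single device.  The device name and size threshold are arbitrary
 * examples, not values taken from the thread.
 */
io:::start
/ !(args[0]->b_flags & B_READ) &&
  args[1]->dev_statname == "ssd18" &&
  args[0]->b_bcount > 65536 /
{
        @[stack()] = count();
}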
Jim Mauro
2008-Dec-05 22:15 UTC
[dtrace-discuss] Trying to identify writer and/or reason for iowrite.
The problem you're running into is that disk IO operations tend to occur
asynchronously to the thread that initiated the IO, so when the IO provider
probe fires, execname shows the process name for PID 0. This is not
uncommon when chasing disk and network IOs. You need to capture the write
further up the stack.

The easiest method for determining which process(es) are writing is to use
either the syscall provider or the fsinfo provider (depending on which
release of Solaris you're running, fsinfo may not be there).

Use the syscall provider and see which system calls are being used to
generate disk writes - it's probably write(2), but it may be any of:

nv98> dtrace -l -P syscall | grep write
76295    syscall                 write entry
76296    syscall                 write return
76497    syscall                writev entry
76498    syscall                writev return
76597    syscall                pwrite entry
76598    syscall                pwrite return
76691    syscall              pwrite64 entry
76692    syscall              pwrite64 return

#!/usr/sbin/dtrace -s

syscall::write:entry,
syscall::writev:entry,
syscall::pwrite:entry,
syscall::pwrite64:entry
{
        @[pid, probefunc] = count();
}

Once you have the correct system call and process name(s), fine-tune the
DTrace and grab a user stack():

#!/usr/sbin/dtrace -s

syscall::write:entry
/ pid == <PID_OF_INTEREST> /
{
        @[ustack()] = count();
}

The above assumes it's all write(2) system calls.

You can determine which files are being written by grabbing arg0 when the
probe fires. Depending on which release of Solaris you're running, you can
use the fds array to get the file path:

#!/usr/sbin/dtrace -s

syscall::write:entry
{
        @[execname, fds[arg0].fi_pathname] = count();
}

If your version of Solaris is older and does not have fds available, just
track arg0 (the file descriptor) and run pfiles on the process to map the
file descriptor to the file.

HTH,
/jim
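If the release does have the fsinfo provider mentioned above, a minimal sketch along the same lines (an assumption, not part of the original reply: it relies on fsinfo:::write being present and on arg1 carrying the byte count) would sum written bytes by process and file:

#!/usr/sbin/dtrace -s

/*
 * Sketch, assuming the fsinfo provider is available (it may not be on
 * older Solaris releases, as noted above).  arg1 is the byte count of
 * the write at the VFS layer.
 */
fsinfo:::write
{
        @[execname, args[0]->fi_pathname] = sum(arg1);
}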
Brendan Gregg - Sun Microsystems
2008-Dec-09 19:41 UTC
[dtrace-discuss] Trying to identify writer and/or reason for iowrite.
On Fri, Dec 05, 2008 at 05:15:09PM -0500, Jim Mauro wrote:
> The problem you're running into is disk IO operations tend to occur
> asynchronously to the thread that initiated the IO, so when the IO
> provider probe fires, execname shows the process name for PID 0.
> This is not uncommon when chasing disk and network IOs. You
> need to capture the write further up the stack.

Yes; this happens in particular with ZFS, which uses a pipeline and task
queues to dispatch writes. It's a great FS design, but makes debugging
more difficult.

# dtrace -n 'io:::start { @[stack()] = count(); }'
[...]
              genunix`ldi_strategy+0x54
              zfs`vdev_disk_io_start+0x1a5
              zfs`zio_vdev_io_start+0x148
              zfs`zio_execute+0x7f
              zfs`vdev_queue_io_done+0x98
              zfs`vdev_disk_io_done+0x2b
              zfs`zio_vdev_io_done+0x22
              zfs`zio_execute+0x7f
              genunix`taskq_thread+0x1a7
              unix`thread_start+0x8
             1500

Brendan

-- 
Brendan Gregg, Sun Microsystems Fishworks.       http://blogs.sun.com/brendan
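One hedged way to catch the writer before the IO disappears into the ZFS pipeline and task queues is sketched below; it is not from the thread, assumes the writes go through ZFS, and relies on the unstable fbt provider, so the zfs_write probe name may differ between releases.

#!/usr/sbin/dtrace -s

/*
 * Sketch only: zfs_write() is the ZFS VOP write entry point and runs
 * in the context of the writing thread, before the IO is handed off
 * to the ZIO pipeline and task queues, so execname and ustack() here
 * identify the actual writer.  fbt probe names are release-dependent.
 */
fbt:zfs:zfs_write:entry
{
        @[execname, ustack()] = count();
}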