Robert Alatalo
2008-Dec-05 21:45 UTC
[dtrace-discuss] Trying to identify writer and/or reason for iowrite.
Hello,

Running the iotop DTrace script, we see many entries like:

  UID    PID   PPID CMD          DEVICE  MAJ MIN D     BYTES
    0      3      0 fsflush      md3      85   3 W       512
  ...
  ...
    0      0      0 sched        ssd22   118 176 W  80538112
    0      0      0 sched        ssd18   118 144 W  80585728
    0      0      0 sched        ssd24   118 192 W  80730624
    0      0      0 sched        ssd19   118 152 W  80762880
    0      0      0 sched        ssd23   118 184 W  80764928
    0      0      0 sched        ssd25   118 200 W  80965632
    0      0      0 sched        ssd20   118 160 W  81029120
    0      0      0 sched        ssd21   118 168 W  81132032

In iostat we see things like:

  device    r/s    w/s   kr/s    kw/s wait actv  svc_t  %w  %b
  ...
  ...
  ssd18     0.0  157.2    0.0 17914.5  9.2 13.3  142.9  63  75
  ssd19     0.0  161.4    0.0 17887.7  9.6 13.7  144.6  65  76
  ssd20     0.0  166.4    0.0 17922.0  8.3 12.7  126.1  58  74
  ssd21     0.0  157.6    0.0 17880.0  9.3 13.4  144.1  64  75
  ssd22     0.0  153.3    0.0 17867.7  9.2 13.4  147.3  63  75
  ssd23     0.0  154.3    0.0 17905.6  8.8 13.0  141.5  61  74
  ssd24     0.0  160.4    2.1 17915.4  9.2 13.4  141.3  63  75
  ssd25     0.0  160.7    0.0 17934.5  9.7 13.8  145.8  66  76

Can anyone suggest a different script, or help modify the iotop script as
included in the toolkit, so that it either prints stack traces or, better
yet, prints stack traces only when %busy goes above some threshold, or
otherwise limits the amount of data to go through?

Thanks in advance,
Robert
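One way to limit what the io provider reports, sketched below (not from the thread), is to record kernel stacks only for large writes to a single device. The "ssd18" name and the 64 KB cut-off are illustrative assumptions, and because these IOs are issued asynchronously the stacks will show the kernel write path rather than the originating process, as the replies below explain.

#!/usr/sbin/dtrace -s

/*
 * Sketch only: count kernel stacks for writes larger than 64 KB to a
 * single device.  The device name and size threshold are arbitrary
 * examples, not values taken from the thread.
 */
io:::start
/ !(args[0]->b_flags & B_READ) &&
  args[1]->dev_statname == "ssd18" &&
  args[0]->b_bcount > 65536 /
{
        @[stack()] = count();
}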
Jim Mauro
2008-Dec-05 22:15 UTC
[dtrace-discuss] Trying to identify writer and/or reason for iowrite.
The problem you're running into is that disk IO operations tend to occur
asynchronously to the thread that initiated the IO, so when the IO provider
probe fires, execname shows the process name for PID 0. This is not
uncommon when chasing disk and network IOs. You need to capture the write
further up the stack.

The easiest method for determining which process(es) are writing is to use
either the syscall provider or the fsinfo provider (depending on which
release of Solaris you're running, fsinfo may not be there).

Use the syscall provider and see which system calls are being used to
generate disk writes - it's probably write(2), but it may be any of:

nv98> dtrace -l -P syscall | grep write
76295    syscall                 write entry
76296    syscall                 write return
76497    syscall                writev entry
76498    syscall                writev return
76597    syscall                pwrite entry
76598    syscall                pwrite return
76691    syscall              pwrite64 entry
76692    syscall              pwrite64 return

#!/usr/sbin/dtrace -s

syscall::write:entry,
syscall::writev:entry,
syscall::pwrite:entry,
syscall::pwrite64:entry
{
        @[pid, probefunc] = count();
}

Once you have the correct system call and process name(s), fine-tune the
DTrace and grab a user stack():

#!/usr/sbin/dtrace -s

syscall::write:entry
/ pid == <PID_OF_INTEREST> /
{
        @[ustack()] = count();
}

The above assumes it's all write(2) system calls.

You can determine which files are being written by grabbing arg0 when the
probe fires. Depending on which release of Solaris you're running, you can
use the fds array to get the file path:

#!/usr/sbin/dtrace -s

syscall::write:entry
{
        @[execname, fds[arg0].fi_pathname] = count();
}

If your version of Solaris is older and does not have fds available, just
track arg0 (the file descriptor) and run pfiles on the process to map the
file descriptor to the file.

HTH,
/jim
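If the release does have the fsinfo provider mentioned above, a minimal sketch along the same lines (an assumption, not part of the original reply: it relies on fsinfo:::write being present and on arg1 carrying the byte count) would sum written bytes by process and file:

#!/usr/sbin/dtrace -s

/*
 * Sketch, assuming the fsinfo provider is available (it may not be on
 * older Solaris releases, as noted above).  arg1 is the byte count of
 * the write at the VFS layer.
 */
fsinfo:::write
{
        @[execname, args[0]->fi_pathname] = sum(arg1);
}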
Brendan Gregg - Sun Microsystems
2008-Dec-09 19:41 UTC
[dtrace-discuss] Trying to identify writer and/or reason for iowrite.
On Fri, Dec 05, 2008 at 05:15:09PM -0500, Jim Mauro wrote:
> The problem you're running into is disk IO operations tend to occur
> asynchronously to the thread that initiated the IO, so when the IO
> provider probe fires, execname shows the process name for PID 0.
> This is not uncommon when chasing disk and network IOs. You
> need to capture the write further up the stack.

Yes; this happens in particular with ZFS, which uses a pipeline and task
queues to dispatch writes. It's a great FS design, but makes debugging
more difficult.

# dtrace -n 'io:::start { @[stack()] = count(); }'
[...]
              genunix`ldi_strategy+0x54
              zfs`vdev_disk_io_start+0x1a5
              zfs`zio_vdev_io_start+0x148
              zfs`zio_execute+0x7f
              zfs`vdev_queue_io_done+0x98
              zfs`vdev_disk_io_done+0x2b
              zfs`zio_vdev_io_done+0x22
              zfs`zio_execute+0x7f
              genunix`taskq_thread+0x1a7
              unix`thread_start+0x8
             1500

Brendan

-- 
Brendan Gregg, Sun Microsystems Fishworks.       http://blogs.sun.com/brendan
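One hedged way to catch the writer before the IO disappears into the ZFS pipeline and task queues is sketched below; it is not from the thread, assumes the writes go through ZFS, and relies on the unstable fbt provider, so the zfs_write probe name may differ between releases.

#!/usr/sbin/dtrace -s

/*
 * Sketch only: zfs_write() is the ZFS VOP write entry point and runs
 * in the context of the writing thread, before the IO is handed off
 * to the ZIO pipeline and task queues, so execname and ustack() here
 * identify the actual writer.  fbt probe names are release-dependent.
 */
fbt:zfs:zfs_write:entry
{
        @[execname, ustack()] = count();
}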