Amer Ather
2008-Nov-24 01:51 UTC
[zfs-discuss] Long delays in txg_wait_open() - Waiting for transaction group to open
IHAC who is seeing very slow NFS transactions over ZFS. rfs3_write(), rfs3_create() and others are taking in the order of 17-20 seconds to complete. Profiling these transactions showing most of the time is spent in txg_wait_open() - waiting for transaction group to open. We tried "zfs_nocacheflush: but it did n''t help. iostat showing good service time (<10ms). System is running Solaris 10 - 137137-09 00000301b616ec00 3000813e168 3017b9e3798 2 59 6002296f966 PC: cv_wait+0x38 CMD: mv camshot_081118_000029.jpg camshot_081118_000059.jpg camshot_081118_000129.j p stack pointer for thread 301b616ec00: 2a104a9ceb1 [ 000002a104a9ceb1 cv_wait+0x38() ] txg_wait_open+0x54() zfs_write+0x34c() fop_write+0x20() write+0x268() syscall_trap32+0xcc() Timing txg_wait_open() shows: txg_wait_open delay DELAY value ------------- Distribution ------------- count 4294967296 | 0 8589934592 |@ 1 17179869184 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ 33 34359738368 | 0 txg_wait_open delay DELAY value ------------- Distribution ------------- count 134217728 | 0 268435456 |@@ 2 536870912 | 0 1073741824 | 0 2147483648 |@ 1 4294967296 | 0 8589934592 |@@@@@@@ 7 17179869184 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ 29 34359738368 | 0 DTrace script: #!/usr/sbin/dtrace -qs fbt::txg_wait_open:entry { self->t = timestamp; } fbt::txg_wait_open:return /self->t/ { @a["DELAY"] = quantize(timestamp - self->t); } tick-5sec { printf("\ntxg_wait_open delay"); printa(@a); trunc(@a); } I have also profiled rfs3_write(), rfs3_create() and other using DTrace and taken time delta and stack at each frame on all functions called by rfs3_* and see txg_wait_open() is the one taking majority of the time. Thanks, Amer.
Prabahar Jeyaram
2008-Nov-24 02:29 UTC
[zfs-discuss] Long delays in txg_wait_open() - Waiting for transaction group to open
The reason rfs3_{write|create} waiting longer in txg_wait_open is because there is a syncing txg taking longer to complete. You may want to trace and track the syncing txg to get the reason for the delay. -- Prabahar. On Sun, Nov 23, 2008 at 05:51:44PM -0800, Amer Ather wrote:> IHAC who is seeing very slow NFS transactions over ZFS. rfs3_write(), > rfs3_create() and others are taking in the order of 17-20 seconds to > complete. Profiling these transactions showing most of the time is spent > in txg_wait_open() - waiting for transaction group to open. > > We tried "zfs_nocacheflush: but it did n''t help. iostat showing good > service time (<10ms). > > System is running Solaris 10 - 137137-09 > > 00000301b616ec00 3000813e168 3017b9e3798 2 59 6002296f966 > PC: cv_wait+0x38 CMD: > mv camshot_081118_000029.jpg camshot_081118_000059.jpg > camshot_081118_000129.j > p > stack pointer for thread 301b616ec00: 2a104a9ceb1 > [ 000002a104a9ceb1 cv_wait+0x38() ] > txg_wait_open+0x54() > zfs_write+0x34c() > fop_write+0x20() > write+0x268() > syscall_trap32+0xcc() > > Timing txg_wait_open() shows: > > txg_wait_open delay > DELAY > value ------------- Distribution ------------- count > 4294967296 | 0 > 8589934592 |@ 1 > 17179869184 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ 33 > 34359738368 | 0 > > txg_wait_open delay > DELAY > value ------------- Distribution ------------- count > 134217728 | 0 > 268435456 |@@ 2 > 536870912 | 0 > 1073741824 | 0 > 2147483648 |@ 1 > 4294967296 | 0 > 8589934592 |@@@@@@@ 7 > 17179869184 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ 29 > 34359738368 | 0 > > DTrace script: > #!/usr/sbin/dtrace -qs > fbt::txg_wait_open:entry > { > self->t = timestamp; > } > fbt::txg_wait_open:return > /self->t/ > { > @a["DELAY"] = quantize(timestamp - self->t); > } > tick-5sec { > printf("\ntxg_wait_open delay"); > printa(@a); > trunc(@a); > } > > > I have also profiled rfs3_write(), rfs3_create() and other using DTrace > and taken time delta and stack at each frame on all functions called by > rfs3_* and see txg_wait_open() is the one taking majority of the time. > > Thanks, > Amer. > _______________________________________________ > zfs-discuss mailing list > zfs-discuss at opensolaris.org > http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Amer Ather
2008-Nov-24 06:25 UTC
[zfs-discuss] Long delays in txg_wait_open() - Waiting for transaction group to open
In kernel threadlist (guds), I see txg_sync_thread is doing spa_sync()-> dsl_pool_sync() - responsible for writing all dirty datasets. Due to NFS operation, it may be large number of synchronous writes resulting in frequent spa_sync(). Customer noticed that even disabling zfs zil did n''t show any improvement. System is almost unusable considering nfs operations such as: rfs3_write(), rfs3_create(), rfs3_setattr() are taking in the order of 30 seconds to complete. We requested customer to provide a core dump when problem is occurring. If you have additional comments on how to best debug this issue, please let us know. 000002a102203ca0 185b240 0 0 60 3001b1062c8 PC: cv_wait+0x38 THREAD: txg_sync_thread() stack pointer for thread 2a102203ca0: 2a102202e91 [ 000002a102202e91 cv_wait+0x38() ] zio_wait+0x30() dsl_pool_sync+0xb4() spa_sync+0x208() txg_sync_thread+0x1e4() thread_start+4() On 11/23/08 06:29 PM, Prabahar Jeyaram wrote:> The reason rfs3_{write|create} waiting longer in txg_wait_open is because > there is a syncing txg taking longer to complete. > > You may want to trace and track the syncing txg to get the reason for the > delay. > > -- > Prabahar. > > On Sun, Nov 23, 2008 at 05:51:44PM -0800, Amer Ather wrote: >> IHAC who is seeing very slow NFS transactions over ZFS. rfs3_write(), >> rfs3_create() and others are taking in the order of 17-20 seconds to >> complete. Profiling these transactions showing most of the time is spent >> in txg_wait_open() - waiting for transaction group to open. >> >> We tried "zfs_nocacheflush: but it did n''t help. iostat showing good >> service time (<10ms). >> >> System is running Solaris 10 - 137137-09 >> >> 00000301b616ec00 3000813e168 3017b9e3798 2 59 6002296f966 >> PC: cv_wait+0x38 CMD: >> mv camshot_081118_000029.jpg camshot_081118_000059.jpg >> camshot_081118_000129.j >> p >> stack pointer for thread 301b616ec00: 2a104a9ceb1 >> [ 000002a104a9ceb1 cv_wait+0x38() ] >> txg_wait_open+0x54() >> zfs_write+0x34c() >> fop_write+0x20() >> write+0x268() >> syscall_trap32+0xcc() >> >> Timing txg_wait_open() shows: >> >> txg_wait_open delay >> DELAY >> value ------------- Distribution ------------- count >> 4294967296 | 0 >> 8589934592 |@ 1 >> 17179869184 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ 33 >> 34359738368 | 0 >> >> txg_wait_open delay >> DELAY >> value ------------- Distribution ------------- count >> 134217728 | 0 >> 268435456 |@@ 2 >> 536870912 | 0 >> 1073741824 | 0 >> 2147483648 |@ 1 >> 4294967296 | 0 >> 8589934592 |@@@@@@@ 7 >> 17179869184 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ 29 >> 34359738368 | 0 >> >> DTrace script: >> #!/usr/sbin/dtrace -qs >> fbt::txg_wait_open:entry >> { >> self->t = timestamp; >> } >> fbt::txg_wait_open:return >> /self->t/ >> { >> @a["DELAY"] = quantize(timestamp - self->t); >> } >> tick-5sec { >> printf("\ntxg_wait_open delay"); >> printa(@a); >> trunc(@a); >> } >> >> >> I have also profiled rfs3_write(), rfs3_create() and other using DTrace >> and taken time delta and stack at each frame on all functions called by >> rfs3_* and see txg_wait_open() is the one taking majority of the time. >> >> Thanks, >> Amer. >> _______________________________________________ >> zfs-discuss mailing list >> zfs-discuss at opensolaris.org >> http://mail.opensolaris.org/mailman/listinfo/zfs-discuss