Amer Ather
2008-Nov-24 01:51 UTC
[zfs-discuss] Long delays in txg_wait_open() - Waiting for transaction group to open
IHAC who is seeing very slow NFS transactions over ZFS. rfs3_write(),
rfs3_create() and others are taking in the order of 17-20 seconds to
complete. Profiling these transactions showing most of the time is spent
in txg_wait_open() - waiting for transaction group to open.
We tried "zfs_nocacheflush: but it did n''t help. iostat showing
good
service time (<10ms).
System is running Solaris 10 - 137137-09
00000301b616ec00 3000813e168 3017b9e3798 2 59 6002296f966
PC: cv_wait+0x38 CMD:
mv camshot_081118_000029.jpg camshot_081118_000059.jpg
camshot_081118_000129.j
p
stack pointer for thread 301b616ec00: 2a104a9ceb1
[ 000002a104a9ceb1 cv_wait+0x38() ]
txg_wait_open+0x54()
zfs_write+0x34c()
fop_write+0x20()
write+0x268()
syscall_trap32+0xcc()
Timing txg_wait_open() shows:
txg_wait_open delay
DELAY
value ------------- Distribution ------------- count
4294967296 | 0
8589934592 |@ 1
17179869184 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ 33
34359738368 | 0
txg_wait_open delay
DELAY
value ------------- Distribution ------------- count
134217728 | 0
268435456 |@@ 2
536870912 | 0
1073741824 | 0
2147483648 |@ 1
4294967296 | 0
8589934592 |@@@@@@@ 7
17179869184 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ 29
34359738368 | 0
DTrace script:
#!/usr/sbin/dtrace -qs
fbt::txg_wait_open:entry
{
self->t = timestamp;
}
fbt::txg_wait_open:return
/self->t/
{
@a["DELAY"] = quantize(timestamp - self->t);
}
tick-5sec {
printf("\ntxg_wait_open delay");
printa(@a);
trunc(@a);
}
I have also profiled rfs3_write(), rfs3_create() and other using DTrace
and taken time delta and stack at each frame on all functions called by
rfs3_* and see txg_wait_open() is the one taking majority of the time.
Thanks,
Amer.
Prabahar Jeyaram
2008-Nov-24 02:29 UTC
[zfs-discuss] Long delays in txg_wait_open() - Waiting for transaction group to open
The reason rfs3_{write|create} waiting longer in txg_wait_open is because
there is a syncing txg taking longer to complete.
You may want to trace and track the syncing txg to get the reason for the
delay.
--
Prabahar.
On Sun, Nov 23, 2008 at 05:51:44PM -0800, Amer Ather
wrote:> IHAC who is seeing very slow NFS transactions over ZFS. rfs3_write(),
> rfs3_create() and others are taking in the order of 17-20 seconds to
> complete. Profiling these transactions showing most of the time is spent
> in txg_wait_open() - waiting for transaction group to open.
>
> We tried "zfs_nocacheflush: but it did n''t help. iostat
showing good
> service time (<10ms).
>
> System is running Solaris 10 - 137137-09
>
> 00000301b616ec00 3000813e168 3017b9e3798 2 59 6002296f966
> PC: cv_wait+0x38 CMD:
> mv camshot_081118_000029.jpg camshot_081118_000059.jpg
> camshot_081118_000129.j
> p
> stack pointer for thread 301b616ec00: 2a104a9ceb1
> [ 000002a104a9ceb1 cv_wait+0x38() ]
> txg_wait_open+0x54()
> zfs_write+0x34c()
> fop_write+0x20()
> write+0x268()
> syscall_trap32+0xcc()
>
> Timing txg_wait_open() shows:
>
> txg_wait_open delay
> DELAY
> value ------------- Distribution ------------- count
> 4294967296 | 0
> 8589934592 |@ 1
> 17179869184 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ 33
> 34359738368 | 0
>
> txg_wait_open delay
> DELAY
> value ------------- Distribution ------------- count
> 134217728 | 0
> 268435456 |@@ 2
> 536870912 | 0
> 1073741824 | 0
> 2147483648 |@ 1
> 4294967296 | 0
> 8589934592 |@@@@@@@ 7
> 17179869184 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ 29
> 34359738368 | 0
>
> DTrace script:
> #!/usr/sbin/dtrace -qs
> fbt::txg_wait_open:entry
> {
> self->t = timestamp;
> }
> fbt::txg_wait_open:return
> /self->t/
> {
> @a["DELAY"] = quantize(timestamp - self->t);
> }
> tick-5sec {
> printf("\ntxg_wait_open delay");
> printa(@a);
> trunc(@a);
> }
>
>
> I have also profiled rfs3_write(), rfs3_create() and other using DTrace
> and taken time delta and stack at each frame on all functions called by
> rfs3_* and see txg_wait_open() is the one taking majority of the time.
>
> Thanks,
> Amer.
> _______________________________________________
> zfs-discuss mailing list
> zfs-discuss at opensolaris.org
> http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Amer Ather
2008-Nov-24 06:25 UTC
[zfs-discuss] Long delays in txg_wait_open() - Waiting for transaction group to open
In kernel threadlist (guds), I see txg_sync_thread is doing spa_sync()->
dsl_pool_sync() - responsible for writing all dirty datasets. Due to NFS
operation, it may be large number of synchronous writes resulting in
frequent spa_sync(). Customer noticed that even disabling zfs zil did
n''t show any improvement. System is almost unusable considering nfs
operations such as: rfs3_write(), rfs3_create(), rfs3_setattr() are
taking in the order of 30 seconds to complete.
We requested customer to provide a core dump when problem is occurring.
If you have additional comments on how to best debug this issue, please
let us know.
000002a102203ca0 185b240 0 0 60 3001b1062c8
PC: cv_wait+0x38 THREAD: txg_sync_thread()
stack pointer for thread 2a102203ca0: 2a102202e91
[ 000002a102202e91 cv_wait+0x38() ]
zio_wait+0x30()
dsl_pool_sync+0xb4()
spa_sync+0x208()
txg_sync_thread+0x1e4()
thread_start+4()
On 11/23/08 06:29 PM, Prabahar Jeyaram wrote:> The reason rfs3_{write|create} waiting longer in txg_wait_open is because
> there is a syncing txg taking longer to complete.
>
> You may want to trace and track the syncing txg to get the reason for the
> delay.
>
> --
> Prabahar.
>
> On Sun, Nov 23, 2008 at 05:51:44PM -0800, Amer Ather wrote:
>> IHAC who is seeing very slow NFS transactions over ZFS. rfs3_write(),
>> rfs3_create() and others are taking in the order of 17-20 seconds to
>> complete. Profiling these transactions showing most of the time is
spent
>> in txg_wait_open() - waiting for transaction group to open.
>>
>> We tried "zfs_nocacheflush: but it did n''t help. iostat
showing good
>> service time (<10ms).
>>
>> System is running Solaris 10 - 137137-09
>>
>> 00000301b616ec00 3000813e168 3017b9e3798 2 59
6002296f966
>> PC: cv_wait+0x38 CMD:
>> mv camshot_081118_000029.jpg camshot_081118_000059.jpg
>> camshot_081118_000129.j
>> p
>> stack pointer for thread 301b616ec00: 2a104a9ceb1
>> [ 000002a104a9ceb1 cv_wait+0x38() ]
>> txg_wait_open+0x54()
>> zfs_write+0x34c()
>> fop_write+0x20()
>> write+0x268()
>> syscall_trap32+0xcc()
>>
>> Timing txg_wait_open() shows:
>>
>> txg_wait_open delay
>> DELAY
>> value ------------- Distribution ------------- count
>> 4294967296 | 0
>> 8589934592 |@ 1
>> 17179869184 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ 33
>> 34359738368 | 0
>>
>> txg_wait_open delay
>> DELAY
>> value ------------- Distribution ------------- count
>> 134217728 | 0
>> 268435456 |@@ 2
>> 536870912 | 0
>> 1073741824 | 0
>> 2147483648 |@ 1
>> 4294967296 | 0
>> 8589934592 |@@@@@@@ 7
>> 17179869184 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ 29
>> 34359738368 | 0
>>
>> DTrace script:
>> #!/usr/sbin/dtrace -qs
>> fbt::txg_wait_open:entry
>> {
>> self->t = timestamp;
>> }
>> fbt::txg_wait_open:return
>> /self->t/
>> {
>> @a["DELAY"] = quantize(timestamp - self->t);
>> }
>> tick-5sec {
>> printf("\ntxg_wait_open delay");
>> printa(@a);
>> trunc(@a);
>> }
>>
>>
>> I have also profiled rfs3_write(), rfs3_create() and other using DTrace
>> and taken time delta and stack at each frame on all functions called by
>> rfs3_* and see txg_wait_open() is the one taking majority of the time.
>>
>> Thanks,
>> Amer.
>> _______________________________________________
>> zfs-discuss mailing list
>> zfs-discuss at opensolaris.org
>> http://mail.opensolaris.org/mailman/listinfo/zfs-discuss