Amer Ather
2008-Nov-24  01:51 UTC
[zfs-discuss] Long delays in txg_wait_open() - Waiting for transaction group to open
IHAC who is seeing very slow NFS transactions over ZFS. rfs3_write(), 
rfs3_create() and others are taking in the order of 17-20 seconds to 
complete. Profiling these transactions showing most of the time is spent 
in txg_wait_open() - waiting for transaction group to open.
We tried "zfs_nocacheflush: but it did n''t help. iostat showing
good
service time (<10ms).
System is running Solaris 10 - 137137-09
00000301b616ec00      3000813e168      3017b9e3798   2  59      6002296f966
   PC: cv_wait+0x38    CMD:
   mv camshot_081118_000029.jpg camshot_081118_000059.jpg 
camshot_081118_000129.j
   p
   stack pointer for thread 301b616ec00: 2a104a9ceb1
   [ 000002a104a9ceb1 cv_wait+0x38() ]
     txg_wait_open+0x54()
     zfs_write+0x34c()
     fop_write+0x20()
     write+0x268()
     syscall_trap32+0xcc()
Timing txg_wait_open() shows:
txg_wait_open delay
   DELAY
            value  ------------- Distribution ------------- count
       4294967296 |                                         0
       8589934592 |@                                        1
      17179869184 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@  33
      34359738368 |                                         0
txg_wait_open delay
   DELAY
            value  ------------- Distribution ------------- count
        134217728 |                                         0
        268435456 |@@                                       2
        536870912 |                                         0
       1073741824 |                                         0
       2147483648 |@                                        1
       4294967296 |                                         0
       8589934592 |@@@@@@@                                  7
      17179869184 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@           29
      34359738368 |                                         0
DTrace script:
#!/usr/sbin/dtrace -qs
fbt::txg_wait_open:entry
{
  self->t = timestamp;
}
fbt::txg_wait_open:return
/self->t/
{
   @a["DELAY"] = quantize(timestamp - self->t);
}
tick-5sec {
       printf("\ntxg_wait_open delay");
       printa(@a);
       trunc(@a);
}
I have also profiled rfs3_write(), rfs3_create() and other using DTrace 
and taken time delta and stack at each frame on all functions called by 
rfs3_* and see txg_wait_open() is the one taking majority of the time.
Thanks,
Amer.
Prabahar Jeyaram
2008-Nov-24  02:29 UTC
[zfs-discuss] Long delays in txg_wait_open() - Waiting for transaction group to open
The reason rfs3_{write|create} waiting longer in txg_wait_open is because
there is a syncing txg taking longer to complete.
You may want to trace and track the syncing txg to get the reason for the
delay.
--
Prabahar.
On Sun, Nov 23, 2008 at 05:51:44PM -0800, Amer Ather
wrote:> IHAC who is seeing very slow NFS transactions over ZFS. rfs3_write(), 
> rfs3_create() and others are taking in the order of 17-20 seconds to 
> complete. Profiling these transactions showing most of the time is spent 
> in txg_wait_open() - waiting for transaction group to open.
> 
> We tried "zfs_nocacheflush: but it did n''t help. iostat
showing good
> service time (<10ms).
> 
> System is running Solaris 10 - 137137-09
> 
> 00000301b616ec00      3000813e168      3017b9e3798   2  59      6002296f966
>    PC: cv_wait+0x38    CMD:
>    mv camshot_081118_000029.jpg camshot_081118_000059.jpg 
> camshot_081118_000129.j
>    p
>    stack pointer for thread 301b616ec00: 2a104a9ceb1
>    [ 000002a104a9ceb1 cv_wait+0x38() ]
>      txg_wait_open+0x54()
>      zfs_write+0x34c()
>      fop_write+0x20()
>      write+0x268()
>      syscall_trap32+0xcc()
> 
> Timing txg_wait_open() shows:
> 
> txg_wait_open delay
>    DELAY
>             value  ------------- Distribution ------------- count
>        4294967296 |                                         0
>        8589934592 |@                                        1
>       17179869184 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@  33
>       34359738368 |                                         0
> 
> txg_wait_open delay
>    DELAY
>             value  ------------- Distribution ------------- count
>         134217728 |                                         0
>         268435456 |@@                                       2
>         536870912 |                                         0
>        1073741824 |                                         0
>        2147483648 |@                                        1
>        4294967296 |                                         0
>        8589934592 |@@@@@@@                                  7
>       17179869184 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@           29
>       34359738368 |                                         0
> 
> DTrace script:
> #!/usr/sbin/dtrace -qs
> fbt::txg_wait_open:entry
> {
>   self->t = timestamp;
> }
> fbt::txg_wait_open:return
> /self->t/
> {
>    @a["DELAY"] = quantize(timestamp - self->t);
> }
> tick-5sec {
>        printf("\ntxg_wait_open delay");
>        printa(@a);
>        trunc(@a);
> }
> 
> 
> I have also profiled rfs3_write(), rfs3_create() and other using DTrace 
> and taken time delta and stack at each frame on all functions called by 
> rfs3_* and see txg_wait_open() is the one taking majority of the time.
> 
> Thanks,
> Amer.
> _______________________________________________
> zfs-discuss mailing list
> zfs-discuss at opensolaris.org
> http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Amer Ather
2008-Nov-24  06:25 UTC
[zfs-discuss] Long delays in txg_wait_open() - Waiting for transaction group to open
In kernel threadlist (guds), I see txg_sync_thread is doing spa_sync()-> 
dsl_pool_sync() - responsible for writing all dirty datasets. Due to NFS 
operation, it may be large number of synchronous writes resulting in 
frequent spa_sync(). Customer noticed that even disabling zfs zil did 
n''t show any improvement. System is almost unusable considering nfs 
operations such as: rfs3_write(), rfs3_create(), rfs3_setattr() are 
taking in the order of 30 seconds to complete.
We requested customer to provide a core dump when problem is occurring. 
If you have additional comments on how to best debug this issue, please 
let us know.
000002a102203ca0          185b240                0   0  60      3001b1062c8
   PC: cv_wait+0x38    THREAD: txg_sync_thread()
   stack pointer for thread 2a102203ca0: 2a102202e91
   [ 000002a102202e91 cv_wait+0x38() ]
     zio_wait+0x30()
     dsl_pool_sync+0xb4()
     spa_sync+0x208()
     txg_sync_thread+0x1e4()
     thread_start+4()
On 11/23/08 06:29 PM, Prabahar Jeyaram wrote:> The reason rfs3_{write|create} waiting longer in txg_wait_open is because
> there is a syncing txg taking longer to complete.
> 
> You may want to trace and track the syncing txg to get the reason for the
> delay.
> 
> --
> Prabahar.
> 
> On Sun, Nov 23, 2008 at 05:51:44PM -0800, Amer Ather wrote:
>> IHAC who is seeing very slow NFS transactions over ZFS. rfs3_write(), 
>> rfs3_create() and others are taking in the order of 17-20 seconds to 
>> complete. Profiling these transactions showing most of the time is
spent
>> in txg_wait_open() - waiting for transaction group to open.
>>
>> We tried "zfs_nocacheflush: but it did n''t help. iostat
showing good
>> service time (<10ms).
>>
>> System is running Solaris 10 - 137137-09
>>
>> 00000301b616ec00      3000813e168      3017b9e3798   2  59     
6002296f966
>>    PC: cv_wait+0x38    CMD:
>>    mv camshot_081118_000029.jpg camshot_081118_000059.jpg 
>> camshot_081118_000129.j
>>    p
>>    stack pointer for thread 301b616ec00: 2a104a9ceb1
>>    [ 000002a104a9ceb1 cv_wait+0x38() ]
>>      txg_wait_open+0x54()
>>      zfs_write+0x34c()
>>      fop_write+0x20()
>>      write+0x268()
>>      syscall_trap32+0xcc()
>>
>> Timing txg_wait_open() shows:
>>
>> txg_wait_open delay
>>    DELAY
>>             value  ------------- Distribution ------------- count
>>        4294967296 |                                         0
>>        8589934592 |@                                        1
>>       17179869184 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@  33
>>       34359738368 |                                         0
>>
>> txg_wait_open delay
>>    DELAY
>>             value  ------------- Distribution ------------- count
>>         134217728 |                                         0
>>         268435456 |@@                                       2
>>         536870912 |                                         0
>>        1073741824 |                                         0
>>        2147483648 |@                                        1
>>        4294967296 |                                         0
>>        8589934592 |@@@@@@@                                  7
>>       17179869184 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@           29
>>       34359738368 |                                         0
>>
>> DTrace script:
>> #!/usr/sbin/dtrace -qs
>> fbt::txg_wait_open:entry
>> {
>>   self->t = timestamp;
>> }
>> fbt::txg_wait_open:return
>> /self->t/
>> {
>>    @a["DELAY"] = quantize(timestamp - self->t);
>> }
>> tick-5sec {
>>        printf("\ntxg_wait_open delay");
>>        printa(@a);
>>        trunc(@a);
>> }
>>
>>
>> I have also profiled rfs3_write(), rfs3_create() and other using DTrace
>> and taken time delta and stack at each frame on all functions called by
>> rfs3_* and see txg_wait_open() is the one taking majority of the time.
>>
>> Thanks,
>> Amer.
>> _______________________________________________
>> zfs-discuss mailing list
>> zfs-discuss at opensolaris.org
>> http://mail.opensolaris.org/mailman/listinfo/zfs-discuss