Robert Milkowski
2007-Feb-12 11:45 UTC
[zfs-discuss] NFS/ZFS performance problems - txg_wait_open() deadlocks?
Hi.

System is snv_56, sun4u sparc SUNW,Sun-Fire-V440, zil_disable=1.

We see many operations on NFS clients of that server being really slow
(like 90 seconds for unlink()). It's not a problem with the network, and
there's plenty of CPU available. Storage isn't saturated either.

First strange thing - normally on that server nfsd has about 1500-2500
threads. I did 'svcadm disable nfs/server' while watching nfsd with
prstat. It quickly exited many threads, but 651 of them were still there.
Then it took almost 3 minutes to completely exit nfsd - during all those
3 minutes all 651 threads were still there, and then suddenly, in seconds,
it exited all of them. While it was hanging I did:

> ::ps!grep nfsd
> 000006000a729068::walk thread|::findstack -v

with output logged to a file. I attached the output as nfsd.txt.gz.

You can see that one thread was in:

stack pointer for thread 30003621660: 2a1038c7021
[ 000002a1038c7021 cv_wait+0x3c() ]
  000002a1038c70d1 exitlwps+0x10c(0, 100000, 42100002, 6000a729068, 6000a72912e, 42000002)
  000002a1038c7181 proc_exit+0x1c(1, 0, ff000000, 1, 0, e965)
  000002a1038c7231 exit+8(1, 0, ffbff9d0, 1, ff164770, 16419)
  000002a1038c72e1 syscall_trap32+0xcc(0, 0, ffbff9d0, 1, ff164770, 16419)
stack pointer for thread 3000351b620: 2a101f864b1

which is ok. However all the other threads (650 of them) were in cv_wait:

bash-3.00# cat nfsd.txt | grep -c "nfssys+"
650
bash-3.00# cat nfsd.txt | grep -c "cv_wait+"
651
bash-3.00#
bash-3.00# cat nfsd.txt | grep -c "txg_wait_open+"
360
bash-3.00# cat nfsd.txt | grep "txg_wait_open+" | grep -c "337d1e"
360

So all those txg_wait_open() calls are for the same transaction group -
txg_wait_open+0x58(6000ad0c910, 337d1e,

All the other threads were in:

bash-3.00# cat nfsd.txt | grep -c "zfs_range_lock_reader+"
289

Several different dnodes (though several threads were waiting on the same
dnode).

Also, using 'zpool iostat 1' while nfsd was shutting down, I could see
mostly reads and some writes only every 40-60s.

Why was nfsd shutting down for almost 3 minutes? Because we were sitting
in txg_wait_open() for so long - but why?

Now, during "normal" operation, when the server is serving data via nfsd,
'zpool iostat 1' looks like:

bash-3.00# zpool iostat 1
               capacity     operations    bandwidth
pool         used  avail   read  write   read  write
----------  -----  -----  -----  -----  -----  -----
f3-2         466G   758G    562    148  31.9M  1.68M
f3-2         466G   758G  1.38K      0  82.1M      0
f3-2         466G   758G  1.27K      0  77.4M      0
f3-2         466G   758G  1.26K      0  76.7M      0
f3-2         466G   758G  1.23K      0  68.7M      0
f3-2         466G   758G  1.06K      0  52.2M      0
f3-2         466G   758G    931      0  31.2M      0
f3-2         466G   758G   1013      0  30.6M      0
f3-2         466G   758G    938      0  32.7M      0
f3-2         466G   758G    942      0  16.5M      0
f3-2         466G   758G  1.13K      0  41.7M      0
f3-2         466G   758G    818   1005  3.86M  10.7M
f3-2         466G   758G  1.02K     26  35.6M   489K
f3-2         466G   758G  1.06K      0  41.9M      0
f3-2         466G   758G    555      0  13.7M      0
f3-2         466G   758G    943      0  44.3M      0
f3-2         466G   758G     1K      0  26.3M      0
f3-2         466G   758G    928      0  16.1M      0
f3-2         466G   758G    854      0  31.3M      0
f3-2         466G   758G    903      0  29.7M      0
f3-2         466G   758G    917      0  26.2M      0
f3-2         466G   758G  1.01K      0  33.4M      0
f3-2         466G   758G    918      0  20.2M      0
f3-2         466G   758G  1.07K      0  41.6M      0
f3-2         466G   758G    967      0  12.6M      0
f3-2         466G   758G    973      0  33.8M      0
f3-2         466G   758G    871      0  13.6M      0
f3-2         466G   758G  1.01K      0  28.7M      0
f3-2         466G   758G    894      0  21.3M      0
f3-2         466G   758G    902      0  25.3M      0
f3-2         466G   758G  1.11K      0  33.3M      0
f3-2         466G   758G    935      0  27.4M      0
f3-2         466G   758G    930      0  28.5M      0
f3-2         466G   758G    774      0  12.9M      0
f3-2         466G   758G    900      0  31.7M      0
f3-2         466G   758G    902      0  23.0M      0
f3-2         466G   758G    867      0  20.3M      0
f3-2         466G   758G    999      0  39.1M      0
f3-2         466G   758G    774      0  12.9M      0
^C

Why don't I see writes every 5s?
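One rough way to tell whether write zios are being created at all during
those long quiet stretches, or whether everything is held up before that
point, is a counting one-liner like the following. This is an untested
sketch: it assumes zio_write() is the function that creates write zios on
this build, and it counts zio creations, not actual disk I/O.

bash-3.00# dtrace -n 'fbt::zio_write:entry{@writes = count();}
    tick-5s{printa("write zios created in the last 5s: %@d\n", @writes); clear(@writes);}'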
There''s definitely something to write all the time. I''ve migrated (zfs send|recv) one of a file systems in f3-2 to another server (x4500) and there''re no problems so far and I can see regular writes every 5s. Maybe there''s a problem with multiple file systems in a pool or maybe data became so fragmented that is causing the problem? bash-3.00# dtrace -q -n BEGIN''{gt=timestamp;}'' -n fbt::txg_wait_open:entry''/execname=="nfsd"/{txg_c++;self->t1=timestamp;self->arg1=arg1;}'' -n fbt::txg_wait_open:return''/self->t1/{txg_c--;@[self->arg1]=quantize((timestamp-self->t1)/1000000000);self->t1=0;self->arg1=0;}'' -n tick-10s''{printa(@);printf("txg_count: %d\ntotal time in seconds: %d\n",txg_c,(timestamp-gt)/1000000000);}'' txg_count: 0 total time in seconds: 9 txg_count: 0 total time in seconds: 19 txg_count: 0 total time in seconds: 29 txg_count: 0 total time in seconds: 39 txg_count: 0 total time in seconds: 49 txg_count: 487 total time in seconds: 59 txg_count: 568 total time in seconds: 69 txg_count: 595 total time in seconds: 79 txg_count: 821 total time in seconds: 89 3374426 value ------------- Distribution ------------- count 2 | 0 4 |@@@@@@@@@@@ 226 8 | 0 16 |@ 27 32 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@ 568 64 | 0 txg_count: 742 total time in seconds: 99 3374427 value ------------- Distribution ------------- count 4 | 0 8 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ 744 16 | 0 3374426 value ------------- Distribution ------------- count 2 | 0 4 |@@@@@@@@@@@ 226 8 | 0 16 |@ 27 32 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@ 568 64 | 0 txg_count: 843 total time in seconds: 109 3374427 value ------------- Distribution ------------- count 4 | 0 8 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ 744 16 | 0 3374426 value ------------- Distribution ------------- count 2 | 0 4 |@@@@@@@@@@@ 226 8 | 0 16 |@ 27 32 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@ 568 64 | 0 txg_count: 890 total time in seconds: 119 3374427 value ------------- Distribution ------------- count 4 | 0 8 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ 744 16 | 0 3374426 value ------------- Distribution ------------- count 2 | 0 4 |@@@@@@@@@@@ 226 8 | 0 16 |@ 27 32 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@ 568 64 | 0 txg_count: 890 total time in seconds: 129 3374427 value ------------- Distribution ------------- count 4 | 0 8 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ 744 16 | 0 3374426 value ------------- Distribution ------------- count 2 | 0 4 |@@@@@@@@@@@ 226 8 | 0 16 |@ 27 32 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@ 568 64 | 0 txg_count: 890 total time in seconds: 139 3374427 value ------------- Distribution ------------- count 4 | 0 8 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ 744 16 | 0 3374426 value ------------- Distribution ------------- count 2 | 0 4 |@@@@@@@@@@@ 226 8 | 0 16 |@ 27 32 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@ 568 64 | 0 txg_count: 890 total time in seconds: 149 3374427 value ------------- Distribution ------------- count 4 | 0 8 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ 744 16 | 0 3374426 value ------------- Distribution ------------- count 2 | 0 4 |@@@@@@@@@@@ 226 8 | 0 16 |@ 27 32 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@ 568 64 | 0 txg_count: 890 total time in seconds: 159 3374427 value ------------- Distribution ------------- count 4 | 0 8 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ 744 16 | 0 3374426 value ------------- Distribution ------------- count 2 | 0 4 |@@@@@@@@@@@ 226 8 | 0 16 |@ 27 32 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@ 568 64 | 0 3374428 value ------------- Distribution ------------- count 16 | 0 32 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ 895 64 | 0 txg_count: 804 total time in seconds: 
169

  3374427
           value  ------------- Distribution ------------- count
               4 |                                         0
               8 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ 744
              16 |                                         0

  3374426
           value  ------------- Distribution ------------- count
               2 |                                         0
               4 |@@@@@@@@@@@                              226
               8 |                                         0
              16 |@                                        27
              32 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@             568
              64 |                                         0

  3374428
           value  ------------- Distribution ------------- count
              16 |                                         0
              32 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ 895
              64 |                                         0

txg_count: 804
total time in seconds: 179
^C

We're stuck in txg_wait_open() even for over 64 seconds, with hundreds of
threads waiting for the same transaction group. Why is that?

Also note, in the pstack output for nfsd I attached, that no thread is in
zio_wait(), so we're not waiting for disks.

bash-3.00# cat zfs_check.d
#!/usr/sbin/dtrace -s

fbt::zfs_range_lock:entry
/execname=="nfsd"/
{
        self->t=timestamp;
}

fbt::zfs_range_lock:return
/self->t/
{
        @range=quantize((timestamp-self->t)/1000000);
}

fbt::txg_wait_open:entry
/execname=="nfsd"/
{
        self->t2=timestamp;
}

fbt::txg_wait_open:return
/self->t2/
{
        @txg = quantize((timestamp-self->t2)/1000000);
}

fbt::zio_wait:entry
/execname=="nfsd"/
{
        self->t3=timestamp;
}

fbt::zio_wait:return
/self->t3/
{
        @zio = quantize((timestamp-self->t3)/1000000);
}

tick-30s
{
        printf("\nrange_lock +++++++++++++++++++");
        printa(@range);
        printf("\ntxg wait +++++++++++++++++++");
        printa(@txg);
        printf("\nzio wait +++++++++++++++++++");
        printa(@zio);
}
bash-3.00#
bash-3.00# ./zfs_check.d
[...]
  3  51875                       :tick-30s
range_lock +++++++++++++++++++
           value  ------------- Distribution ------------- count
              -1 |                                         0
               0 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ 3974
               1 |                                         2
               2 |                                         4
               4 |                                         2
               8 |                                         1
              16 |                                         1
              32 |                                         1
              64 |                                         5
             128 |                                         2
             256 |                                         0
             512 |                                         0
            1024 |                                         0
            2048 |                                         0
            4096 |                                         2
            8192 |                                         1
           16384 |                                         2
           32768 |                                         5
           65536 |                                         9
          131072 |                                         0

txg wait +++++++++++++++++++
           value  ------------- Distribution ------------- count
           32768 |                                         0
           65536 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ 808
          131072 |                                         0

zio wait +++++++++++++++++++
           value  ------------- Distribution ------------- count
              -1 |                                         0
               0 |@@@@@@@@@@@@@@@@@@@@@@@@                 24494
               1 |                                         88
               2 |                                         112
               4 |@                                        1431
               8 |@@@@@@                                   5805
              16 |@@                                       2211
              32 |@@                                       2419
              64 |@@@                                      2692
             128 |@                                        1357
             256 |@                                        952
             512 |                                         7
            1024 |                                         0
^C

So, again - 64 seconds in txg_wait_open(), but I/Os were serviced quite
quickly. Trying to dd to a file system locally also shows the same
performance problem.

We're experiencing big performance problems due to this - any help would
be appreciated.

ps. txg_wait_open() of course doesn't deadlock, but it looks suspicious.

[Attachment: nfsd.txt.gz, application/x-gzip, 92507 bytes -
http://mail.opensolaris.org/pipermail/zfs-discuss/attachments/20070212/cf1dc5d6/attachment.bin]
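As an aside, the inline dtrace one-liner used earlier in this message is
hard to read in archived form; the same measurement can be written as a
standalone script roughly like the untested sketch below. It uses the same
probes, and, as in the one-liner, it takes arg1 of txg_wait_open() to be
the transaction group being waited for.

#!/usr/sbin/dtrace -qs

fbt::txg_wait_open:entry
/execname == "nfsd"/
{
        self->ts = timestamp;
        self->txg = arg1;
}

fbt::txg_wait_open:return
/self->ts/
{
        /* seconds spent waiting, keyed by transaction group */
        @wait[self->txg] = quantize((timestamp - self->ts) / 1000000000);
        self->ts = 0;
        self->txg = 0;
}

tick-10s
{
        printa(@wait);
}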
Robert Milkowski
2007-Feb-12 13:16 UTC
[zfs-discuss] Re: NFS/ZFS performance problems - txg_wait_open() deadlocks?
bash-3.00# dtrace -n fbt::txg_quiesce:return'{printf("%Y ",walltimestamp);}'
dtrace: description 'fbt::txg_quiesce:return' matched 1 probe
CPU     ID                    FUNCTION:NAME
  3  38168            txg_quiesce:return 2007 Feb 12 14:08:15
  0  38168            txg_quiesce:return 2007 Feb 12 14:12:14
  3  38168            txg_quiesce:return 2007 Feb 12 14:15:05
^C

Why do I not see it exactly every 5s?
On another server I get output exactly every 5s.
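A small variation on the same one-liner (untested) also prints how long it
has been since the previous txg_quiesce return, which makes the gaps
easier to see at a glance:

bash-3.00# dtrace -qn 'fbt::txg_quiesce:return
{
        this->now = timestamp;
        printf("%Y  %d s since previous quiesce\n", walltimestamp,
            last ? (this->now - last) / 1000000000 : 0);
        last = this->now;
}'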
Roch - PAE
2007-Feb-12 14:19 UTC
[zfs-discuss] Re: NFS/ZFS performance problems - txg_wait_open() deadlocks?
Robert Milkowski writes:
 > bash-3.00# dtrace -n fbt::txg_quiesce:return'{printf("%Y ",walltimestamp);}'
 > dtrace: description 'fbt::txg_quiesce:return' matched 1 probe
 > CPU     ID                    FUNCTION:NAME
 >   3  38168            txg_quiesce:return 2007 Feb 12 14:08:15
 >   0  38168            txg_quiesce:return 2007 Feb 12 14:12:14
 >   3  38168            txg_quiesce:return 2007 Feb 12 14:15:05
 > ^C
 >
 > Why do I not see it exactly every 5s?
 > On another server I get output exactly every 5s.
 >

I am not sure about this specific function, but if the question is the
same as "why is the pool syncing more often than every 5 sec", then that
can be because of a low-memory condition (if we have too much dirty memory
to sync, we don't wait the 5 seconds). See arc_tempreserve_space, around
the ERESTART return.

-r
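If anyone wants to check whether that throttle is actually firing, a
rough, untested one-liner along these lines should do. On an fbt return
probe, arg1 is the function's return value, so non-zero keys showing up
here would suggest arc_tempreserve_space is refusing reservations:

bash-3.00# dtrace -n 'fbt::arc_tempreserve_space:return{@[arg1] = count();}'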
Robert Milkowski
2007-Feb-12 14:37 UTC
[zfs-discuss] Re: NFS/ZFS performance problems - txg_wait_open() deadlocks?
Hello Roch,

Monday, February 12, 2007, 3:19:23 PM, you wrote:

RP> Robert Milkowski writes:
>> bash-3.00# dtrace -n fbt::txg_quiesce:return'{printf("%Y ",walltimestamp);}'
>> dtrace: description 'fbt::txg_quiesce:return' matched 1 probe
>> CPU     ID                    FUNCTION:NAME
>>   3  38168            txg_quiesce:return 2007 Feb 12 14:08:15
>>   0  38168            txg_quiesce:return 2007 Feb 12 14:12:14
>>   3  38168            txg_quiesce:return 2007 Feb 12 14:15:05
>> ^C
>>
>> Why do I not see it exactly every 5s?
>> On another server I get output exactly every 5s.

RP> I am not sure about this specific function but if the
RP> question is the same as why is the pool syncing more often
RP> than 5sec, then that can be because of low memory condition
RP> (if we have too much dirty memory to sync we don't wait the
RP> 5 seconds). See arc_tempreserve_space around (ERESTART).

It's the opposite - the question is why it's not syncing every 5s, but
rather every few minutes, on that server.

--
Best regards,
Robert                        mailto:rmilkowski at task.gda.pl
                              http://milek.blogspot.com
Roch - PAE
2007-Feb-12 14:54 UTC
[zfs-discuss] Re: NFS/ZFS performance problems - txg_wait_open() deadlocks?
Duh!

Long syncs (which delay the next sync) are also possible on
write-intensive workloads. Throttling heavy writers, I think, is the key
to fixing this.

Robert Milkowski writes:
 > Hello Roch,
 >
 > Monday, February 12, 2007, 3:19:23 PM, you wrote:
 >
 > RP> Robert Milkowski writes:
 > >> bash-3.00# dtrace -n fbt::txg_quiesce:return'{printf("%Y ",walltimestamp);}'
 > >> dtrace: description 'fbt::txg_quiesce:return' matched 1 probe
 > >> CPU     ID                    FUNCTION:NAME
 > >>   3  38168            txg_quiesce:return 2007 Feb 12 14:08:15
 > >>   0  38168            txg_quiesce:return 2007 Feb 12 14:12:14
 > >>   3  38168            txg_quiesce:return 2007 Feb 12 14:15:05
 > >> ^C
 > >>
 > >> Why do I not see it exactly every 5s?
 > >> On another server I get output exactly every 5s.
 >
 > RP> I am not sure about this specific function but if the
 > RP> question is the same as why is the pool syncing more often
 > RP> than 5sec, then that can be because of low memory condition
 > RP> (if we have too much dirty memory to sync we don't wait the
 > RP> 5 seconds). See arc_tempreserve_space around (ERESTART).
 >
 > It's the opposite - the question is why it's not syncing every 5s, but
 > rather every few minutes, on that server.
 >
 > --
 > Best regards,
 >  Robert                       mailto:rmilkowski at task.gda.pl
 >                               http://milek.blogspot.com
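One way to see whether long syncs really are what is delaying the next one
would be to timestamp spa_sync() directly. The following is an untested
sketch; it assumes spa_sync's second argument is the txg being synced, and
since spa_sync runs in the txg_sync_thread, thread-local variables are
fine here:

#!/usr/sbin/dtrace -qs

fbt::spa_sync:entry
{
        self->ts = timestamp;
        printf("%Y  spa_sync(txg %d) start\n", walltimestamp, arg1);
}

fbt::spa_sync:return
/self->ts/
{
        printf("%Y  spa_sync done in %d ms\n", walltimestamp,
            (timestamp - self->ts) / 1000000);
        self->ts = 0;
}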
Robert Milkowski
2007-Feb-12 15:52 UTC
[zfs-discuss] Re: NFS/ZFS performance problems - txg_wait_open() deadlocks?
Hello Roch,

Monday, February 12, 2007, 3:54:30 PM, you wrote:

RP> Duh!

RP> Long syncs (which delay the next sync) are also possible on
RP> write-intensive workloads. Throttling heavy writers, I
RP> think, is the key to fixing this.

Well, then maybe that's not the cause of our problems.
Nevertheless, 60-90s for unlink() is just plain wrong, especially when
you've got <10ms I/Os to the array, almost zero writes, plenty of CPU
free, etc.

Definitely something is wrong here.

--
Best regards,
Robert                        mailto:rmilkowski at task.gda.pl
                              http://milek.blogspot.com
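For what it's worth, the unlink latency could also be measured on the
server side with something like the untested sketch below. It assumes
zfs_remove() is the ZFS vnode op that ends up servicing these unlinks on
this build:

bash-3.00# dtrace -n '
fbt::zfs_remove:entry /execname == "nfsd"/ { self->ts = timestamp; }
fbt::zfs_remove:return /self->ts/
{
        @["zfs_remove (ms)"] = quantize((timestamp - self->ts) / 1000000);
        self->ts = 0;
}'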
eric kustarz
2007-Feb-12 18:08 UTC
[zfs-discuss] Re: NFS/ZFS performance problems - txg_wait_open() deadlocks?
On Feb 12, 2007, at 7:52 AM, Robert Milkowski wrote:

> Hello Roch,
>
> Monday, February 12, 2007, 3:54:30 PM, you wrote:
>
> RP> Duh!
>
> RP> Long syncs (which delay the next sync) are also possible on
> RP> write-intensive workloads. Throttling heavy writers, I
> RP> think, is the key to fixing this.
>
> Well, then maybe that's not the cause of our problems.
> Nevertheless, 60-90s for unlink() is just plain wrong, especially when
> you've got <10ms I/Os to the array, almost zero writes, plenty of CPU
> free, etc.
>
> Definitely something is wrong here.

Looks like spa_sync() via the txg_sync_thread thread is taking way too
long, which is causing new (NFS) requests to be delayed (such as unlink).

Is this just an NFS server, or is there local activity as well?

A complete threadlist would be interesting, as would memory usage.

Have you increased the load on this machine?  I have seen a similar
situation (new requests being blocked waiting for the sync thread to
finish), but that's only been when either 1) the hardware is broken and
taking too long or 2) the server is way overloaded.

eric
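For reference, one way to gather what is being asked for here is with the
standard tools below (which of them turns out to be most useful is a
judgment call):

bash-3.00# echo "::threadlist -v" | mdb -k > /var/tmp/threadlist.txt
bash-3.00# echo "::memstat" | mdb -k
bash-3.00# vmstat 5 5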
Robert Milkowski
2007-Feb-13 14:21 UTC
[zfs-discuss] Re: NFS/ZFS performance problems - txg_wait_open() deadlocks?
Hello eric,

Monday, February 12, 2007, 7:08:20 PM, you wrote:

ek> On Feb 12, 2007, at 7:52 AM, Robert Milkowski wrote:

>> Hello Roch,
>>
>> Monday, February 12, 2007, 3:54:30 PM, you wrote:
>>
>> RP> Duh!
>>
>> RP> Long syncs (which delay the next sync) are also possible on
>> RP> write-intensive workloads. Throttling heavy writers, I
>> RP> think, is the key to fixing this.
>>
>> Well, then maybe that's not the cause of our problems.
>> Nevertheless, 60-90s for unlink() is just plain wrong, especially when
>> you've got <10ms I/Os to the array, almost zero writes, plenty of CPU
>> free, etc.
>>
>> Definitely something is wrong here.

ek> Looks like spa_sync() via the txg_sync_thread thread is taking way
ek> too long, which is causing new (NFS) requests to be delayed (such as
ek> unlink).

ek> Is this just an NFS server, or is there local activity as well?

No local activity, no other applications - just the NFS server.

ek> A complete threadlist would be interesting, as would memory usage.

The system is not paging at all, and free memory reported by vmstat stays
at about 1GB.

All processes in the system:

bash-3.00# ps -ef
     UID   PID  PPID  C    STIME TTY      TIME CMD
    root     0     0  0   Feb 09 ?        0:44 sched
    root     1     0  0   Feb 09 ?        0:12 /sbin/init -v
    root     2     0  0   Feb 09 ?        0:00 pageout
    root     3     0  0   Feb 09 ?       42:09 fsflush
    root   245     1  0   Feb 09 ?        0:00 /usr/sbin/mdmonitord
    root     7     1  0   Feb 09 ?        0:18 /lib/svc/bin/svc.startd
    root     9     1  0   Feb 09 ?        0:49 /lib/svc/bin/svc.configd
  daemon   296     1  0   Feb 09 ?       14:20 /usr/lib/nfs/lockd
  daemon   283     1  0   Feb 09 ?        0:00 /usr/lib/nfs/statd
    root   403     1  0   Feb 09 ?        0:00 /usr/sbin/nsrexecd
    root   286     7  0   Feb 09 ?        0:00 /usr/lib/saf/sac -t 300
  daemon   282     1  0   Feb 09 ?        0:02 /usr/lib/nfs/nfsmapid
    root   159     1  0   Feb 09 ?        0:46 /usr/lib/picl/picld
    root   161     1  0   Feb 09 ?        0:30 /usr/sbin/nscd
    root 24699   380  0 16:41:19 ?        0:00 /usr/lib/ssh/sshd
  daemon   278     1  0   Feb 09 ?        0:17 /usr/sbin/rpcbind
  daemon   173     1  0   Feb 09 ?        0:01 /usr/lib/crypto/kcfd
    root   156     1  0   Feb 09 ?        0:00 /usr/lib/sysevent/syseventd
    root   295     1  0   Feb 09 ?        0:01 /usr/lib/utmpd
    root   233     1  0   Feb 09 ?        0:42 /usr/lib/inet/xntpd
    root   274     1  0   Feb 09 ?        0:03 /usr/sbin/cron
    root   328     1  0   Feb 09 ?        0:01 /usr/sbin/syslogd
    root   404   403  0   Feb 09 ?        0:01 /usr/sbin/nsrexecd
    root   187     1  0   Feb 09 ?        3:14 /usr/lib/inet/in.mpathd -a
    root   179     1  0   Feb 09 ?        0:00 devfsadmd
    root   298     1  0   Feb 09 ?        0:07 /usr/lib/inet/inetd start
    root   299     7  0   Feb 09 console  0:00 /usr/lib/saf/ttymon -g -d /dev/console -l console -T sun -m ldterm,ttcompat -h
    root   301   286  0   Feb 09 ?        0:00 /usr/lib/saf/ttymon
    root   378     1  0   Feb 09 ?        0:07 /usr/lib/fm/fmd/fmd
    root   380     1  0   Feb 09 ?        0:00 /usr/lib/ssh/sshd
    root   588     1  0   Feb 09 ?       23:46 /usr/jdk/jdk1.5.0_09/bin/java -Xms4M -Xmx64M -classpath /usr/share/lib/jdmk/jdm
    root   801     1  0   Feb 09 ?        0:01 /usr/lib/nfs/mountd
    root 24702 24699  0 16:41:20 ?        0:00 /usr/lib/ssh/sshd
    root 19793     1  0   Feb 10 ?        4:36 /usr/sfw/sbin/snmpd
    root  8598   380  0 14:49:02 ?        0:00 /usr/lib/ssh/sshd
    root 22686 22680  0   Feb 12 pts/1    0:00 bash
    root 22678 22671  0   Feb 12 ?        0:02 /usr/lib/ssh/sshd
    root 24704 24702  0 16:41:20 pts/2    0:00 -bash
    root  8638  8636  0 14:49:05 pts/3    0:00 -sh
    root  8647  8645  0 14:49:11 pts/3    0:00 ps -ef
    root  8645  8638  0 14:49:06 pts/3    0:00 bash
    root 22680 22678  0   Feb 12 pts/1    0:00 -sh
    root  8636  8598  0 14:49:05 ?        0:00 /usr/lib/ssh/sshd
    root 22671   380  0   Feb 12 ?        0:00 /usr/lib/ssh/sshd
  daemon 28297     1  0 22:17:32 ?      173:25 /usr/lib/nfs/nfsd
bash-3.00#

ek> Have you increased the load on this machine?  I have seen a similar
ek> situation (new requests being blocked waiting for the sync thread to
ek> finish), but that's only been when either 1) the hardware is broken
ek> and taking too long or 2) the server is way overloaded.

I don't think the HW is broken - it's the same situation on two v440
servers and a T2000 server. iostat doesn't show any problems accessing
the disks (no queues, short service times, etc.).

We moved the workload from a v440 with 8GB of RAM to a T2000 with 32GB of
RAM, and for many hours it was working just great. We did try to stop
nfsd on the T2000 and it exited within a second or two - it looked almost
like everything was working great. But then the next day (today) we
started experiencing the same problems - long I/Os (dozens of seconds) -
so we decided to stop nfsd; this time it took over 20 minutes for all
threads to complete. Then we did 'zpool export f3-2' and it took 59
minutes to complete!! See the other thread here I've just started
("[zfs-discuss] zpool export consumes whole CPU and takes more than 30
minutes to complete").

Looks like for some reason ZFS isn't able to complete all writes to
disks. More memory just delayed the problem, and zil_disable set to 1
mitigates the problem for some time, until ZFS has filled up all memory
and has to wait for data to be written to disk; then NFS operations start
to take 30-90s, sometimes even much more. Then you've got a problem with
stopping nfsd, or exporting a pool (a different question is why, during
export, an entire CPU is consumed by ZFS, which is the limiting factor).

The same on S10U3 and snv_54.

Something is really broken here. Maybe the hardware...

--
Best regards,
Robert                        mailto:rmilkowski at task.gda.pl
                              http://milek.blogspot.com
eric kustarz
2007-Feb-14 08:50 UTC
[zfs-discuss] Re: NFS/ZFS performance problems - txg_wait_open() deadlocks?
> ek> Have you increased the load on this machine?  I have seen a similar
> ek> situation (new requests being blocked waiting for the sync thread to
> ek> finish), but that's only been when either 1) the hardware is broken
> ek> and taking too long or 2) the server is way overloaded.
>
> I don't think the HW is broken - it's the same situation on two v440
> servers and a T2000 server. iostat doesn't show any problems accessing
> the disks (no queues, short service times, etc.).
>
> We moved the workload from a v440 with 8GB of RAM to a T2000 with 32GB
> of RAM, and for many hours it was working just great. We did try to stop
> nfsd on the T2000 and it exited within a second or two - it looked
> almost like everything was working great. But then the next day (today)
> we started experiencing the same problems - long I/Os (dozens of
> seconds) - so we decided to stop nfsd; this time it took over 20 minutes
> for all threads to complete. Then we did 'zpool export f3-2' and it took
> 59 minutes to complete!! See the other thread here I've just started
> ("[zfs-discuss] zpool export consumes whole CPU and takes more than 30
> minutes to complete").
>
> Looks like for some reason ZFS isn't able to complete all writes to
> disks. More memory just delayed the problem, and zil_disable set to 1
> mitigates the problem for some time, until ZFS has filled up all memory
> and has to wait for data to be written to disk; then NFS operations
> start to take 30-90s, sometimes even much more. Then you've got a
> problem with stopping nfsd, or exporting a pool (a different question is
> why, during export, an entire CPU is consumed by ZFS, which is the
> limiting factor).
>
> The same on S10U3 and snv_54.
>

So dtracing metaslab_ff_alloc() would be a good way to know if you're
hitting:

6495013 Loops and recursion in metaslab_ff_alloc can kill performance,
even on a pool with lots of free data

A dscript mentioned in the bug report (i believe the non-public part) is:
"
#!/usr/sbin/dtrace -s

#pragma D option quiet

BEGIN
{
        self->in_metaslab = 0;
}

fbt::metaslab_ff_alloc:entry
/self->in_metaslab == 0/
{
        self->in_metaslab = 1;
        self->loopcount = 0;
}

fbt::avl_walk:entry
/self->in_metaslab/
{
        self->loopcount++;
}

fbt::metaslab_ff_alloc:return
/self->in_metaslab/
{
        self->in_metaslab = 0;
        @loops["Loops count"] = quantize(self->loopcount);
        self->loopcount = 0;
}
"

Now, note, this dscript isn't perfect as it doesn't take recursion into
account, but feel free to tweak it if/as you like. If you're seeing lots
of avl_walk() calls per metaslab_ff_alloc() call then it's the above bug.

eric
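If recursion does matter here, one possible tweak (an untested sketch, not
the script from the bug report) is to track call depth, so that a
recursive metaslab_ff_alloc() call tree gets counted as a single
allocation:

#!/usr/sbin/dtrace -s

#pragma D option quiet

fbt::metaslab_ff_alloc:entry
{
        self->depth++;
}

fbt::metaslab_ff_alloc:entry
/self->depth == 1/
{
        /* outermost call: start a fresh count */
        self->loopcount = 0;
}

fbt::avl_walk:entry
/self->depth/
{
        self->loopcount++;
}

fbt::metaslab_ff_alloc:return
/self->depth == 1/
{
        /* outermost return: record avl_walk calls for the whole tree */
        @loops["Loops count (incl. recursion)"] = quantize(self->loopcount);
        self->loopcount = 0;
}

fbt::metaslab_ff_alloc:return
/self->depth/
{
        self->depth--;
}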
Robert Milkowski
2007-Feb-14 09:57 UTC
[zfs-discuss] Re: NFS/ZFS performance problems - txg_wait_open() deadlocks?
Hello eric,

Wednesday, February 14, 2007, 9:50:29 AM, you wrote:

>> ek> Have you increased the load on this machine?  I have seen a similar
>> ek> situation (new requests being blocked waiting for the sync thread
>> ek> to finish), but that's only been when either 1) the hardware is
>> ek> broken and taking too long or 2) the server is way overloaded.
>>
>> I don't think the HW is broken - it's the same situation on two v440
>> servers and a T2000 server. iostat doesn't show any problems accessing
>> the disks (no queues, short service times, etc.).
>>
>> We moved the workload from a v440 with 8GB of RAM to a T2000 with 32GB
>> of RAM, and for many hours it was working just great. We did try to
>> stop nfsd on the T2000 and it exited within a second or two - it looked
>> almost like everything was working great. But then the next day (today)
>> we started experiencing the same problems - long I/Os (dozens of
>> seconds) - so we decided to stop nfsd; this time it took over 20
>> minutes for all threads to complete. Then we did 'zpool export f3-2'
>> and it took 59 minutes to complete!! See the other thread here I've
>> just started ("[zfs-discuss] zpool export consumes whole CPU and takes
>> more than 30 minutes to complete").
>>
>> Looks like for some reason ZFS isn't able to complete all writes to
>> disks. More memory just delayed the problem, and zil_disable set to 1
>> mitigates the problem for some time, until ZFS has filled up all memory
>> and has to wait for data to be written to disk; then NFS operations
>> start to take 30-90s, sometimes even much more. Then you've got a
>> problem with stopping nfsd, or exporting a pool (a different question
>> is why, during export, an entire CPU is consumed by ZFS, which is the
>> limiting factor).
>>
>> The same on S10U3 and snv_54.

ek> So dtracing metaslab_ff_alloc() would be a good way to know if you're
ek> hitting:

ek> 6495013 Loops and recursion in metaslab_ff_alloc can kill
ek> performance, even on a pool with lots of free data

ek> A dscript mentioned in the bug report (i believe the non-public part)
ek> is:

ek> "
ek> #!/usr/sbin/dtrace -s
ek>
ek> #pragma D option quiet
ek>
ek> BEGIN
ek> {
ek>         self->in_metaslab = 0;
ek> }
ek>
ek> fbt::metaslab_ff_alloc:entry
ek> /self->in_metaslab == 0/
ek> {
ek>         self->in_metaslab = 1;
ek>         self->loopcount = 0;
ek> }
ek>
ek> fbt::avl_walk:entry
ek> /self->in_metaslab/
ek> {
ek>         self->loopcount++;
ek> }
ek>
ek> fbt::metaslab_ff_alloc:return
ek> /self->in_metaslab/
ek> {
ek>         self->in_metaslab = 0;
ek>         @loops["Loops count"] = quantize(self->loopcount);
ek>         self->loopcount = 0;
ek> }
ek> "

ek> Now, note, this dscript isn't perfect as it doesn't take recursion
ek> into account, but feel free to tweak it if/as you like. If you're
ek> seeing lots of avl_walk() calls per metaslab_ff_alloc() call then
ek> it's the above bug.

I've been using it in another CR where destroying one of the snapshots
was helping the performance. Nevertheless, here it is on that server.

Short period of time:

bash-3.00# ./metaslab-6495013.d
^C

  Loops count
           value  ------------- Distribution ------------- count
              -1 |                                         0
               0 |@@@@@@@@@@@@@@@@@@@@@@@@@@               17674
               1 |@@@@@@@                                  4418
               2 |@@@                                      2123
               4 |@@                                       1257
               8 |@                                        753
              16 |@                                        416
              32 |                                         220
              64 |                                         103
             128 |                                         58
             256 |                                         38
             512 |                                         21
            1024 |                                         13
            2048 |                                         10
            4096 |                                         8
            8192 |                                         3
           16384 |                                         3
           32768 |                                         2
           65536 |                                         1
          131072 |                                         26
          262144 |                                         7
          524288 |                                         0

bash-3.00#

Looks like that's it.

Here's another server with a similar problem:

bash-3.00# ./metaslab-6495013.d
^C

  Loops count
           value  ------------- Distribution ------------- count
              -1 |                                         0
               0 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@             1648
               1 |@@@                                      197
               2 |@@                                       93
               4 |@                                        56
               8 |@                                        45
              16 |@                                        36
              32 |                                         28
              64 |@                                        32
             128 |                                         18
             256 |                                         28
             512 |                                         21
            1024 |                                         20
            2048 |                                         10
            4096 |                                         25
            8192 |                                         9
           16384 |                                         9
           32768 |                                         10
           65536 |                                         4
          131072 |                                         11
          262144 |@                                        47
          524288 |                                         0

bash-3.00#

Also, these results are not from peak hours, so it will get much worse
later... :((

IIRC you've written before that someone is actively working on it right
now, right? Any update? Any approx. ETA? I would like to test it ASAP,
even before putback.

--
Best regards,
Robert                        mailto:rmilkowski at task.gda.pl
                              http://milek.blogspot.com
eric kustarz
2007-Feb-14 16:04 UTC
[zfs-discuss] Re: NFS/ZFS performance problems - txg_wait_open() deadlocks?
> I've been using it in another CR where destroying one of the snapshots
> was helping the performance. Nevertheless, here it is on that server.
>
> Short period of time:
>
> bash-3.00# ./metaslab-6495013.d
> ^C
>
>   Loops count
>            value  ------------- Distribution ------------- count
>               -1 |                                         0
>                0 |@@@@@@@@@@@@@@@@@@@@@@@@@@               17674
>                1 |@@@@@@@                                  4418
>                2 |@@@                                      2123
>                4 |@@                                       1257
>                8 |@                                        753
>               16 |@                                        416
>               32 |                                         220
>               64 |                                         103
>              128 |                                         58
>              256 |                                         38
>              512 |                                         21
>             1024 |                                         13
>             2048 |                                         10
>             4096 |                                         8
>             8192 |                                         3
>            16384 |                                         3
>            32768 |                                         2
>            65536 |                                         1
>           131072 |                                         26
>           262144 |                                         7
>           524288 |                                         0
>
> bash-3.00#
>
> Looks like that's it.

Yeah, sometimes doing over 200,000 avl_walks isn't going to be good.

...

> IIRC you've written before that someone is actively working on it right
> now, right? Any update? Any approx. ETA? I would like to test it ASAP,
> even before putback.

I believe this is next on George's list (he's got a couple of fixes he
needs to putback first).

So since you're in this state, would you mind seeing the 'size' (arg1)
distribution instead of avl_walk()s for metaslab_ff_claim()? Something
like:

"
#!/usr/sbin/dtrace -s

#pragma D option quiet

BEGIN
{
        self->in_metaslab = 0;
}

fbt::metaslab_ff_alloc:entry
/self->in_metaslab == 0/
{
        self->in_metaslab = 1;
        @sizes["metaslab sizes"] = quantize(arg1);
}

fbt::metaslab_ff_alloc:return
/self->in_metaslab/
{
        self->in_metaslab = 0;
}
"

I'm wondering if we can just lower the amount of space we're trying to
alloc as the pool becomes more fragmented - we'll lose a little I/O
performance, but it should limit this bug.

eric
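If the full script feels like overkill, the same distribution can probably
be grabbed with a one-liner (untested; it relies on the same assumption as
the script above, namely that arg1 of metaslab_ff_alloc() is the requested
allocation size in bytes):

bash-3.00# dtrace -n 'fbt::metaslab_ff_alloc:entry{@["alloc size (bytes)"] = quantize(arg1);}'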
Robert Milkowski
2007-Feb-15 13:47 UTC
[zfs-discuss] Re: Re[8]: Re: NFS/ZFS performance problems - txg_wait_open() deadlocks?
bash-3.00# ./metaslab-6495013-2.d
^C

  metaslab sizes
           value  ------------- Distribution ------------- count
             256 |                                         0
             512 |@@@@@@@@@@@@                             9031
            1024 |@@@@@@@@@@                               7462
            2048 |@@@@@@@@@                                6832
            4096 |@@@                                      2242
            8192 |@                                        783
           16384 |@                                        934
           32768 |@                                        422
           65536 |@@                                       1463
          131072 |@@                                       1467
          262144 |                                         0

bash-3.00#
Robert Milkowski
2007-Feb-15 14:08 UTC
[zfs-discuss] Re: NFS/ZFS performance problems - txg_wait_open() deadlocks?
Hello eric,

Wednesday, February 14, 2007, 5:04:01 PM, you wrote:

ek> I'm wondering if we can just lower the amount of space we're trying
ek> to alloc as the pool becomes more fragmented - we'll lose a little
ek> I/O performance, but it should limit this bug.

Do you think that zfs send|recv for such file systems could help for a
while? (Would it defragment the data?)

--
Best regards,
Robert                        mailto:rmilkowski at task.gda.pl
                              http://milek.blogspot.com
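For anyone following along, the kind of round trip being discussed here
looks roughly like the sketch below; the dataset name "somefs", the target
pool "backup" and the host name "x4500" are only placeholders, and in
practice this would be scripted per file system:

# take a snapshot and copy the file system to another box
zfs snapshot f3-2/somefs@migrate
zfs send f3-2/somefs@migrate | ssh x4500 zfs recv backup/somefs

# once clients are off it, destroy the original copy (and its snapshots)...
zfs destroy -r f3-2/somefs

# ...and send it back, so the data gets rewritten (and, hopefully, less fragmented)
ssh x4500 zfs send backup/somefs@migrate | zfs recv f3-2/somefs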
Robert Milkowski
2007-Feb-15 14:25 UTC
[zfs-discuss] Re: Re[8]: Re: NFS/ZFS performance problems - txg_wait_open() deadlocks?
To confirm again if it''s avl_walk I stopped nfsd (which took a while) then issued sync() which is still running after 10 minutes. I observer that for many seconds zpool iostta (or iostta) do not show any or almost any disk traffic, then some megabytes, etc. However all the time one CPU is 100% in SYS. Roght now cpu id=24 is 100% in SYS so: bash-3.00# dtrace -n profile-1003''/cpu == 24/{@[cpu,stack()]=count();}'' -n tick-10s''{trunc(@,33);printa(@);exit(0);}'' dtrace: description ''profile-1003'' matched 1 probe dtrace: description ''tick-10s'' matched 1 probe CPU ID FUNCTION:NAME 30 49714 :tick-10s 24 zfs`metaslab_ff_alloc+0xac zfs`space_map_alloc+0x10 zfs`metaslab_group_alloc+0x1e0 zfs`metaslab_alloc_dva+0x114 zfs`metaslab_alloc+0x2c zfs`zio_dva_allocate+0x4c zfs`zio_write_compress+0x1ec zfs`arc_write+0xe4 zfs`dbuf_sync+0x6c0 zfs`dnode_sync+0x35c zfs`dmu_objset_sync_dnodes+0x6c zfs`dmu_objset_sync+0x54 zfs`dsl_dataset_sync+0xc zfs`dsl_pool_sync+0x64 zfs`spa_sync+0x1b0 zfs`txg_sync_thread+0x134 unix`thread_start+0x4 11 24 genunix`avl_walk+0x8 zfs`metaslab_ff_alloc+0x9c zfs`space_map_alloc+0x10 zfs`metaslab_group_alloc+0x1e0 zfs`metaslab_alloc_dva+0x114 zfs`metaslab_alloc+0x2c zfs`zio_dva_allocate+0x4c zfs`zio_write_compress+0x1ec zfs`arc_write+0xe4 zfs`dbuf_sync+0x6c0 zfs`dnode_sync+0x35c zfs`dmu_objset_sync_dnodes+0x6c zfs`dmu_objset_sync+0x54 zfs`dsl_dataset_sync+0xc zfs`dsl_pool_sync+0x64 zfs`spa_sync+0x1b0 zfs`txg_sync_thread+0x134 unix`thread_start+0x4 12 24 genunix`avl_walk+0x3c zfs`metaslab_ff_alloc+0x9c zfs`space_map_alloc+0x10 zfs`metaslab_group_alloc+0x1e0 zfs`metaslab_alloc_dva+0x114 zfs`metaslab_alloc+0x2c zfs`zio_dva_allocate+0x4c zfs`zio_write_compress+0x1ec zfs`arc_write+0xe4 zfs`dbuf_sync+0x6c0 zfs`dnode_sync+0x35c zfs`dmu_objset_sync_dnodes+0x6c zfs`dmu_objset_sync+0x54 zfs`dsl_dataset_sync+0xc zfs`dsl_pool_sync+0x64 zfs`spa_sync+0x1b0 zfs`txg_sync_thread+0x134 unix`thread_start+0x4 12 24 genunix`avl_walk zfs`metaslab_ff_alloc+0x9c zfs`space_map_alloc+0x10 zfs`metaslab_group_alloc+0x1e0 zfs`metaslab_alloc_dva+0x114 zfs`metaslab_alloc+0x2c zfs`zio_dva_allocate+0x4c zfs`zio_write_compress+0x1ec zfs`arc_write+0xe4 zfs`dbuf_sync+0x6c0 zfs`dnode_sync+0x35c zfs`dmu_objset_sync_dnodes+0x6c zfs`dmu_objset_sync+0x54 zfs`dsl_dataset_sync+0xc zfs`dsl_pool_sync+0x64 zfs`spa_sync+0x1b0 zfs`txg_sync_thread+0x134 unix`thread_start+0x4 14 24 genunix`avl_walk+0x70 zfs`metaslab_ff_alloc+0x9c zfs`space_map_alloc+0x10 zfs`metaslab_group_alloc+0x1e0 zfs`metaslab_alloc_dva+0x114 zfs`metaslab_alloc+0x2c zfs`zio_dva_allocate+0x4c zfs`zio_write_compress+0x1ec zfs`arc_write+0xe4 zfs`dbuf_sync+0x6c0 zfs`dnode_sync+0x35c zfs`dmu_objset_sync_dnodes+0x6c zfs`dmu_objset_sync+0x54 zfs`dsl_dataset_sync+0xc zfs`dsl_pool_sync+0x64 zfs`spa_sync+0x1b0 zfs`txg_sync_thread+0x134 unix`thread_start+0x4 14 24 genunix`avl_walk+0x88 zfs`metaslab_ff_alloc+0x9c zfs`space_map_alloc+0x10 zfs`metaslab_group_alloc+0x1e0 zfs`metaslab_alloc_dva+0x114 zfs`metaslab_alloc+0x2c zfs`zio_dva_allocate+0x4c zfs`zio_write_compress+0x1ec zfs`arc_write+0xe4 zfs`dbuf_sync+0x6c0 zfs`dnode_sync+0x35c zfs`dmu_objset_sync_dnodes+0x6c zfs`dmu_objset_sync+0x54 zfs`dsl_dataset_sync+0xc zfs`dsl_pool_sync+0x64 zfs`spa_sync+0x1b0 zfs`txg_sync_thread+0x134 unix`thread_start+0x4 14 24 genunix`avl_walk+0x98 zfs`metaslab_ff_alloc+0x9c zfs`space_map_alloc+0x10 zfs`metaslab_group_alloc+0x1e0 zfs`metaslab_alloc_dva+0x114 zfs`metaslab_alloc+0x2c zfs`zio_dva_allocate+0x4c zfs`zio_write_compress+0x1ec zfs`arc_write+0xe4 zfs`dbuf_sync+0x6c0 
zfs`dnode_sync+0x35c zfs`dmu_objset_sync_dnodes+0x6c zfs`dmu_objset_sync+0x54 zfs`dsl_dataset_sync+0xc zfs`dsl_pool_sync+0x64 zfs`spa_sync+0x1b0 zfs`txg_sync_thread+0x134 unix`thread_start+0x4 16 24 genunix`avl_walk+0x1c zfs`metaslab_ff_alloc+0x9c zfs`space_map_alloc+0x10 zfs`metaslab_group_alloc+0x1e0 zfs`metaslab_alloc_dva+0x114 zfs`metaslab_alloc+0x2c zfs`zio_dva_allocate+0x4c zfs`zio_write_compress+0x1ec zfs`arc_write+0xe4 zfs`dbuf_sync+0x6c0 zfs`dnode_sync+0x35c zfs`dmu_objset_sync_dnodes+0x6c zfs`dmu_objset_sync+0x54 zfs`dsl_dataset_sync+0xc zfs`dsl_pool_sync+0x64 zfs`spa_sync+0x1b0 zfs`txg_sync_thread+0x134 unix`thread_start+0x4 17 24 genunix`avl_walk+0x68 zfs`metaslab_ff_alloc+0x9c zfs`space_map_alloc+0x10 zfs`metaslab_group_alloc+0x1e0 zfs`metaslab_alloc_dva+0x114 zfs`metaslab_alloc+0x2c zfs`zio_dva_allocate+0x4c zfs`zio_write_compress+0x1ec zfs`arc_write+0xe4 zfs`dbuf_sync+0x6c0 zfs`dnode_sync+0x35c zfs`dmu_objset_sync_dnodes+0x6c zfs`dmu_objset_sync+0x54 zfs`dsl_dataset_sync+0xc zfs`dsl_pool_sync+0x64 zfs`spa_sync+0x1b0 zfs`txg_sync_thread+0x134 unix`thread_start+0x4 17 24 zfs`metaslab_ff_alloc+0xa8 zfs`space_map_alloc+0x10 zfs`metaslab_group_alloc+0x1e0 zfs`metaslab_alloc_dva+0x114 zfs`metaslab_alloc+0x2c zfs`zio_dva_allocate+0x4c zfs`zio_write_compress+0x1ec zfs`arc_write+0xe4 zfs`dbuf_sync+0x6c0 zfs`dnode_sync+0x35c zfs`dmu_objset_sync_dnodes+0x6c zfs`dmu_objset_sync+0x54 zfs`dsl_dataset_sync+0xc zfs`dsl_pool_sync+0x64 zfs`spa_sync+0x1b0 zfs`txg_sync_thread+0x134 unix`thread_start+0x4 17 24 genunix`avl_walk+0x54 zfs`metaslab_ff_alloc+0x9c zfs`space_map_alloc+0x10 zfs`metaslab_group_alloc+0x1e0 zfs`metaslab_alloc_dva+0x114 zfs`metaslab_alloc+0x2c zfs`zio_dva_allocate+0x4c zfs`zio_write_compress+0x1ec zfs`arc_write+0xe4 zfs`dbuf_sync+0x6c0 zfs`dnode_sync+0x35c zfs`dmu_objset_sync_dnodes+0x6c zfs`dmu_objset_sync+0x54 zfs`dsl_dataset_sync+0xc zfs`dsl_pool_sync+0x64 zfs`spa_sync+0x1b0 zfs`txg_sync_thread+0x134 unix`thread_start+0x4 18 24 genunix`avl_walk+0xac zfs`metaslab_ff_alloc+0x9c zfs`space_map_alloc+0x10 zfs`metaslab_group_alloc+0x1e0 zfs`metaslab_alloc_dva+0x114 zfs`metaslab_alloc+0x2c zfs`zio_dva_allocate+0x4c zfs`zio_write_compress+0x1ec zfs`arc_write+0xe4 zfs`dbuf_sync+0x6c0 zfs`dnode_sync+0x35c zfs`dmu_objset_sync_dnodes+0x6c zfs`dmu_objset_sync+0x54 zfs`dsl_dataset_sync+0xc zfs`dsl_pool_sync+0x64 zfs`spa_sync+0x1b0 zfs`txg_sync_thread+0x134 unix`thread_start+0x4 18 24 zfs`metaslab_ff_alloc+0xa4 zfs`space_map_alloc+0x10 zfs`metaslab_group_alloc+0x1e0 zfs`metaslab_alloc_dva+0x114 zfs`metaslab_alloc+0x2c zfs`zio_dva_allocate+0x4c zfs`zio_write_compress+0x1ec zfs`arc_write+0xe4 zfs`dbuf_sync+0x6c0 zfs`dnode_sync+0x35c zfs`dmu_objset_sync_dnodes+0x6c zfs`dmu_objset_sync+0x54 zfs`dsl_dataset_sync+0xc zfs`dsl_pool_sync+0x64 zfs`spa_sync+0x1b0 zfs`txg_sync_thread+0x134 unix`thread_start+0x4 21 24 genunix`avl_walk+0x78 zfs`metaslab_ff_alloc+0x9c zfs`space_map_alloc+0x10 zfs`metaslab_group_alloc+0x1e0 zfs`metaslab_alloc_dva+0x114 zfs`metaslab_alloc+0x2c zfs`zio_dva_allocate+0x4c zfs`zio_write_compress+0x1ec zfs`arc_write+0xe4 zfs`dbuf_sync+0x6c0 zfs`dnode_sync+0x35c zfs`dmu_objset_sync_dnodes+0x6c zfs`dmu_objset_sync+0x54 zfs`dsl_dataset_sync+0xc zfs`dsl_pool_sync+0x64 zfs`spa_sync+0x1b0 zfs`txg_sync_thread+0x134 unix`thread_start+0x4 22 24 genunix`avl_walk+0x84 zfs`metaslab_ff_alloc+0x9c zfs`space_map_alloc+0x10 zfs`metaslab_group_alloc+0x1e0 zfs`metaslab_alloc_dva+0x114 zfs`metaslab_alloc+0x2c zfs`zio_dva_allocate+0x4c zfs`zio_write_compress+0x1ec zfs`arc_write+0xe4 
zfs`dbuf_sync+0x6c0 zfs`dnode_sync+0x35c zfs`dmu_objset_sync_dnodes+0x6c zfs`dmu_objset_sync+0x54 zfs`dsl_dataset_sync+0xc zfs`dsl_pool_sync+0x64 zfs`spa_sync+0x1b0 zfs`txg_sync_thread+0x134 unix`thread_start+0x4 22 24 zfs`metaslab_ff_alloc+0x8c zfs`space_map_alloc+0x10 zfs`metaslab_group_alloc+0x1e0 zfs`metaslab_alloc_dva+0x114 zfs`metaslab_alloc+0x2c zfs`zio_dva_allocate+0x4c zfs`zio_write_compress+0x1ec zfs`arc_write+0xe4 zfs`dbuf_sync+0x6c0 zfs`dnode_sync+0x35c zfs`dmu_objset_sync_dnodes+0x6c zfs`dmu_objset_sync+0x54 zfs`dsl_dataset_sync+0xc zfs`dsl_pool_sync+0x64 zfs`spa_sync+0x1b0 zfs`txg_sync_thread+0x134 unix`thread_start+0x4 24 24 genunix`avl_walk+0x80 zfs`metaslab_ff_alloc+0x9c zfs`space_map_alloc+0x10 zfs`metaslab_group_alloc+0x1e0 zfs`metaslab_alloc_dva+0x114 zfs`metaslab_alloc+0x2c zfs`zio_dva_allocate+0x4c zfs`zio_write_compress+0x1ec zfs`arc_write+0xe4 zfs`dbuf_sync+0x6c0 zfs`dnode_sync+0x35c zfs`dmu_objset_sync_dnodes+0x6c zfs`dmu_objset_sync+0x54 zfs`dsl_dataset_sync+0xc zfs`dsl_pool_sync+0x64 zfs`spa_sync+0x1b0 zfs`txg_sync_thread+0x134 unix`thread_start+0x4 26 24 zfs`metaslab_ff_alloc+0xa0 zfs`space_map_alloc+0x10 zfs`metaslab_group_alloc+0x1e0 zfs`metaslab_alloc_dva+0x114 zfs`metaslab_alloc+0x2c zfs`zio_dva_allocate+0x4c zfs`zio_write_compress+0x1ec zfs`arc_write+0xe4 zfs`dbuf_sync+0x6c0 zfs`dnode_sync+0x35c zfs`dmu_objset_sync_dnodes+0x6c zfs`dmu_objset_sync+0x54 zfs`dsl_dataset_sync+0xc zfs`dsl_pool_sync+0x64 zfs`spa_sync+0x1b0 zfs`txg_sync_thread+0x134 unix`thread_start+0x4 37 24 genunix`avl_walk+0x2c zfs`metaslab_ff_alloc+0x9c zfs`space_map_alloc+0x10 zfs`metaslab_group_alloc+0x1e0 zfs`metaslab_alloc_dva+0x114 zfs`metaslab_alloc+0x2c zfs`zio_dva_allocate+0x4c zfs`zio_write_compress+0x1ec zfs`arc_write+0xe4 zfs`dbuf_sync+0x6c0 zfs`dnode_sync+0x35c zfs`dmu_objset_sync_dnodes+0x6c zfs`dmu_objset_sync+0x54 zfs`dsl_dataset_sync+0xc zfs`dsl_pool_sync+0x64 zfs`spa_sync+0x1b0 zfs`txg_sync_thread+0x134 unix`thread_start+0x4 39 24 zfs`metaslab_ff_alloc+0x98 zfs`space_map_alloc+0x10 zfs`metaslab_group_alloc+0x1e0 zfs`metaslab_alloc_dva+0x114 zfs`metaslab_alloc+0x2c zfs`zio_dva_allocate+0x4c zfs`zio_write_compress+0x1ec zfs`arc_write+0xe4 zfs`dbuf_sync+0x6c0 zfs`dnode_sync+0x35c zfs`dmu_objset_sync_dnodes+0x6c zfs`dmu_objset_sync+0x54 zfs`dsl_dataset_sync+0xc zfs`dsl_pool_sync+0x64 zfs`spa_sync+0x1b0 zfs`txg_sync_thread+0x134 unix`thread_start+0x4 40 24 genunix`avl_walk+0x4 zfs`metaslab_ff_alloc+0x9c zfs`space_map_alloc+0x10 zfs`metaslab_group_alloc+0x1e0 zfs`metaslab_alloc_dva+0x114 zfs`metaslab_alloc+0x2c zfs`zio_dva_allocate+0x4c zfs`zio_write_compress+0x1ec zfs`arc_write+0xe4 zfs`dbuf_sync+0x6c0 zfs`dnode_sync+0x35c zfs`dmu_objset_sync_dnodes+0x6c zfs`dmu_objset_sync+0x54 zfs`dsl_dataset_sync+0xc zfs`dsl_pool_sync+0x64 zfs`spa_sync+0x1b0 zfs`txg_sync_thread+0x134 unix`thread_start+0x4 41 24 genunix`avl_walk+0xc zfs`metaslab_ff_alloc+0x9c zfs`space_map_alloc+0x10 zfs`metaslab_group_alloc+0x1e0 zfs`metaslab_alloc_dva+0x114 zfs`metaslab_alloc+0x2c zfs`zio_dva_allocate+0x4c zfs`zio_write_compress+0x1ec zfs`arc_write+0xe4 zfs`dbuf_sync+0x6c0 zfs`dnode_sync+0x35c zfs`dmu_objset_sync_dnodes+0x6c zfs`dmu_objset_sync+0x54 zfs`dsl_dataset_sync+0xc zfs`dsl_pool_sync+0x64 zfs`spa_sync+0x1b0 zfs`txg_sync_thread+0x134 unix`thread_start+0x4 43 24 genunix`avl_walk+0x90 zfs`metaslab_ff_alloc+0x9c zfs`space_map_alloc+0x10 zfs`metaslab_group_alloc+0x1e0 zfs`metaslab_alloc_dva+0x114 zfs`metaslab_alloc+0x2c zfs`zio_dva_allocate+0x4c zfs`zio_write_compress+0x1ec zfs`arc_write+0xe4 
zfs`dbuf_sync+0x6c0 zfs`dnode_sync+0x35c zfs`dmu_objset_sync_dnodes+0x6c zfs`dmu_objset_sync+0x54 zfs`dsl_dataset_sync+0xc zfs`dsl_pool_sync+0x64 zfs`spa_sync+0x1b0 zfs`txg_sync_thread+0x134 unix`thread_start+0x4 48 24 genunix`avl_walk+0xa4 zfs`metaslab_ff_alloc+0x9c zfs`space_map_alloc+0x10 zfs`metaslab_group_alloc+0x1e0 zfs`metaslab_alloc_dva+0x114 zfs`metaslab_alloc+0x2c zfs`zio_dva_allocate+0x4c zfs`zio_write_compress+0x1ec zfs`arc_write+0xe4 zfs`dbuf_sync+0x6c0 zfs`dnode_sync+0x35c zfs`dmu_objset_sync_dnodes+0x6c zfs`dmu_objset_sync+0x54 zfs`dsl_dataset_sync+0xc zfs`dsl_pool_sync+0x64 zfs`spa_sync+0x1b0 zfs`txg_sync_thread+0x134 unix`thread_start+0x4 50 24 zfs`metaslab_ff_alloc+0xb0 zfs`space_map_alloc+0x10 zfs`metaslab_group_alloc+0x1e0 zfs`metaslab_alloc_dva+0x114 zfs`metaslab_alloc+0x2c zfs`zio_dva_allocate+0x4c zfs`zio_write_compress+0x1ec zfs`arc_write+0xe4 zfs`dbuf_sync+0x6c0 zfs`dnode_sync+0x35c zfs`dmu_objset_sync_dnodes+0x6c zfs`dmu_objset_sync+0x54 zfs`dsl_dataset_sync+0xc zfs`dsl_pool_sync+0x64 zfs`spa_sync+0x1b0 zfs`txg_sync_thread+0x134 unix`thread_start+0x4 69 24 genunix`avl_walk+0xa0 zfs`metaslab_ff_alloc+0x9c zfs`space_map_alloc+0x10 zfs`metaslab_group_alloc+0x1e0 zfs`metaslab_alloc_dva+0x114 zfs`metaslab_alloc+0x2c zfs`zio_dva_allocate+0x4c zfs`zio_write_compress+0x1ec zfs`arc_write+0xe4 zfs`dbuf_sync+0x6c0 zfs`dnode_sync+0x35c zfs`dmu_objset_sync_dnodes+0x6c zfs`dmu_objset_sync+0x54 zfs`dsl_dataset_sync+0xc zfs`dsl_pool_sync+0x64 zfs`spa_sync+0x1b0 zfs`txg_sync_thread+0x134 unix`thread_start+0x4 81 24 genunix`avl_walk+0x24 zfs`metaslab_ff_alloc+0x9c zfs`space_map_alloc+0x10 zfs`metaslab_group_alloc+0x1e0 zfs`metaslab_alloc_dva+0x114 zfs`metaslab_alloc+0x2c zfs`zio_dva_allocate+0x4c zfs`zio_write_compress+0x1ec zfs`arc_write+0xe4 zfs`dbuf_sync+0x6c0 zfs`dnode_sync+0x35c zfs`dmu_objset_sync_dnodes+0x6c zfs`dmu_objset_sync+0x54 zfs`dsl_dataset_sync+0xc zfs`dsl_pool_sync+0x64 zfs`spa_sync+0x1b0 zfs`txg_sync_thread+0x134 unix`thread_start+0x4 172 24 zfs`metaslab_ff_alloc+0x84 zfs`space_map_alloc+0x10 zfs`metaslab_group_alloc+0x1e0 zfs`metaslab_alloc_dva+0x114 zfs`metaslab_alloc+0x2c zfs`zio_dva_allocate+0x4c zfs`zio_write_compress+0x1ec zfs`arc_write+0xe4 zfs`dbuf_sync+0x6c0 zfs`dnode_sync+0x35c zfs`dmu_objset_sync_dnodes+0x6c zfs`dmu_objset_sync+0x54 zfs`dsl_dataset_sync+0xc zfs`dsl_pool_sync+0x64 zfs`spa_sync+0x1b0 zfs`txg_sync_thread+0x134 unix`thread_start+0x4 322 24 zfs`metaslab_ff_alloc+0x80 zfs`space_map_alloc+0x10 zfs`metaslab_group_alloc+0x1e0 zfs`metaslab_alloc_dva+0x114 zfs`metaslab_alloc+0x2c zfs`zio_dva_allocate+0x4c zfs`zio_write_compress+0x1ec zfs`arc_write+0xe4 zfs`dbuf_sync+0x6c0 zfs`dnode_sync+0x35c zfs`dmu_objset_sync_dnodes+0x6c zfs`dmu_objset_sync+0x54 zfs`dsl_dataset_sync+0xc zfs`dsl_pool_sync+0x64 zfs`spa_sync+0x1b0 zfs`txg_sync_thread+0x134 unix`thread_start+0x4 749 24 genunix`avl_walk+0x64 zfs`metaslab_ff_alloc+0x9c zfs`space_map_alloc+0x10 zfs`metaslab_group_alloc+0x1e0 zfs`metaslab_alloc_dva+0x114 zfs`metaslab_alloc+0x2c zfs`zio_dva_allocate+0x4c zfs`zio_write_compress+0x1ec zfs`arc_write+0xe4 zfs`dbuf_sync+0x6c0 zfs`dnode_sync+0x35c zfs`dmu_objset_sync_dnodes+0x6c zfs`dmu_objset_sync+0x54 zfs`dsl_dataset_sync+0xc zfs`dsl_pool_sync+0x64 zfs`spa_sync+0x1b0 zfs`txg_sync_thread+0x134 unix`thread_start+0x4 758 24 genunix`avl_walk+0x48 zfs`metaslab_ff_alloc+0x9c zfs`space_map_alloc+0x10 zfs`metaslab_group_alloc+0x1e0 zfs`metaslab_alloc_dva+0x114 zfs`metaslab_alloc+0x2c zfs`zio_dva_allocate+0x4c zfs`zio_write_compress+0x1ec zfs`arc_write+0xe4 
zfs`dbuf_sync+0x6c0 zfs`dnode_sync+0x35c zfs`dmu_objset_sync_dnodes+0x6c zfs`dmu_objset_sync+0x54 zfs`dsl_dataset_sync+0xc zfs`dsl_pool_sync+0x64 zfs`spa_sync+0x1b0 zfs`txg_sync_thread+0x134 unix`thread_start+0x4 761 24 genunix`avl_walk+0x60 zfs`metaslab_ff_alloc+0x9c zfs`space_map_alloc+0x10 zfs`metaslab_group_alloc+0x1e0 zfs`metaslab_alloc_dva+0x114 zfs`metaslab_alloc+0x2c zfs`zio_dva_allocate+0x4c zfs`zio_write_compress+0x1ec zfs`arc_write+0xe4 zfs`dbuf_sync+0x6c0 zfs`dnode_sync+0x35c zfs`dmu_objset_sync_dnodes+0x6c zfs`dmu_objset_sync+0x54 zfs`dsl_dataset_sync+0xc zfs`dsl_pool_sync+0x64 zfs`spa_sync+0x1b0 zfs`txg_sync_thread+0x134 unix`thread_start+0x4 2746 24 genunix`avl_walk+0x44 zfs`metaslab_ff_alloc+0x9c zfs`space_map_alloc+0x10 zfs`metaslab_group_alloc+0x1e0 zfs`metaslab_alloc_dva+0x114 zfs`metaslab_alloc+0x2c zfs`zio_dva_allocate+0x4c zfs`zio_write_compress+0x1ec zfs`arc_write+0xe4 zfs`dbuf_sync+0x6c0 zfs`dnode_sync+0x35c zfs`dmu_objset_sync_dnodes+0x6c zfs`dmu_objset_sync+0x54 zfs`dsl_dataset_sync+0xc zfs`dsl_pool_sync+0x64 zfs`spa_sync+0x1b0 zfs`txg_sync_thread+0x134 unix`thread_start+0x4 2796 bash-3.00# dtrace -n profile-1003''/cpu == 24/{@[cpu,stack()]=count();}'' -n tick-10s''{trunc(@,33);printa(@);exit(0);}'' bash-3.00# This message posted from opensolaris.org
eric kustarz
2007-Feb-20 16:55 UTC
[zfs-discuss] Re: NFS/ZFS performance problems - txg_wait_open() deadlocks?
On Feb 15, 2007, at 6:08 AM, Robert Milkowski wrote:

> Hello eric,
>
> Wednesday, February 14, 2007, 5:04:01 PM, you wrote:
>
> ek> I'm wondering if we can just lower the amount of space we're trying
> ek> to alloc as the pool becomes more fragmented - we'll lose a little
> ek> I/O performance, but it should limit this bug.
>
> Do you think that zfs send|recv for such file systems could help for a
> while? (Would it defragment the data?)

If you were able to send over your complete pool, destroy the existing
one and re-create a new one using recv, then that should help with
fragmentation.  That said, that's a very poor man's defragger.  The
defragmentation should happen automatically, or at least while the pool
is online.

In the absence of a built-in defragger and without a fix for 6495013, i
think the best thing you could do is either add more storage or remove
some data (such as removing some old snapshots or moving some unneeded
data to another system/backup).  Not sure if either of those is
applicable to you.

Should note, ZFS isn't the only COW filesystem that hits this problem.
In WAFL's case, i believe they hit this same problem around 90% capacity,
but they are very nice about detecting it and reporting back to the
admin.

eric
Robert Milkowski
2007-Feb-20 21:12 UTC
[zfs-discuss] Re: NFS/ZFS performance problems - txg_wait_open() deadlocks?
Hello eric,

Tuesday, February 20, 2007, 5:55:47 PM, you wrote:

ek> On Feb 15, 2007, at 6:08 AM, Robert Milkowski wrote:

>> Hello eric,
>>
>> Wednesday, February 14, 2007, 5:04:01 PM, you wrote:
>>
>> ek> I'm wondering if we can just lower the amount of space we're trying
>> ek> to alloc as the pool becomes more fragmented - we'll lose a little
>> ek> I/O performance, but it should limit this bug.
>>
>> Do you think that zfs send|recv for such file systems could help for a
>> while? (Would it defragment the data?)

ek> If you were able to send over your complete pool, destroy the
ek> existing one and re-create a new one using recv, then that should
ek> help with fragmentation.  That said, that's a very poor man's
ek> defragger.  The defragmentation should happen automatically, or at
ek> least while the pool is online.

I was rather thinking about sending all file systems to another
server/pool (I'm in the middle of that process), then deleting the source
file systems and sending the file systems back. Of course there's no
problem with destroying the pool, but I wonder why you think it's needed.
Wouldn't just deleting the file systems be enough?

btw: I've already migrated three file systems that way to an x4500 and so
far they are working great - no CPU usage, much fewer read I/Os (both in
number of I/Os and in volume), and everything is committed exactly every
5s. So I guess there's a high degree of probability it will stay that way
once I migrate them back to the cluster.

ek> In the absence of a built-in defragger and without a fix for 6495013,
ek> i think the best thing you could do is either add more storage or
ek> remove some data (such as removing some old snapshots or moving some
ek> unneeded data to another system/backup).  Not sure if either of those
ek> is applicable to you.

Some time ago removing snapshots helped. Then we stopped creating them,
and now there's nothing really left to remove, so I'm doing the above.

btw: 6495013 - what is it exactly? (bugs.opensolaris.org doesn't show it,
nor does sunsolve.)

ek> Should note, ZFS isn't the only COW filesystem that hits this
ek> problem.  In WAFL's case, i believe they hit this same problem around
ek> 90% capacity, but they are very nice about detecting it and reporting
ek> back to the admin.

Yeah, but with ZFS I hit a really big problem while the pool still has
more than 50% free space (which is almost 700GB of free space). And it's
not only that writes are slow; it's also that during the problem one
whole core is eaten up to 100% all the time. Which raises the suspicion
that perhaps some of this work could be done in parallel and things sped
up a little on a T2000 or v440. A crude workaround, but still: most of
the other cores just do nothing while one core is 100% in sys doing all
those avl_walks (a long journey).

--
Best regards,
Robert                        mailto:rmilkowski at task.gda.pl
                              http://milek.blogspot.com
eric kustarz
2007-Feb-20 22:29 UTC
[zfs-discuss] Re: NFS/ZFS performance problems - txg_wait_open() deadlocks?
> ek> If you were able to send over your complete pool, destroy the
> ek> existing one and re-create a new one using recv, then that should
> ek> help with fragmentation.  That said, that's a very poor man's
> ek> defragger.  The defragmentation should happen automatically, or at
> ek> least while the pool is online.
>
> I was rather thinking about sending all file systems to another
> server/pool (I'm in the middle of that process), then deleting the
> source file systems and sending the file systems back. Of course there's
> no problem with destroying the pool, but I wonder why you think it's
> needed. Wouldn't just deleting the file systems be enough?

Yeah, destroying the filesystems should be enough (i was equating
destroying all filesystems to a zpool destroy).

> btw: I've already migrated three file systems that way to an x4500 and
> so far they are working great - no CPU usage, much fewer read I/Os (both
> in number of I/Os and in volume), and everything is committed exactly
> every 5s. So I guess there's a high degree of probability it will stay
> that way once I migrate them back to the cluster.
>
> ek> In the absence of a built-in defragger and without a fix for
> ek> 6495013, i think the best thing you could do is either add more
> ek> storage or remove some data (such as removing some old snapshots or
> ek> moving some unneeded data to another system/backup).  Not sure if
> ek> either of those is applicable to you.
>
> Some time ago removing snapshots helped. Then we stopped creating them,
> and now there's nothing really left to remove, so I'm doing the above.
>
> btw: 6495013 - what is it exactly? (bugs.opensolaris.org doesn't show
> it, nor does sunsolve.)

Hmm, should be working now:
http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=6495013

6495013 Loops and recursion in metaslab_ff_alloc can kill performance,
even on a pool with lots of free data

eric
Robert Milkowski
2007-Feb-20 22:38 UTC
[zfs-discuss] Re: NFS/ZFS performance problems - txg_wait_open() deadlocks?
Hello eric, Tuesday, February 20, 2007, 11:29:41 PM, you wrote:>> >> ek> If you were able to send over your complete pool, destroy the >> ek> existing one and re-create a new one using recv, then that should >> ek> help with fragmentation. That said, that''s a very poor man''s >> ek> defragger. The defragmentation should happen automatically or at >> ek> least while the pool is online. >> >> I was rather thinking about sending all file systems to another >> server/pool (I''m in a middle of the process), then deleting source >> file systems and send the file systems back. Of course no problem with >> destroying the pool but I wonder why do you think it''s needed? Just >> deleting file systems won''t be enough?ek> Yeah, destroying the filesystems should be enough (i was equating ek> destroying all filesystems to a zpool destroy).>> >> btw: I''ve already migrated three file systems that way to x4500 and >> so far they are working great - no cpu usage, much less read IOs >> (both in # of IO and in volume) and everything is committed >> exactly every 5s. So I guess there''s high degree of probability >> it will stay the same once I migrate them back to cluster. >> >> >> ek> In the absence of a built-in defragger and without a fix for >> 6495013, >> ek> i think the best thing you could do is either add more storage or >> ek> remove some data (such as removing some old snapshots or move some >> ek> unneeded storage to another system/backup). Not sure if either of >> ek> those are applicable to you. >> >> Some time ago removing snapshots helped. Then we stopped creating them >> and not there''s nothing really to remove left so I''m doing above. >> >> btw: 6495013 - what is it exactly (bugs.opensolaris.org doesn''t show >> it either sunsolve).ek> Hmm, should be working now: ek> http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=6495013 ek> 6495013 Loops and recursion in metaslab_ff_alloc can kill ek> performance, even on a pool with lots of free data thanks I''ll ask my case engineer to add a call record for us. -- Best regards, Robert mailto:rmilkowski at task.gda.pl http://milek.blogspot.com