Hello zfs-discuss,

Relatively low traffic to the pool, but sync takes far too long to
complete, and other operations are also not that fast.

Disks are on a 3510 array. zil_disable=1.

bash-3.00# ptime sync

real     1:21.569
user        0.001
sys         0.027

During the sync, zpool iostat and vmstat look like this:

               capacity     operations    bandwidth
pool         used  avail   read  write   read  write
----------  -----  -----  -----  -----  -----  -----
f3-1         504G   720G    370    859   995K  10.2M
misc        20.6M  52.0G      0      0      0      0
----------  -----  -----  -----  -----  -----  -----
f3-1         504G   720G    697    929  2.91M  10.5M
misc        20.6M  52.0G      0      0      0      0
----------  -----  -----  -----  -----  -----  -----
f3-1         504G   720G  1.21K     90  6.33M  1.57M
misc        20.6M  52.0G      0      0      0      0
----------  -----  -----  -----  -----  -----  -----
f3-1         504G   720G  1.38K      6  6.83M   256K
misc        20.6M  52.0G      0      0      0      0
----------  -----  -----  -----  -----  -----  -----
f3-1         504G   720G  1.29K      0  4.10M   127K
misc        20.6M  52.0G      0      0      0      0
----------  -----  -----  -----  -----  -----  -----
f3-1         504G   720G  1.35K      0  6.98M   127K
misc        20.6M  52.0G      0      0      0      0
----------  -----  -----  -----  -----  -----  -----
f3-1         504G   720G   1012    229  3.06M   631K
misc        20.6M  52.0G      0      0      0      0
----------  -----  -----  -----  -----  -----  -----
f3-1         504G   720G    683  1.74K  7.00M  13.8M
misc        20.6M  52.0G      0      0      0      0
----------  -----  -----  -----  -----  -----  -----
f3-1         504G   720G    966    722  3.00M  6.63M
misc        20.6M  52.0G      0      0      0      0
----------  -----  -----  -----  -----  -----  -----
f3-1         504G   720G    702    134  1.85M  1.96M
misc        20.6M  52.0G      0      0      0      0
----------  -----  -----  -----  -----  -----  -----
f3-1         504G   720G     1K     78  3.05M   880K
misc        20.6M  52.0G      0      0      0      0
----------  -----  -----  -----  -----  -----  -----
f3-1         504G   720G    899    154  2.59M  1.45M
misc        20.6M  52.0G      0      0      0      0
----------  -----  -----  -----  -----  -----  -----
f3-1         504G   720G  1.00K      0  4.35M      0
misc        20.6M  52.0G      0      0      0      0
----------  -----  -----  -----  -----  -----  -----
^C

 kthr      memory            page            disk          faults      cpu
 r b w   swap    free   re   mf pi po fr de sr m0 m1 m2 m1   in   sy    cs us sy id
 0 0 0 8266008 1100560   0    0  0  0  0  0  0  0  0  0  0 2392  589  9592  0 21 79
 1 0 0 8266008 1100560   0    0  0  0  0  0  0  0  0  0  0 3909 1458 13330  0 39 61
 0 0 0 8265400 1099952   0    0  0  0  0  0  0  0  0  0  0 6892 1104 21023  0 47 53
 0 0 0 8262648 1097200   0    0  0  0  0  0  0 65 64 65  0 7904 1327 22531  0 50 50
 0 0 0 8259496 1094048   0    0  0  0  0  0  0 16 16 16  0 7037  986 20123  0 50 50
 1 0 0 8258536 1093088   0    0  0  0  0  0  0  0  0  0  0 4363 1084 12107  0 39 61
 0 0 0 8250856 1085408   0    0  0  0  0  0  0  0  0  0  0 4378  414 16436  0 30 70
 0 0 0 8247888 1080736 580 1048  0  0  0  0  0  0  0  0  0 7283 2409 21480  4 35 61
 0 0 0 8248600 1083152   0    0  0  0  0  0  0  0  0  0  0 3045 1184 10368  0 36 64
 0 0 0 8248600 1083152   0    0  0  0  0  0  0  0  0  0  0 1659 1543  5847  0 34 66
 0 0 0 8248600 1083152   0    0  0  0  0  0  0  0  0  0  0 1755 1743  6639  0 35 65
 1 0 0 8248600 1083152   0    0  0  0  0  0  0  0  0  0  0 2723 1259  7973  0 36 64
 0 0 0 8250280 1085040   0    0  0  0  0  0  0  0  0  0  0 1104 1308  3944  0 30 69
 0 0 0 8250280 1085040   0    0  0  0  0  0  0  0  0  0  0 2348  705  9212  0 29 70
 0 0 0 8250016 1084776   0    0  0  0  0  0  0  0  0  0  0 5152  384 17753  0 22 78
 1 0 0 8249928 1084688   0    0  0  0  0  0  0  0  0  0  0 2397 1193  7311  0 30 70
^C

bash-3.00# uname -a
SunOS nfs-10-1.srv 5.10 Generic_125100-04 sun4u sparc SUNW,Sun-Fire-V440
bash-3.00# showrev -p|grep IDR
Patch: IDR126199-01 Obsoletes: Requires: 120473-05 Incompatibles: 120473-06 Packages: SUNWzfskr
bash-3.00#

--
Best regards,
 Robert                        mailto:rmilkowski at task.gda.pl
                                     http://milek.blogspot.com
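[For context: on this vintage of Solaris 10, zil_disable was usually set
one of two ways -- a sketch, not a recommendation, since disabling the
ZIL gives up synchronous-write semantics for NFS clients:]

    # /etc/system -- persistent, takes effect at the next boot
    set zfs:zil_disable = 1

    # or live on a running kernel via mdb (reverts at reboot); note it
    # only takes effect for datasets mounted after the change
    echo "zil_disable/W 1" | mdb -kw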
On 4/23/07, Robert Milkowski <rmilkowski at task.gda.pl> wrote:
>
> Relatively low traffic to the pool, but sync takes far too long to
> complete, and other operations are also not that fast.
>
> Disks are on a 3510 array. zil_disable=1.
>
> bash-3.00# ptime sync
>
> real     1:21.569
> user        0.001
> sys         0.027

Hey, that is *quick*!

On Friday I typed sync mid-afternoon. Nothing had happened a couple of
hours later when I went home. It looked as though it had finished by
11pm, when I checked in from home.

This was on a thumper running S10U3. As far as I could tell, all writes
to the pool stopped completely. There were applications trying to
write, but they had just stopped (and picked up later in the evening).
A fairly consistent few hundred K per second of reads; no writes; and
pretty low system load.

It did recover, but a write latency of a few hours is rather
undesirable.

What on earth was it doing?

--
-Peter Tribble
http://www.petertribble.co.uk/ - http://ptribble.blogspot.com/
Hello Peter,

Monday, April 23, 2007, 9:27:56 PM, you wrote:

PT> On 4/23/07, Robert Milkowski <rmilkowski at task.gda.pl> wrote:
>>
>> Relatively low traffic to the pool, but sync takes far too long to
>> complete, and other operations are also not that fast.
>>
>> Disks are on a 3510 array. zil_disable=1.
>>
>> bash-3.00# ptime sync
>>
>> real     1:21.569
>> user        0.001
>> sys         0.027

PT> Hey, that is *quick*!

PT> On Friday I typed sync mid-afternoon. Nothing had happened a couple
PT> of hours later when I went home. It looked as though it had
PT> finished by 11pm, when I checked in from home.

PT> This was on a thumper running S10U3. As far as I could tell, all
PT> writes to the pool stopped completely. There were applications
PT> trying to write, but they had just stopped (and picked up later in
PT> the evening). A fairly consistent few hundred K per second of
PT> reads; no writes; and pretty low system load.

PT> It did recover, but a write latency of a few hours is rather
PT> undesirable.

PT> What on earth was it doing?

I've seen it too :(

Other than that, I can see that even while reads and writes are going
on, ZFS issues its write cache flush commands minutes apart instead of
at the 5s default, and nfsd goes crazy then.

ZFS commands like zpool status, zfs list, etc. can then hang for
hours... nothing unusual in iostat.

--
Best regards,
 Robert                        mailto:rmilkowski at task.gda.pl
                                     http://milek.blogspot.com
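[One way to measure the flush cadence Robert describes: a minimal DTrace
sketch, assuming this build still exposes zio_ioctl(), the entry point
ZFS of this era uses to send DKIOCFLUSHWRITECACHE to its vdevs --
function names vary between releases, so treat the probe name as an
assumption:]

    # Count cache-flush requests and print a running total every 10s;
    # long stretches of zeros followed by bursts would match the
    # minutes-apart behaviour described above.
    dtrace -n 'fbt::zio_ioctl:entry { @flushes = count(); }' \
           -n 'tick-10s { printa("flushes in last 10s: %@d\n", @flushes); trunc(@flushes); }'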
On Mon, Apr 23, 2007 at 20:27:56 +0100, Peter Tribble wrote:
: On 4/23/07, Robert Milkowski <rmilkowski at task.gda.pl> wrote:
: >Relatively low traffic to the pool, but sync takes far too long to
: >complete, and other operations are also not that fast.
: >Disks are on a 3510 array. zil_disable=1.
: >bash-3.00# ptime sync
: >real     1:21.569
: >user        0.001
: >sys         0.027

: Hey, that is *quick*!

: On Friday I typed sync mid-afternoon. Nothing had happened a couple of
: hours later when I went home. It looked as though it had finished by
: 11pm, when I checked in from home.

: This was on a thumper running S10U3. As far as I could tell, all writes
: to the pool stopped completely. There were applications trying to write,
: but they had just stopped (and picked up later in the evening). A fairly
: consistent few hundred K per second of reads; no writes; and pretty low
: system load.

I'm glad I'm not the only one to have seen this.

I'm currently playing with ZFS on a T2000 with 24x500GB SATA discs in
an external array that presents as SCSI. After having much 'fun' with
the Solaris SCSI driver not handling LUNs >2TB, I reconfigured the
array to present as one target with 24 LUNs, one per disc, and threw
ZFS at it in a raidz2 configuration. I admit this isn't optimal, but it
has the behaviour I wanted: namely lots of space with a little
redundancy for safety.

Having had said 'fun' with the sd driver, I thought I'd thoroughly
check large-object handling, and started eight 'dd if=/dev/zero's
before retiring to the pub and leaving it overnight. The next morning,
I discovered a bunch of rather large files, 340GB in size.

Everything seemed OK, so I issued an 'rm *', expecting it to return
rather quickly. How wrong I was. It took a minute (61s from memory) to
delete a single 320GB file, which flattened the SCSI bus with
4.5MB/s/disc of reads (as reported by iostat -x), during which time all
writes were suspended. This is not good. Once that had finished, a
'ptime sync' sat for 25 minutes running at about 1MB/s/disc. Again, all
reads.

Given what I intend to use this filesystem for -- dropping all the
BBC's Freeview muxes to disc in 24-hour chunks -- performance on large
objects is rather important to me. I've reconfigured to 3x(7+1) raidz,
and this has helped a lot (as I expected it would), but it's still not
great having multi-second write locks when deleting 16GB objects.

100MB/s write speed and 200MB/s read speed isn't bad, though. Quite
impressed with that.

: It did recover, but a write latency of a few hours is rather
: undesirable.

To put it mildly.

: What on earth was it doing?

I wish I knew.

Anyone any ideas on how to optimise it further? I'm using the defaults
(whatever's created by an 8GB RAM T2000 with 8 1GHz cores); no
compression, no nothing.

--
Dickon Hood

Due to digital rights management, my .sig is temporarily unavailable.
Normal service will be resumed as soon as possible. We apologise for
the inconvenience in the meantime.

No virus was found in this outgoing message as I didn't bother looking.
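[For reference, the 3x(7+1) raidz layout described above would be built
with something like the following -- the device names are hypothetical,
assuming one target presenting 24 LUNs:]

    # Three 8-disc raidz top-level vdevs (7 data + 1 parity each).
    # Each block lands in a single 8-disc group, so the raidz stripes
    # stay much narrower than one wide 24-disc group would give,
    # reducing per-block parity and reconstruction work.
    zpool create tank \
        raidz c2t0d0  c2t0d1  c2t0d2  c2t0d3  c2t0d4  c2t0d5  c2t0d6  c2t0d7 \
        raidz c2t0d8  c2t0d9  c2t0d10 c2t0d11 c2t0d12 c2t0d13 c2t0d14 c2t0d15 \
        raidz c2t0d16 c2t0d17 c2t0d18 c2t0d19 c2t0d20 c2t0d21 c2t0d22 c2t0d23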
Hello Robert,

Monday, April 23, 2007, 10:44:00 PM, you wrote:

RM> Hello Peter,

RM> Monday, April 23, 2007, 9:27:56 PM, you wrote:

PT>> On 4/23/07, Robert Milkowski <rmilkowski at task.gda.pl> wrote:
>>>
>>> Relatively low traffic to the pool, but sync takes far too long to
>>> complete, and other operations are also not that fast.
>>>
>>> Disks are on a 3510 array. zil_disable=1.
>>>
>>> bash-3.00# ptime sync
>>>
>>> real     1:21.569
>>> user        0.001
>>> sys         0.027

PT>> Hey, that is *quick*!

PT>> On Friday I typed sync mid-afternoon. Nothing had happened a couple
PT>> of hours later when I went home. It looked as though it had
PT>> finished by 11pm, when I checked in from home.

PT>> This was on a thumper running S10U3. As far as I could tell, all
PT>> writes to the pool stopped completely. There were applications
PT>> trying to write, but they had just stopped (and picked up later in
PT>> the evening). A fairly consistent few hundred K per second of
PT>> reads; no writes; and pretty low system load.

PT>> It did recover, but a write latency of a few hours is rather
PT>> undesirable.

PT>> What on earth was it doing?

RM> I've seen it too :(

RM> Other than that, I can see that even while reads and writes are
RM> going on, ZFS issues its write cache flush commands minutes apart
RM> instead of at the 5s default, and nfsd goes crazy then.

RM> ZFS commands like zpool status, zfs list, etc. can then hang for
RM> hours... nothing unusual in iostat.

Also, stopping nfsd can take a dozen minutes to complete.

I've never observed this with nfsd/ufs.

--
Best regards,
 Robert                        mailto:rmilkowski at task.gda.pl
                                     http://milek.blogspot.com
Hello Robert,

Monday, April 23, 2007, 11:12:39 PM, you wrote:

RM> Hello Robert,

RM> Monday, April 23, 2007, 10:44:00 PM, you wrote:

RM>> Hello Peter,

RM>> Monday, April 23, 2007, 9:27:56 PM, you wrote:

PT>>> On 4/23/07, Robert Milkowski <rmilkowski at task.gda.pl> wrote:
>>>>
>>>> Relatively low traffic to the pool, but sync takes far too long to
>>>> complete, and other operations are also not that fast.
>>>>
>>>> Disks are on a 3510 array. zil_disable=1.
>>>>
>>>> bash-3.00# ptime sync
>>>>
>>>> real     1:21.569
>>>> user        0.001
>>>> sys         0.027

PT>>> Hey, that is *quick*!

PT>>> On Friday I typed sync mid-afternoon. Nothing had happened a
PT>>> couple of hours later when I went home. It looked as though it
PT>>> had finished by 11pm, when I checked in from home.

PT>>> This was on a thumper running S10U3. As far as I could tell, all
PT>>> writes to the pool stopped completely. There were applications
PT>>> trying to write, but they had just stopped (and picked up later
PT>>> in the evening). A fairly consistent few hundred K per second of
PT>>> reads; no writes; and pretty low system load.

PT>>> It did recover, but a write latency of a few hours is rather
PT>>> undesirable.

PT>>> What on earth was it doing?

RM>> I've seen it too :(

RM>> Other than that, I can see that even while reads and writes are
RM>> going on, ZFS issues its write cache flush commands minutes apart
RM>> instead of at the 5s default, and nfsd goes crazy then.

RM>> ZFS commands like zpool status, zfs list, etc. can then hang for
RM>> hours... nothing unusual in iostat.

RM> Also, stopping nfsd can take a dozen minutes to complete.

RM> I've never observed this with nfsd/ufs.

Run on the server itself, on ZFS:

bash-3.00# dtrace -n 'fbt::fop_*:entry { self->t = timestamp; }' \
           -n 'fbt::fop_*:return /self->t/ { @[probefunc] = quantize((timestamp - self->t) / 1000000000); self->t = 0; }' \
           -n 'tick-10s { printa(@); }'

[after some time]
[only the longer ops shown]

  fop_readdir
           value  ------------- Distribution ------------- count
              -1 |                                         0
               0 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ 35895
               1 |                                         81
               2 |                                         4
               4 |                                         0

  fop_mkdir
           value  ------------- Distribution ------------- count
              -1 |                                         0
               0 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ 864
               1 |                                         9
               2 |                                         5
               4 |                                         0
               8 |                                         0
              16 |                                         1
              32 |                                         2
              64 |                                         2
             128 |                                         0

  fop_space
           value  ------------- Distribution ------------- count
              -1 |                                         0
               0 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ 426
               1 |                                         0
               2 |                                         0
               4 |                                         0
               8 |                                         0
              16 |                                         0
              32 |                                         0
              64 |                                         0
             128 |                                         3
             256 |                                         0

  fop_lookup
           value  ------------- Distribution ------------- count
              -1 |                                         0
               0 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ 1181242
               1 |                                         311
               2 |                                         47
               4 |                                         3
               8 |                                         0

  fop_read
           value  ------------- Distribution ------------- count
              -1 |                                         0
               0 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ 100799
               1 |                                         26
               2 |                                         1
               4 |                                         3
               8 |                                         5
              16 |                                         5
              32 |                                         9
              64 |                                         3
             128 |                                         3
             256 |                                         3
             512 |                                         0

  fop_remove
           value  ------------- Distribution ------------- count
              -1 |                                         0
               0 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ 16085
               1 |                                         43
               2 |                                         6
               4 |                                         0
               8 |                                         0
              16 |                                         1
              32 |                                         29
              64 |                                         54
             128 |                                         75
             256 |                                         0

  fop_create
           value  ------------- Distribution ------------- count
              -1 |                                         0
               0 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ 21883
               1 |@                                        300
               2 |                                         243
               4 |                                         118
               8 |                                         31
              16 |                                         15
              32 |                                         69
              64 |                                         228
             128 |@                                        359
             256 |                                         1
             512 |                                         0

  fop_symlink
           value  ------------- Distribution ------------- count
              -1 |                                         0
               0 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ 8067
               1 |@                                        215
               2 |@                                        183
               4 |                                         114
               8 |                                         47
              16 |                                         6
              32 |                                         35
              64 |@                                        180
             128 |@@@                                      689
             256 |                                         2
             512 |                                         0

  fop_write
           value  ------------- Distribution ------------- count
              -1 |                                         0
               0 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ 134052
               1 |                                         174
               2 |                                         20
               4 |                                         1
               8 |                                         3
              16 |                                         179
              32 |                                         148
              64 |                                         412
             128 |                                         632
             256 |                                         0
^C

And the same environment, but
on UFS (both are NFS servers, on the same hardware):

bash-3.00# dtrace -n 'fbt::fop_*:entry { self->t = timestamp; }' \
           -n 'fbt::fop_*:return /self->t/ { @[probefunc] = quantize((timestamp - self->t) / 1000000000); self->t = 0; }' \
           -n 'tick-10s { printa(@); }'

[after some time]
[only ops over 1s shown]

  fop_putpage
           value  ------------- Distribution ------------- count
              -1 |                                         0
               0 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ 540731
               1 |                                         1
               2 |                                         0

  fop_read
           value  ------------- Distribution ------------- count
              -1 |                                         0
               0 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ 122344
               1 |                                         4
               2 |                                         6
               4 |                                         0
               8 |                                         0
              16 |                                         0
              32 |                                         0
              64 |                                         0
             128 |                                         0
             256 |                                         1
             512 |                                         0
^C

Well, this looks much better on ufs/nfsd than on zfs/nfsd. The hardware
is the same, the workload is the same, at the same time -- and the ZFS
server is the one with zil_disable=1. Under a smaller load ZFS rocks;
under a higher load it "suxx" :( At least in an nfsd environment.

--
Best regards,
 Robert                        mailto:rmilkowski at task.gda.pl
                                     http://milek.blogspot.com
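[Note the quantize buckets above are in whole seconds, because of the
divide by 1000000000. A variant of the same one-liner at millisecond
resolution -- a sketch, nothing else changed -- would separate 100ms
outliers from genuine multi-second hangs:]

    # Same VOP-latency histogram, but bucketed in milliseconds so
    # sub-second outliers show up instead of collapsing into bucket 0.
    dtrace -n 'fbt::fop_*:entry { self->t = timestamp; }' \
           -n 'fbt::fop_*:return /self->t/ { @[probefunc] = quantize((timestamp - self->t) / 1000000); self->t = 0; }' \
           -n 'tick-10s { printa(@); }'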
Dickon Hood wrote:
> [snip]
>
> I'm currently playing with ZFS on a T2000 with 24x500GB SATA discs in
> an external array that presents as SCSI. After having much 'fun' with
> the Solaris SCSI driver not handling LUNs >2TB

That should work if you have the latest KJP and friends. (Actually, it
should have been working for a while, so if not....) What release are
you on?
On Mon, Apr 23, 2007 at 17:43:31 -0400, Torrey McMahon wrote:
: Dickon Hood wrote:
: >[snip]
: >I'm currently playing with ZFS on a T2000 with 24x500GB SATA discs in
: >an external array that presents as SCSI. After having much 'fun' with
: >the Solaris SCSI driver not handling LUNs >2TB

: That should work if you have the latest KJP and friends. (Actually, it
: should have been working for a while, so if not....) What release are
: you on?

Google suggested it may or may not, depending on how lucky I was. I
assume I was just unlucky, or didn't find the correct set of patches.
Actually, I thought I had at one point, but writes past the first 2TB
returned I/O errors.

I tried every recentish version on our Jumpstart server -- 0305, 0606,
and 1106 -- with and without the latest 10_Recommended patch cluster,
and with various other sd patches I could find. Which versions exactly,
I couldn't honestly say; I gave up.

1106 out of the box won't even see the SCSI card with a 2TB LUN, which
has some interesting side effects when installing: the expansion cards
appear first, and if the installer can't see one, suddenly your boot
devices change once the system is patched.

I got one combination -- sorry, I don't recall which, but I think it
was an 0606 with a patch -- to see the device, but as I say, writes
beyond 2TB failed with an I/O error. This is unhelpful.

I gave up and restructured the array to export all the discs
individually, as I said. AIUI, that's better for ZFS anyway.

--
Dickon Hood

Due to digital rights management, my .sig is temporarily unavailable.
Normal service will be resumed as soon as possible. We apologise for
the inconvenience in the meantime.

No virus was found in this outgoing message as I didn't bother looking.
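[A quicker way to reproduce the >2TB write failure than filling the LUN
overnight with dd streams is to seek straight past the boundary on the
raw device. A sketch: the device path is hypothetical, and this writes
to the raw LUN, destroying whatever is there, so only use a scratch
device:]

    # oseek counts output blocks of the 1024k block size, so 2097153
    # blocks places the single 1MB write just past the 2TiB mark; on a
    # broken driver stack this fails with an I/O error immediately.
    dd if=/dev/zero of=/dev/rdsk/c2t0d0s0 bs=1024k oseek=2097153 count=1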