> - at the beginning of a directory, getdents() takes 0.1 .. 1 seconds
>
> - when the directory offset reaches some limit, a single getdents()
>   reaches a constant value of 3 seconds.

Jörg,

I'm looking into it.

My suspicion is that because you're using UFS files as the backing store
for your ZFS storage pool, UFS and ZFS are stepping on each other more
and more as the memory pressure increases. There's an open bug on this --
ZFS needs to cache a little less aggressively to play nicer with other
consumers of memory.

But performance work is often surprising -- it could be something else
entirely. I'll let you know what I find.

Thanks,

Jeff
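To make the per-call measurement concrete: a minimal sketch, assuming
Linux (where getdents64 is reached via syscall(2); Solaris exposes
getdents(2) directly with the same fd/buffer/size shape), that times
each getdents() batch on a large directory:

#define _GNU_SOURCE
#include <fcntl.h>
#include <stdio.h>
#include <sys/syscall.h>
#include <time.h>
#include <unistd.h>

int main(int argc, char **argv)
{
    char buf[65536];
    int fd = open(argc > 1 ? argv[1] : ".", O_RDONLY | O_DIRECTORY);
    if (fd < 0) {
        perror("open");
        return 1;
    }
    for (;;) {
        struct timespec t0, t1;
        clock_gettime(CLOCK_MONOTONIC, &t0);
        long n = syscall(SYS_getdents64, fd, buf, sizeof buf);
        clock_gettime(CLOCK_MONOTONIC, &t1);
        if (n <= 0)             /* 0 = end of directory, -1 = error */
            break;
        printf("getdents: %5ld bytes in %.3f ms\n", n,
            (t1.tv_sec - t0.tv_sec) * 1e3 +
            (t1.tv_nsec - t0.tv_nsec) / 1e6);
    }
    close(fd);
    return 0;
}

Run against a directory with a few hundred thousand entries, this
should show whether the per-call latency really steps up to a constant
once the directory offset passes some limit, as described above.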
> With the T3's cache disabled:
>   ZFS - 17 minutes, 47 seconds real (5:35 user, 0:11 system)
>   UFS - 48 minutes, 28 seconds real (5:38 user, 0:13 system)
>
> With the T3's cache enabled:
>   ZFS - 15 minutes, 30 seconds real (5:49 user, 0:13 system)
>   UFS - 24 minutes, 29 seconds real (5:39 user, 0:13 system)
>
> So realistically ZFS is _significantly_ faster than UFS (for the untar
> at least). It also seems far less reliant on the speed of the underlying
> disk (as is seen by the minimal difference the hardware cache makes).
> ZFS layered on top of lofi layered on top of UFS may be slower, but
> that's not exactly the use case it was designed for!

Thanks for gathering this data, Scott. All very interesting...

Jeff
Jeff Bonwick <bonwick at zion.eng.sun.com> wrote:

> > - at the beginning of a directory, getdents() takes 0.1 .. 1 seconds
> >
> > - when the directory offset reaches some limit, a single getdents()
> >   reaches a constant value of 3 seconds.
>
> Jörg,
>
> I'm looking into it.
>
> My suspicion is that because you're using UFS files as the backing store
> for your ZFS storage pool, UFS and ZFS are stepping on each other
> more and more as the memory pressure increases. There's an open
> bug on this -- ZFS needs to cache a little less aggressively to
> play nicer with other consumers of memory.

My assumption is that ZFS does not cache enough metadata and needs to
read a lot more than UFS does when running find. This assumption seems
to be supported by the tests from Scott Howard <Scott.Howard at Sun.COM>:
he saw increased speed after enabling the disk cache...

> But performance work is often surprising -- it could be something else
> entirely.

Although what I did may not be a test that shows typical behavior, it
seems to show where the problems are located.

Jörg

-- 
EMail: joerg at schily.isdn.cs.tu-berlin.de (home) Jörg Schilling D-13353 Berlin
       js at cs.tu-berlin.de (uni)
       schilling at fokus.fraunhofer.de (work) Blog: http://schily.blogspot.com/
URL:   http://cdrecord.berlios.de/old/private/ ftp://ftp.berlios.de/pub/schily
Jeff Bonwick <bonwick at zion.eng.sun.com> wrote:

> > - at the beginning of a directory, getdents() takes 0.1 .. 1 seconds
> >
> > - when the directory offset reaches some limit, a single getdents()
> >   reaches a constant value of 3 seconds.
>
> My suspicion is that because you're using UFS files as the backing store
> for your ZFS storage pool, UFS and ZFS are stepping on each other
> more and more as the memory pressure increases. There's an open
> bug on this -- ZFS needs to cache a little less aggressively to
> play nicer with other consumers of memory.

OK, back again... the problem is (as expected) not related to the backing
store. I repeated the tests on a real disk and the results are basically
identical. My conclusion is that ZFS either does not cache things at all,
or caches the wrong things.

All tests have been made with freedb-complete-20051104.tar.bz2.

When extracting the archive, ZFS seems to be much faster as long as there
are fewer than ~ 200000 files in a directory. If there are more than
~ 400000 files in a directory, ZFS is significantly slower than UFS.
As long as ZFS is fast, the I/O rate is low; when ZFS becomes slow, the
I/O rate is very high compared to the task.

UFS test results:

star -xp bs=1m fs=32m -no-fsync < /tmp/freedb-complete-20051104.tar.bz2
star: WARNING: Archive is 'bzip2' compressed, trying to use the -bz option.
star: current './' newer.
star: 3094 blocks + 546816 bytes (total of 3244840960 bytes = 3168790.00k).
1:57:16.282r 826.930u 1869.440s 38% 0M 0+0k 0st 0+0io 0pf+0w

sfind . > /dev/null
2:11.517r 3.590u 69.330s 55% 0M 0+0k 0st 0+0io 0pf+0w

find . > /dev/null
2:17.742r 4.690u 73.310s 56% 0M 0+0k 0st 0+0io 0pf+0w

As expected, sfind is a bit faster than Sun find, as sfind uses a modern
algorithm that is needed to guarantee correct functionality even on
deeply nested directories and non-seekable directories.

rm -rf *
deleting: -rf COPYING README blues classical country data folk jazz misc newage reggae rock soundtrack
27:22.836r 40.800u 626.440s 40% 0M 0+0k 0st 0+0io 0pf+0w

The remove time is OK...

star -xp bs=1m fs=32m < /tmp/freedb-complete-20051104.tar.bz2
star: WARNING: Archive is 'bzip2' compressed, trying to use the -bz option.
star: current './' newer.
star: 3094 blocks + 546816 bytes (total of 3244840960 bytes = 3168790.00k).
1:55:20.271r 900.780u 4102.840s 72% 0M 0+0k 0st 0+0io 0pf+0w

A star extraction _with_ fsync(2) for every file is not slower...

ZFS results (done on exactly the same partition as the UFS results above):

star -xp bs=1m fs=32m -no-fsync < /tmp/freedb-complete-20051104.tar.bz2
star: WARNING: Archive is 'bzip2' compressed, trying to use the -bz option.
star: current './' newer.
star: 3094 blocks + 546816 bytes (total of 3244840960 bytes = 3168790.00k).
3:57:08.741r 820.440u 1725.050s 17% 0M 0+0k 0st 0+0io 0pf+0w

This time is roughly twice the UFS extract time. If ZFS did not become
extremely slow on large directories, ZFS could be twice as fast as UFS
on this test.

find . > /dev/null
28:22.793r 7.740u 548.740s 32% 0M 0+0k 0st 0+0io 0pf+0w

Sun find is more than 10x slower than on UFS.

sfind . > /dev/null
1:12:35.216r 6.300u 631.950s 14% 0M 0+0k 0st 0+0io 0pf+0w

sfind is 33x slower than on UFS. sfind is extremely slow although it uses
an optimized, modern tree-walking algorithm. It seems that ZFS is
optimized the wrong way, as it causes a program that is faster on all
other known filesystems, and more POSIX-correct than Sun find, to be
slower than the ancient Sun find.
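For readers wondering what the "modern algorithm" refers to: walkers of
this style descend with relative lookups (changing into each directory)
instead of re-resolving a full path string for every entry. A rough
stand-in for that style -- not sfind's actual code -- is the fts(3)
interface available in BSD and glibc, which also chdirs into directories
by default:

#include <fts.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    char *root[] = { argc > 1 ? argv[1] : ".", NULL };

    /* FTS_PHYSICAL: do not follow symlinks (like find's default) */
    FTS *fts = fts_open(root, FTS_PHYSICAL, NULL);
    if (fts == NULL) {
        perror("fts_open");
        return 1;
    }
    for (FTSENT *e; (e = fts_read(fts)) != NULL; ) {
        if (e->fts_info == FTS_D || e->fts_info == FTS_F)
            puts(e->fts_path);  /* roughly mimics "find . -print" */
    }
    return fts_close(fts) != 0;
}

Because each step is a relative lookup from the current directory, the
walk keeps working on trees deeper than PATH_MAX, which is the
correctness point being made about deeply nested directories.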
Note that the basic idea in the sfind algorithm is used by all modern
tree-walking programs such as sfind, star and GNU tar.

rm -rf *

Well, ZFS is _REALLY_ slow here.... The rm test has been running for
40 minutes and so far only 25% of the files have been removed. I thus
expect a total rm time of 160 minutes on ZFS. This is 6x slower than
on UFS.

Once the remove is finished, I will start the star extract test _with_
an fsync(2) call for every file - which is the default.

Jörg

-- 
EMail: joerg at schily.isdn.cs.tu-berlin.de (home) Jörg Schilling D-13353 Berlin
       js at cs.tu-berlin.de (uni)
       schilling at fokus.fraunhofer.de (work) Blog: http://schily.blogspot.com/
URL:   http://cdrecord.berlios.de/old/private/ ftp://ftp.berlios.de/pub/schily
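Since the only difference between the two star runs is this one call,
here is a minimal sketch of the per-file fsync(2) pattern being tested;
extract_member is a hypothetical helper for illustration, not star's
actual code:

#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/* Hypothetical helper: write one archive member to 'path',
 * then fsync it before close - the star default under test. */
static int extract_member(const char *path, const char *data, size_t len)
{
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0)
        return -1;
    if (write(fd, data, len) != (ssize_t)len || fsync(fd) != 0) {
        close(fd);
        return -1;
    }
    return close(fd);
}

int main(void)
{
    const char msg[] = "hello\n";
    return extract_member("demo.txt", msg, sizeof msg - 1) != 0;
}

The UFS numbers above suggest this per-file barrier costs almost nothing
in real time there; the pending test asks what it costs on ZFS.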
Roch Bourbonnais - Performance Engineering
2005-Nov-25 09:35 UTC
[zfs-discuss] ZFS extremely slow
Not sure if this has been mentioned yet.

Is the working set bigger than available memory?

If so, UFS is being nice by keeping its clean pages on some free list;
I would suspect it never causes the system to run out of memory.
No swapping / good perf.

Then this issue would fall into the ZFS memory management bucket we have
to deal with anyway. The first 200000 files from the archive could mark
the point where memory becomes overcommitted. I'll see if I can run a
test...

-r

____________________________________________________________________________________
Roch Bourbonnais                Sun Microsystems, Icnc-Grenoble
Senior Performance Analyst      180, Avenue De L'Europe, 38330,
                                Montbonnot Saint Martin, France
Performance & Availability Engineering
http://icncweb.france/~rbourbon           http://blogs.sun.com/roller/page/roch
Roch.Bourbonnais at Sun.Com   (+33).4.76.18.83.20
Roch Bourbonnais - Performance Engineering <Roch.Bourbonnais at Sun.COM> wrote:

> Not sure if this has been mentioned yet.
>
> Is the working set bigger than available memory?

If you describe what you understand by the working set, I may comment...

From the observations from me and others, it looks like:

- ZFS may be copying my "gnode" idea from WOFS (my WORM filesystem master
  thesis) and have file names tacked to the rest of the metadata. At
  least it would allow one to understand why archaic code like nftw()
  (used by Sun find) performs better than modern code (used e.g. by
  sfind, star, gtar) that avoids the limitations of nftw().

- ZFS seems to cache metadata and is fast as long as cached data can be
  used, but becomes disproportionately slow when the cache gets a miss.

> If so, UFS is being nice by keeping its clean pages on some
> free list; I would suspect it never causes the system to run
> out of memory. No swapping / good perf.

UFS seems to implement a "nice" caching strategy that still behaves in a
user-friendly way when there is a miss.

> Then this issue would fall into the ZFS memory management
> bucket we have to deal with anyway. The first 200000 files
> from the archive could mark the point where memory becomes
> overcommitted. I'll see if I can run a test...

Well, my test is not testing something that happens every day, but it
demonstrates where ZFS needs further work before calling it mature.

Jörg

-- 
EMail: joerg at schily.isdn.cs.tu-berlin.de (home) Jörg Schilling D-13353 Berlin
       js at cs.tu-berlin.de (uni)
       schilling at fokus.fraunhofer.de (work) Blog: http://schily.blogspot.com/
URL:   http://cdrecord.berlios.de/old/private/ ftp://ftp.berlios.de/pub/schily
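For contrast with the fts(3) stand-in earlier in the thread, the
"archaic" interface mentioned here is nftw(3), which Sun find is said to
be built on: it hands the callback a fully resolved path string for
every entry. A minimal usage sketch (a stand-in, not Sun find's code):

#define _XOPEN_SOURCE 500
#include <ftw.h>
#include <stdio.h>

static int visit(const char *path, const struct stat *sb,
                 int type, struct FTW *ftw)
{
    (void)sb; (void)type; (void)ftw;
    puts(path);                 /* roughly mimics "find . -print" */
    return 0;                   /* 0 = keep walking */
}

int main(int argc, char **argv)
{
    /* 20 = max simultaneously open directory descriptors */
    return nftw(argc > 1 ? argv[1] : ".", visit, 20, FTW_PHYS);
}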
Roch Bourbonnais - Performance Engineering
2005-Nov-25 15:27 UTC
[zfs-discuss] ZFS extremely slow
Joerg Schilling writes:

> Roch Bourbonnais - Performance Engineering <Roch.Bourbonnais at Sun.COM> wrote:
>
> > Not sure if this has been mentioned yet.
> >
> > Is the working set bigger than available memory?
>
> If you describe what you understand by the working set, I may comment...

That would be both the tar.bz2 file size + the on-disk size of the
decompressed data set.

Just to help the investigation, what are the sizes vs available system
memory?

-r
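For reference, the numbers already visible in the thread: star reported
a total of 3244840960 bytes (about 3.0 GB) extracted, and the compressed
archive lives in /tmp, which on Solaris is tmpfs and therefore also
memory-backed; the archive's own size is not stated in the thread.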