> - at the beginning of a directory, getdents() takes 0.1 .. 1 seconds
>
> - when the directory offset reaches some limit, a single getdents()
>   reaches a constant value of 3 seconds.

Jörg,

I'm looking into it.

My suspicion is that because you're using UFS files as the backing store
for your ZFS storage pool, UFS and ZFS are stepping on each other more
and more as the memory pressure increases. There's an open bug on this --
ZFS needs to cache a little less aggressively to play nicer with other
consumers of memory.

But performance work is often surprising -- it could be something else
entirely. I'll let you know what I find.

Thanks,

Jeff
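To make the per-call measurement concrete: a minimal sketch, assuming
Linux (where getdents64 is reached via syscall(2); Solaris exposes
getdents(2) directly with the same fd/buffer/size shape), that times
each getdents() batch on a large directory:

#define _GNU_SOURCE
#include <fcntl.h>
#include <stdio.h>
#include <sys/syscall.h>
#include <time.h>
#include <unistd.h>

int main(int argc, char **argv)
{
    char buf[65536];
    int fd = open(argc > 1 ? argv[1] : ".", O_RDONLY | O_DIRECTORY);
    if (fd < 0) {
        perror("open");
        return 1;
    }
    for (;;) {
        struct timespec t0, t1;
        clock_gettime(CLOCK_MONOTONIC, &t0);
        long n = syscall(SYS_getdents64, fd, buf, sizeof buf);
        clock_gettime(CLOCK_MONOTONIC, &t1);
        if (n <= 0)             /* 0 = end of directory, -1 = error */
            break;
        printf("getdents: %5ld bytes in %.3f ms\n", n,
            (t1.tv_sec - t0.tv_sec) * 1e3 +
            (t1.tv_nsec - t0.tv_nsec) / 1e6);
    }
    close(fd);
    return 0;
}

Run against a directory with a few hundred thousand entries, this
should show whether the per-call latency really steps up to a constant
once the directory offset passes some limit, as described above.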
> With the T3's cache disabled:
>   ZFS - 17 minutes, 47 seconds real (5:35 user, 0:11 system)
>   UFS - 48 minutes, 28 seconds real (5:38 user, 0:13 system)
>
> With the T3's cache enabled:
>   ZFS - 15 minutes, 30 seconds real (5:49 user, 0:13 system)
>   UFS - 24 minutes, 29 seconds real (5:39 user, 0:13 system)
>
> So realistically ZFS is _significantly_ faster than UFS (for the untar
> at least). It also seems far less reliant on the speed of the underlying
> disk (as is seen by the minimal difference the hardware cache makes).
> ZFS layered on top of lofi layered on top of UFS may be slower, but
> that's not exactly the use case it was designed for!

Thanks for gathering this data, Scott. All very interesting...

Jeff
Jeff Bonwick <bonwick at zion.eng.sun.com> wrote:

> > - at the beginning of a directory, getdents() takes 0.1 .. 1 seconds
> >
> > - when the directory offset reaches some limit, a single getdents()
> >   reaches a constant value of 3 seconds.
>
> Jörg,
>
> I'm looking into it.
>
> My suspicion is that because you're using UFS files as the backing store
> for your ZFS storage pool, UFS and ZFS are stepping on each other
> more and more as the memory pressure increases. There's an open
> bug on this -- ZFS needs to cache a little less aggressively to
> play nicer with other consumers of memory.

My assumption is that ZFS does not cache enough metadata and needs to
read a lot more than UFS does when running find. This assumption seems
to be supported by the tests from Scott Howard <Scott.Howard at Sun.COM>:
he saw increased speed after enabling the disk cache...

> But performance work is often surprising -- it could be something else
> entirely.

Although what I did may not be a test that shows typical behavior, it
seems to show where the problems are located.

Jörg

-- 
EMail: joerg at schily.isdn.cs.tu-berlin.de (home) Jörg Schilling D-13353 Berlin
       js at cs.tu-berlin.de (uni)
       schilling at fokus.fraunhofer.de (work) Blog: http://schily.blogspot.com/
URL:   http://cdrecord.berlios.de/old/private/ ftp://ftp.berlios.de/pub/schily
Jeff Bonwick <bonwick at zion.eng.sun.com> wrote:

> > - at the beginning of a directory, getdents() takes 0.1 .. 1 seconds
> >
> > - when the directory offset reaches some limit, a single getdents()
> >   reaches a constant value of 3 seconds.
>
> My suspicion is that because you're using UFS files as the backing store
> for your ZFS storage pool, UFS and ZFS are stepping on each other
> more and more as the memory pressure increases. There's an open
> bug on this -- ZFS needs to cache a little less aggressively to
> play nicer with other consumers of memory.

OK, back again... the problem is (as expected) not related to the backing
store. I repeated the tests on a real disk and the results are basically
identical. My conclusion is that ZFS either does not cache things at all,
or caches the wrong things.

All tests have been made with freedb-complete-20051104.tar.bz2.

When extracting the archive, ZFS seems to be much faster as long as there
are fewer than ~ 200000 files in a directory. If there are more than
~ 400000 files in a directory, ZFS is significantly slower than UFS.
As long as ZFS is fast, the I/O rate is low; when ZFS becomes slow, the
I/O rate is very high compared to the task.

UFS test results:

star -xp bs=1m fs=32m -no-fsync < /tmp/freedb-complete-20051104.tar.bz2
star: WARNING: Archive is 'bzip2' compressed, trying to use the -bz option.
star: current './' newer.
star: 3094 blocks + 546816 bytes (total of 3244840960 bytes = 3168790.00k).
1:57:16.282r 826.930u 1869.440s 38% 0M 0+0k 0st 0+0io 0pf+0w

sfind . > /dev/null
2:11.517r 3.590u 69.330s 55% 0M 0+0k 0st 0+0io 0pf+0w

find . > /dev/null
2:17.742r 4.690u 73.310s 56% 0M 0+0k 0st 0+0io 0pf+0w

As expected, sfind is a bit faster than Sun find, as sfind uses a modern
algorithm that is needed to guarantee correct functionality even on
deeply nested directories and non-seekable directories.

rm -rf *
deleting: -rf COPYING README blues classical country data folk jazz misc newage reggae rock soundtrack
27:22.836r 40.800u 626.440s 40% 0M 0+0k 0st 0+0io 0pf+0w

The remove time is OK...

star -xp bs=1m fs=32m < /tmp/freedb-complete-20051104.tar.bz2
star: WARNING: Archive is 'bzip2' compressed, trying to use the -bz option.
star: current './' newer.
star: 3094 blocks + 546816 bytes (total of 3244840960 bytes = 3168790.00k).
1:55:20.271r 900.780u 4102.840s 72% 0M 0+0k 0st 0+0io 0pf+0w

A star extraction _with_ fsync(2) for every file is not slower...

ZFS results (done on exactly the same partition as the UFS results above):

star -xp bs=1m fs=32m -no-fsync < /tmp/freedb-complete-20051104.tar.bz2
star: WARNING: Archive is 'bzip2' compressed, trying to use the -bz option.
star: current './' newer.
star: 3094 blocks + 546816 bytes (total of 3244840960 bytes = 3168790.00k).
3:57:08.741r 820.440u 1725.050s 17% 0M 0+0k 0st 0+0io 0pf+0w

This time is roughly twice the UFS extract time. If ZFS did not become
extremely slow on large directories, ZFS could be twice as fast as UFS
on this test.

find . > /dev/null
28:22.793r 7.740u 548.740s 32% 0M 0+0k 0st 0+0io 0pf+0w

Sun find is more than 10x slower than on UFS.

sfind . > /dev/null
1:12:35.216r 6.300u 631.950s 14% 0M 0+0k 0st 0+0io 0pf+0w

sfind is 33x slower than on UFS. sfind is extremely slow although it uses
an optimized, modern tree-walking algorithm. It seems that ZFS is
optimized the wrong way, as it causes a program that is faster on all
other known filesystems, and more POSIX-correct than Sun find, to be
slower than the ancient Sun find.
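For readers wondering what the "modern algorithm" refers to: walkers of
this style descend with relative lookups (changing into each directory)
instead of re-resolving a full path string for every entry. A rough
stand-in for that style -- not sfind's actual code -- is the fts(3)
interface available in BSD and glibc, which also chdirs into directories
by default:

#include <fts.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    char *root[] = { argc > 1 ? argv[1] : ".", NULL };

    /* FTS_PHYSICAL: do not follow symlinks (like find's default) */
    FTS *fts = fts_open(root, FTS_PHYSICAL, NULL);
    if (fts == NULL) {
        perror("fts_open");
        return 1;
    }
    for (FTSENT *e; (e = fts_read(fts)) != NULL; ) {
        if (e->fts_info == FTS_D || e->fts_info == FTS_F)
            puts(e->fts_path);  /* roughly mimics "find . -print" */
    }
    return fts_close(fts) != 0;
}

Because each step is a relative lookup from the current directory, the
walk keeps working on trees deeper than PATH_MAX, which is the
correctness point being made about deeply nested directories.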
Note that the basic idea in the sfind algorithm is used by all modern
tree-walking programs such as sfind, star and GNU tar.

rm -rf *

Well, ZFS is _REALLY_ slow here.... The rm test has been running for
40 minutes and so far only 25% of the files have been removed. I thus
expect a total rm time of 160 minutes on ZFS. This is 6x slower than
on UFS.

Once the remove is finished, I will start the star extract test _with_
an fsync(2) call for every file - which is the default.

Jörg

-- 
EMail: joerg at schily.isdn.cs.tu-berlin.de (home) Jörg Schilling D-13353 Berlin
       js at cs.tu-berlin.de (uni)
       schilling at fokus.fraunhofer.de (work) Blog: http://schily.blogspot.com/
URL:   http://cdrecord.berlios.de/old/private/ ftp://ftp.berlios.de/pub/schily
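Since the only difference between the two star runs is this one call,
here is a minimal sketch of the per-file fsync(2) pattern being tested;
extract_member is a hypothetical helper for illustration, not star's
actual code:

#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/* Hypothetical helper: write one archive member to 'path',
 * then fsync it before close - the star default under test. */
static int extract_member(const char *path, const char *data, size_t len)
{
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0)
        return -1;
    if (write(fd, data, len) != (ssize_t)len || fsync(fd) != 0) {
        close(fd);
        return -1;
    }
    return close(fd);
}

int main(void)
{
    const char msg[] = "hello\n";
    return extract_member("demo.txt", msg, sizeof msg - 1) != 0;
}

The UFS numbers above suggest this per-file barrier costs almost nothing
in real time there; the pending test asks what it costs on ZFS.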
Roch Bourbonnais - Performance Engineering
2005-Nov-25 09:35 UTC
[zfs-discuss] ZFS extremely slow
Not sure if this has been mentioned yet.

Is the working set bigger than available memory?

If so, UFS is being nice by keeping its clean pages on some free list;
I would suspect it never causes the system to run out of memory.
No swapping / good perf.

Then this issue would fall into the ZFS memory management bucket we have
to deal with anyway. The first 200000 files from the archive could mark
the point where memory becomes overcommitted. I'll see if I can run a
test...

-r

____________________________________________________________________________________
Roch Bourbonnais                Sun Microsystems, Icnc-Grenoble
Senior Performance Analyst      180, Avenue De L'Europe, 38330,
                                Montbonnot Saint Martin, France
Performance & Availability Engineering
http://icncweb.france/~rbourbon           http://blogs.sun.com/roller/page/roch
Roch.Bourbonnais at Sun.Com   (+33).4.76.18.83.20
Roch Bourbonnais - Performance Engineering <Roch.Bourbonnais at Sun.COM> wrote:

> Not sure if this has been mentioned yet.
>
> Is the working set bigger than available memory?

If you describe what you understand by the working set, I may comment...

From the observations from me and others, it looks like:

- ZFS may be copying my "gnode" idea from WOFS (my WORM filesystem master
  thesis) and have file names tacked to the rest of the metadata. At
  least it would allow one to understand why archaic code like nftw()
  (used by Sun find) performs better than modern code (used e.g. by
  sfind, star, gtar) that avoids the limitations of nftw().

- ZFS seems to cache metadata and is fast as long as cached data can be
  used, but becomes disproportionately slow when the cache gets a miss.

> If so, UFS is being nice by keeping its clean pages on some
> free list; I would suspect it never causes the system to run
> out of memory. No swapping / good perf.

UFS seems to implement a "nice" caching strategy that still behaves in a
user-friendly way when there is a miss.

> Then this issue would fall into the ZFS memory management
> bucket we have to deal with anyway. The first 200000 files
> from the archive could mark the point where memory becomes
> overcommitted. I'll see if I can run a test...

Well, my test is not testing something that happens every day, but it
demonstrates where ZFS needs further work before calling it mature.

Jörg

-- 
EMail: joerg at schily.isdn.cs.tu-berlin.de (home) Jörg Schilling D-13353 Berlin
       js at cs.tu-berlin.de (uni)
       schilling at fokus.fraunhofer.de (work) Blog: http://schily.blogspot.com/
URL:   http://cdrecord.berlios.de/old/private/ ftp://ftp.berlios.de/pub/schily
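For contrast with the fts(3) stand-in earlier in the thread, the
"archaic" interface mentioned here is nftw(3), which Sun find is said to
be built on: it hands the callback a fully resolved path string for
every entry. A minimal usage sketch (a stand-in, not Sun find's code):

#define _XOPEN_SOURCE 500
#include <ftw.h>
#include <stdio.h>

static int visit(const char *path, const struct stat *sb,
                 int type, struct FTW *ftw)
{
    (void)sb; (void)type; (void)ftw;
    puts(path);                 /* roughly mimics "find . -print" */
    return 0;                   /* 0 = keep walking */
}

int main(int argc, char **argv)
{
    /* 20 = max simultaneously open directory descriptors */
    return nftw(argc > 1 ? argv[1] : ".", visit, 20, FTW_PHYS);
}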
Roch Bourbonnais - Performance Engineering
2005-Nov-25 15:27 UTC
[zfs-discuss] ZFS extremely slow
Joerg Schilling writes:

> Roch Bourbonnais - Performance Engineering <Roch.Bourbonnais at Sun.COM> wrote:
>
> > Not sure if this has been mentioned yet.
> >
> > Is the working set bigger than available memory?
>
> If you describe what you understand by the working set, I may comment...

That would be both the tar.bz2 file size + the on-disk size of the
decompressed data set.

Just to help the investigation, what are the sizes vs available system
memory?

-r
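For reference, the numbers already visible in the thread: star reported
a total of 3244840960 bytes (about 3.0 GB) extracted, and the compressed
archive lives in /tmp, which on Solaris is tmpfs and therefore also
memory-backed; the archive's own size is not stated in the thread.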