Brandorr
2007-Aug-21 04:01 UTC
[zfs-discuss] Is ZFS efficient for large collections of small files?
Is ZFS efficient at handling huge populations of tiny-to-small files - for example, 20 million TIFF images in a collection, each between 5 and 500k in size?

I am asking because I could have sworn that I read somewhere that it isn't, but I can't find the reference.

Thanks,
Brian

--
- Brian Gupta
http://opensolaris.org/os/project/nycosug/
Matthew Ahrens
2007-Aug-21 04:24 UTC
[zfs-discuss] Is ZFS efficient for large collections of small files?
Brandorr wrote:
> Is ZFS efficient at handling huge populations of tiny-to-small files -
> for example, 20 million TIFF images in a collection, each between 5
> and 500k in size?

Do you mean efficient in terms of space used? If so, then in general it is quite efficient. E.g., for files < 128k, space is rounded up only to a multiple of 512 bytes. Around 1k of metadata is consumed per file.

There are, however, a few cases where it will not be optimal. E.g., a 129k file will use up 256k of space. However, you can work around this problem by turning on compression.

> I am asking because I could have sworn that I read somewhere that it
> isn't, but I can't find the reference.

If you find it, let us know.

--matt
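A minimal sketch of the rounding behaviour Matt describes above (default 128k recordsize, 512-byte sector rounding, roughly 1k of metadata per file). The function name and constants are illustrative assumptions, not ZFS code, and compression is ignored:

def estimated_allocation(file_size, recordsize=128 * 1024, sector=512,
                         metadata=1024):
    """Rough estimate of on-disk space for one file, no compression."""
    if file_size <= recordsize:
        # Files up to one record are rounded up to a multiple of 512 bytes.
        data = -(-file_size // sector) * sector
    else:
        # Larger files consume whole recordsize blocks, so a 129k file
        # occupies 256k (the case compression can win back).
        data = -(-file_size // recordsize) * recordsize
    return data + metadata

for size in (5 * 1024, 100 * 1024, 129 * 1024, 500 * 1024):
    print(size, "->", estimated_allocation(size))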
Ralf Ramge
2007-Aug-21 11:37 UTC
[zfs-discuss] Is ZFS efficient for large collections of small files?
Brandorr wrote:
> Is ZFS efficient at handling huge populations of tiny-to-small files -
> for example, 20 million TIFF images in a collection, each between 5
> and 500k in size?
>
> I am asking because I could have sworn that I read somewhere that it
> isn't, but I can't find the reference.

If you're worried about I/O throughput, you should avoid RAIDZ1/2 configurations. Random read performance will be disastrous if you do; I've seen random read rates of less than 1 MB/s on an X4500 with 40 dedicated disks for data storage.

If you don't have to worry about disk space, use mirrors; I got my best results during my extensive X4500 benchmarking sessions when I mirrored single slices instead of complete disks (resulting in 40 2-way mirrors on 40 physical disks, mirroring c0t0d0s0->c0t1d0s1 and c0t1d0s0->c0t0d0s1, and so on). If you're worried about disk space, you should consider striping several instances of RAIDZ1 arrays, each one consisting of three disks or slices. Sequential access will go off a cliff, but random reads will be boosted.

You should also adjust the recordsize. Try to measure the average I/O transaction size. There's a good chance that your I/O performance will be best if you set your recordsize to a smaller value. For instance, if your average file size is 12 KB, try using an 8K or even 4K recordsize; stay away from 16K or higher.

--
Ralf Ramge
Senior Solaris Administrator, SCNA, SCSA

Tel. +49-721-91374-3963
ralf.ramge at webde.de - http://web.de/

1&1 Internet AG
Brauerstraße 48
76135 Karlsruhe

Amtsgericht Montabaur HRB 6484
Vorstand: Henning Ahlert, Ralph Dommermuth, Matthias Ehrlich, Andreas Gauger, Matthias Greve, Robert Hoffmann, Norbert Lang, Achim Weiss
Aufsichtsratsvorsitzender: Michael Scheeren
Mario Goebbels
2007-Aug-21 12:39 UTC
[zfs-discuss] Is ZFS efficient for large collections of small files?
> There are, however, a few cases where it will not be optimal. E.g., a 129k
> file will use up 256k of space. However, you can work around this problem
> by turning on compression.

Doesn't ZFS pack the last block into one of a multiple of 512?

If not, it's a surprise that there isn't a pseudo-compression mode available to deal with that.

-mg
Łukasz K
2007-Aug-21 13:26 UTC
[zfs-discuss] Odp: Is ZFS efficient for large collections of small files?
> Is ZFS efficient at handling huge populations of tiny-to-small files -
> for example, 20 million TIFF images in a collection, each between 5
> and 500k in size?
>
> I am asking because I could have sworn that I read somewhere that it
> isn't, but I can't find the reference.

It depends on what type of I/O you will do. If you only read, there is no problem. Writing (and removing) small files will fragment the pool, and that will be a huge problem. You can set the recordsize to 32k (or 16k) and it will help for some time.

Lukas
Eric Schrock
2007-Aug-21 15:40 UTC
[zfs-discuss] Is ZFS efficient for large collections of small files?
On Tue, Aug 21, 2007 at 02:39:00PM +0200, Mario Goebbels wrote:
> > There are, however, a few cases where it will not be optimal. E.g., a 129k
> > file will use up 256k of space. However, you can work around this problem
> > by turning on compression.
>
> Doesn't ZFS pack the last block into one of a multiple of 512?
>
> If not, it's a surprise that there isn't a pseudo-compression mode
> available to deal with that.
>
> -mg

This would certainly be nice. See:

6279263 We should have a "zeroes" compression algorithm

- Eric

--
Eric Schrock, Solaris Kernel Development
http://blogs.sun.com/eschrock
Brandorr
2007-Aug-21 15:52 UTC
[zfs-discuss] Is ZFS efficient for large collections of small files?
On 8/21/07, Matthew Ahrens <Matthew.Ahrens at sun.com> wrote:
> Brandorr wrote:
> > Is ZFS efficient at handling huge populations of tiny-to-small files -
> > for example, 20 million TIFF images in a collection, each between 5
> > and 500k in size?
>
> Do you mean efficient in terms of space used? If so, then in general it is
> quite efficient. E.g., for files < 128k, space is rounded up only to a
> multiple of 512 bytes. Around 1k of metadata is consumed per file.
>
> There are, however, a few cases where it will not be optimal. E.g., a 129k
> file will use up 256k of space. However, you can work around this problem
> by turning on compression.

You answer part of my question perfectly (regarding space utilization). The other part was related to performance, but someone else has answered that.

> > I am asking because I could have sworn that I read somewhere that it
> > isn't, but I can't find the reference.
>
> If you find it, let us know.

It turns out that what I read was related to the fact that RAID-Z is a suboptimal volume layout for the "large amounts of small files" use case. (If you still want, I can probably find it, as I think it was an opensolaris.org discussion thread.) (Richard reminded me of this.)

One issue: the person looking at doing this has 8 x 750GB drives, so some sort of parity-based RAID striping will be required. Since this will be suboptimal for any file system, I think the performance impact can be dealt with.

I'd like to thank everyone for their responses. Based on the discussion we've had and my reviews of the XFS and ReiserFS file systems, I feel very confident recommending ZFS as a superior alternative. (ZFS data integrity, scalability and compression are the key winners here.) (Now I just need to research compatibility between Linux and Solaris NFS, and whether his 3Ware card works with OpenSolaris.)

Thanks,
Brian

P.S. - Is there a ZFS FAQ somewhere?

--
- Brian Gupta
http://opensolaris.org/os/project/nycosug/
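For reference, the usable-capacity side of that trade-off for 8 x 750GB drives is simple arithmetic, using the usual rules of thumb (mirror pairs keep half, raidz1 loses one disk per group). The layout names below are just illustrations, and the figures ignore metadata and formatting overhead:

# Usable-capacity rules of thumb for the 8 x 750 GB drives mentioned above.
DISKS, SIZE_GB = 8, 750

layouts = {
    "4 x 2-way mirror":  (DISKS // 2) * SIZE_GB,   # half the raw space
    "2 x 4-disk raidz1": 2 * (4 - 1) * SIZE_GB,    # one parity disk per group
    "1 x 8-disk raidz1": (DISKS - 1) * SIZE_GB,    # one parity disk total
}

for name, usable in layouts.items():
    print(f"{name:18s} ~{usable} GB usable")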
Cindy.Swearingen at Sun.COM
2007-Aug-21 15:57 UTC
[zfs-discuss] Is ZFS efficient for large collections of small files?
The OpenSolaris ZFS FAQ is here:

http://www.opensolaris.org/os/community/zfs/faq

Other resources are listed here:

http://www.opensolaris.org/os/community/zfs/links/

Cindy

Brandorr wrote:
> P.S. - Is there a ZFS FAQ somewhere?
Matthew Ahrens
2007-Aug-21 19:28 UTC
[zfs-discuss] Is ZFS efficient for large collections of small files?
Mario Goebbels wrote:
>> There are, however, a few cases where it will not be optimal. E.g., a 129k
>> file will use up 256k of space. However, you can work around this problem
>> by turning on compression.
>
> Doesn't ZFS pack the last block into one of a multiple of 512?

Unfortunately, not yet. See:

5003563 use smaller "tail block" for last block of object

--matt
Roch - PAE
2007-Aug-22 08:49 UTC
[zfs-discuss] Is ZFS efficient for large collections of small files?
> Brandorr wrote:
> > Is ZFS efficient at handling huge populations of tiny-to-small files -
> > for example, 20 million TIFF images in a collection, each between 5
> > and 500k in size?
> >
> > I am asking because I could have sworn that I read somewhere that it
> > isn't, but I can't find the reference.
>
> If you're worried about I/O throughput, you should avoid RAIDZ1/2
> configurations. Random read performance will be disastrous if you do;

A raid-z group can do one random read per I/O latency. So 8 disks (each capable of 200 IOPS) in a zpool split into 2 raid-z groups should be able to serve 400 files per second. If you need to serve more files, then you need more disks, or you need to use mirroring. With mirroring, I'd expect to serve 1600 files (8*200).

This model only applies to random reading, not to sequential access, nor to any type of write load. For small-file creation ZFS can be extremely efficient, in that it can create more than one file per I/O. It should also approach disk streaming performance for write loads.

> I've seen random read rates of less than 1 MB/s on an X4500 with 40
> dedicated disks for data storage.

It would be nice to see if the above model matches your data. If you have all 40 disks in a single raid-z group (an anti best practice), I'd expect <200 files served per second, and if the files were of 5K average size then I'd expect that 1 MB/s.

http://www.solarisinternals.com/wiki/index.php/ZFS_Best_Practices_Guide

> If you don't have to worry about disk space, use mirrors;

Right on!

> I got my best results during my extensive X4500 benchmarking sessions
> when I mirrored single slices instead of complete disks (resulting in 40
> 2-way mirrors on 40 physical disks, mirroring c0t0d0s0->c0t1d0s1 and
> c0t1d0s0->c0t0d0s1, and so on). If you're worried about disk space, you
> should consider striping several instances of RAIDZ1 arrays, each one
> consisting of three disks or slices. Sequential access will go off a
> cliff, but random reads will be boosted.

Writes should be good if not great, no matter what the workload is. I'm interested in data that shows otherwise.

> You should also adjust the recordsize.

For small files I certainly would not. Small files are stored as a single record when they are smaller than the recordsize. A single record is good in my book. Not sure when one would want otherwise for small files.

> Try to measure the average I/O transaction size. There's a good chance
> that your I/O performance will be best if you set your recordsize to a
> smaller value. For instance, if your average file size is 12 KB, try
> using an 8K or even 4K recordsize; stay away from 16K or higher.

Tuning the recordsize is currently only recommended for databases (large files) with fixed-record access. Again, it's interesting input if tuning the recordsize helped another type of workload.

-r
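A minimal sketch of the random-read model Roch outlines above, using the 200 IOPS-per-disk figure from his example; the function and the layout labels are illustrative only, not a benchmark:

IOPS_PER_DISK = 200  # figure from Roch's example

def files_per_second(disks, layout, disks_per_group=None):
    """Rough random-read files/sec: raid-z serves one read per group per
    I/O latency, mirrors serve one per disk."""
    if layout == "mirror":
        return disks * IOPS_PER_DISK
    if layout == "raidz":
        groups = disks // disks_per_group
        return groups * IOPS_PER_DISK
    raise ValueError(layout)

print(files_per_second(8, "raidz", disks_per_group=4))    # ~400, as in the mail
print(files_per_second(8, "mirror"))                      # ~1600
print(files_per_second(40, "raidz", disks_per_group=40))  # ~200, the X4500 case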
Roch - PAE
2007-Aug-22 09:04 UTC
[zfs-discuss] Odp: Is ZFS efficient for large collections of small files?
Łukasz K writes:
> > Is ZFS efficient at handling huge populations of tiny-to-small files -
> > for example, 20 million TIFF images in a collection, each between 5
> > and 500k in size?
> >
> > I am asking because I could have sworn that I read somewhere that it
> > isn't, but I can't find the reference.
>
> It depends on what type of I/O you will do. If you only read, there is no
> problem. Writing (and removing) small files will fragment the pool, and
> that will be a huge problem. You can set the recordsize to 32k (or 16k)
> and it will help for some time.

Comparing a recordsize of 16K with 128K:

Files in the range [0, 16K]: no difference.
Files in the range [16K, 128K]: more efficient to use 128K.
Files in the range [128K, 500K]: more efficient to use 16K.

In the [16K, 128K] range the actual file size is rounded up to a multiple of 16K with a 16K recordsize, and to the nearest 512B boundary with a 128K recordsize. This will be fairly catastrophic for files slightly above 16K (rounded up to 32K vs 16K+512B).

In the [128K, 500K] range we're hurt by this:

5003563 use smaller "tail block" for last block of object

Until it is fixed, then yes, files stored using 16K records are rounded up more tightly; metadata probably eats part of the gains.

-r
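A minimal sketch comparing the two recordsizes over those ranges, assuming single-record files round up to 512 bytes and multi-record files round up to whole records (the 5003563 behaviour); metadata is ignored and the helper is illustrative, not ZFS code:

def allocated(file_size, recordsize):
    """Rough allocated bytes for one file under the given recordsize."""
    if file_size <= recordsize:
        return -(-file_size // 512) * 512            # tail rounded to 512B
    return -(-file_size // recordsize) * recordsize  # whole records only

for size_kb in (8, 17, 100, 200, 300):
    size = size_kb * 1024
    print(f"{size_kb:3d}K file: 16K recordsize -> {allocated(size, 16*1024)//1024}K, "
          f"128K recordsize -> {allocated(size, 128*1024)//1024}K")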
Robert Milkowski
2007-Aug-23 07:30 UTC
[zfs-discuss] Odp: Is ZFS efficient for large collections of small files?
Hello Roch,

Wednesday, August 22, 2007, 10:13:10 AM, you wrote:

RP> Łukasz K writes:
>> > Is ZFS efficient at handling huge populations of tiny-to-small files -
>> > for example, 20 million TIFF images in a collection, each between 5
>> > and 500k in size?
>> >
>> > I am asking because I could have sworn that I read somewhere that it
>> > isn't, but I can't find the reference.
>>
>> It depends on what type of I/O you will do. If you only read, there is no
>> problem. Writing (and removing) small files will fragment the pool, and
>> that will be a huge problem. You can set the recordsize to 32k (or 16k)
>> and it will help for some time.

RP> Comparing a recordsize of 16K with 128K:

RP> Files in the range [0, 16K]: no difference.
RP> Files in the range [16K, 128K]: more efficient to use 128K.
RP> Files in the range [128K, 500K]: more efficient to use 16K.

RP> In the [16K, 128K] range the actual file size is rounded up to a
RP> multiple of 16K with a 16K recordsize, and to the nearest 512B boundary
RP> with a 128K recordsize. This will be fairly catastrophic for files
RP> slightly above 16K (rounded up to 32K vs 16K+512B).

RP> In the [128K, 500K] range we're hurt by this:

RP> 5003563 use smaller "tail block" for last block of object

RP> Until it is fixed, then yes, files stored using 16K records are rounded
RP> up more tightly; metadata probably eats part of the gains.

Roch, I guess Lukasz was talking about some problems we're seeing here, which are partly caused by utilizing all-128KB slabs, so forcing the file system to 16KB helps here (for CPU) as a workaround. Sure, we're talking about lots and lots of files, really small.

Perhaps someone could work with Lukasz and investigate it more closely. Lukasz posted some more detailed info not long ago - unfortunately there was no feedback.

--
Best regards,
Robert Milkowski                     mailto:rmilkowski at task.gda.pl
                                     http://milek.blogspot.com