Hi,

I'd like to compress highly compressible (~4x) data on a file server using ZFS compression, and still get good transfer speed. The users are transferring several GB of data (typically 8-10 GB). The host is an X4150 with 16 GB of RAM.

Looking at the ZFS layers described at http://www.opensolaris.org/os/community/zfs/source/ , it seems that the ZIL could play the role of a write cache, as it is able to store the data uncompressed on a disk before passing it to the ZIO layer for compression and transfer to the final drives. Fortunately, I have two idle drives in the server that I can use as log devices for my data pool.

Unfortunately, adding them has no effect on write performance, so I guess I was wrong in thinking that the ZIL can cache the data before compression.

So is it possible to use the ZIL to cache the data before compression? Did I miss an option somewhere?

Thanks,

Gaëtan

-- 
Gaëtan Lehmann
Biologie du Développement et de la Reproduction
INRA de Jouy-en-Josas (France)
tel: +33 1 34 65 29 66    fax: 01 34 65 29 09
http://voxel.jouy.inra.fr  http://www.itk.org
http://www.mandriva.org  http://www.bepo.fr
Darren J Moffat
2009-Jul-07 11:41 UTC
[zfs-discuss] Write cache for compressed file system
Gaëtan Lehmann wrote:
> I'd like to compress highly compressible (~4x) data on a file server
> using ZFS compression, and still get good transfer speed. The users are
> transferring several GB of data (typically 8-10 GB). The host is an
> X4150 with 16 GB of RAM.

What protocol is being used for file transfer? FTP, sftp, NFSv3, NFSv4, CIFS/SMB, iSCSI, FCoE, other? It is really important to know that.

Also really important: what is the data access pattern? Is it read-mostly or write-mostly? Are the files that are written accessed for read soon after? Are the reads sequential or random?

What is "good" by your definition?

What kind of networking do you have, 1G or 10G? What about the clients? What OS are the clients running?

> Looking at the ZFS layers described at
> http://www.opensolaris.org/os/community/zfs/source/ , it seems that the
> ZIL could play this role, as it is able to store the data uncompressed
> on a disk before passing it to the ZIO layer for compression and
> transfer to the final drives.

The ZIL is not a cache; it is used to provide the synchronous semantics required by calls such as fsync() and by NFS.

> Fortunately, I have two idle drives in the server that I can use as log
> devices for my data pool.

To be useful as a log device it must be faster than the pool devices. You may see some benefit using a 15k RPM drive if your main pool devices are 7200 RPM.

> Unfortunately, it has no effect on write performance, so I guess I was
> wrong in thinking that the ZIL can cache the data before compression.

However, it very much depends on your workload (which you didn't describe). You may need a fast (and expensive) SSD, or you may not have a workload that is helped by a separate log device at all.

> So is it possible to use the ZIL to cache the data before compression?
> Did I miss an option somewhere?

The fact that the ZIL doesn't compress data is not a feature; it is a very low-level implementation detail.

-- 
Darren J Moffat
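For reference, attaching separate log devices of the kind discussed above is done with `zpool add`. A minimal sketch, assuming a pool named `tank` and two spare drives `c1t2d0`/`c1t3d0` (both names are placeholders, not from the thread):

```shell
# Attach the two idle drives as a mirrored slog for pool "tank".
# Pool and device names are hypothetical; substitute your own.
zpool add tank log mirror c1t2d0 c1t3d0

# Confirm the "logs" vdev now appears in the pool layout.
zpool status tank
```

As Darren notes, a slog only absorbs synchronous writes (fsync(), NFS commits); it does not buffer bulk asynchronous writes ahead of compression.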
Hi Darren,

On 7 Jul 2009, at 13:41, Darren J Moffat wrote:

> What protocol is being used for file transfer?
>
> FTP, sftp, NFSv3, NFSv4, CIFS/SMB, iSCSI, FCoE, other?

There will be two kinds of transfer protocol once in production: CIFS and one specific to the application. But for a quick test, the transfer was made with scp.

> Also really important: what is the data access pattern? Is it
> read-mostly or write-mostly? Are the files that are written accessed
> for read soon after? Are the reads sequential or random?

It is read-mostly. Files are usually rarely accessed, but are likely to be accessed several times once they begin to be accessed. The reads are sequential. The data set is transferred to the server, and the files are not modified after that (or very rarely). The whole data set is transferred at once.

> What is "good" by your definition?

The data set is transferred in a little less than 3 minutes without compression; that's good! With compression, the data set is transferred in 15 minutes. I was hoping that a disk write cache could keep the transfer speed close to the 3 minutes obtained without compression.

> What kind of networking do you have, 1G or 10G? What about the clients?

1G on both sides.

> What OS are the clients running?

Windows, Mac and Linux. The test was made with a Linux client.

> The ZIL is not a cache; it is used to provide the synchronous semantics
> required by calls such as fsync() and by NFS.
>
> [...]
>
> However, it very much depends on your workload (which you didn't
> describe). You may need a fast (and expensive) SSD, or you may not have
> a workload that is helped by a separate log device at all.

My workload is very simple: the users copy approximately 10 GB of data to the server, and then only read it from time to time. During the copy to the server, the transfer is CPU bound, and there is a lot of CPU time available when there is no copy to the server. Using a disk to store the uncompressed data, as I guess is done by the memory cache, might have helped. I thought the ZIL might have played this role.

I understand that this is very specific to a compressed filesystem, but this caching behavior would greatly enhance this kind of workload. For now, I guess I have to tune the compression rate and/or increase the amount of RAM so that ZFS can cache more data.

Regards,

Gaëtan
Darren J Moffat
2009-Jul-07 13:21 UTC
[zfs-discuss] Write cache for compressed file system
Gaëtan Lehmann wrote:
> There will be two kinds of transfer protocol once in production: CIFS
> and one specific to the application. But for a quick test, the transfer
> was made with scp.

CIFS and scp are very different protocols with very different performance characteristics.

> The data set is transferred in a little less than 3 minutes without
> compression; that's good! With compression, the data set is transferred
> in 15 minutes. I was hoping that a disk write cache could keep the
> transfer speed close to the 3 minutes obtained without compression.

That is a big difference, and without seeing data my initial thought is that the system is CPU bound, particularly if the only difference really was a compressed ZFS dataset versus an uncompressed one.

Is "transfer" in this case a read from the server or a write to it?

> My workload is very simple: the users copy approximately 10 GB of data
> to the server, and then only read it from time to time. During the copy
> to the server, the transfer is CPU bound, and there is a lot of CPU
> time available when there is no copy to the server.

Are you talking here about the server's CPU or the clients'?

> Using a disk to store the uncompressed data, as I guess is done by the
> memory cache, might have helped. I thought the ZIL might have played
> this role.

I think you are maybe confusing the ZIL and the L2ARC.

What compression algorithm are you using? The default "on" value of lzjb, or are you doing something like gzip-9?

-- 
Darren J Moffat
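The compression level Darren is asking about matters because CPU cost rises steeply with gzip level. A rough userland analogue of the trade-off, using the ordinary gzip CLI on a synthetic compressible file (not ZFS itself, just an illustration):

```shell
# Generate ~1 MB of highly repetitive (hence very compressible) data.
yes "this line compresses very well" | head -c 1000000 > sample.dat

# Compare output sizes at the fastest and slowest gzip levels.
gzip -1 -c sample.dat | wc -c
gzip -9 -c sample.dat | wc -c

# Compare the CPU time each level burns.
time gzip -1 -c sample.dat > /dev/null
time gzip -9 -c sample.dat > /dev/null
```

On real, less redundant data the ratio gain from higher levels is usually modest while the CPU cost is not, which is why lzjb (or a lower gzip level) is the usual compromise.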
On 7 Jul 2009, at 15:21, Darren J Moffat wrote:

> Is "transfer" in this case a read from the server or a write to it?

It is a write to the server. The server is CPU bound because of the compression.

> Are you talking here about the server's CPU or the clients'?

I'm talking about the server's CPU. The client is not CPU bound by scp.

> I think you are maybe confusing the ZIL and the L2ARC.

I think the L2ARC is for reading data; I'd like an L2ARC for writing data.

> What compression algorithm are you using? The default "on" value of
> lzjb, or are you doing something like gzip-9?

gzip-6. There is no speed problem with lzjb, but also not the same compression ratio :-)

Gaëtan
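For completeness, the trade-off being discussed is set per dataset via the `compression` property. A sketch, assuming a dataset named `tank/data` (a hypothetical name, not from the thread):

```shell
# Heavier ratio, heavier CPU (the setting used in this thread):
zfs set compression=gzip-6 tank/data

# Much lighter-weight alternative:
#   zfs set compression=lzjb tank/data

# Only blocks written after the change are affected; the achieved
# ratio is reported by the read-only compressratio property.
zfs get compression,compressratio tank/data
```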
Darren J Moffat
2009-Jul-07 13:54 UTC
[zfs-discuss] Write cache for compressed file system
Gaëtan Lehmann wrote:
> It is a write to the server. The server is CPU bound because of the
> compression.
>
> I'm talking about the server's CPU. The client is not CPU bound by scp.
>
> I think the L2ARC is for reading data; I'd like an L2ARC for writing
> data.

Correct, the L2ARC is for reads. I think what you actually want is not a cache at all, but not to be CPU bound by gzip-6.

> gzip-6. There is no speed problem with lzjb, but also not the same
> compression ratio :-)

What build of OpenSolaris are you running?

The fix for "6812655 need larger kmem caches for newer workloads" might help, but you need to be running build 114 or higher, which means you need to be using the pkg.opensolaris.org/dev repository, not the /release one.

It could also be "6586537 async zio taskqs can block out userland commands", which is fixed in build 115 and was the "real" fix.

This is guesswork, though, since I haven't seen perf data from your particular system.

-- 
Darren J Moffat
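The move to a /dev build that Darren suggests would, on an OpenSolaris 2008.11 image, look roughly like this (the repository URL is from the thread; the publisher name and exact steps may vary between builds):

```shell
# Repoint the preferred publisher at the dev repository.
pfexec pkg set-publisher -O http://pkg.opensolaris.org/dev opensolaris.org

# Pull the newer build (114+) into a new boot environment, then reboot.
pfexec pkg image-update
```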
On 7 Jul 2009, at 15:54, Darren J Moffat wrote:

> What build of OpenSolaris are you running?
>
> The fix for "6812655 need larger kmem caches for newer workloads" might
> help, but you need to be running build 114 or higher, which means you
> need to be using the pkg.opensolaris.org/dev repository, not the
> /release one.
>
> It could also be "6586537 async zio taskqs can block out userland
> commands", which is fixed in build 115 and was the "real" fix.
>
> This is guesswork, though, since I haven't seen perf data from your
> particular system.

I think it's 2008.11. I won't be able to access that host until next week. I'll update the OS and post the result on the list.

Thanks,

Gaëtan