Hi,

I'd like to compress highly compressible (~4x) data on a file server using ZFS compression, and still get good transfer speed. The users are transferring several GB of data (typically 8-10 GB). The host is an X4150 with 16 GB of RAM.

Looking at the ZFS layers described at http://www.opensolaris.org/os/community/zfs/source/ , it seems that the ZIL could play the role of a write cache, as it is able to store the data uncompressed on a disk before passing it to the ZIO layer for compression and transfer to the final drives. Fortunately, I have two idle drives in the server that I can use as log devices for my data pool.

Unfortunately, adding them has no effect on write performance, so I guess I was wrong in thinking that the ZIL can cache the data before compression.

So is it possible to use the ZIL to cache the data before compression? Did I miss an option somewhere?

Thanks,

Gaëtan

-- 
Gaëtan Lehmann
Biologie du Développement et de la Reproduction
INRA de Jouy-en-Josas (France)
tel: +33 1 34 65 29 66    fax: 01 34 65 29 09
http://voxel.jouy.inra.fr  http://www.itk.org
http://www.mandriva.org  http://www.bepo.fr
Darren J Moffat
2009-Jul-07 11:41 UTC
[zfs-discuss] Write cache for compressed file system
Gaëtan Lehmann wrote:
> I'd like to compress highly compressible (~4x) data on a file server
> using ZFS compression, and still get good transfer speed. The users are
> transferring several GB of data (typically 8-10 GB). The host is an
> X4150 with 16 GB of RAM.

What protocol is being used for file transfer? FTP, sftp, NFSv3, NFSv4, CIFS/SMB, iSCSI, FCoE, other? It is really important to know that.

Also really important: what is the data access pattern? Is it read-mostly or write-mostly? Are the files that are written accessed for read soon after? Are the reads sequential or random?

What is "good" by your definition?

What kind of networking do you have, 1G or 10G? What about the clients? What OS are the clients running?

> Looking at the ZFS layers described at
> http://www.opensolaris.org/os/community/zfs/source/ , it seems that the
> ZIL could play this role, as it is able to store the data uncompressed
> on a disk before passing it to the ZIO layer for compression and
> transfer to the final drives.

The ZIL is not a cache; it is used to provide the synchronous semantics required by calls such as fsync() and by NFS.

> Fortunately, I have two idle drives in the server that I can use as log
> devices for my data pool.

To be useful as a log device it must be faster than the pool devices. You may see some benefit using a 15k RPM drive if your main pool devices are 7200 RPM.

> Unfortunately, it has no effect on write performance, so I guess I was
> wrong in thinking that the ZIL can cache the data before compression.

However, it very much depends on your workload (which you didn't describe). You may need a fast (and expensive) SSD, or you may not have a workload that is helped by a separate log device at all.

> So is it possible to use the ZIL to cache the data before compression?
> Did I miss an option somewhere?

The fact that the ZIL doesn't compress data is not a feature; it is a very low-level implementation detail.

-- 
Darren J Moffat
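For reference, attaching separate log devices of the kind discussed above is done with `zpool add`. A minimal sketch, assuming a pool named `tank` and two spare drives `c1t2d0`/`c1t3d0` (both names are placeholders, not from the thread):

```shell
# Attach the two idle drives as a mirrored slog for pool "tank".
# Pool and device names are hypothetical; substitute your own.
zpool add tank log mirror c1t2d0 c1t3d0

# Confirm the "logs" vdev now appears in the pool layout.
zpool status tank
```

As Darren notes, a slog only absorbs synchronous writes (fsync(), NFS commits); it does not buffer bulk asynchronous writes ahead of compression.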
Hi Darren,

On 7 Jul 2009, at 13:41, Darren J Moffat wrote:

> What protocol is being used for file transfer?
>
> FTP, sftp, NFSv3, NFSv4, CIFS/SMB, iSCSI, FCoE, other?

There will be two kinds of transfer protocol once in production: CIFS and one specific to the application. But for a quick test, the transfer was made with scp.

> Also really important: what is the data access pattern? Is it
> read-mostly or write-mostly? Are the files that are written accessed
> for read soon after? Are the reads sequential or random?

It is read-mostly. Files are usually rarely accessed, but are likely to be accessed several times once they begin to be accessed. The reads are sequential. The data set is transferred to the server, and the files are not modified after that (or very rarely). The whole data set is transferred at once.

> What is "good" by your definition?

The data set is transferred in a little less than 3 minutes without compression; that's good! With compression, the data set is transferred in 15 minutes. I was hoping that a disk write cache could keep the transfer speed close to the 3 minutes obtained without compression.

> What kind of networking do you have, 1G or 10G? What about the clients?

1G on both sides.

> What OS are the clients running?

Windows, Mac and Linux. The test was made with a Linux client.

> The ZIL is not a cache; it is used to provide the synchronous semantics
> required by calls such as fsync() and by NFS.
>
> [...]
>
> However, it very much depends on your workload (which you didn't
> describe). You may need a fast (and expensive) SSD, or you may not have
> a workload that is helped by a separate log device at all.

My workload is very simple: the users copy approximately 10 GB of data to the server, and then only read it from time to time. During the copy to the server, the transfer is CPU bound, and there is a lot of CPU time available when there is no copy to the server. Using a disk to store the uncompressed data, as I guess is done by the memory cache, might have helped. I thought the ZIL might have played this role.

I understand that this is very specific to a compressed filesystem, but this caching behavior would greatly enhance this kind of workload. For now, I guess I have to tune the compression rate and/or increase the amount of RAM so that ZFS can cache more data.

Regards,

Gaëtan
Darren J Moffat
2009-Jul-07 13:21 UTC
[zfs-discuss] Write cache for compressed file system
Gaëtan Lehmann wrote:
> There will be two kinds of transfer protocol once in production: CIFS
> and one specific to the application. But for a quick test, the transfer
> was made with scp.

CIFS and scp are very different protocols with very different performance characteristics.

> The data set is transferred in a little less than 3 minutes without
> compression; that's good! With compression, the data set is transferred
> in 15 minutes. I was hoping that a disk write cache could keep the
> transfer speed close to the 3 minutes obtained without compression.

That is a big difference, and without seeing data my initial thought is that the system is CPU bound, particularly if the only difference really was a compressed ZFS dataset versus an uncompressed one.

Is "transfer" in this case a read from the server or a write to it?

> My workload is very simple: the users copy approximately 10 GB of data
> to the server, and then only read it from time to time. During the copy
> to the server, the transfer is CPU bound, and there is a lot of CPU
> time available when there is no copy to the server.

Are you talking here about the server's CPU or the clients'?

> Using a disk to store the uncompressed data, as I guess is done by the
> memory cache, might have helped. I thought the ZIL might have played
> this role.

I think you are maybe confusing the ZIL and the L2ARC.

What compression algorithm are you using? The default "on" value of lzjb, or are you doing something like gzip-9?

-- 
Darren J Moffat
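The compression level Darren is asking about matters because CPU cost rises steeply with gzip level. A rough userland analogue of the trade-off, using the ordinary gzip CLI on a synthetic compressible file (not ZFS itself, just an illustration):

```shell
# Generate ~1 MB of highly repetitive (hence very compressible) data.
yes "this line compresses very well" | head -c 1000000 > sample.dat

# Compare output sizes at the fastest and slowest gzip levels.
gzip -1 -c sample.dat | wc -c
gzip -9 -c sample.dat | wc -c

# Compare the CPU time each level burns.
time gzip -1 -c sample.dat > /dev/null
time gzip -9 -c sample.dat > /dev/null
```

On real, less redundant data the ratio gain from higher levels is usually modest while the CPU cost is not, which is why lzjb (or a lower gzip level) is the usual compromise.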
On 7 Jul 2009, at 15:21, Darren J Moffat wrote:

> Is "transfer" in this case a read from the server or a write to it?

It is a write to the server. The server is CPU bound because of the compression.

> Are you talking here about the server's CPU or the clients'?

I'm talking about the server's CPU. The client is not CPU bound by scp.

> I think you are maybe confusing the ZIL and the L2ARC.

I think the L2ARC is for reading data; I'd like an L2ARC for writing data.

> What compression algorithm are you using? The default "on" value of
> lzjb, or are you doing something like gzip-9?

gzip-6. There is no speed problem with lzjb, but also not the same compression ratio :-)

Gaëtan
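For completeness, the trade-off being discussed is set per dataset via the `compression` property. A sketch, assuming a dataset named `tank/data` (a hypothetical name, not from the thread):

```shell
# Heavier ratio, heavier CPU (the setting used in this thread):
zfs set compression=gzip-6 tank/data

# Much lighter-weight alternative:
#   zfs set compression=lzjb tank/data

# Only blocks written after the change are affected; the achieved
# ratio is reported by the read-only compressratio property.
zfs get compression,compressratio tank/data
```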
Darren J Moffat
2009-Jul-07 13:54 UTC
[zfs-discuss] Write cache for compressed file system
Gaëtan Lehmann wrote:
> It is a write to the server. The server is CPU bound because of the
> compression.
>
> I'm talking about the server's CPU. The client is not CPU bound by scp.
>
> I think the L2ARC is for reading data; I'd like an L2ARC for writing
> data.

Correct, the L2ARC is for reads. I think what you actually want is not a cache at all, but not to be CPU bound by gzip-6.

> gzip-6. There is no speed problem with lzjb, but also not the same
> compression ratio :-)

What build of OpenSolaris are you running?

The fix for "6812655 need larger kmem caches for newer workloads" might help, but you need to be running build 114 or higher, which means you need to be using the pkg.opensolaris.org/dev repository, not the /release one.

It could also be "6586537 async zio taskqs can block out userland commands", which is fixed in build 115 and was the "real" fix.

This is guesswork, though, since I haven't seen perf data from your particular system.

-- 
Darren J Moffat
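The move to a /dev build that Darren suggests would, on an OpenSolaris 2008.11 image, look roughly like this (the repository URL is from the thread; the publisher name and exact steps may vary between builds):

```shell
# Repoint the preferred publisher at the dev repository.
pfexec pkg set-publisher -O http://pkg.opensolaris.org/dev opensolaris.org

# Pull the newer build (114+) into a new boot environment, then reboot.
pfexec pkg image-update
```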
On 7 Jul 2009, at 15:54, Darren J Moffat wrote:

> What build of OpenSolaris are you running?
>
> The fix for "6812655 need larger kmem caches for newer workloads" might
> help, but you need to be running build 114 or higher, which means you
> need to be using the pkg.opensolaris.org/dev repository, not the
> /release one.
>
> It could also be "6586537 async zio taskqs can block out userland
> commands", which is fixed in build 115 and was the "real" fix.
>
> This is guesswork, though, since I haven't seen perf data from your
> particular system.

I think it's 2008.11. I won't be able to access that host until next week. I'll update the OS and post the result on the list.

Thanks,

Gaëtan