In the "Thoughts on ZFS Pool Backup Strategies" thread it was stated that zfs send sends uncompressed data and uses the ARC.

If "zfs send" sends uncompressed data which has already been compressed, this is not very efficient, and it would be *nice* to see it send the original compressed data (or have an option to do so).

I thought I would ask a true-or-false type question, mainly for curiosity's sake.

If "zfs send" uses the standard ARC cache (when something is not already in the ARC), I would expect this to hurt (to some degree?) the performance of the system. (I.e. I assume it has the effect of replacing current/useful data in the cache with not-very-useful/old data, depending on how large the zfs send is.)

If the above is true, zfs send and a hypothetical "zfs backup" (if the command existed to back up and restore a file or set of files with all ZFS attributes) would improve the performance of normal reads/writes by avoiding the ARC cache (or, if easier to implement, by having its own private ARC cache). Or does it use the same sort of code as setting "primarycache=none" on a file system?

Has anyone monitored ARC hit rates while doing a large zfs send?

Cheers
--
This message posted from opensolaris.org
On Mar 25, 2010, at 6:13 AM, Damon Atkins wrote:
> In the "Thoughts on ZFS Pool Backup Strategies" thread it was stated that
> zfs send sends uncompressed data and uses the ARC.
>
> If "zfs send" sends uncompressed data which has already been compressed,
> this is not very efficient, and it would be *nice* to see it send the
> original compressed data (or have an option to do so).
>
> I thought I would ask a true-or-false type question, mainly for
> curiosity's sake.
>
> If "zfs send" uses the standard ARC cache (when something is not already
> in the ARC), I would expect this to hurt (to some degree?) the performance
> of the system. (I.e. I assume it has the effect of replacing
> current/useful data in the cache with not-very-useful/old data, depending
> on how large the zfs send is.)

If you restrict answers to "true/false" then the answer is false :-)
Actually, the answer is mostly false. The ARC is divided into a most frequently used cache and a most recently used cache. The send data should stick to the most recently used side.

> If the above is true, zfs send and a hypothetical "zfs backup" (if the
> command existed to back up and restore a file or set of files with all ZFS
> attributes) would improve the performance of normal reads/writes by
> avoiding the ARC cache (or, if easier to implement, by having its own
> private ARC cache).

The zio pipeline can, in theory, be tapped between the checksum and decompression stages, but I think you will find that this defeats both piped compression and receive compression.

> Or does it use the same sort of code as setting "primarycache=none" on a
> file system?
>
> Has anyone monitored ARC hit rates while doing a large zfs send?

Yes. I see very good ARC hit rates when I send from a high-transaction system. This is a good thing because recently written data is likely to be in the ARC.
 -- richard

ZFS storage and performance consulting at http://www.RichardElling.com
ZFS training on deduplication, NexentaStor, and NAS performance
Las Vegas, April 29-30, 2010 http://nexenta-vegas.eventbrite.com
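To see how a large send affects ARC hit rates, the raw counters can be sampled from the arcstats kstat on (Open)Solaris. The helper below is a sketch of my own (the function name and awk approach are not from the thread); it turns `kstat -p` output, which is `module:instance:name:statistic<TAB>value` per line, into an overall hit rate.

```shell
# arc_hit_rate: compute an overall ARC hit percentage from kstat -p lines.
# Hypothetical helper; sample before and after a big zfs send and compare.
arc_hit_rate() {
  awk -F'\t' '
    $1 ~ /arcstats:hits$/   { hits = $2 }
    $1 ~ /arcstats:misses$/ { misses = $2 }
    END { printf "%.1f%%\n", 100 * hits / (hits + misses) }
  '
}

# On a Solaris host (kstat is Solaris-specific, so this line is illustrative):
#   kstat -p zfs:0:arcstats:hits zfs:0:arcstats:misses | arc_hit_rate
```

Nothing here depends on ZFS itself; the function only parses the two counter lines, so it can be checked with canned input.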
Whether it is efficient to send the compressed or uncompressed data depends on a lot of factors. If the data is already in the ARC for some other reason, then it is likely much more efficient to use that, because sending the compressed blocks involves doing I/O to disk; reading the version from the in-memory ARC does not. If the data is in the L2ARC, that is still better than going out to the main pool disks to get the compressed version. Reading from disk is always slower than reading from memory.

Depending on what your working set of data in the ARC is and the size of the dataset you are sending, it is possible that the 'zfs send' will cause data that was in the ARC to be evicted to make room for the blocks that 'zfs send' needs. This is a perfect use case for having a large L2ARC if you can't fit your working set and the blocks for the 'zfs send' into the ARC.

If you are using incremental 'zfs send' streams, the chances of thrashing the ARC are probably reduced, particularly if you do them frequently enough that they aren't too big.

I know people have monitored the ARC hit rates when doing large zfs sends. Using the DTrace Analytics in an SS7000 makes this very easy.

It really comes down to the size of your working set in the ARC, the size of your L2ARC, and your pattern of data access, combined with the volume of data you are 'zfs send'ing.

--
Darren J Moffat
On Thu, Mar 25, 2010 at 04:23:38PM +0000, Darren J Moffat wrote:
> If the data is in the L2ARC that is still better than going out to
> the main pool disks to get the compressed version.

<advocate customer='devil'>
Well, one could just compress it... If you'd otherwise put compression in the ssh pipe (or elsewhere) then you could stop doing that.
</advocate customer='devil'>

Nico
--
> In the "Thoughts on ZFS Pool Backup Strategies" thread it was stated
> that zfs send sends uncompressed data and uses the ARC.
>
> If "zfs send" sends uncompressed data which has already been compressed,
> this is not very efficient, and it would be *nice* to see it send the
> original compressed data (or have an option to do so).

You've got two questions in your post. The one above first ...

It's true that "zfs send" sends uncompressed data. So I've heard; I haven't tested it personally.

I seem to remember there's some work to improve this, but it's not available yet. The uncompressed send was easier to implement, and it is already super-fast compared to all the alternatives.

> I thought I would ask a true-or-false type question, mainly for
> curiosity's sake.
>
> If "zfs send" uses the standard ARC cache (when something is not already
> in the ARC), I would expect this to hurt (to some degree?) the performance
> of the system. (I.e. I assume it has the effect of replacing
> current/useful data in the cache with not-very-useful/old data.)

And this is a separate question. I can't say first-hand what ZFS does, but I have an educated guess. I would say, for every block the "zfs send" needs to read: if the block is in the ARC or L2ARC, then it won't be fetched again from disk. But it is not obliterating the ARC or L2ARC with old data, because it's smart enough to work at a lower level than a user-space process and tell the kernel (or whatever) something like "I'm only reading this block once; don't bother caching it for my sake."
On Fri, March 26, 2010 07:06, Edward Ned Harvey wrote:
>> In the "Thoughts on ZFS Pool Backup Strategies" thread it was stated
>> that zfs send sends uncompressed data and uses the ARC.
>>
>> If "zfs send" sends uncompressed data which has already been compressed,
>> this is not very efficient, and it would be *nice* to see it send the
>> original compressed data (or have an option to do so).
>
> You've got two questions in your post. The one above first ...
>
> It's true that "zfs send" sends uncompressed data. So I've heard; I
> haven't tested it personally.
>
> I seem to remember there's some work to improve this, but it's not
> available yet. The uncompressed send was easier to implement, and it is
> already super-fast compared to all the alternatives.

I don't know that it makes sense to. There are lots of existing filter packages that do compression; so if you want compression, just put them in your pipeline. That way you're not limited by what zfs send has implemented, either. When they implement bzip98 with a new compression-technology breakthrough, you can just use it :-) .

--
David Dyer-Bennet, dd-b at dd-b.net; http://dd-b.net/
Snapshots: http://dd-b.net/dd-b/SnapshotAlbum/data/
Photos: http://dd-b.net/photography/gallery/
Dragaera: http://dragaera.info
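The pipeline approach above can be sketched as follows. The pool, snapshot, and host names are made-up examples, and the filter could just as well be bzip2 or anything else that reads stdin and writes stdout; zfs send emits the stream on stdout, so any filter slots in.

```shell
# Hedged sketch of compressing a send stream in the pipeline. tank/home,
# the snapshot names, and backuphost are hypothetical:
#
#   zfs send -i tank/home@monday tank/home@tuesday \
#     | gzip -3 \
#     | ssh backuphost 'gunzip | zfs receive -F backup/home'
#
# The filter sees only a byte stream, so the same roundtrip works on any
# stand-in data:
printf 'stand-in for a zfs send stream' | gzip -3 | gunzip
```

The design point is exactly David's: because the interface is a plain pipe, swapping the compressor requires no changes to ZFS at all.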
On Fri, March 26, 2010 09:46, David Dyer-Bennet wrote:
> I don't know that it makes sense to. There are lots of existing filter
> packages that do compression; so if you want compression, just put them in
> your pipeline. That way you're not limited by what zfs send has
> implemented, either. When they implement bzip98 with a new compression
> technology breakthrough, you can just use it :-) .

Actually, a better example may be using parallel implementations of popular algorithms:

http://www.zlib.net/pigz/
http://www.google.com/search?q=parallel+bzip

Given the number of cores we have nowadays (especially on the Niagara-based CPUs), we might as well use them. There are also better algorithms out there (some of which assume parallelism):

http://en.wikipedia.org/wiki/Xz
http://en.wikipedia.org/wiki/7z

If you're using OpenSSH, there are also some third-party patches that may help performance:

http://www.psc.edu/networking/projects/hpn-ssh/

However, if the data is already compressed (and/or deduped), there's no sense in doing it again. If ZFS does have to go to disk, it might as well send the data as-is.
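The last point is easy to demonstrate: compressing already-compressed (effectively incompressible) data does not shrink it, it only adds container overhead. A quick check with gzip on random data (the /tmp file names are arbitrary):

```shell
# Compress random (incompressible) data once, then compress the result
# again, and compare sizes. The second pass should be no smaller than
# the first; gzip just wraps it in more header/trailer bytes.
head -c 65536 /dev/urandom > /tmp/zfs_demo_sample
gzip -c9 /tmp/zfs_demo_sample    > /tmp/zfs_demo_sample.gz
gzip -c9 /tmp/zfs_demo_sample.gz > /tmp/zfs_demo_sample.gz.gz
wc -c /tmp/zfs_demo_sample /tmp/zfs_demo_sample.gz /tmp/zfs_demo_sample.gz.gz
```

The same reasoning applies to a send stream from a compressed or deduped pool: once the bytes are already dense, a second compressor in the pipe is pure overhead.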