Are the sha256/fletcher[x]/etc checksums sent to the receiver along with the other data/metadata? And checked upon receipt of course. Do they chain all the way back to the uberblock or to some calculated transfer specific checksum value? The idea is to carry through the integrity checks wherever possible. Whether done as close as within the same zpool, or miles away.
On Feb 5, 2010, at 3:11 AM, grarpamp wrote:

> Are the sha256/fletcher[x]/etc checksums sent to the receiver along
> with the other data/metadata?

No. Checksums are made on the records, and there could be a different record size for the sending and receiving file systems. The stream itself is checksummed with fletcher4.

> And checked upon receipt of course.

Of course.

> Do they chain all the way back to the uberblock or to some calculated
> transfer specific checksum value?

I suppose one could say a calculated transfer fletcher4 checksum value.

> The idea is to carry through the integrity checks wherever possible.
> Whether done as close as within the same zpool, or miles away.

Yes.
-- richard
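[Editorial aside: for readers unfamiliar with fletcher4, the C sketch below shows the general shape of the algorithm the stream checksum is based on: four 64-bit accumulators fed by the 32-bit words of the data, each accumulator summing the one before it. It is an illustration only; the names fletcher4_t and fletcher4_update are made up here, and the actual checksum code in the ZFS source differs in detail.]

    #include <stdint.h>
    #include <stddef.h>

    /* Four running 64-bit sums over the 32-bit words of the input. */
    typedef struct { uint64_t a, b, c, d; } fletcher4_t;

    static void
    fletcher4_update(fletcher4_t *f, const void *buf, size_t size)
    {
        const uint32_t *ip = buf;
        const uint32_t *end = ip + (size / sizeof (uint32_t));

        for (; ip < end; ip++) {
            f->a += *ip;    /* plain sum of the data words          */
            f->b += f->a;   /* sum of sums: sensitive to word order */
            f->c += f->b;   /* two further orders of summation      */
            f->d += f->c;
        }
    }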
> No. Checksums are made on the records, and there could be a different
> record size for the sending and receiving file systems.

Oh. So there's a zfs read to ram somewhere, which checks the sums on disk. And then entirely new stream checksums are made while sending it all off to the pipe.

I see the bit about different zfs block sizes perhaps preventing use of the actual on disk checksums in the transfer itself... including, thereby, the chain to the uberblock in the transfer. Thanks for that part.

> The stream itself is checksummed with fletcher4.
> I suppose one could say a calculated transfer fletcher4 checksum value.

Hmm, is that configurable? Say to match the checksums being used on the filesystem itself... ie: sha256? It would seem odd to send with fewer bits than what is used on disk.

>> The idea is to carry through the integrity checks wherever possible.
>> Whether done as close as within the same zpool, or miles away.
> Yes.

Was thinking that plaintext ethernet/wan and even some of the 'weaker' ssl algorithms would be candidates to back with sha256 in a transfer. Not really needed for a 'within the box only' unix pipe though.
On Feb 5, 2010, at 7:20 PM, grarpamp wrote:

>> No. Checksums are made on the records, and there could be a different
>> record size for the sending and receiving file systems.
>
> Oh. So there's a zfs read to ram somewhere, which checks the sums on disk.
> And then entirely new stream checksums are made while sending it all off
> to the pipe.
>
> I see the bit about different zfs block sizes perhaps preventing use of the
> actual on disk checksums in the transfer itself... including, thereby, the
> chain to the uberblock in the transfer. Thanks for that part.
>
>> The stream itself is checksummed with fletcher4.
>> I suppose one could say a calculated transfer fletcher4 checksum value.
>
> Hmm, is that configurable? Say to match the checksums being
> used on the filesystem itself... ie: sha256? It would seem odd to
> send with fewer bits than what is used on disk.

Do you expect the same errors in the pipe as you do on disk?

>>> The idea is to carry through the integrity checks wherever possible.
>>> Whether done as close as within the same zpool, or miles away.
>> Yes.
>
> Was thinking that plaintext ethernet/wan and even some of the 'weaker'
> ssl algorithms would be candidates to back with sha256 in a transfer.
> Not really needed for a 'within the box only' unix pipe though.

Most folks use ssh.
-- richard
>> Hmm, is that configurable? Say to match the checksums being
>> used on the filesystem itself... ie: sha256? It would seem odd to
>> send with fewer bits than what is used on disk.

>> Was thinking that plaintext ethernet/wan and even some of the 'weaker'
>> ssl algorithms

> Do you expect the same errors in the pipe as you do on disk?

Perhaps I meant to say that the box itself [cpu/ram/bus/nic/io, except disk] is assumed to handle data with integrity. So say netcat is used as transport, zfs is using sha256 on disk, but only fletcher4 over the wire with send/recv, and your wire takes some undetected/uncorrected hits, and the hits also happen to make it past fletcher4... it kind of nullifies the SA's choice/thought that sha256 would be used throughout all zfs operations.

I didn't see notation in the man page that checksums are indeed used in send/recv operations... In any case, at least something is used over the bare wire :)
On Feb 5, 2010, at 8:09 PM, grarpamp wrote:

>>> Hmm, is that configurable? Say to match the checksums being
>>> used on the filesystem itself... ie: sha256? It would seem odd to
>>> send with fewer bits than what is used on disk.
>
>>> Was thinking that plaintext ethernet/wan and even some of the 'weaker'
>>> ssl algorithms
>
>> Do you expect the same errors in the pipe as you do on disk?
>
> Perhaps I meant to say that the box itself [cpu/ram/bus/nic/io, except disk]
> is assumed to handle data with integrity. So say netcat is used as transport,
> zfs is using sha256 on disk, but only fletcher4 over the wire with send/recv,
> and your wire takes some undetected/uncorrected hits, and the hits also
> happen to make it past fletcher4... it kind of nullifies the SA's choice/thought
> that sha256 would be used throughout all zfs operations.

Hold it right there, fella. SHA256 is not used for everything ZFS, so expecting it to be so will set the stage for disappointment. You can set the data to be checksummed with SHA256.

> I didn't see notation in the man page that checksums are indeed used
> in send/recv operations...

It is an implementation detail. But if you can make the case for why it is required to be inside the protocol, rather than its transport, then please file an RFE.

> In any case, at least something is used over the bare wire :)

Lots of things are used on the bare wire and there are many hops along the way. This is another good reason to use ssh, or some other end-to-end verification mechanism. UNIX pipes are a great invention! :-)
-- richard
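[Editorial aside: one way to get end-to-end verification outside of ZFS, over a plain netcat transport, is a small pass-through filter that hashes the stream on each end so the digests can be compared. The C sketch below is a hypothetical example, not a ZFS tool; it uses OpenSSL's SHA256_* calls and assumes linking with -lcrypto. Something like "zfs send ... | ./shapipe | nc host port" on the sender and "nc -l port | ./shapipe | zfs recv ..." on the receiver would print a digest on each side.]

    #include <stdio.h>
    #include <unistd.h>
    #include <openssl/sha.h>

    /* Copy stdin to stdout unchanged; print the stream's SHA-256 to stderr. */
    int
    main(void)
    {
        SHA256_CTX ctx;
        unsigned char buf[1 << 16], md[SHA256_DIGEST_LENGTH];
        ssize_t n;

        SHA256_Init(&ctx);
        while ((n = read(STDIN_FILENO, buf, sizeof (buf))) > 0) {
            SHA256_Update(&ctx, buf, (size_t)n);
            if (write(STDOUT_FILENO, buf, (size_t)n) != n)
                return (1);             /* short write: give up */
        }
        SHA256_Final(md, &ctx);
        for (int i = 0; i < SHA256_DIGEST_LENGTH; i++)
            fprintf(stderr, "%02x", md[i]);
        fprintf(stderr, "\n");
        return (n < 0 ? 1 : 0);
    }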
>> Perhaps I meant to say that the box itself [cpu/ram/bus/nic/io, except disk]
>> is assumed to handle data with integrity. So say netcat is used as transport,
>> zfs is using sha256 on disk, but only fletcher4 over the wire with send/recv,
>> and your wire takes some undetected/uncorrected hits, and the hits also
>> happen to make it past fletcher4... it kind of nullifies the SA's choice/thought
>> that sha256 would be used throughout all zfs operations.
>
> Hold it right there, fella. SHA256 is not used for everything ZFS,

Well, ok, and in my limited knowhow... zfs set checksum=sha256 only covers user scribbled data [POSIX file metadata, file contents, directory structure, ZVOL blocks] and not necessarily any zfs filesystem internals.

> You can set the data to be checksummed with SHA256.

Definitely, as indeed set above :)

>> I didn't see notation in the man page that checksums are indeed used
>> in send/recv operations...
>
> It is an implementation detail. But if you can make the case for
> why it is required to be inside the protocol, rather than its transport,
> then please file an RFE.

The case had to have been previously made to include fletcher4 in the zfs send/recv protocol. So sha256 would just be an update to the user's options. Similar to how f4 was an available on disk update to f2, z3 to z2 to z1, etc.

Was really only looking to see what, if anything, was currently used in the protocol, not actually proposing an update. Now I know :)

Transport is certainly always up to the user: pipe/netcat/ssh/rsh/pigeon

>> In any case, at least something is used over the bare wire :)
> UNIX pipes are a great invention! :-)

Yeah, I suppose a pipe to ssh has enough bits to catch things these days. Netcat might be different, ergo, at least f4 as already implemented.

debug1: kex: server->client aes128-ctr hmac-sha1 none
debug1: kex: client->server aes128-ctr hmac-sha1 none

Thanks.
On Feb 5, 2010, at 10:50 PM, grarpamp wrote:

>>> Perhaps I meant to say that the box itself [cpu/ram/bus/nic/io, except disk]
>>> is assumed to handle data with integrity. So say netcat is used as transport,
>>> zfs is using sha256 on disk, but only fletcher4 over the wire with send/recv,
>>> and your wire takes some undetected/uncorrected hits, and the hits also
>>> happen to make it past fletcher4... it kind of nullifies the SA's choice/thought
>>> that sha256 would be used throughout all zfs operations.
>>
>> Hold it right there, fella. SHA256 is not used for everything ZFS,
>
> Well, ok, and in my limited knowhow... zfs set checksum=sha256 only
> covers user scribbled data [POSIX file metadata, file contents, directory
> structure, ZVOL blocks] and not necessarily any zfs filesystem internals.

Metadata is fletcher4, except for the uberblocks, which are self-checksummed using sha256.

>> You can set the data to be checksummed with SHA256.
>
> Definitely, as indeed set above :)

SHA256 is approximately 1/2 the speed of fletcher4, so the trade-off does not consider only the checksum algorithm. For older machines, the speed difference could be worse.

>>> I didn't see notation in the man page that checksums are indeed used
>>> in send/recv operations...
>>
>> It is an implementation detail. But if you can make the case for
>> why it is required to be inside the protocol, rather than its transport,
>> then please file an RFE.
>
> The case had to have been previously made to include fletcher4 in the
> zfs send/recv protocol. So sha256 would just be an update to the user's
> options. Similar to how f4 was an available on disk update to f2, z3 to z2
> to z1, etc.

This is a very different use case than the data stored on media. Since the pipe interface is very reliable, you can reasonably choose to use more or less protection through the pipe without complicating the ZFS user interface [insert UNIX philosophy argument here :-)]

> Was really only looking to see what, if anything, was currently used in
> the protocol, not actually proposing an update. Now I know :)
>
> Transport is certainly always up to the user: pipe/netcat/ssh/rsh/pigeon
>
>>> In any case, at least something is used over the bare wire :)
>> UNIX pipes are a great invention! :-)
>
> Yeah, I suppose a pipe to ssh has enough bits to catch things these days.
> Netcat might be different, ergo, at least f4 as already implemented.
>
> debug1: kex: server->client aes128-ctr hmac-sha1 none
> debug1: kex: client->server aes128-ctr hmac-sha1 none

I'm interested in anecdotal evidence which suggests there is a problem as it is currently designed. Thus far, I believe the reports of send stream corruption on this forum have been attributed to other things.
-- richard
>> Well, ok, and in my limited knowhow... zfs set checksum=sha256 only
>> covers user scribbled data [POSIX file metadata, file contents, directory
>> structure, ZVOL blocks] and not necessarily any zfs filesystem internals.
>
> Metadata is fletcher4, except for the uberblocks, which are self-checksummed
> using sha256.

Surely you're referring to 'metadata' as 'zfs filesystem internals', not 'user scribbled data', particularly here, stat(2) info [commonly called metadata]?

> SHA256 is approximately 1/2 the speed of fletcher4, so the trade-off
> does not consider only the checksum algorithm. For older machines,
> the speed difference could be worse.

Hah! Considering I get only 20MiB/s read and about half that on write, I'm used to the pain :) Of course different algos come with a price, ref: openssl speed.

>> Yeah, I suppose a pipe to ssh has enough bits to catch things these days.
>> Netcat might be different, ergo, at least f4 as already implemented.
>
> I'm interested in anecdotal evidence which suggests there is a
> problem as it is currently designed. Thus far, I believe the reports
> of send stream corruption on this forum have been attributed to
> other things.

An anti-anecdote of sorts... With all the petabytes I've stuffed over unchecked ethernet/wan, I don't think I've ever run into a confirmed case of data error there. Whether netcat/rsh/ssh/ftp/etc, it was always bad hardware. Especially the crap $megabuck SAN I ended up proving bad by wrapping backups with aes, sha and pki. Before then, I simply wasn't watching anything closely.

Though it never seemed that the layer 0 [wire] through layer 3 [ip] checksums would be all that strong compared to 128+ bits of real hash. I can't seem to find a bit equivalency for fletcher4. Though I don't think that's quite the way I should be thinking f4 works.

Oh well, this thread's baked. Thanks for zfs, it totally made my world better.
On Sat, Feb 06, 2010 at 09:22:57AM -0800, Richard Elling wrote:

> I'm interested in anecdotal evidence which suggests there is a
> problem as it is currently designed.

I like to look at it differently: I'm not sure if there is a problem. I'd like to have a simple way to discover a problem, using the work zfs is already doing for me.

So, I'd like two things from the "system" as a whole:

 - confidence that a send|recv which completes "successfully" has really delivered an exact copy.
 - verification that two datasets are the same, from a simple, quick, ideally cheap test.

I can get some way to the former from understanding of the mechanisms used, analysis of their protective coverage, and reasoning about the possible failure modes. Having the latter gets me the rest of the way there, and even most of the way there by itself. Confidence < verification < assurance.

So, for example, in early tests with send|recv, I'm sure many of us have run "rsync -nc ..." comparison runs over the results. That's easy, relatively quick, but not entirely as cheap as could be.

"It would be very nice" if there was a simple dataset fingerprint that depended, merkle-style, on the entire contents of the dataset (snapshot) below, and that could be easily compared on sender and receiver. This (together with scrub) would provide the desired assurance that the two are indeed the same.

Back to analysis and reasoning for a moment; I would have more confidence in send|recv if I knew the end-to-end protections extended to cover the on-disk checksums (since the on-disk copies are the important endpoints for this operation). I suspect this was a large part of the intent behind the OP's question.

As it stands from the current description, there are windows where errors might be introduced and not detected - in particular, if I have a protection gap via non-ECC RAM at either send or recv. I can cover many of the other gaps with pipeline tools, as discussed. This is a hard gap to cover, even for detection, without help from the actual zfs endpoints.

Of course there are conflicting requirements, since we also want send|recv to facilitate recompression, reblocking, changing checksum method, etc.

So let's turn the question around: what is the best way to verify that send|recv really has produced an identical copy?

-- Dan.
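[Editorial aside: the merkle-style fingerprint Dan describes is a hash tree in which each parent digest is the hash of its children's digests, so a single root value ends up depending on every block beneath it. The toy C sketch below shows only that combining step; the names are hypothetical, it uses OpenSSL's SHA256_* calls (link with -lcrypto), and it says nothing about how ZFS block pointers are actually arranged.]

    #include <openssl/sha.h>

    typedef unsigned char digest_t[SHA256_DIGEST_LENGTH];

    /*
     * Combine the digests of a node's children into the parent digest.
     * Applied from the leaves up, the root digest changes if any block
     * anywhere below it changes.
     */
    static void
    merkle_combine(const digest_t *children, int nchildren, digest_t parent)
    {
        SHA256_CTX ctx;

        SHA256_Init(&ctx);
        for (int i = 0; i < nchildren; i++)
            SHA256_Update(&ctx, children[i], SHA256_DIGEST_LENGTH);
        SHA256_Final(parent, &ctx);
    }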