Are the sha256/fletcher[x]/etc checksums sent to the receiver along with the other data/metadata? And checked upon receipt of course. Do they chain all the way back to the uberblock or to some calculated transfer specific checksum value? The idea is to carry through the integrity checks wherever possible. Whether done as close as within the same zpool, or miles away.
On Feb 5, 2010, at 3:11 AM, grarpamp wrote:

> Are the sha256/fletcher[x]/etc checksums sent to the receiver along
> with the other data/metadata?

No. Checksums are made on the records, and there could be a different record size for the sending and receiving file systems. The stream itself is checksummed with fletcher4.

> And checked upon receipt of course.

Of course.

> Do they chain all the way back to the uberblock or to some calculated
> transfer specific checksum value?

I suppose one could say a calculated transfer fletcher4 checksum value.

> The idea is to carry through the integrity checks wherever possible.
> Whether done as close as within the same zpool, or miles away.

Yes.
-- richard
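[Editorial aside: for readers unfamiliar with fletcher4, the C sketch below shows the general shape of the algorithm the stream checksum is based on: four 64-bit accumulators fed by the 32-bit words of the data, each accumulator summing the one before it. It is an illustration only; the names fletcher4_t and fletcher4_update are made up here, and the actual checksum code in the ZFS source differs in detail.]

    #include <stdint.h>
    #include <stddef.h>

    /* Four running 64-bit sums over the 32-bit words of the input. */
    typedef struct { uint64_t a, b, c, d; } fletcher4_t;

    static void
    fletcher4_update(fletcher4_t *f, const void *buf, size_t size)
    {
        const uint32_t *ip = buf;
        const uint32_t *end = ip + (size / sizeof (uint32_t));

        for (; ip < end; ip++) {
            f->a += *ip;    /* plain sum of the data words          */
            f->b += f->a;   /* sum of sums: sensitive to word order */
            f->c += f->b;   /* two further orders of summation      */
            f->d += f->c;
        }
    }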
> No. Checksums are made on the records, and there could be a different
> record size for the sending and receiving file systems.

Oh. So there's a zfs read to ram somewhere, which checks the sums on disk. And then entirely new stream checksums are made while sending it all off to the pipe.

I see the bit about different zfs block sizes perhaps preventing use of the actual on disk checksums in the transfer itself... including, thereby, the chain to the uberblock in the transfer. Thanks for that part.

> The stream itself is checksummed with fletcher4.
> I suppose one could say a calculated transfer fletcher4 checksum value.

Hmm, is that configurable? Say to match the checksums being used on the filesystem itself... ie: sha256? It would seem odd to send with fewer bits than what is used on disk.

>> The idea is to carry through the integrity checks wherever possible.
>> Whether done as close as within the same zpool, or miles away.
> Yes.

Was thinking that plaintext ethernet/wan and even some of the 'weaker' ssl algorithms would be candidates to back with sha256 in a transfer. Not really needed for a 'within the box only' unix pipe though.
On Feb 5, 2010, at 7:20 PM, grarpamp wrote:

>> No. Checksums are made on the records, and there could be a different
>> record size for the sending and receiving file systems.
>
> Oh. So there's a zfs read to ram somewhere, which checks the sums on disk.
> And then entirely new stream checksums are made while sending it all off
> to the pipe.
>
> I see the bit about different zfs block sizes perhaps preventing use of the
> actual on disk checksums in the transfer itself... including, thereby, the
> chain to the uberblock in the transfer. Thanks for that part.
>
>> The stream itself is checksummed with fletcher4.
>> I suppose one could say a calculated transfer fletcher4 checksum value.
>
> Hmm, is that configurable? Say to match the checksums being
> used on the filesystem itself... ie: sha256? It would seem odd to
> send with fewer bits than what is used on disk.

Do you expect the same errors in the pipe as you do on disk?

>>> The idea is to carry through the integrity checks wherever possible.
>>> Whether done as close as within the same zpool, or miles away.
>> Yes.
>
> Was thinking that plaintext ethernet/wan and even some of the 'weaker'
> ssl algorithms would be candidates to back with sha256 in a transfer.
> Not really needed for a 'within the box only' unix pipe though.

Most folks use ssh.
-- richard
>> Hmm, is that configurable? Say to match the checksums being
>> used on the filesystem itself... ie: sha256? It would seem odd to
>> send with fewer bits than what is used on disk.

>> Was thinking that plaintext ethernet/wan and even some of the 'weaker'
>> ssl algorithms

> Do you expect the same errors in the pipe as you do on disk?

Perhaps I meant to say that the box itself [cpu/ram/bus/nic/io, except disk] is assumed to handle data with integrity. So say netcat is used as transport, zfs is using sha256 on disk, but only fletcher4 over the wire with send/recv, and your wire takes some undetected/uncorrected hits, and the hits also happen to make it past fletcher4... it kind of nullifies the SA's choice/thought that sha256 would be used throughout all zfs operations.

I didn't see notation in the man page that checksums are indeed used in send/recv operations... In any case, at least something is used over the bare wire :)
On Feb 5, 2010, at 8:09 PM, grarpamp wrote:

>>> Hmm, is that configurable? Say to match the checksums being
>>> used on the filesystem itself... ie: sha256? It would seem odd to
>>> send with fewer bits than what is used on disk.
>
>>> Was thinking that plaintext ethernet/wan and even some of the 'weaker'
>>> ssl algorithms
>
>> Do you expect the same errors in the pipe as you do on disk?
>
> Perhaps I meant to say that the box itself [cpu/ram/bus/nic/io, except disk]
> is assumed to handle data with integrity. So say netcat is used as transport,
> zfs is using sha256 on disk, but only fletcher4 over the wire with send/recv,
> and your wire takes some undetected/uncorrected hits, and the hits also
> happen to make it past fletcher4... it kind of nullifies the SA's choice/thought
> that sha256 would be used throughout all zfs operations.

Hold it right there, fella. SHA256 is not used for everything ZFS, so expecting it to be so will set the stage for disappointment. You can set the data to be checksummed with SHA256.

> I didn't see notation in the man page that checksums are indeed used
> in send/recv operations...

It is an implementation detail. But if you can make the case for why it is required to be inside the protocol, rather than its transport, then please file an RFE.

> In any case, at least something is used over the bare wire :)

Lots of things are used on the bare wire and there are many hops along the way. This is another good reason to use ssh, or some other end-to-end verification mechanism. UNIX pipes are a great invention! :-)
-- richard
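[Editorial aside: one way to get end-to-end verification outside of ZFS, over a plain netcat transport, is a small pass-through filter that hashes the stream on each end so the digests can be compared. The C sketch below is a hypothetical example, not a ZFS tool; it uses OpenSSL's SHA256_* calls and assumes linking with -lcrypto. Something like "zfs send ... | ./shapipe | nc host port" on the sender and "nc -l port | ./shapipe | zfs recv ..." on the receiver would print a digest on each side.]

    #include <stdio.h>
    #include <unistd.h>
    #include <openssl/sha.h>

    /* Copy stdin to stdout unchanged; print the stream's SHA-256 to stderr. */
    int
    main(void)
    {
        SHA256_CTX ctx;
        unsigned char buf[1 << 16], md[SHA256_DIGEST_LENGTH];
        ssize_t n;

        SHA256_Init(&ctx);
        while ((n = read(STDIN_FILENO, buf, sizeof (buf))) > 0) {
            SHA256_Update(&ctx, buf, (size_t)n);
            if (write(STDOUT_FILENO, buf, (size_t)n) != n)
                return (1);             /* short write: give up */
        }
        SHA256_Final(md, &ctx);
        for (int i = 0; i < SHA256_DIGEST_LENGTH; i++)
            fprintf(stderr, "%02x", md[i]);
        fprintf(stderr, "\n");
        return (n < 0 ? 1 : 0);
    }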
>> Perhaps I meant to say that the box itself [cpu/ram/bus/nic/io, except disk]
>> is assumed to handle data with integrity. So say netcat is used as transport,
>> zfs is using sha256 on disk, but only fletcher4 over the wire with send/recv,
>> and your wire takes some undetected/uncorrected hits, and the hits also
>> happen to make it past fletcher4... it kind of nullifies the SA's choice/thought
>> that sha256 would be used throughout all zfs operations.
>
> Hold it right there, fella. SHA256 is not used for everything ZFS,

Well, ok, and in my limited knowhow... zfs set checksum=sha256 only covers user scribbled data [POSIX file metadata, file contents, directory structure, ZVOL blocks] and not necessarily any zfs filesystem internals.

> You can set the data to be checksummed with SHA256.

Definitely, as indeed set above :)

>> I didn't see notation in the man page that checksums are indeed used
>> in send/recv operations...
>
> It is an implementation detail. But if you can make the case for
> why it is required to be inside the protocol, rather than its transport,
> then please file an RFE.

The case had to have been previously made to include fletcher4 in the zfs send/recv protocol. So sha256 would just be an update to the user's options. Similar to how f4 was an available on disk update to f2, z3 to z2 to z1, etc.

Was really only looking to see what, if anything, was currently used in the protocol, not actually proposing an update. Now I know :)

Transport is certainly always up to the user: pipe/netcat/ssh/rsh/pigeon

>> In any case, at least something is used over the bare wire :)
> UNIX pipes are a great invention! :-)

Yeah, I suppose a pipe to ssh has enough bits to catch things these days. Netcat might be different, ergo, at least f4 as already implemented.

debug1: kex: server->client aes128-ctr hmac-sha1 none
debug1: kex: client->server aes128-ctr hmac-sha1 none

Thanks.
On Feb 5, 2010, at 10:50 PM, grarpamp wrote:

>>> Perhaps I meant to say that the box itself [cpu/ram/bus/nic/io, except disk]
>>> is assumed to handle data with integrity. So say netcat is used as transport,
>>> zfs is using sha256 on disk, but only fletcher4 over the wire with send/recv,
>>> and your wire takes some undetected/uncorrected hits, and the hits also
>>> happen to make it past fletcher4... it kind of nullifies the SA's choice/thought
>>> that sha256 would be used throughout all zfs operations.
>>
>> Hold it right there, fella. SHA256 is not used for everything ZFS,
>
> Well, ok, and in my limited knowhow... zfs set checksum=sha256 only
> covers user scribbled data [POSIX file metadata, file contents, directory
> structure, ZVOL blocks] and not necessarily any zfs filesystem internals.

Metadata is fletcher4, except for the uberblocks, which are self-checksummed using sha256.

>> You can set the data to be checksummed with SHA256.
>
> Definitely, as indeed set above :)

SHA256 is approximately 1/2 the speed of fletcher4, so the trade-off does not consider only the checksum algorithm. For older machines, the speed difference could be worse.

>>> I didn't see notation in the man page that checksums are indeed used
>>> in send/recv operations...
>>
>> It is an implementation detail. But if you can make the case for
>> why it is required to be inside the protocol, rather than its transport,
>> then please file an RFE.
>
> The case had to have been previously made to include fletcher4 in the
> zfs send/recv protocol. So sha256 would just be an update to the user's
> options. Similar to how f4 was an available on disk update to f2, z3 to z2
> to z1, etc.

This is a very different use case than the data stored on media. Since the pipe interface is very reliable, you can reasonably choose to use more or less protection through the pipe without complicating the ZFS user interface [insert UNIX philosophy argument here :-)]

> Was really only looking to see what, if anything, was currently used in
> the protocol, not actually proposing an update. Now I know :)
>
> Transport is certainly always up to the user: pipe/netcat/ssh/rsh/pigeon
>
>>> In any case, at least something is used over the bare wire :)
>> UNIX pipes are a great invention! :-)
>
> Yeah, I suppose a pipe to ssh has enough bits to catch things these days.
> Netcat might be different, ergo, at least f4 as already implemented.
>
> debug1: kex: server->client aes128-ctr hmac-sha1 none
> debug1: kex: client->server aes128-ctr hmac-sha1 none

I'm interested in anecdotal evidence which suggests there is a problem as it is currently designed. Thus far, I believe the reports of send stream corruption on this forum have been attributed to other things.
-- richard
>> Well, ok, and in my limited knowhow... zfs set checksum=sha256 only
>> covers user scribbled data [POSIX file metadata, file contents, directory
>> structure, ZVOL blocks] and not necessarily any zfs filesystem internals.
>
> Metadata is fletcher4, except for the uberblocks, which are self-checksummed
> using sha256.

Surely you're referring to 'metadata' as 'zfs filesystem internals', not 'user scribbled data', particularly here, stat(2) info [commonly called metadata]?

> SHA256 is approximately 1/2 the speed of fletcher4, so the trade-off
> does not consider only the checksum algorithm. For older machines,
> the speed difference could be worse.

Hah! Considering I get only 20MiB/s read and about half that on write, I'm used to the pain :) Of course different algos come with a price, ref: openssl speed.

>> Yeah, I suppose a pipe to ssh has enough bits to catch things these days.
>> Netcat might be different, ergo, at least f4 as already implemented.
>
> I'm interested in anecdotal evidence which suggests there is a
> problem as it is currently designed. Thus far, I believe the reports
> of send stream corruption on this forum have been attributed to
> other things.

An anti-anecdote of sorts... With all the petabytes I've stuffed over unchecked ethernet/wan, I don't think I've ever run into a confirmed case of data error there. Whether netcat/rsh/ssh/ftp/etc, it was always bad hardware. Especially the crap $megabuck SAN I ended up proving bad by wrapping backups with aes, sha and pki. Before then, I simply wasn't watching anything closely.

Though it never seemed that the layer 0 [wire] through layer 3 [ip] checksums would be all that strong compared to 128+ bits of real hash. I can't seem to find a bit equivalency for fletcher4. Though I don't think that's quite the way I should be thinking f4 works.

Oh well, this thread's baked. Thanks for zfs, it totally made my world better.
On Sat, Feb 06, 2010 at 09:22:57AM -0800, Richard Elling wrote:

> I'm interested in anecdotal evidence which suggests there is a
> problem as it is currently designed.

I like to look at it differently: I'm not sure if there is a problem. I'd like to have a simple way to discover a problem, using the work zfs is already doing for me.

So, I'd like two things from the "system" as a whole:

 - confidence that a send|recv which completes "successfully" has really delivered an exact copy.
 - verification that two datasets are the same, from a simple, quick, ideally cheap test.

I can get some way to the former from understanding of the mechanisms used, analysis of their protective coverage, and reasoning about the possible failure modes. Having the latter gets me the rest of the way there, and even most of the way there by itself. Confidence < verification < assurance.

So, for example, in early tests with send|recv, I'm sure many of us have run "rsync -nc ..." comparison runs over the results. That's easy, relatively quick, but not entirely as cheap as could be.

"It would be very nice" if there was a simple dataset fingerprint that depended, merkle-style, on the entire contents of the dataset (snapshot) below, and that could be easily compared on sender and receiver. This (together with scrub) would provide the desired assurance that the two are indeed the same.

Back to analysis and reasoning for a moment; I would have more confidence in send|recv if I knew the end-to-end protections extended to cover the on-disk checksums (since the on-disk copies are the important endpoints for this operation). I suspect this was a large part of the intent behind the OP's question.

As it stands from the current description, there are windows where errors might be introduced and not detected - in particular, if I have a protection gap via non-ECC RAM at either send or recv. I can cover many of the other gaps with pipeline tools, as discussed. This is a hard gap to cover, even for detection, without help from the actual zfs endpoints.

Of course there are conflicting requirements, since we also want send|recv to facilitate recompression, reblocking, changing checksum method, etc.

So let's turn the question around: what is the best way to verify that send|recv really has produced an identical copy?

-- Dan.
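[Editorial aside: the merkle-style fingerprint Dan describes is a hash tree in which each parent digest is the hash of its children's digests, so a single root value ends up depending on every block beneath it. The toy C sketch below shows only that combining step; the names are hypothetical, it uses OpenSSL's SHA256_* calls (link with -lcrypto), and it says nothing about how ZFS block pointers are actually arranged.]

    #include <openssl/sha.h>

    typedef unsigned char digest_t[SHA256_DIGEST_LENGTH];

    /*
     * Combine the digests of a node's children into the parent digest.
     * Applied from the leaves up, the root digest changes if any block
     * anywhere below it changes.
     */
    static void
    merkle_combine(const digest_t *children, int nchildren, digest_t parent)
    {
        SHA256_CTX ctx;

        SHA256_Init(&ctx);
        for (int i = 0; i < nchildren; i++)
            SHA256_Update(&ctx, children[i], SHA256_DIGEST_LENGTH);
        SHA256_Final(parent, &ctx);
    }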