On Tue, Feb 28, 2023 at 12:24:04PM +0100, Laszlo Ersek wrote:
> On 2/27/23 17:44, Richard W.M. Jones wrote:
> > On Mon, Feb 27, 2023 at 08:42:23AM -0600, Eric Blake wrote:
> >> Or intentionally choose a hash that can be computed out-of-order, such
> >> as a Merkle Tree. But we'd need a standard setup for all parties to
> >> agree on how the hash is to be computed and checked, if it is going to
> >> be anything more than just a linear hash of the entire guest-visible
> >> contents.
> >
> > Unfortunately I suspect that by far the easiest way for people who
> > host images to compute checksums is to run 'shaXXXsum' on them or sign
> > them with a GPG signature, rather than engaging in a novel hash
> > function. Indeed that's what is happening now:
> >
> > https://alt.fedoraproject.org/en/verify.html
>
> If the output is produced with unordered writes, but the complete output
> needs to be verified with a hash *chain*, that still allows for some
> level of asynchrony. The start of the hashing need not be delayed until
> after the end of output, only after the start of output.
>
> For example, nbdcopy could maintain the highest offset up to which the
> output is contiguous, and on a separate thread, it could be hashing the
> output up to that offset.
>
> Considering a gigantic output, as yet unassembled blocks could likely
> not be buffered in memory (that's why the writes are unordered in the
> first place!), so the hashing thread would have to re-read the output
> via NBD. Whether that would cause performance to improve or to
> deteriorate is undecided IMO. If the far end of the output network block
> device can accommodate a reader that is independent of the writers, then
> this level of overlap is beneficial. Otherwise, this extra reader thread
> would just add more thrashing, and we'd be better off with a separate
> read-through once writing is complete.

In my mind I'm wondering if there's any mathematical result that lets
you combine each hash(block_i) into the final hash(block[1..N])
without needing to compute the hash of each block in order.

(This is what blkhash solves, but unfortunately the output isn't
compatible with standard hashes.)

Rich.
--
Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
Read my programming and virtualization blog: http://rwmj.wordpress.com
virt-top is 'top' for virtual machines. Tiny program with many
powerful monitoring features, net stats, disk stats, logging, etc.
http://people.redhat.com/~rjones/virt-top