thr3ads.net - Libguestfs - [Libguestfs] Checksums and other verification [Feb 2023]


Richard W.M. Jones

2023-Feb-27 13:56 UTC

[Libguestfs] Checksums and other verification

https://github.com/kubevirt/containerized-data-importer/issues/1520

Hi Eric,

We had a question from the Kubevirt team related to the above issue.
The question is roughly if it's possible to calculate the checksum of
an image as an nbdkit filter and/or in the qemu block layer.

Supplemental #1: could qemu-img convert calculate a checksum as it goes
along?

Supplemental #2: could we detect various sorts of common errors, such
as a webserver that is incorrectly configured and serves up an error
page containing "<html>"; or something which is supposed to be a disk
image but does not "look like" (in some ill-defined sense) a disk
image, eg. it has no partition table.

I'm not sure if qemu has any existing features covering the above (and
I know for sure that nbdkit doesn't).

One issue is that calculating a checksum involves a linear scan of the
image, although we can at least skip holes.
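
To illustrate the hole-skipping point, here is a minimal Python sketch.
It is not taken from qemu, nbdkit or blkhash.  It assumes the caller
already has an extent map (for example from NBD block status or
SEEK_HOLE/SEEK_DATA); the read_at callback and the (offset, length,
is_hole) tuples are hypothetical names chosen for this example.

```python
import hashlib

# Reusable buffer of zeros, so holes never need to be read from storage.
ZERO_CHUNK = bytes(64 * 1024)


def checksum_skipping_holes(read_at, extents, algorithm="sha256"):
    """Hash the guest-visible contents of an image without reading holes.

    read_at(offset, length) -> bytes is a hypothetical reader callback.
    extents is an iterable of (offset, length, is_hole) tuples that
    cover the image contiguously, in ascending order.
    """
    h = hashlib.new(algorithm)
    for offset, length, is_hole in extents:
        done = 0
        while done < length:
            step = min(length - done, len(ZERO_CHUNK))
            if is_hole:
                # A hole reads as zeros: feed zeros to the hash, no I/O.
                h.update(ZERO_CHUNK[:step])
            else:
                h.update(read_at(offset + done, step))
            done += step
    return h.hexdigest()
```

The digest is the same as a plain linear hash of the whole image.  The
sketch only saves the I/O for holes; the CPU cost of hashing the zeros
remains.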

Rich.

-- 
Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
Read my programming and virtualization blog: http://rwmj.wordpress.com
nbdkit - Flexible, fast NBD server with plugins
https://gitlab.com/nbdkit/nbdkit

Nir Soffer

2023-Feb-27 14:24 UTC


[Libguestfs] Checksums and other verification

On Mon, Feb 27, 2023 at 3:56 PM Richard W.M. Jones <rjones at redhat.com> wrote:
>
>
> https://github.com/kubevirt/containerized-data-importer/issues/1520
>
> Hi Eric,
>
> We had a question from the Kubevirt team related to the above issue.
> The question is roughly if it's possible to calculate the checksum of
> an image as an nbdkit filter and/or in the qemu block layer.
>
> Supplemental #1: could qemu-img convert calculate a checksum as it goes
> along?
>
> Supplemental #2: could we detect various sorts of common errors, such
> as a webserver that is incorrectly configured and serves up an error
> page containing "<html>"; or something which is supposed to be a disk
> image but does not "look like" (in some ill-defined sense) a disk
> image, eg. it has no partition table.
>
> I'm not sure if qemu has any existing features covering the above (and
> I know for sure that nbdkit doesn't).
>
> One issue is that calculating a checksum involves a linear scan of the
> image, although we can at least skip holes.

KubeVirt can use blksum:
https://fosdem.org/2023/schedule/event/vai_blkhash_fast_disk/

But we need to package it for Fedora/CentOS Stream.

I am also working on "qemu-img checksum"; getting more reviews on it
would help.
Latest version:
https://lists.nongnu.org/archive/html/qemu-block/2022-11/msg00971.html
The latest reviews are here:
https://lists.nongnu.org/archive/html/qemu-block/2022-12/

More work is needed on the testing framework changes.

Nir

Eric Blake

2023-Feb-27 14:42 UTC


[Libguestfs] Checksums and other verification

On Mon, Feb 27, 2023 at 01:56:26PM +0000, Richard W.M. Jones wrote:
>
> https://github.com/kubevirt/containerized-data-importer/issues/1520
> 
> Hi Eric,
> 
> We had a question from the Kubevirt team related to the above issue.
> The question is roughly if it's possible to calculate the checksum of
> an image as an nbdkit filter and/or in the qemu block layer.
In the qemu block layer - yes: see Nir's https://gitlab.com/nirs/blkhash

Note that there is a huge difference between a block-based checksum (a
checksum of the block data the guest will see) and a checksum of the
original file (bytes as visible on the source, although with non-raw
files, more than one image may hash to the same guest-visible contents
despite having different host checksums).

Also, it may prove to be more efficient to generate a Merkle Tree hash
of an image (an image is divided into smaller portions in a
binary-tree fanout, where the hash of the entire image is computed by
combining hashes of child nodes up to the root of the tree - which
allows downloading blocks out of order).  [You may be more familiar
with Merkle Trees than you realize - every git commit id is ultimately
a Merkle Tree hash of all prior commits]
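
A minimal Python sketch of the idea follows.  It is an illustration
only, not the scheme used by blkhash or any NBD tooling: the block
size, the SHA-256 choice, the 0x00/0x01 prefixes that keep leaf and
interior hashes distinct, and the rule of promoting an unpaired node
unchanged are all assumptions made for this example.

```python
import hashlib


def leaf_hash(block):
    """Hash one data block; the 0x00 prefix marks it as a leaf."""
    return hashlib.sha256(b"\x00" + block).digest()


def node_hash(left, right):
    """Combine two child hashes; the 0x01 prefix marks an interior node."""
    return hashlib.sha256(b"\x01" + left + right).digest()


def merkle_root(leaves):
    """Return the hex root hash for a list of leaf hashes in block order."""
    if not leaves:
        raise ValueError("need at least one leaf")
    level = list(leaves)
    while len(level) > 1:
        nxt = [node_hash(level[i], level[i + 1])
               for i in range(0, len(level) - 1, 2)]
        if len(level) % 2:
            # An unpaired last node is promoted to the next level as-is.
            nxt.append(level[-1])
        level = nxt
    return level[0].hex()


def split_blocks(data, block_size=4096):
    """Split data into fixed-size blocks (the last one may be short)."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]
```

Because each leaf depends only on its own block, the leaves can be
computed in whatever order the blocks arrive, stored by block index,
and combined into the root once all of them are known.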

As for nbdkit being able to do hashing as a filter, we don't have such
a filter now, but I think it would be technically possible to
implement one.  The trickiest part would be figuring out a way to
expose the checksum to the client once the client has finally read
through the entire image.  It would be easy to have nbdkit output the
resulting hash in a secondary file for consumption by the end client.
Harder, but potentially more useful, would be extending the NBD
protocol itself so that the NBD client can ask the server for the hash
directly (or get an indication that the hash is not yet known because
not all blocks have been hashed yet).
> 
> Supplemental #1: could qemu-img convert calculate a checksum as it goes
> along?

Nir's work on blkhash suggests that this is doable.
> 
> Supplemental #2: could we detect various sorts of common errors, such
> as a webserver that is incorrectly configured and serves up an error
> page containing "<html>"; or something which is supposed to be a disk
> image but does not "look like" (in some ill-defined sense) a disk
> image, eg. it has no partition table.
> 
> I'm not sure if qemu has any existing features covering the above (and
> I know for sure that nbdkit doesn't).
Indeed.  But adding a filter that does a pre-read of the plugin's
first 1M during .prepare to look for an expected signature (what is
sufficient: checking whether there is a partition table?) and refuses
to let the client connect if the plugin is serving wrong data seems
fairly straightforward.
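
As a rough illustration of such a signature check, here is a
standalone Python sketch (not an nbdkit filter, and not an existing
nbdkit or qemu feature).  The function name and the choice of checks
are assumptions for this example: it rejects data that starts like an
HTML page and accepts data carrying a GPT header at offset 512
(assuming 512-byte sectors) or an MBR boot signature at offset 510.

```python
def looks_like_disk_image(head):
    """Heuristic check on the first bytes of a supposed raw disk image.

    head is the leading part of the image (at least 520 bytes to be
    useful).  Returns a (ok, reason) tuple.
    """
    start = head[:1024].lstrip().lower()
    if start.startswith((b"<html", b"<!doctype html")):
        return False, "looks like an HTML page"
    # GPT header signature lives in LBA 1 (offset 512 with 512-byte sectors).
    if head[512:520] == b"EFI PART":
        return True, "GPT header"
    # MBR boot signature: bytes 0x55 0xAA at offsets 510 and 511.
    if head[510:512] == b"\x55\xaa":
        return True, "MBR boot signature"
    return False, "no known partition table signature"
```

A valid disk image need not have a partition table (for example a bare
filesystem), so a real check would probably need to be configurable.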
> 
> One issue is that calculating a checksum involves a linear scan of the
> image, although we can at least skip holes.
Or intentionally choose a hash that can be computed out-of-order, such
as a Merkle Tree.  But we'd need a standard setup for all parties to
agree on how the hash is to be computed and checked, if it is going to
be anything more than just a linear hash of the entire guest-visible
contents.

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3266
Virtualization:  qemu.org | libvirt.org
