On Wed, Mar 20, 2019 at 12:11:57PM -0500, Eric Blake wrote:
>On 3/20/19 11:57 AM, Richard W.M. Jones wrote:
>
>>>> Also an observation: qemu's nbd client only ever issues block status
>>>> requests with the req-one flag set, so perhaps we should optimize for
>>>> that case.
>>>
>>> I hope to get to the point where future qemu doesn't send the req-one
>>> flag.  There are several threads on the qemu list about how qemu-img is
>>> slower than it needs to be because it is throwing away useful
>>> information, and where it is aggravated by the kernel's abysmal lseek()
>>> performance on tmpfs.  But until qemu learns useful caching, you're
>>> right that most existing NBD clients that request block status do so one
>>> extent at a time (because I don't know of any other existing NBD clients
>>> that use BLOCK_STATUS yet).
>>
>> Is it ever possible to cache block status results?  What happens in
>> the (admittedly unusual) case where two writers are hitting the same
>> NBD server?  For example if the server is implementing a cluster
>> filesystem.
>
>For a read-only client: caching 'data' regions is okay, caching 'zero'
>or 'hole' regions is bad (because even though you are not modifying the
>image, another writer might be; demoting 'hole' to 'data' is safe - it
>merely pessimizes into a read() instead of skipping work; but caching
>'hole' that is later promoted to 'data' is wrong - it can cause data
>loss if the client doesn't read the actual data).
>
>For a writing client: either you are an exclusive writer (and should
>know what you wrote, so the cache is the fact that you changed the state
>yourself) or you are on a cluster filesystem (at which point, your
>cluster system had better have its own rules for how to resolve races
>inherent in multiple writers, where you shouldn't be relying on block
>status but on the cluster protocol in the first place).

Even for the reading client, you don't need to access a place on the
disk twice for things to go wrong; even a single access is racy,
because there can be a change between BLOCK_STATUS and READ.  And the
same thing happens in the plugins, for files and for anything else
that someone else can access.  I don't think it is designed for
concurrent access.  Or is it?

>--
>Eric Blake, Principal Software Engineer
>Red Hat, Inc.           +1-919-301-3226
>Virtualization:  qemu.org | libvirt.org
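[To make the caching rule above concrete, here is a minimal C sketch of
a client-side policy that remembers only 'data' results.  The enum, the
cache structure, and cache_store() are hypothetical illustrations, not
part of any real NBD client library.]

    #include <stdint.h>

    enum extent_status { STATUS_UNKNOWN = 0, STATUS_DATA, STATUS_HOLE };

    struct status_cache_entry {
        uint64_t offset;
        uint64_t length;
        enum extent_status status;
    };

    /* Record a BLOCK_STATUS reply in a client-side cache, following the
     * rule above: only 'data' is safe to remember. */
    void
    cache_store (struct status_cache_entry *slot,
                 uint64_t offset, uint64_t length,
                 enum extent_status status)
    {
        if (status == STATUS_DATA) {
            /* Safe: a stale 'data' entry merely pessimizes a later
             * skip into a read() of zeroes. */
            slot->offset = offset;
            slot->length = length;
            slot->status = STATUS_DATA;
        }
        else {
            /* Unsafe to remember: a concurrent writer can promote a
             * hole to data at any time, and acting on a stale 'hole'
             * means skipping a read and losing the new data.  Demote
             * to "unknown" so the next access re-queries or just
             * reads. */
            slot->status = STATUS_UNKNOWN;
        }
    }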
On 3/21/19 7:18 AM, Martin Kletzander wrote:
>> For a read-only client: caching 'data' regions is okay, caching 'zero'
>> or 'hole' regions is bad (because even though you are not modifying the
>> image, another writer might be; demoting 'hole' to 'data' is safe - it
>> merely pessimizes into a read() instead of skipping work; but caching
>> 'hole' that is later promoted to 'data' is wrong - it can cause data
>> loss if the client doesn't read the actual data).
>>
>> For a writing client: either you are an exclusive writer (and should
>> know what you wrote, so the cache is the fact that you changed the state
>> yourself) or you are on a cluster filesystem (at which point, your
>> cluster system had better have its own rules for how to resolve races
>> inherent in multiple writers, where you shouldn't be relying on block
>> status but on the cluster protocol in the first place).
>
> Even for the reading client, you don't need to access a place on the
> disk twice for things to go wrong; even a single access is racy,
> because there can be a change between BLOCK_STATUS and READ.  And the
> same thing happens in the plugins, for files and for anything else
> that someone else can access.  I don't think it is designed for
> concurrent access.  Or is it?

Indeed, there is always a TOCTTOU race when you rely on block status if
there is ever a concurrent writer.  But is it dangerous?  Without block
status, we can have either:

reader                         writer
-------------------------------------------------
read sector X as A
                               write sector X as B

or:

reader                         writer
-------------------------------------------------
                               write sector X as B
read sector X as B

With the two steps, we have one of:

reader                         writer
-------------------------------------------------
learn sector X has status Y
read sector X as A
                               write sector X as B

reader                         writer
-------------------------------------------------
learn sector X has status Y
                               write sector X as B
read sector X as B

reader                         writer
-------------------------------------------------
                               write sector X as B
learn sector X has status Y
read sector X as B

where the shortcut is that if the reader sees status 'hole', it skips a
read.  Had it done the read in spite of learning about a hole, it would
either see all 0s (contents A - but that's no different than the read
winning the race without a status check), or the new content (contents B
just written by the writer - proving the hole status is out of date, but
no different from losing the race without a status check).

In general, trying to copy an image while it is being modified is not
going to work reliably; the main point of block status is to make
copying more efficient, but when copying, you are assuming no concurrent
modifications.

--
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3226
Virtualization:  qemu.org | libvirt.org
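[The shortcut described above can be shown as a small C sketch.
nbd_query_status() and nbd_read() are hypothetical placeholders for a
real NBD client's calls, not an actual API; the comment marks where the
TOCTTOU window sits, and why losing that race is no worse than losing a
race around a plain read.]

    #include <stdint.h>
    #include <string.h>

    enum extent_status { STATUS_DATA, STATUS_HOLE };

    /* Hypothetical client calls -- stand-ins for whatever the real
     * client uses to issue NBD_CMD_BLOCK_STATUS and NBD_CMD_READ. */
    extern enum extent_status nbd_query_status (uint64_t offset,
                                                uint64_t len);
    extern int nbd_read (void *buf, uint64_t offset, uint64_t len);

    /* Copy one sector into 'buf', skipping the read when block status
     * reports a hole. */
    int
    copy_sector (void *buf, uint64_t offset, uint64_t sector_size)
    {
        if (nbd_query_status (offset, sector_size) == STATUS_HOLE) {
            /* TOCTTOU window is here: a write landing between the
             * status query and this memset means we copy stale zeroes
             * (contents A) - indistinguishable from a plain read that
             * won the same race. */
            memset (buf, 0, sector_size);
            return 0;
        }
        return nbd_read (buf, offset, sector_size);
    }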
On Thu, Mar 21, 2019 at 09:01:26AM -0500, Eric Blake wrote:
>On 3/21/19 7:18 AM, Martin Kletzander wrote:
>
>>> For a read-only client: caching 'data' regions is okay, caching 'zero'
>>> or 'hole' regions is bad (because even though you are not modifying the
>>> image, another writer might be; demoting 'hole' to 'data' is safe - it
>>> merely pessimizes into a read() instead of skipping work; but caching
>>> 'hole' that is later promoted to 'data' is wrong - it can cause data
>>> loss if the client doesn't read the actual data).
>>>
>>> For a writing client: either you are an exclusive writer (and should
>>> know what you wrote, so the cache is the fact that you changed the state
>>> yourself) or you are on a cluster filesystem (at which point, your
>>> cluster system had better have its own rules for how to resolve races
>>> inherent in multiple writers, where you shouldn't be relying on block
>>> status but on the cluster protocol in the first place).
>>>
>>
>> Even for the reading client, you don't need to access a place on the
>> disk twice for things to go wrong; even a single access is racy,
>> because there can be a change between BLOCK_STATUS and READ.  And the
>> same thing happens in the plugins, for files and for anything else
>> that someone else can access.  I don't think it is designed for
>> concurrent access.  Or is it?
>
>Indeed, there is always a TOCTTOU race when you rely on block status if
>there is ever a concurrent writer.  But is it dangerous?  Without block
>status, we can have either:
>
>reader                         writer
>-------------------------------------------------
>read sector X as A
>                               write sector X as B
>
>or:
>
>reader                         writer
>-------------------------------------------------
>                               write sector X as B
>read sector X as B
>
>With the two steps, we have one of:
>
>reader                         writer
>-------------------------------------------------
>learn sector X has status Y
>read sector X as A
>                               write sector X as B
>
>reader                         writer
>-------------------------------------------------
>learn sector X has status Y
>                               write sector X as B
>read sector X as B
>
>reader                         writer
>-------------------------------------------------
>                               write sector X as B
>learn sector X has status Y
>read sector X as B
>
>where the shortcut is that if the reader sees status 'hole', it skips a
>read.  Had it done the read in spite of learning about a hole, it would
>either see all 0s (contents A - but that's no different than the read
>winning the race without a status check), or the new content (contents B
>just written by the writer - proving the hole status is out of date, but
>no different from losing the race without a status check).
>
>In general, trying to copy an image while it is being modified is not
>going to work reliably; the main point of block status is to make
>copying more efficient, but when copying, you are assuming no concurrent
>modifications.

Yes, basically it should not be used for communication.  So we are on
the same page; I just wanted to confirm that.

>--
>Eric Blake, Principal Software Engineer
>Red Hat, Inc.           +1-919-301-3226
>Virtualization:  qemu.org | libvirt.org