Liran Rotenberg
2020-Jun-09 13:00 UTC
Re: Permission to disk set wrong when restoring from memory snapshot?
On Tue, Jun 9, 2020 at 3:46 PM Peter Krempa <pkrempa@redhat.com> wrote:
> On Tue, Jun 09, 2020 at 15:38:53 +0300, Liran Rotenberg wrote:
> > Hi all,
> > Passing on Bug 1840609 <https://bugzilla.redhat.com/show_bug.cgi?id=1840609>
> > - Wake up from hibernation failed: internal error: unable to execute QEMU
> > command 'cont': Failed to get "write" lock.
> >
> > In Ovirt/RHV there is a specific flow that prevents the VM from starting on
> > the first host.
> > The result is:
> > 2020-06-09T12:12:58.111610Z qemu-kvm: Failed to get "write" lock
> > Is another process using the image
> > [/rhev/data-center/3b67fb92-906b-11ea-bb36-482ae35a5f83/4fd23357-6047-46c9-aa81-ba6a12a9e8bd/images/0191384a-3e0a-472f-a889-d95622cb6916/7f553f44-db08-480e-8c86-cbdeccedfafe]?
> > 2020-06-09T12:12:58.668140Z qemu-kvm: terminating on signal 15 from pid
> > 177876 (<unknown process>)
>
> This error comes from qemu's internal file locking. It usually means
> that there is another qemu or qemu-img which has the given image open.
>
> Is there anything which would access the image at that specific time or
> slightly around?

I don't think so. The volumes are created and added to the volume chain on
the VM metadata domxml (two snapshots created). Then the user restores the
latest snapshot and deletes them (while the VM is down) - they are removed.
The VM is set with the volume and going up, restoring the memory. The
mounting place (in /rhev/data-center) points to the same disk and volume.
On the first run I see the new place /rhev/data-center/<uuid>... that I
can't tell why or where it comes from. It is set with 'rw', while the
normal destination to the shared NFS is only with 'r'.

> It might be a race condition from something trying to modify the image
> perhaps combined with propagation of locks via NFS.

I can't see any race from the RHV point of view. The strangest thing is
why it works on the second host?
In RHV after the VM is up we remove the memory disk, but that doesn't
happen in this case, since the VM wasn't up.
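For context on the error being discussed: qemu guards its disk images with advisory byte-range locks on the image file, so 'Failed to get "write" lock' means some other file descriptor already holds a conflicting lock. A minimal sketch for probing that condition from the failing host, assuming Python is available there and with the image path left as a placeholder to be replaced by the path from the error message (taking and immediately releasing a whole-file POSIX lock does not disturb locks held by other processes):

import fcntl
import os

# Placeholder: substitute the full image path from the qemu error message.
IMAGE = "/rhev/data-center/.../7f553f44-db08-480e-8c86-cbdeccedfafe"

def write_lock_available(path):
    """Try to take (and immediately release) an exclusive POSIX lock.

    Returns False if another local process holds a conflicting lock on the
    file, which is roughly the condition qemu reports as
    'Failed to get "write" lock'.
    """
    fd = os.open(path, os.O_RDWR)   # a write lock needs a writable fd
    try:
        try:
            fcntl.lockf(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
        except OSError:
            return False            # conflicting lock held elsewhere
        fcntl.lockf(fd, fcntl.LOCK_UN)
        return True
    finally:
        os.close(fd)

print(write_lock_available(IMAGE))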
Peter Krempa
2020-Jun-09 13:14 UTC
Re: Permission to disk set wrong when restoring from memory snapshot?
On Tue, Jun 09, 2020 at 16:00:02 +0300, Liran Rotenberg wrote:
> On Tue, Jun 9, 2020 at 3:46 PM Peter Krempa <pkrempa@redhat.com> wrote:
>
> > On Tue, Jun 09, 2020 at 15:38:53 +0300, Liran Rotenberg wrote:
> > > Hi all,
> > > Passing on Bug 1840609 <https://bugzilla.redhat.com/show_bug.cgi?id=1840609>
> > > - Wake up from hibernation failed: internal error: unable to execute QEMU
> > > command 'cont': Failed to get "write" lock.
> > >
> > > In Ovirt/RHV there is a specific flow that prevents the VM from starting on
> > > the first host.
> > > The result is:
> > > 2020-06-09T12:12:58.111610Z qemu-kvm: Failed to get "write" lock
> > > Is another process using the image
> > > [/rhev/data-center/3b67fb92-906b-11ea-bb36-482ae35a5f83/4fd23357-6047-46c9-aa81-ba6a12a9e8bd/images/0191384a-3e0a-472f-a889-d95622cb6916/7f553f44-db08-480e-8c86-cbdeccedfafe]?
> > > 2020-06-09T12:12:58.668140Z qemu-kvm: terminating on signal 15 from pid
> > > 177876 (<unknown process>)
> >
> > This error comes from qemu's internal file locking. It usually means
> > that there is another qemu or qemu-img which has the given image open.
> >
> > Is there anything which would access the image at that specific time or
> > slightly around?
>
> I don't think so. The volumes are created and added to the volume chain on
> the VM metadata domxml (two snapshots created). Then the user restores the
> latest snapshot and deletes them (while the VM is down) - they are removed.
> The VM is set with the volume and going up, restoring the memory. The
> mounting place (in /rhev/data-center) points to the same disk and volume.
> On the first run I see the new place /rhev/data-center/<uuid>... that I
> can't tell why or where it comes from. It is set with 'rw', while the
> normal destination to the shared NFS is only with 'r'.

That's definitely something not related to the locking itself.

Please attach the XML document used to start the VM in both places
(working and non-working). I can't tell if there's a difference without
seeing those.

The difference might very well be in the XMLs.

> > It might be a race condition from something trying to modify the image
> > perhaps combined with propagation of locks via NFS.
>
> I can't see any race from the RHV point of view. The strangest thing is why
> it works on the second host?
> In RHV after the VM is up we remove the memory disk, but that doesn't
> happen in this case, since the VM wasn't up.

I'd speculate that it's timing. There's more time for the locks to
propagate or an offending process to terminate. The error message
definitely does not look like a permission issue, and
qemuSetupImagePathCgroup:75 is in the end a no-op on non-device
paths (makes sense only for LVM logical volumes, partitions etc.).
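One quick way to do the comparison Peter asks for is to pull each disk's source path and its readonly/shareable flags out of the two domain XML dumps. A rough sketch, assuming the working and non-working definitions have been saved under the hypothetical names domain-host1.xml and domain-host2.xml:

import xml.etree.ElementTree as ET

def disk_summary(xml_path):
    """Return (source path, readonly?, shareable?) for every <disk> element."""
    root = ET.parse(xml_path).getroot()
    rows = []
    for disk in root.findall("./devices/disk"):
        src = disk.find("source")
        path = None
        if src is not None:
            path = src.get("file") or src.get("dev")
        rows.append((path,
                     disk.find("readonly") is not None,
                     disk.find("shareable") is not None))
    return rows

# Hypothetical file names for the two dumps being compared.
for name in ("domain-host1.xml", "domain-host2.xml"):
    print(name)
    for row in disk_summary(name):
        print("   ", row)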
Liran Rotenberg
2020-Jun-09 13:41 UTC
Re: Permission to disk set wrong when restoring from memory snapshot?
On Tue, Jun 9, 2020 at 4:14 PM Peter Krempa <pkrempa@redhat.com> wrote:
> On Tue, Jun 09, 2020 at 16:00:02 +0300, Liran Rotenberg wrote:
> > On Tue, Jun 9, 2020 at 3:46 PM Peter Krempa <pkrempa@redhat.com> wrote:
> >
> > > On Tue, Jun 09, 2020 at 15:38:53 +0300, Liran Rotenberg wrote:
> > > > Hi all,
> > > > Passing on Bug 1840609 <https://bugzilla.redhat.com/show_bug.cgi?id=1840609>
> > > > - Wake up from hibernation failed: internal error: unable to execute QEMU
> > > > command 'cont': Failed to get "write" lock.
> > > >
> > > > In Ovirt/RHV there is a specific flow that prevents the VM from starting on
> > > > the first host.
> > > > The result is:
> > > > 2020-06-09T12:12:58.111610Z qemu-kvm: Failed to get "write" lock
> > > > Is another process using the image
> > > > [/rhev/data-center/3b67fb92-906b-11ea-bb36-482ae35a5f83/4fd23357-6047-46c9-aa81-ba6a12a9e8bd/images/0191384a-3e0a-472f-a889-d95622cb6916/7f553f44-db08-480e-8c86-cbdeccedfafe]?
> > > > 2020-06-09T12:12:58.668140Z qemu-kvm: terminating on signal 15 from pid
> > > > 177876 (<unknown process>)
> > >
> > > This error comes from qemu's internal file locking. It usually means
> > > that there is another qemu or qemu-img which has the given image open.
> > >
> > > Is there anything which would access the image at that specific time or
> > > slightly around?
> >
> > I don't think so. The volumes are created and added to the volume chain on
> > the VM metadata domxml (two snapshots created). Then the user restores the
> > latest snapshot and deletes them (while the VM is down) - they are removed.
> > The VM is set with the volume and going up, restoring the memory. The
> > mounting place (in /rhev/data-center) points to the same disk and volume.
> > On the first run I see the new place /rhev/data-center/<uuid>... that I
> > can't tell why or where it comes from. It is set with 'rw', while the
> > normal destination to the shared NFS is only with 'r'.
>
> That's definitely something not related to the locking itself.
>
> Please attach the XML document used to start the VM in both places
> (working and non-working). I can't tell if there's a difference without
> seeing those.
>
> The difference might very well be in the XMLs.

I added a snippet from the engine log. The first two domxmls are from the
first attempt on 'host_mixed_1', resulting in:
"unable to execute QEMU command 'cont': Failed to get "write" lock."
From this point, it's the second attempt on 'host_mixed_2' (2020-06-09
15:13:00,308+03 in log time).

> > > It might be a race condition from something trying to modify the image
> > > perhaps combined with propagation of locks via NFS.
> >
> > I can't see any race from the RHV point of view. The strangest thing is why
> > it works on the second host?
> > In RHV after the VM is up we remove the memory disk, but that doesn't
> > happen in this case, since the VM wasn't up.
>
> I'd speculate that it's timing. There's more time for the locks to
> propagate or an offending process to terminate. The error message
> definitely does not look like a permission issue, and
> qemuSetupImagePathCgroup:75 is in the end a no-op on non-device
> paths (makes sense only for LVM logical volumes, partitions etc.).
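Given the timing theory above (a lingering process, or a lock that has not yet propagated over NFS), the local side of it can be checked by walking /proc/<pid>/fd on the host where the start failed. A small sketch with a placeholder image path; it only sees processes on the local host, not lock holders on other NFS clients, and it needs enough privilege to read other processes' fd directories:

import os

def local_openers(path):
    """List (pid, comm) of local processes that have `path` open."""
    target = os.path.realpath(path)
    found = []
    for pid in filter(str.isdigit, os.listdir("/proc")):
        fd_dir = "/proc/%s/fd" % pid
        try:
            for fd in os.listdir(fd_dir):
                if os.path.realpath(os.path.join(fd_dir, fd)) == target:
                    with open("/proc/%s/comm" % pid) as comm:
                        found.append((pid, comm.read().strip()))
                    break
        except OSError:
            continue  # process went away or we lack permission
    return found

# Placeholder: substitute the full image path from the error message.
print(local_openers("/rhev/data-center/.../7f553f44-db08-480e-8c86-cbdeccedfafe"))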