I have an automatic process set up. It's still pretty new and obviously in need of better error handling, because now I find myself in a bad state. I run snapshot-create-as across all my VMs, then do ZFS replication to the target backup system, then blockcommit everything:

    virsh snapshot-create-as --domain $vm snap --diskspec $DISK,file=$VMPREFIX/"$vm"-snap.qcow2 --disk-only --atomic --no-metadata --quiesce
    ...
    virsh blockcommit $vm $DISK --active --pivot

Normally this works fine, but something went wrong on the 20th. Something made the blockcommit fail, yet the -snap file still got deleted (note to self: check the return code from the blockcommit command!).

So now I'm in a bad state. The domain is still running, but it's running off the -snap overlay that is referenced in the XML. I googled around for how to recover a blockcommit from a deleted snapshot, but didn't find anything (pointers welcome).

    [root@vm1 images]# virsh domblklist serv1r2
    Target     Source
    ------------------------------------------------
    vda        /var/lib/libvirt/images/serv1r2-snap.qcow2
    fda        -
    hdb        /var/lib/libvirt/images/virtio-win-0.1.126.iso

I can see the size of the deleted file increasing in lsof:

    qemu-kvm  48994 49033  qemu  97u  REG  0,44  1855913984  1078 /var/lib/libvirt/images/serv1r2-snap.qcow2 (deleted)
    ...
    qemu-kvm  48994 49033  qemu  97u  REG  0,44  1856110592  1078 /var/lib/libvirt/images/serv1r2-snap.qcow2 (deleted)

So, do I need to roll back to the ZFS snapshot image, or is there some other way to recover from this snafu?

Thanks.
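The "check the return code" note can be turned into a guard around the commit step. This is a minimal sketch, not Doug's actual script. `commit_and_remove_overlay` is a hypothetical helper name. It assumes disk paths contain no whitespace, because it parses `virsh domblklist` output with awk. It deletes the overlay only after `virsh blockcommit` exits successfully and libvirt no longer lists the overlay as the disk's source.

```sh
# Hypothetical helper: commit the active overlay of disk $2 of domain $1
# back into its backing file, then delete the overlay file $3 only if the
# commit succeeded and libvirt no longer reports the overlay as the source.
commit_and_remove_overlay() {
    local vm="$1" disk="$2" overlay="$3" src

    # --pivot waits for the commit to finish and switches the disk to the base.
    if ! virsh blockcommit "$vm" "$disk" --active --pivot; then
        echo "blockcommit failed for $vm/$disk; keeping $overlay" >&2
        return 1
    fi

    # Double-check: which file is libvirt using for this disk now?
    src=$(virsh domblklist "$vm" | awk -v t="$disk" '$1 == t { print $2 }')
    if [ "$src" = "$overlay" ]; then
        echo "$vm/$disk still runs on $overlay; keeping it" >&2
        return 1
    fi

    rm -f -- "$overlay"
}
```

In the backup loop, this would be called as `commit_and_remove_overlay "$vm" "$DISK" "$VMPREFIX/$vm-snap.qcow2" || echo "leaving $vm on its overlay" >&2`. A failed commit then leaves the overlay file in place instead of deleting a file qemu is still writing to.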
On Thu, Jun 22, 2017 at 11:02:41 -0400, Doug Hughes wrote:
[...]
> virsh blockcommit $vm $DISK --active --pivot
>
> Normally this works fine, though something went wrong on the 20th.
> [...]
> I googled around for how to recover a blockcommit from a deleted
> snapshot, but didn't find anything.
> [...]

Since the VM is still running, you are lucky: the kernel will not actually free the deleted file until the qemu process that holds it open is shut down.

There are these options:

1) If you don't care about the snapshot hierarchy but want to save the full disk contents of the VM, you can use the block copy operation and copy everything into a new image.

2) If you removed the top-level image (which seems to be the case), you can even use a shallow block copy, provided you pre-create the new file properly. That way you keep the top layer as a separate layer.

3) If it's any layer other than the top one, it can still be restored with a new-enough qemu and some QMP command magic, but that will require some work.

Please let me know which option you want, and I can guide you further.

Please just don't turn off the VM :)
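Options 1 and 2 map onto `virsh blockcopy`. This is a minimal sketch. The helper names `full_copy` and `shallow_copy` and all destination paths are hypothetical. It assumes the base image is qcow2 and that no other block job is active on the disk.

Libvirt releases of this era may also refuse block copy on a persistent domain ("domain is not transient"). In that case, check your version's documentation for the supported workaround before running it.

```sh
# Option 1 (hypothetical helper): full copy. Everything the guest sees,
# i.e. the whole backing chain including the deleted-but-open overlay,
# is written into one new standalone image, and the disk is pivoted to it.
full_copy() {
    local vm="$1" disk="$2" dest="$3"
    virsh blockcopy "$vm" "$disk" "$dest" --wait --verbose --pivot
}

# Option 2 (hypothetical helper): shallow copy. Pre-create a qcow2 overlay
# backed by the existing base image, then copy only the top layer into it,
# so the base/overlay structure is preserved.
shallow_copy() {
    local vm="$1" disk="$2" base="$3" dest="$4"
    qemu-img create -f qcow2 -b "$base" -o backing_fmt=qcow2 "$dest" || return 1
    virsh blockcopy "$vm" "$disk" "$dest" --shallow --reuse-external \
        --wait --verbose --pivot
}
```

The shallow variant relies on `--reuse-external`: libvirt uses the pre-created file as is, so its backing file must already point at the image the deleted overlay was layered on.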
On Fri, Jun 30, 2017 at 12:05:47 +0200, Peter Krempa wrote:
> On Thu, Jun 22, 2017 at 11:02:41 -0400, Doug Hughes wrote:
> [...]

In fact, it's way simpler. If libvirt still knows about the overlay image (this is needed only so that libvirt can tell qemu the right things), you can redo the block commit:

    $ virsh list
     Id    Name                           State
    ----------------------------------------------------
     3     fedora23                       running

    $ virsh snapshot-create-as --disk-only --no-metadata fedora23
    Domain snapshot 1498817916 created
    $ virsh domblklist fedora23
    Target     Source
    ------------------------------------------------
    vda        /var/lib/libvirt/images/fedora23.1498817916
    hda        -

    $ rm /var/lib/libvirt/images/fedora23.1498817916
    $ ls /var/lib/libvirt/images/fedora23.1498817916
    ls: cannot access '/var/lib/libvirt/images/fedora23.1498817916': No such file or directory
    $ virsh blockcommit --active --pivot fedora23 vda

    Successfully pivoted
    $ virsh domblklist fedora23
    Target     Source
    ------------------------------------------------
    vda        /var/lib/libvirt/images/fedora23.qcow2
    hda        -
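Peter's transcript boils down to two steps: check that libvirt still lists the overlay as the disk source, then commit again. This is a minimal sketch of that check. `recommit_deleted_overlay` is a hypothetical name, and parsing `virsh domblklist` with awk assumes paths without whitespace.

```sh
# Hypothetical helper: redo an interrupted active commit, but only while
# libvirt still lists the (deleted) overlay $3 as the source of disk $2,
# since libvirt needs that knowledge to drive the commit in qemu.
recommit_deleted_overlay() {
    local vm="$1" disk="$2" overlay="$3" src
    src=$(virsh domblklist "$vm" | awk -v t="$disk" '$1 == t { print $2 }')
    if [ "$src" != "$overlay" ]; then
        echo "$vm/$disk source is '$src', not '$overlay'; not recommitting" >&2
        return 1
    fi
    virsh blockcommit "$vm" "$disk" --active --pivot
}
```

For the VM in this thread, that would be `recommit_deleted_overlay serv1r2 vda /var/lib/libvirt/images/serv1r2-snap.qcow2`. The commit itself can still be refused if another block job is already active on the disk.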
On Jun 30, 2017 6:22 AM, "Peter Krempa" <pkrempa@redhat.com> wrote:
> [...]
> In fact, it's way simpler. If libvirt still knows about the overlay
> image (this is necessary only so that it can say the proper things to
> qemu) you can re-do the block commit:
> [...]
> $ virsh blockcommit --active --pivot fedora23 vda
>
> Successfully pivoted
> [...]

Thanks for the reply! The original image is still there; only the first (and only) top-level snapshot was deleted. The blockcommit fails, though:

    [root@vm1 ~]# virsh blockcommit serv1r2 vda --active --pivot
    error: block copy still active: disk 'vda' already in active block job
On Fri, Jun 30, 2017 at 09:23:29 -0400, Doug Hughes wrote:
> On Jun 30, 2017 6:22 AM, "Peter Krempa" <pkrempa@redhat.com> wrote:
> [...]
> Thanks for the reply! The original image is still there. Only the 1st and
> only top-level snapshot is deleted.
> the blockcommit fails though:
> [root@vm1 ~]# virsh blockcommit serv1r2 vda --active --pivot
> error: block copy still active: disk 'vda' already in active block job

What does 'virsh blockjob serv1r2 vda' report? If it's something like this:

    $ virsh blockjob fedora23 vda
    Active Block Commit: [100 %]

then you just need to finalize the block job with:

    virsh blockjob --pivot serv1r2 vda

If that's not the case, please post what virsh blockjob reports and the <disk> part of the domain XML. If libvirt falsely thinks that the job did not finish, I'll guide you through recovering it. Simply restarting libvirtd would make libvirt clear the block job busy flag, but libvirt would also lose the metadata.
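Peter's check-then-pivot step can be scripted as well. This is a minimal sketch. `pivot_if_ready` is a hypothetical name. Matching on the human-readable `Active Block Commit: [100 %]` text shown above is an assumption that may not hold across virsh versions. Anything else, including "no current block job", is printed for manual review instead of being pivoted.

```sh
# Hypothetical helper: finish a stuck active commit on disk $2 of domain $1.
# Pivot only when virsh blockjob reports an active block commit at 100 %;
# otherwise print the status and leave the job alone.
pivot_if_ready() {
    local vm="$1" disk="$2" status
    status=$(virsh blockjob "$vm" "$disk") || return 1
    case "$status" in
        *"Active Block Commit"*"[100 %]"*)
            virsh blockjob "$vm" "$disk" --pivot ;;
        *)
            echo "not pivoting $vm/$disk; blockjob reports: $status" >&2
            return 1 ;;
    esac
}
```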