Andreas Olsowski
2011-Sep-09 16:41 UTC
[Xen-users] LVM (lvcreate -s) snapshot hangs when trying to access /dev/dm-X
Hello folks,

before I write this one off as a bug and hand it in to [xen-devel] or [linux-lvm], I'm trying to gather some info. So I am looking for anyone running a current Xen environment using LVM as guest storage.

** environment **
I am running xen/stable-2.6.32.x (2.6.32.45 atm, kernel.org is down) with xen4.1.2-rc1 on Debian 6.0.2 (stable/squeeze). For each guest I have one logical volume for the root filesystem and one for swap.

** circumstances **
Each morning a cronjob runs a backup script for all running guests on my Xen hosts. The script checks volume and mountpoint availability before and after, and in between does the following:
- create snapshot
- mount snapshot
- tar the mounted volume
- umount the snapshot
- remove the snapshot

** problem prelude **
In the past this worked fine, and by past I mean a 2.6.31.x dom0 with Xen-4.0.x on Debian 5.0 (stable/lenny?). Ever since I upgraded my environment, I've been having trouble with LVM hanging on snapshot creation for the FIRST guest in the list.

I have yet to catch this in the act: I only added -vvvv to the command this week on one server, and the only occurrence of the bug after that was on a server where I did not add -vvvv ... go figure. At the moment I owe you (and myself) the real output of an lvcreate -vvvv triggering this block. Since I get one or two each week, it's only a question of time until it happens. (Maybe I can provoke it by running 1000s of backup rotations without mounting the volume or tarballing it.)

** problem **
For now I'm going to stick with the aftermath: once the initial process has run into the block, no LVM command can be run successfully anymore. (I did remove the /var/lock/lvm/ files.) Sending signals to the blocking process does not get rid of it.
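For reference, the rotation listed under "circumstances", including the reproduction idea of skipping the mount and tar steps, could be sketched roughly as below. This is not the actual backup script: the snapshot size, the "-snap" suffix, the mountpoint, the archive path and the names backup_commands, run_all and stress_only are assumptions made for illustration. It only prints the commands unless dry_run is switched off.

```python
import subprocess
from typing import List


def backup_commands(vg: str, lv: str, snap_size: str = "1G",
                    mountpoint: str = "/mnt/snap",
                    stress_only: bool = False) -> List[List[str]]:
    """Build the command sequence for one backup rotation.

    All defaults are hypothetical. With stress_only=True the mount/tar/umount
    steps are skipped, leaving only snapshot creation and removal.
    """
    snap = f"{lv}-snap"                      # assumed naming scheme
    snap_dev = f"/dev/{vg}/{snap}"
    archive = f"/backup/{lv}.tar"            # assumed archive location
    cmds = [["lvcreate", "-vvvv", "-s", "-L", snap_size,
             "-n", snap, f"/dev/{vg}/{lv}"]]
    if not stress_only:
        cmds += [
            ["mount", snap_dev, mountpoint],
            ["tar", "-cf", archive, "-C", mountpoint, "."],
            ["umount", mountpoint],
        ]
    cmds.append(["lvremove", "-f", snap_dev])
    return cmds


def run_all(cmds: List[List[str]], dry_run: bool = True) -> None:
    """Print each command, or execute it and stop at the first failure."""
    for argv in cmds:
        if dry_run:
            print(" ".join(argv))
        else:
            subprocess.run(argv, check=True)


if __name__ == "__main__":
    # Dry run only: shows what one rotation would execute.
    run_all(backup_commands("xen-data", "myguest-root"))
```

Looping over backup_commands(..., stress_only=True) would give the "1000s of rotations" variant. Whether that actually triggers the hang is untested.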
Every command that does the same init-stuff as lvcreate and lvs is left to hang/block once it reaches the device (for example /dev/dm-14):
http://pastebin.com/3f7Q3ALb
The output in that paste documents pretty much the default stuff that is run on every LVM command. No entries in any of the system logfiles point towards an obvious problem.

At that point the guest is still fine; it can do I/O to that device. When I try to shut down the domain, it does not "power off" because Xen runs into the same block. When I destroy the guest, xl list shows its state as "(null) .... ---p-s".

** recovery **
I can recover by forcefully removing the block device with "dmsetup --force remove". After that, not only can I kill the processes and the guest disappears from "xl list", but "lvchange -aey xen-data/myguest-root" also works. Now I can create a snapshot, and my backup script can successfully back up the volume again.

** questions **
This may very well be a problem with Debian's LVM version, or with the old device-mapper modules of 2.6.32, or a combination of both. OR it is a problem with the Xen hypervisor, the I/O handling of the Xen kernel code, or a combination of those.

Has any of you ever encountered this or a similar problem before?
Did I miss related mails on [xen-devel] and [xen-users] that could help me fix this issue?
Where do you think the problem may be (hypervisor, kernel or LVM userland utils)?

If you are successfully running xen4.1 with 2.6.32 and LVM2, doing pretty much the same backup procedure as I do, and have never encountered this, please let me know.

Input is greatly appreciated.

with best regards
Andreas
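As a rough illustration of the recovery sequence described above, the sketch below builds the two commands named there. The mail does not say which device-mapper node was removed, so targeting the origin volume is an assumption. The helper names dm_name and recovery_commands are hypothetical. The name mapping follows the usual LVM convention of doubling hyphens inside the VG and LV names and joining the two with a single hyphen.

```python
from typing import List


def dm_name(vg: str, lv: str) -> str:
    """Map a VG/LV pair to its device-mapper name (hyphens are doubled)."""
    return f"{vg.replace('-', '--')}-{lv.replace('-', '--')}"


def recovery_commands(vg: str, lv: str) -> List[List[str]]:
    """Commands from the recovery section, in order.

    Assumption: the stuck device is the origin volume itself; the real
    target may be a different dm node (check "dmsetup ls" first).
    """
    return [
        ["dmsetup", "--force", "remove", dm_name(vg, lv)],
        ["lvchange", "-aey", f"{vg}/{lv}"],
    ]


if __name__ == "__main__":
    # Print only; forcefully removing a device is destructive if mistargeted.
    for argv in recovery_commands("xen-data", "myguest-root"):
        print(" ".join(argv))
```

The commands are only printed here, because forcefully removing the wrong device-mapper node on a live host can take down a running guest.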