Nathan March
2015-Mar-12 18:11 UTC
[CentOS-virt] Tapdisk processes being left behind when hvm domu's migrate/shutdown
Hi All,
I'm seeing tapdisk processes not being terminated after an HVM VM is shut down
or migrated away. I don't see this problem with Linux paravirt domUs,
just Windows HVM ones.
xl.cfg:
name = 'nathanwin'
memory = 4096
vcpus = 2
disk = [ 'file:/mnt/gtc_disk_p1/nathanwin/drive_c,hda,w' ]
vif = [ 'mac=00:16:3D:01:03:E0,bridge=vlan208' ]
builder = "hvm"
kernel = "/usr/lib/xen/boot/hvmloader"
localtime = 0
on_poweroff = "destroy"
on_reboot = "restart"
on_crash = "destroy"
vnc = 1
vncunused = 1
cpuid = [
'0:eax=00000000000000000000000000001011',
'1:eax=00000000000000100000011011000010,ecx=10000011101110100010001000000011,edx=00010111100010111111101111111111',
'2:eax=01010101000000110101101000000001',
'7,0:eax=xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx,ebx=00000000000000000000000000000000,ecx=00000000000000000000000000000000,edx=xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx',
'13,1:eax=xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx0',
'10:ebx=00000000000000000000000000000000',
'11:edx=00000000000000000000000000000000',
'2147483650:eax=01100101011101000110111001001001,ebx=00101001010100100010100001101100,ecx=01101111011001010101100000100000,edx=00101001010100100010100001101110',
'2147483651:eax=01010101010100000100001100100000,ebx=00100000001000000010000000100000,ecx=00100000001000000010000000100000,edx=01001100001000000010000000100000',
'2147483652:eax=00110000001101000011011000110101,ebx=00100000010000000010000000100000,ecx=00110111001100100010111000110010,edx=00000000011110100100100001000111',
'2147483656:eax=00000000000000000011000000101000',
]
Starting with the VM running initially on another host, I migrate it in:
migration target: Ready to receive domain.
Saving to migration stream new xl format (info 0x0/0x0/1450)
Loading new save file <incoming migration stream> (new xl fmt info
0x0/0x0/1450)
Savefile contains xl domain config
WARNING: ignoring "kernel" directive for HVM guest. Use
"firmware_override" instead if you really want a non-default firmware
xc: progress: Reloading memory pages: 56320/1114193 5%
xc: progress: Reloading memory pages: 1003520/1114193 90%
DEBUG libxl__blktap_devpath 37 aio:/mnt/gtc_disk_p1/nathanwin/drive_c
DEBUG libxl__blktap_devpath 40 /dev/xen/blktap-2/tapdev0
DEBUG libxl__blktap_devpath 37 aio:/mnt/gtc_disk_p1/nathanwin/drive_c
DEBUG libxl__blktap_devpath 40 /dev/xen/blktap-2/tapdev2
migration target: Transfer complete, requesting permission to start domain.
migration sender: Target has acknowledged transfer.
migration sender: Giving target permission to start.
migration target: Got permission, starting domain.
migration target: Domain started successsfully.
migration sender: Target reports successful startup.
DEBUG libxl__device_destroy_tapdisk 66
type=aio:/mnt/gtc_disk_p1/nathanwin/drive_c
disk=:/mnt/gtc_disk_p1/nathanwin/drive_c
Migration successful.
and now I have 2 tapdisk procs:
gtc-vana-005 ~ # ps auxf | grep tapdisk
root 32491 0.1 0.2 20364 4636 ? SLs 11:06 0:00 tapdisk
root 32520 0.0 0.2 20364 4636 ? SLs 11:06 0:00 tapdisk
Which seems odd given that the VM in question only has a single disk attached to
it and the qemu proc indicates it's using tapdev2:
root 32524 0.4 0.7 323208 15040 ? SLsl 11:06 0:00
/usr/lib/xen/bin/qemu-system-i386 -xen-domid 3 -chardev
socket,id=libxl-cmd,path=/var/run/xen/qmp-libxl-3,server,nowait -mon
chardev=libxl-cmd,mode=control -nodefaults -name nathanwin--incoming -vnc
127.0.0.1:0,to=99 -device cirrus-vga -global vga.vram_size_mb=8 -boot order=cda
-smp 2,maxcpus=2 -device rtl8139,id=nic0,netdev=net0,mac=00:16:3d:01:03:e0
-netdev type=tap,id=net0,ifname=vif3.0-emu,script=no,downscript=no -incoming
fd:13 -machine xenfv -m 4088 -drive
file=/dev/xen/blktap-2/tapdev2,if=ide,index=0,media=disk,format=raw,cache=writeback
gtc-vana-005 ~ # lsof -p 32520 | grep blktap-2
tapdisk 32520 root mem CHR 246,2 886671
/dev/xen/blktap-2/blktap2
tapdisk 32520 root 19u CHR 246,2 0t0 886671
/dev/xen/blktap-2/blktap2
gtc-vana-005 ~ # lsof -p 32491 | grep blktap-2
tapdisk 32491 root mem CHR 246,0 903999
/dev/xen/blktap-2/blktap0
tapdisk 32491 root 14u CHR 246,0 0t0 903999
/dev/xen/blktap-2/blktap0
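To make the pairing above easier to read, here is a minimal sketch that parses the lsof lines and the qemu -drive argument from this post and reports which tapdisk PID holds the minor that qemu references. It assumes that /dev/xen/blktap-2/blktapN and /dev/xen/blktap-2/tapdevN with the same N belong to the same tapdisk instance. The function names are made up for illustration, and the wrapped lsof lines have been rejoined by hand.

```python
import re

# lsof output copied from the post, with the wrapped device path rejoined.
LSOF_OUTPUT = """\
tapdisk 32520 root mem CHR 246,2 886671 /dev/xen/blktap-2/blktap2
tapdisk 32520 root 19u CHR 246,2 0t0 886671 /dev/xen/blktap-2/blktap2
tapdisk 32491 root mem CHR 246,0 903999 /dev/xen/blktap-2/blktap0
tapdisk 32491 root 14u CHR 246,0 0t0 903999 /dev/xen/blktap-2/blktap0
"""

# The -drive argument from the qemu command line in the post.
QEMU_DRIVE = ("file=/dev/xen/blktap-2/tapdev2,if=ide,index=0,"
              "media=disk,format=raw,cache=writeback")


def tapdisk_minors(lsof_text):
    """Map each tapdisk PID to the set of blktap minors it holds open."""
    minors = {}
    pattern = re.compile(
        r"tapdisk\s+(\d+)\s.*?/dev/xen/blktap-2/blktap(\d+)\s*$")
    for line in lsof_text.splitlines():
        match = pattern.match(line)
        if match:
            pid, minor = int(match.group(1)), int(match.group(2))
            minors.setdefault(pid, set()).add(minor)
    return minors


def qemu_tapdev_minor(drive_arg):
    """Return the N of the tapdevN path in a qemu -drive argument, or None."""
    match = re.search(r"/dev/xen/blktap-2/tapdev(\d+)", drive_arg)
    return int(match.group(1)) if match else None


def split_by_use(minors, used_minor):
    """Split PIDs into those holding used_minor and those that do not.

    Assumption: blktapN and tapdevN with equal N are the same instance.
    """
    in_use = sorted(p for p, ms in minors.items() if used_minor in ms)
    others = sorted(p for p, ms in minors.items() if used_minor not in ms)
    return in_use, others


if __name__ == "__main__":
    minors = tapdisk_minors(LSOF_OUTPUT)
    used = qemu_tapdev_minor(QEMU_DRIVE)
    in_use, others = split_by_use(minors, used)
    print(f"qemu references minor {used}")
    print(f"tapdisk PIDs holding that minor: {in_use}")
    print(f"tapdisk PIDs on other minors:    {others}")
```

This only reports the mapping. It does not say which process a later cleanup will kill.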
I then migrate this VM off to another host:
migration target: Ready to receive domain.
Saving to migration stream new xl format (info 0x0/0x0/1450)
Loading new save file <incoming migration stream> (new xl fmt info
0x0/0x0/1450)
Savefile contains xl domain config
WARNING: ignoring "kernel" directive for HVM guest. Use
"firmware_override" instead if you really want a non-default firmware
xc: progress: Reloading memory pages: 56320/1114193 5%
xc: progress: Reloading memory pages: 1003520/1114193 90%
DEBUG libxl__blktap_devpath 37 aio:/mnt/gtc_disk_p1/nathanwin/drive_c
DEBUG libxl__blktap_devpath 40 /dev/xen/blktap-2/tapdev2
DEBUG libxl__blktap_devpath 37 aio:/mnt/gtc_disk_p1/nathanwin/drive_c
DEBUG libxl__blktap_devpath 40 /dev/xen/blktap-2/tapdev3
migration target: Transfer complete, requesting permission to start domain.
migration sender: Target has acknowledged transfer.
migration sender: Giving target permission to start.
migration target: Got permission, starting domain.
migration target: Domain started successsfully.
migration sender: Target reports successful startup.
DEBUG libxl__device_destroy_tapdisk 66
type=aio:/mnt/gtc_disk_p1/nathanwin/drive_c
disk=:/mnt/gtc_disk_p1/nathanwin/drive_c
Migration successful.
and I'm down to one tapdisk proc that didn't get cleaned up:
gtc-vana-005 ~ # ps auxf | grep tapdisk
root 32520 0.0 0.2 20364 4636 ? SLs 11:06 0:00 tapdisk
So it seems like xen is creating a second tapdisk proc on startup for some
reason when it doesn't need to, then on cleanup it's only killing one of
the two procs.
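As a rough cross-check of that reading, the following sketch counts the tapdev paths allocated in the inbound migration output and the destroy calls in the outbound output. It assumes that the libxl__blktap_devpath lines come from the receiving host and the libxl__device_destroy_tapdisk line comes from the sending host, which the output itself does not state. The helper names are hypothetical, and the log excerpts are copied from above.

```python
import re

# Excerpt of the output seen when the VM was migrated in to this host.
INBOUND_LOG = """\
DEBUG libxl__blktap_devpath 37 aio:/mnt/gtc_disk_p1/nathanwin/drive_c
DEBUG libxl__blktap_devpath 40 /dev/xen/blktap-2/tapdev0
DEBUG libxl__blktap_devpath 37 aio:/mnt/gtc_disk_p1/nathanwin/drive_c
DEBUG libxl__blktap_devpath 40 /dev/xen/blktap-2/tapdev2
migration target: Got permission, starting domain.
"""

# Excerpt of the output seen when the VM was migrated off this host.
OUTBOUND_LOG = """\
migration sender: Target reports successful startup.
DEBUG libxl__device_destroy_tapdisk 66
type=aio:/mnt/gtc_disk_p1/nathanwin/drive_c
disk=:/mnt/gtc_disk_p1/nathanwin/drive_c
Migration successful.
"""


def allocated_tapdevs(log_text):
    """Return the tapdev paths reported by libxl__blktap_devpath lines."""
    return re.findall(
        r"libxl__blktap_devpath \d+ (/dev/xen/blktap-2/tapdev\d+)", log_text)


def destroy_calls(log_text):
    """Count libxl__device_destroy_tapdisk debug lines."""
    return len(re.findall(r"libxl__device_destroy_tapdisk\b", log_text))


if __name__ == "__main__":
    allocated = allocated_tapdevs(INBOUND_LOG)
    destroyed = destroy_calls(OUTBOUND_LOG)
    print(f"tapdevs allocated on the way in: {allocated}")
    print(f"destroy calls on the way out:    {destroyed}")
    print(f"unaccounted for:                 {len(allocated) - destroyed}")
```

The difference between the two counts matches the single leftover tapdisk process in the ps output.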
Any thoughts? This is on the latest 4.4.1-7.el6 packages.
Thanks!
- Nathan
George Dunlap
2015-Mar-12 18:56 UTC
[CentOS-virt] Tapdisk processes being left behind when hvm domu's migrate/shutdown
On Thu, Mar 12, 2015 at 6:11 PM, Nathan March <nathan at gt.net> wrote:
> Hi All,
>
> I'm seeing tapdisk processes not being terminated after a HVM vm is shutdown
> or migrated away. I don't see this problem with linux paravirt domu's, just
> windows hvm ones.

Interesting -- actually you get the same effect just starting and
shutting down a guest. It creates two tapdisk processes, but on
shutdown only destroys one.

I'll look into it.

-George
George Dunlap
2015-Mar-13 14:20 UTC
[CentOS-virt] Tapdisk processes being left behind when hvm domu's migrate/shutdown
On Thu, Mar 12, 2015 at 6:56 PM, George Dunlap <dunlapg at umich.edu> wrote:
> On Thu, Mar 12, 2015 at 6:11 PM, Nathan March <nathan at gt.net> wrote:
>> Hi All,
>>
>> I'm seeing tapdisk processes not being terminated after a HVM vm is shutdown
>> or migrated away. I don't see this problem with linux paravirt domu's, just
>> windows hvm ones.
>
> Interesting -- actually you get the same effect just starting and
> shutting down a guest. It creates two tapdisk processes, but on
> shutdown only destroys one.
>
> I'll look into it.

OK, this turns out to be a bug in the patch on the patchqueue used to
import XenServer's "blktap 2.5". Let me see if I can work up a quick
fix... otherwise it may have to wait until I get a chance to integrate
blktap 2.5 properly upstream.

-George