Hello. Today i''m try to research, why most of our vm (pvm) not shutdown correctly (we use shutdown with timeout, then it arrives we destroy vm), and write simple test case to check that domU watches works correctly: xenstore write /local/domain/DOMID/control/shutdown testflag; xenstore read /local/domain/DOMID/control/shutdown If userspace watches works correctly - xenstore read returns empty, becouse domU kernel respond only to known actions and printk and clean path on unknown actions. I can''t see difference in various kernel versions (centos/opensuse/debian and other), we use oxenstored old version (that shipped with 4.0.1_21326_06-0.4.1 on sles) How can i debug this issue and get more info on this problem? P.S. Some domains, that has small uptime respond to my check and shutdown/migrate correctly. -- Vasiliy Tolstov, Clodo.ru e-mail: v.tolstov@selfip.ru jabber: vase@selfip.ru
Konrad Rzeszutek Wilk
2012-Jan-17 20:15 UTC
Re: shutdown/migration stop working on paravirt domU
On Tue, Jan 17, 2012 at 12:58:14PM +0400, Vasiliy Tolstov wrote:> Hello. Today i''m try to research, why most of our vm (pvm) not > shutdown correctly (we use shutdown with timeout, then it arrives we > destroy vm), and write simple test case to check that domU watchesWhich kernel version of DomU?> works correctly: > > xenstore write /local/domain/DOMID/control/shutdown testflag; > xenstore read /local/domain/DOMID/control/shutdown > > If userspace watches works correctly - xenstore read returns empty, > becouse domU kernel respond only to known actions and printk and clean > path on unknown actions. > I can''t see difference in various kernel versions > (centos/opensuse/debian and other), we use oxenstored old version > (that shipped with 4.0.1_21326_06-0.4.1 on sles) > How can i debug this issue and get more info on this problem?Play a bit with xenctx to get an idea where the guest is stuck at.> P.S. Some domains, that has small uptime respond to my check and > shutdown/migrate correctly. > > -- > Vasiliy Tolstov, > Clodo.ru > e-mail: v.tolstov@selfip.ru > jabber: vase@selfip.ru > > _______________________________________________ > Xen-devel mailing list > Xen-devel@lists.xensource.com > http://lists.xensource.com/xen-devel
Vasiliy Tolstov
2012-Jan-17 21:16 UTC
Re: shutdown/migration stop working on paravirt domU
2012/1/18 Konrad Rzeszutek Wilk <konrad@darnok.org>:> On Tue, Jan 17, 2012 at 12:58:14PM +0400, Vasiliy Tolstov wrote: >> Hello. Today i''m try to research, why most of our vm (pvm) not >> shutdown correctly (we use shutdown with timeout, then it arrives we >> destroy vm), and write simple test case to check that domU watches > > Which kernel version of DomU?2.6.32.26 from xen git tree and 2.6.18-194.26.1.el5xen> > Play a bit with xenctx to get an idea where the guest is stuck at. >Ok, nice. When i need to run xenctx and what i need to see in its output? -- Vasiliy Tolstov, Clodo.ru e-mail: v.tolstov@selfip.ru jabber: vase@selfip.ru
Konrad Rzeszutek Wilk
2012-Jan-19 19:50 UTC
Re: shutdown/migration stop working on paravirt domU
On Wed, Jan 18, 2012 at 01:16:10AM +0400, Vasiliy Tolstov wrote:> 2012/1/18 Konrad Rzeszutek Wilk <konrad@darnok.org>: > > On Tue, Jan 17, 2012 at 12:58:14PM +0400, Vasiliy Tolstov wrote: > >> Hello. Today i''m try to research, why most of our vm (pvm) not > >> shutdown correctly (we use shutdown with timeout, then it arrives we > >> destroy vm), and write simple test case to check that domU watches > > > > Which kernel version of DomU? > > 2.6.32.26 from xen git tree and 2.6.18-194.26.1.el5xenFirst order is to upgrade your kernel and see if the problem exists with a newer 2.6.32.X tree. then also try 3.0 or 3.1 kernel.> > > > > > Play a bit with xenctx to get an idea where the guest is stuck at. > > > > Ok, nice. When i need to run xenctx and what i need to see in its output?<sigh> Google for it. There should be tons of examples of how to use it to figure out where the guest is stuck at.
Vasiliy Tolstov
2012-Jan-20 07:41 UTC
Re: shutdown/migration stop working on paravirt domU
2012/1/19 Konrad Rzeszutek Wilk <konrad@darnok.org>:>> > >> > Which kernel version of DomU? >> >> 2.6.32.26 from xen git tree and 2.6.18-194.26.1.el5xen > > First order is to upgrade your kernel and see if the problem exists > with a newer 2.6.32.X tree. then also try 3.0 or 3.1 kernel. >>This is possible, but my question is - why on some version of the kernel on same node some damins coud not properly watch xenstore?>> >> > >> > Play a bit with xenctx to get an idea where the guest is stuck at. >> > >> >> Ok, nice. When i need to run xenctx and what i need to see in its output? > > <sigh> Google for it. There should be tons of examples of how to use it > to figure out where the guest is stuck at.Ok. I create two equal domU with memory=512M and maxmemory = 512M, copy System.map to dom0 and check xenctx output. Domains runs on same node and only differs in uptime. Domain, that correct works on xenstore watches: /usr/lib64/xen/bin/xenctx -s /root/System.map-2.6.32.26 --stack-trace -a 178 rip: ffffffff810093aa hypercall_page+0x3aa flags: 00001246 i z p rsp: ffffffff81539f00 rax: 0000000000000000 rcx: ffffffff810093aa rdx: ffffffff810093aa rbx: ffffffff81595ac0 rsi: 0000000000000000 rdi: 0000000000000001 rbp: ffffffff81539f18 r8: 0000000000000000 r9: ffff880003130398 r10: 0000000000000000 r11: 0000000000000246 r12: 6db6db6db6db6db7 r13: ffffffff815e1cc0 r14: ffffffff815e3720 r15: 0000000000000000 cs: e033 ss: e02b ds: 0000 es: 0000 fs: 0000 @ 00007f4f534bd740 gs: 0000 @ ffff8800030cd000/0000000000000000 cr0: 8005003b cr2: 7f8497d2d130 cr3: 8644bd000 cr4: 00002660 dr0: 00000000 dr1: 00000000 dr2: 00000000 dr3: 00000000 dr6: ffff0ff0 dr7: 00000400 Code (instr addr ffffffff810093aa) cc cc cc cc cc cc cc cc cc cc cc 51 41 53 b8 1d 00 00 00 0f 05 <41> 5b 59 c3 cc cc cc cc cc cc cc Stack: ffff8800030c0000 0000000000000000 ffffffff8100e19b ffffffff81539f28 ffffffff8101835c ffffffff81539f48 ffffffff81010320 ffffffff815e3720 0000000000000000 ffffffff81539f58 ffffffff813d2ab2 ffffffff81539f98 ffffffff815afd34 ffffffff81539f98 ffffffff815e3720 00000000016c6818 Stack Trace: * [<ffffffff810093aa>] hypercall_page+0x3aa <-- ffff8800030c0000 0000000000000000 [<ffffffff8100e19b>] xen_safe_halt+0x10 ffffffff81539f28 [<ffffffff8101835c>] default_idle+0x25 ffffffff81539f48 [<ffffffff81010320>] cpu_idle+0x58 ffffffff815e3720 0000000000000000 ffffffff81539f58 [<ffffffff813d2ab2>] rest_init+0x66 ffffffff81539f98 [<ffffffff815afd34>] start_kernel+0x3a2 ffffffff81539f98 ffffffff815e3720 00000000016c6818 0000000000000000 0000000000000000 0000000000000000 ffffffff81539fb8 [<ffffffff815af2c3>] x86_64_start_reservations+0xae ffffffff815a65a0 ffffffff81001000 ffffffff81539ff8 [<ffffffff815b1762>] xen_start_kernel+0x41d 809822011f898975 000106a500100800 0000000000000000 0000000000000000 0000000000000000 0000000000000000 0000000000000000 Domain that not notified on xenstore watches: /usr/lib64/xen/bin/xenctx -s /root/System.map-2.6.32.26 --stack-trace -a 676 rip: ffffffff810093aa hypercall_page+0x3aa flags: 00001246 i z p rsp: ffffffff81539f00 rax: 0000000000000000 rcx: ffffffff810093aa rdx: ffffffff810093aa rbx: ffffffff81595ac0 rsi: 0000000000000000 rdi: 0000000000000001 rbp: ffffffff81539f18 r8: 0000000000000000 r9: ffffffff81690380 r10: 0000000102299e62 r11: 0000000000000246 r12: 6db6db6db6db6db7 r13: ffffffff815e1cc0 r14: ffffffff815e3720 r15: 0000000000000000 cs: e033 ss: e02b ds: 0000 es: 0000 fs: 0000 @ 00007fa607454700 gs: 0000 @ ffff880002f36000/0000000000000000 cr0: 8005003b cr2: 7f07e7a480a0 cr3: b3c37f000 cr4: 00002660 dr0: 00000000 dr1: 00000000 dr2: 00000000 dr3: 00000000 dr6: ffff0ff0 dr7: 00000400 Code (instr addr ffffffff810093aa) cc cc cc cc cc cc cc cc cc cc cc 51 41 53 b8 1d 00 00 00 0f 05 <41> 5b 59 c3 cc cc cc cc cc cc cc Stack: ffff880002f30000 0000000000000000 ffffffff8100e19b ffffffff81539f28 ffffffff8101835c ffffffff81539f48 ffffffff81010320 ffffffff815e3720 0000000000000000 ffffffff81539f58 ffffffff813d2ab2 ffffffff81539f98 ffffffff815afd34 ffffffff81539f98 ffffffff815e3720 00000000016c6818 Stack Trace: * [<ffffffff810093aa>] hypercall_page+0x3aa <-- ffff880002f30000 0000000000000000 [<ffffffff8100e19b>] xen_safe_halt+0x10 ffffffff81539f28 [<ffffffff8101835c>] default_idle+0x25 ffffffff81539f48 [<ffffffff81010320>] cpu_idle+0x58 ffffffff815e3720 0000000000000000 ffffffff81539f58 [<ffffffff813d2ab2>] rest_init+0x66 ffffffff81539f98 [<ffffffff815afd34>] start_kernel+0x3a2 ffffffff81539f98 ffffffff815e3720 00000000016c6818 0000000000000000 0000000000000000 0000000000000000 ffffffff81539fb8 [<ffffffff815af2c3>] x86_64_start_reservations+0xae ffffffff815a65a0 ffffffff81001000 ffffffff81539ff8 [<ffffffff815b1762>] xen_start_kernel+0x41d 809822011f898975 000106a500100800 0000000000000000 0000000000000000 0000000000000000 0000000000000000 0000000000000000 -- Vasiliy Tolstov, Clodo.ru e-mail: v.tolstov@selfip.ru jabber: vase@selfip.ru _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Vasiliy Tolstov
2012-Jan-20 11:02 UTC
Re: shutdown/migration stop working on paravirt domU
2012/1/20 Vasiliy Tolstov <v.tolstov@selfip.ru>:> 2012/1/19 Konrad Rzeszutek Wilk <konrad@darnok.org>: >>> > >>> > Which kernel version of DomU? >>> >>> 2.6.32.26 from xen git tree and 2.6.18-194.26.1.el5xen >> >> First order is to upgrade your kernel and see if the problem exists >> with a newer 2.6.32.X tree. then also try 3.0 or 3.1 kernel. >>> > > > This is possible, but my question is - why on some version of the > kernel on same node some damins coud not properly watch xenstore? > >>> >>> > >>> > Play a bit with xenctx to get an idea where the guest is stuck at. >>> > >>> >>> Ok, nice. When i need to run xenctx and what i need to see in its output? >> >> <sigh> Google for it. There should be tons of examples of how to use it >> to figure out where the guest is stuck at. > > Ok. I create two equal domU with memory=512M and maxmemory = 512M, > copy System.map to dom0 and check xenctx output. Domains runs on same > node and only differs in uptime.Some new data is arrived - in domU that can't respond to watches xenstore-read data produce process deadlock with simple trace: dmesg: INFO: task xenstore-read:19555 blocked for more than 120 seconds. "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. xenstore-read D 001d2d2891ba31d0 0 19555 19529 (NOTLB) ffff88000e373e28 0000000000000286 ffff88000e373e68 ffffffff8031b6e4 0000000000000008 ffff88000a6f5820 ffffffff804f4b00 0000000000090715 ffff88000a6f5a08 ffffffff00000000 Call Trace: [<ffffffff8031b6e4>] avc_has_perm+0x46/0x58 [<ffffffff8031bfa4>] inode_has_perm+0x56/0x63 [<ffffffff80263a7e>] __mutex_lock_slowpath+0x60/0x9b [<ffffffff8031c045>] file_has_perm+0x94/0xa3 [<ffffffff80263ac8>] .text.lock.mutex+0xf/0x14 [<ffffffff803b946a>] xenbus_dev_request_and_reply+0x26/0x81 [<ffffffff803bb8e3>] xenbus_dev_write+0xd2/0x301 [<ffffffff80217377>] vfs_write+0xce/0x174 [<ffffffff80217bc4>] sys_write+0x45/0x6e [<ffffffff802602f9>] tracesys+0xab/0xb6 strace: [root@21-680 ~]# strace -ff -v -F xenstore-read control/shutdown execve("/usr/bin/xenstore-read", ["xenstore-read", "control/shutdown"], ["HOSTNAME=21-680.clodo.ru", "TERM=xterm", "SHELL=/bin/bash", "HISTSIZE=1000", "SSH_CLIENT=85.143.161.18 44053 2", "SSH_TTY=/dev/pts/4", "USER=root", "LS_COLORS=no=00:fi=00:di=00;34:l", "MAIL=/var/spool/mail/root", "PATH=/usr/local/sbin:/usr/local/", "INPUTRC=/etc/inputrc", "PWD=/root", "LANG=ru_RU.utf8", "SHLVL=1", "HOME=/root", "LOGNAME=root", "SSH_CONNECTION=85.143.161.18 440", "LESSOPEN=|/usr/bin/lesspipe.sh %", "G_BROKEN_FILENAMES=1", "_=/usr/bin/strace"]) = 0 brk(0) = 0x8469000 mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x2b2755ddd000 uname({sysname="Linux", nodename="21-680.clodo.ru", release="2.6.18-194.26.1.el5xen", version="#1 SMP Tue Nov 9 13:35:30 EST 2010", machine="x86_64"}) = 0 access("/etc/ld.so.preload", R_OK) = -1 ENOENT (No such file or directory) open("/etc/ld.so.cache", O_RDONLY) = 3 fstat(3, {st_dev=makedev(202, 1), st_ino=314595, st_mode=S_IFREG|0644, st_nlink=1, st_uid=0, st_gid=0, st_blksize=4096, st_blocks=40, st_size=20348, st_atime=2012/01/20-10:11:58, st_mtime=2012/01/20-10:03:30, st_ctime=2012/01/20-10:03:30}) = 0 mmap(NULL, 20348, PROT_READ, MAP_PRIVATE, 3, 0) = 0x2b2755dde000 close(3) = 0 open("/usr/lib64/libxenstore.so.3.0", O_RDONLY) = 3 read(3, "\177ELF\2\1\1\0\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0 \31\0\0\0\0\0\0"..., 832) = 832 fstat(3, {st_dev=makedev(202, 1), st_ino=322542, st_mode=S_IFREG|0755, st_nlink=1, st_uid=0, st_gid=0, st_blksize=4096, st_blocks=40, st_size=19544, st_atime=2012/01/20-10:11:58, st_mtime=2011/10/24-17:45:23, st_ctime=2012/01/11-00:11:16}) = 0 mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x2b2755de3000 mmap(NULL, 2127040, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x2b2755fdf000 mprotect(0x2b2755fe3000, 2097152, PROT_NONE) = 0 mmap(0x2b27561e3000, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x4000) = 0x2b27561e3000 mmap(0x2b27561e4000, 9408, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x2b27561e4000 close(3) = 0 open("/lib64/libc.so.6", O_RDONLY) = 3 read(3, "\177ELF\2\1\1\0\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0\220\332\1\0\0\0\0\0"..., 832) = 832 fstat(3, {st_dev=makedev(202, 1), st_ino=106788, st_mode=S_IFREG|0755, st_nlink=1, st_uid=0, st_gid=0, st_blksize=4096, st_blocks=3368, st_size=1716720, st_atime=2012/01/20-10:11:58, st_mtime=2011/11/28-15:43:48, st_ctime=2012/01/11-00:10:36}) = 0 mmap(NULL, 3502424, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x2b27561e7000 mprotect(0x2b2756335000, 2097152, PROT_NONE) = 0 mmap(0x2b2756535000, 20480, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x14e000) = 0x2b2756535000 mmap(0x2b275653a000, 16728, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x2b275653a000 close(3) = 0 open("/lib64/libpthread.so.0", O_RDONLY) = 3 read(3, "\177ELF\2\1\1\0\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0\240W\0\0\0\0\0\0"..., 832) = 832 fstat(3, {st_dev=makedev(202, 1), st_ino=106801, st_mode=S_IFREG|0755, st_nlink=1, st_uid=0, st_gid=0, st_blksize=4096, st_blocks=288, st_size=142696, st_atime=2012/01/20-10:11:58, st_mtime=2011/11/28-15:43:49, st_ctime=2012/01/11-00:10:36}) = 0 mmap(NULL, 2204528, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x2b275653f000 mprotect(0x2b2756555000, 2093056, PROT_NONE) = 0 mmap(0x2b2756754000, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x15000) = 0x2b2756754000 mmap(0x2b2756756000, 13168, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x2b2756756000 close(3) = 0 open("/lib64/libgcc_s.so.1", O_RDONLY) = 3 read(3, "\177ELF\2\1\1\0\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0P\36\0\0\0\0\0\0"..., 832) = 832 fstat(3, {st_dev=makedev(202, 1), st_ino=106983, st_mode=S_IFREG|0755, st_nlink=1, st_uid=0, st_gid=0, st_blksize=4096, st_blocks=120, st_size=56072, st_atime=2012/01/20-10:11:58, st_mtime=2011/07/22-09:06:44, st_ctime=2012/01/10-23:56:46}) = 0 mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x2b275675a000 mmap(NULL, 2151784, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x2b275675b000 mprotect(0x2b2756768000, 2097152, PROT_NONE) = 0 mmap(0x2b2756968000, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0xd000) = 0x2b2756968000 close(3) = 0 mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x2b2756969000 arch_prctl(ARCH_SET_FS, 0x2b27569696e0) = 0 mprotect(0x2b2756754000, 4096, PROT_READ) = 0 mprotect(0x2b2756535000, 16384, PROT_READ) = 0 mprotect(0x2b2755fdd000, 4096, PROT_READ) = 0 munmap(0x2b2755dde000, 20348) = 0 set_tid_address(0x2b2756969770) = 19603 set_robust_list(0x2b2756969780, 0x18) = 0 futex(0x7fff80b838dc, FUTEX_WAKE_PRIVATE, 1) = 0 rt_sigaction(SIGRTMIN, {0x2b2756544380, [], SA_RESTORER|SA_SIGINFO, 0x2b275654db70}, NULL, 8) = 0 rt_sigaction(SIGRT_1, {0x2b27565442b0, [], SA_RESTORER|SA_RESTART|SA_SIGINFO, 0x2b275654db70}, NULL, 8) = 0 rt_sigprocmask(SIG_UNBLOCK, [RTMIN RT_1], NULL, 8) = 0 getrlimit(RLIMIT_STACK, {rlim_cur=10240*1024, rlim_max=RLIM_INFINITY}) = 0 stat("/proc/xen/xenbus", {st_dev=makedev(0, 3), st_ino=4026534667, st_mode=S_IFREG|0400, st_nlink=1, st_uid=0, st_gid=0, st_blksize=4096, st_blocks=0, st_size=0, st_atime=2012/01/20-10:04:03, st_mtime=2012/01/20-10:04:03, st_ctime=2012/01/20-10:04:03}) = 0 open("/proc/xen/xenbus", O_RDWR) = 3 brk(0) = 0x8469000 brk(0x848a000) = 0x848a000 rt_sigaction(SIGPIPE, {0x1, [], SA_RESTORER, 0x2b27562172d0}, {SIG_DFL, [], 0}, 8) = 0 write(3, "\2\0\0\0\0\0\0\0\0\0\0\0\21\0\0\0", 16) = 16 write(3, "control/shutdown\0", 17 -- Vasiliy Tolstov, Clodo.ru e-mail: v.tolstov@selfip.ru jabber: vase@selfip.ru _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Vasiliy Tolstov
2012-Jan-23 14:46 UTC
Re: shutdown/migration stop working on paravirt domU
2012/1/20 Vasiliy Tolstov <v.tolstov@selfip.ru>:> 2012/1/20 Vasiliy Tolstov <v.tolstov@selfip.ru>: >> 2012/1/19 Konrad Rzeszutek Wilk <konrad@darnok.org>: >>>> > >>>> > Which kernel version of DomU? >>>> >>>> 2.6.32.26 from xen git tree and 2.6.18-194.26.1.el5xen >>> >>> First order is to upgrade your kernel and see if the problem exists >>> with a newer 2.6.32.X tree. then also try 3.0 or 3.1 kernel. >>>>I'm finally find the problem. If domU write to xenstore some incomplete command by writin binary data to /proc/xen/xenbus (for example "\2\0\0\0\0\0\0\0\0\0\0\0\21\0\0\0") and exists, watches stop working and second attemp to read/write some from domU to xenstore failed with process lock that tries to write message... Can this be solved (or solved already in some version of xenbus driver code?). -- Vasiliy Tolstov, Clodo.ru e-mail: v.tolstov@selfip.ru jabber: vase@selfip.ru _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel