Andrew Cooper
2013-Nov-28 12:31 UTC
Xen-4.3 and -unstable regression from changeset "numa-sched: leave node-affinity alone if not in 'auto' mode"
Hello,

I have recently positively identified b54a623efbcf5bff25c55117add1b4427b4e2f1b as causing a boot failure.

Serial log is attached. The crash is completely deterministic, and is from an IBM xSeries 3530 M4 server.

Given the crash and bad patch, I suspect it is more to do with the NUMA/memory layout than the specifics of the server.

Dario: Being your patch, do you have any ideas?

George: Regarding the release, if a fix can't easily be found, it might be worth considering reverting the change.

~Andrew
Dario Faggioli
2013-Nov-28 13:05 UTC
Re: Xen-4.3 and -unstable regression from changeset "numa-sched: leave node-affinity alone if not in 'auto' mode"
On Thu, 2013-11-28 at 12:31 +0000, Andrew Cooper wrote:
> Hello,

Hi,

> I have recently positively identified
> b54a623efbcf5bff25c55117add1b4427b4e2f1b as causing a boot failure.
>
> Serial log is attached. The crash is completely deterministic, and is
> from an IBM xSeries 3530 M4 server.
>
> Given the crash and bad patch, I suspect it is more to do with the
> NUMA/memory layout than the specifics of the server.
>
> Dario: Being your patch, do you have any ideas?

Wow... Not off the top of my head...

Can you try (or tell me how to do that on that box) the attached debug patch? I know it's gross, but given where and how early the crash happens, it shouldn't be too bad, and it would hopefully at least tell us whether d->node_affinity contains garbage.

Dario

--
<<This happens because I choose it to happen!>> (Raistlin Majere)
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)
George Dunlap
2013-Nov-28 15:09 UTC
Re: Xen-4.3 and -unstable regression from changeset "numa-sched: leave node-affinity alone if not in 'auto' mode"
On 11/28/2013 12:31 PM, Andrew Cooper wrote:
> Hello,
>
> I have recently positively identified
> b54a623efbcf5bff25c55117add1b4427b4e2f1b as causing a boot failure.
>
> Serial log is attached. The crash is completely deterministic, and is
> from an IBM xSeries 3530 M4 server.
>
> Given the crash and bad patch, I suspect it is more to do with the
> NUMA/memory layout than the specifics of the server.
>
> Dario: Being your patch, do you have any ideas?

Do you have a xen-syms you can use to find out what line the crash happened at?

Dom0 should have auto_node_affinity set at this point; so before this patch you'd have:

    nodemask = NODE_MASK_NONE;
    [set nodes in nodemask from cpumask]
    d->node_affinity = nodemask

After, you have:

    nodes_clear(d->node_affinity)
    [set nodes in d->node_affinity from cpumask]

Everything looks like it should be the same.

Can you try just reverting what's in the positive side of the if()? I.e., adding back in nodemask = NODE_MASK_NONE at the top, and the nodemask copying, and see what happens?

-George
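The before/after comparison George describes can be modelled in a few lines of C. This is only a sketch under stated assumptions: `nodemask_t`, `struct domain`, `cpu_to_node()`, `update_old()` and `update_new()` are simplified stand-ins invented for illustration, not Xen's real definitions, and the CPU-to-node mapping is made up. It shows that both shapes produce the same mask, and that the only structural difference is the zeroed temporary on the stack.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_CPUS 64u

/* Hypothetical stand-in for Xen's nodemask_t: one bit per NUMA node. */
typedef struct { uint64_t bits; } nodemask_t;

/* Hypothetical, heavily reduced domain structure. */
struct domain { nodemask_t node_affinity; };

/* Assumed topology for the sketch: CPU n lives on node n / 8. */
static unsigned int cpu_to_node(unsigned int cpu) { return cpu / 8; }

/* Old shape: build the mask in a zeroed stack temporary, then copy it in. */
static void update_old(struct domain *d, uint64_t cpumask)
{
    nodemask_t nodemask = { 0 };   /* the zeroed on-stack object */

    assert(d != NULL);
    for (unsigned int cpu = 0; cpu < MAX_CPUS; cpu++)
        if (cpumask & (UINT64_C(1) << cpu))
            nodemask.bits |= UINT64_C(1) << cpu_to_node(cpu);
    d->node_affinity = nodemask;
}

/* New shape: clear the field in the domain and set the bits there directly. */
static void update_new(struct domain *d, uint64_t cpumask)
{
    assert(d != NULL);
    d->node_affinity.bits = 0;     /* plays the role of nodes_clear() */
    for (unsigned int cpu = 0; cpu < MAX_CPUS; cpu++)
        if (cpumask & (UINT64_C(1) << cpu))
            d->node_affinity.bits |= UINT64_C(1) << cpu_to_node(cpu);
}
```

Calling both functions with the same `cpumask` on two domains whose `node_affinity` starts with different garbage leaves identical masks in both, which matches the observation that the logic itself should not have changed behaviour.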
Dario Faggioli
2013-Nov-28 15:14 UTC
Re: Xen-4.3 and -unstable regression from changeset "numa-sched: leave node-affinity alone if not in 'auto' mode"
On Thu, 2013-11-28 at 15:09 +0000, George Dunlap wrote:
> On 11/28/2013 12:31 PM, Andrew Cooper wrote:
> Do you have a xen-syms you can use to find out what line the crash
> happened at?

Yep, that would be helpful...

> Dom0 should have auto_node_affinity set at this point; so before this
> patch you'd have:
>     nodemask = NODE_MASK_NONE;
>     [set nodes in nodemask from cpumask]
>     d->node_affinity = nodemask
>
> After, you have:
>     nodes_clear(d->node_affinity)
>     [set nodes in d->node_affinity from cpumask]
>
> Everything looks like it should be the same.

Exactly! I really don't see what could be happening. Anyway, I now have access to the machine and can try doing some debugging...

> Can you try just reverting what's in the positive side of the if()?
> I.e., adding back in nodemask = NODE_MASK_NONE at the top, and the
> nodemask copying, and see what happens?

...Including this, and I will let you know.

Regards,
Dario
Andrew Cooper
2013-Nov-28 15:16 UTC
Re: Xen-4.3 and -unstable regression from changeset "numa-sched: leave node-affinity alone if not in 'auto' mode"
On 28/11/13 15:14, Dario Faggioli wrote:
> On Thu, 2013-11-28 at 15:09 +0000, George Dunlap wrote:
>> On 11/28/2013 12:31 PM, Andrew Cooper wrote:
>> Do you have a xen-syms you can use to find out what line the crash
>> happened at?
>
> Yep, that would be helpful...

The exact location of the crash was xen/include/xen/mm.h:188 in page_list_add_tail():

    static inline void
    page_list_add_tail(struct page_info *page, struct page_list_head *head)
    {
        page->list.next = PAGE_LIST_NULL;
        if ( head->next )
        {
            page->list.prev = page_to_pdx(head->tail);
            head->tail->list.next = page_to_pdx(page);   <--- crash on this line.
        }
        else
        {
            page->list.prev = PAGE_LIST_NULL;
            head->next = page;
        }
        head->tail = page;
    }

~Andrew
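To make the crash site easier to follow, here is a self-contained model of the same tail-insert logic. It is a sketch, not Xen's code: the eight-entry `frame_table`, the definition of `PAGE_LIST_NULL`, and the trivial `page_to_pdx()` (plain array index) are assumptions made so the example compiles on its own. The point it illustrates is that `head->tail` is dereferenced only when `head->next` is non-NULL, so a list head whose `tail` has been overwritten would fault on exactly the marked line.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: sentinel index meaning "no page". */
#define PAGE_LIST_NULL UINT32_MAX

/* Reduced page_info: links are stored as 32-bit indices, not pointers. */
struct page_info { struct { uint32_t next, prev; } list; };
struct page_list_head { struct page_info *next, *tail; };

/* Hypothetical frame table; in this model a pdx is just the array index. */
static struct page_info frame_table[8];

static uint32_t page_to_pdx(const struct page_info *pg)
{
    assert(pg >= frame_table && pg < frame_table + 8);
    return (uint32_t)(pg - frame_table);
}

/* Same control flow as the function quoted in the message above. */
static void page_list_add_tail(struct page_info *page,
                               struct page_list_head *head)
{
    page->list.next = PAGE_LIST_NULL;
    if (head->next) {
        page->list.prev = page_to_pdx(head->tail);
        /* Writes through head->tail: a bad tail pointer faults here. */
        head->tail->list.next = page_to_pdx(page);
    } else {
        page->list.prev = PAGE_LIST_NULL;
        head->next = page;
    }
    head->tail = page;
}
```

With an empty head (`next` and `tail` both NULL), adding two pages links them by index and leaves `tail` pointing at the second one.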
Andrew Cooper
2013-Nov-28 21:17 UTC
Re: Xen-4.3 and -unstable regression from changeset "numa-sched: leave node-affinity alone if not in 'auto' mode"
On 28/11/13 12:31, Andrew Cooper wrote:
> Hello,
>
> I have recently positively identified
> b54a623efbcf5bff25c55117add1b4427b4e2f1b as causing a boot failure.
>
> Serial log is attached. The crash is completely deterministic, and is
> from an IBM xSeries 3530 M4 server.
>
> Given the crash and bad patch, I suspect it is more to do with the
> NUMA/memory layout than the specifics of the server.
>
> Dario: Being your patch, do you have any ideas?
>
> George: Regarding the release, if a fix can't easily be found, it might
> be worth considering reverting the change.
>
> ~Andrew

Following some further debugging, this is rather more complicated than I initially thought.

There is some form of memory corruption; depending on which exact underlying changeset I base the XenServer patch queue on, or which pages are present in the queue, I get crashes in different locations, including faults from misaligned instructions, with stack traces which are completely bogus.

The saving grace is that the crashes appear to be completely deterministic for a given binary (although this server is slower than treacle to boot).

~Andrew
George Dunlap
2013-Nov-28 23:30 UTC
Re: Xen-4.3 and -unstable regression from changeset "numa-sched: leave node-affinity alone if not in 'auto' mode"
On 11/28/2013 09:17 PM, Andrew Cooper wrote:
> On 28/11/13 12:31, Andrew Cooper wrote:
>> Hello,
>>
>> I have recently positively identified
>> b54a623efbcf5bff25c55117add1b4427b4e2f1b as causing a boot failure.
>>
>> Serial log is attached. The crash is completely deterministic, and is
>> from an IBM xSeries 3530 M4 server.
>>
>> Given the crash and bad patch, I suspect it is more to do with the
>> NUMA/memory layout than the specifics of the server.
>>
>> Dario: Being your patch, do you have any ideas?
>>
>> George: Regarding the release, if a fix can't easily be found, it might
>> be worth considering reverting the change.
>>
>> ~Andrew
>
> Following some further debugging, this is rather more complicated than I
> initially thought.
>
> There is some form of memory corruption; depending on which exact
> underlying changeset I base the XenServer patch queue on, or which pages
> are present in the queue, I get crashes in different locations,
> including faults from misaligned instructions, with stack traces
> which are completely bogus.
>
> The saving grace is that the crashes appear to be completely
> deterministic for a given binary (although this server is slower than
> treacle to boot).

Well, one thing that patch certainly *does* do is remove a very large chunk of zeroed bytes from the stack (doing the work directly in the domain struct rather than doing it on the stack and then copying it in); so it's possible you've got an uninitialized variable somewhere...

-George
Ian Campbell
2013-Nov-29 10:51 UTC
Re: Xen-4.3 and -unstable regression from changeset "numa-sched: leave node-affinity alone if not in 'auto' mode"
On Thu, 2013-11-28 at 21:17 +0000, Andrew Cooper wrote:
> the XenServer patch queue on

Are you positive that the bug is in the underlying Xen tree and not some interaction with a patch in your queue?

A boot time issue ought to be reasonably easy to test with a bare tree.

Ian.
Andrew Cooper
2013-Nov-29 11:04 UTC
Re: Xen-4.3 and -unstable regression from changeset "numa-sched: leave node-affinity alone if not in 'auto' mode"
On 29/11/13 10:51, Ian Campbell wrote:
> On Thu, 2013-11-28 at 21:17 +0000, Andrew Cooper wrote:
>> the XenServer patch queue on
> Are you positive that the bug is in the underlying Xen tree and not some
> interaction with a patch in your queue?
>
> A boot time issue ought to be reasonably easy to test with a bare tree.
>
> Ian.

I am not sure of anything at the moment, although I have found one instance of a crash with none of the XenServer patch queue whatsoever.

At the moment, I have narrowed the problem down to a handful of instructions writing 0s into a well-formed region of the stack. Clearly, this is not correct, and every tweak of the debugging causes the problem to jump around.

~Andrew
Andrew Cooper
2013-Dec-02 14:01 UTC
Re: Xen-4.3 and -unstable regression from changeset "numa-sched: leave node-affinity alone if not in 'auto' mode"
On 29/11/13 11:04, Andrew Cooper wrote:
> On 29/11/13 10:51, Ian Campbell wrote:
>> On Thu, 2013-11-28 at 21:17 +0000, Andrew Cooper wrote:
>>> the XenServer patch queue on
>> Are you positive that the bug is in the underlying Xen tree and not some
>> interaction with a patch in your queue?
>>
>> A boot time issue ought to be reasonably easy to test with a bare tree.
>>
>> Ian.
>
> I am not sure of anything at the moment, although I have found one
> instance of a crash with none of the XenServer patch queue whatsoever.
>
> At the moment, I have narrowed the problem down to a handful of
> instructions writing 0s into a well-formed region of the stack.
> Clearly, this is not correct, and every tweak of the debugging causes
> the problem to jump around.
>
> ~Andrew

After some more investigation, this is not a regression at all, although the patch is directly relevant to identifying the problem.

    PXELINUX 4.04 2011-04-18 Copyright (C) 1994-2011 H. Peter Anvin et al
    boot:
    Loading xenrt/xen-minnow.gz... ok
    Loading xenrt/vmlinuz... ok
    After multiboot magic check
    Opcode from 0x105fef: 97 0e 00 00 49 8d be b0
    Before lret into trampoline
    Opcode from 0x105fef: 97 0e 00 00 49 8d be b0
    After (failed) conditional jmp to start_secondary
    Opcode from 0xffff830000105fef: 97 0e 86 00 49 8d be b0
    [start of the Xen ASCII-art boot banner]

Something between entering the trampoline and emerging in 64bit mode is corrupting a single byte at phys 0x105ff1 from its correct value to a value of 0x86.

The corruption disappears if the "no-real-mode" option is used.

We are currently trying to update the BIOS, but the intersection of operating systems which will successfully boot and will successfully run the IBM update tool is rather low.

~Andrew
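The step from the three opcode dumps to "a single byte at phys 0x105ff1" is a byte-wise comparison plus an offset addition. A minimal sketch in C, using the dump bytes exactly as they appear in the log; the helper name `first_diff` and the constant `DUMP_BASE` are made up for this illustration:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Physical address at which both dumps start, taken from the log. */
#define DUMP_BASE UINT32_C(0x105fef)

/* Bytes printed before the lret into the trampoline. */
static const uint8_t before_tramp[8] = {
    0x97, 0x0e, 0x00, 0x00, 0x49, 0x8d, 0xbe, 0xb0
};

/* Bytes printed after emerging in 64bit mode. */
static const uint8_t after_tramp[8] = {
    0x97, 0x0e, 0x86, 0x00, 0x49, 0x8d, 0xbe, 0xb0
};

/* Return the index of the first differing byte, or -1 if the dumps match. */
static int first_diff(const uint8_t *a, const uint8_t *b, size_t len)
{
    assert(a != NULL && b != NULL);
    for (size_t i = 0; i < len; i++)
        if (a[i] != b[i])
            return (int)i;
    return -1;
}
```

`first_diff(before_tramp, after_tramp, 8)` gives offset 2, and `DUMP_BASE + 2` is 0x105ff1, the address quoted in the message; the byte at that offset changed from 0x00 to 0x86.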
Jan Beulich
2013-Dec-02 14:36 UTC
Re: Xen-4.3 and -unstable regression from changeset "numa-sched: leave node-affinity alone if not in 'auto' mode"
>>> On 02.12.13 at 15:01, Andrew Cooper <andrew.cooper3@citrix.com> wrote:
> After some more investigation, this is not a regression at all, although
> the patch is directly relevant to identifying the problem.
>
> PXELINUX 4.04 2011-04-18 Copyright (C) 1994-2011 H. Peter Anvin et al
> boot:
> Loading xenrt/xen-minnow.gz... ok
> Loading xenrt/vmlinuz... ok
> After multiboot magic check
> Opcode from 0x105fef: 97 0e 00 00 49 8d be b0
> Before lret into trampoline
> Opcode from 0x105fef: 97 0e 00 00 49 8d be b0
> After (failed) conditional jmp to start_secondary
> Opcode from 0xffff830000105fef: 97 0e 86 00 49 8d be b0
> [start of the Xen ASCII-art boot banner]
>
> Something between entering the trampoline and emerging in 64bit mode is
> corrupting a single byte at phys 0x105ff1 from its correct value to a
> value of 0x86.
>
> The corruption disappears if the "no-real-mode" is used.

And I'd say the primary suspect is

        /*
         * Declare that our target operating mode is long mode.
         * Initialise 32-bit registers since some buggy BIOSes depend on it.
         */
        movl    $0xec00,%eax      # declare target operating mode
        movl    $0x0002,%ebx      # long mode
        int     $0x15

considering that 0x86 is a relatively common "function not implemented" indicator for BIOS, namely INT 15, functions.

As a possible workaround I'd consider trying
a) zeroing %esp rather than just %sp a few lines up from the above quoted code
b) zeroing the high halves of all registers

Jan
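Jan's reasoning rests on two small pieces of register arithmetic: the BIOS status lives in %ah (bits 8..15 of %eax) and is only meaningful as an error when the carry flag is set, and "zeroing the high half" of a register means keeping only its low 16 bits. The sketch below expresses both in C purely for illustration; the function names and the `INT15_AH_UNSUPPORTED` constant are hypothetical, and real-mode code would of course do this in assembly.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* The "function not implemented" status value discussed in the message. */
#define INT15_AH_UNSUPPORTED 0x86u

/* Extract %ah, i.e. bits 8..15 of %eax. */
static uint8_t reg_ah(uint32_t eax)
{
    return (uint8_t)((eax >> 8) & 0xffu);
}

/* True when the call reported "not implemented": CF set and AH == 0x86. */
static bool int15_unsupported(uint32_t eax, bool carry)
{
    return carry && reg_ah(eax) == INT15_AH_UNSUPPORTED;
}

/* Workaround (b): clear bits 16..31, leaving the 16-bit register intact. */
static uint32_t zero_high_half(uint32_t reg)
{
    return reg & 0xffffu;
}
```

For example, a return value of 0x00008600 in %eax with carry set decodes as "not implemented", and a register holding 0x00080000 becomes 0 once its high half is cleared even though its low 16 bits were already zero.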
Andrew Cooper
2013-Dec-03 19:53 UTC
Re: Xen-4.3 and -unstable regression from changeset "numa-sched: leave node-affinity alone if not in 'auto' mode"
On 02/12/13 14:36, Jan Beulich wrote:
>>>> On 02.12.13 at 15:01, Andrew Cooper <andrew.cooper3@citrix.com> wrote:
>> After some more investigation, this is not a regression at all, although
>> the patch is directly relevant to identifying the problem.
>>
>> PXELINUX 4.04 2011-04-18 Copyright (C) 1994-2011 H. Peter Anvin et al
>> boot:
>> Loading xenrt/xen-minnow.gz... ok
>> Loading xenrt/vmlinuz... ok
>> After multiboot magic check
>> Opcode from 0x105fef: 97 0e 00 00 49 8d be b0
>> Before lret into trampoline
>> Opcode from 0x105fef: 97 0e 00 00 49 8d be b0
>> After (failed) conditional jmp to start_secondary
>> Opcode from 0xffff830000105fef: 97 0e 86 00 49 8d be b0
>> [start of the Xen ASCII-art boot banner]
>>
>> Something between entering the trampoline and emerging in 64bit mode is
>> corrupting a single byte at phys 0x105ff1 from its correct value to a
>> value of 0x86.
>>
>> The corruption disappears if the "no-real-mode" is used.
> And I'd say the primary suspect is
>
>         /*
>          * Declare that our target operating mode is long mode.
>          * Initialise 32-bit registers since some buggy BIOSes depend on it.
>          */
>         movl    $0xec00,%eax      # declare target operating mode
>         movl    $0x0002,%ebx      # long mode
>         int     $0x15
>
> considering that 0x86 is a relatively common "function not
> implemented" indicator for BIOS, namely INT 15, functions.
>
> As a possible workaround I'd consider trying
> a) zeroing %esp rather than just %sp a few lines up from the
> above quoted code
> b) zeroing the high halves of all registers
>
> Jan

Your suspicion would be entirely correct.

I have positively identified this `int $0x15` call as corrupting the memory. The byte is fine immediately before and bad immediately afterwards.

I have further confirmed that zeroing all 32 bits of the GPRs before entering the interrupt fixes the issue.

In an attempt to understand what is going on, I stuck in more debugging for the entire register/selector state before and after, to see whether anything looked like a smoking gun.

    (XEN) Pre-state:
    (XEN) eax 00007600 ebx 00000000 ecx 00000000 edx 00007600
    (XEN) esi 0028b0c4 edi 00078a80 esp 00080000 ebp 00000000
    (XEN) cs 7600 ds 7600 es 7600 fs 0028 gs 0028 ss 7600

If the GPRs are left as they are, the post state looks like:

    (XEN) Post-state:
    (XEN) eax 00008600 ebx 00000000 ecx 00000000 edx 00007600
    (XEN) esi 0028b0c4 edi 00078a70 esp 00080000 ebp 00000000
    (XEN) cs 7600 ds 7600 es 7600 fs 0028 gs 0028 ss 7600

If the GPRs are zeroed as much as possible, the post state looks like:

    (XEN) Post-state:
    (XEN) eax 00008600 ebx 00000000 ecx 00000000 edx 00000000
    (XEN) esi 00000000 edi 00000000 esp 00000000 ebp 00000000
    (XEN) cs 7600 ds 7600 es 7600 fs 0028 gs 0028 ss 7600

In both cases, the carry flag is set, which is consistent with the return value of 0x86 in %ah.

I iterated through the registers, and proved that it was esp specifically which was the problem.

I shall submit a patch against trampoline.S shortly.

~Andrew
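The pre/post dumps for the unzeroed case can be compared mechanically. The following sketch stores the eight general-purpose register values from the dumps in arrays and reports which ones differ; the `struct regs` layout, the enum names, and `changed_mask()` are invented for this illustration and are not part of Xen's debugging code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum { EAX, EBX, ECX, EDX, ESI, EDI, ESP, EBP, NREGS };

/* Hypothetical container for one register dump. */
struct regs { uint32_t r[NREGS]; };

/* Values copied from the "Pre-state" dump. */
static const struct regs pre_state = { {
    0x00007600, 0x00000000, 0x00000000, 0x00007600,
    0x0028b0c4, 0x00078a80, 0x00080000, 0x00000000
} };

/* Values copied from the "Post-state" dump with the GPRs left alone. */
static const struct regs post_state = { {
    0x00008600, 0x00000000, 0x00000000, 0x00007600,
    0x0028b0c4, 0x00078a70, 0x00080000, 0x00000000
} };

/* Return a bitmask with bit i set when register i differs between dumps. */
static unsigned int changed_mask(const struct regs *a, const struct regs *b)
{
    unsigned int mask = 0;

    assert(a != NULL && b != NULL);
    for (int i = 0; i < NREGS; i++)
        if (a->r[i] != b->r[i])
            mask |= 1u << i;
    return mask;
}
```

On these values only `EAX` and `EDI` are flagged: eax goes from 0x7600 to 0x8600 (the 0x86 status in %ah), and edi drops by 0x10. The dumps themselves do not single out esp, which is consistent with the culprit having been found by iterating through the registers rather than by inspection.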