thr3ads.net - Xen devel - Xen-4.3 and -unstable regression from changeset "numa-sched: leave node-affinity alone if not in 'auto' mode" [Nov 2013]


Andrew Cooper

2013-Nov-28 12:31 UTC

Xen-4.3 and -unstable regression from changeset "numa-sched: leave node-affinity alone if not in 'auto' mode"

Hello,

I have recently positively identified
b54a623efbcf5bff25c55117add1b4427b4e2f1b as causing a boot failure.

Serial log is attached.  The crash is completely deterministic, and is
from an IBM xSeries 3530 M4 server.

Given the crash and bad patch, I suspect it is more to do with the
NUMA/memory layout than the specifics of the server.

Dario: Being your patch, do you have any ideas?

George: Regarding the release, if a fix can't easily be found, it might
be worth considering reverting the change.

~Andrew


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

Dario Faggioli

2013-Nov-28 13:05 UTC


Re: Xen-4.3 and -unstable regression from changeset "numa-sched: leave node-affinity alone if not in 'auto' mode"

On Thu, 2013-11-28 at 12:31 +0000, Andrew Cooper wrote:
> Hello,
> 
Hi,

> I have recently positively identified
> b54a623efbcf5bff25c55117add1b4427b4e2f1b as causing a boot failure.
> 
> Serial log is attached.  The crash is completely deterministic, and is
> from an IBM xSeries 3530 M4 server.
> 
> Given the crash and bad patch, I suspect it is more to do with the
> NUMA/memory layout than the specifics of the server.
> 
> Dario: Being your patch, do you have any ideas?
> 
Wow... Not off the top of my head... Can you try (or tell me how to
do that on that box) the attached debug patch?

I know it's gross, but given where and how early the crash happens, it
shouldn't be too bad, and it would hopefully at least tell us whether
d->node_affinity contains garbage.

Dario

-- 
<<This happens because I choose it to happen!>> (Raistlin Majere)
-----------------------------------------------------------------
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)





George Dunlap

2013-Nov-28 15:09 UTC


Re: Xen-4.3 and -unstable regression from changeset "numa-sched: leave node-affinity alone if not in 'auto' mode"

On 11/28/2013 12:31 PM, Andrew Cooper wrote:
> Hello,
>
> I have recently positively identified
> b54a623efbcf5bff25c55117add1b4427b4e2f1b as causing a boot failure.
>
> Serial log is attached.  The crash is completely deterministic, and is
> from an IBM xSeries 3530 M4 server.
>
> Given the crash and bad patch, I suspect it is more to do with the
> NUMA/memory layout than the specifics of the server.
>
> Dario: Being your patch, do you have any ideas?
Do you have a xen-syms you can use to find out what line the crash 
happened at?

Dom0 should have auto_node_affinity set at this point; so before this 
patch you'd have:
  nodemask = NODE_MASK_NONE;
  [set nodes in nodemask from cpumask]
  d->node_affinity=nodemask

After, you have:
  nodes_clear(d->node_affinity)
  [set nodes in d->node_affinity from cpumask]

Everything looks like it should be the same.
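To make the before/after comparison concrete, here is a minimal user-space sketch of the two shapes. The types, the `cpu_to_node[]` table and the helper signatures are hypothetical simplifications (a single `unsigned long` per mask), not Xen's real nodemask API. Given the same CPU mask, both functions leave the same bits in `d->node_affinity`, even when the domain struct starts out full of garbage.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical, simplified stand-ins for Xen's nodemask_t and struct domain:
 * one bit per NUMA node, held in a plain unsigned long. */
#define NR_CPUS 16

typedef struct { unsigned long bits; } nodemask_t;
struct domain { nodemask_t node_affinity; };

/* Assumed CPU -> node topology, for illustration only. */
static const int cpu_to_node[NR_CPUS] = {
    0, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 0, 1, 1, 1, 1,
};

static void node_set(int node, nodemask_t *mask)
{
    assert(node >= 0 && node < (int)(8 * sizeof(mask->bits)));
    mask->bits |= 1UL << node;
}

/* Pre-patch shape: build the mask in a stack local, then copy it in. */
void update_via_local(struct domain *d, unsigned long cpumask)
{
    nodemask_t nodemask = { 0 };                 /* cf. NODE_MASK_NONE */

    for (int cpu = 0; cpu < NR_CPUS; cpu++)
        if (cpumask & (1UL << cpu))
            node_set(cpu_to_node[cpu], &nodemask);
    d->node_affinity = nodemask;
}

/* Post-patch shape: clear the domain's own mask and fill it in place. */
void update_in_place(struct domain *d, unsigned long cpumask)
{
    memset(&d->node_affinity, 0, sizeof(d->node_affinity)); /* cf. nodes_clear() */

    for (int cpu = 0; cpu < NR_CPUS; cpu++)
        if (cpumask & (1UL << cpu))
            node_set(cpu_to_node[cpu], &d->node_affinity);
}
```

For example, a CPU mask of 0x0f0f (CPUs 0-3 and 8-11) yields node mask 0x1 from either function under the assumed topology.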

Can you try just reverting what's in the positive side of the if()?
I.e., add back nodemask=NODE_MASK_NONE at the top, and the
nodemask copying, and see what happens?

  -George

Dario Faggioli

2013-Nov-28 15:14 UTC


Re: Xen-4.3 and -unstable regression from changeset "numa-sched: leave node-affinity alone if not in 'auto' mode"

On Thu, 2013-11-28 at 15:09 +0000, George Dunlap wrote:
> On 11/28/2013 12:31 PM, Andrew Cooper wrote:
> Do you have a xen-syms you can use to find out what line the crash 
> happened at?
> 
Yep, that would be helpful...

> Dom0 should have auto_node_affinity set at this point; so before this 
> patch you'd have:
>   nodemask = NODE_MASK_NONE;
>   [set nodes in nodemask from cpumask]
>   d->node_affinity=nodemask
> 
> After, you have:
>   nodes_clear(d->node_affinity)
>   [set nodes in d->node_affinity from cpumask]
> 
> Everything looks like it should be the same.
> 
Exactly! I really don't see what could be happening.

Anyway, I now have access to the machine and can try doing some
debugging...

> Can you try just reverting what's in the positive side of the if()?
> I.e., adding back in nodemask=NODE_MASK_NONE at the top, and the 
> nodemask copying, and see what happens?
> 
...including this, and will let you know.

Regards,
Dario


Andrew Cooper

2013-Nov-28 15:16 UTC


Re: Xen-4.3 and -unstable regression from changeset "numa-sched: leave node-affinity alone if not in 'auto' mode"


On 28/11/13 15:14, Dario Faggioli wrote:
> On Thu, 2013-11-28 at 15:09 +0000, George Dunlap wrote:
>> On 11/28/2013 12:31 PM, Andrew Cooper wrote:
>> Do you have a xen-syms you can use to find out what line the crash
>> happened at?
>>
> Yep, that would be helpful...
The exact location of the crash was xen/include/xen/mm.h:188, in
page_list_add_tail():

static inline void
page_list_add_tail(struct page_info *page, struct page_list_head *head)
{
    page->list.next = PAGE_LIST_NULL;
    if ( head->next )
    {
        page->list.prev = page_to_pdx(head->tail);
        head->tail->list.next = page_to_pdx(page);  /* <--- crash on this line */
    }
    else
    {
        page->list.prev = PAGE_LIST_NULL;
        head->next = page;
    }
    head->tail = page;
}
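To show how this list works, here is a hypothetical user-space model of the same function. Links are stored as indices ("pdx") into an array standing in for the frame table, and the sentinel value for `PAGE_LIST_NULL` is assumed to be all-ones. It is a sketch of the data structure, not Xen's implementation. The faulting line is the store through `head->tail`, so a corrupted list head (or a corrupted tail page) is enough to make it fault.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model: pages live in one array (a stand-in for the frame
 * table) and list links are array indices rather than pointers. */
#define NR_PAGES       8
#define PAGE_LIST_NULL UINT32_MAX        /* assumed all-ones sentinel */

struct page_info {
    struct { uint32_t next, prev; } list;
};

struct page_list_head {
    struct page_info *next, *tail;
};

static struct page_info frame_table[NR_PAGES];

static uint32_t page_to_pdx(const struct page_info *pg)
{
    /* The model checks bounds; the hypervisor does no such check, so a bad
     * head->tail goes on to be dereferenced by the store below. */
    assert(pg >= frame_table && pg < frame_table + NR_PAGES);
    return (uint32_t)(pg - frame_table);
}

void page_list_add_tail(struct page_info *page, struct page_list_head *head)
{
    page->list.next = PAGE_LIST_NULL;
    if ( head->next )
    {
        page->list.prev = page_to_pdx(head->tail);
        head->tail->list.next = page_to_pdx(page);  /* store through head->tail */
    }
    else
    {
        page->list.prev = PAGE_LIST_NULL;    /* first element: no predecessor */
        head->next = page;
    }
    head->tail = page;
}
```

Appending pages 3, 5 and 1 to an empty head chains them as 3 -> 5 -> 1, with `head->tail` left pointing at page 1.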

~Andrew

Andrew Cooper

2013-Nov-28 21:17 UTC


Re: Xen-4.3 and -unstable regression from changeset "numa-sched: leave node-affinity alone if not in 'auto' mode"

On 28/11/13 12:31, Andrew Cooper wrote:
> Hello,
>
> I have recently positively identified
> b54a623efbcf5bff25c55117add1b4427b4e2f1b as causing a boot failure.
>
> Serial log is attached.  The crash is completely deterministic, and is
> from an IBM xSeries 3530 M4 server.
>
> Given the crash and bad patch, I suspect it is more to do with the
> NUMA/memory layout than the specifics of the server.
>
> Dario: Being your patch, do you have any ideas?
>
> George: Regarding the release, if a fix can't easily be found, it might
> be worth considering reverting the change.
>
> ~Andrew
Following some further debugging, this is rather more complicated than I
initially thought.

There is some form of memory corruption; depending on which exact
underlying changeset I base the XenServer patch queue on, or which patches
are present in the queue, I get crashes in different locations,
including faults from misaligned instructions and completely bogus
stack traces.

The saving grace is that the crashes appear to be completely
deterministic for a given binary (although this server is slower than
treacle to boot).

~Andrew

George Dunlap

2013-Nov-28 23:30 UTC

head link

Re: Xen-4.3 and -unstable regression from changeset "numa-sched: leave node-affinity alone if not in 'auto' mode"

On 11/28/2013 09:17 PM, Andrew Cooper wrote:
> On 28/11/13 12:31, Andrew Cooper wrote:
>> Hello,
>>
>> I have recently positively identified
>> b54a623efbcf5bff25c55117add1b4427b4e2f1b as causing a boot failure.
>>
>> Serial log is attached.  The crash is completely deterministic, and is
>> from an IBM xSeries 3530 M4 server.
>>
>> Given the crash and bad patch, I suspect it is more to do with the
>> NUMA/memory layout than the specifics of the server.
>>
>> Dario: Being your patch, do you have any ideas?
>>
>> George: Regarding the release, if a fix can't easily be found, it might
>> be worth considering reverting the change.
>>
>> ~Andrew
>
> Following some further debugging, this is rather more complicated than I
> initially thought.
>
> There is some form of memory corruption; depending on which exact
> underlying changeset I base the XenServer patch queue on, or which patches
> are present in the queue, I get crashes in different locations,
> including faults from misaligned instructions and completely bogus
> stack traces.
>
> The saving grace is that the crashes appear to be completely
> deterministic for a given binary (although this server is slower than
> treacle to boot).
Well, one thing that patch certainly *does* do is remove a very large 
chunk of zeroed bytes from the stack (doing the work directly in the 
domain struct rather than doing it on the stack and then copying it in); 
so it's possible you've got an uninitialized variable
somewhere...

  -George

Ian Campbell

2013-Nov-29 10:51 UTC


Re: Xen-4.3 and -unstable regression from changeset "numa-sched: leave node-affinity alone if not in 'auto' mode"

On Thu, 2013-11-28 at 21:17 +0000, Andrew Cooper wrote:
> the XenServer patch queue on

Are you positive that the bug is in the underlying Xen tree and not some
interaction with a patch in your queue?

A boot time issue ought to be reasonably easy to test with a bare tree.

Ian.

Andrew Cooper

2013-Nov-29 11:04 UTC


Re: Xen-4.3 and -unstable regression from changeset "numa-sched: leave node-affinity alone if not in 'auto' mode"

On 29/11/13 10:51, Ian Campbell wrote:
> On Thu, 2013-11-28 at 21:17 +0000, Andrew Cooper wrote:
>> the XenServer patch queue on
> Are you positive that the bug is in the underlying Xen tree and not some
> interaction with a patch in your queue?
>
> A boot time issue ought to be reasonably easy to test with a bare tree.
>
> Ian.
>
I am not sure of anything at the moment, although I have found one
instance of a crash with none of the XenServer patch queue whatsoever.

At the moment, I have narrowed the problem down to a handful of
instructions writing 0s into a well-formed region of the stack. 
Clearly, this is not correct, and every tweak of the debugging causes
the problem to jump around.

~Andrew

Andrew Cooper

2013-Dec-02 14:01 UTC


Re: Xen-4.3 and -unstable regression from changeset "numa-sched: leave node-affinity alone if not in 'auto' mode"

On 29/11/13 11:04, Andrew Cooper wrote:
> On 29/11/13 10:51, Ian Campbell wrote:
>> On Thu, 2013-11-28 at 21:17 +0000, Andrew Cooper wrote:
>>> the XenServer patch queue on
>> Are you positive that the bug is in the underlying Xen tree and not some
>> interaction with a patch in your queue?
>>
>> A boot time issue ought to be reasonably easy to test with a bare tree.
>>
>> Ian.
>>
> I am not sure of anything at the moment, although I have found one
> instance of a crash with none of the XenServer patch queue whatsoever.
>
> At the moment, I have narrowed the problem down to a handful of
> instructions writing 0s into a well-formed region of the stack. 
> Clearly, this is not correct, and every tweak of the debugging causes
> the problem to jump around.
>
> ~Andrew
After some more investigation, this is not a regression at all, although
the patch is directly relevant to identifying the problem.

PXELINUX 4.04 2011-04-18  Copyright (C) 1994-2011 H. Peter Anvin et al
boot:
Loading xenrt/xen-minnow.gz... ok
Loading xenrt/vmlinuz... ok
After multiboot magic check
Opcode from 0x105fef: 97 0e 00 00 49 8d be b0
Before lret into trampoline
Opcode from 0x105fef: 97 0e 00 00 49 8d be b0
After (failed) conditional jmp to start_secondary
Opcode from 0xffff830000105fef: 97 0e 86 00 49 8d be b0
 __  __            _  _    _____  _
 \ \/ /___ _ __   | || |  |___ / / |
  \  // _ \ '_ \  | || |_   |_ \ | |


Something between entering the trampoline and emerging in 64-bit mode is
corrupting a single byte at phys 0x105ff1 from its correct value (0x00)
to a value of 0x86.
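The 0x105ff1 figure can be checked directly against the dumps above. The "before" and "after" byte strings differ only at offset 2, and the last dump is read through what appears to be Xen's 1:1 direct map starting at 0xffff830000000000 (an assumption inferred from the printed address). A small sketch of that arithmetic, with hypothetical helper names:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Return the offset of the first differing byte, or -1 if identical. */
static int first_diff(const uint8_t *a, const uint8_t *b, size_t len)
{
    for (size_t i = 0; i < len; i++)
        if (a[i] != b[i])
            return (int)i;
    return -1;
}

/* Bytes dumped from 0x105fef before and after the trampoline (from the log). */
static const uint8_t before[8] = { 0x97, 0x0e, 0x00, 0x00, 0x49, 0x8d, 0xbe, 0xb0 };
static const uint8_t after[8]  = { 0x97, 0x0e, 0x86, 0x00, 0x49, 0x8d, 0xbe, 0xb0 };

/* Assumed base of the direct map, implied by 0xffff830000105fef in the log. */
#define DIRECTMAP_BASE 0xffff830000000000ULL

uint64_t corrupted_phys_addr(void)
{
    uint64_t virt = 0xffff830000105fefULL;   /* address printed after the jump */
    uint64_t phys = virt - DIRECTMAP_BASE;   /* 0x105fef */
    int off = first_diff(before, after, sizeof(before));

    assert(off >= 0);
    return phys + (uint64_t)off;             /* 0x105fef + 2 */
}
```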

The corruption disappears if the "no-real-mode" command line option is used.

We are currently trying to update the BIOS, but the intersection of
operating systems which will boot successfully and which will
successfully run the IBM update tool is rather small.

~Andrew

Jan Beulich

2013-Dec-02 14:36 UTC


Re: Xen-4.3 and -unstable regression from changeset "numa-sched: leave node-affinity alone if not in 'auto' mode"

>>> On 02.12.13 at 15:01, Andrew Cooper <andrew.cooper3@citrix.com> wrote:
> After some more investigation, this is not a regression at all, although
> the patch is directly relevant to identifying the problem.
> 
> PXELINUX 4.04 2011-04-18  Copyright (C) 1994-2011 H. Peter Anvin et al
> boot:
> Loading xenrt/xen-minnow.gz... ok
> Loading xenrt/vmlinuz... ok
> After multiboot magic check
> Opcode from 0x105fef: 97 0e 00 00 49 8d be b0
> Before lret into trampoline
> Opcode from 0x105fef: 97 0e 00 00 49 8d be b0
> After (failed) conditional jmp to start_secondary
> Opcode from 0xffff830000105fef: 97 0e 86 00 49 8d be b0
>  __  __            _  _    _____  _
>  \ \/ /___ _ __   | || |  |___ / / |
>   \  // _ \ '_ \  | || |_   |_ \ | |
> 
> 
> Something between entering the trampoline and emerging in 64bit mode is
> corrupting a single byte at phys 0x105ff1 from its correct value to a
> value of 0x86.
> 
> The corruption disappears if the "no-real-mode" is used.
And I'd say the primary suspect is

        /*
         * Declare that our target operating mode is long mode.
         * Initialise 32-bit registers since some buggy BIOSes depend on it.
         */
        movl    $0xec00,%eax      # declare target operating mode
        movl    $0x0002,%ebx      # long mode
        int     $0x15

considering that 0x86 is a relatively common "function not implemented"
indicator for BIOS functions, INT 15h ones in particular.

As a possible workaround, I'd consider trying
a) zeroing %esp rather than just %sp a few lines up from the
above quoted code
b) zeroing the high halves of all registers

Jan

Andrew Cooper

2013-Dec-03 19:53 UTC


Re: Xen-4.3 and -unstable regression from changeset "numa-sched: leave node-affinity alone if not in 'auto' mode"

On 02/12/13 14:36, Jan Beulich wrote:
>>>> On 02.12.13 at 15:01, Andrew Cooper <andrew.cooper3@citrix.com> wrote:
>> After some more investigation, this is not a regression at all, although
>> the patch is directly relevant to identifying the problem.
>>
>> PXELINUX 4.04 2011-04-18  Copyright (C) 1994-2011 H. Peter Anvin et al
>> boot:
>> Loading xenrt/xen-minnow.gz... ok
>> Loading xenrt/vmlinuz... ok
>> After multiboot magic check
>> Opcode from 0x105fef: 97 0e 00 00 49 8d be b0
>> Before lret into trampoline
>> Opcode from 0x105fef: 97 0e 00 00 49 8d be b0
>> After (failed) conditional jmp to start_secondary
>> Opcode from 0xffff830000105fef: 97 0e 86 00 49 8d be b0
>>  __  __            _  _    _____  _
>>  \ \/ /___ _ __   | || |  |___ / / |
>>   \  // _ \ '_ \  | || |_   |_ \ | |
>>
>>
>> Something between entering the trampoline and emerging in 64bit mode is
>> corrupting a single byte at phys 0x105ff1 from its correct value to a
>> value of 0x86.
>>
>> The corruption disappears if the "no-real-mode" is used.
> And I'd say the primary suspect is
>
>         /*
>          * Declare that our target operating mode is long mode.
>          * Initialise 32-bit registers since some buggy BIOSes depend on it.
>          */
>         movl    $0xec00,%eax      # declare target operating mode
>         movl    $0x0002,%ebx      # long mode
>         int     $0x15
>
> considering that 0x86 is a relatively common "function not
> implemented" indicator for BIOS, namely INT 15, functions.
>
> As a possible workaround I'd consider trying
> a) zeroing %esp rather than just %sp a few lines up from the
> above quoted code
> b) zeroing the high halves of all registers
>
> Jan
>
Your suspicion would be entirely correct.  I have positively identified
this `int $0x15` call as corrupting the memory.  The byte is fine
immediately before and bad immediately afterwards.

I have further confirmed that zeroing all 32 bits of the GPRs before
entering the interrupt fixes the issue.

In an attempt to understand what is going on, I stuck in more debugging
for the entire register/selector state before and after, to see whether
anything looked like a smoking gun.

(XEN) Pre-state:
(XEN) eax 00007600 ebx 00000000 ecx 00000000 edx 00007600
(XEN) esi 0028b0c4 edi 00078a80 esp 00080000 ebp 00000000
(XEN) cs 7600 ds 7600 es 7600 fs 0028 gs 0028 ss 7600

If the GPRs are left as they are, the post-state looks like:

(XEN) Post-state:
(XEN) eax 00008600 ebx 00000000 ecx 00000000 edx 00007600
(XEN) esi 0028b0c4 edi 00078a70 esp 00080000 ebp 00000000
(XEN) cs 7600 ds 7600 es 7600 fs 0028 gs 0028 ss 7600

If the GPRs are zeroed as much as possible, the post-state looks like:

(XEN) Post-state:
(XEN) eax 00008600 ebx 00000000 ecx 00000000 edx 00000000
(XEN) esi 00000000 edi 00000000 esp 00000000 ebp 00000000
(XEN) cs 7600 ds 7600 es 7600 fs 0028 gs 0028 ss 7600

In both cases, the carry flag is set, which is consistent with the
return value of 0x86 in %ah.
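The post-state EAX of 0x00008600 decodes as AH = 0x86, AL = 0x00, which is exactly the byte value that appeared at 0x105ff1. A minimal sketch of that decoding follows. The helper and constant names are hypothetical, and treating AH = 0x86 with CF set as "function not supported" follows Jan's remark above rather than any particular BIOS vendor's documentation.

```c
#include <assert.h>
#include <stdint.h>

/* Split a 32-bit EAX value into the AH and AL bytes of AX. */
uint8_t reg_ah(uint32_t eax) { return (uint8_t)(eax >> 8); }
uint8_t reg_al(uint32_t eax) { return (uint8_t)eax; }

/* INT 15h takes its function code in AH and returns a status in AH. */
#define INT15_DECLARE_TARGET_MODE 0xec   /* AX=0xEC00, as in trampoline.S */
#define BIOS_STATUS_UNSUPPORTED   0x86   /* "function not implemented" */

/* Hypothetical helper: did the call report "unsupported" (CF set, AH=0x86)? */
int int15_unsupported(uint32_t eax_after, int carry_flag)
{
    return carry_flag && reg_ah(eax_after) == BIOS_STATUS_UNSUPPORTED;
}
```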


I iterated through the registers and proved that it was %esp
specifically which was the problem.
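One reading of the register dumps, offered as an unconfirmed hypothesis and not something established in this thread: with SS = 0x7600 (segment base 0x76000), the corrupted byte 0x105ff1 lies at offset 0x0008fff1 from the stack segment base. The upper half of that offset, 0x0008, is exactly the upper half of the pre-call ESP (0x00080000), which zeroing only %sp (as Jan describes) leaves untouched. That would fit a BIOS handler that lets SP wrap in 16 bits but then addresses the stack through the full 32-bit ESP. The arithmetic, as a sketch with hypothetical helper names:

```c
#include <assert.h>
#include <stdint.h>

/* Real-mode linear address: segment * 16 + offset. */
uint32_t rm_linear(uint16_t seg, uint32_t off)
{
    return ((uint32_t)seg << 4) + off;
}

/* Zeroing only %sp clears the low 16 bits of ESP and keeps the high half. */
uint32_t zero_sp_only(uint32_t esp)
{
    return esp & 0xffff0000u;
}

/* Offset of a linear address from the base of real-mode segment seg. */
uint32_t seg_offset(uint16_t seg, uint32_t linear)
{
    assert(linear >= rm_linear(seg, 0));
    return linear - rm_linear(seg, 0);
}
```

Under the same hypothesis, with ESP fully zeroed the same low-half offset 0xfff1 would land at 0x85ff1, inside the 64 KiB segment starting at 0x76000, which would be consistent with zeroing %esp making the corruption disappear.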

I shall submit a patch against trampoline.S shortly.

~Andrew
