Andre Przywara
2011-Jan-27 23:18 UTC
[Xen-devel] Hypervisor crash(!) on xl cpupool-numa-split
Hi,

when I boot my machine without restricting Dom0 (dom0_mem= dom0_max_vcpus=)
I get a _hypervisor_ crash when I run
# xl cpupool-numa-split
If Dom0's resources are limited on the Xen cmdline, everything works fine.
The crash dump points to a scheduling problem with weights, so I assume the
NUMA distribution algorithm somehow fools the hypervisor completely.

I will investigate this further tomorrow, but maybe someone has a good idea.

Regards,
Andre.

root@dosorca:/data/images# xl cpupool-numa-split
(XEN) Xen BUG at sched_credit.c:990
(XEN) ----[ Xen-4.1.0-rc2-pre  x86_64  debug=y  Not tainted ]----
(XEN) CPU:    0
(XEN) RIP:    e008:[<ffff82c4801180f8>] csched_acct+0x11f/0x419
(XEN) RFLAGS: 0000000000010006   CONTEXT: hypervisor
(XEN) rax: 0000000000000010   rbx: 0000000000000f00   rcx: 0000000000000100
(XEN) rdx: 0000000000001000   rsi: ffff830437ffa600   rdi: 0000000000000010
(XEN) rbp: ffff82c480297e10   rsp: ffff82c480297d80   r8:  0000000000000100
(XEN) r9:  0000000000000006   r10: ffff82c4802d4100   r11: 000000afc7df0edf
(XEN) r12: ffff830437ffa5e0   r13: ffff82c480117fd9   r14: ffff830437f9f2e8
(XEN) r15: ffff830434321ec0   cr0: 000000008005003b   cr4: 00000000000006f0
(XEN) cr3: 000000080df4e000   cr2: ffff88179af79618
(XEN) ds: 002b   es: 002b   fs: 0000   gs: 0000   ss: e010   cs: e008
(XEN) Xen stack trace from rsp=ffff82c480297d80:
(XEN)    0000000000000282 fffffed4802d3f80 0000000000000eff ffff830437ffa5e0
(XEN)    ffff830437ffa5e8 ffff830437ffa870 ffff830437ffa5e0 0000000000000282
(XEN)    ffff830437ffa5e8 00002a3037ffa870 00000f0000000f00 0000000000000000
(XEN)    ffff82c400000000 ffff82c4802d3f80 ffff830437ffa5e0 ffff82c480117fd9
(XEN)    ffff830437f9f2e8 ffff830437f9f2e0 ffff82c480297e40 ffff82c480125f34
(XEN)    0000000000000002 ffff830437ffa600 ffff82c4802d3f80 000000afb6f8667f
(XEN)    ffff82c480297e90 ffff82c480126259 ffff82c48024ae20 ffff82c4802d3f80
(XEN)    ffff830437f9f2e0 0000000000000000 0000000000000000 ffff82c4802b0880
(XEN)    ffff82c480297f18 ffffffffffffffff ffff82c480297ed0 ffff82c480123327
(XEN)    ffff82c4802d4a00 ffff82c480297f18 ffff82c48024ae20 ffff82c480297f18
(XEN)    000000afb6abd652 ffff82c4802d3ec0 ffff82c480297ee0 ffff82c4801233a2
(XEN)    ffff82c480297f10 ffff82c4801563f5 0000000000000000 ffff8300c7cd6000
(XEN)    0000000000000000 ffff8300c7ad4000 ffff82c480297d48 0000000000000000
(XEN)    0000000000000000 0000000000000000 ffffffff81a69060 ffff8817a8503f10
(XEN)    ffff8817a8503fd8 0000000000000246 ffff8817a8503e80 ffff880000000001
(XEN)    0000000000000000 0000000000000000 ffffffff810093aa 000000aafab2f86e
(XEN)    00000000deadbeef 00000000deadbeef 0000010000000000 ffffffff810093aa
(XEN)    000000000000e033 0000000000000246 ffff8817a8503ef8 000000000000e02b
(XEN)    0000000000000000 0000000000000000 0000000000000000 0000000000000000
(XEN)    0000000000000000 ffff8300c7cd6000 0000000000000000 0000000000000000
(XEN) Xen call trace:
(XEN)    [<ffff82c4801180f8>] csched_acct+0x11f/0x419
(XEN)    [<ffff82c480125f34>] execute_timer+0x4e/0x6c
(XEN)    [<ffff82c480126259>] timer_softirq_action+0xf2/0x245
(XEN)    [<ffff82c480123327>] __do_softirq+0x88/0x99
(XEN)    [<ffff82c4801233a2>] do_softirq+0x6a/0x7a
(XEN)    [<ffff82c4801563f5>] idle_loop+0x6a/0x6f
(XEN)
(XEN)
(XEN) ****************************************
(XEN) Panic on CPU 0:
(XEN) Xen BUG at sched_credit.c:990
(XEN) ****************************************
(XEN)
(XEN) Reboot in five seconds...

--
Andre Przywara
AMD-OSRC (Dresden)
Tel: x29712
Juergen Gross
2011-Jan-28 06:47 UTC
Re: [Xen-devel] Hypervisor crash(!) on xl cpupool-numa-split
On 01/28/11 00:18, Andre Przywara wrote:
> Hi,
>
> when I boot my machine without restricting Dom0 (dom0_mem=
> dom0_max_vcpus=) I get a _hypervisor_ crash when I run
> # xl cpupool-numa-split
> If Dom0's resources are limited on the Xen cmdline, everything works fine.
> The crash dump points to a scheduling problem with weights, so I assume
> the NUMA distribution algorithm somehow fools the hypervisor completely.
>
> I will investigate this further tomorrow, but maybe someone has a good
> idea.

I've seen this once with an older cpupool version on a 24 processor machine.
It was NOT related to NUMA, but did occur only on reboot after a Dom0 panic.
The machine had an init script creating a cpupool and populating it with
cpus. The machine was then in a panic loop due to the BUG in csched_acct
until it was reset manually. After the reset the problem was gone.

As I was never able to reproduce the problem later (the same software is
running on dozens of machines!), I assumed there was a problem related to
the first Dom0 panic, maybe some corrupted BIOS tables.

Can the crash be reproduced easily?


Juergen

> Regards,
> Andre.
>
> root@dosorca:/data/images# xl cpupool-numa-split
> (XEN) Xen BUG at sched_credit.c:990
> [...]
> (XEN) Reboot in five seconds...

--
Juergen Gross                 Principal Developer Operating Systems
TSP ES&S SWE OS6              Telephone: +49 (0) 89 3222 2967
Fujitsu Technology Solutions  e-mail: juergen.gross@ts.fujitsu.com
Domagkstr. 28                 Internet: ts.fujitsu.com
D-80807 Muenchen              Company details: ts.fujitsu.com/imprint.html
Andre Przywara
2011-Jan-28 11:07 UTC
Re: [Xen-devel] Hypervisor crash(!) on xl cpupool-numa-split
Juergen Gross wrote:
> On 01/28/11 00:18, Andre Przywara wrote:
>> [...]
>
> I've seen this once with an older cpupool version on a 24 processor
> machine. It was NOT related to NUMA, but did occur only on reboot after
> a Dom0 panic. The machine had an init script creating a cpupool and
> populating it with cpus. The machine was then in a panic loop due to the
> BUG in csched_acct until it was reset manually. After the reset the
> problem was gone.
>
> As I was never able to reproduce the problem later (the same software is
> running on dozens of machines!), I assumed there was a problem related to
> the first Dom0 panic, maybe some corrupted BIOS tables.
>
> Can the crash be reproduced easily?

Yes.
If I don't specify dom0_max_vcpus= and dom0_mem= on the Xen cmdline, I
can reliably trigger the crash with xl cpupool-numa-split.
Omitting only dom0_max_vcpus= does not suffice.

Will continue after lunch-break ;-)

Regards,
Andre.

>
> Juergen
>
>> Regards,
>> Andre.
>>
>> root@dosorca:/data/images# xl cpupool-numa-split
>> (XEN) Xen BUG at sched_credit.c:990
>> [...]
>> (XEN) Reboot in five seconds...

--
Andre Przywara
AMD-OSRC (Dresden)
Tel: x29712
George Dunlap
2011-Jan-28 11:13 UTC
Re: [Xen-devel] Hypervisor crash(!) on xl cpupool-numa-split
Hmm, strange... looks like it has something to do with the code which keeps
track of which vcpus are earning credits. You say this is done immediately
after boot, with no VMs running other than dom0?

What are the dom0_max_vcpus and dom0_mem settings required to make it work?

 -George

On Fri, Jan 28, 2011 at 6:47 AM, Juergen Gross
<juergen.gross@ts.fujitsu.com> wrote:
> On 01/28/11 00:18, Andre Przywara wrote:
>> [...]
>
> I've seen this once with an older cpupool version on a 24 processor
> machine.
> [...]
>
> Can the crash be reproduced easily?
>
>> root@dosorca:/data/images# xl cpupool-numa-split
>> (XEN) Xen BUG at sched_credit.c:990
>> [...]
>> (XEN) Reboot in five seconds...
Juergen Gross
2011-Jan-28 11:44 UTC
Re: [Xen-devel] Hypervisor crash(!) on xl cpupool-numa-split
On 01/28/11 12:07, Andre Przywara wrote:
> Juergen Gross wrote:
>> On 01/28/11 00:18, Andre Przywara wrote:
>>> [...]
>>
>> [...]
>>
>> Can the crash be reproduced easily?
> Yes.
> If I don't specify dom0_max_vcpus= and dom0_mem= on the Xen cmdline, I
> can reliably trigger the crash with xl cpupool-numa-split.
> Omitting only dom0_max_vcpus= does not suffice.

Do I understand correctly?
No crash with only dom0_max_vcpus= and no crash with only dom0_mem= ?

Could you try this patch?

diff -r b59f04eb8978 xen/common/schedule.c
--- a/xen/common/schedule.c     Fri Jan 21 18:06:23 2011 +0000
+++ b/xen/common/schedule.c     Fri Jan 28 12:42:46 2011 +0100
@@ -1301,7 +1301,9 @@ void schedule_cpu_switch(unsigned int cp

     idle = idle_vcpu[cpu];
     ppriv = SCHED_OP(new_ops, alloc_pdata, cpu);
+    BUG_ON(ppriv == NULL);
     vpriv = SCHED_OP(new_ops, alloc_vdata, idle, idle->domain->sched_priv);
+    BUG_ON(vpriv == NULL);

     pcpu_schedule_lock_irqsave(cpu, flags);


--
Juergen Gross                 Principal Developer Operating Systems
TSP ES&S SWE OS6              Telephone: +49 (0) 89 3222 2967
Fujitsu Technology Solutions  e-mail: juergen.gross@ts.fujitsu.com
Domagkstr. 28                 Internet: ts.fujitsu.com
D-80807 Muenchen              Company details: ts.fujitsu.com/imprint.html
Andre Przywara
2011-Jan-28 13:05 UTC
Re: [Xen-devel] Hypervisor crash(!) on xl cpupool-numa-split
George Dunlap wrote:
> Hmm, strange... looks like it has something to do with the code which
> keeps track of which vcpus are earning credits. You say this is done
> immediately after boot, with no VMs running other than dom0?

Right, after Dom0's prompt I just start xl cpupool-numa-split and the
machine crashes.

> What are the dom0_max_vcpus and dom0_mem settings required to make it work?

dom0_mem=8192M dom0_max_vcpus=6:  works
dom0_mem=8192M:                   works
dom0_max_vcpus=6:                 works
(no settings):                    crashes
dom0_mem=20480M dom0_max_vcpus=8: works

The machine has 8 nodes with 6 CPUs each; the nodes have alternating 16GB
and 8GB of memory (four 12-core (MCM, aka dual-node) Opterons with 96GB RAM
in total).

If I try to reproduce the actions of xl cpupool-numa-split via a shell
script, it also crashes, just before the creation of the last pool. I will
insert some instrumentation into the code to find the offending action.

Regards,
Andre.

> On Fri, Jan 28, 2011 at 6:47 AM, Juergen Gross
> <juergen.gross@ts.fujitsu.com> wrote:
>> [...]

--
Andre Przywara
AMD-OSRC (Dresden)
Tel: x29712
Andre Przywara
2011-Jan-28 13:14 UTC
Re: [Xen-devel] Hypervisor crash(!) on xl cpupool-numa-split
> Do I understand correctly?
> No crash with only dom0_max_vcpus= and no crash with only dom0_mem= ?

Yes, see my previous mail to George.

> Could you try this patch?

Ok, the crash dump is as follows:

(XEN) Xen BUG at sched_credit.c:384
(XEN) ----[ Xen-4.1.0-rc2-pre  x86_64  debug=y  Not tainted ]----
(XEN) CPU:    2
(XEN) RIP:    e008:[<ffff82c480117fa0>] csched_alloc_pdata+0x146/0x17f
(XEN) RFLAGS: 0000000000010093   CONTEXT: hypervisor
(XEN) rax: ffff830434322000   rbx: ffff830434418748   rcx: 0000000000000024
(XEN) rdx: ffff82c4802d3ec0   rsi: 0000000000000003   rdi: ffff8304343c9100
(XEN) rbp: ffff83043457fce8   rsp: ffff83043457fca8   r8:  0000000000000001
(XEN) r9:  ffff830434418748   r10: ffff82c48021a0a0   r11: 0000000000000286
(XEN) r12: 0000000000000024   r13: ffff83123a3b2b60   r14: ffff830434418730
(XEN) r15: 0000000000000024   cr0: 000000008005003b   cr4: 00000000000006f0
(XEN) cr3: 00000008061df000   cr2: ffff8817a21f87a0
(XEN) ds: 0000   es: 0000   fs: 0000   gs: 0000   ss: e010   cs: e008
(XEN) Xen stack trace from rsp=ffff83043457fca8:
(XEN)    ffff83043457fcb8 ffff83123a3b2b60 0000000000000286 0000000000000024
(XEN)    ffff830434418820 ffff83123a3b2a70 0000000000000024 ffff82c4802b0880
(XEN)    ffff83043457fd58 ffff82c48011fa63 ffff82f60102aa80 0000000000081554
(XEN)    ffff8300c7cfa000 0000000000000000 0000400000000000 ffff82c480248e00
(XEN)    0000000000000002 0000000000000024 ffff830434418820 0000000000305000
(XEN)    ffff82c4802550e4 ffff82c4802b0880 ffff83043457fd78 ffff82c48010188c
(XEN)    ffff83043457fe40 0000000000000024 ffff83043457fdb8 ffff82c480101b94
(XEN)    ffff83043457fdb8 ffff82c4801836f2 fffffffe00000286 ffff83043457ff18
(XEN)    0000000002170004 0000000000305000 ffff83043457fef8 ffff82c480125281
(XEN)    ffff83043457fdd8 0000000180153c9d 0000000000000000 ffff82c4801068f8
(XEN)    0000000000000296 ffff8300c7e0a1c8 aaaaaaaaaaaaaaaa 0000000000000000
(XEN)    ffff88007d1ac170 ffff88007d1ac170 ffff83043457fef8 ffff82c480113d8a
(XEN)    ffff83043457fe78 ffff83043457fe88 0000000800000012 0000000600000004
(XEN)    0000000000000000 ffffffff00000024 0000000000000000 00007fac2e0e5a00
(XEN)    0000000002170000 0000000000000000 0000000000000000 ffffffffffffffff
(XEN)    0000000000000000 0000000000000080 000000000000002f 0000000002170004
(XEN)    0000000002172004 0000000002174004 00007fff878f1c80 0000000000000033
(XEN)    ffff83043457fed8 ffff8300c7e0a000 00007fff878f1b30 0000000000305000
(XEN)    0000000000000003 0000000000000003 00007cfbcba800c7 ffff82c480207dd8
(XEN)    ffffffff8100946a 0000000000000023 0000000000000003 0000000000000003
(XEN) Xen call trace:
(XEN)    [<ffff82c480117fa0>] csched_alloc_pdata+0x146/0x17f
(XEN)    [<ffff82c48011fa63>] schedule_cpu_switch+0x75/0x1eb
(XEN)    [<ffff82c48010188c>] cpupool_assign_cpu_locked+0x44/0x8b
(XEN)    [<ffff82c480101b94>] cpupool_do_sysctl+0x1fb/0x461
(XEN)    [<ffff82c480125281>] do_sysctl+0x921/0xa30
(XEN)    [<ffff82c480207dd8>] syscall_enter+0xc8/0x122
(XEN)
(XEN)
(XEN) ****************************************
(XEN) Panic on CPU 2:
(XEN) Xen BUG at sched_credit.c:384
(XEN) ****************************************
(XEN)
(XEN) Reboot in five seconds...

Regards,
Andre.

> diff -r b59f04eb8978 xen/common/schedule.c
> [...]

--
Andre Przywara
AMD-OSRC (Dresden)
Tel: x29712
Juergen Gross
2011-Jan-31 07:04 UTC
Re: [Xen-devel] Hypervisor crash(!) on xl cpupool-numa-split
On 01/28/11 14:14, Andre Przywara wrote:
>> Do I understand correctly?
>> No crash with only dom0_max_vcpus= and no crash with only dom0_mem= ?
> Yes, see my previous mail to George.
>
>> Could you try this patch?
> Ok, the crash dump is as follows:

Hmm, is the new crash reproducible as well?
Seems not to be directly related to my diagnosis patch...

Currently I have no NUMA machine available. I tried to use the numa=fake=...
boot parameter, but this seems to fake only NUMA memory nodes; all cpus are
still in node 0:

(XEN) 'u' pressed -> dumping numa info (now-0x120:5D5E0203)
(XEN) idx0 -> NODE0 start->0 size->524288
(XEN) phys_to_nid(0000000000001000) -> 0 should be 0
(XEN) idx1 -> NODE1 start->524288 size->524288
(XEN) phys_to_nid(0000000080001000) -> 1 should be 1
(XEN) idx2 -> NODE2 start->1048576 size->524288
(XEN) phys_to_nid(0000000100001000) -> 2 should be 2
(XEN) idx3 -> NODE3 start->1572864 size->1835008
(XEN) phys_to_nid(0000000180001000) -> 3 should be 3
(XEN) CPU0 -> NODE0
(XEN) CPU1 -> NODE0
(XEN) CPU2 -> NODE0
(XEN) CPU3 -> NODE0
(XEN) Memory location of each domain:
(XEN) Domain 0 (total: 3003121):
(XEN)     Node 0: 433864
(XEN)     Node 1: 258522
(XEN)     Node 2: 514315
(XEN)     Node 3: 1796420

I suspect a problem with the __cpuinit stuff overwriting some node info.
Andre, could you check this? I hope to reproduce your problem on my machine.

> (XEN) Xen BUG at sched_credit.c:384
> [...]
> (XEN) Reboot in five seconds...


Juergen

--
Juergen Gross                 Principal Developer Operating Systems
TSP ES&S SWE OS6              Telephone: +49 (0) 89 3222 2967
Fujitsu Technology Solutions  e-mail: juergen.gross@ts.fujitsu.com
Domagkstr. 28                 Internet: ts.fujitsu.com
D-80807 Muenchen              Company details: ts.fujitsu.com/imprint.html
Andre Przywara
2011-Jan-31 14:59 UTC
Re: [Xen-devel] Hypervisor crash(!) on xl cpupool-numa-split
Juergen Gross wrote:
> On 01/28/11 14:14, Andre Przywara wrote:
>>> Do I understand correctly?
>>> No crash with only dom0_max_vcpus= and no crash with only dom0_mem= ?
>> Yes, see my previous mail to George.
>>
>>> Could you try this patch?
>> Ok, the crash dump is as follows:
>
> Hmm, is the new crash reproducible as well?
> Seems not to be directly related to my diagnosis patch...

Right, that was also my impression.

I seemed to get a bit further, though:
By accident I found that in c/s 22846 the issue is fixed; it works now
without crashing. I bisected it down to my own patch, which disables the
NODEID_MSR in Dom0. I could confirm this theory by a) applying this single
line (clear_bit(NODEID_MSR)) to 22799 and _not_ seeing it crash, and b)
removing this line from 22846 and seeing it crash.

So my theory is that Dom0 sees different nodes on its virtual CPUs via the
physical NodeID MSR, but this association can (and will) be changed at any
moment by the Xen scheduler. So Dom0 will build a bogus topology based upon
these values. As soon as all vCPUs of Dom0 are contained in one node (node
0; this is caused by the cpupool-numa-split call), the Xen scheduler somehow
hiccups.
So it seems to be a bad combination of the NodeID MSR (on newer AMD
platforms: sockets C32 and G34) and a NodeID-MSR-aware Dom0 (2.6.32.27).
Since this is a hypervisor crash, I assume that the bug is still there, only
the current tip makes it much less likely to be triggered.

Hope that helps, I will dig deeper now.

Regards,
Andre.

> Currently I have no NUMA machine available. I tried to use the numa=fake=...
> boot parameter, but this seems to fake only NUMA memory nodes; all cpus are
> still in node 0:
> [...]
>
> I suspect a problem with the __cpuinit stuff overwriting some node info.
> Andre, could you check this? I hope to reproduce your problem on my machine.
>
>> (XEN) Xen BUG at sched_credit.c:384
>> [...]
>> (XEN) Reboot in five seconds...
>
> Juergen

--
Andre Przywara
AMD-Operating System Research Center (OSRC), Dresden, Germany
George Dunlap
2011-Jan-31 15:28 UTC
Re: [Xen-devel] Hypervisor crash(!) on xl cpupool-numa-split
On Mon, Jan 31, 2011 at 2:59 PM, Andre Przywara <andre.przywara@amd.com> wrote:
> Right, that was also my impression.
>
> I seemed to get a bit further, though:
> By accident I found that in c/s 22846 the issue is fixed; it works now
> without crashing. I bisected it down to my own patch, which disables the
> NODEID_MSR in Dom0. I could confirm this theory by a) applying this single
> line (clear_bit(NODEID_MSR)) to 22799 and _not_ seeing it crash, and b)
> removing this line from 22846 and seeing it crash.
>
> So my theory is that Dom0 sees different nodes on its virtual CPUs via the
> physical NodeID MSR, but this association can (and will) be changed at any
> moment by the Xen scheduler. So Dom0 will build a bogus topology based upon
> these values. As soon as all vCPUs of Dom0 are contained in one node (node
> 0; this is caused by the cpupool-numa-split call), the Xen scheduler somehow
> hiccups.
> So it seems to be a bad combination of the NodeID MSR (on newer AMD
> platforms: sockets C32 and G34) and a NodeID-MSR-aware Dom0 (2.6.32.27).
> Since this is a hypervisor crash, I assume that the bug is still there, only
> the current tip makes it much less likely to be triggered.
>
> Hope that helps, I will dig deeper now.

Thanks. The crashes you're getting are in fact very strange. They have to
do with assumptions that the credit scheduler makes as part of its
accounting process. It would only make sense for those to be triggered if
a vcpu was moved from one pool to another pool without the proper
accounting being done. (Specifically, each vcpu is classified as either
"active" or "inactive", and each scheduler instance keeps track of the
total weight of all "active" vcpus. The BUGs you're tripping over are
saying that this invariant has been violated.) However, I've looked at the
cpupools vcpu-migrate code, and it looks like it does everything right.
So I'm a bit mystified. My only thought is that possibly a cpumask
somewhere wasn't getting set properly, such that a vcpu was being run on a
cpu from another pool.

Unfortunately I can't take a good look at this right now; hopefully I'll
be able to take a look next week.

Andre, if you were keen, you might go through the credit code and put in a
bunch of ASSERTs that the current pcpu is in the mask of the current vcpu,
and that the current vcpu is assigned to the pool of the current pcpu, and
so on.

 -George
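To make the invariant George describes concrete, here is a small stand-alone
toy model (not Xen code; all type and function names are made up for
illustration): each scheduler instance caches the summed weight of its
"active" vcpus, and moving a vcpu to another pool without running the
accounting hooks leaves that cache stale, which is exactly the condition the
credit scheduler's BUG_ON complains about.

#include <assert.h>
#include <stdio.h>

/* Toy model of the credit scheduler's accounting bookkeeping. */
struct toy_vcpu  { int weight; int active; int pool; };
struct toy_sched { int weight_total; };      /* one instance per cpupool */

static void acct_start(struct toy_sched *s, struct toy_vcpu *v)
{
    if ( !v->active ) { v->active = 1; s->weight_total += v->weight; }
}

static void acct_stop(struct toy_sched *s, struct toy_vcpu *v)
{
    if ( v->active ) { v->active = 0; s->weight_total -= v->weight; }
}

/* The invariant behind the BUG_ON: cached total == real sum for this pool. */
static void check(struct toy_sched *s, struct toy_vcpu *v, int n, int pool)
{
    int sum = 0, i;
    for ( i = 0; i < n; i++ )
        if ( v[i].active && v[i].pool == pool )
            sum += v[i].weight;
    assert(sum == s->weight_total);
}

int main(void)
{
    struct toy_sched pool0 = { 0 }, pool1 = { 0 };
    struct toy_vcpu v[2] = { { 256, 0, 0 }, { 256, 0, 0 } };

    acct_start(&pool0, &v[0]);
    acct_start(&pool0, &v[1]);
    check(&pool0, v, 2, 0);        /* fine: 512 == 512 */
    printf("bookkeeping consistent\n");

    v[1].pool = 1;                 /* "migrate" the vcpu to another pool ... */
    acct_start(&pool1, &v[1]);     /* ... but it is still marked active, so
                                      pool1 gains no weight and pool0 keeps
                                      256 too much */
    check(&pool0, v, 2, 0);        /* assertion fires, like the Xen BUG_ON */
    acct_stop(&pool1, &v[1]);      /* not reached */
    return 0;
}

The correct sequence would be acct_stop() on the old pool before
acct_start() on the new one; the assertion firing models a vcpu whose
accounting state was left behind in the old pool.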
Andre Przywara
2011-Feb-01 16:32 UTC
Re: [Xen-devel] Hypervisor crash(!) on xl cpupool-numa-split
Hi folks,

I asked Stephan Diestelhorst for help, and after I convinced him that
removing credit and making SEDF the default again is not an option, he
worked together with me on this ;-) Many thanks for that!
We haven't come to a final solution yet, but we could gather some debug
data. I will simply dump some data here, maybe somebody has got a clue. We
will work further on this tomorrow.

First I replaced the BUG_ON with some printks to get some insight:
(XEN) sdom->active_vcpu_count: 18
(XEN) sdom->weight: 256
(XEN) weight_left: 4096, weight_total: 4096
(XEN) credit_balance: 0, credit_xtra: 0, credit_cap: 0
(XEN) Xen BUG at sched_credit.c:591
(XEN) ----[ Xen-4.1.0-rc2-pre  x86_64  debug=y  Not tainted ]----

So that shows that the number of VCPUs is not up-to-date with the computed
weight sum; we have seen a difference of one or two VCPUs (in this case the
weight has been computed from 16 VCPUs). It also shows that the assertion
kicks in in the first iteration of the loop, where weight_left and
weight_total are still equal.

So I additionally instrumented alloc_pdata and free_pdata; the unprefixed
lines come from a shell script mimicking the functionality of
cpupool-numa-split.
------------
Removing CPUs from Pool 0
Creating new pool
Using config file "cpupool.test"
cpupool name:   Pool-node6
scheduler:      credit
number of cpus: 1
(XEN) adding CPU 36, now 1 CPUs
(XEN) removing CPU 36, remaining: 17
Populating new pool
(XEN) sdom->active_vcpu_count: 9
(XEN) sdom->weight: 256
(XEN) weight_left: 2048, weight_total: 2048
(XEN) credit_balance: 0, credit_xtra: 0, credit_cap: 0
(XEN) adding CPU 37, now 2 CPUs
(XEN) removing CPU 37, remaining: 16
(XEN) adding CPU 38, now 3 CPUs
(XEN) removing CPU 38, remaining: 15
(XEN) adding CPU 39, now 4 CPUs
(XEN) removing CPU 39, remaining: 14
(XEN) adding CPU 40, now 5 CPUs
(XEN) removing CPU 40, remaining: 13
(XEN) sdom->active_vcpu_count: 17
(XEN) sdom->weight: 256
(XEN) weight_left: 4096, weight_total: 4096
(XEN) credit_balance: 0, credit_xtra: 0, credit_cap: 0
(XEN) adding CPU 41, now 6 CPUs
(XEN) removing CPU 41, remaining: 12
...

Two things startled me:
1) There is quite some delay between the "Removing CPUs" message from the
script and the actual HV printk showing it's done; why is that not
synchronous? Looking at the code it shows that __csched_vcpu_acct_start()
is eventually triggered by a timer; shouldn't that be triggered
synchronously by add/removal events?
2) It clearly shows that each CPU gets added to the new pool _before_ it
gets removed from the old one (Pool-0); isn't that violating the "only one
pool per CPU" rule? Even if that is fine for a short period of time, maybe
the timer kicks in in this very moment, resulting in violated invariants?

Yours confused,
Andre.

George Dunlap wrote:
> [...]
>
> Andre, if you were keen, you might go through the credit code and put in
> a bunch of ASSERTs that the current pcpu is in the mask of the current
> vcpu, and that the current vcpu is assigned to the pool of the current
> pcpu, and so on.
>
>  -George

--
Andre Przywara
AMD-Operating System Research Center (OSRC), Dresden, Germany
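For reference, the debug output quoted above is consistent with
instrumentation of roughly the following shape. This is only a hedged
reconstruction, not Andre's actual patch: the printed field names come
straight from the log lines, but the exact BUG_ON condition and the
prv->ncpus counter in the pdata hooks are assumptions.

/* Hedged reconstruction of the instrumentation described above.
 * In csched_acct(), roughly where the tripping BUG_ON at
 * sched_credit.c:591 sits; the printed values suggest a check of the
 * form "a domain's summed active weight must not exceed weight_left". */
if ( (sdom->weight * sdom->active_vcpu_count) > weight_left )
{
    printk("sdom->active_vcpu_count: %d\n", sdom->active_vcpu_count);
    printk("sdom->weight: %d\n", sdom->weight);
    printk("weight_left: %u, weight_total: %u\n", weight_left, weight_total);
    printk("credit_balance: %d, credit_xtra: %d, credit_cap: %d\n",
           credit_balance, credit_xtra, credit_cap);
    BUG();
}

/* ... and in csched_alloc_pdata() / csched_free_pdata(), a per-pool cpu
 * counter dump (prv->ncpus is assumed here as that counter): */
printk("adding CPU %d, now %d CPUs\n", cpu, prv->ncpus);
printk("removing CPU %d, remaining: %d\n", cpu, prv->ncpus);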
Juergen Gross
2011-Feb-02 06:27 UTC
Re: [Xen-devel] Hypervisor crash(!) on xl cpupool-numa-split
On 02/01/11 17:32, Andre Przywara wrote:
> Hi folks,
>
> [...]
>
> Two things startled me:
> 1) There is quite some delay between the "Removing CPUs" message from the
> script and the actual HV printk showing it's done; why is that not
> synchronous?

Removing cpus from Pool-0 requires no switching of the scheduler, so you
see no calls of alloc/free_pdata here.

> Looking at the code it shows that __csched_vcpu_acct_start() is
> eventually triggered by a timer; shouldn't that be triggered
> synchronously by add/removal events?

The vcpus are not moved explicitly, they are migrated by the normal
scheduler mechanisms, same as for vcpu-pin.

> 2) It clearly shows that each CPU gets added to the new pool _before_ it
> gets removed from the old one (Pool-0); isn't that violating the "only
> one pool per CPU" rule? Even if that is fine for a short period of time,
> maybe the timer kicks in in this very moment, resulting in violated
> invariants?

The sequence you are seeing seems to be okay. The alloc_pdata for the new
pool is called before the free_pdata for the old pool.

And the timer is not relevant, as only the idle vcpu should be running on
the moving cpu and the accounting stuff is never called during idle.


Juergen

--
Juergen Gross                 Principal Developer Operating Systems
TSP ES&S SWE OS6              Telephone: +49 (0) 89 3222 2967
Fujitsu Technology Solutions  e-mail: juergen.gross@ts.fujitsu.com
Domagkstr. 28                 Internet: ts.fujitsu.com
D-80807 Muenchen              Company details: ts.fujitsu.com/imprint.html
Juergen Gross
2011-Feb-02 08:49 UTC
Re: [Xen-devel] Hypervisor crash(!) on xl cpupool-numa-split
On 02/02/11 07:27, Juergen Gross wrote:> On 02/01/11 17:32, Andre Przywara wrote: >> Hi folks, >> >> I asked Stephan Diestelhorst for help and after I convinced him that >> removing credit and making SEDF the default again is not an option he >> worked together with me on that ;-) Many thanks for that! >> We haven''t come to a final solution but could gather some debug data. >> I will simply dump some data here, maybe somebody has got a clue. We >> will work further on this tomorrow. >> >> First I replaced the BUG_ON with some printks to get some insight: >> (XEN) sdom->active_vcpu_count: 18 >> (XEN) sdom->weight: 256 >> (XEN) weight_left: 4096, weight_total: 4096 >> (XEN) credit_balance: 0, credit_xtra: 0, credit_cap: 0 >> (XEN) Xen BUG at sched_credit.c:591 >> (XEN) ----[ Xen-4.1.0-rc2-pre x86_64 debug=y Not tainted ]---- >> >> So that one shows that the number of VCPUs is not up-to-date with the >> computed weight sum, we have seen a difference of one or two VCPUs (in >> this case here the weight has been computed from 16 VCPUs). Also it >> shows that the assertion kicks in in the first iteration of the loop, >> where weight_left and weight_total are still equal. >> >> So I additionally instrumented alloc_pdata and free_pdata, the >> unprefixed lines come from a shell script mimicking the functionality of >> cpupool-numa-split. >> ------------ >> Removing CPUs from Pool 0 >> Creating new pool >> Using config file "cpupool.test" >> cpupool name: Pool-node6 >> scheduler: credit >> number of cpus: 1 >> (XEN) adding CPU 36, now 1 CPUs >> (XEN) removing CPU 36, remaining: 17 >> Populating new pool >> (XEN) sdom->active_vcpu_count: 9 >> (XEN) sdom->weight: 256 >> (XEN) weight_left: 2048, weight_total: 2048 >> (XEN) credit_balance: 0, credit_xtra: 0, credit_cap: 0 >> (XEN) adding CPU 37, now 2 CPUs >> (XEN) removing CPU 37, remaining: 16 >> (XEN) adding CPU 38, now 3 CPUs >> (XEN) removing CPU 38, remaining: 15 >> (XEN) adding CPU 39, now 4 CPUs >> (XEN) removing CPU 39, remaining: 14 >> (XEN) adding CPU 40, now 5 CPUs >> (XEN) removing CPU 40, remaining: 13 >> (XEN) sdom->active_vcpu_count: 17 >> (XEN) sdom->weight: 256 >> (XEN) weight_left: 4096, weight_total: 4096 >> (XEN) credit_balance: 0, credit_xtra: 0, credit_cap: 0 >> (XEN) adding CPU 41, now 6 CPUs >> (XEN) removing CPU 41, remaining: 12 >> ... >> Two thing startled me: >> 1) There is quite some between the "Removing CPUs" message from the >> script and the actual HV printk showing it''s done, why is that not >> synchronous? > > Removing cpus from Pool-0 requires no switching of the scheduler, so you > see no calls of alloc/free_pdata here. > > > Looking at the code it shows that >> __csched_vcpu_acct_start() is eventually triggered by a timer, shouldn''t >> that be triggered synchronously by add/removal events? > > The vcpus are not moved explicitly, they are migrated by the normal > scheduler mechanisms, same as for vcpu-pin. > >> 2) It clearly shows that each CPU gets added to the new pool _before_ it >> gets removed from the old one (Pool-0), isn''t that violating the "only >> one pool per CPU" rule? Even it that is fine for a short period of time, >> maybe the timer kicks in in this very moment resulting in violated >> invariants? > > The sequence you are seeing seems to be okay. The alloc_pdata for the > new pool > is called before the free_pdata for the old pool. 
>
> And the timer is not relevant, as only the idle vcpu should be running
> on the moving cpu and the accounting stuff is never called during idle.

Uhh, this could be wrong!
The normal ticker doesn't call accounting in idle and it is stopped during
cpu move. The master_ticker is handled wrong, perhaps. I'll check this and
prepare a patch if necessary.


Juergen

--
Juergen Gross                 Principal Developer Operating Systems
TSP ES&S SWE OS6                       Telephone: +49 (0) 89 3222 2967
Fujitsu Technology Solutions           e-mail: juergen.gross@ts.fujitsu.com
Domagkstr. 28                          Internet: ts.fujitsu.com
D-80807 Muenchen                 Company details: ts.fujitsu.com/imprint.html

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel
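To make the numbers in the quoted debug output concrete: with sdom->weight = 256 and sdom->active_vcpu_count = 18 the domain claims 18 * 256 = 4608, while weight_total (prv->weight) is only 4096 = 16 * 256, so the pool-wide sum lags behind by two vcpus. The stand-alone program below (plain C, not Xen code; the variable names merely mirror the fields printed above) reproduces that arithmetic and trips the same kind of consistency check that fires at sched_credit.c:990.

/* toy_weight.c: gcc -o toy_weight toy_weight.c && ./toy_weight */
#include <assert.h>
#include <stdio.h>

int main(void)
{
    unsigned int sdom_weight       = 256;   /* sdom->weight            */
    unsigned int active_vcpu_count = 18;    /* sdom->active_vcpu_count */
    unsigned int prv_weight        = 4096;  /* weight_total in the log */

    printf("domain claims %u, pool-wide sum is %u\n",
           sdom_weight * active_vcpu_count, prv_weight);

    /* roughly what the hypervisor objects to: the pool-wide weight sum
     * must cover every active vcpu of every domain in the pool */
    assert(prv_weight >= sdom_weight * active_vcpu_count);
    return 0;
}

The assert() aborts (4096 < 4608), mirroring the BUG(); the open question in the rest of the thread is how prv->weight and active_vcpu_count get out of step in the first place.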
Juergen Gross
2011-Feb-02 10:05 UTC
Re: [Xen-devel] Hypervisor crash(!) on xl cpupool-numa-split
Hi Andre, could you try the attached patch? It should verify if your problems are due to the master ticker kicking in at a time when the cpu is already gone from the cpupool. I''m not sure if the patch is complete - Disabling the master ticker in csched_tick_suspend might lead to problems with cstates. The functionality is different, at least. George, do you think this is correct? Juergen On 02/02/11 09:49, Juergen Gross wrote:> On 02/02/11 07:27, Juergen Gross wrote: >> On 02/01/11 17:32, Andre Przywara wrote: >>> Hi folks, >>> >>> I asked Stephan Diestelhorst for help and after I convinced him that >>> removing credit and making SEDF the default again is not an option he >>> worked together with me on that ;-) Many thanks for that! >>> We haven''t come to a final solution but could gather some debug data. >>> I will simply dump some data here, maybe somebody has got a clue. We >>> will work further on this tomorrow. >>> >>> First I replaced the BUG_ON with some printks to get some insight: >>> (XEN) sdom->active_vcpu_count: 18 >>> (XEN) sdom->weight: 256 >>> (XEN) weight_left: 4096, weight_total: 4096 >>> (XEN) credit_balance: 0, credit_xtra: 0, credit_cap: 0 >>> (XEN) Xen BUG at sched_credit.c:591 >>> (XEN) ----[ Xen-4.1.0-rc2-pre x86_64 debug=y Not tainted ]---- >>> >>> So that one shows that the number of VCPUs is not up-to-date with the >>> computed weight sum, we have seen a difference of one or two VCPUs (in >>> this case here the weight has been computed from 16 VCPUs). Also it >>> shows that the assertion kicks in in the first iteration of the loop, >>> where weight_left and weight_total are still equal. >>> >>> So I additionally instrumented alloc_pdata and free_pdata, the >>> unprefixed lines come from a shell script mimicking the functionality of >>> cpupool-numa-split. >>> ------------ >>> Removing CPUs from Pool 0 >>> Creating new pool >>> Using config file "cpupool.test" >>> cpupool name: Pool-node6 >>> scheduler: credit >>> number of cpus: 1 >>> (XEN) adding CPU 36, now 1 CPUs >>> (XEN) removing CPU 36, remaining: 17 >>> Populating new pool >>> (XEN) sdom->active_vcpu_count: 9 >>> (XEN) sdom->weight: 256 >>> (XEN) weight_left: 2048, weight_total: 2048 >>> (XEN) credit_balance: 0, credit_xtra: 0, credit_cap: 0 >>> (XEN) adding CPU 37, now 2 CPUs >>> (XEN) removing CPU 37, remaining: 16 >>> (XEN) adding CPU 38, now 3 CPUs >>> (XEN) removing CPU 38, remaining: 15 >>> (XEN) adding CPU 39, now 4 CPUs >>> (XEN) removing CPU 39, remaining: 14 >>> (XEN) adding CPU 40, now 5 CPUs >>> (XEN) removing CPU 40, remaining: 13 >>> (XEN) sdom->active_vcpu_count: 17 >>> (XEN) sdom->weight: 256 >>> (XEN) weight_left: 4096, weight_total: 4096 >>> (XEN) credit_balance: 0, credit_xtra: 0, credit_cap: 0 >>> (XEN) adding CPU 41, now 6 CPUs >>> (XEN) removing CPU 41, remaining: 12 >>> ... >>> Two thing startled me: >>> 1) There is quite some between the "Removing CPUs" message from the >>> script and the actual HV printk showing it''s done, why is that not >>> synchronous? >> >> Removing cpus from Pool-0 requires no switching of the scheduler, so you >> see no calls of alloc/free_pdata here. >> >> > Looking at the code it shows that >>> __csched_vcpu_acct_start() is eventually triggered by a timer, shouldn''t >>> that be triggered synchronously by add/removal events? >> >> The vcpus are not moved explicitly, they are migrated by the normal >> scheduler mechanisms, same as for vcpu-pin. 
>> >>> 2) It clearly shows that each CPU gets added to the new pool _before_ it >>> gets removed from the old one (Pool-0), isn''t that violating the "only >>> one pool per CPU" rule? Even it that is fine for a short period of time, >>> maybe the timer kicks in in this very moment resulting in violated >>> invariants? >> >> The sequence you are seeing seems to be okay. The alloc_pdata for the >> new pool >> is called before the free_pdata for the old pool. >> >> And the timer is not relevant, as only the idle vcpu should be running >> on the >> moving cpu and the accounting stuff is never called during idle. > > Uhh, this could be wrong! > The normal ticker doesn''t call accounting in idle and it is stopped during > cpu move. The master_ticker is handled wrong, perhaps. I''ll check this and > prepare a patch if necessary. > > > Juergen >-- Juergen Gross Principal Developer Operating Systems TSP ES&S SWE OS6 Telephone: +49 (0) 89 3222 2967 Fujitsu Technology Solutions e-mail: juergen.gross@ts.fujitsu.com Domagkstr. 28 Internet: ts.fujitsu.com D-80807 Muenchen Company details: ts.fujitsu.com/imprint.html _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Andre Przywara
2011-Feb-02 10:59 UTC
Re: [Xen-devel] Hypervisor crash(!) on xl cpupool-numa-split
Juergen Gross wrote:> Hi Andre, > > could you try the attached patch? > It should verify if your problems are due to the master ticker > kicking in at a time when the cpu is already gone from the cpupool.That''s what we found also yesterday. If the timer routine triggers before the timer is stopped but is actually _running_ afterwards, this could lead to problems. Anyway, the hypervisor still crashes, now at a different BUG_ON(): /* Start off idling... */ BUG_ON(!is_idle_vcpu(per_cpu(schedule_data, cpu).curr)); cpu_set(cpu, prv->idlers); The complete crash dump was this: (XEN) Xen BUG at sched_credit.c:389 (XEN) ----[ Xen-4.1.0-rc2-pre x86_64 debug=y Not tainted ]---- (XEN) CPU: 3 (XEN) RIP: e008:[<ffff82c480118020>] csched_alloc_pdata+0x146/0x197 (XEN) RFLAGS: 0000000000010093 CONTEXT: hypervisor (XEN) rax: ffff830434322000 rbx: ffff830434492478 rcx: 0000000000000018 (XEN) rdx: ffff82c4802d3ec0 rsi: 0000000000000006 rdi: ffff83043445e100 (XEN) rbp: ffff83043456fce8 rsp: ffff83043456fca8 r8: 00000000deadbeef (XEN) r9: ffff830434492478 r10: ffff82c48021a1c0 r11: 0000000000000286 (XEN) r12: 0000000000000018 r13: ffff830a3c70c780 r14: ffff830434492460 (XEN) r15: 0000000000000018 cr0: 000000008005003b cr4: 00000000000006f0 (XEN) cr3: 0000000805bac000 cr2: 00007fbbdaf71116 (XEN) ds: 0000 es: 0000 fs: 0000 gs: 0000 ss: e010 cs: e008 (XEN) Xen stack trace from rsp=ffff83043456fca8: (XEN) ffff83043456fcb8 ffff830a3c70c780 0000000000000286 0000000000000018 (XEN) ffff830434492550 ffff830a3c70c690 0000000000000018 ffff82c4802b0880 (XEN) ffff83043456fd58 ffff82c48011fbb3 ffff82f601020900 0000000000081048 (XEN) ffff8300c7e42000 0000000000000000 0000800000000000 ffff82c480249000 (XEN) 0000000000000002 0000000000000018 ffff830434492550 0000000000305000 (XEN) ffff82c4802550e4 ffff82c4802b0880 ffff83043456fd78 ffff82c48010188c (XEN) ffff83043456fe40 0000000000000018 ffff83043456fdb8 ffff82c480101b94 (XEN) ffff83043456fdb8 ffff82c48018380a fffffffe00000286 ffff83043456ff18 (XEN) 0000000001669004 0000000000305000 ffff83043456fef8 ffff82c4801253c1 (XEN) ffff83043456fde8 ffff8300c7ac0000 0000000000000000 0000000000000246 (XEN) ffff83043456fe18 ffff82c480106c7f ffff830434577100 ffff8300c7ac0000 (XEN) ffff83043456fe28 ffff82c480125de4 0000000000000003 ffff82c4802d3f80 (XEN) ffff83043456fe78 0000000000000282 0000000800000012 0000000400000004 (XEN) 0000000000000000 ffffffff00000018 0000000000000000 00007f7e6a549a00 (XEN) 0000000001669000 0000000000000000 0000000000000000 ffffffffffffffff (XEN) 0000000000000000 0000000000000080 000000000000002f 0000000001669004 (XEN) 000000000166b004 000000000166d004 00007fffa59ff250 0000000000000033 (XEN) ffff83043456fed8 ffff8300c7ac0000 00007fffa59ff100 0000000000305000 (XEN) 0000000000000003 0000000000000003 00007cfbcba900c7 ffff82c480207ee8 (XEN) ffffffff8100946a 0000000000000023 0000000000000003 0000000000000003 (XEN) Xen call trace: (XEN) [<ffff82c480118020>] csched_alloc_pdata+0x146/0x197 (XEN) [<ffff82c48011fbb3>] schedule_cpu_switch+0x75/0x1cd (XEN) [<ffff82c48010188c>] cpupool_assign_cpu_locked+0x44/0x8b (XEN) [<ffff82c480101b94>] cpupool_do_sysctl+0x1fb/0x461 (XEN) [<ffff82c4801253c1>] do_sysctl+0x921/0xa30 (XEN) [<ffff82c480207ee8>] syscall_enter+0xc8/0x122 (XEN) (XEN) (XEN) **************************************** (XEN) Panic on CPU 3: (XEN) Xen BUG at sched_credit.c:389 (XEN) **************************************** (XEN) (XEN) Reboot in five seconds... 
Regards, Andre.> > I''m not sure if the patch is complete - Disabling the master ticker > in csched_tick_suspend might lead to problems with cstates. The > functionality is different, at least. > > George, do you think this is correct? > > > Juergen > > On 02/02/11 09:49, Juergen Gross wrote: >> On 02/02/11 07:27, Juergen Gross wrote: >>> On 02/01/11 17:32, Andre Przywara wrote: >>>> Hi folks, >>>> >>>> I asked Stephan Diestelhorst for help and after I convinced him that >>>> removing credit and making SEDF the default again is not an option he >>>> worked together with me on that ;-) Many thanks for that! >>>> We haven''t come to a final solution but could gather some debug data. >>>> I will simply dump some data here, maybe somebody has got a clue. We >>>> will work further on this tomorrow. >>>> >>>> First I replaced the BUG_ON with some printks to get some insight: >>>> (XEN) sdom->active_vcpu_count: 18 >>>> (XEN) sdom->weight: 256 >>>> (XEN) weight_left: 4096, weight_total: 4096 >>>> (XEN) credit_balance: 0, credit_xtra: 0, credit_cap: 0 >>>> (XEN) Xen BUG at sched_credit.c:591 >>>> (XEN) ----[ Xen-4.1.0-rc2-pre x86_64 debug=y Not tainted ]---- >>>> >>>> So that one shows that the number of VCPUs is not up-to-date with the >>>> computed weight sum, we have seen a difference of one or two VCPUs (in >>>> this case here the weight has been computed from 16 VCPUs). Also it >>>> shows that the assertion kicks in in the first iteration of the loop, >>>> where weight_left and weight_total are still equal. >>>> >>>> So I additionally instrumented alloc_pdata and free_pdata, the >>>> unprefixed lines come from a shell script mimicking the functionality of >>>> cpupool-numa-split. >>>> ------------ >>>> Removing CPUs from Pool 0 >>>> Creating new pool >>>> Using config file "cpupool.test" >>>> cpupool name: Pool-node6 >>>> scheduler: credit >>>> number of cpus: 1 >>>> (XEN) adding CPU 36, now 1 CPUs >>>> (XEN) removing CPU 36, remaining: 17 >>>> Populating new pool >>>> (XEN) sdom->active_vcpu_count: 9 >>>> (XEN) sdom->weight: 256 >>>> (XEN) weight_left: 2048, weight_total: 2048 >>>> (XEN) credit_balance: 0, credit_xtra: 0, credit_cap: 0 >>>> (XEN) adding CPU 37, now 2 CPUs >>>> (XEN) removing CPU 37, remaining: 16 >>>> (XEN) adding CPU 38, now 3 CPUs >>>> (XEN) removing CPU 38, remaining: 15 >>>> (XEN) adding CPU 39, now 4 CPUs >>>> (XEN) removing CPU 39, remaining: 14 >>>> (XEN) adding CPU 40, now 5 CPUs >>>> (XEN) removing CPU 40, remaining: 13 >>>> (XEN) sdom->active_vcpu_count: 17 >>>> (XEN) sdom->weight: 256 >>>> (XEN) weight_left: 4096, weight_total: 4096 >>>> (XEN) credit_balance: 0, credit_xtra: 0, credit_cap: 0 >>>> (XEN) adding CPU 41, now 6 CPUs >>>> (XEN) removing CPU 41, remaining: 12 >>>> ... >>>> Two thing startled me: >>>> 1) There is quite some between the "Removing CPUs" message from the >>>> script and the actual HV printk showing it''s done, why is that not >>>> synchronous? >>> Removing cpus from Pool-0 requires no switching of the scheduler, so you >>> see no calls of alloc/free_pdata here. >>> >>>> Looking at the code it shows that >>>> __csched_vcpu_acct_start() is eventually triggered by a timer, shouldn''t >>>> that be triggered synchronously by add/removal events? >>> The vcpus are not moved explicitly, they are migrated by the normal >>> scheduler mechanisms, same as for vcpu-pin. >>> >>>> 2) It clearly shows that each CPU gets added to the new pool _before_ it >>>> gets removed from the old one (Pool-0), isn''t that violating the "only >>>> one pool per CPU" rule? 
Even it that is fine for a short period of time, >>>> maybe the timer kicks in in this very moment resulting in violated >>>> invariants? >>> The sequence you are seeing seems to be okay. The alloc_pdata for the >>> new pool >>> is called before the free_pdata for the old pool. >>> >>> And the timer is not relevant, as only the idle vcpu should be running >>> on the >>> moving cpu and the accounting stuff is never called during idle. >> Uhh, this could be wrong! >> The normal ticker doesn''t call accounting in idle and it is stopped during >> cpu move. The master_ticker is handled wrong, perhaps. I''ll check this and >> prepare a patch if necessary. >> >> >> Juergen >> > >-- Andre Przywara AMD-Operating System Research Center (OSRC), Dresden, Germany _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Stephan Diestelhorst
2011-Feb-02 14:39 UTC
Re: [Xen-devel] Hypervisor crash(!) on xl cpupool-numa-split
Hi folks,
long time no see. :-)

On Tuesday 01 February 2011 17:32:25 Andre Przywara wrote:
> I asked Stephan Diestelhorst for help and after I convinced him that
> removing credit and making SEDF the default again is not an option he
> worked together with me on that ;-) Many thanks for that!
> We haven't come to a final solution but could gather some debug data.
> I will simply dump some data here, maybe somebody has got a clue. We
> will work further on this tomorrow.

Andre and I have been looking through this further, in particular sanity
checking the invariant

  prv->weight >= sdom->weight * sdom->active_vcpu_count

each time someone tweaks the active vcpu count. This happens only in
__csched_vcpu_acct_start and __csched_vcpu_acct_stop_locked. We managed
to observe the broken invariant when splitting cpupools.

We have the following theory of what happens:
* some vcpus of a particular domain are currently in the process of
  being moved to the new pool
* some are still left on the old pool (vcpus_old) and some are already
  in the new pool (vcpus_new)
* we now have vcpus_old->sdom = vcpus_new->sdom and following from this
  * vcpus_old->sdom->weight = vcpus_new->sdom->weight
  * vcpus_old->sdom->active_vcpu_count = vcpus_new->sdom->active_vcpu_count
* active_vcpu_count thus does not represent the separation of the
  actual vcpus (may be the sum, only the old or new ones, does not matter)
* however, sched_old != sched_new, and thus
  * sched_old->prv != sched_new->prv
  * sched_old->prv->weight != sched_new->prv->weight
* the prv->weight field hence sees the incremental move of VCPUs
  (through modifications in *acct_start and *acct_stop_locked)
* if at any point in this half-way migration the scheduler wants to run
  csched_acct, it erroneously checks the wrong active_vcpu_count

Workarounds / fixes (none tried):
* disable scheduler accounting while half-way migrating a domain
  (dom->pool_migrating flag and then checking in csched_acct)
* temporarily split the sdom structures while migrating to account for
  the transient split of vcpus
* synchronously disable all vcpus, migrate and then re-enable

Caveats:
* prv->lock does not guarantee mutual exclusion between (same)
  schedulers of different pools

<rant>
The general locking policy vs the comment situation is a nightmare.
I know that we have some advanced data-structure folks here, but
intuitively reasoning about when specific things are atomic and
mutually excluded is a pain in the scheduler / cpupool code, see the
issue with the separate prv->locks above.

E.g. cpupool_unassign_cpu and cpupool_unassign_cpu_helper interplay:
* cpupool_unassign_cpu unlocks cpupool_lock
* sets up the continuation calling cpupool_unassign_cpu_helper
* cpupool_unassign_cpu_helper locks cpupool_lock
* while intuitively, one would think that both should see a consistent
  snapshot and hence freeing the lock in the middle is a bad idea
* also communicating continuation-local state through global variables
  mandates that only a single global continuation can be pending
* reading cpu outside of the lock protection in
  cpupool_unassign_cpu_helper also smells
</rant>

Despite the rant, it is amazing to see the ability to move running
things around through this remote continuation trick! In my (ancient)
balancer experiments I added hypervisor-threads just for side-stepping
this issue.

Stephan
--
Stephan Diestelhorst, AMD Operating System Research Center
stephan.diestelhorst@amd.com
Tel.
+49 (0)351 448 356 719 Advanced Micro Devices GmbH Einsteinring 24 85609 Aschheim Germany Geschaeftsfuehrer: Alberto Bozzo u. Andrew Bowd Sitz: Dornach, Gemeinde Aschheim, Landkreis Muenchen Registergericht Muenchen, HRB Nr. 43632, WEEE-Reg-Nr: DE 12919551 _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
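A self-contained toy model of this theory (plain C, not Xen code; the type and field names are only borrowed from sched_credit.c, and the helpers merely mimic what __csched_vcpu_acct_start / __csched_vcpu_acct_stop_locked do to the bookkeeping): one per-domain structure shared by both pools, one private structure per pool, and an accounting check run half-way through the move.

/* toy_split.c: gcc -o toy_split toy_split.c && ./toy_split */
#include <stdio.h>

struct csched_dom {                  /* per domain, SHARED across pools   */
    int weight;
    int active_vcpu_count;
};

struct csched_private {              /* per scheduler instance, i.e. pool */
    int weight;                      /* sum over the vcpus it accounted   */
};

/* what acct_start/acct_stop do to the bookkeeping, reduced to two fields */
static void acct_start(struct csched_private *prv, struct csched_dom *sdom)
{
    sdom->active_vcpu_count++;
    prv->weight += sdom->weight;
}

static void acct_stop(struct csched_private *prv, struct csched_dom *sdom)
{
    sdom->active_vcpu_count--;
    prv->weight -= sdom->weight;
}

/* the consistency check the accounting pass relies on */
static void acct_check(const char *pool, struct csched_private *prv,
                       struct csched_dom *sdom)
{
    int claimed = sdom->weight * sdom->active_vcpu_count;

    printf("%s: prv->weight=%4d, sdom claims %4d -> %s\n", pool,
           prv->weight, claimed,
           prv->weight >= claimed ? "ok" : "BUG_ON would fire");
}

int main(void)
{
    struct csched_dom dom         = { .weight = 256 };
    struct csched_private prv_old = { 0 };
    struct csched_private prv_new = { 0 };
    int i;

    for (i = 0; i < 4; i++)          /* four active vcpus, all in the old pool */
        acct_start(&prv_old, &dom);

    for (i = 0; i < 2; i++) {        /* move two of them to the new pool */
        acct_stop(&prv_old, &dom);
        acct_start(&prv_new, &dom);
    }

    acct_check("old pool", &prv_old, &dom);   /* 512 vs. 4 * 256 = 1024 */
    acct_check("new pool", &prv_new, &dom);   /* 512 vs. 4 * 256 = 1024 */
    return 0;
}

Half-way through, each pool's prv->weight only covers the vcpus it accounted itself, while the shared active_vcpu_count still reports all of them, so the check fails on either side; that is exactly the window in which csched_acct's BUG_ON can fire.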
Juergen Gross
2011-Feb-02 15:14 UTC
Re: [Xen-devel] Hypervisor crash(!) on xl cpupool-numa-split
On 02/02/11 15:39, Stephan Diestelhorst wrote:> Hi folks, > long time no see. :-) > > On Tuesday 01 February 2011 17:32:25 Andre Przywara wrote: >> I asked Stephan Diestelhorst for help and after I convinced him that >> removing credit and making SEDF the default again is not an option he >> worked together with me on that ;-) Many thanks for that! >> We haven''t come to a final solution but could gather some debug data. >> I will simply dump some data here, maybe somebody has got a clue. We >> will work further on this tomorrow. > > Andre and I have been looking through this further, in particular sanity > checking the invariant > > prv->weight>= sdom->weight * sdom->active_vcpu_count > > each time someone tweaks the active vcpu count. This happens only in > __csched_vcpu_acct_start and __csched_vcpu_acct_stop_locked. We managed > to observe the broken invariant when splitting cpupoools. > > We have the following theory of what happens: > * some vcpus of a particular domain are currently in the process of > being moved to the new poolThe only _vcpus_ to be moved between pools are the idle vcpus. And those never contribute to accounting in credit scheduler. We are moving _pcpus_ only (well, moving a domain between pools actually moves vcpus as well, but then the domain is paused). On the pcpu to be moved the idle vcpu should be running. Obviously you have found a scenario where this isn''t true. I have no idea how this could happen, as other then idle vcpus are taken into account for scheduling only if the pcpu is valid in the cpupool. And the pcpu is set valid after the BUG_ON you have triggered in your tests.> > * some are still left on the old pool (vcpus_old) and some are already > in the new pool (vcpus_new) > > * we now have vcpus_old->sdom = vcpus_new->sdom and following from this > * vcpus_old->sdom->weight = vcpus_new->sdom->weight > * vcpus_old->sdom->active_vcpu_count = vcpus_new->sdom->active_vcpu_count > > * active_vcpu_count thus does not represent the separation of the > actual vpcus (may be the sum, only the old or new ones, does not > matter) > > * however, sched_old != sched_new, and thus > * sched_old->prv != sched_new->prv > * sched_old->prv->weight != sched_new->prv->weight > > * the prv->weight field hence sees the incremental move of VCPUs > (through modifications in *acct_start and *acct_stop_locked) > > * if at any point in this half-way migration, the scheduler wants to > csched_acct, it erroneously checks the wrong active_vcpu_count > > Workarounds / fixes (none tried): > * disable scheduler accounting while half-way migrating a domain > (dom->pool_migrating flag and then checking in csched_acct) > * temporarily split the sdom structures while migrating to account for > transient split of vcpus > * synchronously disable all vcpus, migrate and then re-enable > > Caveats: > * prv->lock does not guarantee mutual exclusion between (same) > schedulers of different pools > > <rant> > The general locking policy vs the comment situation is a nightmare. > I know that we have some advanced data-structure folks here, but > intuitively reasoning about when specific things are atomic and > mutually excluded is a pain in the scheduler / cpupool code, see the > issue with the separate prv->locks above. > > E.g. 
cpupool_unassign_cpu and cpupool_unassign_cpu_helper interplay: > * cpupool_unassign_cpu unlocks cpupool_lock > * sets up the continuation calling cpupool_unassign_cpu_helper > * cpupool_unassign_cpu_helper locks cpupool_lock > * while intuitively, one would think that both should see a consistent > snapshot and hence freeing the lock in the middle is a bad idea > * also communicating continuation-local state through global variables > mandates that only a single global continuation can be pending > > * reading cpu outside of the lock protection in > cpupool_unassign_cpu_helper also smells > </rant> > > Despite the rant, it is amazing to see the ability to move running > things around through this remote continuation trick! In my (ancient) > balancer experiments I added hypervisor-threads just for side- > stepping this issue..I think the easiest way to solve the problem would be to move the cpu to the new pool in a tasklet. This is possible now, because tasklets are always executed in the idle vcpus. OTOH I''d like to understand what is wrong with my current approach... Juergen -- Juergen Gross Principal Developer Operating Systems TSP ES&S SWE OS6 Telephone: +49 (0) 89 3222 2967 Fujitsu Technology Solutions e-mail: juergen.gross@ts.fujitsu.com Domagkstr. 28 Internet: ts.fujitsu.com D-80807 Muenchen Company details: ts.fujitsu.com/imprint.html _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Stephan Diestelhorst
2011-Feb-02 16:01 UTC
Re: [Xen-devel] Hypervisor crash(!) on xl cpupool-numa-split
On Wednesday 02 February 2011 16:14:25 Juergen Gross wrote:> On 02/02/11 15:39, Stephan Diestelhorst wrote: > > We have the following theory of what happens: > > * some vcpus of a particular domain are currently in the process of > > being moved to the new pool > > The only _vcpus_ to be moved between pools are the idle vcpus. And those > never contribute to accounting in credit scheduler. > > We are moving _pcpus_ only (well, moving a domain between pools actually > moves vcpus as well, but then the domain is paused).How do you ensure that the domain is paused and stays that way? Pausing the domain was what I had in mind, too...> > Despite the rant, it is amazing to see the ability to move running > > things around through this remote continuation trick! In my (ancient) > > balancer experiments I added hypervisor-threads just for side- > > stepping this issue.. > > I think the easiest way to solve the problem would be to move the cpu to the > new pool in a tasklet. This is possible now, because tasklets are always > executed in the idle vcpus.Yep. That was exactly what I build. At the time stuff like that did not exist (2005).> OTOH I''d like to understand what is wrong with my current approach...Nothing, in fact I like it. In my rant I complained about the fact that splitting the critical section accross this continuation looks scary, basically causing some generic red lights to turn on :-) And making reasoning about the correctness a little complicated, but that may well be a local issue ;-) Stephan -- Stephan Diestelhorst, AMD Operating System Research Center stephan.diestelhorst@amd.com Tel. +49 (0)351 448 356 719 Advanced Micro Devices GmbH Einsteinring 24 85609 Aschheim Germany Geschaeftsfuehrer: Alberto Bozzo u. Andrew Bowd; Sitz: Dornach, Gemeinde Aschheim, Landkreis Muenchen Registergericht Muenchen, HRB Nr. 43632, WEEE-Reg-Nr: DE 12919551 _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Juergen Gross
2011-Feb-03 05:57 UTC
Re: [Xen-devel] Hypervisor crash(!) on xl cpupool-numa-split
On 02/02/11 17:01, Stephan Diestelhorst wrote:
> On Wednesday 02 February 2011 16:14:25 Juergen Gross wrote:
>> On 02/02/11 15:39, Stephan Diestelhorst wrote:
>>> We have the following theory of what happens:
>>> * some vcpus of a particular domain are currently in the process of
>>>   being moved to the new pool
>>
>> The only _vcpus_ to be moved between pools are the idle vcpus. And those
>> never contribute to accounting in credit scheduler.
>>
>> We are moving _pcpus_ only (well, moving a domain between pools actually
>> moves vcpus as well, but then the domain is paused).
>
> How do you ensure that the domain is paused and stays that way? Pausing
> the domain was what I had in mind, too...

Look at sched_move_domain() in schedule.c: I'm calling domain_pause() before
moving the vcpus and domain_unpause() after that.

>>> Despite the rant, it is amazing to see the ability to move running
>>> things around through this remote continuation trick! In my (ancient)
>>> balancer experiments I added hypervisor-threads just for side-
>>> stepping this issue..
>>
>> I think the easiest way to solve the problem would be to move the cpu to the
>> new pool in a tasklet. This is possible now, because tasklets are always
>> executed in the idle vcpus.
>
> Yep. That was exactly what I build. At the time stuff like that did
> not exist (2005).
>
>> OTOH I'd like to understand what is wrong with my current approach...
>
> Nothing, in fact I like it. In my rant I complained about the fact
> that splitting the critical section accross this continuation looks
> scary, basically causing some generic red lights to turn on :-) And
> making reasoning about the correctness a little complicated, but that
> may well be a local issue ;-)

Perhaps you can help solving the miracle:
Could you replace the BUG_ON in sched_credit.c:389 with something like this:

if (!is_idle_vcpu(per_cpu(schedule_data, cpu).curr))
{
    extern void dump_runq(unsigned char key);
    struct vcpu *vc = per_cpu(schedule_data, cpu).curr;

    printk("+++ (%d.%d) instead idle vcpu on cpu %d\n",
           vc->domain->domain_id, vc->vcpu_id, cpu);
    dump_runq('q');
    BUG();
}


Juergen

--
Juergen Gross                 Principal Developer Operating Systems
TSP ES&S SWE OS6                       Telephone: +49 (0) 89 3222 2967
Fujitsu Technology Solutions           e-mail: juergen.gross@ts.fujitsu.com
Domagkstr. 28                          Internet: ts.fujitsu.com
D-80807 Muenchen                 Company details: ts.fujitsu.com/imprint.html

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel
Juergen Gross
2011-Feb-03 09:18 UTC
Re: [Xen-devel] Hypervisor crash(!) on xl cpupool-numa-split
Andre, Stephan,

could you give the attached patch a try?
It moves the cpu assigning/unassigning into a tasklet that is always executed
on the cpu to be moved. This should avoid critical races.

Regarding Stephan's rant:
You should be aware that the main critical sections are only in the tasklets.
The locking in the main routines is needed only to avoid the cpupool being
destroyed in between.

I'm not sure whether the master_ticker patch is still needed. It seems to
break something, as my machine hung up after several hundred cpu moves
(without the new patch). I'm still investigating this problem.


Juergen

--
Juergen Gross                 Principal Developer Operating Systems
TSP ES&S SWE OS6                       Telephone: +49 (0) 89 3222 2967
Fujitsu Technology Solutions           e-mail: juergen.gross@ts.fujitsu.com
Domagkstr. 28                          Internet: ts.fujitsu.com
D-80807 Muenchen                 Company details: ts.fujitsu.com/imprint.html

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel
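The patch is an attachment and not reproduced in the archive, so here is only a sketch of the idea as a stand-alone pthread program (not Xen code; all names are invented for the illustration). One thread stands in for the idle loop of the pcpu being moved, the main thread stands in for the toolstack request: the requester merely posts the work, and the pool switch is carried out on the affected pcpu itself, so nothing else can be running there while its scheduler data changes hands.

/* toy_tasklet.c: gcc -pthread -o toy_tasklet toy_tasklet.c && ./toy_tasklet */
#include <pthread.h>
#include <stdbool.h>
#include <stdio.h>

struct pcpu {
    pthread_mutex_t lock;
    pthread_cond_t  cond;
    bool            move_requested;
    bool            move_done;
    int             pool;            /* which "cpupool" this pcpu belongs to */
    int             id;
};

static void *idle_loop(void *arg)
{
    struct pcpu *p = arg;

    pthread_mutex_lock(&p->lock);
    while (!p->move_requested)               /* idle until work is posted */
        pthread_cond_wait(&p->cond, &p->lock);

    /* the "tasklet": runs on the pcpu itself, in its idle context, so no
     * other vcpu can be using this pcpu's scheduler data right now */
    printf("pcpu %d: switching from pool %d to pool %d\n", p->id, p->pool, 1);
    p->pool = 1;
    p->move_done = true;
    pthread_cond_signal(&p->cond);
    pthread_mutex_unlock(&p->lock);
    return NULL;
}

int main(void)
{
    struct pcpu p = { PTHREAD_MUTEX_INITIALIZER, PTHREAD_COND_INITIALIZER,
                      false, false, 0, 36 };
    pthread_t t;

    pthread_create(&t, NULL, idle_loop, &p);

    /* the requester: post the move and wait until the pcpu has done it */
    pthread_mutex_lock(&p.lock);
    p.move_requested = true;
    pthread_cond_signal(&p.cond);
    while (!p.move_done)
        pthread_cond_wait(&p.cond, &p.lock);
    pthread_mutex_unlock(&p.lock);

    pthread_join(t, NULL);
    printf("pcpu %d is now in pool %d\n", p.id, p.pool);
    return 0;
}

In Xen the corresponding mechanism is a tasklet, which, as mentioned above, always runs in an idle vcpu on the cpu it was scheduled for.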
Andre Przywara
2011-Feb-04 14:09 UTC
Re: [Xen-devel] Hypervisor crash(!) on xl cpupool-numa-split
Juergen Gross wrote:> Andre, Stephan, > > could you give the attached patch a try? > It moves the cpu assigning/unassigning into a tasklet always executed on the > cpu to be moved. This should avoid critical races.Done. I checked it twice, but sadly it does not fix the issue. It still BUGs: (XEN) Xen BUG at sched_credit.c:990 (XEN) ----[ Xen-4.1.0-rc3-pre x86_64 debug=y Not tainted ]---- (XEN) CPU: 0 (XEN) RIP: e008:[<ffff82c480118208>] csched_acct+0x11f/0x419 (XEN) RFLAGS: 0000000000010006 CONTEXT: hypervisor (XEN) rax: 0000000000000010 rbx: 0000000000000f00 rcx: 0000000000000100 (XEN) rdx: 0000000000001000 rsi: ffff830437ffa600 rdi: 0000000000000010 (XEN) rbp: ffff82c480297e10 rsp: ffff82c480297d80 r8: 0000000000000100 (XEN) r9: 0000000000000006 r10: ffff82c4802d4100 r11: 0000017322fea49a (XEN) r12: ffff830437ffa5e0 r13: ffff82c4801180e9 r14: ffff83043399f018 (XEN) r15: ffff830434321ec0 cr0: 000000008005003b cr4: 00000000000006f0 (XEN) cr3: 00000000c7c9c000 cr2: 0000000001ec8048 (XEN) ds: 002b es: 002b fs: 0000 gs: 0000 ss: e010 cs: e008 (XEN) Xen stack trace from rsp=ffff82c480297d80: (XEN) ffff82c480297f18 fffffed4c7cd6000 ffff830000000eff ffff830437ffa5e0 (XEN) ffff830437ffa5e8 ffff82c480297df8 ffff830437ffa5e0 0000000000000282 (XEN) ffff830437ffa5e8 00001c200000000f 00000f0000000f00 0000000000000000 (XEN) ffff82c400000000 ffff82c4802d3f80 ffff830437ffa5e0 ffff82c4801180e9 (XEN) ffff83043399f018 ffff83043399f010 ffff82c480297e40 ffff82c480126044 (XEN) 0000000000000002 ffff830437ffa600 ffff82c4802d3f80 00000173010849b7 (XEN) ffff82c480297e90 ffff82c480126369 ffff82c48024aea0 ffff82c4802d3f80 (XEN) ffff83043399f010 0000000000000000 0000000000000000 ffff82c4802b0880 (XEN) ffff82c480297f18 ffffffffffffffff ffff82c480297ed0 ffff82c480123437 (XEN) ffff8300c7e1e0f8 ffff82c480297f18 ffff82c48024aea0 ffff82c480297f18 (XEN) 0000017301008665 ffff82c4802d3ec0 ffff82c480297ee0 ffff82c4801234b2 (XEN) ffff82c480297f10 ffff82c4801564f5 0000000000000000 ffff8300c7cd6000 (XEN) 0000000000000000 ffff8300c7e1e000 ffff82c480297d48 0000000000000000 (XEN) 0000000000000000 0000000000000000 ffffffff81a69060 ffff8817a8553f10 (XEN) ffff8817a8553fd8 0000000000000246 ffff8817a8553e80 ffff880000000001 (XEN) 0000000000000000 0000000000000000 ffffffff810093aa 000000000000e030 (XEN) 00000000deadbeef 00000000deadbeef 0000010000000000 ffffffff810093aa (XEN) 000000000000e033 0000000000000246 ffff8817a8553ef8 000000000000e02b (XEN) 0000000000000000 0000000000000000 0000000000000000 0000000000000000 (XEN) 0000000000000000 ffff8300c7cd6000 0000000000000000 0000000000000000 (XEN) Xen call trace: (XEN) [<ffff82c480118208>] csched_acct+0x11f/0x419 (XEN) [<ffff82c480126044>] execute_timer+0x4e/0x6c (XEN) [<ffff82c480126369>] timer_softirq_action+0xf2/0x245 (XEN) [<ffff82c480123437>] __do_softirq+0x88/0x99 (XEN) [<ffff82c4801234b2>] do_softirq+0x6a/0x7a (XEN) [<ffff82c4801564f5>] idle_loop+0x6a/0x6f (XEN) (XEN) **************************************** (XEN) Panic on CPU 0: (XEN) Xen BUG at sched_credit.c:990 (XEN) **************************************** (XEN) (XEN) Reboot in five seconds... Stephan had created more printk debug patches, we will summarize the results soon. Regards, Andre.> > Regarding Stephans rant: > You should be aware that the main critical sections are only in the tasklets. > The locking in the main routines is needed only to avoid the cpupool to be > destroyed in between. > > I''m not sure whether the master_ticker patch is still needed. 
It seems to > break something, as my machine hung up after several 100 cpu moves (without > the new patch). I''m still investigating this problem. > > > Juergen > >-- Andre Przywara AMD-Operating System Research Center (OSRC), Dresden, Germany _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Andre Przywara
2011-Feb-07 12:38 UTC
Re: [Xen-devel] Hypervisor crash(!) on xl cpupool-numa-split
Juergen,

as promised, some more debug data. This is from c/s 22858 with Stephan's
debug patch (attached).
We get the following dump when the hypervisor crashes; note that the first
lock is different from the second and subsequent ones:

(XEN) sched_credit.c, 572: prv: ffff831836df2970 &prv->lock: ffff831836df2970 prv->weight: 256 sdom->active_vcpu_count: 3 sdom->weight: 256
(XEN) sched_credit.c, 572: prv: ffff830437ffa5e0 &prv->lock: ffff830437ffa5e0 prv->weight: 768 sdom->active_vcpu_count: 4 sdom->weight: 256
(XEN) sched_credit.c, 572: prv: ffff830437ffa5e0 &prv->lock: ffff830437ffa5e0 prv->weight: 1024 sdom->active_vcpu_count: 5 sdom->weight: 256
(XEN) sched_credit.c, 572: prv: ffff830437ffa5e0 &prv->lock: ffff830437ffa5e0 prv->weight: 1280 sdom->active_vcpu_count: 6 sdom->weight: 256
....

Hope that gives you an idea. I attach the whole log for your reference.

Regards,
Andre

--
Andre Przywara
AMD-Operating System Research Center (OSRC), Dresden, Germany

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel
Juergen Gross
2011-Feb-07 13:32 UTC
Re: [Xen-devel] Hypervisor crash(!) on xl cpupool-numa-split
On 02/07/11 13:38, Andre Przywara wrote:> Juergen, > > as promised some more debug data. This is from c/s 22858 with Stephans > debug patch (attached). > We get the following dump when the hypervisor crashes, note that the > first lock is different from the second and subsequent ones: > > (XEN) sched_credit.c, 572: prv: ffff831836df2970 &prv->lock: > ffff831836df2970 prv->weight: 256 sdom->active_vcpu_count: 3 > sdom->weight: 256 > (XEN) sched_credit.c, 572: prv: ffff830437ffa5e0 &prv->lock: > ffff830437ffa5e0 prv->weight: 768 sdom->active_vcpu_count: 4 > sdom->weight: 256 > (XEN) sched_credit.c, 572: prv: ffff830437ffa5e0 &prv->lock: > ffff830437ffa5e0 prv->weight: 1024 sdom->active_vcpu_count: 5 > sdom->weight: 256 > (XEN) sched_credit.c, 572: prv: ffff830437ffa5e0 &prv->lock: > ffff830437ffa5e0 prv->weight: 1280 sdom->active_vcpu_count: 6 > sdom->weight: 256 > > .... > > Hope that gives you an idea. I attach the whole log for your reference.Hmm, could it be your log wasn''t created with the attached patch? I''m missing Dom-Id and VCPU from the printk() above, which would be interesting (at least I hope so)... Additionally printing the local pcpu number would help, too. And could you add a printk for the new prv address in csched_init()? It would be nice if you could enable cpupool diag output. Please use the attached patch (includes the previous patch for executing the cpu move on the cpu to be moved, plus some diag printk corrections). Juergen -- Juergen Gross Principal Developer Operating Systems TSP ES&S SWE OS6 Telephone: +49 (0) 89 3222 2967 Fujitsu Technology Solutions e-mail: juergen.gross@ts.fujitsu.com Domagkstr. 28 Internet: ts.fujitsu.com D-80807 Muenchen Company details: ts.fujitsu.com/imprint.html _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
George Dunlap
2011-Feb-07 15:55 UTC
Re: [Xen-devel] Hypervisor crash(!) on xl cpupool-numa-split
Juergen, What is supposed to happen if a domain is in cpupool0, and then all of the cpus are taken out of cpupool0? Is that possible? It looks like there''s code in cpupools.c:cpupool_unassign_cpu() which will move all VMs in a cpupool to cpupool0 before removing the last cpu. But what happens if cpupool0 is the pool that has become empty? It seems like that breaks a lot of the assumptions; e.g., sched_move_domain() seems to assume that the pool we''re moving a VM to actually has cpus. While we''re at it, what''s with the "(cpu != cpu_moving_cpu)" in the first half of cpupool_unassign_cpu()? Under what conditions are you anticipating cpupool_unassign_cpu() being called a second time before the first completes? If you have to abort the move because schedule_cpu_switch() failed, wouldn''t it be better just to roll the whole transaction back, rather than leaving it hanging in the middle? Hmm, and why does RMCPU call cpupool_get_by_id() with exact==0? What could possibly be the use of grabbing a random cpupool and then trying to remove the specified cpu from it? Andre, you might think about folding the attached patch into your debug patch. -George On Mon, Feb 7, 2011 at 1:32 PM, Juergen Gross <juergen.gross@ts.fujitsu.com> wrote:> On 02/07/11 13:38, Andre Przywara wrote: >> >> Juergen, >> >> as promised some more debug data. This is from c/s 22858 with Stephans >> debug patch (attached). >> We get the following dump when the hypervisor crashes, note that the >> first lock is different from the second and subsequent ones: >> >> (XEN) sched_credit.c, 572: prv: ffff831836df2970 &prv->lock: >> ffff831836df2970 prv->weight: 256 sdom->active_vcpu_count: 3 >> sdom->weight: 256 >> (XEN) sched_credit.c, 572: prv: ffff830437ffa5e0 &prv->lock: >> ffff830437ffa5e0 prv->weight: 768 sdom->active_vcpu_count: 4 >> sdom->weight: 256 >> (XEN) sched_credit.c, 572: prv: ffff830437ffa5e0 &prv->lock: >> ffff830437ffa5e0 prv->weight: 1024 sdom->active_vcpu_count: 5 >> sdom->weight: 256 >> (XEN) sched_credit.c, 572: prv: ffff830437ffa5e0 &prv->lock: >> ffff830437ffa5e0 prv->weight: 1280 sdom->active_vcpu_count: 6 >> sdom->weight: 256 >> >> .... >> >> Hope that gives you an idea. I attach the whole log for your reference. > > Hmm, could it be your log wasn''t created with the attached patch? I''m > missing > Dom-Id and VCPU from the printk() above, which would be interesting (at > least > I hope so)... > Additionally printing the local pcpu number would help, too. > And could you add a printk for the new prv address in csched_init()? > > It would be nice if you could enable cpupool diag output. Please use the > attached patch (includes the previous patch for executing the cpu move on > the > cpu to be moved, plus some diag printk corrections). > > > Juergen > > -- > Juergen Gross Principal Developer Operating Systems > TSP ES&S SWE OS6 Telephone: +49 (0) 89 3222 2967 > Fujitsu Technology Solutions e-mail: > juergen.gross@ts.fujitsu.com > Domagkstr. 28 Internet: ts.fujitsu.com > D-80807 Muenchen Company details: > ts.fujitsu.com/imprint.html > > _______________________________________________ > Xen-devel mailing list > Xen-devel@lists.xensource.com > http://lists.xensource.com/xen-devel > >_______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Juergen Gross
2011-Feb-08 05:43 UTC
Re: [Xen-devel] Hypervisor crash(!) on xl cpupool-numa-split
On 02/07/11 16:55, George Dunlap wrote:> Juergen, > > What is supposed to happen if a domain is in cpupool0, and then all of > the cpus are taken out of cpupool0? Is that possible?No. Cpupool0 can''t be without any cpu, as Dom0 is always member of cpupool0.> > It looks like there''s code in cpupools.c:cpupool_unassign_cpu() which > will move all VMs in a cpupool to cpupool0 before removing the last > cpu. But what happens if cpupool0 is the pool that has become empty? > It seems like that breaks a lot of the assumptions; e.g., > sched_move_domain() seems to assume that the pool we''re moving a VM to > actually has cpus.The move of VMs to cpupool0 is done only for domains which are dying. If there are any active domains in the cpupool, removing the last cpu from it will be denied.> > While we''re at it, what''s with the "(cpu != cpu_moving_cpu)" in the > first half of cpupool_unassign_cpu()? Under what conditions are you > anticipating cpupool_unassign_cpu() being called a second time before > the first completes? If you have to abort the move because > schedule_cpu_switch() failed, wouldn''t it be better just to roll the > whole transaction back, rather than leaving it hanging in the middle?Not really. It could take some time until all vcpus have been migrated to another cpu. In this case -EAGAIN is returned and the cpu is already removed from the cpumask of valid cpus for that cpupool to avoid scheduling of other vcpus on that cpu. Without cpu_moving_cpu there would be no forward progress guaranteed.> > Hmm, and why does RMCPU call cpupool_get_by_id() with exact==0? What > could possibly be the use of grabbing a random cpupool and then trying > to remove the specified cpu from it?This is a very good question :-) I think this should be fixed. Seems to be a copy and paste error. I''ll send a patch. Thanks for your thoughts, Juergen -- Juergen Gross Principal Developer Operating Systems TSP ES&S SWE OS6 Telephone: +49 (0) 89 3222 2967 Fujitsu Technology Solutions e-mail: juergen.gross@ts.fujitsu.com Domagkstr. 28 Internet: ts.fujitsu.com D-80807 Muenchen Company details: ts.fujitsu.com/imprint.html _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
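Assuming the description above is accurate, the retry protocol can be modelled in a few lines of stand-alone C (invented names, not the real cpupool code): the cpu's valid bit is dropped right away, -EAGAIN is returned while vcpus still have to be migrated off, and a separate "moving" flag keeps the cpu from being assigned to another pool before the move has really completed.

/* toy_retry.c: gcc -o toy_retry toy_retry.c && ./toy_retry */
#include <errno.h>
#include <stdio.h>

static int vcpus_on_cpu = 3;      /* stand-in for vcpus still to migrate   */
static int cpu_valid_in_pool = 1; /* cleared early: no new vcpus land here */
static int cpu_moving = 1;        /* blocks assigning the cpu elsewhere    */

static int unassign_cpu(void)
{
    cpu_valid_in_pool = 0;        /* done first, even if we return -EAGAIN */
    if (vcpus_on_cpu > 0) {
        vcpus_on_cpu--;           /* migrate one more vcpu away */
        return -EAGAIN;           /* caller (the tools) must retry */
    }
    cpu_moving = 0;               /* now the cpu may join another pool */
    return 0;
}

int main(void)
{
    int rc, tries = 0;

    do {
        rc = unassign_cpu();
        tries++;
    } while (rc == -EAGAIN);

    printf("cpu unassigned after %d tries (valid=%d, moving=%d)\n",
           tries, cpu_valid_in_pool, cpu_moving);
    return 0;
}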
George Dunlap
2011-Feb-08 12:08 UTC
Re: [Xen-devel] Hypervisor crash(!) on xl cpupool-numa-split
On Tue, Feb 8, 2011 at 5:43 AM, Juergen Gross <juergen.gross@ts.fujitsu.com> wrote:> On 02/07/11 16:55, George Dunlap wrote: >> >> Juergen, >> >> What is supposed to happen if a domain is in cpupool0, and then all of >> the cpus are taken out of cpupool0? Is that possible? > > No. Cpupool0 can''t be without any cpu, as Dom0 is always member of cpupool0.If that''s the case, then since Andre is running this immediately after boot, he shouldn''t be seeing any vcpus in the new pools; and all of the dom0 vcpus should be migrated to cpupool0, right? Is it possible that migration process isn''t happening properly? It looks like schedule.c:cpu_disable_scheduler() will try to migrate all vcpus, and if it fails to migrate, it returns -EAGAIN so that the tools will try again. It''s probably worth instrumenting that whole code-path to make sure it actually happens as we expect. Are we certain, for example, that if a hypercall continued on another cpu will actually return the new error value properly? Another minor thing: In cpupool.c:cpupool_unassign_cpu_helper(), why is the cpu''s bit set in cpupool_free_cpus without checking to see if the cpu_disable_scheduler() call actually worked? Shouldn''t that also be inside the if() statement? -George _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
George Dunlap
2011-Feb-08 12:14 UTC
Re: [Xen-devel] Hypervisor crash(!) on xl cpupool-numa-split
Andre, Can you try again with the attached patch? Thanks, -George On Tue, Feb 8, 2011 at 12:08 PM, George Dunlap <George.Dunlap@eu.citrix.com> wrote:> On Tue, Feb 8, 2011 at 5:43 AM, Juergen Gross > <juergen.gross@ts.fujitsu.com> wrote: >> On 02/07/11 16:55, George Dunlap wrote: >>> >>> Juergen, >>> >>> What is supposed to happen if a domain is in cpupool0, and then all of >>> the cpus are taken out of cpupool0? Is that possible? >> >> No. Cpupool0 can''t be without any cpu, as Dom0 is always member of cpupool0. > > If that''s the case, then since Andre is running this immediately after > boot, he shouldn''t be seeing any vcpus in the new pools; and all of > the dom0 vcpus should be migrated to cpupool0, right? Is it possible > that migration process isn''t happening properly? > > It looks like schedule.c:cpu_disable_scheduler() will try to migrate > all vcpus, and if it fails to migrate, it returns -EAGAIN so that the > tools will try again. It''s probably worth instrumenting that whole > code-path to make sure it actually happens as we expect. Are we > certain, for example, that if a hypercall continued on another cpu > will actually return the new error value properly? > > Another minor thing: In cpupool.c:cpupool_unassign_cpu_helper(), why > is the cpu''s bit set in cpupool_free_cpus without checking to see if > the cpu_disable_scheduler() call actually worked? Shouldn''t that also > be inside the if() statement? > > -George >_______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
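The attached patch is again not part of the archive. Judging from the messages it produces in Andre's reply below, lines like "(XEN) cpu_disable_scheduler: Migrating d0v18 from cpu 24" plus a "Migration failed" message on the error path, it presumably adds something along these lines to xen/common/schedule.c (a guess at its shape, not the actual patch):

    /* guess: inside cpu_disable_scheduler()'s per-vcpu loop, next to the
     * "if ( v->processor == cpu )" branch that evacuates vcpus from the
     * departing cpu */
    printk("%s: Migrating d%dv%d from cpu %d\n", __func__,
           v->domain->domain_id, v->vcpu_id, cpu);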
Juergen Gross
2011-Feb-08 12:23 UTC
Re: [Xen-devel] Hypervisor crash(!) on xl cpupool-numa-split
On 02/08/11 13:08, George Dunlap wrote:> On Tue, Feb 8, 2011 at 5:43 AM, Juergen Gross > <juergen.gross@ts.fujitsu.com> wrote: >> On 02/07/11 16:55, George Dunlap wrote: >>> >>> Juergen, >>> >>> What is supposed to happen if a domain is in cpupool0, and then all of >>> the cpus are taken out of cpupool0? Is that possible? >> >> No. Cpupool0 can''t be without any cpu, as Dom0 is always member of cpupool0. > > If that''s the case, then since Andre is running this immediately after > boot, he shouldn''t be seeing any vcpus in the new pools; and all of > the dom0 vcpus should be migrated to cpupool0, right? Is it possible > that migration process isn''t happening properly?Again: not the vcpus are migrated to cpupool0, but the physical cpus are taken away from it, so the vcpus being active on the cpu to be moved MUST be migrated to other cpus of cpupool0.> > It looks like schedule.c:cpu_disable_scheduler() will try to migrate > all vcpus, and if it fails to migrate, it returns -EAGAIN so that the > tools will try again. It''s probably worth instrumenting that whole > code-path to make sure it actually happens as we expect. Are we > certain, for example, that if a hypercall continued on another cpu > will actually return the new error value properly?I have checked that and did never see any problem. And yes, I did see the EAGAIN case happen. With my test patch to execute the cpu_disable_scheduler() always on the cpu to be moved this should not be a problem at all, since the tasklet is always running in the idle vcpu.> > Another minor thing: In cpupool.c:cpupool_unassign_cpu_helper(), why > is the cpu''s bit set in cpupool_free_cpus without checking to see if > the cpu_disable_scheduler() call actually worked? Shouldn''t that also > be inside the if() statement?No, I don''t think so. If removing a cpu fails permanently after returning -EAGAIN before, it should be addable to the original cpupool easily. This can only be done, if it is flagged as free. Adding it to another cpupool will be denied as cpupool_cpu_moving is still set. Juergen -- Juergen Gross Principal Developer Operating Systems TSP ES&S SWE OS6 Telephone: +49 (0) 89 3222 2967 Fujitsu Technology Solutions e-mail: juergen.gross@ts.fujitsu.com Domagkstr. 28 Internet: ts.fujitsu.com D-80807 Muenchen Company details: ts.fujitsu.com/imprint.html _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Andre Przywara
2011-Feb-08 16:33 UTC
Re: [Xen-devel] Hypervisor crash(!) on xl cpupool-numa-split
George Dunlap wrote:> Andre, > > Can you try again with the attached patch?Sure. Unfortunately (or is this a good sign?) the "Migration failed" message didn''t trigger, I only saw various instances of the other printk, see the attached log file. Migration is happening quite often, because Dom0 has 48 vCPUs and in the end they are squashed into less and less pCPUs. I guess that is the reason my I see it on my machine. Regards, Andre.> > Thanks, > -George > > On Tue, Feb 8, 2011 at 12:08 PM, George Dunlap > <George.Dunlap@eu.citrix.com> wrote: >> On Tue, Feb 8, 2011 at 5:43 AM, Juergen Gross >> <juergen.gross@ts.fujitsu.com> wrote: >>> On 02/07/11 16:55, George Dunlap wrote: >>>> Juergen, >>>> >>>> What is supposed to happen if a domain is in cpupool0, and then all of >>>> the cpus are taken out of cpupool0? Is that possible? >>> No. Cpupool0 can''t be without any cpu, as Dom0 is always member of cpupool0. >> If that''s the case, then since Andre is running this immediately after >> boot, he shouldn''t be seeing any vcpus in the new pools; and all of >> the dom0 vcpus should be migrated to cpupool0, right? Is it possible >> that migration process isn''t happening properly? >> >> It looks like schedule.c:cpu_disable_scheduler() will try to migrate >> all vcpus, and if it fails to migrate, it returns -EAGAIN so that the >> tools will try again. It''s probably worth instrumenting that whole >> code-path to make sure it actually happens as we expect. Are we >> certain, for example, that if a hypercall continued on another cpu >> will actually return the new error value properly? >> >> Another minor thing: In cpupool.c:cpupool_unassign_cpu_helper(), why >> is the cpu''s bit set in cpupool_free_cpus without checking to see if >> the cpu_disable_scheduler() call actually worked? Shouldn''t that also >> be inside the if() statement? >> >> -George >>-- Andre Przywara AMD-OSRC (Dresden) Tel: x29712 _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
George Dunlap
2011-Feb-09 12:27 UTC
Re: [Xen-devel] Hypervisor crash(!) on xl cpupool-numa-split
On Tue, Feb 8, 2011 at 4:33 PM, Andre Przywara <andre.przywara@amd.com> wrote:
> (XEN) cpu_disable_scheduler: Migrating d0v18 from cpu 24
> (XEN) cpu_disable_scheduler: Migrating d0v34 from cpu 24
> (XEN) cpu_disable_scheduler: Migrating d0v42 from cpu 24
> (XEN) cpu_disable_scheduler: Migrating d0v18 from cpu 25
> (XEN) cpu_disable_scheduler: Migrating d0v34 from cpu 25
> (XEN) cpu_disable_scheduler: Migrating d0v42 from cpu 25
> (XEN) cpu_disable_scheduler: Migrating d0v18 from cpu 26
> (XEN) cpu_disable_scheduler: Migrating d0v32 from cpu 26
> (XEN) cpu_disable_scheduler: Migrating d0v42 from cpu 26
> (XEN) cpu_disable_scheduler: Migrating d0v18 from cpu 27
> (XEN) cpu_disable_scheduler: Migrating d0v24 from cpu 27
> (XEN) cpu_disable_scheduler: Migrating d0v32 from cpu 27
> (XEN) cpu_disable_scheduler: Migrating d0v42 from cpu 27
> (XEN) cpu_disable_scheduler: Migrating d0v3 from cpu 28
> (XEN) cpu_disable_scheduler: Migrating d0v18 from cpu 28
> (XEN) cpu_disable_scheduler: Migrating d0v25 from cpu 28
> (XEN) cpu_disable_scheduler: Migrating d0v32 from cpu 28
> (XEN) cpu_disable_scheduler: Migrating d0v39 from cpu 28
> (XEN) cpu_disable_scheduler: Migrating d0v3 from cpu 29

Interesting -- what seems to happen here is that as cpus are disabled,
vcpus are "shovelled" in an accumulative fashion from one cpu to the next:
* v18,34,42 start on cpu 24.
* When 24 is brought down, they're all migrated to 25; then when 25 is
  brought down, to 26, then to 27
* v24 is running on cpu 27, so when 27 is brought down, v24 is added to the mix
* v3 is running on cpu 28, so all of them plus v3 are shovelled onto cpu 29.

While that behavior may not be ideal, it should certainly be bug-free.

Another interesting thing to note is that the bug happened on pcpu 32,
but there were no advertised migrations from that cpu.

Andre, can you fold the attached patch into your testing?

Thanks for all your work on this.

 -George

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel
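Following the summary above (not the full log), the shovelling pattern can be reproduced with a few lines of stand-alone C; a toy simulation, not Xen code:

/* toy_shovel.c: gcc -o toy_shovel toy_shovel.c && ./toy_shovel */
#include <stdio.h>

#define NCPU 6                            /* model cpus 24..29 as 0..5 */

int main(void)
{
    /* dom0 vcpus per modelled cpu, taken from the bullet points above */
    int vcpus[NCPU] = { 3, 0, 0, 1, 1, 0 };
    int cpu;

    for (cpu = 0; cpu < NCPU - 1; cpu++) {
        printf("removing cpu %d: pushing %d vcpu(s) to cpu %d\n",
               24 + cpu, vcpus[cpu], 24 + cpu + 1);
        vcpus[cpu + 1] += vcpus[cpu];     /* everything moves one step */
        vcpus[cpu] = 0;
    }
    printf("cpu %d ends up with %d vcpu(s)\n", 24 + NCPU - 1, vcpus[NCPU - 1]);
    return 0;
}

Each removed cpu dumps its remaining load onto its neighbour, so the last cpu of the node ends up with all five vcpus: unusual, but as noted above not a bug in itself.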
George Dunlap
2011-Feb-09 12:27 UTC
Re: [Xen-devel] Hypervisor crash(!) on xl cpupool-numa-split
Sorry, forgot the patch... -G On Wed, Feb 9, 2011 at 12:27 PM, George Dunlap <George.Dunlap@eu.citrix.com> wrote:> On Tue, Feb 8, 2011 at 4:33 PM, Andre Przywara <andre.przywara@amd.com> wrote: >> (XEN) cpu_disable_scheduler: Migrating d0v18 from cpu 24 >> (XEN) cpu_disable_scheduler: Migrating d0v34 from cpu 24 >> (XEN) cpu_disable_scheduler: Migrating d0v42 from cpu 24 >> (XEN) cpu_disable_scheduler: Migrating d0v18 from cpu 25 >> (XEN) cpu_disable_scheduler: Migrating d0v34 from cpu 25 >> (XEN) cpu_disable_scheduler: Migrating d0v42 from cpu 25 >> (XEN) cpu_disable_scheduler: Migrating d0v18 from cpu 26 >> (XEN) cpu_disable_scheduler: Migrating d0v32 from cpu 26 >> (XEN) cpu_disable_scheduler: Migrating d0v42 from cpu 26 >> (XEN) cpu_disable_scheduler: Migrating d0v18 from cpu 27 >> (XEN) cpu_disable_scheduler: Migrating d0v24 from cpu 27 >> (XEN) cpu_disable_scheduler: Migrating d0v32 from cpu 27 >> (XEN) cpu_disable_scheduler: Migrating d0v42 from cpu 27 >> (XEN) cpu_disable_scheduler: Migrating d0v3 from cpu 28 >> (XEN) cpu_disable_scheduler: Migrating d0v18 from cpu 28 >> (XEN) cpu_disable_scheduler: Migrating d0v25 from cpu 28 >> (XEN) cpu_disable_scheduler: Migrating d0v32 from cpu 28 >> (XEN) cpu_disable_scheduler: Migrating d0v39 from cpu 28 >> (XEN) cpu_disable_scheduler: Migrating d0v3 from cpu 29 > > Interesting -- what seems to happen here is that as cpus are disabled, > vcpus are "shovelled" in an accumulative fashion from one cpu to the > next: > * v18,34,42 start on cpu 24. > * When 24 is brought down, they''re all migrated to 25; then when 25 is > brougth down, to 26, then to 27 > * v24 is running on cpu 27, so when 27 is brought down, v24 is added to the mix > * v3 is running on cpu 28, so all of them plus v3 are shoveled onto cpu 29. > > While that behavior may not be ideal, it should certainly be bug-free. > > Another interesting thing to note is that the bug happened on pcpu 32, > but there were no advertised migrations from that cpu. > > Andre, can you fold the attached patch into your testing? > > Thanks for all your work on this. > > -George >_______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Juergen Gross
2011-Feb-09 13:04 UTC
Re: [Xen-devel] Hypervisor crash(!) on xl cpupool-numa-split
On 02/09/11 13:27, George Dunlap wrote:> Sorry, forgot the patch... > -G > > On Wed, Feb 9, 2011 at 12:27 PM, George Dunlap > <George.Dunlap@eu.citrix.com> wrote: >> On Tue, Feb 8, 2011 at 4:33 PM, Andre Przywara<andre.przywara@amd.com> wrote: >>> (XEN) cpu_disable_scheduler: Migrating d0v18 from cpu 24 >>> (XEN) cpu_disable_scheduler: Migrating d0v34 from cpu 24 >>> (XEN) cpu_disable_scheduler: Migrating d0v42 from cpu 24 >>> (XEN) cpu_disable_scheduler: Migrating d0v18 from cpu 25 >>> (XEN) cpu_disable_scheduler: Migrating d0v34 from cpu 25 >>> (XEN) cpu_disable_scheduler: Migrating d0v42 from cpu 25 >>> (XEN) cpu_disable_scheduler: Migrating d0v18 from cpu 26 >>> (XEN) cpu_disable_scheduler: Migrating d0v32 from cpu 26 >>> (XEN) cpu_disable_scheduler: Migrating d0v42 from cpu 26 >>> (XEN) cpu_disable_scheduler: Migrating d0v18 from cpu 27 >>> (XEN) cpu_disable_scheduler: Migrating d0v24 from cpu 27 >>> (XEN) cpu_disable_scheduler: Migrating d0v32 from cpu 27 >>> (XEN) cpu_disable_scheduler: Migrating d0v42 from cpu 27 >>> (XEN) cpu_disable_scheduler: Migrating d0v3 from cpu 28 >>> (XEN) cpu_disable_scheduler: Migrating d0v18 from cpu 28 >>> (XEN) cpu_disable_scheduler: Migrating d0v25 from cpu 28 >>> (XEN) cpu_disable_scheduler: Migrating d0v32 from cpu 28 >>> (XEN) cpu_disable_scheduler: Migrating d0v39 from cpu 28 >>> (XEN) cpu_disable_scheduler: Migrating d0v3 from cpu 29 >> >> Interesting -- what seems to happen here is that as cpus are disabled, >> vcpus are "shovelled" in an accumulative fashion from one cpu to the >> next: >> * v18,34,42 start on cpu 24. >> * When 24 is brought down, they''re all migrated to 25; then when 25 is >> brougth down, to 26, then to 27 >> * v24 is running on cpu 27, so when 27 is brought down, v24 is added to the mix >> * v3 is running on cpu 28, so all of them plus v3 are shoveled onto cpu 29. >> >> While that behavior may not be ideal, it should certainly be bug-free. >> >> Another interesting thing to note is that the bug happened on pcpu 32, >> but there were no advertised migrations from that cpu.If I understand the configuration of Andre''s machine correctly, pcpu32 will be the target of the next migrations. This pcpu is member of the next numa node, correct? Could it be there is a problem with the call of domain_update_node_affinity() from cpu_disable_scheduler() ? Hmm, I think this could really be the problem. Andre, could you try the following patch? diff -r f1fac30a531b xen/common/schedule.c --- a/xen/common/schedule.c Wed Feb 09 08:58:11 2011 +0000 +++ b/xen/common/schedule.c Wed Feb 09 14:02:12 2011 +0100 @@ -491,6 +491,10 @@ int cpu_disable_scheduler(unsigned int c v->domain->domain_id, v->vcpu_id); cpus_setall(v->cpu_affinity); affinity_broken = 1; + } + if ( cpus_weight(v->cpu_affinity) < NR_CPUS ) + { + cpu_clear(cpu, v->cpu_affinity); } if ( v->processor == cpu ) Juergen -- Juergen Gross Principal Developer Operating Systems TSP ES&S SWE OS6 Telephone: +49 (0) 89 3222 2967 Fujitsu Technology Solutions e-mail: juergen.gross@ts.fujitsu.com Domagkstr. 28 Internet: ts.fujitsu.com D-80807 Muenchen Company details: ts.fujitsu.com/imprint.html _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
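To see what the new hunk is meant to achieve, here is a small stand-alone model (plain C with a 64-bit mask standing in for the cpumask, not Xen code; the real check compares the affinity against the pool's remaining valid cpus, which is simplified here to "nothing but the leaving cpu left"): a vcpu that could only run on the departing cpu gets its affinity reset to all cpus, every other vcpu simply loses the departing cpu from its mask.

/* toy_affinity.c: gcc -o toy_affinity toy_affinity.c && ./toy_affinity */
#include <stdint.h>
#include <stdio.h>

#define NR_CPUS 64

static uint64_t fix_affinity(uint64_t affinity, int leaving_cpu)
{
    /* simplified "affinity broken" test: nothing left but the leaving cpu */
    if ( (affinity & ~(UINT64_C(1) << leaving_cpu)) == 0 )
        affinity = ~UINT64_C(0);                    /* cpus_setall() */
    /* the new hunk: unless the mask is full, drop the leaving cpu */
    if ( __builtin_popcountll(affinity) < NR_CPUS )
        affinity &= ~(UINT64_C(1) << leaving_cpu);  /* cpu_clear() */
    return affinity;
}

int main(void)
{
    /* vcpu pinned to cpus {30,31}: cpu 31 leaves, cpu 30 remains usable */
    printf("pinned to 30+31 -> %#llx\n",
           (unsigned long long)fix_affinity(UINT64_C(3) << 30, 31));
    /* vcpu pinned only to cpu 31: affinity is broken and reset to all */
    printf("pinned to 31    -> %#llx\n",
           (unsigned long long)fix_affinity(UINT64_C(1) << 31, 31));
    return 0;
}

The first case prints a mask with only bit 30 set, the second the all-ones mask, matching cpus_setall() followed by no cpu_clear() because the mask is full.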
Andre Przywara
2011-Feb-09 13:39 UTC
Re: [Xen-devel] Hypervisor crash(!) on xl cpupool-numa-split
Juergen Gross wrote:
>>> Another interesting thing to note is that the bug happened on pcpu 32,
>>> but there were no advertised migrations from that cpu.
>
> If I understand the configuration of Andre's machine correctly, pcpu32 will
> be the target of the next migrations. This pcpu is member of the next numa
> node, correct?

No, this is a 6-core box, so the NUMA node spans pcpus 30-35.

> Could it be there is a problem with the call of domain_update_node_affinity()
> from cpu_disable_scheduler() ?
>
> Hmm, I think this could really be the problem.
> Andre, could you try the following patch?

Sorry, but that one didn't help. It crashed with the well-known BUG_ON:
(XEN) Xen BUG at sched_credit.c:990
(which is the weight assert in csched_acct (c/s 22858))

Regards,
Andre.

> diff -r f1fac30a531b xen/common/schedule.c
> --- a/xen/common/schedule.c     Wed Feb 09 08:58:11 2011 +0000
> +++ b/xen/common/schedule.c     Wed Feb 09 14:02:12 2011 +0100
> @@ -491,6 +491,10 @@ int cpu_disable_scheduler(unsigned int c
>                          v->domain->domain_id, v->vcpu_id);
>              cpus_setall(v->cpu_affinity);
>              affinity_broken = 1;
> +        }
> +        if ( cpus_weight(v->cpu_affinity) < NR_CPUS )
> +        {
> +            cpu_clear(cpu, v->cpu_affinity);
>          }
>
>          if ( v->processor == cpu )
>
> Juergen

-- 
Andre Przywara
AMD-OSRC (Dresden)
Tel: x29712

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel
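For readers without the tree at hand: the assertion Andre refers to is the per-pool weight bookkeeping check in csched_acct(). Its condition is quoted verbatim later in this thread; the loop around it below is paraphrased and should be read as a sketch of sched_credit.c:990, not as the verbatim source.

/* Sketch of the invariant behind "Xen BUG at sched_credit.c:990": while
 * credit is being distributed, each active domain's share must still fit
 * into the weight left to hand out for this pool's scheduler instance. */
list_for_each_safe( iter_sdom, next_sdom, &prv->active_sdom )
{
    sdom = list_entry(iter_sdom, struct csched_dom, active_sdom_elem);

    BUG_ON( (sdom->weight * sdom->active_vcpu_count) > weight_left );
    weight_left -= sdom->weight * sdom->active_vcpu_count;

    /* ...credit is then apportioned to the domain's active vcpus... */
}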
Andre Przywara
2011-Feb-09 13:51 UTC
Re: [Xen-devel] Hypervisor crash(!) on xl cpupool-numa-split
George Dunlap wrote:> <George.Dunlap@eu.citrix.com> wrote: >> On Tue, Feb 8, 2011 at 4:33 PM, Andre Przywara <andre.przywara@amd.com> wrote: >>> (XEN) cpu_disable_scheduler: Migrating d0v18 from cpu 24 >>> (XEN) cpu_disable_scheduler: Migrating d0v34 from cpu 24 >>> (XEN) cpu_disable_scheduler: Migrating d0v42 from cpu 24 >>> (XEN) cpu_disable_scheduler: Migrating d0v18 from cpu 25 >>> (XEN) cpu_disable_scheduler: Migrating d0v34 from cpu 25 >>> (XEN) cpu_disable_scheduler: Migrating d0v42 from cpu 25 >>> (XEN) cpu_disable_scheduler: Migrating d0v18 from cpu 26 >>> (XEN) cpu_disable_scheduler: Migrating d0v32 from cpu 26 >>> (XEN) cpu_disable_scheduler: Migrating d0v42 from cpu 26 >>> (XEN) cpu_disable_scheduler: Migrating d0v18 from cpu 27 >>> (XEN) cpu_disable_scheduler: Migrating d0v24 from cpu 27 >>> (XEN) cpu_disable_scheduler: Migrating d0v32 from cpu 27 >>> (XEN) cpu_disable_scheduler: Migrating d0v42 from cpu 27 >>> (XEN) cpu_disable_scheduler: Migrating d0v3 from cpu 28 >>> (XEN) cpu_disable_scheduler: Migrating d0v18 from cpu 28 >>> (XEN) cpu_disable_scheduler: Migrating d0v25 from cpu 28 >>> (XEN) cpu_disable_scheduler: Migrating d0v32 from cpu 28 >>> (XEN) cpu_disable_scheduler: Migrating d0v39 from cpu 28 >>> (XEN) cpu_disable_scheduler: Migrating d0v3 from cpu 29 >> Interesting -- what seems to happen here is that as cpus are disabled, >> vcpus are "shovelled" in an accumulative fashion from one cpu to the >> next: >> * v18,34,42 start on cpu 24. >> * When 24 is brought down, they''re all migrated to 25; then when 25 is >> brougth down, to 26, then to 27 >> * v24 is running on cpu 27, so when 27 is brought down, v24 is added to the mix >> * v3 is running on cpu 28, so all of them plus v3 are shoveled onto cpu 29. >> >> While that behavior may not be ideal, it should certainly be bug-free. >> >> Another interesting thing to note is that the bug happened on pcpu 32, >> but there were no advertised migrations from that cpu. >> >> Andre, can you fold the attached patch into your testing?Sorry, but that bug (and its output) didn''t trigger on two tries. Instead I now saw two occasions of the "migration failed, must retry later" message. Interestingly enough is does not seem to be fatal. The first time it triggers, the numa-split even completes, then after I roll it back and repeat it it shows again, but crashes later on that old BUG_ON(). See the attached log for more details. Thanks for the try, anyway. Regards, Andre.>> >> Thanks for all your work on this.I am glad for all your help. I only start to really understand the scheduler, so your support is much appreciated.>> >> -George >>-- Andre Przywara AMD-Operating System Research Center (OSRC), Dresden, Germany _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Juergen Gross
2011-Feb-09 14:21 UTC
Re: [Xen-devel] Hypervisor crash(!) on xl cpupool-numa-split
Andre, George, What seems to be interesting: I think the problem did always occur when a new cpupool was created and the first cpu was moved to it. I think my previous assumption regarding the master_ticker was not too bad. I think somehow the master_ticker of the new cpupool is becoming active before the scheduler is really initialized properly. This could happen, if enough time is spent between alloc_pdata for the cpu to be moved and the critical section in schedule_cpu_switch(). The solution should be to activate the timers only if the scheduler is ready for them. George, do you think the master_ticker should be stopped in suspend_ticker as well? I still see potential problems for entering deep C-States. I think I''ll prepare a patch which will keep the master_ticker active for the C-State case and migrate it for the schedule_cpu_switch() case. Juergen On 02/09/11 14:51, Andre Przywara wrote:> George Dunlap wrote: >> <George.Dunlap@eu.citrix.com> wrote: >>> On Tue, Feb 8, 2011 at 4:33 PM, Andre Przywara >>> <andre.przywara@amd.com> wrote: >>>> (XEN) cpu_disable_scheduler: Migrating d0v18 from cpu 24 >>>> (XEN) cpu_disable_scheduler: Migrating d0v34 from cpu 24 >>>> (XEN) cpu_disable_scheduler: Migrating d0v42 from cpu 24 >>>> (XEN) cpu_disable_scheduler: Migrating d0v18 from cpu 25 >>>> (XEN) cpu_disable_scheduler: Migrating d0v34 from cpu 25 >>>> (XEN) cpu_disable_scheduler: Migrating d0v42 from cpu 25 >>>> (XEN) cpu_disable_scheduler: Migrating d0v18 from cpu 26 >>>> (XEN) cpu_disable_scheduler: Migrating d0v32 from cpu 26 >>>> (XEN) cpu_disable_scheduler: Migrating d0v42 from cpu 26 >>>> (XEN) cpu_disable_scheduler: Migrating d0v18 from cpu 27 >>>> (XEN) cpu_disable_scheduler: Migrating d0v24 from cpu 27 >>>> (XEN) cpu_disable_scheduler: Migrating d0v32 from cpu 27 >>>> (XEN) cpu_disable_scheduler: Migrating d0v42 from cpu 27 >>>> (XEN) cpu_disable_scheduler: Migrating d0v3 from cpu 28 >>>> (XEN) cpu_disable_scheduler: Migrating d0v18 from cpu 28 >>>> (XEN) cpu_disable_scheduler: Migrating d0v25 from cpu 28 >>>> (XEN) cpu_disable_scheduler: Migrating d0v32 from cpu 28 >>>> (XEN) cpu_disable_scheduler: Migrating d0v39 from cpu 28 >>>> (XEN) cpu_disable_scheduler: Migrating d0v3 from cpu 29 >>> Interesting -- what seems to happen here is that as cpus are disabled, >>> vcpus are "shovelled" in an accumulative fashion from one cpu to the >>> next: >>> * v18,34,42 start on cpu 24. >>> * When 24 is brought down, they''re all migrated to 25; then when 25 is >>> brougth down, to 26, then to 27 >>> * v24 is running on cpu 27, so when 27 is brought down, v24 is added >>> to the mix >>> * v3 is running on cpu 28, so all of them plus v3 are shoveled onto >>> cpu 29. >>> >>> While that behavior may not be ideal, it should certainly be bug-free. >>> >>> Another interesting thing to note is that the bug happened on pcpu 32, >>> but there were no advertised migrations from that cpu. >>> >>> Andre, can you fold the attached patch into your testing? > Sorry, but that bug (and its output) didn''t trigger on two tries. > Instead I now saw two occasions of the "migration failed, must retry > later" message. Interestingly enough is does not seem to be fatal. The > first time it triggers, the numa-split even completes, then after I roll > it back and repeat it it shows again, but crashes later on that old > BUG_ON(). > > See the attached log for more details. > > Thanks for the try, anyway. > > Regards, > Andre. > > >>> >>> Thanks for all your work on this. > I am glad for all your help. 
I only start to really understand the scheduler, so your support is much appreciated.
>>>
>>> -George
>>>

-- 
Juergen Gross                 Principal Developer Operating Systems
TSP ES&S SWE OS6                       Telephone: +49 (0) 89 3222 2967
Fujitsu Technology Solutions           e-mail: juergen.gross@ts.fujitsu.com
Domagkstr. 28                          Internet: ts.fujitsu.com
D-80807 Muenchen                 Company details: ts.fujitsu.com/imprint.html

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel
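A minimal illustration of the idea discussed above (this is not the patch Juergen attaches in his next mail): arm the pool's master ticker only once the target scheduler's private data is consistent, i.e. from the tail of schedule_cpu_switch() rather than from alloc_pdata(). The helper name and the fields prv->ncpus, prv->master_ticker as well as the tick period macros are assumptions following the credit scheduler's conventions.

/* Sketch only: for the first cpu entering a pool, (re)arm the per-pool
 * accounting timer after schedule_cpu_switch() has installed the cpu's
 * scheduler data, instead of doing it from csched_alloc_pdata().  The
 * timer is assumed to have been initialised with the pool's private data. */
static void csched_start_master_ticker(struct csched_private *prv, unsigned int cpu)
{
    if ( prv->ncpus != 1 )
        return;   /* not the first cpu of this pool */

    migrate_timer(&prv->master_ticker, cpu);
    set_timer(&prv->master_ticker,
              NOW() + MILLISECS(CSCHED_MSECS_PER_TICK) * CSCHED_TICKS_PER_ACCT);
}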
Juergen Gross
2011-Feb-10 06:42 UTC
Re: [Xen-devel] Hypervisor crash(!) on xl cpupool-numa-split
On 02/09/11 15:21, Juergen Gross wrote:> Andre, George, > > > What seems to be interesting: I think the problem did always occur when > a new cpupool was created and the first cpu was moved to it. > > I think my previous assumption regarding the master_ticker was not too bad. > I think somehow the master_ticker of the new cpupool is becoming active > before the scheduler is really initialized properly. This could happen, if > enough time is spent between alloc_pdata for the cpu to be moved and the > critical section in schedule_cpu_switch(). > > The solution should be to activate the timers only if the scheduler is > ready for them. > > George, do you think the master_ticker should be stopped in suspend_ticker > as well? I still see potential problems for entering deep C-States. I think > I''ll prepare a patch which will keep the master_ticker active for the > C-State case and migrate it for the schedule_cpu_switch() case.Okay, here is a patch for this. It ran on my 4-core machine without any problems. Andre, could you give it a try? Juergen -- Juergen Gross Principal Developer Operating Systems TSP ES&S SWE OS6 Telephone: +49 (0) 89 3222 2967 Fujitsu Technology Solutions e-mail: juergen.gross@ts.fujitsu.com Domagkstr. 28 Internet: ts.fujitsu.com D-80807 Muenchen Company details: ts.fujitsu.com/imprint.html _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Andre Przywara
2011-Feb-10 09:25 UTC
Re: [Xen-devel] Hypervisor crash(!) on xl cpupool-numa-split
On 02/10/2011 07:42 AM, Juergen Gross wrote:> On 02/09/11 15:21, Juergen Gross wrote: >> Andre, George, >> >> >> What seems to be interesting: I think the problem did always occur when >> a new cpupool was created and the first cpu was moved to it. >> >> I think my previous assumption regarding the master_ticker was not too bad. >> I think somehow the master_ticker of the new cpupool is becoming active >> before the scheduler is really initialized properly. This could happen, if >> enough time is spent between alloc_pdata for the cpu to be moved and the >> critical section in schedule_cpu_switch(). >> >> The solution should be to activate the timers only if the scheduler is >> ready for them. >> >> George, do you think the master_ticker should be stopped in suspend_ticker >> as well? I still see potential problems for entering deep C-States. I think >> I''ll prepare a patch which will keep the master_ticker active for the >> C-State case and migrate it for the schedule_cpu_switch() case. > > Okay, here is a patch for this. It ran on my 4-core machine without any > problems. > Andre, could you give it a try?Did, but unfortunately it crashed as always. Tried twice and made sure I booted the right kernel. Sorry. The idea with the race between the timer and the state changing sounded very appealing, actually that was suspicious to me from the beginning. I will add some code to dump the state of all cpupools to the BUG_ON to see in which situation we are when the bug triggers. Regards, Andre. -- Andre Przywara AMD-OSRC (Dresden) Tel: x29712 _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
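What such a debug dump could look like (a sketch under the assumption that the cpupools are kept in a simple linked list with id, domain count, scheduler and cpu-mask fields; Andre's actual patch is not reproduced here, but its output appears in his next mail):

/* Sketch of a cpupool state dump to be called right before the failing
 * BUG_ON: print pool id, number of domains, scheduler name and the first
 * word of the cpu mask for every pool.  List and field names are assumed. */
static void dump_cpupools(void)
{
    struct cpupool *c;

    for ( c = cpupool_list; c != NULL; c = c->next )
        printk("CPU pool #%d: %d domains (%s), mask: %lx\n",
               c->cpupool_id, c->n_dom, c->sched->name,
               cpus_addr(c->cpu_valid)[0]);
}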
Andre Przywara
2011-Feb-10 14:18 UTC
Re: [Xen-devel] Hypervisor crash(!) on xl cpupool-numa-split
Andre Przywara wrote:> On 02/10/2011 07:42 AM, Juergen Gross wrote: >> On 02/09/11 15:21, Juergen Gross wrote: >>> Andre, George, >>> >>> >>> What seems to be interesting: I think the problem did always occur when >>> a new cpupool was created and the first cpu was moved to it. >>> >>> I think my previous assumption regarding the master_ticker was not too bad. >>> I think somehow the master_ticker of the new cpupool is becoming active >>> before the scheduler is really initialized properly. This could happen, if >>> enough time is spent between alloc_pdata for the cpu to be moved and the >>> critical section in schedule_cpu_switch(). >>> >>> The solution should be to activate the timers only if the scheduler is >>> ready for them. >>> >>> George, do you think the master_ticker should be stopped in suspend_ticker >>> as well? I still see potential problems for entering deep C-States. I think >>> I''ll prepare a patch which will keep the master_ticker active for the >>> C-State case and migrate it for the schedule_cpu_switch() case. >> Okay, here is a patch for this. It ran on my 4-core machine without any >> problems. >> Andre, could you give it a try? > Did, but unfortunately it crashed as always. Tried twice and made sure I > booted the right kernel. Sorry. > The idea with the race between the timer and the state changing sounded > very appealing, actually that was suspicious to me from the beginning. > > I will add some code to dump the state of all cpupools to the BUG_ON to > see in which situation we are when the bug triggers.OK, here is a first try of this, the patch iterates over all CPU pools and outputs some data if the BUG_ON ((sdom->weight * sdom->active_vcpu_count) > weight_left) condition triggers: (XEN) CPU pool #0: 1 domains (SMP Credit Scheduler), mask: fffffffc003f (XEN) CPU pool #1: 0 domains (SMP Credit Scheduler), mask: fc0 (XEN) CPU pool #2: 0 domains (SMP Credit Scheduler), mask: 1000 (XEN) Xen BUG at sched_credit.c:1010 .... The masks look proper (6 cores per node), the bug triggers when the first CPU is about to be(?) inserted. HTH, Andre. -- Andre Przywara AMD-Operating System Research Center (OSRC), Dresden, Germany _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Juergen Gross
2011-Feb-11 06:17 UTC
Re: [Xen-devel] Hypervisor crash(!) on xl cpupool-numa-split
On 02/10/11 15:18, Andre Przywara wrote:> Andre Przywara wrote: >> On 02/10/2011 07:42 AM, Juergen Gross wrote: >>> On 02/09/11 15:21, Juergen Gross wrote: >>>> Andre, George, >>>> >>>> >>>> What seems to be interesting: I think the problem did always occur when >>>> a new cpupool was created and the first cpu was moved to it. >>>> >>>> I think my previous assumption regarding the master_ticker was not >>>> too bad. >>>> I think somehow the master_ticker of the new cpupool is becoming active >>>> before the scheduler is really initialized properly. This could >>>> happen, if >>>> enough time is spent between alloc_pdata for the cpu to be moved and >>>> the >>>> critical section in schedule_cpu_switch(). >>>> >>>> The solution should be to activate the timers only if the scheduler is >>>> ready for them. >>>> >>>> George, do you think the master_ticker should be stopped in >>>> suspend_ticker >>>> as well? I still see potential problems for entering deep C-States. >>>> I think >>>> I''ll prepare a patch which will keep the master_ticker active for the >>>> C-State case and migrate it for the schedule_cpu_switch() case. >>> Okay, here is a patch for this. It ran on my 4-core machine without any >>> problems. >>> Andre, could you give it a try? >> Did, but unfortunately it crashed as always. Tried twice and made sure >> I booted the right kernel. Sorry. >> The idea with the race between the timer and the state changing >> sounded very appealing, actually that was suspicious to me from the >> beginning. >> >> I will add some code to dump the state of all cpupools to the BUG_ON >> to see in which situation we are when the bug triggers. > OK, here is a first try of this, the patch iterates over all CPU pools > and outputs some data if the BUG_ON > ((sdom->weight * sdom->active_vcpu_count) > weight_left) condition > triggers: > (XEN) CPU pool #0: 1 domains (SMP Credit Scheduler), mask: fffffffc003f > (XEN) CPU pool #1: 0 domains (SMP Credit Scheduler), mask: fc0 > (XEN) CPU pool #2: 0 domains (SMP Credit Scheduler), mask: 1000 > (XEN) Xen BUG at sched_credit.c:1010 > .... > The masks look proper (6 cores per node), the bug triggers when the > first CPU is about to be(?) inserted.Sure? I''m missing the cpu with mask 2000. I''ll try to reproduce the problem on a larger machine here (24 cores, 4 numa nodes). Andre, can you give me your xen boot parameters? Which xen changeset are you running, and do you have any additional patches in use? Juergen -- Juergen Gross Principal Developer Operating Systems TSP ES&S SWE OS6 Telephone: +49 (0) 89 3222 2967 Fujitsu Technology Solutions e-mail: juergen.gross@ts.fujitsu.com Domagkstr. 28 Internet: ts.fujitsu.com D-80807 Muenchen Company details: ts.fujitsu.com/imprint.html _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Andre Przywara
2011-Feb-11 07:39 UTC
Re: [Xen-devel] Hypervisor crash(!) on xl cpupool-numa-split
Juergen Gross wrote:> On 02/10/11 15:18, Andre Przywara wrote: >> Andre Przywara wrote: >>> On 02/10/2011 07:42 AM, Juergen Gross wrote: >>>> On 02/09/11 15:21, Juergen Gross wrote: >>>>> Andre, George, >>>>> >>>>> >>>>> What seems to be interesting: I think the problem did always occur when >>>>> a new cpupool was created and the first cpu was moved to it. >>>>> >>>>> I think my previous assumption regarding the master_ticker was not >>>>> too bad. >>>>> I think somehow the master_ticker of the new cpupool is becoming active >>>>> before the scheduler is really initialized properly. This could >>>>> happen, if >>>>> enough time is spent between alloc_pdata for the cpu to be moved and >>>>> the >>>>> critical section in schedule_cpu_switch(). >>>>> >>>>> The solution should be to activate the timers only if the scheduler is >>>>> ready for them. >>>>> >>>>> George, do you think the master_ticker should be stopped in >>>>> suspend_ticker >>>>> as well? I still see potential problems for entering deep C-States. >>>>> I think >>>>> I''ll prepare a patch which will keep the master_ticker active for the >>>>> C-State case and migrate it for the schedule_cpu_switch() case. >>>> Okay, here is a patch for this. It ran on my 4-core machine without any >>>> problems. >>>> Andre, could you give it a try? >>> Did, but unfortunately it crashed as always. Tried twice and made sure >>> I booted the right kernel. Sorry. >>> The idea with the race between the timer and the state changing >>> sounded very appealing, actually that was suspicious to me from the >>> beginning. >>> >>> I will add some code to dump the state of all cpupools to the BUG_ON >>> to see in which situation we are when the bug triggers. >> OK, here is a first try of this, the patch iterates over all CPU pools >> and outputs some data if the BUG_ON >> ((sdom->weight * sdom->active_vcpu_count) > weight_left) condition >> triggers: >> (XEN) CPU pool #0: 1 domains (SMP Credit Scheduler), mask: fffffffc003f >> (XEN) CPU pool #1: 0 domains (SMP Credit Scheduler), mask: fc0 >> (XEN) CPU pool #2: 0 domains (SMP Credit Scheduler), mask: 1000 >> (XEN) Xen BUG at sched_credit.c:1010 >> .... >> The masks look proper (6 cores per node), the bug triggers when the >> first CPU is about to be(?) inserted. > > Sure? I''m missing the cpu with mask 2000. > I''ll try to reproduce the problem on a larger machine here (24 cores, 4 numa > nodes). > Andre, can you give me your xen boot parameters? Which xen changeset are you > running, and do you have any additional patches in use?The grub lines: kernel (hd1,0)/boot/xen-22858_debug_04.gz console=com1,vga com1=115200 module (hd1,0)/boot/vmlinuz-2.6.32.27_pvops console=tty0 console=ttyS0,115200 ro root=/dev/sdb1 xencons=hvc0 All of my experiments are use c/s 22858 as a base. If you use a AMD Magny-Cours box for your experiments (socket C32 or G34), you should add the following patch (removing the line) --- a/xen/arch/x86/traps.c +++ b/xen/arch/x86/traps.c @@ -803,7 +803,6 @@ static void pv_cpuid(struct cpu_user_regs *regs) __clear_bit(X86_FEATURE_SKINIT % 32, &c); __clear_bit(X86_FEATURE_WDT % 32, &c); __clear_bit(X86_FEATURE_LWP % 32, &c); - __clear_bit(X86_FEATURE_NODEID_MSR % 32, &c); __clear_bit(X86_FEATURE_TOPOEXT % 32, &c); break; case 5: /* MONITOR/MWAIT */ This is not necessary (in fact that reverts my patch c/s 22815), but raises the probability to trigger the bug, probably because it increases the pressure of the Dom0 scheduler. 
If you cannot trigger it with Dom0, try to create a guest with many VCPUs and squeeze it into a small CPU-pool. Good luck ;-) Andre. -- Andre Przywara AMD-OSRC (Dresden) Tel: x29712 _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
George Dunlap
2011-Feb-14 17:57 UTC
Re: [Xen-devel] Hypervisor crash(!) on xl cpupool-numa-split
The good news is, I''ve managed to reproduce this on my local test hardware with 1x4x2 (1 socket, 4 cores, 2 threads per core) using the attached script. It''s time to go home now, but I should be able to dig something up tomorrow. To use the script: * Rename cpupool0 to "p0", and create an empty second pool, "p1" * You can modify elements by adding "arg=val" as arguments. * Arguments are: + dryrun={true,false} Do the work, but don''t actually execute any xl arguments. Default false. + left: Number commands to execute. Default 10. + maxcpus: highest numerical value for a cpu. Default 7 (i.e., 0-7 is 8 cpus). + verbose={true,false} Print what you''re doing. Default is true. The script sometimes attempts to remove the last cpu from cpupool0; in this case, libxl will print an error. If the script gets an error under that condition, it will ignore it; under any other condition, it will print diagnostic information. What finally crashed it for me was this command: # ./cpupool-test.sh verbose=false left=1000 -George On Fri, Feb 11, 2011 at 7:39 AM, Andre Przywara <andre.przywara@amd.com> wrote:> Juergen Gross wrote: >> >> On 02/10/11 15:18, Andre Przywara wrote: >>> >>> Andre Przywara wrote: >>>> >>>> On 02/10/2011 07:42 AM, Juergen Gross wrote: >>>>> >>>>> On 02/09/11 15:21, Juergen Gross wrote: >>>>>> >>>>>> Andre, George, >>>>>> >>>>>> >>>>>> What seems to be interesting: I think the problem did always occur >>>>>> when >>>>>> a new cpupool was created and the first cpu was moved to it. >>>>>> >>>>>> I think my previous assumption regarding the master_ticker was not >>>>>> too bad. >>>>>> I think somehow the master_ticker of the new cpupool is becoming >>>>>> active >>>>>> before the scheduler is really initialized properly. This could >>>>>> happen, if >>>>>> enough time is spent between alloc_pdata for the cpu to be moved and >>>>>> the >>>>>> critical section in schedule_cpu_switch(). >>>>>> >>>>>> The solution should be to activate the timers only if the scheduler is >>>>>> ready for them. >>>>>> >>>>>> George, do you think the master_ticker should be stopped in >>>>>> suspend_ticker >>>>>> as well? I still see potential problems for entering deep C-States. >>>>>> I think >>>>>> I''ll prepare a patch which will keep the master_ticker active for the >>>>>> C-State case and migrate it for the schedule_cpu_switch() case. >>>>> >>>>> Okay, here is a patch for this. It ran on my 4-core machine without any >>>>> problems. >>>>> Andre, could you give it a try? >>>> >>>> Did, but unfortunately it crashed as always. Tried twice and made sure >>>> I booted the right kernel. Sorry. >>>> The idea with the race between the timer and the state changing >>>> sounded very appealing, actually that was suspicious to me from the >>>> beginning. >>>> >>>> I will add some code to dump the state of all cpupools to the BUG_ON >>>> to see in which situation we are when the bug triggers. >>> >>> OK, here is a first try of this, the patch iterates over all CPU pools >>> and outputs some data if the BUG_ON >>> ((sdom->weight * sdom->active_vcpu_count) > weight_left) condition >>> triggers: >>> (XEN) CPU pool #0: 1 domains (SMP Credit Scheduler), mask: fffffffc003f >>> (XEN) CPU pool #1: 0 domains (SMP Credit Scheduler), mask: fc0 >>> (XEN) CPU pool #2: 0 domains (SMP Credit Scheduler), mask: 1000 >>> (XEN) Xen BUG at sched_credit.c:1010 >>> .... >>> The masks look proper (6 cores per node), the bug triggers when the >>> first CPU is about to be(?) inserted. >> >> Sure? I''m missing the cpu with mask 2000. 
>> I''ll try to reproduce the problem on a larger machine here (24 cores, 4 >> numa >> nodes). >> Andre, can you give me your xen boot parameters? Which xen changeset are >> you >> running, and do you have any additional patches in use? > > The grub lines: > kernel (hd1,0)/boot/xen-22858_debug_04.gz console=com1,vga com1=115200 > module (hd1,0)/boot/vmlinuz-2.6.32.27_pvops console=tty0 > console=ttyS0,115200 ro root=/dev/sdb1 xencons=hvc0 > > All of my experiments are use c/s 22858 as a base. > If you use a AMD Magny-Cours box for your experiments (socket C32 or G34), > you should add the following patch (removing the line) > --- a/xen/arch/x86/traps.c > +++ b/xen/arch/x86/traps.c > @@ -803,7 +803,6 @@ static void pv_cpuid(struct cpu_user_regs *regs) > __clear_bit(X86_FEATURE_SKINIT % 32, &c); > __clear_bit(X86_FEATURE_WDT % 32, &c); > __clear_bit(X86_FEATURE_LWP % 32, &c); > - __clear_bit(X86_FEATURE_NODEID_MSR % 32, &c); > __clear_bit(X86_FEATURE_TOPOEXT % 32, &c); > break; > case 5: /* MONITOR/MWAIT */ > > This is not necessary (in fact that reverts my patch c/s 22815), but raises > the probability to trigger the bug, probably because it increases the > pressure of the Dom0 scheduler. If you cannot trigger it with Dom0, try to > create a guest with many VCPUs and squeeze it into a small CPU-pool. > > Good luck ;-) > Andre. > > -- > Andre Przywara > AMD-OSRC (Dresden) > Tel: x29712 > > > _______________________________________________ > Xen-devel mailing list > Xen-devel@lists.xensource.com > http://lists.xensource.com/xen-devel >_______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Juergen Gross
2011-Feb-15 07:22 UTC
Re: [Xen-devel] Hypervisor crash(!) on xl cpupool-numa-split
On 02/14/11 18:57, George Dunlap wrote:> The good news is, I''ve managed to reproduce this on my local test > hardware with 1x4x2 (1 socket, 4 cores, 2 threads per core) using the > attached script. It''s time to go home now, but I should be able to > dig something up tomorrow. > > To use the script: > * Rename cpupool0 to "p0", and create an empty second pool, "p1" > * You can modify elements by adding "arg=val" as arguments. > * Arguments are: > + dryrun={true,false} Do the work, but don''t actually execute any xl > arguments. Default false. > + left: Number commands to execute. Default 10. > + maxcpus: highest numerical value for a cpu. Default 7 (i.e., 0-7 is 8 cpus). > + verbose={true,false} Print what you''re doing. Default is true. > > The script sometimes attempts to remove the last cpu from cpupool0; in > this case, libxl will print an error. If the script gets an error > under that condition, it will ignore it; under any other condition, it > will print diagnostic information. > > What finally crashed it for me was this command: > # ./cpupool-test.sh verbose=false left=1000Nice! With your script I finally managed to get the error, too. On my box (2 sockets a 6 cores) I had to use ./cpupool-test.sh verbose=false left=10000 maxcpus=11 to trigger it. Looking for more data now... Juergen> > -George > > On Fri, Feb 11, 2011 at 7:39 AM, Andre Przywara<andre.przywara@amd.com> wrote: >> Juergen Gross wrote: >>> >>> On 02/10/11 15:18, Andre Przywara wrote: >>>> >>>> Andre Przywara wrote: >>>>> >>>>> On 02/10/2011 07:42 AM, Juergen Gross wrote: >>>>>> >>>>>> On 02/09/11 15:21, Juergen Gross wrote: >>>>>>> >>>>>>> Andre, George, >>>>>>> >>>>>>> >>>>>>> What seems to be interesting: I think the problem did always occur >>>>>>> when >>>>>>> a new cpupool was created and the first cpu was moved to it. >>>>>>> >>>>>>> I think my previous assumption regarding the master_ticker was not >>>>>>> too bad. >>>>>>> I think somehow the master_ticker of the new cpupool is becoming >>>>>>> active >>>>>>> before the scheduler is really initialized properly. This could >>>>>>> happen, if >>>>>>> enough time is spent between alloc_pdata for the cpu to be moved and >>>>>>> the >>>>>>> critical section in schedule_cpu_switch(). >>>>>>> >>>>>>> The solution should be to activate the timers only if the scheduler is >>>>>>> ready for them. >>>>>>> >>>>>>> George, do you think the master_ticker should be stopped in >>>>>>> suspend_ticker >>>>>>> as well? I still see potential problems for entering deep C-States. >>>>>>> I think >>>>>>> I''ll prepare a patch which will keep the master_ticker active for the >>>>>>> C-State case and migrate it for the schedule_cpu_switch() case. >>>>>> >>>>>> Okay, here is a patch for this. It ran on my 4-core machine without any >>>>>> problems. >>>>>> Andre, could you give it a try? >>>>> >>>>> Did, but unfortunately it crashed as always. Tried twice and made sure >>>>> I booted the right kernel. Sorry. >>>>> The idea with the race between the timer and the state changing >>>>> sounded very appealing, actually that was suspicious to me from the >>>>> beginning. >>>>> >>>>> I will add some code to dump the state of all cpupools to the BUG_ON >>>>> to see in which situation we are when the bug triggers. 
>>>> >>>> OK, here is a first try of this, the patch iterates over all CPU pools >>>> and outputs some data if the BUG_ON >>>> ((sdom->weight * sdom->active_vcpu_count)> weight_left) condition >>>> triggers: >>>> (XEN) CPU pool #0: 1 domains (SMP Credit Scheduler), mask: fffffffc003f >>>> (XEN) CPU pool #1: 0 domains (SMP Credit Scheduler), mask: fc0 >>>> (XEN) CPU pool #2: 0 domains (SMP Credit Scheduler), mask: 1000 >>>> (XEN) Xen BUG at sched_credit.c:1010 >>>> .... >>>> The masks look proper (6 cores per node), the bug triggers when the >>>> first CPU is about to be(?) inserted. >>> >>> Sure? I''m missing the cpu with mask 2000. >>> I''ll try to reproduce the problem on a larger machine here (24 cores, 4 >>> numa >>> nodes). >>> Andre, can you give me your xen boot parameters? Which xen changeset are >>> you >>> running, and do you have any additional patches in use? >> >> The grub lines: >> kernel (hd1,0)/boot/xen-22858_debug_04.gz console=com1,vga com1=115200 >> module (hd1,0)/boot/vmlinuz-2.6.32.27_pvops console=tty0 >> console=ttyS0,115200 ro root=/dev/sdb1 xencons=hvc0 >> >> All of my experiments are use c/s 22858 as a base. >> If you use a AMD Magny-Cours box for your experiments (socket C32 or G34), >> you should add the following patch (removing the line) >> --- a/xen/arch/x86/traps.c >> +++ b/xen/arch/x86/traps.c >> @@ -803,7 +803,6 @@ static void pv_cpuid(struct cpu_user_regs *regs) >> __clear_bit(X86_FEATURE_SKINIT % 32,&c); >> __clear_bit(X86_FEATURE_WDT % 32,&c); >> __clear_bit(X86_FEATURE_LWP % 32,&c); >> - __clear_bit(X86_FEATURE_NODEID_MSR % 32,&c); >> __clear_bit(X86_FEATURE_TOPOEXT % 32,&c); >> break; >> case 5: /* MONITOR/MWAIT */ >> >> This is not necessary (in fact that reverts my patch c/s 22815), but raises >> the probability to trigger the bug, probably because it increases the >> pressure of the Dom0 scheduler. If you cannot trigger it with Dom0, try to >> create a guest with many VCPUs and squeeze it into a small CPU-pool. >> >> Good luck ;-) >> Andre. >> >> -- >> Andre Przywara >> AMD-OSRC (Dresden) >> Tel: x29712 >> >> >> _______________________________________________ >> Xen-devel mailing list >> Xen-devel@lists.xensource.com >> http://lists.xensource.com/xen-devel >> >> >> >> _______________________________________________ >> Xen-devel mailing list >> Xen-devel@lists.xensource.com >> http://lists.xensource.com/xen-devel-- Juergen Gross Principal Developer Operating Systems TSP ES&S SWE OS6 Telephone: +49 (0) 89 3222 2967 Fujitsu Technology Solutions e-mail: juergen.gross@ts.fujitsu.com Domagkstr. 28 Internet: ts.fujitsu.com D-80807 Muenchen Company details: ts.fujitsu.com/imprint.html _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Juergen Gross
2011-Feb-16 09:47 UTC
Re: [Xen-devel] Hypervisor crash(!) on xl cpupool-numa-split
Okay, I have some more data. I activated cpupool_dprintk() and included checks in sched_credit.c to test for weight inconsistencies. To reduce race possibilities I''ve added my patch to execute cpu assigning/unassigning always in a tasklet on the cpu to be moved. Here is the result: (XEN) cpupool_unassign_cpu(pool=0,cpu=6) (XEN) cpupool_unassign_cpu(pool=0,cpu=6) ret -16 (XEN) cpupool_unassign_cpu(pool=0,cpu=6) (XEN) cpupool_unassign_cpu(pool=0,cpu=6) ret -16 (XEN) cpupool_assign_cpu(pool=0,cpu=1) (XEN) cpupool_assign_cpu(pool=0,cpu=1) ffff83083fff74c0 (XEN) cpupool_assign_cpu(cpu=1) ret 0 (XEN) cpupool_assign_cpu(pool=1,cpu=4) (XEN) cpupool_assign_cpu(pool=1,cpu=4) ffff831002ad5e40 (XEN) cpupool_assign_cpu(cpu=4) ret 0 (XEN) cpu 4, weight 0,prv ffff831002ad5e40, dom 0: (XEN) sdom->weight: 256, sdom->active_vcpu_count: 1 (XEN) Xen BUG at sched_credit.c:570 (XEN) ----[ Xen-4.1.0-rc5-pre x86_64 debug=y Tainted: C ]---- (XEN) CPU: 4 (XEN) RIP: e008:[<ffff82c4801197d7>] csched_tick+0x186/0x37f (XEN) RFLAGS: 0000000000010086 CONTEXT: hypervisor (XEN) rax: 0000000000000000 rbx: ffff830839d3ec30 rcx: 0000000000000000 (XEN) rdx: ffff830839dcff18 rsi: 000000000000000a rdi: ffff82c4802542e8 (XEN) rbp: ffff830839dcfe38 rsp: ffff830839dcfde8 r8: 0000000000000004 (XEN) r9: ffff82c480213520 r10: 00000000fffffffc r11: 0000000000000001 (XEN) r12: 0000000000000004 r13: ffff830839d3ec40 r14: ffff831002ad5e40 (XEN) r15: ffff830839d66f90 cr0: 000000008005003b cr4: 00000000000026f0 (XEN) cr3: 0000001020a98000 cr2: 00007fc5e9b79d98 (XEN) ds: 0000 es: 0000 fs: 0000 gs: 0000 ss: e010 cs: e008 (XEN) Xen stack trace from rsp=ffff830839dcfde8: (XEN) ffff83083ffa3ba0 ffff831002ad5e40 0000000000000246 ffff830839d6c000 (XEN) 0000000000000000 ffff830839dd1100 0000000000000004 ffff82c480119651 (XEN) ffff831002b28018 ffff831002b28010 ffff830839dcfe68 ffff82c480126204 (XEN) 0000000000000002 ffff83083ffa3bb8 ffff830839dd1100 000000cae439ea7e (XEN) ffff830839dcfeb8 ffff82c480126539 00007fc5e9fa5b20 ffff830839dd1100 (XEN) ffff831002b28010 0000000000000004 0000000000000004 ffff82c4802b0880 (XEN) ffff830839dcff18 ffffffffffffffff ffff830839dcfef8 ffff82c480123647 (XEN) ffff830839dcfed8 ffff830077eee000 00007fc5e9b79d98 00007fc5e9fa5b20 (XEN) 0000000000000002 00007fff46826f20 ffff830839dcff08 ffff82c4801236c2 (XEN) 00007cf7c62300c7 ffff82c480206ad6 00007fff46826f20 0000000000000002 (XEN) 00007fc5e9fa5b20 00007fc5e9b79d98 00007fff46827260 00007fff46826f50 (XEN) 0000000000000246 0000000000000032 0000000000000000 00000000ffffffff (XEN) 0000000000000009 00007fc5e9d9de1a 0000000000000003 0000000000004848 (XEN) 00007fc5e9b7a000 0000010000000000 ffffffff800073f0 000000000000e033 (XEN) 0000000000000246 ffff880f97b51fc8 000000000000e02b 0000000000000000 (XEN) 0000000000000000 0000000000000000 0000000000000000 0000000000000004 (XEN) ffff830077eee000 00000043b9afd180 0000000000000000 (XEN) Xen call trace: (XEN) [<ffff82c4801197d7>] csched_tick+0x186/0x37f (XEN) [<ffff82c480126204>] execute_timer+0x4e/0x6c (XEN) [<ffff82c480126539>] timer_softirq_action+0xf6/0x239 (XEN) [<ffff82c480123647>] __do_softirq+0x88/0x99 (XEN) [<ffff82c4801236c2>] do_softirq+0x6a/0x7a (XEN) (XEN) (XEN) **************************************** (XEN) Panic on CPU 4: (XEN) Xen BUG at sched_credit.c:570 (XEN) **************************************** As you can see, a Dom0 vcpus is becoming active on a pool 1 cpu. The BUG_ON triggered in csched_acct() is a logical result of this. How this can happen I don''t know yet. Anyone any idea? I''ll keep searching... 
Juergen On 02/15/11 08:22, Juergen Gross wrote:> On 02/14/11 18:57, George Dunlap wrote: >> The good news is, I''ve managed to reproduce this on my local test >> hardware with 1x4x2 (1 socket, 4 cores, 2 threads per core) using the >> attached script. It''s time to go home now, but I should be able to >> dig something up tomorrow. >> >> To use the script: >> * Rename cpupool0 to "p0", and create an empty second pool, "p1" >> * You can modify elements by adding "arg=val" as arguments. >> * Arguments are: >> + dryrun={true,false} Do the work, but don''t actually execute any xl >> arguments. Default false. >> + left: Number commands to execute. Default 10. >> + maxcpus: highest numerical value for a cpu. Default 7 (i.e., 0-7 is >> 8 cpus). >> + verbose={true,false} Print what you''re doing. Default is true. >> >> The script sometimes attempts to remove the last cpu from cpupool0; in >> this case, libxl will print an error. If the script gets an error >> under that condition, it will ignore it; under any other condition, it >> will print diagnostic information. >> >> What finally crashed it for me was this command: >> # ./cpupool-test.sh verbose=false left=1000 > > Nice! > With your script I finally managed to get the error, too. On my box (2 > sockets > a 6 cores) I had to use > > ./cpupool-test.sh verbose=false left=10000 maxcpus=11 > > to trigger it. > Looking for more data now... > > > Juergen > >> >> -George >> >> On Fri, Feb 11, 2011 at 7:39 AM, Andre >> Przywara<andre.przywara@amd.com> wrote: >>> Juergen Gross wrote: >>>> >>>> On 02/10/11 15:18, Andre Przywara wrote: >>>>> >>>>> Andre Przywara wrote: >>>>>> >>>>>> On 02/10/2011 07:42 AM, Juergen Gross wrote: >>>>>>> >>>>>>> On 02/09/11 15:21, Juergen Gross wrote: >>>>>>>> >>>>>>>> Andre, George, >>>>>>>> >>>>>>>> >>>>>>>> What seems to be interesting: I think the problem did always occur >>>>>>>> when >>>>>>>> a new cpupool was created and the first cpu was moved to it. >>>>>>>> >>>>>>>> I think my previous assumption regarding the master_ticker was not >>>>>>>> too bad. >>>>>>>> I think somehow the master_ticker of the new cpupool is becoming >>>>>>>> active >>>>>>>> before the scheduler is really initialized properly. This could >>>>>>>> happen, if >>>>>>>> enough time is spent between alloc_pdata for the cpu to be moved >>>>>>>> and >>>>>>>> the >>>>>>>> critical section in schedule_cpu_switch(). >>>>>>>> >>>>>>>> The solution should be to activate the timers only if the >>>>>>>> scheduler is >>>>>>>> ready for them. >>>>>>>> >>>>>>>> George, do you think the master_ticker should be stopped in >>>>>>>> suspend_ticker >>>>>>>> as well? I still see potential problems for entering deep C-States. >>>>>>>> I think >>>>>>>> I''ll prepare a patch which will keep the master_ticker active >>>>>>>> for the >>>>>>>> C-State case and migrate it for the schedule_cpu_switch() case. >>>>>>> >>>>>>> Okay, here is a patch for this. It ran on my 4-core machine >>>>>>> without any >>>>>>> problems. >>>>>>> Andre, could you give it a try? >>>>>> >>>>>> Did, but unfortunately it crashed as always. Tried twice and made >>>>>> sure >>>>>> I booted the right kernel. Sorry. >>>>>> The idea with the race between the timer and the state changing >>>>>> sounded very appealing, actually that was suspicious to me from the >>>>>> beginning. >>>>>> >>>>>> I will add some code to dump the state of all cpupools to the BUG_ON >>>>>> to see in which situation we are when the bug triggers. 
>>>>> >>>>> OK, here is a first try of this, the patch iterates over all CPU pools >>>>> and outputs some data if the BUG_ON >>>>> ((sdom->weight * sdom->active_vcpu_count)> weight_left) condition >>>>> triggers: >>>>> (XEN) CPU pool #0: 1 domains (SMP Credit Scheduler), mask: >>>>> fffffffc003f >>>>> (XEN) CPU pool #1: 0 domains (SMP Credit Scheduler), mask: fc0 >>>>> (XEN) CPU pool #2: 0 domains (SMP Credit Scheduler), mask: 1000 >>>>> (XEN) Xen BUG at sched_credit.c:1010 >>>>> .... >>>>> The masks look proper (6 cores per node), the bug triggers when the >>>>> first CPU is about to be(?) inserted. >>>> >>>> Sure? I''m missing the cpu with mask 2000. >>>> I''ll try to reproduce the problem on a larger machine here (24 cores, 4 >>>> numa >>>> nodes). >>>> Andre, can you give me your xen boot parameters? Which xen changeset >>>> are >>>> you >>>> running, and do you have any additional patches in use? >>> >>> The grub lines: >>> kernel (hd1,0)/boot/xen-22858_debug_04.gz console=com1,vga com1=115200 >>> module (hd1,0)/boot/vmlinuz-2.6.32.27_pvops console=tty0 >>> console=ttyS0,115200 ro root=/dev/sdb1 xencons=hvc0 >>> >>> All of my experiments are use c/s 22858 as a base. >>> If you use a AMD Magny-Cours box for your experiments (socket C32 or >>> G34), >>> you should add the following patch (removing the line) >>> --- a/xen/arch/x86/traps.c >>> +++ b/xen/arch/x86/traps.c >>> @@ -803,7 +803,6 @@ static void pv_cpuid(struct cpu_user_regs *regs) >>> __clear_bit(X86_FEATURE_SKINIT % 32,&c); >>> __clear_bit(X86_FEATURE_WDT % 32,&c); >>> __clear_bit(X86_FEATURE_LWP % 32,&c); >>> - __clear_bit(X86_FEATURE_NODEID_MSR % 32,&c); >>> __clear_bit(X86_FEATURE_TOPOEXT % 32,&c); >>> break; >>> case 5: /* MONITOR/MWAIT */ >>> >>> This is not necessary (in fact that reverts my patch c/s 22815), but >>> raises >>> the probability to trigger the bug, probably because it increases the >>> pressure of the Dom0 scheduler. If you cannot trigger it with Dom0, >>> try to >>> create a guest with many VCPUs and squeeze it into a small CPU-pool. >>> >>> Good luck ;-) >>> Andre. >>> >>> -- >>> Andre Przywara >>> AMD-OSRC (Dresden) >>> Tel: x29712 >>> >>> >>> _______________________________________________ >>> Xen-devel mailing list >>> Xen-devel@lists.xensource.com >>> http://lists.xensource.com/xen-devel >>> >>> >>> >>> _______________________________________________ >>> Xen-devel mailing list >>> Xen-devel@lists.xensource.com >>> http://lists.xensource.com/xen-devel > >-- Juergen Gross Principal Developer Operating Systems TSP ES&S SWE OS6 Telephone: +49 (0) 89 3222 2967 Fujitsu Technology Solutions e-mail: juergen.gross@ts.fujitsu.com Domagkstr. 28 Internet: ts.fujitsu.com D-80807 Muenchen Company details: ts.fujitsu.com/imprint.html _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
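The consistency check Juergen mentions is not included in the mail; something along the following lines, run from csched_tick(), would produce the kind of output shown above. This is a sketch only: the helper name, the exact check and the printed fields are assumptions modelled on the quoted log.

/* Sketch: verify that the pool-private weight total matches the weights of
 * the domains currently active in this pool, and dump them if not. */
static void csched_check_weights(struct csched_private *prv, int cpu)
{
    struct csched_dom *sdom;
    unsigned int weight = 0;

    list_for_each_entry( sdom, &prv->active_sdom, active_sdom_elem )
    {
        printk("cpu %d, weight %u, prv %p, dom %d:\n",
               cpu, prv->weight, prv, sdom->dom->domain_id);
        printk("sdom->weight: %u, sdom->active_vcpu_count: %u\n",
               sdom->weight, sdom->active_vcpu_count);
        weight += sdom->weight * sdom->active_vcpu_count;
    }

    /* A domain showing up in this pool's active list while the pool's own
     * weight is still 0 is exactly the cross-pool situation in the log. */
    BUG_ON( weight != prv->weight );
}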
George Dunlap
2011-Feb-16 13:54 UTC
Re: [Xen-devel] Hypervisor crash(!) on xl cpupool-numa-split
Andre (and Juergen), can you try again with the attached patch? What the patch basically does is try to make "cpu_disable_scheduler()" do what it seems to say it does. :-) Namely, the various scheduler-related interrutps (both per-cpu ticks and the master tick) is a part of the scheduler, so disable them before doing anything, and don''t enable them until the cpu is really ready to go again. To be precise: * cpu_disable_scheduler() disables ticks * scheduler_cpu_switch() only enables ticks if adding a cpu to a pool, and does it after inserting the idle vcpu * Modify semantics, s.t., {alloc,free}_pdata() don''t actually start or stop tickers + Call tick_{resume,suspend} in cpu_{up,down}, respectively * Modify credit1''s tick_{suspend,resume} to handle the master ticker as well. With this patch (if dom0 doesn''t get wedged due to all 8 vcpus being on one pcpu), I can perform thousands of operations successfully. (NB this is not ready for application yet, I just wanted to check to see if it fixes Andre''s problem) -George On Wed, Feb 16, 2011 at 9:47 AM, Juergen Gross <juergen.gross@ts.fujitsu.com> wrote:> Okay, I have some more data. > > I activated cpupool_dprintk() and included checks in sched_credit.c to > test for weight inconsistencies. To reduce race possibilities I''ve added > my patch to execute cpu assigning/unassigning always in a tasklet on the > cpu to be moved. > > Here is the result: > > (XEN) cpupool_unassign_cpu(pool=0,cpu=6) > (XEN) cpupool_unassign_cpu(pool=0,cpu=6) ret -16 > (XEN) cpupool_unassign_cpu(pool=0,cpu=6) > (XEN) cpupool_unassign_cpu(pool=0,cpu=6) ret -16 > (XEN) cpupool_assign_cpu(pool=0,cpu=1) > (XEN) cpupool_assign_cpu(pool=0,cpu=1) ffff83083fff74c0 > (XEN) cpupool_assign_cpu(cpu=1) ret 0 > (XEN) cpupool_assign_cpu(pool=1,cpu=4) > (XEN) cpupool_assign_cpu(pool=1,cpu=4) ffff831002ad5e40 > (XEN) cpupool_assign_cpu(cpu=4) ret 0 > (XEN) cpu 4, weight 0,prv ffff831002ad5e40, dom 0: > (XEN) sdom->weight: 256, sdom->active_vcpu_count: 1 > (XEN) Xen BUG at sched_credit.c:570 > (XEN) ----[ Xen-4.1.0-rc5-pre x86_64 debug=y Tainted: C ]---- > (XEN) CPU: 4 > (XEN) RIP: e008:[<ffff82c4801197d7>] csched_tick+0x186/0x37f > (XEN) RFLAGS: 0000000000010086 CONTEXT: hypervisor > (XEN) rax: 0000000000000000 rbx: ffff830839d3ec30 rcx: 0000000000000000 > (XEN) rdx: ffff830839dcff18 rsi: 000000000000000a rdi: ffff82c4802542e8 > (XEN) rbp: ffff830839dcfe38 rsp: ffff830839dcfde8 r8: 0000000000000004 > (XEN) r9: ffff82c480213520 r10: 00000000fffffffc r11: 0000000000000001 > (XEN) r12: 0000000000000004 r13: ffff830839d3ec40 r14: ffff831002ad5e40 > (XEN) r15: ffff830839d66f90 cr0: 000000008005003b cr4: 00000000000026f0 > (XEN) cr3: 0000001020a98000 cr2: 00007fc5e9b79d98 > (XEN) ds: 0000 es: 0000 fs: 0000 gs: 0000 ss: e010 cs: e008 > (XEN) Xen stack trace from rsp=ffff830839dcfde8: > (XEN) ffff83083ffa3ba0 ffff831002ad5e40 0000000000000246 ffff830839d6c000 > (XEN) 0000000000000000 ffff830839dd1100 0000000000000004 ffff82c480119651 > (XEN) ffff831002b28018 ffff831002b28010 ffff830839dcfe68 ffff82c480126204 > (XEN) 0000000000000002 ffff83083ffa3bb8 ffff830839dd1100 000000cae439ea7e > (XEN) ffff830839dcfeb8 ffff82c480126539 00007fc5e9fa5b20 ffff830839dd1100 > (XEN) ffff831002b28010 0000000000000004 0000000000000004 ffff82c4802b0880 > (XEN) ffff830839dcff18 ffffffffffffffff ffff830839dcfef8 ffff82c480123647 > (XEN) ffff830839dcfed8 ffff830077eee000 00007fc5e9b79d98 00007fc5e9fa5b20 > (XEN) 0000000000000002 00007fff46826f20 ffff830839dcff08 ffff82c4801236c2 > (XEN) 00007cf7c62300c7 
ffff82c480206ad6 00007fff46826f20 0000000000000002 > (XEN) 00007fc5e9fa5b20 00007fc5e9b79d98 00007fff46827260 00007fff46826f50 > (XEN) 0000000000000246 0000000000000032 0000000000000000 00000000ffffffff > (XEN) 0000000000000009 00007fc5e9d9de1a 0000000000000003 0000000000004848 > (XEN) 00007fc5e9b7a000 0000010000000000 ffffffff800073f0 000000000000e033 > (XEN) 0000000000000246 ffff880f97b51fc8 000000000000e02b 0000000000000000 > (XEN) 0000000000000000 0000000000000000 0000000000000000 0000000000000004 > (XEN) ffff830077eee000 00000043b9afd180 0000000000000000 > (XEN) Xen call trace: > (XEN) [<ffff82c4801197d7>] csched_tick+0x186/0x37f > (XEN) [<ffff82c480126204>] execute_timer+0x4e/0x6c > (XEN) [<ffff82c480126539>] timer_softirq_action+0xf6/0x239 > (XEN) [<ffff82c480123647>] __do_softirq+0x88/0x99 > (XEN) [<ffff82c4801236c2>] do_softirq+0x6a/0x7a > (XEN) > (XEN) > (XEN) **************************************** > (XEN) Panic on CPU 4: > (XEN) Xen BUG at sched_credit.c:570 > (XEN) **************************************** > > As you can see, a Dom0 vcpus is becoming active on a pool 1 cpu. The BUG_ON > triggered in csched_acct() is a logical result of this. > > How this can happen I don''t know yet. > Anyone any idea? I''ll keep searching... > > > Juergen > > On 02/15/11 08:22, Juergen Gross wrote: >> >> On 02/14/11 18:57, George Dunlap wrote: >>> >>> The good news is, I''ve managed to reproduce this on my local test >>> hardware with 1x4x2 (1 socket, 4 cores, 2 threads per core) using the >>> attached script. It''s time to go home now, but I should be able to >>> dig something up tomorrow. >>> >>> To use the script: >>> * Rename cpupool0 to "p0", and create an empty second pool, "p1" >>> * You can modify elements by adding "arg=val" as arguments. >>> * Arguments are: >>> + dryrun={true,false} Do the work, but don''t actually execute any xl >>> arguments. Default false. >>> + left: Number commands to execute. Default 10. >>> + maxcpus: highest numerical value for a cpu. Default 7 (i.e., 0-7 is >>> 8 cpus). >>> + verbose={true,false} Print what you''re doing. Default is true. >>> >>> The script sometimes attempts to remove the last cpu from cpupool0; in >>> this case, libxl will print an error. If the script gets an error >>> under that condition, it will ignore it; under any other condition, it >>> will print diagnostic information. >>> >>> What finally crashed it for me was this command: >>> # ./cpupool-test.sh verbose=false left=1000 >> >> Nice! >> With your script I finally managed to get the error, too. On my box (2 >> sockets >> a 6 cores) I had to use >> >> ./cpupool-test.sh verbose=false left=10000 maxcpus=11 >> >> to trigger it. >> Looking for more data now... >> >> >> Juergen >> >>> >>> -George >>> >>> On Fri, Feb 11, 2011 at 7:39 AM, Andre >>> Przywara<andre.przywara@amd.com> wrote: >>>> >>>> Juergen Gross wrote: >>>>> >>>>> On 02/10/11 15:18, Andre Przywara wrote: >>>>>> >>>>>> Andre Przywara wrote: >>>>>>> >>>>>>> On 02/10/2011 07:42 AM, Juergen Gross wrote: >>>>>>>> >>>>>>>> On 02/09/11 15:21, Juergen Gross wrote: >>>>>>>>> >>>>>>>>> Andre, George, >>>>>>>>> >>>>>>>>> >>>>>>>>> What seems to be interesting: I think the problem did always occur >>>>>>>>> when >>>>>>>>> a new cpupool was created and the first cpu was moved to it. >>>>>>>>> >>>>>>>>> I think my previous assumption regarding the master_ticker was not >>>>>>>>> too bad. 
>>>>>>>>> I think somehow the master_ticker of the new cpupool is becoming >>>>>>>>> active >>>>>>>>> before the scheduler is really initialized properly. This could >>>>>>>>> happen, if >>>>>>>>> enough time is spent between alloc_pdata for the cpu to be moved >>>>>>>>> and >>>>>>>>> the >>>>>>>>> critical section in schedule_cpu_switch(). >>>>>>>>> >>>>>>>>> The solution should be to activate the timers only if the >>>>>>>>> scheduler is >>>>>>>>> ready for them. >>>>>>>>> >>>>>>>>> George, do you think the master_ticker should be stopped in >>>>>>>>> suspend_ticker >>>>>>>>> as well? I still see potential problems for entering deep C-States. >>>>>>>>> I think >>>>>>>>> I''ll prepare a patch which will keep the master_ticker active >>>>>>>>> for the >>>>>>>>> C-State case and migrate it for the schedule_cpu_switch() case. >>>>>>>> >>>>>>>> Okay, here is a patch for this. It ran on my 4-core machine >>>>>>>> without any >>>>>>>> problems. >>>>>>>> Andre, could you give it a try? >>>>>>> >>>>>>> Did, but unfortunately it crashed as always. Tried twice and made >>>>>>> sure >>>>>>> I booted the right kernel. Sorry. >>>>>>> The idea with the race between the timer and the state changing >>>>>>> sounded very appealing, actually that was suspicious to me from the >>>>>>> beginning. >>>>>>> >>>>>>> I will add some code to dump the state of all cpupools to the BUG_ON >>>>>>> to see in which situation we are when the bug triggers. >>>>>> >>>>>> OK, here is a first try of this, the patch iterates over all CPU pools >>>>>> and outputs some data if the BUG_ON >>>>>> ((sdom->weight * sdom->active_vcpu_count)> weight_left) condition >>>>>> triggers: >>>>>> (XEN) CPU pool #0: 1 domains (SMP Credit Scheduler), mask: >>>>>> fffffffc003f >>>>>> (XEN) CPU pool #1: 0 domains (SMP Credit Scheduler), mask: fc0 >>>>>> (XEN) CPU pool #2: 0 domains (SMP Credit Scheduler), mask: 1000 >>>>>> (XEN) Xen BUG at sched_credit.c:1010 >>>>>> .... >>>>>> The masks look proper (6 cores per node), the bug triggers when the >>>>>> first CPU is about to be(?) inserted. >>>>> >>>>> Sure? I''m missing the cpu with mask 2000. >>>>> I''ll try to reproduce the problem on a larger machine here (24 cores, 4 >>>>> numa >>>>> nodes). >>>>> Andre, can you give me your xen boot parameters? Which xen changeset >>>>> are >>>>> you >>>>> running, and do you have any additional patches in use? >>>> >>>> The grub lines: >>>> kernel (hd1,0)/boot/xen-22858_debug_04.gz console=com1,vga com1=115200 >>>> module (hd1,0)/boot/vmlinuz-2.6.32.27_pvops console=tty0 >>>> console=ttyS0,115200 ro root=/dev/sdb1 xencons=hvc0 >>>> >>>> All of my experiments are use c/s 22858 as a base. >>>> If you use a AMD Magny-Cours box for your experiments (socket C32 or >>>> G34), >>>> you should add the following patch (removing the line) >>>> --- a/xen/arch/x86/traps.c >>>> +++ b/xen/arch/x86/traps.c >>>> @@ -803,7 +803,6 @@ static void pv_cpuid(struct cpu_user_regs *regs) >>>> __clear_bit(X86_FEATURE_SKINIT % 32,&c); >>>> __clear_bit(X86_FEATURE_WDT % 32,&c); >>>> __clear_bit(X86_FEATURE_LWP % 32,&c); >>>> - __clear_bit(X86_FEATURE_NODEID_MSR % 32,&c); >>>> __clear_bit(X86_FEATURE_TOPOEXT % 32,&c); >>>> break; >>>> case 5: /* MONITOR/MWAIT */ >>>> >>>> This is not necessary (in fact that reverts my patch c/s 22815), but >>>> raises >>>> the probability to trigger the bug, probably because it increases the >>>> pressure of the Dom0 scheduler. If you cannot trigger it with Dom0, >>>> try to >>>> create a guest with many VCPUs and squeeze it into a small CPU-pool. 
>>>> >>>> Good luck ;-) >>>> Andre. >>>> >>>> -- >>>> Andre Przywara >>>> AMD-OSRC (Dresden) >>>> Tel: x29712 >>>> >>>> >>>> _______________________________________________ >>>> Xen-devel mailing list >>>> Xen-devel@lists.xensource.com >>>> http://lists.xensource.com/xen-devel >>>> >>>> >>>> >>>> _______________________________________________ >>>> Xen-devel mailing list >>>> Xen-devel@lists.xensource.com >>>> http://lists.xensource.com/xen-devel >> >> > > > -- > Juergen Gross Principal Developer Operating Systems > TSP ES&S SWE OS6 Telephone: +49 (0) 89 3222 2967 > Fujitsu Technology Solutions e-mail: > juergen.gross@ts.fujitsu.com > Domagkstr. 28 Internet: ts.fujitsu.com > D-80807 Muenchen Company details: > ts.fujitsu.com/imprint.html > > _______________________________________________ > Xen-devel mailing list > Xen-devel@lists.xensource.com > http://lists.xensource.com/xen-devel >_______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
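As a rough illustration of the last bullet point above (this is not the attached patch): credit's tick_suspend would stop the pool's master ticker along with the per-cpu ticker when the last cpu leaves, with tick_resume doing the reverse. Names such as prv->ncpus and prv->master_ticker are assumptions about the credit scheduler's private data, and the deep C-state concern raised earlier in the thread applies, since tick_suspend is used on that path too.

/* Sketch only: make the credit scheduler's tick_suspend cover the per-pool
 * master (accounting) ticker in addition to the per-cpu ticker, so that
 * cpu_disable_scheduler() can quiesce all scheduler timers before touching
 * any scheduler state. */
static void csched_tick_suspend(const struct scheduler *ops, unsigned int cpu)
{
    struct csched_private *prv = CSCHED_PRIV(ops);
    struct csched_pcpu *spc = CSCHED_PCPU(cpu);

    stop_timer(&spc->ticker);

    /* Last cpu leaving this pool: park the accounting timer as well.
     * (For the C-state path the master ticker would have to stay alive,
     * or be handled separately, as discussed earlier in the thread.) */
    if ( prv->ncpus == 1 )
        stop_timer(&prv->master_ticker);
}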
Juergen Gross
2011-Feb-16 14:11 UTC
Re: [Xen-devel] Hypervisor crash(!) on xl cpupool-numa-split
On 02/16/11 14:54, George Dunlap wrote:> Andre (and Juergen), can you try again with the attached patch? > > What the patch basically does is try to make "cpu_disable_scheduler()" > do what it seems to say it does. :-) Namely, the various > scheduler-related interrutps (both per-cpu ticks and the master tick) > is a part of the scheduler, so disable them before doing anything, and > don''t enable them until the cpu is really ready to go again. > > To be precise: > * cpu_disable_scheduler() disables ticks > * scheduler_cpu_switch() only enables ticks if adding a cpu to a pool, > and does it after inserting the idle vcpu > * Modify semantics, s.t., {alloc,free}_pdata() don''t actually start or > stop tickers > + Call tick_{resume,suspend} in cpu_{up,down}, respectivelyI tried this before :-) It didn''t work for Andre, but may be there were some bits missing.> * Modify credit1''s tick_{suspend,resume} to handle the master ticker as well. > > With this patch (if dom0 doesn''t get wedged due to all 8 vcpus being > on one pcpu), I can perform thousands of operations successfully.Nice. I''ll try later. In the moment I''m testing another patch (attached for review, if you like). I think I''ve identified two possible races. Juergen> > (NB this is not ready for application yet, I just wanted to check to > see if it fixes Andre''s problem) > > -George > > On Wed, Feb 16, 2011 at 9:47 AM, Juergen Gross > <juergen.gross@ts.fujitsu.com> wrote: >> Okay, I have some more data. >> >> I activated cpupool_dprintk() and included checks in sched_credit.c to >> test for weight inconsistencies. To reduce race possibilities I''ve added >> my patch to execute cpu assigning/unassigning always in a tasklet on the >> cpu to be moved. >> >> Here is the result: >> >> (XEN) cpupool_unassign_cpu(pool=0,cpu=6) >> (XEN) cpupool_unassign_cpu(pool=0,cpu=6) ret -16 >> (XEN) cpupool_unassign_cpu(pool=0,cpu=6) >> (XEN) cpupool_unassign_cpu(pool=0,cpu=6) ret -16 >> (XEN) cpupool_assign_cpu(pool=0,cpu=1) >> (XEN) cpupool_assign_cpu(pool=0,cpu=1) ffff83083fff74c0 >> (XEN) cpupool_assign_cpu(cpu=1) ret 0 >> (XEN) cpupool_assign_cpu(pool=1,cpu=4) >> (XEN) cpupool_assign_cpu(pool=1,cpu=4) ffff831002ad5e40 >> (XEN) cpupool_assign_cpu(cpu=4) ret 0 >> (XEN) cpu 4, weight 0,prv ffff831002ad5e40, dom 0: >> (XEN) sdom->weight: 256, sdom->active_vcpu_count: 1 >> (XEN) Xen BUG at sched_credit.c:570 >> (XEN) ----[ Xen-4.1.0-rc5-pre x86_64 debug=y Tainted: C ]---- >> (XEN) CPU: 4 >> (XEN) RIP: e008:[<ffff82c4801197d7>] csched_tick+0x186/0x37f >> (XEN) RFLAGS: 0000000000010086 CONTEXT: hypervisor >> (XEN) rax: 0000000000000000 rbx: ffff830839d3ec30 rcx: 0000000000000000 >> (XEN) rdx: ffff830839dcff18 rsi: 000000000000000a rdi: ffff82c4802542e8 >> (XEN) rbp: ffff830839dcfe38 rsp: ffff830839dcfde8 r8: 0000000000000004 >> (XEN) r9: ffff82c480213520 r10: 00000000fffffffc r11: 0000000000000001 >> (XEN) r12: 0000000000000004 r13: ffff830839d3ec40 r14: ffff831002ad5e40 >> (XEN) r15: ffff830839d66f90 cr0: 000000008005003b cr4: 00000000000026f0 >> (XEN) cr3: 0000001020a98000 cr2: 00007fc5e9b79d98 >> (XEN) ds: 0000 es: 0000 fs: 0000 gs: 0000 ss: e010 cs: e008 >> (XEN) Xen stack trace from rsp=ffff830839dcfde8: >> (XEN) ffff83083ffa3ba0 ffff831002ad5e40 0000000000000246 ffff830839d6c000 >> (XEN) 0000000000000000 ffff830839dd1100 0000000000000004 ffff82c480119651 >> (XEN) ffff831002b28018 ffff831002b28010 ffff830839dcfe68 ffff82c480126204 >> (XEN) 0000000000000002 ffff83083ffa3bb8 ffff830839dd1100 000000cae439ea7e >> (XEN) ffff830839dcfeb8 ffff82c480126539 
00007fc5e9fa5b20 ffff830839dd1100 >> (XEN) ffff831002b28010 0000000000000004 0000000000000004 ffff82c4802b0880 >> (XEN) ffff830839dcff18 ffffffffffffffff ffff830839dcfef8 ffff82c480123647 >> (XEN) ffff830839dcfed8 ffff830077eee000 00007fc5e9b79d98 00007fc5e9fa5b20 >> (XEN) 0000000000000002 00007fff46826f20 ffff830839dcff08 ffff82c4801236c2 >> (XEN) 00007cf7c62300c7 ffff82c480206ad6 00007fff46826f20 0000000000000002 >> (XEN) 00007fc5e9fa5b20 00007fc5e9b79d98 00007fff46827260 00007fff46826f50 >> (XEN) 0000000000000246 0000000000000032 0000000000000000 00000000ffffffff >> (XEN) 0000000000000009 00007fc5e9d9de1a 0000000000000003 0000000000004848 >> (XEN) 00007fc5e9b7a000 0000010000000000 ffffffff800073f0 000000000000e033 >> (XEN) 0000000000000246 ffff880f97b51fc8 000000000000e02b 0000000000000000 >> (XEN) 0000000000000000 0000000000000000 0000000000000000 0000000000000004 >> (XEN) ffff830077eee000 00000043b9afd180 0000000000000000 >> (XEN) Xen call trace: >> (XEN) [<ffff82c4801197d7>] csched_tick+0x186/0x37f >> (XEN) [<ffff82c480126204>] execute_timer+0x4e/0x6c >> (XEN) [<ffff82c480126539>] timer_softirq_action+0xf6/0x239 >> (XEN) [<ffff82c480123647>] __do_softirq+0x88/0x99 >> (XEN) [<ffff82c4801236c2>] do_softirq+0x6a/0x7a >> (XEN) >> (XEN) >> (XEN) **************************************** >> (XEN) Panic on CPU 4: >> (XEN) Xen BUG at sched_credit.c:570 >> (XEN) **************************************** >> >> As you can see, a Dom0 vcpus is becoming active on a pool 1 cpu. The BUG_ON >> triggered in csched_acct() is a logical result of this. >> >> How this can happen I don''t know yet. >> Anyone any idea? I''ll keep searching... >> >> >> Juergen >> >> On 02/15/11 08:22, Juergen Gross wrote: >>> >>> On 02/14/11 18:57, George Dunlap wrote: >>>> >>>> The good news is, I''ve managed to reproduce this on my local test >>>> hardware with 1x4x2 (1 socket, 4 cores, 2 threads per core) using the >>>> attached script. It''s time to go home now, but I should be able to >>>> dig something up tomorrow. >>>> >>>> To use the script: >>>> * Rename cpupool0 to "p0", and create an empty second pool, "p1" >>>> * You can modify elements by adding "arg=val" as arguments. >>>> * Arguments are: >>>> + dryrun={true,false} Do the work, but don''t actually execute any xl >>>> arguments. Default false. >>>> + left: Number commands to execute. Default 10. >>>> + maxcpus: highest numerical value for a cpu. Default 7 (i.e., 0-7 is >>>> 8 cpus). >>>> + verbose={true,false} Print what you''re doing. Default is true. >>>> >>>> The script sometimes attempts to remove the last cpu from cpupool0; in >>>> this case, libxl will print an error. If the script gets an error >>>> under that condition, it will ignore it; under any other condition, it >>>> will print diagnostic information. >>>> >>>> What finally crashed it for me was this command: >>>> # ./cpupool-test.sh verbose=false left=1000 >>> >>> Nice! >>> With your script I finally managed to get the error, too. On my box (2 >>> sockets >>> a 6 cores) I had to use >>> >>> ./cpupool-test.sh verbose=false left=10000 maxcpus=11 >>> >>> to trigger it. >>> Looking for more data now... 
>>> >>> >>> Juergen >>> >>>> >>>> -George >>>> >>>> On Fri, Feb 11, 2011 at 7:39 AM, Andre >>>> Przywara<andre.przywara@amd.com> wrote: >>>>> >>>>> Juergen Gross wrote: >>>>>> >>>>>> On 02/10/11 15:18, Andre Przywara wrote: >>>>>>> >>>>>>> Andre Przywara wrote: >>>>>>>> >>>>>>>> On 02/10/2011 07:42 AM, Juergen Gross wrote: >>>>>>>>> >>>>>>>>> On 02/09/11 15:21, Juergen Gross wrote: >>>>>>>>>> >>>>>>>>>> Andre, George, >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> What seems to be interesting: I think the problem did always occur >>>>>>>>>> when >>>>>>>>>> a new cpupool was created and the first cpu was moved to it. >>>>>>>>>> >>>>>>>>>> I think my previous assumption regarding the master_ticker was not >>>>>>>>>> too bad. >>>>>>>>>> I think somehow the master_ticker of the new cpupool is becoming >>>>>>>>>> active >>>>>>>>>> before the scheduler is really initialized properly. This could >>>>>>>>>> happen, if >>>>>>>>>> enough time is spent between alloc_pdata for the cpu to be moved >>>>>>>>>> and >>>>>>>>>> the >>>>>>>>>> critical section in schedule_cpu_switch(). >>>>>>>>>> >>>>>>>>>> The solution should be to activate the timers only if the >>>>>>>>>> scheduler is >>>>>>>>>> ready for them. >>>>>>>>>> >>>>>>>>>> George, do you think the master_ticker should be stopped in >>>>>>>>>> suspend_ticker >>>>>>>>>> as well? I still see potential problems for entering deep C-States. >>>>>>>>>> I think >>>>>>>>>> I''ll prepare a patch which will keep the master_ticker active >>>>>>>>>> for the >>>>>>>>>> C-State case and migrate it for the schedule_cpu_switch() case. >>>>>>>>> >>>>>>>>> Okay, here is a patch for this. It ran on my 4-core machine >>>>>>>>> without any >>>>>>>>> problems. >>>>>>>>> Andre, could you give it a try? >>>>>>>> >>>>>>>> Did, but unfortunately it crashed as always. Tried twice and made >>>>>>>> sure >>>>>>>> I booted the right kernel. Sorry. >>>>>>>> The idea with the race between the timer and the state changing >>>>>>>> sounded very appealing, actually that was suspicious to me from the >>>>>>>> beginning. >>>>>>>> >>>>>>>> I will add some code to dump the state of all cpupools to the BUG_ON >>>>>>>> to see in which situation we are when the bug triggers. >>>>>>> >>>>>>> OK, here is a first try of this, the patch iterates over all CPU pools >>>>>>> and outputs some data if the BUG_ON >>>>>>> ((sdom->weight * sdom->active_vcpu_count)> weight_left) condition >>>>>>> triggers: >>>>>>> (XEN) CPU pool #0: 1 domains (SMP Credit Scheduler), mask: >>>>>>> fffffffc003f >>>>>>> (XEN) CPU pool #1: 0 domains (SMP Credit Scheduler), mask: fc0 >>>>>>> (XEN) CPU pool #2: 0 domains (SMP Credit Scheduler), mask: 1000 >>>>>>> (XEN) Xen BUG at sched_credit.c:1010 >>>>>>> .... >>>>>>> The masks look proper (6 cores per node), the bug triggers when the >>>>>>> first CPU is about to be(?) inserted. >>>>>> >>>>>> Sure? I''m missing the cpu with mask 2000. >>>>>> I''ll try to reproduce the problem on a larger machine here (24 cores, 4 >>>>>> numa >>>>>> nodes). >>>>>> Andre, can you give me your xen boot parameters? Which xen changeset >>>>>> are >>>>>> you >>>>>> running, and do you have any additional patches in use? >>>>> >>>>> The grub lines: >>>>> kernel (hd1,0)/boot/xen-22858_debug_04.gz console=com1,vga com1=115200 >>>>> module (hd1,0)/boot/vmlinuz-2.6.32.27_pvops console=tty0 >>>>> console=ttyS0,115200 ro root=/dev/sdb1 xencons=hvc0 >>>>> >>>>> All of my experiments are use c/s 22858 as a base. 
>>>>> If you use a AMD Magny-Cours box for your experiments (socket C32 or >>>>> G34), >>>>> you should add the following patch (removing the line) >>>>> --- a/xen/arch/x86/traps.c >>>>> +++ b/xen/arch/x86/traps.c >>>>> @@ -803,7 +803,6 @@ static void pv_cpuid(struct cpu_user_regs *regs) >>>>> __clear_bit(X86_FEATURE_SKINIT % 32,&c); >>>>> __clear_bit(X86_FEATURE_WDT % 32,&c); >>>>> __clear_bit(X86_FEATURE_LWP % 32,&c); >>>>> - __clear_bit(X86_FEATURE_NODEID_MSR % 32,&c); >>>>> __clear_bit(X86_FEATURE_TOPOEXT % 32,&c); >>>>> break; >>>>> case 5: /* MONITOR/MWAIT */ >>>>> >>>>> This is not necessary (in fact that reverts my patch c/s 22815), but >>>>> raises >>>>> the probability to trigger the bug, probably because it increases the >>>>> pressure of the Dom0 scheduler. If you cannot trigger it with Dom0, >>>>> try to >>>>> create a guest with many VCPUs and squeeze it into a small CPU-pool. >>>>> >>>>> Good luck ;-) >>>>> Andre. >>>>> >>>>> -- >>>>> Andre Przywara >>>>> AMD-OSRC (Dresden) >>>>> Tel: x29712 >>>>> >>>>> >>>>> _______________________________________________ >>>>> Xen-devel mailing list >>>>> Xen-devel@lists.xensource.com >>>>> http://lists.xensource.com/xen-devel >>>>> >>>>> >>>>> >>>>> _______________________________________________ >>>>> Xen-devel mailing list >>>>> Xen-devel@lists.xensource.com >>>>> http://lists.xensource.com/xen-devel >>> >>> >> >> >> -- >> Juergen Gross Principal Developer Operating Systems >> TSP ES&S SWE OS6 Telephone: +49 (0) 89 3222 2967 >> Fujitsu Technology Solutions e-mail: >> juergen.gross@ts.fujitsu.com >> Domagkstr. 28 Internet: ts.fujitsu.com >> D-80807 Muenchen Company details: >> ts.fujitsu.com/imprint.html >> >> _______________________________________________ >> Xen-devel mailing list >> Xen-devel@lists.xensource.com >> http://lists.xensource.com/xen-devel >> >> >> >> _______________________________________________ >> Xen-devel mailing list >> Xen-devel@lists.xensource.com >> http://lists.xensource.com/xen-devel-- Juergen Gross Principal Developer Operating Systems TSP ES&S SWE OS6 Telephone: +49 (0) 89 3222 2967 Fujitsu Technology Solutions e-mail: juergen.gross@ts.fujitsu.com Domagkstr. 28 Internet: ts.fujitsu.com D-80807 Muenchen Company details: ts.fujitsu.com/imprint.html _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
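The inconsistency Juergen reports above (prv weight 0, yet a domain with weight 256 and one
active vcpu on that cpu) is exactly what credit1's accounting check guards against: every vcpu
that becomes active adds its domain's weight to the owning pool's private total, and
csched_acct() later relies on no single domain contributing more than that total (the quoted
condition sdom->weight * sdom->active_vcpu_count > weight_left). Below is a minimal,
self-contained toy model of that bookkeeping -- invented structure and function names, not the
real sched_credit.c -- showing how a vcpu accounted in one pool but checked against another
pool's private data trips the check:

#include <assert.h>
#include <stdio.h>

/* Toy stand-ins for csched_private / csched_dom; only the fields that
 * appear in the trace above are modelled. */
struct pool_priv { unsigned int total_weight; };          /* prv->weight */
struct dom_acct  { unsigned int weight, active_vcpus; };  /* sdom        */

static void vcpu_goes_active(struct pool_priv *pool, struct dom_acct *dom)
{
    /* credit1 makes the pool total "per-vcpu": each vcpu of the domain
     * that becomes active adds the domain weight to the pool total. */
    dom->active_vcpus++;
    pool->total_weight += dom->weight;
}

static void check_invariant(const struct pool_priv *pool,
                            const struct dom_acct *dom)
{
    /* Simplified analogue of the quoted BUG_ON condition: one domain
     * may never account for more weight than the whole pool has. */
    assert(dom->weight * dom->active_vcpus <= pool->total_weight);
}

int main(void)
{
    struct pool_priv pool0 = { 0 }, pool1 = { 0 };
    struct dom_acct  dom0  = { .weight = 256, .active_vcpus = 0 };

    vcpu_goes_active(&pool0, &dom0);    /* accounted in pool 0          */
    check_invariant(&pool0, &dom0);     /* fine: 256 * 1 <= 256         */

    /* The reported situation: the same vcpu shows up on a cpu whose
     * private data belongs to pool 1 (total_weight still 0), so
     * 256 * 1 > 0 and the assertion fires -- the analogue of the BUG_ON. */
    check_invariant(&pool1, &dom0);

    puts("not reached");
    return 0;
}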
Juergen Gross
2011-Feb-16 14:28 UTC
Re: [Xen-devel] Hypervisor crash(!) on xl cpupool-numa-split
On 02/16/11 15:11, Juergen Gross wrote:
> On 02/16/11 14:54, George Dunlap wrote:
>> Andre (and Juergen), can you try again with the attached patch?
>>
>> What the patch basically does is try to make "cpu_disable_scheduler()"
>> do what it seems to say it does. :-) Namely, the various
>> scheduler-related interrupts (both per-cpu ticks and the master tick)
>> are a part of the scheduler, so disable them before doing anything, and
>> don't enable them until the cpu is really ready to go again.
>>
>> To be precise:
>> * cpu_disable_scheduler() disables ticks
>> * scheduler_cpu_switch() only enables ticks if adding a cpu to a pool,
>> and does it after inserting the idle vcpu
>> * Modify semantics, s.t., {alloc,free}_pdata() don't actually start or
>> stop tickers
>> + Call tick_{resume,suspend} in cpu_{up,down}, respectively
>
> I tried this before :-)
> It didn't work for Andre, but maybe there were some bits missing.
>
>> * Modify credit1's tick_{suspend,resume} to handle the master ticker
>> as well.
>>
>> With this patch (if dom0 doesn't get wedged due to all 8 vcpus being
>> on one pcpu), I can perform thousands of operations successfully.
>
> Nice. I'll try later. At the moment I'm testing another patch (attached
> for review, if you like). I think I've identified two possible races.

My patch works for me. I think I have to rework the locking for credit1,
but that shouldn't be too hard.
My machine survived 10000 iterations of your script with additional
consistency checks in the scheduler. Without my patch the machine crashed
after less than 500 iterations.


Juergen

--
Juergen Gross                  Principal Developer Operating Systems
TSP ES&S SWE OS6               Telephone: +49 (0) 89 3222 2967
Fujitsu Technology Solutions   e-mail: juergen.gross@ts.fujitsu.com
Domagkstr. 28                  Internet: ts.fujitsu.com
D-80807 Muenchen               Company details: ts.fujitsu.com/imprint.html

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel
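The bullet list quoted above amounts to an ordering rule: a cpu's scheduler ticks may only run
while the cpu's per-pool scheduler data is fully set up, so ticks are suspended before a cpu is
torn down and resumed only after the new pool's data (and idle vcpu) are in place. A small
standalone sketch of that rule -- plain C with invented names, sequential instead of
timer-driven, so only the ordering is modelled, not the real schedule_cpu_switch():

#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Invented stand-ins: one pool owns per-cpu scheduler data, and a tick
 * that is only allowed to run while that data is valid. */
struct pcpu_sched_data { int pool_id; };

struct cpu_state {
    struct pcpu_sched_data *pdata; /* alloc_pdata()/free_pdata() result  */
    int tick_enabled;              /* tick_resume()/tick_suspend() state */
};

static void tick_handler(struct cpu_state *cpu)
{
    /* The real csched_tick() walks per-cpu/per-pool structures; a tick
     * that fires while pdata is gone or half-switched is exactly the
     * kind of inconsistency the BUG_ONs in this thread catch. */
    assert(cpu->pdata != NULL);
    printf("tick on pool %d\n", cpu->pdata->pool_id);
}

/* Removing a cpu from a pool: suspend the tick *first*. */
static void cpu_leave_pool(struct cpu_state *cpu)
{
    cpu->tick_enabled = 0;   /* tick_suspend() */
    cpu->pdata = NULL;       /* free_pdata()   */
}

/* Adding a cpu to a pool: resume the tick *last*. */
static void cpu_join_pool(struct cpu_state *cpu, struct pcpu_sched_data *pd)
{
    cpu->pdata = pd;         /* alloc_pdata(), insert idle vcpu, ... */
    cpu->tick_enabled = 1;   /* tick_resume() */
}

int main(void)
{
    struct pcpu_sched_data pool1_data = { .pool_id = 1 };
    struct cpu_state cpu = { .pdata = NULL, .tick_enabled = 0 };

    cpu_join_pool(&cpu, &pool1_data);
    if ( cpu.tick_enabled )
        tick_handler(&cpu);  /* safe: pdata was set before the resume  */

    cpu_leave_pool(&cpu);
    if ( cpu.tick_enabled )
        tick_handler(&cpu);  /* never runs: tick was suspended first   */

    return 0;
}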
André Przywara
2011-Feb-17 00:05 UTC
Re: [Xen-devel] Hypervisor crash(!) on xl cpupool-numa-split
Am 16.02.2011 15:11, schrieb Juergen Gross:> On 02/16/11 14:54, George Dunlap wrote: >> Andre (and Juergen), can you try again with the attached patch?George, Juergen, thanks for all your work on this! I will try the patch as soon as I am back in the office today afternoon. Regards, Andre.>> >> What the patch basically does is try to make "cpu_disable_scheduler()" >> do what it seems to say it does. :-) Namely, the various >> scheduler-related interrutps (both per-cpu ticks and the master tick) >> is a part of the scheduler, so disable them before doing anything, and >> don''t enable them until the cpu is really ready to go again. >> >> To be precise: >> * cpu_disable_scheduler() disables ticks >> * scheduler_cpu_switch() only enables ticks if adding a cpu to a pool, >> and does it after inserting the idle vcpu >> * Modify semantics, s.t., {alloc,free}_pdata() don''t actually start or >> stop tickers >> + Call tick_{resume,suspend} in cpu_{up,down}, respectively > > I tried this before :-) > It didn''t work for Andre, but may be there were some bits missing. > >> * Modify credit1''s tick_{suspend,resume} to handle the master ticker as well. >> >> With this patch (if dom0 doesn''t get wedged due to all 8 vcpus being >> on one pcpu), I can perform thousands of operations successfully. > > Nice. I''ll try later. In the moment I''m testing another patch (attached > for review, if you like). I think I''ve identified two possible races. > > > Juergen > >> >> (NB this is not ready for application yet, I just wanted to check to >> see if it fixes Andre''s problem) >> >> -George >> >> On Wed, Feb 16, 2011 at 9:47 AM, Juergen Gross >> <juergen.gross@ts.fujitsu.com> wrote: >>> Okay, I have some more data. >>> >>> I activated cpupool_dprintk() and included checks in sched_credit.c to >>> test for weight inconsistencies. To reduce race possibilities I''ve added >>> my patch to execute cpu assigning/unassigning always in a tasklet on the >>> cpu to be moved. 
>>> >>> Here is the result: >>> >>> (XEN) cpupool_unassign_cpu(pool=0,cpu=6) >>> (XEN) cpupool_unassign_cpu(pool=0,cpu=6) ret -16 >>> (XEN) cpupool_unassign_cpu(pool=0,cpu=6) >>> (XEN) cpupool_unassign_cpu(pool=0,cpu=6) ret -16 >>> (XEN) cpupool_assign_cpu(pool=0,cpu=1) >>> (XEN) cpupool_assign_cpu(pool=0,cpu=1) ffff83083fff74c0 >>> (XEN) cpupool_assign_cpu(cpu=1) ret 0 >>> (XEN) cpupool_assign_cpu(pool=1,cpu=4) >>> (XEN) cpupool_assign_cpu(pool=1,cpu=4) ffff831002ad5e40 >>> (XEN) cpupool_assign_cpu(cpu=4) ret 0 >>> (XEN) cpu 4, weight 0,prv ffff831002ad5e40, dom 0: >>> (XEN) sdom->weight: 256, sdom->active_vcpu_count: 1 >>> (XEN) Xen BUG at sched_credit.c:570 >>> (XEN) ----[ Xen-4.1.0-rc5-pre x86_64 debug=y Tainted: C ]---- >>> (XEN) CPU: 4 >>> (XEN) RIP: e008:[<ffff82c4801197d7>] csched_tick+0x186/0x37f >>> (XEN) RFLAGS: 0000000000010086 CONTEXT: hypervisor >>> (XEN) rax: 0000000000000000 rbx: ffff830839d3ec30 rcx: 0000000000000000 >>> (XEN) rdx: ffff830839dcff18 rsi: 000000000000000a rdi: ffff82c4802542e8 >>> (XEN) rbp: ffff830839dcfe38 rsp: ffff830839dcfde8 r8: 0000000000000004 >>> (XEN) r9: ffff82c480213520 r10: 00000000fffffffc r11: 0000000000000001 >>> (XEN) r12: 0000000000000004 r13: ffff830839d3ec40 r14: ffff831002ad5e40 >>> (XEN) r15: ffff830839d66f90 cr0: 000000008005003b cr4: 00000000000026f0 >>> (XEN) cr3: 0000001020a98000 cr2: 00007fc5e9b79d98 >>> (XEN) ds: 0000 es: 0000 fs: 0000 gs: 0000 ss: e010 cs: e008 >>> (XEN) Xen stack trace from rsp=ffff830839dcfde8: >>> (XEN) ffff83083ffa3ba0 ffff831002ad5e40 0000000000000246 ffff830839d6c000 >>> (XEN) 0000000000000000 ffff830839dd1100 0000000000000004 ffff82c480119651 >>> (XEN) ffff831002b28018 ffff831002b28010 ffff830839dcfe68 ffff82c480126204 >>> (XEN) 0000000000000002 ffff83083ffa3bb8 ffff830839dd1100 000000cae439ea7e >>> (XEN) ffff830839dcfeb8 ffff82c480126539 00007fc5e9fa5b20 ffff830839dd1100 >>> (XEN) ffff831002b28010 0000000000000004 0000000000000004 ffff82c4802b0880 >>> (XEN) ffff830839dcff18 ffffffffffffffff ffff830839dcfef8 ffff82c480123647 >>> (XEN) ffff830839dcfed8 ffff830077eee000 00007fc5e9b79d98 00007fc5e9fa5b20 >>> (XEN) 0000000000000002 00007fff46826f20 ffff830839dcff08 ffff82c4801236c2 >>> (XEN) 00007cf7c62300c7 ffff82c480206ad6 00007fff46826f20 0000000000000002 >>> (XEN) 00007fc5e9fa5b20 00007fc5e9b79d98 00007fff46827260 00007fff46826f50 >>> (XEN) 0000000000000246 0000000000000032 0000000000000000 00000000ffffffff >>> (XEN) 0000000000000009 00007fc5e9d9de1a 0000000000000003 0000000000004848 >>> (XEN) 00007fc5e9b7a000 0000010000000000 ffffffff800073f0 000000000000e033 >>> (XEN) 0000000000000246 ffff880f97b51fc8 000000000000e02b 0000000000000000 >>> (XEN) 0000000000000000 0000000000000000 0000000000000000 0000000000000004 >>> (XEN) ffff830077eee000 00000043b9afd180 0000000000000000 >>> (XEN) Xen call trace: >>> (XEN) [<ffff82c4801197d7>] csched_tick+0x186/0x37f >>> (XEN) [<ffff82c480126204>] execute_timer+0x4e/0x6c >>> (XEN) [<ffff82c480126539>] timer_softirq_action+0xf6/0x239 >>> (XEN) [<ffff82c480123647>] __do_softirq+0x88/0x99 >>> (XEN) [<ffff82c4801236c2>] do_softirq+0x6a/0x7a >>> (XEN) >>> (XEN) >>> (XEN) **************************************** >>> (XEN) Panic on CPU 4: >>> (XEN) Xen BUG at sched_credit.c:570 >>> (XEN) **************************************** >>> >>> As you can see, a Dom0 vcpus is becoming active on a pool 1 cpu. The BUG_ON >>> triggered in csched_acct() is a logical result of this. >>> >>> How this can happen I don''t know yet. >>> Anyone any idea? I''ll keep searching... 
>>> >>> >>> Juergen >>> >>> On 02/15/11 08:22, Juergen Gross wrote: >>>> >>>> On 02/14/11 18:57, George Dunlap wrote: >>>>> >>>>> The good news is, I''ve managed to reproduce this on my local test >>>>> hardware with 1x4x2 (1 socket, 4 cores, 2 threads per core) using the >>>>> attached script. It''s time to go home now, but I should be able to >>>>> dig something up tomorrow. >>>>> >>>>> To use the script: >>>>> * Rename cpupool0 to "p0", and create an empty second pool, "p1" >>>>> * You can modify elements by adding "arg=val" as arguments. >>>>> * Arguments are: >>>>> + dryrun={true,false} Do the work, but don''t actually execute any xl >>>>> arguments. Default false. >>>>> + left: Number commands to execute. Default 10. >>>>> + maxcpus: highest numerical value for a cpu. Default 7 (i.e., 0-7 is >>>>> 8 cpus). >>>>> + verbose={true,false} Print what you''re doing. Default is true. >>>>> >>>>> The script sometimes attempts to remove the last cpu from cpupool0; in >>>>> this case, libxl will print an error. If the script gets an error >>>>> under that condition, it will ignore it; under any other condition, it >>>>> will print diagnostic information. >>>>> >>>>> What finally crashed it for me was this command: >>>>> # ./cpupool-test.sh verbose=false left=1000 >>>> >>>> Nice! >>>> With your script I finally managed to get the error, too. On my box (2 >>>> sockets >>>> a 6 cores) I had to use >>>> >>>> ./cpupool-test.sh verbose=false left=10000 maxcpus=11 >>>> >>>> to trigger it. >>>> Looking for more data now... >>>> >>>> >>>> Juergen >>>> >>>>> >>>>> -George >>>>> >>>>> On Fri, Feb 11, 2011 at 7:39 AM, Andre >>>>> Przywara<andre.przywara@amd.com> wrote: >>>>>> >>>>>> Juergen Gross wrote: >>>>>>> >>>>>>> On 02/10/11 15:18, Andre Przywara wrote: >>>>>>>> >>>>>>>> Andre Przywara wrote: >>>>>>>>> >>>>>>>>> On 02/10/2011 07:42 AM, Juergen Gross wrote: >>>>>>>>>> >>>>>>>>>> On 02/09/11 15:21, Juergen Gross wrote: >>>>>>>>>>> >>>>>>>>>>> Andre, George, >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> What seems to be interesting: I think the problem did always occur >>>>>>>>>>> when >>>>>>>>>>> a new cpupool was created and the first cpu was moved to it. >>>>>>>>>>> >>>>>>>>>>> I think my previous assumption regarding the master_ticker was not >>>>>>>>>>> too bad. >>>>>>>>>>> I think somehow the master_ticker of the new cpupool is becoming >>>>>>>>>>> active >>>>>>>>>>> before the scheduler is really initialized properly. This could >>>>>>>>>>> happen, if >>>>>>>>>>> enough time is spent between alloc_pdata for the cpu to be moved >>>>>>>>>>> and >>>>>>>>>>> the >>>>>>>>>>> critical section in schedule_cpu_switch(). >>>>>>>>>>> >>>>>>>>>>> The solution should be to activate the timers only if the >>>>>>>>>>> scheduler is >>>>>>>>>>> ready for them. >>>>>>>>>>> >>>>>>>>>>> George, do you think the master_ticker should be stopped in >>>>>>>>>>> suspend_ticker >>>>>>>>>>> as well? I still see potential problems for entering deep C-States. >>>>>>>>>>> I think >>>>>>>>>>> I''ll prepare a patch which will keep the master_ticker active >>>>>>>>>>> for the >>>>>>>>>>> C-State case and migrate it for the schedule_cpu_switch() case. >>>>>>>>>> >>>>>>>>>> Okay, here is a patch for this. It ran on my 4-core machine >>>>>>>>>> without any >>>>>>>>>> problems. >>>>>>>>>> Andre, could you give it a try? >>>>>>>>> >>>>>>>>> Did, but unfortunately it crashed as always. Tried twice and made >>>>>>>>> sure >>>>>>>>> I booted the right kernel. Sorry. 
>>>>>>>>> The idea with the race between the timer and the state changing >>>>>>>>> sounded very appealing, actually that was suspicious to me from the >>>>>>>>> beginning. >>>>>>>>> >>>>>>>>> I will add some code to dump the state of all cpupools to the BUG_ON >>>>>>>>> to see in which situation we are when the bug triggers. >>>>>>>> >>>>>>>> OK, here is a first try of this, the patch iterates over all CPU pools >>>>>>>> and outputs some data if the BUG_ON >>>>>>>> ((sdom->weight * sdom->active_vcpu_count)> weight_left) condition >>>>>>>> triggers: >>>>>>>> (XEN) CPU pool #0: 1 domains (SMP Credit Scheduler), mask: >>>>>>>> fffffffc003f >>>>>>>> (XEN) CPU pool #1: 0 domains (SMP Credit Scheduler), mask: fc0 >>>>>>>> (XEN) CPU pool #2: 0 domains (SMP Credit Scheduler), mask: 1000 >>>>>>>> (XEN) Xen BUG at sched_credit.c:1010 >>>>>>>> .... >>>>>>>> The masks look proper (6 cores per node), the bug triggers when the >>>>>>>> first CPU is about to be(?) inserted. >>>>>>> >>>>>>> Sure? I''m missing the cpu with mask 2000. >>>>>>> I''ll try to reproduce the problem on a larger machine here (24 cores, 4 >>>>>>> numa >>>>>>> nodes). >>>>>>> Andre, can you give me your xen boot parameters? Which xen changeset >>>>>>> are >>>>>>> you >>>>>>> running, and do you have any additional patches in use? >>>>>> >>>>>> The grub lines: >>>>>> kernel (hd1,0)/boot/xen-22858_debug_04.gz console=com1,vga com1=115200 >>>>>> module (hd1,0)/boot/vmlinuz-2.6.32.27_pvops console=tty0 >>>>>> console=ttyS0,115200 ro root=/dev/sdb1 xencons=hvc0 >>>>>> >>>>>> All of my experiments are use c/s 22858 as a base. >>>>>> If you use a AMD Magny-Cours box for your experiments (socket C32 or >>>>>> G34), >>>>>> you should add the following patch (removing the line) >>>>>> --- a/xen/arch/x86/traps.c >>>>>> +++ b/xen/arch/x86/traps.c >>>>>> @@ -803,7 +803,6 @@ static void pv_cpuid(struct cpu_user_regs *regs) >>>>>> __clear_bit(X86_FEATURE_SKINIT % 32,&c); >>>>>> __clear_bit(X86_FEATURE_WDT % 32,&c); >>>>>> __clear_bit(X86_FEATURE_LWP % 32,&c); >>>>>> - __clear_bit(X86_FEATURE_NODEID_MSR % 32,&c); >>>>>> __clear_bit(X86_FEATURE_TOPOEXT % 32,&c); >>>>>> break; >>>>>> case 5: /* MONITOR/MWAIT */ >>>>>> >>>>>> This is not necessary (in fact that reverts my patch c/s 22815), but >>>>>> raises >>>>>> the probability to trigger the bug, probably because it increases the >>>>>> pressure of the Dom0 scheduler. If you cannot trigger it with Dom0, >>>>>> try to >>>>>> create a guest with many VCPUs and squeeze it into a small CPU-pool. >>>>>> >>>>>> Good luck ;-) >>>>>> Andre. >>>>>> >>>>>> -- >>>>>> Andre Przywara >>>>>> AMD-OSRC (Dresden) >>>>>> Tel: x29712 >>>>>> >>>>>> >>>>>> _______________________________________________ >>>>>> Xen-devel mailing list >>>>>> Xen-devel@lists.xensource.com >>>>>> http://lists.xensource.com/xen-devel >>>>>> >>>>>> >>>>>> >>>>>> _______________________________________________ >>>>>> Xen-devel mailing list >>>>>> Xen-devel@lists.xensource.com >>>>>> http://lists.xensource.com/xen-devel >>>> >>>> >>> >>> >>> -- >>> Juergen Gross Principal Developer Operating Systems >>> TSP ES&S SWE OS6 Telephone: +49 (0) 89 3222 2967 >>> Fujitsu Technology Solutions e-mail: >>> juergen.gross@ts.fujitsu.com >>> Domagkstr. 
28 Internet: ts.fujitsu.com >>> D-80807 Muenchen Company details: >>> ts.fujitsu.com/imprint.html >>> >>> _______________________________________________ >>> Xen-devel mailing list >>> Xen-devel@lists.xensource.com >>> http://lists.xensource.com/xen-devel >>> >>> >>> >>> _______________________________________________ >>> Xen-devel mailing list >>> Xen-devel@lists.xensource.com >>> http://lists.xensource.com/xen-devel > > > -- > Juergen Gross Principal Developer Operating Systems > TSP ES&S SWE OS6 Telephone: +49 (0) 89 3222 2967 > Fujitsu Technology Solutions e-mail: juergen.gross@ts.fujitsu.com > Domagkstr. 28 Internet: ts.fujitsu.com > D-80807 Muenchen Company details: ts.fujitsu.com/imprint.html-- Andre Przywara AMD-Operating System Research Center (OSRC), Dresden, Germany _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Juergen Gross
2011-Feb-17 07:05 UTC
Re: [Xen-devel] Hypervisor crash(!) on xl cpupool-numa-split
On 02/16/11 14:54, George Dunlap wrote:> Andre (and Juergen), can you try again with the attached patch? > > What the patch basically does is try to make "cpu_disable_scheduler()" > do what it seems to say it does. :-) Namely, the various > scheduler-related interrutps (both per-cpu ticks and the master tick) > is a part of the scheduler, so disable them before doing anything, and > don''t enable them until the cpu is really ready to go again. > > To be precise: > * cpu_disable_scheduler() disables ticks > * scheduler_cpu_switch() only enables ticks if adding a cpu to a pool, > and does it after inserting the idle vcpu > * Modify semantics, s.t., {alloc,free}_pdata() don''t actually start or > stop tickers > + Call tick_{resume,suspend} in cpu_{up,down}, respectively > * Modify credit1''s tick_{suspend,resume} to handle the master ticker as well. > > With this patch (if dom0 doesn''t get wedged due to all 8 vcpus being > on one pcpu), I can perform thousands of operations successfully. > > (NB this is not ready for application yet, I just wanted to check to > see if it fixes Andre''s problem)After some thousand iterations the machine hang and after dumping Dom0 registers to console it continued running and crashed about a second later: (XEN) cpupool_unassign_cpu(pool=0,cpu=9) (XEN) cpupool_unassign_cpu(pool=0,cpu=9) ffff83083fff74c0 (XEN) cpupool_unassign_cpu ret=0 (XEN) cpupool_unassign_cpu(pool=0,cpu=4) (XEN) cpupool_unassign_cpu(pool=0,cpu=4) ffff83083fff74c0 (XEN) cpupool_unassign_cpu ret=0 (XEN) cpupool_assign_cpu(pool=1,cpu=9) (XEN) cpupool_assign_cpu(pool=1,cpu=9) ffff83083002de40 (XEN) Assertion ''timer->status >= TIMER_STATUS_inactive'' failed at timer.c:279 (XEN) ----[ Xen-4.1.0-rc5-pre x86_64 debug=y Tainted: C ]---- (XEN) CPU: 9 (XEN) RIP: e008:[<ffff82c480126100>] active_timer+0xc/0x37 (XEN) RFLAGS: 0000000000010046 CONTEXT: hypervisor (XEN) rax: 0000000000000000 rbx: 0000000000000000 rcx: 0000000000000000 (XEN) rdx: ffff830839d8ff18 rsi: 0000010dbb628a80 rdi: ffff83083ffbcf98 (XEN) rbp: ffff830839d8fd50 rsp: ffff830839d8fd50 r8: ffff83083ffbcf90 (XEN) r9: ffff82c480213680 r10: 00000000ffffffff r11: 0000000000000010 (XEN) r12: ffff82c4802d3f80 r13: ffff82c4802d3f80 r14: ffff83083ffbcf98 (XEN) r15: ffff83083ffbcfc0 cr0: 000000008005003b cr4: 00000000000026f0 (XEN) cr3: 000000007809c000 cr2: 0000000000620048 (XEN) ds: 002b es: 002b fs: 0000 gs: 0000 ss: e010 cs: e008 (XEN) Xen stack trace from rsp=ffff830839d8fd50: (XEN) ffff830839d8fda0 ffff82c480126ef9 0000000000000000 0000010dbb628a80 (XEN) 0000000000000086 0000000000000009 ffff83083002de40 ffff83083002dd50 (XEN) 0000000000000009 0000000000000009 ffff830839d8fdc0 ffff82c480117906 (XEN) ffff83083ffa3b40 ffff83083ffa5d70 ffff830839d8fe30 ffff82c4801214fa (XEN) ffff83083002dd00 0000000900000100 0000000000000286 ffff8300780da000 (XEN) ffff83083ffbcf80 ffff83083ffbcf90 ffff82c480247e00 0000000000000009 (XEN) 00000000fffffff0 ffff83083002dd00 0000000000000000 ffff8300781cc198 (XEN) ffff830839d8fe60 ffff82c4801019ff 0000000000000009 0000000000000009 (XEN) ffff8300781cc198 ffff830839d990d0 ffff830839d8fe80 ffff82c480101bd9 (XEN) ffff83107e80c5b0 ffff8300781cc000 ffff830839d8fea0 ffff82c480104f21 (XEN) 0000000000000009 ffff830839d990e0 ffff830839d8fee0 ffff82c480125b6c (XEN) ffff82c48024a020 ffff830839d8ff18 ffff82c48024a020 ffff830839d8ff18 (XEN) ffff830839d99060 ffff830839d99040 ffff830839d8ff10 ffff82c48015645a (XEN) 0000000000000000 ffff8300780da000 ffff8300780da000 ffffffffffffffff (XEN) ffff830839d8fe00 0000000000000000 
0000000000000000 0000000000000000 (XEN) 0000000000000000 ffffffff8062bda0 ffff880fbb1e5fd8 0000000000000246 (XEN) 0000000000000000 000000010003347d 0000000000000000 0000000000000000 (XEN) ffffffff800033aa 00000000deadbeef 00000000deadbeef 00000000deadbeef (XEN) 0000010000000000 ffffffff800033aa 000000000000e033 0000000000000246 (XEN) ffff880fbb1e5f08 000000000000e02b 0000000000000000 0000000000000000 (XEN) Xen call trace: (XEN) [<ffff82c480126100>] active_timer+0xc/0x37 (XEN) [<ffff82c480126ef9>] set_timer+0x102/0x218 (XEN) [<ffff82c480117906>] csched_tick_resume+0x53/0x75 (XEN) [<ffff82c4801214fa>] schedule_cpu_switch+0x1f1/0x25c (XEN) [<ffff82c4801019ff>] cpupool_assign_cpu_locked+0x61/0xd6 (XEN) [<ffff82c480101bd9>] cpupool_assign_cpu_helper+0x9f/0xcd (XEN) [<ffff82c480104f21>] continue_hypercall_tasklet_handler+0x51/0xc3 (XEN) [<ffff82c480125b6c>] do_tasklet+0xe1/0x155 (XEN) [<ffff82c48015645a>] idle_loop+0x5f/0x67 (XEN) (XEN) (XEN) **************************************** (XEN) Panic on CPU 9: (XEN) Assertion ''timer->status >= TIMER_STATUS_inactive'' failed at timer.c:279 (XEN) **************************************** Juergen -- Juergen Gross Principal Developer Operating Systems TSP ES&S SWE OS6 Telephone: +49 (0) 89 3222 2967 Fujitsu Technology Solutions e-mail: juergen.gross@ts.fujitsu.com Domagkstr. 28 Internet: ts.fujitsu.com D-80807 Muenchen Company details: ts.fujitsu.com/imprint.html _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
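The failed assertion means set_timer() was handed a timer whose status is still below
TIMER_STATUS_inactive, i.e. a timer that has not been initialised (or has already been killed);
in this trace it is the tick resumed by csched_tick_resume() from schedule_cpu_switch(),
apparently before the target pool's per-cpu tick timer has been set up. A toy model of that
lifecycle rule -- invented enum, struct and function names, not Xen's xen/common/timer.c:

#include <assert.h>
#include <stdio.h>

/* Toy timer lifecycle: a timer must be initialised before it may be armed. */
enum timer_status { TS_invalid = 0, TS_killed, TS_inactive, TS_active };

struct toy_timer {
    enum timer_status status;
    unsigned long expires;
};

static void toy_init_timer(struct toy_timer *t)
{
    t->status = TS_inactive;          /* init_timer(): now legal to arm */
}

static void toy_set_timer(struct toy_timer *t, unsigned long expires)
{
    /* Analogue of "ASSERT(timer->status >= TIMER_STATUS_inactive)":
     * arming a timer that was never initialised is a bug. */
    assert(t->status >= TS_inactive);
    t->expires = expires;
    t->status  = TS_active;
}

int main(void)
{
    struct toy_timer tick = { .status = TS_invalid };

    /* Correct order: initialise the per-cpu tick, then resume it. */
    toy_init_timer(&tick);
    toy_set_timer(&tick, 1000);
    printf("tick armed, expires=%lu\n", tick.expires);

    /* The crash above corresponds to skipping the init step: the new
     * pool's tick is resumed before its timer exists. */
    struct toy_timer uninitialised = { .status = TS_invalid };
    toy_set_timer(&uninitialised, 1000);   /* assertion fires here */

    return 0;
}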
Juergen Gross
2011-Feb-17 09:11 UTC
Re: [Xen-devel] Hypervisor crash(!) on xl cpupool-numa-split
On 02/17/11 08:05, Juergen Gross wrote:> On 02/16/11 14:54, George Dunlap wrote: >> Andre (and Juergen), can you try again with the attached patch? >> >> What the patch basically does is try to make "cpu_disable_scheduler()" >> do what it seems to say it does. :-) Namely, the various >> scheduler-related interrutps (both per-cpu ticks and the master tick) >> is a part of the scheduler, so disable them before doing anything, and >> don''t enable them until the cpu is really ready to go again. >> >> To be precise: >> * cpu_disable_scheduler() disables ticks >> * scheduler_cpu_switch() only enables ticks if adding a cpu to a pool, >> and does it after inserting the idle vcpu >> * Modify semantics, s.t., {alloc,free}_pdata() don''t actually start or >> stop tickers >> + Call tick_{resume,suspend} in cpu_{up,down}, respectively >> * Modify credit1''s tick_{suspend,resume} to handle the master ticker >> as well. >> >> With this patch (if dom0 doesn''t get wedged due to all 8 vcpus being >> on one pcpu), I can perform thousands of operations successfully. >> >> (NB this is not ready for application yet, I just wanted to check to >> see if it fixes Andre''s problem)Tried again, this time with the following patch: diff -r 72470de157ce xen/common/sched_credit.c --- a/xen/common/sched_credit.c Wed Feb 16 09:49:33 2011 +0000 +++ b/xen/common/sched_credit.c Wed Feb 16 15:09:54 2011 +0100 @@ -1268,7 +1268,8 @@ csched_load_balance(struct csched_privat /* * Any work over there to steal? */ - speer = csched_runq_steal(peer_cpu, cpu, snext->pri); + speer = cpu_isset(peer_cpu, *online) ? + csched_runq_steal(peer_cpu, cpu, snext->pri) : NULL; pcpu_schedule_unlock(peer_cpu); if ( speer != NULL ) { Worked without any flaw for 30000 iterations. Juergen> > After some thousand iterations the machine hang and after dumping Dom0 > registers to console it continued running and crashed about a second later: > > (XEN) cpupool_unassign_cpu(pool=0,cpu=9) > (XEN) cpupool_unassign_cpu(pool=0,cpu=9) ffff83083fff74c0 > (XEN) cpupool_unassign_cpu ret=0 > (XEN) cpupool_unassign_cpu(pool=0,cpu=4) > (XEN) cpupool_unassign_cpu(pool=0,cpu=4) ffff83083fff74c0 > (XEN) cpupool_unassign_cpu ret=0 > (XEN) cpupool_assign_cpu(pool=1,cpu=9) > (XEN) cpupool_assign_cpu(pool=1,cpu=9) ffff83083002de40 > (XEN) Assertion ''timer->status >= TIMER_STATUS_inactive'' failed at > timer.c:279 > (XEN) ----[ Xen-4.1.0-rc5-pre x86_64 debug=y Tainted: C ]---- > (XEN) CPU: 9 > (XEN) RIP: e008:[<ffff82c480126100>] active_timer+0xc/0x37 > (XEN) RFLAGS: 0000000000010046 CONTEXT: hypervisor > (XEN) rax: 0000000000000000 rbx: 0000000000000000 rcx: 0000000000000000 > (XEN) rdx: ffff830839d8ff18 rsi: 0000010dbb628a80 rdi: ffff83083ffbcf98 > (XEN) rbp: ffff830839d8fd50 rsp: ffff830839d8fd50 r8: ffff83083ffbcf90 > (XEN) r9: ffff82c480213680 r10: 00000000ffffffff r11: 0000000000000010 > (XEN) r12: ffff82c4802d3f80 r13: ffff82c4802d3f80 r14: ffff83083ffbcf98 > (XEN) r15: ffff83083ffbcfc0 cr0: 000000008005003b cr4: 00000000000026f0 > (XEN) cr3: 000000007809c000 cr2: 0000000000620048 > (XEN) ds: 002b es: 002b fs: 0000 gs: 0000 ss: e010 cs: e008 > (XEN) Xen stack trace from rsp=ffff830839d8fd50: > (XEN) ffff830839d8fda0 ffff82c480126ef9 0000000000000000 0000010dbb628a80 > (XEN) 0000000000000086 0000000000000009 ffff83083002de40 ffff83083002dd50 > (XEN) 0000000000000009 0000000000000009 ffff830839d8fdc0 ffff82c480117906 > (XEN) ffff83083ffa3b40 ffff83083ffa5d70 ffff830839d8fe30 ffff82c4801214fa > (XEN) ffff83083002dd00 0000000900000100 0000000000000286 ffff8300780da000 
> (XEN) ffff83083ffbcf80 ffff83083ffbcf90 ffff82c480247e00 0000000000000009 > (XEN) 00000000fffffff0 ffff83083002dd00 0000000000000000 ffff8300781cc198 > (XEN) ffff830839d8fe60 ffff82c4801019ff 0000000000000009 0000000000000009 > (XEN) ffff8300781cc198 ffff830839d990d0 ffff830839d8fe80 ffff82c480101bd9 > (XEN) ffff83107e80c5b0 ffff8300781cc000 ffff830839d8fea0 ffff82c480104f21 > (XEN) 0000000000000009 ffff830839d990e0 ffff830839d8fee0 ffff82c480125b6c > (XEN) ffff82c48024a020 ffff830839d8ff18 ffff82c48024a020 ffff830839d8ff18 > (XEN) ffff830839d99060 ffff830839d99040 ffff830839d8ff10 ffff82c48015645a > (XEN) 0000000000000000 ffff8300780da000 ffff8300780da000 ffffffffffffffff > (XEN) ffff830839d8fe00 0000000000000000 0000000000000000 0000000000000000 > (XEN) 0000000000000000 ffffffff8062bda0 ffff880fbb1e5fd8 0000000000000246 > (XEN) 0000000000000000 000000010003347d 0000000000000000 0000000000000000 > (XEN) ffffffff800033aa 00000000deadbeef 00000000deadbeef 00000000deadbeef > (XEN) 0000010000000000 ffffffff800033aa 000000000000e033 0000000000000246 > (XEN) ffff880fbb1e5f08 000000000000e02b 0000000000000000 0000000000000000 > (XEN) Xen call trace: > (XEN) [<ffff82c480126100>] active_timer+0xc/0x37 > (XEN) [<ffff82c480126ef9>] set_timer+0x102/0x218 > (XEN) [<ffff82c480117906>] csched_tick_resume+0x53/0x75 > (XEN) [<ffff82c4801214fa>] schedule_cpu_switch+0x1f1/0x25c > (XEN) [<ffff82c4801019ff>] cpupool_assign_cpu_locked+0x61/0xd6 > (XEN) [<ffff82c480101bd9>] cpupool_assign_cpu_helper+0x9f/0xcd > (XEN) [<ffff82c480104f21>] continue_hypercall_tasklet_handler+0x51/0xc3 > (XEN) [<ffff82c480125b6c>] do_tasklet+0xe1/0x155 > (XEN) [<ffff82c48015645a>] idle_loop+0x5f/0x67 > (XEN) > (XEN) > (XEN) **************************************** > (XEN) Panic on CPU 9: > (XEN) Assertion ''timer->status >= TIMER_STATUS_inactive'' failed at > timer.c:279 > (XEN) **************************************** > > > Juergen >-- Juergen Gross Principal Developer Operating Systems TSP ES&S SWE OS6 Telephone: +49 (0) 89 3222 2967 Fujitsu Technology Solutions e-mail: juergen.gross@ts.fujitsu.com Domagkstr. 28 Internet: ts.fujitsu.com D-80807 Muenchen Company details: ts.fujitsu.com/imprint.html _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
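The one-line change above re-checks, at the moment of stealing, that the peer cpu is still in
this scheduler instance's online mask: csched_load_balance() cycles over peer cpus looking for
runnable vcpus to pull, and while a cpu is being moved between pools it can still turn up as a
peer even though its runqueue no longer belongs to this pool -- pulling from it is one way a
Dom0 vcpu can surface on a pool-1 cpu. A self-contained toy version of that guard, with made-up
helpers and bitmasks standing in for cpu_isset()/csched_runq_steal():

#include <stdio.h>

#define NR_CPUS 8

/* Per-cpu runqueues exist independently of pools; each pool only owns a
 * subset of cpus, described by an online mask (toy model). One
 * "stealable" vcpu id per cpu, 0 meaning nothing runnable there. */
static int runq_vcpu[NR_CPUS];

static int cpu_in_mask(unsigned int mask, int cpu)
{
    return (mask >> cpu) & 1;          /* stand-in for cpu_isset() */
}

/* Stand-in for csched_runq_steal(): take whatever is queued on peer. */
static int runq_steal(int peer_cpu)
{
    int v = runq_vcpu[peer_cpu];
    runq_vcpu[peer_cpu] = 0;
    return v;
}

/* Load balancing for 'cpu', whose pool owns the cpus in 'online': only
 * steal from peers inside the same mask -- the point of the patch. */
static int load_balance(unsigned int online, int cpu)
{
    for ( int peer = 0; peer < NR_CPUS; peer++ )
    {
        if ( peer == cpu )
            continue;
        int v = cpu_in_mask(online, peer) ? runq_steal(peer) : 0;
        if ( v )
            return v;
    }
    return 0;
}

int main(void)
{
    unsigned int pool1_online = 0xf0;  /* pool 1 owns cpus 4-7 */

    runq_vcpu[1] = 1;   /* a Dom0 vcpu queued on cpu 1, which is in pool 0 */
    runq_vcpu[6] = 9;   /* a legitimate candidate inside pool 1            */

    /* cpu 4 (pool 1) balances: with the guard it takes vcpu 9 from cpu 6
     * and never pulls the Dom0 vcpu across the pool boundary. */
    printf("stolen vcpu: %d (expect 9, never 1)\n",
           load_balance(pool1_online, 4));
    return 0;
}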
Andre Przywara
2011-Feb-21 10:00 UTC
Re: [Xen-devel] Hypervisor crash(!) on xl cpupool-numa-split
George Dunlap wrote:> Andre (and Juergen), can you try again with the attached patch?I applied this patch on top of 22931 and it did _not_ work. The crash occurred almost immediately after I started my script, so the same behaviour as without the patch. (attached my script for reference, though it will most likely only make sense on bigger NUMA machines) Regards, Andre.> What the patch basically does is try to make "cpu_disable_scheduler()" > do what it seems to say it does. :-) Namely, the various > scheduler-related interrutps (both per-cpu ticks and the master tick) > is a part of the scheduler, so disable them before doing anything, and > don''t enable them until the cpu is really ready to go again. > > To be precise: > * cpu_disable_scheduler() disables ticks > * scheduler_cpu_switch() only enables ticks if adding a cpu to a pool, > and does it after inserting the idle vcpu > * Modify semantics, s.t., {alloc,free}_pdata() don''t actually start or > stop tickers > + Call tick_{resume,suspend} in cpu_{up,down}, respectively > * Modify credit1''s tick_{suspend,resume} to handle the master ticker as well. > > With this patch (if dom0 doesn''t get wedged due to all 8 vcpus being > on one pcpu), I can perform thousands of operations successfully. > > (NB this is not ready for application yet, I just wanted to check to > see if it fixes Andre''s problem) > > -George > > On Wed, Feb 16, 2011 at 9:47 AM, Juergen Gross > <juergen.gross@ts.fujitsu.com> wrote: >> Okay, I have some more data. >> >> I activated cpupool_dprintk() and included checks in sched_credit.c to >> test for weight inconsistencies. To reduce race possibilities I''ve added >> my patch to execute cpu assigning/unassigning always in a tasklet on the >> cpu to be moved. >> >> Here is the result: >> >> (XEN) cpupool_unassign_cpu(pool=0,cpu=6) >> (XEN) cpupool_unassign_cpu(pool=0,cpu=6) ret -16 >> (XEN) cpupool_unassign_cpu(pool=0,cpu=6) >> (XEN) cpupool_unassign_cpu(pool=0,cpu=6) ret -16 >> (XEN) cpupool_assign_cpu(pool=0,cpu=1) >> (XEN) cpupool_assign_cpu(pool=0,cpu=1) ffff83083fff74c0 >> (XEN) cpupool_assign_cpu(cpu=1) ret 0 >> (XEN) cpupool_assign_cpu(pool=1,cpu=4) >> (XEN) cpupool_assign_cpu(pool=1,cpu=4) ffff831002ad5e40 >> (XEN) cpupool_assign_cpu(cpu=4) ret 0 >> (XEN) cpu 4, weight 0,prv ffff831002ad5e40, dom 0: >> (XEN) sdom->weight: 256, sdom->active_vcpu_count: 1 >> (XEN) Xen BUG at sched_credit.c:570 >> (XEN) ----[ Xen-4.1.0-rc5-pre x86_64 debug=y Tainted: C ]---- >> (XEN) CPU: 4 >> (XEN) RIP: e008:[<ffff82c4801197d7>] csched_tick+0x186/0x37f >> (XEN) RFLAGS: 0000000000010086 CONTEXT: hypervisor >> (XEN) rax: 0000000000000000 rbx: ffff830839d3ec30 rcx: 0000000000000000 >> (XEN) rdx: ffff830839dcff18 rsi: 000000000000000a rdi: ffff82c4802542e8 >> (XEN) rbp: ffff830839dcfe38 rsp: ffff830839dcfde8 r8: 0000000000000004 >> (XEN) r9: ffff82c480213520 r10: 00000000fffffffc r11: 0000000000000001 >> (XEN) r12: 0000000000000004 r13: ffff830839d3ec40 r14: ffff831002ad5e40 >> (XEN) r15: ffff830839d66f90 cr0: 000000008005003b cr4: 00000000000026f0 >> (XEN) cr3: 0000001020a98000 cr2: 00007fc5e9b79d98 >> (XEN) ds: 0000 es: 0000 fs: 0000 gs: 0000 ss: e010 cs: e008 >> (XEN) Xen stack trace from rsp=ffff830839dcfde8: >> (XEN) ffff83083ffa3ba0 ffff831002ad5e40 0000000000000246 ffff830839d6c000 >> (XEN) 0000000000000000 ffff830839dd1100 0000000000000004 ffff82c480119651 >> (XEN) ffff831002b28018 ffff831002b28010 ffff830839dcfe68 ffff82c480126204 >> (XEN) 0000000000000002 ffff83083ffa3bb8 ffff830839dd1100 000000cae439ea7e >> (XEN) 
ffff830839dcfeb8 ffff82c480126539 00007fc5e9fa5b20 ffff830839dd1100 >> (XEN) ffff831002b28010 0000000000000004 0000000000000004 ffff82c4802b0880 >> (XEN) ffff830839dcff18 ffffffffffffffff ffff830839dcfef8 ffff82c480123647 >> (XEN) ffff830839dcfed8 ffff830077eee000 00007fc5e9b79d98 00007fc5e9fa5b20 >> (XEN) 0000000000000002 00007fff46826f20 ffff830839dcff08 ffff82c4801236c2 >> (XEN) 00007cf7c62300c7 ffff82c480206ad6 00007fff46826f20 0000000000000002 >> (XEN) 00007fc5e9fa5b20 00007fc5e9b79d98 00007fff46827260 00007fff46826f50 >> (XEN) 0000000000000246 0000000000000032 0000000000000000 00000000ffffffff >> (XEN) 0000000000000009 00007fc5e9d9de1a 0000000000000003 0000000000004848 >> (XEN) 00007fc5e9b7a000 0000010000000000 ffffffff800073f0 000000000000e033 >> (XEN) 0000000000000246 ffff880f97b51fc8 000000000000e02b 0000000000000000 >> (XEN) 0000000000000000 0000000000000000 0000000000000000 0000000000000004 >> (XEN) ffff830077eee000 00000043b9afd180 0000000000000000 >> (XEN) Xen call trace: >> (XEN) [<ffff82c4801197d7>] csched_tick+0x186/0x37f >> (XEN) [<ffff82c480126204>] execute_timer+0x4e/0x6c >> (XEN) [<ffff82c480126539>] timer_softirq_action+0xf6/0x239 >> (XEN) [<ffff82c480123647>] __do_softirq+0x88/0x99 >> (XEN) [<ffff82c4801236c2>] do_softirq+0x6a/0x7a >> (XEN) >> (XEN) >> (XEN) **************************************** >> (XEN) Panic on CPU 4: >> (XEN) Xen BUG at sched_credit.c:570 >> (XEN) **************************************** >> >> As you can see, a Dom0 vcpus is becoming active on a pool 1 cpu. The BUG_ON >> triggered in csched_acct() is a logical result of this. >> >> How this can happen I don''t know yet. >> Anyone any idea? I''ll keep searching... >> >> >> Juergen >> >> On 02/15/11 08:22, Juergen Gross wrote: >>> On 02/14/11 18:57, George Dunlap wrote: >>>> The good news is, I''ve managed to reproduce this on my local test >>>> hardware with 1x4x2 (1 socket, 4 cores, 2 threads per core) using the >>>> attached script. It''s time to go home now, but I should be able to >>>> dig something up tomorrow. >>>> >>>> To use the script: >>>> * Rename cpupool0 to "p0", and create an empty second pool, "p1" >>>> * You can modify elements by adding "arg=val" as arguments. >>>> * Arguments are: >>>> + dryrun={true,false} Do the work, but don''t actually execute any xl >>>> arguments. Default false. >>>> + left: Number commands to execute. Default 10. >>>> + maxcpus: highest numerical value for a cpu. Default 7 (i.e., 0-7 is >>>> 8 cpus). >>>> + verbose={true,false} Print what you''re doing. Default is true. >>>> >>>> The script sometimes attempts to remove the last cpu from cpupool0; in >>>> this case, libxl will print an error. If the script gets an error >>>> under that condition, it will ignore it; under any other condition, it >>>> will print diagnostic information. >>>> >>>> What finally crashed it for me was this command: >>>> # ./cpupool-test.sh verbose=false left=1000 >>> Nice! >>> With your script I finally managed to get the error, too. On my box (2 >>> sockets >>> a 6 cores) I had to use >>> >>> ./cpupool-test.sh verbose=false left=10000 maxcpus=11 >>> >>> to trigger it. >>> Looking for more data now... 
>>> >>> >>> Juergen >>> >>>> -George >>>> >>>> On Fri, Feb 11, 2011 at 7:39 AM, Andre >>>> Przywara<andre.przywara@amd.com> wrote: >>>>> Juergen Gross wrote: >>>>>> On 02/10/11 15:18, Andre Przywara wrote: >>>>>>> Andre Przywara wrote: >>>>>>>> On 02/10/2011 07:42 AM, Juergen Gross wrote: >>>>>>>>> On 02/09/11 15:21, Juergen Gross wrote: >>>>>>>>>> Andre, George, >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> What seems to be interesting: I think the problem did always occur >>>>>>>>>> when >>>>>>>>>> a new cpupool was created and the first cpu was moved to it. >>>>>>>>>> >>>>>>>>>> I think my previous assumption regarding the master_ticker was not >>>>>>>>>> too bad. >>>>>>>>>> I think somehow the master_ticker of the new cpupool is becoming >>>>>>>>>> active >>>>>>>>>> before the scheduler is really initialized properly. This could >>>>>>>>>> happen, if >>>>>>>>>> enough time is spent between alloc_pdata for the cpu to be moved >>>>>>>>>> and >>>>>>>>>> the >>>>>>>>>> critical section in schedule_cpu_switch(). >>>>>>>>>> >>>>>>>>>> The solution should be to activate the timers only if the >>>>>>>>>> scheduler is >>>>>>>>>> ready for them. >>>>>>>>>> >>>>>>>>>> George, do you think the master_ticker should be stopped in >>>>>>>>>> suspend_ticker >>>>>>>>>> as well? I still see potential problems for entering deep C-States. >>>>>>>>>> I think >>>>>>>>>> I''ll prepare a patch which will keep the master_ticker active >>>>>>>>>> for the >>>>>>>>>> C-State case and migrate it for the schedule_cpu_switch() case. >>>>>>>>> Okay, here is a patch for this. It ran on my 4-core machine >>>>>>>>> without any >>>>>>>>> problems. >>>>>>>>> Andre, could you give it a try? >>>>>>>> Did, but unfortunately it crashed as always. Tried twice and made >>>>>>>> sure >>>>>>>> I booted the right kernel. Sorry. >>>>>>>> The idea with the race between the timer and the state changing >>>>>>>> sounded very appealing, actually that was suspicious to me from the >>>>>>>> beginning. >>>>>>>> >>>>>>>> I will add some code to dump the state of all cpupools to the BUG_ON >>>>>>>> to see in which situation we are when the bug triggers. >>>>>>> OK, here is a first try of this, the patch iterates over all CPU pools >>>>>>> and outputs some data if the BUG_ON >>>>>>> ((sdom->weight * sdom->active_vcpu_count)> weight_left) condition >>>>>>> triggers: >>>>>>> (XEN) CPU pool #0: 1 domains (SMP Credit Scheduler), mask: >>>>>>> fffffffc003f >>>>>>> (XEN) CPU pool #1: 0 domains (SMP Credit Scheduler), mask: fc0 >>>>>>> (XEN) CPU pool #2: 0 domains (SMP Credit Scheduler), mask: 1000 >>>>>>> (XEN) Xen BUG at sched_credit.c:1010 >>>>>>> .... >>>>>>> The masks look proper (6 cores per node), the bug triggers when the >>>>>>> first CPU is about to be(?) inserted. >>>>>> Sure? I''m missing the cpu with mask 2000. >>>>>> I''ll try to reproduce the problem on a larger machine here (24 cores, 4 >>>>>> numa >>>>>> nodes). >>>>>> Andre, can you give me your xen boot parameters? Which xen changeset >>>>>> are >>>>>> you >>>>>> running, and do you have any additional patches in use? >>>>> The grub lines: >>>>> kernel (hd1,0)/boot/xen-22858_debug_04.gz console=com1,vga com1=115200 >>>>> module (hd1,0)/boot/vmlinuz-2.6.32.27_pvops console=tty0 >>>>> console=ttyS0,115200 ro root=/dev/sdb1 xencons=hvc0 >>>>> >>>>> All of my experiments are use c/s 22858 as a base. 
>>>>> If you use a AMD Magny-Cours box for your experiments (socket C32 or >>>>> G34), >>>>> you should add the following patch (removing the line) >>>>> --- a/xen/arch/x86/traps.c >>>>> +++ b/xen/arch/x86/traps.c >>>>> @@ -803,7 +803,6 @@ static void pv_cpuid(struct cpu_user_regs *regs) >>>>> __clear_bit(X86_FEATURE_SKINIT % 32,&c); >>>>> __clear_bit(X86_FEATURE_WDT % 32,&c); >>>>> __clear_bit(X86_FEATURE_LWP % 32,&c); >>>>> - __clear_bit(X86_FEATURE_NODEID_MSR % 32,&c); >>>>> __clear_bit(X86_FEATURE_TOPOEXT % 32,&c); >>>>> break; >>>>> case 5: /* MONITOR/MWAIT */ >>>>> >>>>> This is not necessary (in fact that reverts my patch c/s 22815), but >>>>> raises >>>>> the probability to trigger the bug, probably because it increases the >>>>> pressure of the Dom0 scheduler. If you cannot trigger it with Dom0, >>>>> try to >>>>> create a guest with many VCPUs and squeeze it into a small CPU-pool. >>>>> >>>>> Good luck ;-) >>>>> Andre. >>>>> >>>>> -- >>>>> Andre Przywara >>>>> AMD-OSRC (Dresden) >>>>> Tel: x29712 >>>>> >>>>> >>>>> _______________________________________________ >>>>> Xen-devel mailing list >>>>> Xen-devel@lists.xensource.com >>>>> http://lists.xensource.com/xen-devel >>>>> >>>>> >>>>> >>>>> _______________________________________________ >>>>> Xen-devel mailing list >>>>> Xen-devel@lists.xensource.com >>>>> http://lists.xensource.com/xen-devel >>> >> >> -- >> Juergen Gross Principal Developer Operating Systems >> TSP ES&S SWE OS6 Telephone: +49 (0) 89 3222 2967 >> Fujitsu Technology Solutions e-mail: >> juergen.gross@ts.fujitsu.com >> Domagkstr. 28 Internet: ts.fujitsu.com >> D-80807 Muenchen Company details: >> ts.fujitsu.com/imprint.html >> >> _______________________________________________ >> Xen-devel mailing list >> Xen-devel@lists.xensource.com >> http://lists.xensource.com/xen-devel >>-- Andre Przywara AMD-Operating System Research Center (OSRC), Dresden, Germany _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Juergen Gross
2011-Feb-21 13:19 UTC
Re: [Xen-devel] Hypervisor crash(!) on xl cpupool-numa-split
On 02/21/11 11:00, Andre Przywara wrote:> George Dunlap wrote: >> Andre (and Juergen), can you try again with the attached patch? > > I applied this patch on top of 22931 and it did _not_ work. > The crash occurred almost immediately after I started my script, so the > same behaviour as without the patch.Did you try my patch addressing races in the scheduler when moving cpus between cpupools? I''ve attached it again. For me it works quite well, while George''s patch seems not to be enough (machine hanging after some tests with cpupools). OTOH I can''t reproduce an error as fast as you even without any patch :-)> (attached my script for reference, though it will most likely only make > sense on bigger NUMA machines)Yeah, on my 2-node system I need several hundred tries to get an error. But it seems to be more effective than George''s script. Juergen> > Regards, > Andre. > > >> What the patch basically does is try to make "cpu_disable_scheduler()" >> do what it seems to say it does. :-) Namely, the various >> scheduler-related interrutps (both per-cpu ticks and the master tick) >> is a part of the scheduler, so disable them before doing anything, and >> don''t enable them until the cpu is really ready to go again. >> >> To be precise: >> * cpu_disable_scheduler() disables ticks >> * scheduler_cpu_switch() only enables ticks if adding a cpu to a pool, >> and does it after inserting the idle vcpu >> * Modify semantics, s.t., {alloc,free}_pdata() don''t actually start or >> stop tickers >> + Call tick_{resume,suspend} in cpu_{up,down}, respectively >> * Modify credit1''s tick_{suspend,resume} to handle the master ticker >> as well. >> >> With this patch (if dom0 doesn''t get wedged due to all 8 vcpus being >> on one pcpu), I can perform thousands of operations successfully. >> >> (NB this is not ready for application yet, I just wanted to check to >> see if it fixes Andre''s problem) >> >> -George >> >> On Wed, Feb 16, 2011 at 9:47 AM, Juergen Gross >> <juergen.gross@ts.fujitsu.com> wrote: >>> Okay, I have some more data. >>> >>> I activated cpupool_dprintk() and included checks in sched_credit.c to >>> test for weight inconsistencies. To reduce race possibilities I''ve added >>> my patch to execute cpu assigning/unassigning always in a tasklet on the >>> cpu to be moved. 
>>> >>> Here is the result: >>> >>> (XEN) cpupool_unassign_cpu(pool=0,cpu=6) >>> (XEN) cpupool_unassign_cpu(pool=0,cpu=6) ret -16 >>> (XEN) cpupool_unassign_cpu(pool=0,cpu=6) >>> (XEN) cpupool_unassign_cpu(pool=0,cpu=6) ret -16 >>> (XEN) cpupool_assign_cpu(pool=0,cpu=1) >>> (XEN) cpupool_assign_cpu(pool=0,cpu=1) ffff83083fff74c0 >>> (XEN) cpupool_assign_cpu(cpu=1) ret 0 >>> (XEN) cpupool_assign_cpu(pool=1,cpu=4) >>> (XEN) cpupool_assign_cpu(pool=1,cpu=4) ffff831002ad5e40 >>> (XEN) cpupool_assign_cpu(cpu=4) ret 0 >>> (XEN) cpu 4, weight 0,prv ffff831002ad5e40, dom 0: >>> (XEN) sdom->weight: 256, sdom->active_vcpu_count: 1 >>> (XEN) Xen BUG at sched_credit.c:570 >>> (XEN) ----[ Xen-4.1.0-rc5-pre x86_64 debug=y Tainted: C ]---- >>> (XEN) CPU: 4 >>> (XEN) RIP: e008:[<ffff82c4801197d7>] csched_tick+0x186/0x37f >>> (XEN) RFLAGS: 0000000000010086 CONTEXT: hypervisor >>> (XEN) rax: 0000000000000000 rbx: ffff830839d3ec30 rcx: 0000000000000000 >>> (XEN) rdx: ffff830839dcff18 rsi: 000000000000000a rdi: ffff82c4802542e8 >>> (XEN) rbp: ffff830839dcfe38 rsp: ffff830839dcfde8 r8: 0000000000000004 >>> (XEN) r9: ffff82c480213520 r10: 00000000fffffffc r11: 0000000000000001 >>> (XEN) r12: 0000000000000004 r13: ffff830839d3ec40 r14: ffff831002ad5e40 >>> (XEN) r15: ffff830839d66f90 cr0: 000000008005003b cr4: 00000000000026f0 >>> (XEN) cr3: 0000001020a98000 cr2: 00007fc5e9b79d98 >>> (XEN) ds: 0000 es: 0000 fs: 0000 gs: 0000 ss: e010 cs: e008 >>> (XEN) Xen stack trace from rsp=ffff830839dcfde8: >>> (XEN) ffff83083ffa3ba0 ffff831002ad5e40 0000000000000246 >>> ffff830839d6c000 >>> (XEN) 0000000000000000 ffff830839dd1100 0000000000000004 >>> ffff82c480119651 >>> (XEN) ffff831002b28018 ffff831002b28010 ffff830839dcfe68 >>> ffff82c480126204 >>> (XEN) 0000000000000002 ffff83083ffa3bb8 ffff830839dd1100 >>> 000000cae439ea7e >>> (XEN) ffff830839dcfeb8 ffff82c480126539 00007fc5e9fa5b20 >>> ffff830839dd1100 >>> (XEN) ffff831002b28010 0000000000000004 0000000000000004 >>> ffff82c4802b0880 >>> (XEN) ffff830839dcff18 ffffffffffffffff ffff830839dcfef8 >>> ffff82c480123647 >>> (XEN) ffff830839dcfed8 ffff830077eee000 00007fc5e9b79d98 >>> 00007fc5e9fa5b20 >>> (XEN) 0000000000000002 00007fff46826f20 ffff830839dcff08 >>> ffff82c4801236c2 >>> (XEN) 00007cf7c62300c7 ffff82c480206ad6 00007fff46826f20 >>> 0000000000000002 >>> (XEN) 00007fc5e9fa5b20 00007fc5e9b79d98 00007fff46827260 >>> 00007fff46826f50 >>> (XEN) 0000000000000246 0000000000000032 0000000000000000 >>> 00000000ffffffff >>> (XEN) 0000000000000009 00007fc5e9d9de1a 0000000000000003 >>> 0000000000004848 >>> (XEN) 00007fc5e9b7a000 0000010000000000 ffffffff800073f0 >>> 000000000000e033 >>> (XEN) 0000000000000246 ffff880f97b51fc8 000000000000e02b >>> 0000000000000000 >>> (XEN) 0000000000000000 0000000000000000 0000000000000000 >>> 0000000000000004 >>> (XEN) ffff830077eee000 00000043b9afd180 0000000000000000 >>> (XEN) Xen call trace: >>> (XEN) [<ffff82c4801197d7>] csched_tick+0x186/0x37f >>> (XEN) [<ffff82c480126204>] execute_timer+0x4e/0x6c >>> (XEN) [<ffff82c480126539>] timer_softirq_action+0xf6/0x239 >>> (XEN) [<ffff82c480123647>] __do_softirq+0x88/0x99 >>> (XEN) [<ffff82c4801236c2>] do_softirq+0x6a/0x7a >>> (XEN) >>> (XEN) >>> (XEN) **************************************** >>> (XEN) Panic on CPU 4: >>> (XEN) Xen BUG at sched_credit.c:570 >>> (XEN) **************************************** >>> >>> As you can see, a Dom0 vcpus is becoming active on a pool 1 cpu. The >>> BUG_ON >>> triggered in csched_acct() is a logical result of this. 
>>> >>> How this can happen I don''t know yet. >>> Anyone any idea? I''ll keep searching... >>> >>> >>> Juergen >>> >>> On 02/15/11 08:22, Juergen Gross wrote: >>>> On 02/14/11 18:57, George Dunlap wrote: >>>>> The good news is, I''ve managed to reproduce this on my local test >>>>> hardware with 1x4x2 (1 socket, 4 cores, 2 threads per core) using the >>>>> attached script. It''s time to go home now, but I should be able to >>>>> dig something up tomorrow. >>>>> >>>>> To use the script: >>>>> * Rename cpupool0 to "p0", and create an empty second pool, "p1" >>>>> * You can modify elements by adding "arg=val" as arguments. >>>>> * Arguments are: >>>>> + dryrun={true,false} Do the work, but don''t actually execute any xl >>>>> arguments. Default false. >>>>> + left: Number commands to execute. Default 10. >>>>> + maxcpus: highest numerical value for a cpu. Default 7 (i.e., 0-7 is >>>>> 8 cpus). >>>>> + verbose={true,false} Print what you''re doing. Default is true. >>>>> >>>>> The script sometimes attempts to remove the last cpu from cpupool0; in >>>>> this case, libxl will print an error. If the script gets an error >>>>> under that condition, it will ignore it; under any other condition, it >>>>> will print diagnostic information. >>>>> >>>>> What finally crashed it for me was this command: >>>>> # ./cpupool-test.sh verbose=false left=1000 >>>> Nice! >>>> With your script I finally managed to get the error, too. On my box (2 >>>> sockets >>>> a 6 cores) I had to use >>>> >>>> ./cpupool-test.sh verbose=false left=10000 maxcpus=11 >>>> >>>> to trigger it. >>>> Looking for more data now... >>>> >>>> >>>> Juergen >>>> >>>>> -George >>>>> >>>>> On Fri, Feb 11, 2011 at 7:39 AM, Andre >>>>> Przywara<andre.przywara@amd.com> wrote: >>>>>> Juergen Gross wrote: >>>>>>> On 02/10/11 15:18, Andre Przywara wrote: >>>>>>>> Andre Przywara wrote: >>>>>>>>> On 02/10/2011 07:42 AM, Juergen Gross wrote: >>>>>>>>>> On 02/09/11 15:21, Juergen Gross wrote: >>>>>>>>>>> Andre, George, >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> What seems to be interesting: I think the problem did always >>>>>>>>>>> occur >>>>>>>>>>> when >>>>>>>>>>> a new cpupool was created and the first cpu was moved to it. >>>>>>>>>>> >>>>>>>>>>> I think my previous assumption regarding the master_ticker >>>>>>>>>>> was not >>>>>>>>>>> too bad. >>>>>>>>>>> I think somehow the master_ticker of the new cpupool is becoming >>>>>>>>>>> active >>>>>>>>>>> before the scheduler is really initialized properly. This could >>>>>>>>>>> happen, if >>>>>>>>>>> enough time is spent between alloc_pdata for the cpu to be moved >>>>>>>>>>> and >>>>>>>>>>> the >>>>>>>>>>> critical section in schedule_cpu_switch(). >>>>>>>>>>> >>>>>>>>>>> The solution should be to activate the timers only if the >>>>>>>>>>> scheduler is >>>>>>>>>>> ready for them. >>>>>>>>>>> >>>>>>>>>>> George, do you think the master_ticker should be stopped in >>>>>>>>>>> suspend_ticker >>>>>>>>>>> as well? I still see potential problems for entering deep >>>>>>>>>>> C-States. >>>>>>>>>>> I think >>>>>>>>>>> I''ll prepare a patch which will keep the master_ticker active >>>>>>>>>>> for the >>>>>>>>>>> C-State case and migrate it for the schedule_cpu_switch() case. >>>>>>>>>> Okay, here is a patch for this. It ran on my 4-core machine >>>>>>>>>> without any >>>>>>>>>> problems. >>>>>>>>>> Andre, could you give it a try? >>>>>>>>> Did, but unfortunately it crashed as always. Tried twice and made >>>>>>>>> sure >>>>>>>>> I booted the right kernel. Sorry. 
>>>>>>>>> The idea with the race between the timer and the state changing >>>>>>>>> sounded very appealing, actually that was suspicious to me from >>>>>>>>> the >>>>>>>>> beginning. >>>>>>>>> >>>>>>>>> I will add some code to dump the state of all cpupools to the >>>>>>>>> BUG_ON >>>>>>>>> to see in which situation we are when the bug triggers. >>>>>>>> OK, here is a first try of this, the patch iterates over all CPU >>>>>>>> pools >>>>>>>> and outputs some data if the BUG_ON >>>>>>>> ((sdom->weight * sdom->active_vcpu_count)> weight_left) condition >>>>>>>> triggers: >>>>>>>> (XEN) CPU pool #0: 1 domains (SMP Credit Scheduler), mask: >>>>>>>> fffffffc003f >>>>>>>> (XEN) CPU pool #1: 0 domains (SMP Credit Scheduler), mask: fc0 >>>>>>>> (XEN) CPU pool #2: 0 domains (SMP Credit Scheduler), mask: 1000 >>>>>>>> (XEN) Xen BUG at sched_credit.c:1010 >>>>>>>> .... >>>>>>>> The masks look proper (6 cores per node), the bug triggers when the >>>>>>>> first CPU is about to be(?) inserted. >>>>>>> Sure? I''m missing the cpu with mask 2000. >>>>>>> I''ll try to reproduce the problem on a larger machine here (24 >>>>>>> cores, 4 >>>>>>> numa >>>>>>> nodes). >>>>>>> Andre, can you give me your xen boot parameters? Which xen changeset >>>>>>> are >>>>>>> you >>>>>>> running, and do you have any additional patches in use? >>>>>> The grub lines: >>>>>> kernel (hd1,0)/boot/xen-22858_debug_04.gz console=com1,vga >>>>>> com1=115200 >>>>>> module (hd1,0)/boot/vmlinuz-2.6.32.27_pvops console=tty0 >>>>>> console=ttyS0,115200 ro root=/dev/sdb1 xencons=hvc0 >>>>>> >>>>>> All of my experiments are use c/s 22858 as a base. >>>>>> If you use a AMD Magny-Cours box for your experiments (socket C32 or >>>>>> G34), >>>>>> you should add the following patch (removing the line) >>>>>> --- a/xen/arch/x86/traps.c >>>>>> +++ b/xen/arch/x86/traps.c >>>>>> @@ -803,7 +803,6 @@ static void pv_cpuid(struct cpu_user_regs *regs) >>>>>> __clear_bit(X86_FEATURE_SKINIT % 32,&c); >>>>>> __clear_bit(X86_FEATURE_WDT % 32,&c); >>>>>> __clear_bit(X86_FEATURE_LWP % 32,&c); >>>>>> - __clear_bit(X86_FEATURE_NODEID_MSR % 32,&c); >>>>>> __clear_bit(X86_FEATURE_TOPOEXT % 32,&c); >>>>>> break; >>>>>> case 5: /* MONITOR/MWAIT */ >>>>>> >>>>>> This is not necessary (in fact that reverts my patch c/s 22815), but >>>>>> raises >>>>>> the probability to trigger the bug, probably because it increases the >>>>>> pressure of the Dom0 scheduler. If you cannot trigger it with Dom0, >>>>>> try to >>>>>> create a guest with many VCPUs and squeeze it into a small CPU-pool. >>>>>> >>>>>> Good luck ;-) >>>>>> Andre. >>>>>> >>>>>> -- >>>>>> Andre Przywara >>>>>> AMD-OSRC (Dresden) >>>>>> Tel: x29712 >>>>>> >>>>>> >>>>>> _______________________________________________ >>>>>> Xen-devel mailing list >>>>>> Xen-devel@lists.xensource.com >>>>>> http://lists.xensource.com/xen-devel >>>>>> >>>>>> >>>>>> >>>>>> _______________________________________________ >>>>>> Xen-devel mailing list >>>>>> Xen-devel@lists.xensource.com >>>>>> http://lists.xensource.com/xen-devel >>>> >>> >>> -- >>> Juergen Gross Principal Developer Operating Systems >>> TSP ES&S SWE OS6 Telephone: +49 (0) 89 3222 2967 >>> Fujitsu Technology Solutions e-mail: >>> juergen.gross@ts.fujitsu.com >>> Domagkstr. 
>>> 28 Internet: ts.fujitsu.com
>>> D-80807 Muenchen Company details: ts.fujitsu.com/imprint.html

--
Juergen Gross                 Principal Developer Operating Systems
TSP ES&S SWE OS6                       Telephone: +49 (0) 89 3222 2967
Fujitsu Technology Solutions           e-mail: juergen.gross@ts.fujitsu.com
Domagkstr. 28                          Internet: ts.fujitsu.com
D-80807 Muenchen                       Company details: ts.fujitsu.com/imprint.html

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel
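For readers following along, here is a minimal, self-contained C sketch of the invariant behind the BUG_ON quoted above. The structures and the accounting function below are simplified stand-ins, not the real sched_credit.c code: every domain counted as active in a pool contributes weight * active_vcpu_count to that pool's recorded total weight, and the accounting pass subtracts these shares from weight_left one by one. A vcpu that becomes active under another pool's private data was never added to that total, so the subtraction no longer works out and the check fires, matching the log above (pool weight 0, dom0 weight 256, one active vcpu).

#include <assert.h>
#include <stdio.h>

/* Simplified stand-ins, not the real Xen structures. */
struct sdom {                      /* per-domain scheduler data */
    int weight;
    int active_vcpu_count;
};

struct csched_private {            /* per-pool scheduler data */
    int weight;                    /* sum of weight * active_vcpu_count
                                    * over the pool's own active domains */
    struct sdom *active[8];
    int nr_active;
};

/* Analogue of the accounting pass: each active domain's share is taken
 * out of weight_left; the assert mirrors the BUG_ON in the log above. */
static void acct_sketch(struct csched_private *prv)
{
    int weight_left = prv->weight;

    for (int i = 0; i < prv->nr_active; i++) {
        struct sdom *sdom = prv->active[i];
        assert(sdom->weight * sdom->active_vcpu_count <= weight_left);
        weight_left -= sdom->weight * sdom->active_vcpu_count;
    }
    printf("accounting consistent, weight_left=%d\n", weight_left);
}

int main(void)
{
    struct sdom dom0 = { .weight = 256, .active_vcpu_count = 1 };
    struct csched_private pool1 = { .weight = 0, .nr_active = 0 };

    /* The racy cpu move: dom0 becomes active under pool1's private data,
     * although pool1.weight never accounted for it, so the assert fires. */
    pool1.active[pool1.nr_active++] = &dom0;
    acct_sketch(&pool1);
    return 0;
}

Compiled and run, the sketch aborts on the assert, which plays the role of the hypervisor BUG in the trace.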
Andre Przywara
2011-Feb-21 14:45 UTC
Re: [Xen-devel] Hypervisor crash(!) on xl cpupool-numa-split
Juergen Gross wrote:
> On 02/21/11 11:00, Andre Przywara wrote:
>> George Dunlap wrote:
>>> Andre (and Juergen), can you try again with the attached patch?
>> I applied this patch on top of 22931 and it did _not_ work.
>> The crash occurred almost immediately after I started my script, so the
>> same behaviour as without the patch.
>
> Did you try my patch addressing races in the scheduler when moving cpus
> between cpupools?

Sorry, I tried yours first, but it didn't apply cleanly on my particular
tree (sched_jg_fix ;-). So I tested George's first.

> I've attached it again. For me it works quite well, while George's patch
> seems not to be enough (machine hanging after some tests with cpupools).

OK, it now applied after a rebase.
And yes, I didn't see a crash! At least until the script stopped while a
lot of these messages appeared:
(XEN) do_IRQ: 0.89 No irq handler for vector (irq -1)

That is what I reported before and is most probably totally unrelated to
this issue.
So I consider this fix working!
I will try to match my recent theories and debug results with your patch
to see whether this fits.

> OTOH I can't reproduce an error as fast as you even without any patch :-)
>
>> (attached my script for reference, though it will most likely only make
>> sense on bigger NUMA machines)
>
> Yeah, on my 2-node system I need several hundred tries to get an error.
> But it seems to be more effective than George's script.

I consider the large over-provisioning the reason. With Dom0 having 48
VCPUs finally squashed together onto 6 pCPUs, my script triggered the
crash by the second run at the latest.
With your patch it made 24 iterations before the other bug kicked in.

Thanks very much!
Andre.

>
>
> Juergen
>
>> Regards,
>> Andre.
>>
>>
>>> What the patch basically does is try to make "cpu_disable_scheduler()"
>>> do what it seems to say it does. :-) Namely, the various
>>> scheduler-related interrupts (both the per-cpu ticks and the master
>>> tick) are part of the scheduler, so disable them before doing anything,
>>> and don't enable them until the cpu is really ready to go again.
>>>
>>> To be precise:
>>> * cpu_disable_scheduler() disables ticks
>>> * scheduler_cpu_switch() only enables ticks if adding a cpu to a pool,
>>> and does it after inserting the idle vcpu
>>> * Modify semantics, s.t., {alloc,free}_pdata() don't actually start or
>>> stop tickers
>>> + Call tick_{resume,suspend} in cpu_{up,down}, respectively
>>> * Modify credit1's tick_{suspend,resume} to handle the master ticker
>>> as well.
>>>
>>> With this patch (if dom0 doesn't get wedged due to all 8 vcpus being
>>> on one pcpu), I can perform thousands of operations successfully.
>>>
>>> (NB this is not ready for application yet, I just wanted to check to
>>> see if it fixes Andre's problem)
>>>
>>> -George
>>>
>>> On Wed, Feb 16, 2011 at 9:47 AM, Juergen Gross
>>> <juergen.gross@ts.fujitsu.com> wrote:
>>>> Okay, I have some more data.
>>>>
>>>> I activated cpupool_dprintk() and included checks in sched_credit.c to
>>>> test for weight inconsistencies. To reduce race possibilities I've
>>>> added my patch to execute cpu assigning/unassigning always in a
>>>> tasklet on the cpu to be moved.
>>>> [...]

--
Andre Przywara
AMD-OSRC (Dresden)
Tel: x29712

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel
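To make the ordering in George's patch description above concrete, here is a small C sketch. The pool structure and the helpers (tick_suspend, tick_resume, move_cpu) are made-up illustrations, not the real Xen interfaces and not the actual patch; the only point carried over is the ordering: all scheduler tickers are stopped before a cpu's per-pool scheduler data is torn down, and they are restarted only after the target pool's data, including the idle vcpu, is fully in place, so no timer can fire against half-initialized scheduler state.

#include <stdbool.h>
#include <stdio.h>

/* Illustrative only: a simplified pool with a single flag standing in for
 * both the per-cpu ticks and the master ticker. */
struct pool {
    const char *name;
    bool ticks_running;
};

static void tick_suspend(struct pool *p)
{
    p->ticks_running = false;
    printf("tickers stopped for %s\n", p->name);
}

static void tick_resume(struct pool *p)
{
    p->ticks_running = true;
    printf("tickers running for %s\n", p->name);
}

/* Move a cpu between pools without a window in which a ticker could fire
 * against half-initialized scheduler state. */
static void move_cpu(int cpu, struct pool *from, struct pool *to)
{
    tick_suspend(from);                      /* stop all timers first */
    printf("cpu %d: free old pdata, alloc new pdata\n", cpu);
    printf("cpu %d: insert idle vcpu into %s\n", cpu, to->name);
    tick_resume(to);                         /* restart only when ready */
}

int main(void)
{
    struct pool pool0 = { "Pool-0", true };
    struct pool pool1 = { "p1", false };

    move_cpu(4, &pool0, &pool1);
    return 0;
}

The design choice the sketch illustrates is simply that timer activation belongs to the scheduler's own lifecycle, not to the per-cpu data allocation routines.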
Juergen Gross
2011-Feb-21 14:50 UTC
Re: [Xen-devel] Hypervisor crash(!) on xl cpupool-numa-split
On 02/21/11 15:45, Andre Przywara wrote:
> [...]
> So I consider this fix working!
> [...]
> With your patch it made 24 iterations before the other bug kicked in.

Okay, I'll prepare an official patch. It might take a few days, as I'm
not in the office until Thursday.


Juergen

--
Juergen Gross                 Principal Developer Operating Systems
TSP ES&S SWE OS6                       Telephone: +49 (0) 89 3222 2967
Fujitsu Technology Solutions           e-mail: juergen.gross@ts.fujitsu.com
Domagkstr. 28                          Internet: ts.fujitsu.com
D-80807 Muenchen                       Company details: ts.fujitsu.com/imprint.html

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel