Andre Przywara
2011-Jan-27 23:18 UTC
[Xen-devel] Hypervisor crash(!) on xl cpupool-numa-split
Hi,

when I boot my machine without restricting Dom0 (dom0_mem= dom0_max_vcpus=)
I get a _hypervisor_ crash when I run
# xl cpupool-numa-split
If Dom0's resources are limited on the Xen cmdline, everything works fine.
The crash dump points to a scheduling problem with weights, so I assume the
NUMA distribution algorithm somehow fools the hypervisor completely.

I will investigate this further tomorrow, but maybe someone has a good idea.

Regards,
Andre.

root@dosorca:/data/images# xl cpupool-numa-split
(XEN) Xen BUG at sched_credit.c:990
(XEN) ----[ Xen-4.1.0-rc2-pre  x86_64  debug=y  Not tainted ]----
(XEN) CPU:    0
(XEN) RIP:    e008:[<ffff82c4801180f8>] csched_acct+0x11f/0x419
(XEN) RFLAGS: 0000000000010006   CONTEXT: hypervisor
(XEN) rax: 0000000000000010   rbx: 0000000000000f00   rcx: 0000000000000100
(XEN) rdx: 0000000000001000   rsi: ffff830437ffa600   rdi: 0000000000000010
(XEN) rbp: ffff82c480297e10   rsp: ffff82c480297d80   r8:  0000000000000100
(XEN) r9:  0000000000000006   r10: ffff82c4802d4100   r11: 000000afc7df0edf
(XEN) r12: ffff830437ffa5e0   r13: ffff82c480117fd9   r14: ffff830437f9f2e8
(XEN) r15: ffff830434321ec0   cr0: 000000008005003b   cr4: 00000000000006f0
(XEN) cr3: 000000080df4e000   cr2: ffff88179af79618
(XEN) ds: 002b   es: 002b   fs: 0000   gs: 0000   ss: e010   cs: e008
(XEN) Xen stack trace from rsp=ffff82c480297d80:
(XEN)    0000000000000282 fffffed4802d3f80 0000000000000eff ffff830437ffa5e0
(XEN)    ffff830437ffa5e8 ffff830437ffa870 ffff830437ffa5e0 0000000000000282
(XEN)    ffff830437ffa5e8 00002a3037ffa870 00000f0000000f00 0000000000000000
(XEN)    ffff82c400000000 ffff82c4802d3f80 ffff830437ffa5e0 ffff82c480117fd9
(XEN)    ffff830437f9f2e8 ffff830437f9f2e0 ffff82c480297e40 ffff82c480125f34
(XEN)    0000000000000002 ffff830437ffa600 ffff82c4802d3f80 000000afb6f8667f
(XEN)    ffff82c480297e90 ffff82c480126259 ffff82c48024ae20 ffff82c4802d3f80
(XEN)    ffff830437f9f2e0 0000000000000000 0000000000000000 ffff82c4802b0880
(XEN)    ffff82c480297f18 ffffffffffffffff ffff82c480297ed0 ffff82c480123327
(XEN)    ffff82c4802d4a00 ffff82c480297f18 ffff82c48024ae20 ffff82c480297f18
(XEN)    000000afb6abd652 ffff82c4802d3ec0 ffff82c480297ee0 ffff82c4801233a2
(XEN)    ffff82c480297f10 ffff82c4801563f5 0000000000000000 ffff8300c7cd6000
(XEN)    0000000000000000 ffff8300c7ad4000 ffff82c480297d48 0000000000000000
(XEN)    0000000000000000 0000000000000000 ffffffff81a69060 ffff8817a8503f10
(XEN)    ffff8817a8503fd8 0000000000000246 ffff8817a8503e80 ffff880000000001
(XEN)    0000000000000000 0000000000000000 ffffffff810093aa 000000aafab2f86e
(XEN)    00000000deadbeef 00000000deadbeef 0000010000000000 ffffffff810093aa
(XEN)    000000000000e033 0000000000000246 ffff8817a8503ef8 000000000000e02b
(XEN)    0000000000000000 0000000000000000 0000000000000000 0000000000000000
(XEN)    0000000000000000 ffff8300c7cd6000 0000000000000000 0000000000000000
(XEN) Xen call trace:
(XEN)    [<ffff82c4801180f8>] csched_acct+0x11f/0x419
(XEN)    [<ffff82c480125f34>] execute_timer+0x4e/0x6c
(XEN)    [<ffff82c480126259>] timer_softirq_action+0xf2/0x245
(XEN)    [<ffff82c480123327>] __do_softirq+0x88/0x99
(XEN)    [<ffff82c4801233a2>] do_softirq+0x6a/0x7a
(XEN)    [<ffff82c4801563f5>] idle_loop+0x6a/0x6f
(XEN)
(XEN)
(XEN) ****************************************
(XEN) Panic on CPU 0:
(XEN) Xen BUG at sched_credit.c:990
(XEN) ****************************************
(XEN)
(XEN) Reboot in five seconds...

--
Andre Przywara
AMD-OSRC (Dresden)
Tel: x29712
Juergen Gross
2011-Jan-28 06:47 UTC
Re: [Xen-devel] Hypervisor crash(!) on xl cpupool-numa-split
On 01/28/11 00:18, Andre Przywara wrote:
> Hi,
>
> when I boot my machine without restricting Dom0 (dom0_mem=
> dom0_max_vcpus=) I get a _hypervisor_ crash when I run
> # xl cpupool-numa-split
> If Dom0's resources are limited on the Xen cmdline, everything works fine.
> The crash dump points to a scheduling problem with weights, so I assume
> the NUMA distribution algorithm somehow fools the hypervisor completely.
>
> I will investigate this further tomorrow, but maybe someone has a good
> idea.

I've seen this once with an older cpupool version on a 24 processor machine.
It was NOT related to NUMA, but did occur only on reboot after a Dom0 panic.
The machine had an init script creating a cpupool and populating it with
cpus. The machine was then in a panic loop due to the BUG in csched_acct
until it was reset manually. After the reset the problem was gone.

As I was never able to reproduce the problem later (the same software is
running on dozens of machines!), I assumed there was a problem related to
the first Dom0 panic, maybe some corrupted BIOS tables.

Can the crash be reproduced easily?


Juergen

> Regards,
> Andre.
>
> root@dosorca:/data/images# xl cpupool-numa-split
> (XEN) Xen BUG at sched_credit.c:990
> [...]
> (XEN) Reboot in five seconds...

--
Juergen Gross                 Principal Developer Operating Systems
TSP ES&S SWE OS6              Telephone: +49 (0) 89 3222 2967
Fujitsu Technology Solutions  e-mail: juergen.gross@ts.fujitsu.com
Domagkstr. 28                 Internet: ts.fujitsu.com
D-80807 Muenchen              Company details: ts.fujitsu.com/imprint.html
Andre Przywara
2011-Jan-28 11:07 UTC
Re: [Xen-devel] Hypervisor crash(!) on xl cpupool-numa-split
Juergen Gross wrote:
> On 01/28/11 00:18, Andre Przywara wrote:
>> [...]
>
> I've seen this once with an older cpupool version on a 24 processor
> machine. It was NOT related to NUMA, but did occur only on reboot after
> a Dom0 panic. The machine had an init script creating a cpupool and
> populating it with cpus. The machine was then in a panic loop due to the
> BUG in csched_acct until it was reset manually. After the reset the
> problem was gone.
>
> As I was never able to reproduce the problem later (the same software is
> running on dozens of machines!), I assumed there was a problem related to
> the first Dom0 panic, maybe some corrupted BIOS tables.
>
> Can the crash be reproduced easily?

Yes.
If I don't specify dom0_max_vcpus= and dom0_mem= on the Xen cmdline, I
can reliably trigger the crash with xl cpupool-numa-split.
Omitting only dom0_max_vcpus= does not suffice.

Will continue after lunch-break ;-)

Regards,
Andre.

>
> Juergen
>
>> Regards,
>> Andre.
>>
>> root@dosorca:/data/images# xl cpupool-numa-split
>> (XEN) Xen BUG at sched_credit.c:990
>> [...]
>> (XEN) Reboot in five seconds...

--
Andre Przywara
AMD-OSRC (Dresden)
Tel: x29712
George Dunlap
2011-Jan-28 11:13 UTC
Re: [Xen-devel] Hypervisor crash(!) on xl cpupool-numa-split
Hmm, strange... looks like it has something to do with the code which keeps
track of which vcpus are earning credits. You say this is done immediately
after boot, with no VMs running other than dom0?

What are the dom0_max_vcpus and dom0_mem settings required to make it work?

 -George

On Fri, Jan 28, 2011 at 6:47 AM, Juergen Gross
<juergen.gross@ts.fujitsu.com> wrote:
> On 01/28/11 00:18, Andre Przywara wrote:
>> [...]
>
> I've seen this once with an older cpupool version on a 24 processor
> machine.
> [...]
>
> Can the crash be reproduced easily?
>
>> root@dosorca:/data/images# xl cpupool-numa-split
>> (XEN) Xen BUG at sched_credit.c:990
>> [...]
>> (XEN) Reboot in five seconds...
Juergen Gross
2011-Jan-28 11:44 UTC
Re: [Xen-devel] Hypervisor crash(!) on xl cpupool-numa-split
On 01/28/11 12:07, Andre Przywara wrote:
> Juergen Gross wrote:
>> On 01/28/11 00:18, Andre Przywara wrote:
>>> [...]
>>
>> [...]
>>
>> Can the crash be reproduced easily?
> Yes.
> If I don't specify dom0_max_vcpus= and dom0_mem= on the Xen cmdline, I
> can reliably trigger the crash with xl cpupool-numa-split.
> Omitting only dom0_max_vcpus= does not suffice.

Do I understand correctly?
No crash with only dom0_max_vcpus= and no crash with only dom0_mem= ?

Could you try this patch?

diff -r b59f04eb8978 xen/common/schedule.c
--- a/xen/common/schedule.c     Fri Jan 21 18:06:23 2011 +0000
+++ b/xen/common/schedule.c     Fri Jan 28 12:42:46 2011 +0100
@@ -1301,7 +1301,9 @@ void schedule_cpu_switch(unsigned int cp

     idle = idle_vcpu[cpu];
     ppriv = SCHED_OP(new_ops, alloc_pdata, cpu);
+    BUG_ON(ppriv == NULL);
     vpriv = SCHED_OP(new_ops, alloc_vdata, idle, idle->domain->sched_priv);
+    BUG_ON(vpriv == NULL);

     pcpu_schedule_lock_irqsave(cpu, flags);


--
Juergen Gross                 Principal Developer Operating Systems
TSP ES&S SWE OS6              Telephone: +49 (0) 89 3222 2967
Fujitsu Technology Solutions  e-mail: juergen.gross@ts.fujitsu.com
Domagkstr. 28                 Internet: ts.fujitsu.com
D-80807 Muenchen              Company details: ts.fujitsu.com/imprint.html
Andre Przywara
2011-Jan-28 13:05 UTC
Re: [Xen-devel] Hypervisor crash(!) on xl cpupool-numa-split
George Dunlap wrote:
> Hmm, strange... looks like it has something to do with the code which
> keeps track of which vcpus are earning credits. You say this is done
> immediately after boot, with no VMs running other than dom0?

Right, after Dom0's prompt I just start xl cpupool-numa-split and the
machine crashes.

> What are the dom0_max_vcpus and dom0_mem settings required to make it work?

dom0_mem=8192M dom0_max_vcpus=6:  works
dom0_mem=8192M:                   works
dom0_max_vcpus=6:                 works
(no settings):                    crashes
dom0_mem=20480M dom0_max_vcpus=8: works

The machine has 8 nodes with 6 CPUs each; the nodes have alternating 16GB
and 8GB of memory (four 12-core (MCM, aka dual-node) Opterons with 96GB RAM
in total).

If I try to reproduce the actions of xl cpupool-numa-split via a shell
script, it also crashes, just before the creation of the last pool. I will
insert some instrumentation into the code to find the offending action.

Regards,
Andre.

> On Fri, Jan 28, 2011 at 6:47 AM, Juergen Gross
> <juergen.gross@ts.fujitsu.com> wrote:
>> [...]

--
Andre Przywara
AMD-OSRC (Dresden)
Tel: x29712
Andre Przywara
2011-Jan-28 13:14 UTC
Re: [Xen-devel] Hypervisor crash(!) on xl cpupool-numa-split
> Do I understand correctly?
> No crash with only dom0_max_vcpus= and no crash with only dom0_mem= ?

Yes, see my previous mail to George.

> Could you try this patch?

Ok, the crash dump is as follows:

(XEN) Xen BUG at sched_credit.c:384
(XEN) ----[ Xen-4.1.0-rc2-pre  x86_64  debug=y  Not tainted ]----
(XEN) CPU:    2
(XEN) RIP:    e008:[<ffff82c480117fa0>] csched_alloc_pdata+0x146/0x17f
(XEN) RFLAGS: 0000000000010093   CONTEXT: hypervisor
(XEN) rax: ffff830434322000   rbx: ffff830434418748   rcx: 0000000000000024
(XEN) rdx: ffff82c4802d3ec0   rsi: 0000000000000003   rdi: ffff8304343c9100
(XEN) rbp: ffff83043457fce8   rsp: ffff83043457fca8   r8:  0000000000000001
(XEN) r9:  ffff830434418748   r10: ffff82c48021a0a0   r11: 0000000000000286
(XEN) r12: 0000000000000024   r13: ffff83123a3b2b60   r14: ffff830434418730
(XEN) r15: 0000000000000024   cr0: 000000008005003b   cr4: 00000000000006f0
(XEN) cr3: 00000008061df000   cr2: ffff8817a21f87a0
(XEN) ds: 0000   es: 0000   fs: 0000   gs: 0000   ss: e010   cs: e008
(XEN) Xen stack trace from rsp=ffff83043457fca8:
(XEN)    ffff83043457fcb8 ffff83123a3b2b60 0000000000000286 0000000000000024
(XEN)    ffff830434418820 ffff83123a3b2a70 0000000000000024 ffff82c4802b0880
(XEN)    ffff83043457fd58 ffff82c48011fa63 ffff82f60102aa80 0000000000081554
(XEN)    ffff8300c7cfa000 0000000000000000 0000400000000000 ffff82c480248e00
(XEN)    0000000000000002 0000000000000024 ffff830434418820 0000000000305000
(XEN)    ffff82c4802550e4 ffff82c4802b0880 ffff83043457fd78 ffff82c48010188c
(XEN)    ffff83043457fe40 0000000000000024 ffff83043457fdb8 ffff82c480101b94
(XEN)    ffff83043457fdb8 ffff82c4801836f2 fffffffe00000286 ffff83043457ff18
(XEN)    0000000002170004 0000000000305000 ffff83043457fef8 ffff82c480125281
(XEN)    ffff83043457fdd8 0000000180153c9d 0000000000000000 ffff82c4801068f8
(XEN)    0000000000000296 ffff8300c7e0a1c8 aaaaaaaaaaaaaaaa 0000000000000000
(XEN)    ffff88007d1ac170 ffff88007d1ac170 ffff83043457fef8 ffff82c480113d8a
(XEN)    ffff83043457fe78 ffff83043457fe88 0000000800000012 0000000600000004
(XEN)    0000000000000000 ffffffff00000024 0000000000000000 00007fac2e0e5a00
(XEN)    0000000002170000 0000000000000000 0000000000000000 ffffffffffffffff
(XEN)    0000000000000000 0000000000000080 000000000000002f 0000000002170004
(XEN)    0000000002172004 0000000002174004 00007fff878f1c80 0000000000000033
(XEN)    ffff83043457fed8 ffff8300c7e0a000 00007fff878f1b30 0000000000305000
(XEN)    0000000000000003 0000000000000003 00007cfbcba800c7 ffff82c480207dd8
(XEN)    ffffffff8100946a 0000000000000023 0000000000000003 0000000000000003
(XEN) Xen call trace:
(XEN)    [<ffff82c480117fa0>] csched_alloc_pdata+0x146/0x17f
(XEN)    [<ffff82c48011fa63>] schedule_cpu_switch+0x75/0x1eb
(XEN)    [<ffff82c48010188c>] cpupool_assign_cpu_locked+0x44/0x8b
(XEN)    [<ffff82c480101b94>] cpupool_do_sysctl+0x1fb/0x461
(XEN)    [<ffff82c480125281>] do_sysctl+0x921/0xa30
(XEN)    [<ffff82c480207dd8>] syscall_enter+0xc8/0x122
(XEN)
(XEN)
(XEN) ****************************************
(XEN) Panic on CPU 2:
(XEN) Xen BUG at sched_credit.c:384
(XEN) ****************************************
(XEN)
(XEN) Reboot in five seconds...

Regards,
Andre.

> diff -r b59f04eb8978 xen/common/schedule.c
> [...]

--
Andre Przywara
AMD-OSRC (Dresden)
Tel: x29712
Juergen Gross
2011-Jan-31 07:04 UTC
Re: [Xen-devel] Hypervisor crash(!) on xl cpupool-numa-split
On 01/28/11 14:14, Andre Przywara wrote:
>> Do I understand correctly?
>> No crash with only dom0_max_vcpus= and no crash with only dom0_mem= ?
> Yes, see my previous mail to George.
>
>> Could you try this patch?
> Ok, the crash dump is as follows:

Hmm, is the new crash reproducible as well?
Seems not to be directly related to my diagnosis patch...

Currently I have no NUMA machine available. I tried to use the numa=fake=...
boot parameter, but this seems to fake only NUMA memory nodes; all cpus are
still in node 0:

(XEN) 'u' pressed -> dumping numa info (now-0x120:5D5E0203)
(XEN) idx0 -> NODE0 start->0 size->524288
(XEN) phys_to_nid(0000000000001000) -> 0 should be 0
(XEN) idx1 -> NODE1 start->524288 size->524288
(XEN) phys_to_nid(0000000080001000) -> 1 should be 1
(XEN) idx2 -> NODE2 start->1048576 size->524288
(XEN) phys_to_nid(0000000100001000) -> 2 should be 2
(XEN) idx3 -> NODE3 start->1572864 size->1835008
(XEN) phys_to_nid(0000000180001000) -> 3 should be 3
(XEN) CPU0 -> NODE0
(XEN) CPU1 -> NODE0
(XEN) CPU2 -> NODE0
(XEN) CPU3 -> NODE0
(XEN) Memory location of each domain:
(XEN) Domain 0 (total: 3003121):
(XEN)     Node 0: 433864
(XEN)     Node 1: 258522
(XEN)     Node 2: 514315
(XEN)     Node 3: 1796420

I suspect a problem with the __cpuinit stuff overwriting some node info.
Andre, could you check this? I hope to reproduce your problem on my machine.

> (XEN) Xen BUG at sched_credit.c:384
> [...]
> (XEN) Reboot in five seconds...


Juergen

--
Juergen Gross                 Principal Developer Operating Systems
TSP ES&S SWE OS6              Telephone: +49 (0) 89 3222 2967
Fujitsu Technology Solutions  e-mail: juergen.gross@ts.fujitsu.com
Domagkstr. 28                 Internet: ts.fujitsu.com
D-80807 Muenchen              Company details: ts.fujitsu.com/imprint.html
Andre Przywara
2011-Jan-31 14:59 UTC
Re: [Xen-devel] Hypervisor crash(!) on xl cpupool-numa-split
Juergen Gross wrote:
> On 01/28/11 14:14, Andre Przywara wrote:
>>> Do I understand correctly?
>>> No crash with only dom0_max_vcpus= and no crash with only dom0_mem= ?
>> Yes, see my previous mail to George.
>>
>>> Could you try this patch?
>> Ok, the crash dump is as follows:
>
> Hmm, is the new crash reproducible as well?
> Seems not to be directly related to my diagnosis patch...

Right, that was also my impression.

I seemed to get a bit further, though:
By accident I found that in c/s 22846 the issue is fixed; it works now
without crashing. I bisected it down to my own patch, which disables the
NODEID_MSR in Dom0. I could confirm this theory by a) applying this single
line (clear_bit(NODEID_MSR)) to 22799 and _not_ seeing it crash, and b)
removing this line from 22846 and seeing it crash.

So my theory is that Dom0 sees different nodes on its virtual CPUs via the
physical NodeID MSR, but this association can (and will) be changed at any
moment by the Xen scheduler. So Dom0 will build a bogus topology based upon
these values. As soon as all vCPUs of Dom0 are contained in one node (node
0; this is caused by the cpupool-numa-split call), the Xen scheduler somehow
hiccups.
So it seems to be a bad combination of the NodeID MSR (on newer AMD
platforms: sockets C32 and G34) and a NodeID-MSR-aware Dom0 (2.6.32.27).
Since this is a hypervisor crash, I assume that the bug is still there, only
the current tip makes it much less likely to be triggered.

Hope that helps, I will dig deeper now.

Regards,
Andre.

> Currently I have no NUMA machine available. I tried to use the numa=fake=...
> boot parameter, but this seems to fake only NUMA memory nodes; all cpus are
> still in node 0:
> [...]
>
> I suspect a problem with the __cpuinit stuff overwriting some node info.
> Andre, could you check this? I hope to reproduce your problem on my machine.
>
>> (XEN) Xen BUG at sched_credit.c:384
>> [...]
>> (XEN) Reboot in five seconds...
>
> Juergen

--
Andre Przywara
AMD-Operating System Research Center (OSRC), Dresden, Germany
George Dunlap
2011-Jan-31 15:28 UTC
Re: [Xen-devel] Hypervisor crash(!) on xl cpupool-numa-split
On Mon, Jan 31, 2011 at 2:59 PM, Andre Przywara <andre.przywara@amd.com> wrote:
> Right, that was also my impression.
>
> I seemed to get a bit further, though:
> By accident I found that in c/s 22846 the issue is fixed; it works now
> without crashing. I bisected it down to my own patch, which disables the
> NODEID_MSR in Dom0. I could confirm this theory by a) applying this single
> line (clear_bit(NODEID_MSR)) to 22799 and _not_ seeing it crash, and b)
> removing this line from 22846 and seeing it crash.
>
> So my theory is that Dom0 sees different nodes on its virtual CPUs via the
> physical NodeID MSR, but this association can (and will) be changed at any
> moment by the Xen scheduler. So Dom0 will build a bogus topology based upon
> these values. As soon as all vCPUs of Dom0 are contained in one node (node
> 0; this is caused by the cpupool-numa-split call), the Xen scheduler somehow
> hiccups.
> So it seems to be a bad combination of the NodeID MSR (on newer AMD
> platforms: sockets C32 and G34) and a NodeID-MSR-aware Dom0 (2.6.32.27).
> Since this is a hypervisor crash, I assume that the bug is still there, only
> the current tip makes it much less likely to be triggered.
>
> Hope that helps, I will dig deeper now.

Thanks. The crashes you're getting are in fact very strange. They have to
do with assumptions that the credit scheduler makes as part of its
accounting process. It would only make sense for those to be triggered if
a vcpu was moved from one pool to another pool without the proper
accounting being done. (Specifically, each vcpu is classified as either
"active" or "inactive", and each scheduler instance keeps track of the
total weight of all "active" vcpus. The BUGs you're tripping over are
saying that this invariant has been violated.) However, I've looked at the
cpupools vcpu-migrate code, and it looks like it does everything right.
So I'm a bit mystified. My only thought is that possibly a cpumask
somewhere wasn't getting set properly, such that a vcpu was being run on a
cpu from another pool.

Unfortunately I can't take a good look at this right now; hopefully I'll
be able to take a look next week.

Andre, if you were keen, you might go through the credit code and put in a
bunch of ASSERTs that the current pcpu is in the mask of the current vcpu,
and that the current vcpu is assigned to the pool of the current pcpu, and
so on.

 -George
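To make the invariant George describes concrete, here is a small stand-alone
toy model (not Xen code; all type and function names are made up for
illustration): each scheduler instance caches the summed weight of its
"active" vcpus, and moving a vcpu to another pool without running the
accounting hooks leaves that cache stale, which is exactly the condition the
credit scheduler's BUG_ON complains about.

#include <assert.h>
#include <stdio.h>

/* Toy model of the credit scheduler's accounting bookkeeping. */
struct toy_vcpu  { int weight; int active; int pool; };
struct toy_sched { int weight_total; };      /* one instance per cpupool */

static void acct_start(struct toy_sched *s, struct toy_vcpu *v)
{
    if ( !v->active ) { v->active = 1; s->weight_total += v->weight; }
}

static void acct_stop(struct toy_sched *s, struct toy_vcpu *v)
{
    if ( v->active ) { v->active = 0; s->weight_total -= v->weight; }
}

/* The invariant behind the BUG_ON: cached total == real sum for this pool. */
static void check(struct toy_sched *s, struct toy_vcpu *v, int n, int pool)
{
    int sum = 0, i;
    for ( i = 0; i < n; i++ )
        if ( v[i].active && v[i].pool == pool )
            sum += v[i].weight;
    assert(sum == s->weight_total);
}

int main(void)
{
    struct toy_sched pool0 = { 0 }, pool1 = { 0 };
    struct toy_vcpu v[2] = { { 256, 0, 0 }, { 256, 0, 0 } };

    acct_start(&pool0, &v[0]);
    acct_start(&pool0, &v[1]);
    check(&pool0, v, 2, 0);        /* fine: 512 == 512 */
    printf("bookkeeping consistent\n");

    v[1].pool = 1;                 /* "migrate" the vcpu to another pool ... */
    acct_start(&pool1, &v[1]);     /* ... but it is still marked active, so
                                      pool1 gains no weight and pool0 keeps
                                      256 too much */
    check(&pool0, v, 2, 0);        /* assertion fires, like the Xen BUG_ON */
    acct_stop(&pool1, &v[1]);      /* not reached */
    return 0;
}

The correct sequence would be acct_stop() on the old pool before
acct_start() on the new one; the assertion firing models a vcpu whose
accounting state was left behind in the old pool.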
Andre Przywara
2011-Feb-01 16:32 UTC
Re: [Xen-devel] Hypervisor crash(!) on xl cpupool-numa-split
Hi folks,

I asked Stephan Diestelhorst for help, and after I convinced him that
removing credit and making SEDF the default again is not an option, he
worked together with me on this ;-) Many thanks for that!
We haven't come to a final solution yet, but we could gather some debug
data. I will simply dump some data here, maybe somebody has got a clue. We
will work further on this tomorrow.

First I replaced the BUG_ON with some printks to get some insight:
(XEN) sdom->active_vcpu_count: 18
(XEN) sdom->weight: 256
(XEN) weight_left: 4096, weight_total: 4096
(XEN) credit_balance: 0, credit_xtra: 0, credit_cap: 0
(XEN) Xen BUG at sched_credit.c:591
(XEN) ----[ Xen-4.1.0-rc2-pre  x86_64  debug=y  Not tainted ]----

So that shows that the number of VCPUs is not up-to-date with the computed
weight sum; we have seen a difference of one or two VCPUs (in this case the
weight has been computed from 16 VCPUs). It also shows that the assertion
kicks in in the first iteration of the loop, where weight_left and
weight_total are still equal.

So I additionally instrumented alloc_pdata and free_pdata; the unprefixed
lines come from a shell script mimicking the functionality of
cpupool-numa-split.
------------
Removing CPUs from Pool 0
Creating new pool
Using config file "cpupool.test"
cpupool name:   Pool-node6
scheduler:      credit
number of cpus: 1
(XEN) adding CPU 36, now 1 CPUs
(XEN) removing CPU 36, remaining: 17
Populating new pool
(XEN) sdom->active_vcpu_count: 9
(XEN) sdom->weight: 256
(XEN) weight_left: 2048, weight_total: 2048
(XEN) credit_balance: 0, credit_xtra: 0, credit_cap: 0
(XEN) adding CPU 37, now 2 CPUs
(XEN) removing CPU 37, remaining: 16
(XEN) adding CPU 38, now 3 CPUs
(XEN) removing CPU 38, remaining: 15
(XEN) adding CPU 39, now 4 CPUs
(XEN) removing CPU 39, remaining: 14
(XEN) adding CPU 40, now 5 CPUs
(XEN) removing CPU 40, remaining: 13
(XEN) sdom->active_vcpu_count: 17
(XEN) sdom->weight: 256
(XEN) weight_left: 4096, weight_total: 4096
(XEN) credit_balance: 0, credit_xtra: 0, credit_cap: 0
(XEN) adding CPU 41, now 6 CPUs
(XEN) removing CPU 41, remaining: 12
...

Two things startled me:
1) There is quite some delay between the "Removing CPUs" message from the
script and the actual HV printk showing it's done; why is that not
synchronous? Looking at the code it shows that __csched_vcpu_acct_start()
is eventually triggered by a timer; shouldn't that be triggered
synchronously by add/removal events?
2) It clearly shows that each CPU gets added to the new pool _before_ it
gets removed from the old one (Pool-0); isn't that violating the "only one
pool per CPU" rule? Even if that is fine for a short period of time, maybe
the timer kicks in in this very moment, resulting in violated invariants?

Yours confused,
Andre.

George Dunlap wrote:
> [...]
>
> Andre, if you were keen, you might go through the credit code and put in
> a bunch of ASSERTs that the current pcpu is in the mask of the current
> vcpu, and that the current vcpu is assigned to the pool of the current
> pcpu, and so on.
>
>  -George

--
Andre Przywara
AMD-Operating System Research Center (OSRC), Dresden, Germany
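For reference, the debug output quoted above is consistent with
instrumentation of roughly the following shape. This is only a hedged
reconstruction, not Andre's actual patch: the printed field names come
straight from the log lines, but the exact BUG_ON condition and the
prv->ncpus counter in the pdata hooks are assumptions.

/* Hedged reconstruction of the instrumentation described above.
 * In csched_acct(), roughly where the tripping BUG_ON at
 * sched_credit.c:591 sits; the printed values suggest a check of the
 * form "a domain's summed active weight must not exceed weight_left". */
if ( (sdom->weight * sdom->active_vcpu_count) > weight_left )
{
    printk("sdom->active_vcpu_count: %d\n", sdom->active_vcpu_count);
    printk("sdom->weight: %d\n", sdom->weight);
    printk("weight_left: %u, weight_total: %u\n", weight_left, weight_total);
    printk("credit_balance: %d, credit_xtra: %d, credit_cap: %d\n",
           credit_balance, credit_xtra, credit_cap);
    BUG();
}

/* ... and in csched_alloc_pdata() / csched_free_pdata(), a per-pool cpu
 * counter dump (prv->ncpus is assumed here as that counter): */
printk("adding CPU %d, now %d CPUs\n", cpu, prv->ncpus);
printk("removing CPU %d, remaining: %d\n", cpu, prv->ncpus);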
Juergen Gross
2011-Feb-02 06:27 UTC
Re: [Xen-devel] Hypervisor crash(!) on xl cpupool-numa-split
On 02/01/11 17:32, Andre Przywara wrote:
> Hi folks,
>
> [...]
>
> Two things startled me:
> 1) There is quite some delay between the "Removing CPUs" message from the
> script and the actual HV printk showing it's done; why is that not
> synchronous?

Removing cpus from Pool-0 requires no switching of the scheduler, so you
see no calls of alloc/free_pdata here.

> Looking at the code it shows that __csched_vcpu_acct_start() is
> eventually triggered by a timer; shouldn't that be triggered
> synchronously by add/removal events?

The vcpus are not moved explicitly, they are migrated by the normal
scheduler mechanisms, same as for vcpu-pin.

> 2) It clearly shows that each CPU gets added to the new pool _before_ it
> gets removed from the old one (Pool-0); isn't that violating the "only
> one pool per CPU" rule? Even if that is fine for a short period of time,
> maybe the timer kicks in in this very moment, resulting in violated
> invariants?

The sequence you are seeing seems to be okay. The alloc_pdata for the new
pool is called before the free_pdata for the old pool.

And the timer is not relevant, as only the idle vcpu should be running on
the moving cpu and the accounting stuff is never called during idle.


Juergen

--
Juergen Gross                 Principal Developer Operating Systems
TSP ES&S SWE OS6              Telephone: +49 (0) 89 3222 2967
Fujitsu Technology Solutions  e-mail: juergen.gross@ts.fujitsu.com
Domagkstr. 28                 Internet: ts.fujitsu.com
D-80807 Muenchen              Company details: ts.fujitsu.com/imprint.html
Juergen Gross
2011-Feb-02 08:49 UTC
Re: [Xen-devel] Hypervisor crash(!) on xl cpupool-numa-split
On 02/02/11 07:27, Juergen Gross wrote:> On 02/01/11 17:32, Andre Przywara wrote: >> Hi folks, >> >> I asked Stephan Diestelhorst for help and after I convinced him that >> removing credit and making SEDF the default again is not an option he >> worked together with me on that ;-) Many thanks for that! >> We haven''t come to a final solution but could gather some debug data. >> I will simply dump some data here, maybe somebody has got a clue. We >> will work further on this tomorrow. >> >> First I replaced the BUG_ON with some printks to get some insight: >> (XEN) sdom->active_vcpu_count: 18 >> (XEN) sdom->weight: 256 >> (XEN) weight_left: 4096, weight_total: 4096 >> (XEN) credit_balance: 0, credit_xtra: 0, credit_cap: 0 >> (XEN) Xen BUG at sched_credit.c:591 >> (XEN) ----[ Xen-4.1.0-rc2-pre x86_64 debug=y Not tainted ]---- >> >> So that one shows that the number of VCPUs is not up-to-date with the >> computed weight sum, we have seen a difference of one or two VCPUs (in >> this case here the weight has been computed from 16 VCPUs). Also it >> shows that the assertion kicks in in the first iteration of the loop, >> where weight_left and weight_total are still equal. >> >> So I additionally instrumented alloc_pdata and free_pdata, the >> unprefixed lines come from a shell script mimicking the functionality of >> cpupool-numa-split. >> ------------ >> Removing CPUs from Pool 0 >> Creating new pool >> Using config file "cpupool.test" >> cpupool name: Pool-node6 >> scheduler: credit >> number of cpus: 1 >> (XEN) adding CPU 36, now 1 CPUs >> (XEN) removing CPU 36, remaining: 17 >> Populating new pool >> (XEN) sdom->active_vcpu_count: 9 >> (XEN) sdom->weight: 256 >> (XEN) weight_left: 2048, weight_total: 2048 >> (XEN) credit_balance: 0, credit_xtra: 0, credit_cap: 0 >> (XEN) adding CPU 37, now 2 CPUs >> (XEN) removing CPU 37, remaining: 16 >> (XEN) adding CPU 38, now 3 CPUs >> (XEN) removing CPU 38, remaining: 15 >> (XEN) adding CPU 39, now 4 CPUs >> (XEN) removing CPU 39, remaining: 14 >> (XEN) adding CPU 40, now 5 CPUs >> (XEN) removing CPU 40, remaining: 13 >> (XEN) sdom->active_vcpu_count: 17 >> (XEN) sdom->weight: 256 >> (XEN) weight_left: 4096, weight_total: 4096 >> (XEN) credit_balance: 0, credit_xtra: 0, credit_cap: 0 >> (XEN) adding CPU 41, now 6 CPUs >> (XEN) removing CPU 41, remaining: 12 >> ... >> Two thing startled me: >> 1) There is quite some between the "Removing CPUs" message from the >> script and the actual HV printk showing it''s done, why is that not >> synchronous? > > Removing cpus from Pool-0 requires no switching of the scheduler, so you > see no calls of alloc/free_pdata here. > > > Looking at the code it shows that >> __csched_vcpu_acct_start() is eventually triggered by a timer, shouldn''t >> that be triggered synchronously by add/removal events? > > The vcpus are not moved explicitly, they are migrated by the normal > scheduler mechanisms, same as for vcpu-pin. > >> 2) It clearly shows that each CPU gets added to the new pool _before_ it >> gets removed from the old one (Pool-0), isn''t that violating the "only >> one pool per CPU" rule? Even it that is fine for a short period of time, >> maybe the timer kicks in in this very moment resulting in violated >> invariants? > > The sequence you are seeing seems to be okay. The alloc_pdata for the > new pool > is called before the free_pdata for the old pool. 
>
> And the timer is not relevant, as only the idle vcpu should be running
> on the moving cpu and the accounting stuff is never called during idle.

Uhh, this could be wrong!
The normal ticker doesn't call accounting in idle and it is stopped during
cpu move. The master_ticker is handled wrong, perhaps. I'll check this and
prepare a patch if necessary.


Juergen

--
Juergen Gross                 Principal Developer Operating Systems
TSP ES&S SWE OS6                       Telephone: +49 (0) 89 3222 2967
Fujitsu Technology Solutions           e-mail: juergen.gross@ts.fujitsu.com
Domagkstr. 28                          Internet: ts.fujitsu.com
D-80807 Muenchen                 Company details: ts.fujitsu.com/imprint.html

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel
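To make the numbers in the quoted debug output concrete: with sdom->weight = 256 and sdom->active_vcpu_count = 18 the domain claims 18 * 256 = 4608, while weight_total (prv->weight) is only 4096 = 16 * 256, so the pool-wide sum lags behind by two vcpus. The stand-alone program below (plain C, not Xen code; the variable names merely mirror the fields printed above) reproduces that arithmetic and trips the same kind of consistency check that fires at sched_credit.c:990.

/* toy_weight.c: gcc -o toy_weight toy_weight.c && ./toy_weight */
#include <assert.h>
#include <stdio.h>

int main(void)
{
    unsigned int sdom_weight       = 256;   /* sdom->weight            */
    unsigned int active_vcpu_count = 18;    /* sdom->active_vcpu_count */
    unsigned int prv_weight        = 4096;  /* weight_total in the log */

    printf("domain claims %u, pool-wide sum is %u\n",
           sdom_weight * active_vcpu_count, prv_weight);

    /* roughly what the hypervisor objects to: the pool-wide weight sum
     * must cover every active vcpu of every domain in the pool */
    assert(prv_weight >= sdom_weight * active_vcpu_count);
    return 0;
}

The assert() aborts (4096 < 4608), mirroring the BUG(); the open question in the rest of the thread is how prv->weight and active_vcpu_count get out of step in the first place.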
Juergen Gross
2011-Feb-02 10:05 UTC
Re: [Xen-devel] Hypervisor crash(!) on xl cpupool-numa-split
Hi Andre, could you try the attached patch? It should verify if your problems are due to the master ticker kicking in at a time when the cpu is already gone from the cpupool. I''m not sure if the patch is complete - Disabling the master ticker in csched_tick_suspend might lead to problems with cstates. The functionality is different, at least. George, do you think this is correct? Juergen On 02/02/11 09:49, Juergen Gross wrote:> On 02/02/11 07:27, Juergen Gross wrote: >> On 02/01/11 17:32, Andre Przywara wrote: >>> Hi folks, >>> >>> I asked Stephan Diestelhorst for help and after I convinced him that >>> removing credit and making SEDF the default again is not an option he >>> worked together with me on that ;-) Many thanks for that! >>> We haven''t come to a final solution but could gather some debug data. >>> I will simply dump some data here, maybe somebody has got a clue. We >>> will work further on this tomorrow. >>> >>> First I replaced the BUG_ON with some printks to get some insight: >>> (XEN) sdom->active_vcpu_count: 18 >>> (XEN) sdom->weight: 256 >>> (XEN) weight_left: 4096, weight_total: 4096 >>> (XEN) credit_balance: 0, credit_xtra: 0, credit_cap: 0 >>> (XEN) Xen BUG at sched_credit.c:591 >>> (XEN) ----[ Xen-4.1.0-rc2-pre x86_64 debug=y Not tainted ]---- >>> >>> So that one shows that the number of VCPUs is not up-to-date with the >>> computed weight sum, we have seen a difference of one or two VCPUs (in >>> this case here the weight has been computed from 16 VCPUs). Also it >>> shows that the assertion kicks in in the first iteration of the loop, >>> where weight_left and weight_total are still equal. >>> >>> So I additionally instrumented alloc_pdata and free_pdata, the >>> unprefixed lines come from a shell script mimicking the functionality of >>> cpupool-numa-split. >>> ------------ >>> Removing CPUs from Pool 0 >>> Creating new pool >>> Using config file "cpupool.test" >>> cpupool name: Pool-node6 >>> scheduler: credit >>> number of cpus: 1 >>> (XEN) adding CPU 36, now 1 CPUs >>> (XEN) removing CPU 36, remaining: 17 >>> Populating new pool >>> (XEN) sdom->active_vcpu_count: 9 >>> (XEN) sdom->weight: 256 >>> (XEN) weight_left: 2048, weight_total: 2048 >>> (XEN) credit_balance: 0, credit_xtra: 0, credit_cap: 0 >>> (XEN) adding CPU 37, now 2 CPUs >>> (XEN) removing CPU 37, remaining: 16 >>> (XEN) adding CPU 38, now 3 CPUs >>> (XEN) removing CPU 38, remaining: 15 >>> (XEN) adding CPU 39, now 4 CPUs >>> (XEN) removing CPU 39, remaining: 14 >>> (XEN) adding CPU 40, now 5 CPUs >>> (XEN) removing CPU 40, remaining: 13 >>> (XEN) sdom->active_vcpu_count: 17 >>> (XEN) sdom->weight: 256 >>> (XEN) weight_left: 4096, weight_total: 4096 >>> (XEN) credit_balance: 0, credit_xtra: 0, credit_cap: 0 >>> (XEN) adding CPU 41, now 6 CPUs >>> (XEN) removing CPU 41, remaining: 12 >>> ... >>> Two thing startled me: >>> 1) There is quite some between the "Removing CPUs" message from the >>> script and the actual HV printk showing it''s done, why is that not >>> synchronous? >> >> Removing cpus from Pool-0 requires no switching of the scheduler, so you >> see no calls of alloc/free_pdata here. >> >> > Looking at the code it shows that >>> __csched_vcpu_acct_start() is eventually triggered by a timer, shouldn''t >>> that be triggered synchronously by add/removal events? >> >> The vcpus are not moved explicitly, they are migrated by the normal >> scheduler mechanisms, same as for vcpu-pin. 
>> >>> 2) It clearly shows that each CPU gets added to the new pool _before_ it >>> gets removed from the old one (Pool-0), isn''t that violating the "only >>> one pool per CPU" rule? Even it that is fine for a short period of time, >>> maybe the timer kicks in in this very moment resulting in violated >>> invariants? >> >> The sequence you are seeing seems to be okay. The alloc_pdata for the >> new pool >> is called before the free_pdata for the old pool. >> >> And the timer is not relevant, as only the idle vcpu should be running >> on the >> moving cpu and the accounting stuff is never called during idle. > > Uhh, this could be wrong! > The normal ticker doesn''t call accounting in idle and it is stopped during > cpu move. The master_ticker is handled wrong, perhaps. I''ll check this and > prepare a patch if necessary. > > > Juergen >-- Juergen Gross Principal Developer Operating Systems TSP ES&S SWE OS6 Telephone: +49 (0) 89 3222 2967 Fujitsu Technology Solutions e-mail: juergen.gross@ts.fujitsu.com Domagkstr. 28 Internet: ts.fujitsu.com D-80807 Muenchen Company details: ts.fujitsu.com/imprint.html _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Andre Przywara
2011-Feb-02 10:59 UTC
Re: [Xen-devel] Hypervisor crash(!) on xl cpupool-numa-split
Juergen Gross wrote:> Hi Andre, > > could you try the attached patch? > It should verify if your problems are due to the master ticker > kicking in at a time when the cpu is already gone from the cpupool.That''s what we found also yesterday. If the timer routine triggers before the timer is stopped but is actually _running_ afterwards, this could lead to problems. Anyway, the hypervisor still crashes, now at a different BUG_ON(): /* Start off idling... */ BUG_ON(!is_idle_vcpu(per_cpu(schedule_data, cpu).curr)); cpu_set(cpu, prv->idlers); The complete crash dump was this: (XEN) Xen BUG at sched_credit.c:389 (XEN) ----[ Xen-4.1.0-rc2-pre x86_64 debug=y Not tainted ]---- (XEN) CPU: 3 (XEN) RIP: e008:[<ffff82c480118020>] csched_alloc_pdata+0x146/0x197 (XEN) RFLAGS: 0000000000010093 CONTEXT: hypervisor (XEN) rax: ffff830434322000 rbx: ffff830434492478 rcx: 0000000000000018 (XEN) rdx: ffff82c4802d3ec0 rsi: 0000000000000006 rdi: ffff83043445e100 (XEN) rbp: ffff83043456fce8 rsp: ffff83043456fca8 r8: 00000000deadbeef (XEN) r9: ffff830434492478 r10: ffff82c48021a1c0 r11: 0000000000000286 (XEN) r12: 0000000000000018 r13: ffff830a3c70c780 r14: ffff830434492460 (XEN) r15: 0000000000000018 cr0: 000000008005003b cr4: 00000000000006f0 (XEN) cr3: 0000000805bac000 cr2: 00007fbbdaf71116 (XEN) ds: 0000 es: 0000 fs: 0000 gs: 0000 ss: e010 cs: e008 (XEN) Xen stack trace from rsp=ffff83043456fca8: (XEN) ffff83043456fcb8 ffff830a3c70c780 0000000000000286 0000000000000018 (XEN) ffff830434492550 ffff830a3c70c690 0000000000000018 ffff82c4802b0880 (XEN) ffff83043456fd58 ffff82c48011fbb3 ffff82f601020900 0000000000081048 (XEN) ffff8300c7e42000 0000000000000000 0000800000000000 ffff82c480249000 (XEN) 0000000000000002 0000000000000018 ffff830434492550 0000000000305000 (XEN) ffff82c4802550e4 ffff82c4802b0880 ffff83043456fd78 ffff82c48010188c (XEN) ffff83043456fe40 0000000000000018 ffff83043456fdb8 ffff82c480101b94 (XEN) ffff83043456fdb8 ffff82c48018380a fffffffe00000286 ffff83043456ff18 (XEN) 0000000001669004 0000000000305000 ffff83043456fef8 ffff82c4801253c1 (XEN) ffff83043456fde8 ffff8300c7ac0000 0000000000000000 0000000000000246 (XEN) ffff83043456fe18 ffff82c480106c7f ffff830434577100 ffff8300c7ac0000 (XEN) ffff83043456fe28 ffff82c480125de4 0000000000000003 ffff82c4802d3f80 (XEN) ffff83043456fe78 0000000000000282 0000000800000012 0000000400000004 (XEN) 0000000000000000 ffffffff00000018 0000000000000000 00007f7e6a549a00 (XEN) 0000000001669000 0000000000000000 0000000000000000 ffffffffffffffff (XEN) 0000000000000000 0000000000000080 000000000000002f 0000000001669004 (XEN) 000000000166b004 000000000166d004 00007fffa59ff250 0000000000000033 (XEN) ffff83043456fed8 ffff8300c7ac0000 00007fffa59ff100 0000000000305000 (XEN) 0000000000000003 0000000000000003 00007cfbcba900c7 ffff82c480207ee8 (XEN) ffffffff8100946a 0000000000000023 0000000000000003 0000000000000003 (XEN) Xen call trace: (XEN) [<ffff82c480118020>] csched_alloc_pdata+0x146/0x197 (XEN) [<ffff82c48011fbb3>] schedule_cpu_switch+0x75/0x1cd (XEN) [<ffff82c48010188c>] cpupool_assign_cpu_locked+0x44/0x8b (XEN) [<ffff82c480101b94>] cpupool_do_sysctl+0x1fb/0x461 (XEN) [<ffff82c4801253c1>] do_sysctl+0x921/0xa30 (XEN) [<ffff82c480207ee8>] syscall_enter+0xc8/0x122 (XEN) (XEN) (XEN) **************************************** (XEN) Panic on CPU 3: (XEN) Xen BUG at sched_credit.c:389 (XEN) **************************************** (XEN) (XEN) Reboot in five seconds... 
Regards, Andre.> > I''m not sure if the patch is complete - Disabling the master ticker > in csched_tick_suspend might lead to problems with cstates. The > functionality is different, at least. > > George, do you think this is correct? > > > Juergen > > On 02/02/11 09:49, Juergen Gross wrote: >> On 02/02/11 07:27, Juergen Gross wrote: >>> On 02/01/11 17:32, Andre Przywara wrote: >>>> Hi folks, >>>> >>>> I asked Stephan Diestelhorst for help and after I convinced him that >>>> removing credit and making SEDF the default again is not an option he >>>> worked together with me on that ;-) Many thanks for that! >>>> We haven''t come to a final solution but could gather some debug data. >>>> I will simply dump some data here, maybe somebody has got a clue. We >>>> will work further on this tomorrow. >>>> >>>> First I replaced the BUG_ON with some printks to get some insight: >>>> (XEN) sdom->active_vcpu_count: 18 >>>> (XEN) sdom->weight: 256 >>>> (XEN) weight_left: 4096, weight_total: 4096 >>>> (XEN) credit_balance: 0, credit_xtra: 0, credit_cap: 0 >>>> (XEN) Xen BUG at sched_credit.c:591 >>>> (XEN) ----[ Xen-4.1.0-rc2-pre x86_64 debug=y Not tainted ]---- >>>> >>>> So that one shows that the number of VCPUs is not up-to-date with the >>>> computed weight sum, we have seen a difference of one or two VCPUs (in >>>> this case here the weight has been computed from 16 VCPUs). Also it >>>> shows that the assertion kicks in in the first iteration of the loop, >>>> where weight_left and weight_total are still equal. >>>> >>>> So I additionally instrumented alloc_pdata and free_pdata, the >>>> unprefixed lines come from a shell script mimicking the functionality of >>>> cpupool-numa-split. >>>> ------------ >>>> Removing CPUs from Pool 0 >>>> Creating new pool >>>> Using config file "cpupool.test" >>>> cpupool name: Pool-node6 >>>> scheduler: credit >>>> number of cpus: 1 >>>> (XEN) adding CPU 36, now 1 CPUs >>>> (XEN) removing CPU 36, remaining: 17 >>>> Populating new pool >>>> (XEN) sdom->active_vcpu_count: 9 >>>> (XEN) sdom->weight: 256 >>>> (XEN) weight_left: 2048, weight_total: 2048 >>>> (XEN) credit_balance: 0, credit_xtra: 0, credit_cap: 0 >>>> (XEN) adding CPU 37, now 2 CPUs >>>> (XEN) removing CPU 37, remaining: 16 >>>> (XEN) adding CPU 38, now 3 CPUs >>>> (XEN) removing CPU 38, remaining: 15 >>>> (XEN) adding CPU 39, now 4 CPUs >>>> (XEN) removing CPU 39, remaining: 14 >>>> (XEN) adding CPU 40, now 5 CPUs >>>> (XEN) removing CPU 40, remaining: 13 >>>> (XEN) sdom->active_vcpu_count: 17 >>>> (XEN) sdom->weight: 256 >>>> (XEN) weight_left: 4096, weight_total: 4096 >>>> (XEN) credit_balance: 0, credit_xtra: 0, credit_cap: 0 >>>> (XEN) adding CPU 41, now 6 CPUs >>>> (XEN) removing CPU 41, remaining: 12 >>>> ... >>>> Two thing startled me: >>>> 1) There is quite some between the "Removing CPUs" message from the >>>> script and the actual HV printk showing it''s done, why is that not >>>> synchronous? >>> Removing cpus from Pool-0 requires no switching of the scheduler, so you >>> see no calls of alloc/free_pdata here. >>> >>>> Looking at the code it shows that >>>> __csched_vcpu_acct_start() is eventually triggered by a timer, shouldn''t >>>> that be triggered synchronously by add/removal events? >>> The vcpus are not moved explicitly, they are migrated by the normal >>> scheduler mechanisms, same as for vcpu-pin. >>> >>>> 2) It clearly shows that each CPU gets added to the new pool _before_ it >>>> gets removed from the old one (Pool-0), isn''t that violating the "only >>>> one pool per CPU" rule? 
Even it that is fine for a short period of time, >>>> maybe the timer kicks in in this very moment resulting in violated >>>> invariants? >>> The sequence you are seeing seems to be okay. The alloc_pdata for the >>> new pool >>> is called before the free_pdata for the old pool. >>> >>> And the timer is not relevant, as only the idle vcpu should be running >>> on the >>> moving cpu and the accounting stuff is never called during idle. >> Uhh, this could be wrong! >> The normal ticker doesn''t call accounting in idle and it is stopped during >> cpu move. The master_ticker is handled wrong, perhaps. I''ll check this and >> prepare a patch if necessary. >> >> >> Juergen >> > >-- Andre Przywara AMD-Operating System Research Center (OSRC), Dresden, Germany _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Stephan Diestelhorst
2011-Feb-02 14:39 UTC
Re: [Xen-devel] Hypervisor crash(!) on xl cpupool-numa-split
Hi folks,
long time no see. :-)

On Tuesday 01 February 2011 17:32:25 Andre Przywara wrote:
> I asked Stephan Diestelhorst for help and after I convinced him that
> removing credit and making SEDF the default again is not an option he
> worked together with me on that ;-) Many thanks for that!
> We haven't come to a final solution but could gather some debug data.
> I will simply dump some data here, maybe somebody has got a clue. We
> will work further on this tomorrow.

Andre and I have been looking through this further, in particular sanity
checking the invariant

  prv->weight >= sdom->weight * sdom->active_vcpu_count

each time someone tweaks the active vcpu count. This happens only in
__csched_vcpu_acct_start and __csched_vcpu_acct_stop_locked. We managed
to observe the broken invariant when splitting cpupools.

We have the following theory of what happens:
* some vcpus of a particular domain are currently in the process of
  being moved to the new pool
* some are still left on the old pool (vcpus_old) and some are already
  in the new pool (vcpus_new)
* we now have vcpus_old->sdom = vcpus_new->sdom and following from this
  * vcpus_old->sdom->weight = vcpus_new->sdom->weight
  * vcpus_old->sdom->active_vcpu_count = vcpus_new->sdom->active_vcpu_count
* active_vcpu_count thus does not represent the separation of the
  actual vcpus (may be the sum, only the old or new ones, does not matter)
* however, sched_old != sched_new, and thus
  * sched_old->prv != sched_new->prv
  * sched_old->prv->weight != sched_new->prv->weight
* the prv->weight field hence sees the incremental move of VCPUs
  (through modifications in *acct_start and *acct_stop_locked)
* if at any point in this half-way migration the scheduler wants to run
  csched_acct, it erroneously checks the wrong active_vcpu_count

Workarounds / fixes (none tried):
* disable scheduler accounting while half-way migrating a domain
  (dom->pool_migrating flag and then checking in csched_acct)
* temporarily split the sdom structures while migrating to account for
  the transient split of vcpus
* synchronously disable all vcpus, migrate and then re-enable

Caveats:
* prv->lock does not guarantee mutual exclusion between (same)
  schedulers of different pools

<rant>
The general locking policy vs the comment situation is a nightmare.
I know that we have some advanced data-structure folks here, but
intuitively reasoning about when specific things are atomic and
mutually excluded is a pain in the scheduler / cpupool code, see the
issue with the separate prv->locks above.

E.g. cpupool_unassign_cpu and cpupool_unassign_cpu_helper interplay:
* cpupool_unassign_cpu unlocks cpupool_lock
* sets up the continuation calling cpupool_unassign_cpu_helper
* cpupool_unassign_cpu_helper locks cpupool_lock
* while intuitively, one would think that both should see a consistent
  snapshot and hence freeing the lock in the middle is a bad idea
* also communicating continuation-local state through global variables
  mandates that only a single global continuation can be pending
* reading cpu outside of the lock protection in
  cpupool_unassign_cpu_helper also smells
</rant>

Despite the rant, it is amazing to see the ability to move running
things around through this remote continuation trick! In my (ancient)
balancer experiments I added hypervisor-threads just for side-stepping
this issue.

Stephan
--
Stephan Diestelhorst, AMD Operating System Research Center
stephan.diestelhorst@amd.com
Tel.
+49 (0)351 448 356 719 Advanced Micro Devices GmbH Einsteinring 24 85609 Aschheim Germany Geschaeftsfuehrer: Alberto Bozzo u. Andrew Bowd Sitz: Dornach, Gemeinde Aschheim, Landkreis Muenchen Registergericht Muenchen, HRB Nr. 43632, WEEE-Reg-Nr: DE 12919551 _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
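A self-contained toy model of this theory (plain C, not Xen code; the type and field names are only borrowed from sched_credit.c, and the helpers merely mimic what __csched_vcpu_acct_start / __csched_vcpu_acct_stop_locked do to the bookkeeping): one per-domain structure shared by both pools, one private structure per pool, and an accounting check run half-way through the move.

/* toy_split.c: gcc -o toy_split toy_split.c && ./toy_split */
#include <stdio.h>

struct csched_dom {                  /* per domain, SHARED across pools   */
    int weight;
    int active_vcpu_count;
};

struct csched_private {              /* per scheduler instance, i.e. pool */
    int weight;                      /* sum over the vcpus it accounted   */
};

/* what acct_start/acct_stop do to the bookkeeping, reduced to two fields */
static void acct_start(struct csched_private *prv, struct csched_dom *sdom)
{
    sdom->active_vcpu_count++;
    prv->weight += sdom->weight;
}

static void acct_stop(struct csched_private *prv, struct csched_dom *sdom)
{
    sdom->active_vcpu_count--;
    prv->weight -= sdom->weight;
}

/* the consistency check the accounting pass relies on */
static void acct_check(const char *pool, struct csched_private *prv,
                       struct csched_dom *sdom)
{
    int claimed = sdom->weight * sdom->active_vcpu_count;

    printf("%s: prv->weight=%4d, sdom claims %4d -> %s\n", pool,
           prv->weight, claimed,
           prv->weight >= claimed ? "ok" : "BUG_ON would fire");
}

int main(void)
{
    struct csched_dom dom         = { .weight = 256 };
    struct csched_private prv_old = { 0 };
    struct csched_private prv_new = { 0 };
    int i;

    for (i = 0; i < 4; i++)          /* four active vcpus, all in the old pool */
        acct_start(&prv_old, &dom);

    for (i = 0; i < 2; i++) {        /* move two of them to the new pool */
        acct_stop(&prv_old, &dom);
        acct_start(&prv_new, &dom);
    }

    acct_check("old pool", &prv_old, &dom);   /* 512 vs. 4 * 256 = 1024 */
    acct_check("new pool", &prv_new, &dom);   /* 512 vs. 4 * 256 = 1024 */
    return 0;
}

Half-way through, each pool's prv->weight only covers the vcpus it accounted itself, while the shared active_vcpu_count still reports all of them, so the check fails on either side; that is exactly the window in which csched_acct's BUG_ON can fire.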
Juergen Gross
2011-Feb-02 15:14 UTC
Re: [Xen-devel] Hypervisor crash(!) on xl cpupool-numa-split
On 02/02/11 15:39, Stephan Diestelhorst wrote:> Hi folks, > long time no see. :-) > > On Tuesday 01 February 2011 17:32:25 Andre Przywara wrote: >> I asked Stephan Diestelhorst for help and after I convinced him that >> removing credit and making SEDF the default again is not an option he >> worked together with me on that ;-) Many thanks for that! >> We haven''t come to a final solution but could gather some debug data. >> I will simply dump some data here, maybe somebody has got a clue. We >> will work further on this tomorrow. > > Andre and I have been looking through this further, in particular sanity > checking the invariant > > prv->weight>= sdom->weight * sdom->active_vcpu_count > > each time someone tweaks the active vcpu count. This happens only in > __csched_vcpu_acct_start and __csched_vcpu_acct_stop_locked. We managed > to observe the broken invariant when splitting cpupoools. > > We have the following theory of what happens: > * some vcpus of a particular domain are currently in the process of > being moved to the new poolThe only _vcpus_ to be moved between pools are the idle vcpus. And those never contribute to accounting in credit scheduler. We are moving _pcpus_ only (well, moving a domain between pools actually moves vcpus as well, but then the domain is paused). On the pcpu to be moved the idle vcpu should be running. Obviously you have found a scenario where this isn''t true. I have no idea how this could happen, as other then idle vcpus are taken into account for scheduling only if the pcpu is valid in the cpupool. And the pcpu is set valid after the BUG_ON you have triggered in your tests.> > * some are still left on the old pool (vcpus_old) and some are already > in the new pool (vcpus_new) > > * we now have vcpus_old->sdom = vcpus_new->sdom and following from this > * vcpus_old->sdom->weight = vcpus_new->sdom->weight > * vcpus_old->sdom->active_vcpu_count = vcpus_new->sdom->active_vcpu_count > > * active_vcpu_count thus does not represent the separation of the > actual vpcus (may be the sum, only the old or new ones, does not > matter) > > * however, sched_old != sched_new, and thus > * sched_old->prv != sched_new->prv > * sched_old->prv->weight != sched_new->prv->weight > > * the prv->weight field hence sees the incremental move of VCPUs > (through modifications in *acct_start and *acct_stop_locked) > > * if at any point in this half-way migration, the scheduler wants to > csched_acct, it erroneously checks the wrong active_vcpu_count > > Workarounds / fixes (none tried): > * disable scheduler accounting while half-way migrating a domain > (dom->pool_migrating flag and then checking in csched_acct) > * temporarily split the sdom structures while migrating to account for > transient split of vcpus > * synchronously disable all vcpus, migrate and then re-enable > > Caveats: > * prv->lock does not guarantee mutual exclusion between (same) > schedulers of different pools > > <rant> > The general locking policy vs the comment situation is a nightmare. > I know that we have some advanced data-structure folks here, but > intuitively reasoning about when specific things are atomic and > mutually excluded is a pain in the scheduler / cpupool code, see the > issue with the separate prv->locks above. > > E.g. 
cpupool_unassign_cpu and cpupool_unassign_cpu_helper interplay: > * cpupool_unassign_cpu unlocks cpupool_lock > * sets up the continuation calling cpupool_unassign_cpu_helper > * cpupool_unassign_cpu_helper locks cpupool_lock > * while intuitively, one would think that both should see a consistent > snapshot and hence freeing the lock in the middle is a bad idea > * also communicating continuation-local state through global variables > mandates that only a single global continuation can be pending > > * reading cpu outside of the lock protection in > cpupool_unassign_cpu_helper also smells > </rant> > > Despite the rant, it is amazing to see the ability to move running > things around through this remote continuation trick! In my (ancient) > balancer experiments I added hypervisor-threads just for side- > stepping this issue..I think the easiest way to solve the problem would be to move the cpu to the new pool in a tasklet. This is possible now, because tasklets are always executed in the idle vcpus. OTOH I''d like to understand what is wrong with my current approach... Juergen -- Juergen Gross Principal Developer Operating Systems TSP ES&S SWE OS6 Telephone: +49 (0) 89 3222 2967 Fujitsu Technology Solutions e-mail: juergen.gross@ts.fujitsu.com Domagkstr. 28 Internet: ts.fujitsu.com D-80807 Muenchen Company details: ts.fujitsu.com/imprint.html _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Stephan Diestelhorst
2011-Feb-02 16:01 UTC
Re: [Xen-devel] Hypervisor crash(!) on xl cpupool-numa-split
On Wednesday 02 February 2011 16:14:25 Juergen Gross wrote:> On 02/02/11 15:39, Stephan Diestelhorst wrote: > > We have the following theory of what happens: > > * some vcpus of a particular domain are currently in the process of > > being moved to the new pool > > The only _vcpus_ to be moved between pools are the idle vcpus. And those > never contribute to accounting in credit scheduler. > > We are moving _pcpus_ only (well, moving a domain between pools actually > moves vcpus as well, but then the domain is paused).How do you ensure that the domain is paused and stays that way? Pausing the domain was what I had in mind, too...> > Despite the rant, it is amazing to see the ability to move running > > things around through this remote continuation trick! In my (ancient) > > balancer experiments I added hypervisor-threads just for side- > > stepping this issue.. > > I think the easiest way to solve the problem would be to move the cpu to the > new pool in a tasklet. This is possible now, because tasklets are always > executed in the idle vcpus.Yep. That was exactly what I build. At the time stuff like that did not exist (2005).> OTOH I''d like to understand what is wrong with my current approach...Nothing, in fact I like it. In my rant I complained about the fact that splitting the critical section accross this continuation looks scary, basically causing some generic red lights to turn on :-) And making reasoning about the correctness a little complicated, but that may well be a local issue ;-) Stephan -- Stephan Diestelhorst, AMD Operating System Research Center stephan.diestelhorst@amd.com Tel. +49 (0)351 448 356 719 Advanced Micro Devices GmbH Einsteinring 24 85609 Aschheim Germany Geschaeftsfuehrer: Alberto Bozzo u. Andrew Bowd; Sitz: Dornach, Gemeinde Aschheim, Landkreis Muenchen Registergericht Muenchen, HRB Nr. 43632, WEEE-Reg-Nr: DE 12919551 _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Juergen Gross
2011-Feb-03 05:57 UTC
Re: [Xen-devel] Hypervisor crash(!) on xl cpupool-numa-split
On 02/02/11 17:01, Stephan Diestelhorst wrote:
> On Wednesday 02 February 2011 16:14:25 Juergen Gross wrote:
>> On 02/02/11 15:39, Stephan Diestelhorst wrote:
>>> We have the following theory of what happens:
>>> * some vcpus of a particular domain are currently in the process of
>>>   being moved to the new pool
>>
>> The only _vcpus_ to be moved between pools are the idle vcpus. And those
>> never contribute to accounting in credit scheduler.
>>
>> We are moving _pcpus_ only (well, moving a domain between pools actually
>> moves vcpus as well, but then the domain is paused).
>
> How do you ensure that the domain is paused and stays that way? Pausing
> the domain was what I had in mind, too...

Look at sched_move_domain() in schedule.c: I'm calling domain_pause() before
moving the vcpus and domain_unpause() after that.

>>> Despite the rant, it is amazing to see the ability to move running
>>> things around through this remote continuation trick! In my (ancient)
>>> balancer experiments I added hypervisor-threads just for side-
>>> stepping this issue..
>>
>> I think the easiest way to solve the problem would be to move the cpu to the
>> new pool in a tasklet. This is possible now, because tasklets are always
>> executed in the idle vcpus.
>
> Yep. That was exactly what I build. At the time stuff like that did
> not exist (2005).
>
>> OTOH I'd like to understand what is wrong with my current approach...
>
> Nothing, in fact I like it. In my rant I complained about the fact
> that splitting the critical section accross this continuation looks
> scary, basically causing some generic red lights to turn on :-) And
> making reasoning about the correctness a little complicated, but that
> may well be a local issue ;-)

Perhaps you can help solving the miracle:
Could you replace the BUG_ON in sched_credit.c:389 with something like this:

if (!is_idle_vcpu(per_cpu(schedule_data, cpu).curr))
{
    extern void dump_runq(unsigned char key);
    struct vcpu *vc = per_cpu(schedule_data, cpu).curr;

    printk("+++ (%d.%d) instead idle vcpu on cpu %d\n",
           vc->domain->domain_id, vc->vcpu_id, cpu);
    dump_runq('q');
    BUG();
}


Juergen

--
Juergen Gross                 Principal Developer Operating Systems
TSP ES&S SWE OS6                       Telephone: +49 (0) 89 3222 2967
Fujitsu Technology Solutions           e-mail: juergen.gross@ts.fujitsu.com
Domagkstr. 28                          Internet: ts.fujitsu.com
D-80807 Muenchen                 Company details: ts.fujitsu.com/imprint.html

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel
Juergen Gross
2011-Feb-03 09:18 UTC
Re: [Xen-devel] Hypervisor crash(!) on xl cpupool-numa-split
Andre, Stephan,

could you give the attached patch a try?
It moves the cpu assigning/unassigning into a tasklet that is always executed
on the cpu to be moved. This should avoid critical races.

Regarding Stephan's rant:
You should be aware that the main critical sections are only in the tasklets.
The locking in the main routines is needed only to avoid the cpupool being
destroyed in between.

I'm not sure whether the master_ticker patch is still needed. It seems to
break something, as my machine hung up after several hundred cpu moves
(without the new patch). I'm still investigating this problem.


Juergen

--
Juergen Gross                 Principal Developer Operating Systems
TSP ES&S SWE OS6                       Telephone: +49 (0) 89 3222 2967
Fujitsu Technology Solutions           e-mail: juergen.gross@ts.fujitsu.com
Domagkstr. 28                          Internet: ts.fujitsu.com
D-80807 Muenchen                 Company details: ts.fujitsu.com/imprint.html

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel
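The patch is an attachment and not reproduced in the archive, so here is only a sketch of the idea as a stand-alone pthread program (not Xen code; all names are invented for the illustration). One thread stands in for the idle loop of the pcpu being moved, the main thread stands in for the toolstack request: the requester merely posts the work, and the pool switch is carried out on the affected pcpu itself, so nothing else can be running there while its scheduler data changes hands.

/* toy_tasklet.c: gcc -pthread -o toy_tasklet toy_tasklet.c && ./toy_tasklet */
#include <pthread.h>
#include <stdbool.h>
#include <stdio.h>

struct pcpu {
    pthread_mutex_t lock;
    pthread_cond_t  cond;
    bool            move_requested;
    bool            move_done;
    int             pool;            /* which "cpupool" this pcpu belongs to */
    int             id;
};

static void *idle_loop(void *arg)
{
    struct pcpu *p = arg;

    pthread_mutex_lock(&p->lock);
    while (!p->move_requested)               /* idle until work is posted */
        pthread_cond_wait(&p->cond, &p->lock);

    /* the "tasklet": runs on the pcpu itself, in its idle context, so no
     * other vcpu can be using this pcpu's scheduler data right now */
    printf("pcpu %d: switching from pool %d to pool %d\n", p->id, p->pool, 1);
    p->pool = 1;
    p->move_done = true;
    pthread_cond_signal(&p->cond);
    pthread_mutex_unlock(&p->lock);
    return NULL;
}

int main(void)
{
    struct pcpu p = { PTHREAD_MUTEX_INITIALIZER, PTHREAD_COND_INITIALIZER,
                      false, false, 0, 36 };
    pthread_t t;

    pthread_create(&t, NULL, idle_loop, &p);

    /* the requester: post the move and wait until the pcpu has done it */
    pthread_mutex_lock(&p.lock);
    p.move_requested = true;
    pthread_cond_signal(&p.cond);
    while (!p.move_done)
        pthread_cond_wait(&p.cond, &p.lock);
    pthread_mutex_unlock(&p.lock);

    pthread_join(t, NULL);
    printf("pcpu %d is now in pool %d\n", p.id, p.pool);
    return 0;
}

In Xen the corresponding mechanism is a tasklet, which, as mentioned above, always runs in an idle vcpu on the cpu it was scheduled for.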
Andre Przywara
2011-Feb-04 14:09 UTC
Re: [Xen-devel] Hypervisor crash(!) on xl cpupool-numa-split
Juergen Gross wrote:> Andre, Stephan, > > could you give the attached patch a try? > It moves the cpu assigning/unassigning into a tasklet always executed on the > cpu to be moved. This should avoid critical races.Done. I checked it twice, but sadly it does not fix the issue. It still BUGs: (XEN) Xen BUG at sched_credit.c:990 (XEN) ----[ Xen-4.1.0-rc3-pre x86_64 debug=y Not tainted ]---- (XEN) CPU: 0 (XEN) RIP: e008:[<ffff82c480118208>] csched_acct+0x11f/0x419 (XEN) RFLAGS: 0000000000010006 CONTEXT: hypervisor (XEN) rax: 0000000000000010 rbx: 0000000000000f00 rcx: 0000000000000100 (XEN) rdx: 0000000000001000 rsi: ffff830437ffa600 rdi: 0000000000000010 (XEN) rbp: ffff82c480297e10 rsp: ffff82c480297d80 r8: 0000000000000100 (XEN) r9: 0000000000000006 r10: ffff82c4802d4100 r11: 0000017322fea49a (XEN) r12: ffff830437ffa5e0 r13: ffff82c4801180e9 r14: ffff83043399f018 (XEN) r15: ffff830434321ec0 cr0: 000000008005003b cr4: 00000000000006f0 (XEN) cr3: 00000000c7c9c000 cr2: 0000000001ec8048 (XEN) ds: 002b es: 002b fs: 0000 gs: 0000 ss: e010 cs: e008 (XEN) Xen stack trace from rsp=ffff82c480297d80: (XEN) ffff82c480297f18 fffffed4c7cd6000 ffff830000000eff ffff830437ffa5e0 (XEN) ffff830437ffa5e8 ffff82c480297df8 ffff830437ffa5e0 0000000000000282 (XEN) ffff830437ffa5e8 00001c200000000f 00000f0000000f00 0000000000000000 (XEN) ffff82c400000000 ffff82c4802d3f80 ffff830437ffa5e0 ffff82c4801180e9 (XEN) ffff83043399f018 ffff83043399f010 ffff82c480297e40 ffff82c480126044 (XEN) 0000000000000002 ffff830437ffa600 ffff82c4802d3f80 00000173010849b7 (XEN) ffff82c480297e90 ffff82c480126369 ffff82c48024aea0 ffff82c4802d3f80 (XEN) ffff83043399f010 0000000000000000 0000000000000000 ffff82c4802b0880 (XEN) ffff82c480297f18 ffffffffffffffff ffff82c480297ed0 ffff82c480123437 (XEN) ffff8300c7e1e0f8 ffff82c480297f18 ffff82c48024aea0 ffff82c480297f18 (XEN) 0000017301008665 ffff82c4802d3ec0 ffff82c480297ee0 ffff82c4801234b2 (XEN) ffff82c480297f10 ffff82c4801564f5 0000000000000000 ffff8300c7cd6000 (XEN) 0000000000000000 ffff8300c7e1e000 ffff82c480297d48 0000000000000000 (XEN) 0000000000000000 0000000000000000 ffffffff81a69060 ffff8817a8553f10 (XEN) ffff8817a8553fd8 0000000000000246 ffff8817a8553e80 ffff880000000001 (XEN) 0000000000000000 0000000000000000 ffffffff810093aa 000000000000e030 (XEN) 00000000deadbeef 00000000deadbeef 0000010000000000 ffffffff810093aa (XEN) 000000000000e033 0000000000000246 ffff8817a8553ef8 000000000000e02b (XEN) 0000000000000000 0000000000000000 0000000000000000 0000000000000000 (XEN) 0000000000000000 ffff8300c7cd6000 0000000000000000 0000000000000000 (XEN) Xen call trace: (XEN) [<ffff82c480118208>] csched_acct+0x11f/0x419 (XEN) [<ffff82c480126044>] execute_timer+0x4e/0x6c (XEN) [<ffff82c480126369>] timer_softirq_action+0xf2/0x245 (XEN) [<ffff82c480123437>] __do_softirq+0x88/0x99 (XEN) [<ffff82c4801234b2>] do_softirq+0x6a/0x7a (XEN) [<ffff82c4801564f5>] idle_loop+0x6a/0x6f (XEN) (XEN) **************************************** (XEN) Panic on CPU 0: (XEN) Xen BUG at sched_credit.c:990 (XEN) **************************************** (XEN) (XEN) Reboot in five seconds... Stephan had created more printk debug patches, we will summarize the results soon. Regards, Andre.> > Regarding Stephans rant: > You should be aware that the main critical sections are only in the tasklets. > The locking in the main routines is needed only to avoid the cpupool to be > destroyed in between. > > I''m not sure whether the master_ticker patch is still needed. 
It seems to > break something, as my machine hung up after several 100 cpu moves (without > the new patch). I''m still investigating this problem. > > > Juergen > >-- Andre Przywara AMD-Operating System Research Center (OSRC), Dresden, Germany _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Andre Przywara
2011-Feb-07 12:38 UTC
Re: [Xen-devel] Hypervisor crash(!) on xl cpupool-numa-split
Juergen,

as promised, some more debug data. This is from c/s 22858 with Stephan's
debug patch (attached).
We get the following dump when the hypervisor crashes; note that the first
lock is different from the second and subsequent ones:

(XEN) sched_credit.c, 572: prv: ffff831836df2970 &prv->lock: ffff831836df2970 prv->weight: 256 sdom->active_vcpu_count: 3 sdom->weight: 256
(XEN) sched_credit.c, 572: prv: ffff830437ffa5e0 &prv->lock: ffff830437ffa5e0 prv->weight: 768 sdom->active_vcpu_count: 4 sdom->weight: 256
(XEN) sched_credit.c, 572: prv: ffff830437ffa5e0 &prv->lock: ffff830437ffa5e0 prv->weight: 1024 sdom->active_vcpu_count: 5 sdom->weight: 256
(XEN) sched_credit.c, 572: prv: ffff830437ffa5e0 &prv->lock: ffff830437ffa5e0 prv->weight: 1280 sdom->active_vcpu_count: 6 sdom->weight: 256
....

Hope that gives you an idea. I attach the whole log for your reference.

Regards,
Andre

--
Andre Przywara
AMD-Operating System Research Center (OSRC), Dresden, Germany

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel
Juergen Gross
2011-Feb-07 13:32 UTC
Re: [Xen-devel] Hypervisor crash(!) on xl cpupool-numa-split
On 02/07/11 13:38, Andre Przywara wrote:> Juergen, > > as promised some more debug data. This is from c/s 22858 with Stephans > debug patch (attached). > We get the following dump when the hypervisor crashes, note that the > first lock is different from the second and subsequent ones: > > (XEN) sched_credit.c, 572: prv: ffff831836df2970 &prv->lock: > ffff831836df2970 prv->weight: 256 sdom->active_vcpu_count: 3 > sdom->weight: 256 > (XEN) sched_credit.c, 572: prv: ffff830437ffa5e0 &prv->lock: > ffff830437ffa5e0 prv->weight: 768 sdom->active_vcpu_count: 4 > sdom->weight: 256 > (XEN) sched_credit.c, 572: prv: ffff830437ffa5e0 &prv->lock: > ffff830437ffa5e0 prv->weight: 1024 sdom->active_vcpu_count: 5 > sdom->weight: 256 > (XEN) sched_credit.c, 572: prv: ffff830437ffa5e0 &prv->lock: > ffff830437ffa5e0 prv->weight: 1280 sdom->active_vcpu_count: 6 > sdom->weight: 256 > > .... > > Hope that gives you an idea. I attach the whole log for your reference.Hmm, could it be your log wasn''t created with the attached patch? I''m missing Dom-Id and VCPU from the printk() above, which would be interesting (at least I hope so)... Additionally printing the local pcpu number would help, too. And could you add a printk for the new prv address in csched_init()? It would be nice if you could enable cpupool diag output. Please use the attached patch (includes the previous patch for executing the cpu move on the cpu to be moved, plus some diag printk corrections). Juergen -- Juergen Gross Principal Developer Operating Systems TSP ES&S SWE OS6 Telephone: +49 (0) 89 3222 2967 Fujitsu Technology Solutions e-mail: juergen.gross@ts.fujitsu.com Domagkstr. 28 Internet: ts.fujitsu.com D-80807 Muenchen Company details: ts.fujitsu.com/imprint.html _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
George Dunlap
2011-Feb-07 15:55 UTC
Re: [Xen-devel] Hypervisor crash(!) on xl cpupool-numa-split
Juergen, What is supposed to happen if a domain is in cpupool0, and then all of the cpus are taken out of cpupool0? Is that possible? It looks like there''s code in cpupools.c:cpupool_unassign_cpu() which will move all VMs in a cpupool to cpupool0 before removing the last cpu. But what happens if cpupool0 is the pool that has become empty? It seems like that breaks a lot of the assumptions; e.g., sched_move_domain() seems to assume that the pool we''re moving a VM to actually has cpus. While we''re at it, what''s with the "(cpu != cpu_moving_cpu)" in the first half of cpupool_unassign_cpu()? Under what conditions are you anticipating cpupool_unassign_cpu() being called a second time before the first completes? If you have to abort the move because schedule_cpu_switch() failed, wouldn''t it be better just to roll the whole transaction back, rather than leaving it hanging in the middle? Hmm, and why does RMCPU call cpupool_get_by_id() with exact==0? What could possibly be the use of grabbing a random cpupool and then trying to remove the specified cpu from it? Andre, you might think about folding the attached patch into your debug patch. -George On Mon, Feb 7, 2011 at 1:32 PM, Juergen Gross <juergen.gross@ts.fujitsu.com> wrote:> On 02/07/11 13:38, Andre Przywara wrote: >> >> Juergen, >> >> as promised some more debug data. This is from c/s 22858 with Stephans >> debug patch (attached). >> We get the following dump when the hypervisor crashes, note that the >> first lock is different from the second and subsequent ones: >> >> (XEN) sched_credit.c, 572: prv: ffff831836df2970 &prv->lock: >> ffff831836df2970 prv->weight: 256 sdom->active_vcpu_count: 3 >> sdom->weight: 256 >> (XEN) sched_credit.c, 572: prv: ffff830437ffa5e0 &prv->lock: >> ffff830437ffa5e0 prv->weight: 768 sdom->active_vcpu_count: 4 >> sdom->weight: 256 >> (XEN) sched_credit.c, 572: prv: ffff830437ffa5e0 &prv->lock: >> ffff830437ffa5e0 prv->weight: 1024 sdom->active_vcpu_count: 5 >> sdom->weight: 256 >> (XEN) sched_credit.c, 572: prv: ffff830437ffa5e0 &prv->lock: >> ffff830437ffa5e0 prv->weight: 1280 sdom->active_vcpu_count: 6 >> sdom->weight: 256 >> >> .... >> >> Hope that gives you an idea. I attach the whole log for your reference. > > Hmm, could it be your log wasn''t created with the attached patch? I''m > missing > Dom-Id and VCPU from the printk() above, which would be interesting (at > least > I hope so)... > Additionally printing the local pcpu number would help, too. > And could you add a printk for the new prv address in csched_init()? > > It would be nice if you could enable cpupool diag output. Please use the > attached patch (includes the previous patch for executing the cpu move on > the > cpu to be moved, plus some diag printk corrections). > > > Juergen > > -- > Juergen Gross Principal Developer Operating Systems > TSP ES&S SWE OS6 Telephone: +49 (0) 89 3222 2967 > Fujitsu Technology Solutions e-mail: > juergen.gross@ts.fujitsu.com > Domagkstr. 28 Internet: ts.fujitsu.com > D-80807 Muenchen Company details: > ts.fujitsu.com/imprint.html > > _______________________________________________ > Xen-devel mailing list > Xen-devel@lists.xensource.com > http://lists.xensource.com/xen-devel > >_______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Juergen Gross
2011-Feb-08 05:43 UTC
Re: [Xen-devel] Hypervisor crash(!) on xl cpupool-numa-split
On 02/07/11 16:55, George Dunlap wrote:> Juergen, > > What is supposed to happen if a domain is in cpupool0, and then all of > the cpus are taken out of cpupool0? Is that possible?No. Cpupool0 can''t be without any cpu, as Dom0 is always member of cpupool0.> > It looks like there''s code in cpupools.c:cpupool_unassign_cpu() which > will move all VMs in a cpupool to cpupool0 before removing the last > cpu. But what happens if cpupool0 is the pool that has become empty? > It seems like that breaks a lot of the assumptions; e.g., > sched_move_domain() seems to assume that the pool we''re moving a VM to > actually has cpus.The move of VMs to cpupool0 is done only for domains which are dying. If there are any active domains in the cpupool, removing the last cpu from it will be denied.> > While we''re at it, what''s with the "(cpu != cpu_moving_cpu)" in the > first half of cpupool_unassign_cpu()? Under what conditions are you > anticipating cpupool_unassign_cpu() being called a second time before > the first completes? If you have to abort the move because > schedule_cpu_switch() failed, wouldn''t it be better just to roll the > whole transaction back, rather than leaving it hanging in the middle?Not really. It could take some time until all vcpus have been migrated to another cpu. In this case -EAGAIN is returned and the cpu is already removed from the cpumask of valid cpus for that cpupool to avoid scheduling of other vcpus on that cpu. Without cpu_moving_cpu there would be no forward progress guaranteed.> > Hmm, and why does RMCPU call cpupool_get_by_id() with exact==0? What > could possibly be the use of grabbing a random cpupool and then trying > to remove the specified cpu from it?This is a very good question :-) I think this should be fixed. Seems to be a copy and paste error. I''ll send a patch. Thanks for your thoughts, Juergen -- Juergen Gross Principal Developer Operating Systems TSP ES&S SWE OS6 Telephone: +49 (0) 89 3222 2967 Fujitsu Technology Solutions e-mail: juergen.gross@ts.fujitsu.com Domagkstr. 28 Internet: ts.fujitsu.com D-80807 Muenchen Company details: ts.fujitsu.com/imprint.html _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
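Assuming the description above is accurate, the retry protocol can be modelled in a few lines of stand-alone C (invented names, not the real cpupool code): the cpu's valid bit is dropped right away, -EAGAIN is returned while vcpus still have to be migrated off, and a separate "moving" flag keeps the cpu from being assigned to another pool before the move has really completed.

/* toy_retry.c: gcc -o toy_retry toy_retry.c && ./toy_retry */
#include <errno.h>
#include <stdio.h>

static int vcpus_on_cpu = 3;      /* stand-in for vcpus still to migrate   */
static int cpu_valid_in_pool = 1; /* cleared early: no new vcpus land here */
static int cpu_moving = 1;        /* blocks assigning the cpu elsewhere    */

static int unassign_cpu(void)
{
    cpu_valid_in_pool = 0;        /* done first, even if we return -EAGAIN */
    if (vcpus_on_cpu > 0) {
        vcpus_on_cpu--;           /* migrate one more vcpu away */
        return -EAGAIN;           /* caller (the tools) must retry */
    }
    cpu_moving = 0;               /* now the cpu may join another pool */
    return 0;
}

int main(void)
{
    int rc, tries = 0;

    do {
        rc = unassign_cpu();
        tries++;
    } while (rc == -EAGAIN);

    printf("cpu unassigned after %d tries (valid=%d, moving=%d)\n",
           tries, cpu_valid_in_pool, cpu_moving);
    return 0;
}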
George Dunlap
2011-Feb-08 12:08 UTC
Re: [Xen-devel] Hypervisor crash(!) on xl cpupool-numa-split
On Tue, Feb 8, 2011 at 5:43 AM, Juergen Gross <juergen.gross@ts.fujitsu.com> wrote:> On 02/07/11 16:55, George Dunlap wrote: >> >> Juergen, >> >> What is supposed to happen if a domain is in cpupool0, and then all of >> the cpus are taken out of cpupool0? Is that possible? > > No. Cpupool0 can''t be without any cpu, as Dom0 is always member of cpupool0.If that''s the case, then since Andre is running this immediately after boot, he shouldn''t be seeing any vcpus in the new pools; and all of the dom0 vcpus should be migrated to cpupool0, right? Is it possible that migration process isn''t happening properly? It looks like schedule.c:cpu_disable_scheduler() will try to migrate all vcpus, and if it fails to migrate, it returns -EAGAIN so that the tools will try again. It''s probably worth instrumenting that whole code-path to make sure it actually happens as we expect. Are we certain, for example, that if a hypercall continued on another cpu will actually return the new error value properly? Another minor thing: In cpupool.c:cpupool_unassign_cpu_helper(), why is the cpu''s bit set in cpupool_free_cpus without checking to see if the cpu_disable_scheduler() call actually worked? Shouldn''t that also be inside the if() statement? -George _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
George Dunlap
2011-Feb-08 12:14 UTC
Re: [Xen-devel] Hypervisor crash(!) on xl cpupool-numa-split
Andre, Can you try again with the attached patch? Thanks, -George On Tue, Feb 8, 2011 at 12:08 PM, George Dunlap <George.Dunlap@eu.citrix.com> wrote:> On Tue, Feb 8, 2011 at 5:43 AM, Juergen Gross > <juergen.gross@ts.fujitsu.com> wrote: >> On 02/07/11 16:55, George Dunlap wrote: >>> >>> Juergen, >>> >>> What is supposed to happen if a domain is in cpupool0, and then all of >>> the cpus are taken out of cpupool0? Is that possible? >> >> No. Cpupool0 can''t be without any cpu, as Dom0 is always member of cpupool0. > > If that''s the case, then since Andre is running this immediately after > boot, he shouldn''t be seeing any vcpus in the new pools; and all of > the dom0 vcpus should be migrated to cpupool0, right? Is it possible > that migration process isn''t happening properly? > > It looks like schedule.c:cpu_disable_scheduler() will try to migrate > all vcpus, and if it fails to migrate, it returns -EAGAIN so that the > tools will try again. It''s probably worth instrumenting that whole > code-path to make sure it actually happens as we expect. Are we > certain, for example, that if a hypercall continued on another cpu > will actually return the new error value properly? > > Another minor thing: In cpupool.c:cpupool_unassign_cpu_helper(), why > is the cpu''s bit set in cpupool_free_cpus without checking to see if > the cpu_disable_scheduler() call actually worked? Shouldn''t that also > be inside the if() statement? > > -George >_______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
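The attached patch is again not part of the archive. Judging from the messages it produces in Andre's reply below, lines like "(XEN) cpu_disable_scheduler: Migrating d0v18 from cpu 24" plus a "Migration failed" message on the error path, it presumably adds something along these lines to xen/common/schedule.c (a guess at its shape, not the actual patch):

    /* guess: inside cpu_disable_scheduler()'s per-vcpu loop, next to the
     * "if ( v->processor == cpu )" branch that evacuates vcpus from the
     * departing cpu */
    printk("%s: Migrating d%dv%d from cpu %d\n", __func__,
           v->domain->domain_id, v->vcpu_id, cpu);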
Juergen Gross
2011-Feb-08 12:23 UTC
Re: [Xen-devel] Hypervisor crash(!) on xl cpupool-numa-split
On 02/08/11 13:08, George Dunlap wrote:> On Tue, Feb 8, 2011 at 5:43 AM, Juergen Gross > <juergen.gross@ts.fujitsu.com> wrote: >> On 02/07/11 16:55, George Dunlap wrote: >>> >>> Juergen, >>> >>> What is supposed to happen if a domain is in cpupool0, and then all of >>> the cpus are taken out of cpupool0? Is that possible? >> >> No. Cpupool0 can''t be without any cpu, as Dom0 is always member of cpupool0. > > If that''s the case, then since Andre is running this immediately after > boot, he shouldn''t be seeing any vcpus in the new pools; and all of > the dom0 vcpus should be migrated to cpupool0, right? Is it possible > that migration process isn''t happening properly?Again: not the vcpus are migrated to cpupool0, but the physical cpus are taken away from it, so the vcpus being active on the cpu to be moved MUST be migrated to other cpus of cpupool0.> > It looks like schedule.c:cpu_disable_scheduler() will try to migrate > all vcpus, and if it fails to migrate, it returns -EAGAIN so that the > tools will try again. It''s probably worth instrumenting that whole > code-path to make sure it actually happens as we expect. Are we > certain, for example, that if a hypercall continued on another cpu > will actually return the new error value properly?I have checked that and did never see any problem. And yes, I did see the EAGAIN case happen. With my test patch to execute the cpu_disable_scheduler() always on the cpu to be moved this should not be a problem at all, since the tasklet is always running in the idle vcpu.> > Another minor thing: In cpupool.c:cpupool_unassign_cpu_helper(), why > is the cpu''s bit set in cpupool_free_cpus without checking to see if > the cpu_disable_scheduler() call actually worked? Shouldn''t that also > be inside the if() statement?No, I don''t think so. If removing a cpu fails permanently after returning -EAGAIN before, it should be addable to the original cpupool easily. This can only be done, if it is flagged as free. Adding it to another cpupool will be denied as cpupool_cpu_moving is still set. Juergen -- Juergen Gross Principal Developer Operating Systems TSP ES&S SWE OS6 Telephone: +49 (0) 89 3222 2967 Fujitsu Technology Solutions e-mail: juergen.gross@ts.fujitsu.com Domagkstr. 28 Internet: ts.fujitsu.com D-80807 Muenchen Company details: ts.fujitsu.com/imprint.html _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Andre Przywara
2011-Feb-08 16:33 UTC
Re: [Xen-devel] Hypervisor crash(!) on xl cpupool-numa-split
George Dunlap wrote:> Andre, > > Can you try again with the attached patch?Sure. Unfortunately (or is this a good sign?) the "Migration failed" message didn''t trigger, I only saw various instances of the other printk, see the attached log file. Migration is happening quite often, because Dom0 has 48 vCPUs and in the end they are squashed into less and less pCPUs. I guess that is the reason my I see it on my machine. Regards, Andre.> > Thanks, > -George > > On Tue, Feb 8, 2011 at 12:08 PM, George Dunlap > <George.Dunlap@eu.citrix.com> wrote: >> On Tue, Feb 8, 2011 at 5:43 AM, Juergen Gross >> <juergen.gross@ts.fujitsu.com> wrote: >>> On 02/07/11 16:55, George Dunlap wrote: >>>> Juergen, >>>> >>>> What is supposed to happen if a domain is in cpupool0, and then all of >>>> the cpus are taken out of cpupool0? Is that possible? >>> No. Cpupool0 can''t be without any cpu, as Dom0 is always member of cpupool0. >> If that''s the case, then since Andre is running this immediately after >> boot, he shouldn''t be seeing any vcpus in the new pools; and all of >> the dom0 vcpus should be migrated to cpupool0, right? Is it possible >> that migration process isn''t happening properly? >> >> It looks like schedule.c:cpu_disable_scheduler() will try to migrate >> all vcpus, and if it fails to migrate, it returns -EAGAIN so that the >> tools will try again. It''s probably worth instrumenting that whole >> code-path to make sure it actually happens as we expect. Are we >> certain, for example, that if a hypercall continued on another cpu >> will actually return the new error value properly? >> >> Another minor thing: In cpupool.c:cpupool_unassign_cpu_helper(), why >> is the cpu''s bit set in cpupool_free_cpus without checking to see if >> the cpu_disable_scheduler() call actually worked? Shouldn''t that also >> be inside the if() statement? >> >> -George >>-- Andre Przywara AMD-OSRC (Dresden) Tel: x29712 _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
George Dunlap
2011-Feb-09 12:27 UTC
Re: [Xen-devel] Hypervisor crash(!) on xl cpupool-numa-split
On Tue, Feb 8, 2011 at 4:33 PM, Andre Przywara <andre.przywara@amd.com> wrote:
> (XEN) cpu_disable_scheduler: Migrating d0v18 from cpu 24
> (XEN) cpu_disable_scheduler: Migrating d0v34 from cpu 24
> (XEN) cpu_disable_scheduler: Migrating d0v42 from cpu 24
> (XEN) cpu_disable_scheduler: Migrating d0v18 from cpu 25
> (XEN) cpu_disable_scheduler: Migrating d0v34 from cpu 25
> (XEN) cpu_disable_scheduler: Migrating d0v42 from cpu 25
> (XEN) cpu_disable_scheduler: Migrating d0v18 from cpu 26
> (XEN) cpu_disable_scheduler: Migrating d0v32 from cpu 26
> (XEN) cpu_disable_scheduler: Migrating d0v42 from cpu 26
> (XEN) cpu_disable_scheduler: Migrating d0v18 from cpu 27
> (XEN) cpu_disable_scheduler: Migrating d0v24 from cpu 27
> (XEN) cpu_disable_scheduler: Migrating d0v32 from cpu 27
> (XEN) cpu_disable_scheduler: Migrating d0v42 from cpu 27
> (XEN) cpu_disable_scheduler: Migrating d0v3 from cpu 28
> (XEN) cpu_disable_scheduler: Migrating d0v18 from cpu 28
> (XEN) cpu_disable_scheduler: Migrating d0v25 from cpu 28
> (XEN) cpu_disable_scheduler: Migrating d0v32 from cpu 28
> (XEN) cpu_disable_scheduler: Migrating d0v39 from cpu 28
> (XEN) cpu_disable_scheduler: Migrating d0v3 from cpu 29

Interesting -- what seems to happen here is that as cpus are disabled,
vcpus are "shovelled" in an accumulative fashion from one cpu to the next:
* v18,34,42 start on cpu 24.
* When 24 is brought down, they're all migrated to 25; then when 25 is
  brought down, to 26, then to 27
* v24 is running on cpu 27, so when 27 is brought down, v24 is added to the mix
* v3 is running on cpu 28, so all of them plus v3 are shovelled onto cpu 29.

While that behavior may not be ideal, it should certainly be bug-free.

Another interesting thing to note is that the bug happened on pcpu 32,
but there were no advertised migrations from that cpu.

Andre, can you fold the attached patch into your testing?

Thanks for all your work on this.

 -George

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel
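Following the summary above (not the full log), the shovelling pattern can be reproduced with a few lines of stand-alone C; a toy simulation, not Xen code:

/* toy_shovel.c: gcc -o toy_shovel toy_shovel.c && ./toy_shovel */
#include <stdio.h>

#define NCPU 6                            /* model cpus 24..29 as 0..5 */

int main(void)
{
    /* dom0 vcpus per modelled cpu, taken from the bullet points above */
    int vcpus[NCPU] = { 3, 0, 0, 1, 1, 0 };
    int cpu;

    for (cpu = 0; cpu < NCPU - 1; cpu++) {
        printf("removing cpu %d: pushing %d vcpu(s) to cpu %d\n",
               24 + cpu, vcpus[cpu], 24 + cpu + 1);
        vcpus[cpu + 1] += vcpus[cpu];     /* everything moves one step */
        vcpus[cpu] = 0;
    }
    printf("cpu %d ends up with %d vcpu(s)\n", 24 + NCPU - 1, vcpus[NCPU - 1]);
    return 0;
}

Each removed cpu dumps its remaining load onto its neighbour, so the last cpu of the node ends up with all five vcpus: unusual, but as noted above not a bug in itself.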
George Dunlap
2011-Feb-09 12:27 UTC
Re: [Xen-devel] Hypervisor crash(!) on xl cpupool-numa-split
Sorry, forgot the patch... -G On Wed, Feb 9, 2011 at 12:27 PM, George Dunlap <George.Dunlap@eu.citrix.com> wrote:> On Tue, Feb 8, 2011 at 4:33 PM, Andre Przywara <andre.przywara@amd.com> wrote: >> (XEN) cpu_disable_scheduler: Migrating d0v18 from cpu 24 >> (XEN) cpu_disable_scheduler: Migrating d0v34 from cpu 24 >> (XEN) cpu_disable_scheduler: Migrating d0v42 from cpu 24 >> (XEN) cpu_disable_scheduler: Migrating d0v18 from cpu 25 >> (XEN) cpu_disable_scheduler: Migrating d0v34 from cpu 25 >> (XEN) cpu_disable_scheduler: Migrating d0v42 from cpu 25 >> (XEN) cpu_disable_scheduler: Migrating d0v18 from cpu 26 >> (XEN) cpu_disable_scheduler: Migrating d0v32 from cpu 26 >> (XEN) cpu_disable_scheduler: Migrating d0v42 from cpu 26 >> (XEN) cpu_disable_scheduler: Migrating d0v18 from cpu 27 >> (XEN) cpu_disable_scheduler: Migrating d0v24 from cpu 27 >> (XEN) cpu_disable_scheduler: Migrating d0v32 from cpu 27 >> (XEN) cpu_disable_scheduler: Migrating d0v42 from cpu 27 >> (XEN) cpu_disable_scheduler: Migrating d0v3 from cpu 28 >> (XEN) cpu_disable_scheduler: Migrating d0v18 from cpu 28 >> (XEN) cpu_disable_scheduler: Migrating d0v25 from cpu 28 >> (XEN) cpu_disable_scheduler: Migrating d0v32 from cpu 28 >> (XEN) cpu_disable_scheduler: Migrating d0v39 from cpu 28 >> (XEN) cpu_disable_scheduler: Migrating d0v3 from cpu 29 > > Interesting -- what seems to happen here is that as cpus are disabled, > vcpus are "shovelled" in an accumulative fashion from one cpu to the > next: > * v18,34,42 start on cpu 24. > * When 24 is brought down, they''re all migrated to 25; then when 25 is > brougth down, to 26, then to 27 > * v24 is running on cpu 27, so when 27 is brought down, v24 is added to the mix > * v3 is running on cpu 28, so all of them plus v3 are shoveled onto cpu 29. > > While that behavior may not be ideal, it should certainly be bug-free. > > Another interesting thing to note is that the bug happened on pcpu 32, > but there were no advertised migrations from that cpu. > > Andre, can you fold the attached patch into your testing? > > Thanks for all your work on this. > > -George >_______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Juergen Gross
2011-Feb-09 13:04 UTC
Re: [Xen-devel] Hypervisor crash(!) on xl cpupool-numa-split
On 02/09/11 13:27, George Dunlap wrote:> Sorry, forgot the patch... > -G > > On Wed, Feb 9, 2011 at 12:27 PM, George Dunlap > <George.Dunlap@eu.citrix.com> wrote: >> On Tue, Feb 8, 2011 at 4:33 PM, Andre Przywara<andre.przywara@amd.com> wrote: >>> (XEN) cpu_disable_scheduler: Migrating d0v18 from cpu 24 >>> (XEN) cpu_disable_scheduler: Migrating d0v34 from cpu 24 >>> (XEN) cpu_disable_scheduler: Migrating d0v42 from cpu 24 >>> (XEN) cpu_disable_scheduler: Migrating d0v18 from cpu 25 >>> (XEN) cpu_disable_scheduler: Migrating d0v34 from cpu 25 >>> (XEN) cpu_disable_scheduler: Migrating d0v42 from cpu 25 >>> (XEN) cpu_disable_scheduler: Migrating d0v18 from cpu 26 >>> (XEN) cpu_disable_scheduler: Migrating d0v32 from cpu 26 >>> (XEN) cpu_disable_scheduler: Migrating d0v42 from cpu 26 >>> (XEN) cpu_disable_scheduler: Migrating d0v18 from cpu 27 >>> (XEN) cpu_disable_scheduler: Migrating d0v24 from cpu 27 >>> (XEN) cpu_disable_scheduler: Migrating d0v32 from cpu 27 >>> (XEN) cpu_disable_scheduler: Migrating d0v42 from cpu 27 >>> (XEN) cpu_disable_scheduler: Migrating d0v3 from cpu 28 >>> (XEN) cpu_disable_scheduler: Migrating d0v18 from cpu 28 >>> (XEN) cpu_disable_scheduler: Migrating d0v25 from cpu 28 >>> (XEN) cpu_disable_scheduler: Migrating d0v32 from cpu 28 >>> (XEN) cpu_disable_scheduler: Migrating d0v39 from cpu 28 >>> (XEN) cpu_disable_scheduler: Migrating d0v3 from cpu 29 >> >> Interesting -- what seems to happen here is that as cpus are disabled, >> vcpus are "shovelled" in an accumulative fashion from one cpu to the >> next: >> * v18,34,42 start on cpu 24. >> * When 24 is brought down, they''re all migrated to 25; then when 25 is >> brougth down, to 26, then to 27 >> * v24 is running on cpu 27, so when 27 is brought down, v24 is added to the mix >> * v3 is running on cpu 28, so all of them plus v3 are shoveled onto cpu 29. >> >> While that behavior may not be ideal, it should certainly be bug-free. >> >> Another interesting thing to note is that the bug happened on pcpu 32, >> but there were no advertised migrations from that cpu.If I understand the configuration of Andre''s machine correctly, pcpu32 will be the target of the next migrations. This pcpu is member of the next numa node, correct? Could it be there is a problem with the call of domain_update_node_affinity() from cpu_disable_scheduler() ? Hmm, I think this could really be the problem. Andre, could you try the following patch? diff -r f1fac30a531b xen/common/schedule.c --- a/xen/common/schedule.c Wed Feb 09 08:58:11 2011 +0000 +++ b/xen/common/schedule.c Wed Feb 09 14:02:12 2011 +0100 @@ -491,6 +491,10 @@ int cpu_disable_scheduler(unsigned int c v->domain->domain_id, v->vcpu_id); cpus_setall(v->cpu_affinity); affinity_broken = 1; + } + if ( cpus_weight(v->cpu_affinity) < NR_CPUS ) + { + cpu_clear(cpu, v->cpu_affinity); } if ( v->processor == cpu ) Juergen -- Juergen Gross Principal Developer Operating Systems TSP ES&S SWE OS6 Telephone: +49 (0) 89 3222 2967 Fujitsu Technology Solutions e-mail: juergen.gross@ts.fujitsu.com Domagkstr. 28 Internet: ts.fujitsu.com D-80807 Muenchen Company details: ts.fujitsu.com/imprint.html _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
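To see what the new hunk is meant to achieve, here is a small stand-alone model (plain C with a 64-bit mask standing in for the cpumask, not Xen code; the real check compares the affinity against the pool's remaining valid cpus, which is simplified here to "nothing but the leaving cpu left"): a vcpu that could only run on the departing cpu gets its affinity reset to all cpus, every other vcpu simply loses the departing cpu from its mask.

/* toy_affinity.c: gcc -o toy_affinity toy_affinity.c && ./toy_affinity */
#include <stdint.h>
#include <stdio.h>

#define NR_CPUS 64

static uint64_t fix_affinity(uint64_t affinity, int leaving_cpu)
{
    /* simplified "affinity broken" test: nothing left but the leaving cpu */
    if ( (affinity & ~(UINT64_C(1) << leaving_cpu)) == 0 )
        affinity = ~UINT64_C(0);                    /* cpus_setall() */
    /* the new hunk: unless the mask is full, drop the leaving cpu */
    if ( __builtin_popcountll(affinity) < NR_CPUS )
        affinity &= ~(UINT64_C(1) << leaving_cpu);  /* cpu_clear() */
    return affinity;
}

int main(void)
{
    /* vcpu pinned to cpus {30,31}: cpu 31 leaves, cpu 30 remains usable */
    printf("pinned to 30+31 -> %#llx\n",
           (unsigned long long)fix_affinity(UINT64_C(3) << 30, 31));
    /* vcpu pinned only to cpu 31: affinity is broken and reset to all */
    printf("pinned to 31    -> %#llx\n",
           (unsigned long long)fix_affinity(UINT64_C(1) << 31, 31));
    return 0;
}

The first case prints a mask with only bit 30 set, the second the all-ones mask, matching cpus_setall() followed by no cpu_clear() because the mask is full.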
Andre Przywara
2011-Feb-09 13:39 UTC
Re: [Xen-devel] Hypervisor crash(!) on xl cpupool-numa-split
Juergen Gross wrote:
>>> Another interesting thing to note is that the bug happened on pcpu 32,
>>> but there were no advertised migrations from that cpu.
>
> If I understand the configuration of Andre's machine correctly, pcpu32 will
> be the target of the next migrations. This pcpu is member of the next numa
> node, correct?

No, this is a 6-core box, so the NUMA node spans pcpus 30-35.

> Could it be there is a problem with the call of domain_update_node_affinity()
> from cpu_disable_scheduler() ?
>
> Hmm, I think this could really be the problem.
> Andre, could you try the following patch?

Sorry, but that one didn't help. It crashed with the well-known BUG_ON:
(XEN) Xen BUG at sched_credit.c:990
(which is the weight assert in csched_acct (c/s 22858))

Regards,
Andre.

> diff -r f1fac30a531b xen/common/schedule.c
> --- a/xen/common/schedule.c     Wed Feb 09 08:58:11 2011 +0000
> +++ b/xen/common/schedule.c     Wed Feb 09 14:02:12 2011 +0100
> @@ -491,6 +491,10 @@ int cpu_disable_scheduler(unsigned int c
>                          v->domain->domain_id, v->vcpu_id);
>              cpus_setall(v->cpu_affinity);
>              affinity_broken = 1;
> +        }
> +        if ( cpus_weight(v->cpu_affinity) < NR_CPUS )
> +        {
> +            cpu_clear(cpu, v->cpu_affinity);
>          }
>
>          if ( v->processor == cpu )
>
> Juergen

-- 
Andre Przywara
AMD-OSRC (Dresden)
Tel: x29712

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel
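For readers without the tree at hand: the assertion Andre refers to is the per-pool weight bookkeeping check in csched_acct(). Its condition is quoted verbatim later in this thread; the loop around it below is paraphrased and should be read as a sketch of sched_credit.c:990, not as the verbatim source.

/* Sketch of the invariant behind "Xen BUG at sched_credit.c:990": while
 * credit is being distributed, each active domain's share must still fit
 * into the weight left to hand out for this pool's scheduler instance. */
list_for_each_safe( iter_sdom, next_sdom, &prv->active_sdom )
{
    sdom = list_entry(iter_sdom, struct csched_dom, active_sdom_elem);

    BUG_ON( (sdom->weight * sdom->active_vcpu_count) > weight_left );
    weight_left -= sdom->weight * sdom->active_vcpu_count;

    /* ...credit is then apportioned to the domain's active vcpus... */
}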
Andre Przywara
2011-Feb-09 13:51 UTC
Re: [Xen-devel] Hypervisor crash(!) on xl cpupool-numa-split
George Dunlap wrote:> <George.Dunlap@eu.citrix.com> wrote: >> On Tue, Feb 8, 2011 at 4:33 PM, Andre Przywara <andre.przywara@amd.com> wrote: >>> (XEN) cpu_disable_scheduler: Migrating d0v18 from cpu 24 >>> (XEN) cpu_disable_scheduler: Migrating d0v34 from cpu 24 >>> (XEN) cpu_disable_scheduler: Migrating d0v42 from cpu 24 >>> (XEN) cpu_disable_scheduler: Migrating d0v18 from cpu 25 >>> (XEN) cpu_disable_scheduler: Migrating d0v34 from cpu 25 >>> (XEN) cpu_disable_scheduler: Migrating d0v42 from cpu 25 >>> (XEN) cpu_disable_scheduler: Migrating d0v18 from cpu 26 >>> (XEN) cpu_disable_scheduler: Migrating d0v32 from cpu 26 >>> (XEN) cpu_disable_scheduler: Migrating d0v42 from cpu 26 >>> (XEN) cpu_disable_scheduler: Migrating d0v18 from cpu 27 >>> (XEN) cpu_disable_scheduler: Migrating d0v24 from cpu 27 >>> (XEN) cpu_disable_scheduler: Migrating d0v32 from cpu 27 >>> (XEN) cpu_disable_scheduler: Migrating d0v42 from cpu 27 >>> (XEN) cpu_disable_scheduler: Migrating d0v3 from cpu 28 >>> (XEN) cpu_disable_scheduler: Migrating d0v18 from cpu 28 >>> (XEN) cpu_disable_scheduler: Migrating d0v25 from cpu 28 >>> (XEN) cpu_disable_scheduler: Migrating d0v32 from cpu 28 >>> (XEN) cpu_disable_scheduler: Migrating d0v39 from cpu 28 >>> (XEN) cpu_disable_scheduler: Migrating d0v3 from cpu 29 >> Interesting -- what seems to happen here is that as cpus are disabled, >> vcpus are "shovelled" in an accumulative fashion from one cpu to the >> next: >> * v18,34,42 start on cpu 24. >> * When 24 is brought down, they''re all migrated to 25; then when 25 is >> brougth down, to 26, then to 27 >> * v24 is running on cpu 27, so when 27 is brought down, v24 is added to the mix >> * v3 is running on cpu 28, so all of them plus v3 are shoveled onto cpu 29. >> >> While that behavior may not be ideal, it should certainly be bug-free. >> >> Another interesting thing to note is that the bug happened on pcpu 32, >> but there were no advertised migrations from that cpu. >> >> Andre, can you fold the attached patch into your testing?Sorry, but that bug (and its output) didn''t trigger on two tries. Instead I now saw two occasions of the "migration failed, must retry later" message. Interestingly enough is does not seem to be fatal. The first time it triggers, the numa-split even completes, then after I roll it back and repeat it it shows again, but crashes later on that old BUG_ON(). See the attached log for more details. Thanks for the try, anyway. Regards, Andre.>> >> Thanks for all your work on this.I am glad for all your help. I only start to really understand the scheduler, so your support is much appreciated.>> >> -George >>-- Andre Przywara AMD-Operating System Research Center (OSRC), Dresden, Germany _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Juergen Gross
2011-Feb-09 14:21 UTC
Re: [Xen-devel] Hypervisor crash(!) on xl cpupool-numa-split
Andre, George, What seems to be interesting: I think the problem did always occur when a new cpupool was created and the first cpu was moved to it. I think my previous assumption regarding the master_ticker was not too bad. I think somehow the master_ticker of the new cpupool is becoming active before the scheduler is really initialized properly. This could happen, if enough time is spent between alloc_pdata for the cpu to be moved and the critical section in schedule_cpu_switch(). The solution should be to activate the timers only if the scheduler is ready for them. George, do you think the master_ticker should be stopped in suspend_ticker as well? I still see potential problems for entering deep C-States. I think I''ll prepare a patch which will keep the master_ticker active for the C-State case and migrate it for the schedule_cpu_switch() case. Juergen On 02/09/11 14:51, Andre Przywara wrote:> George Dunlap wrote: >> <George.Dunlap@eu.citrix.com> wrote: >>> On Tue, Feb 8, 2011 at 4:33 PM, Andre Przywara >>> <andre.przywara@amd.com> wrote: >>>> (XEN) cpu_disable_scheduler: Migrating d0v18 from cpu 24 >>>> (XEN) cpu_disable_scheduler: Migrating d0v34 from cpu 24 >>>> (XEN) cpu_disable_scheduler: Migrating d0v42 from cpu 24 >>>> (XEN) cpu_disable_scheduler: Migrating d0v18 from cpu 25 >>>> (XEN) cpu_disable_scheduler: Migrating d0v34 from cpu 25 >>>> (XEN) cpu_disable_scheduler: Migrating d0v42 from cpu 25 >>>> (XEN) cpu_disable_scheduler: Migrating d0v18 from cpu 26 >>>> (XEN) cpu_disable_scheduler: Migrating d0v32 from cpu 26 >>>> (XEN) cpu_disable_scheduler: Migrating d0v42 from cpu 26 >>>> (XEN) cpu_disable_scheduler: Migrating d0v18 from cpu 27 >>>> (XEN) cpu_disable_scheduler: Migrating d0v24 from cpu 27 >>>> (XEN) cpu_disable_scheduler: Migrating d0v32 from cpu 27 >>>> (XEN) cpu_disable_scheduler: Migrating d0v42 from cpu 27 >>>> (XEN) cpu_disable_scheduler: Migrating d0v3 from cpu 28 >>>> (XEN) cpu_disable_scheduler: Migrating d0v18 from cpu 28 >>>> (XEN) cpu_disable_scheduler: Migrating d0v25 from cpu 28 >>>> (XEN) cpu_disable_scheduler: Migrating d0v32 from cpu 28 >>>> (XEN) cpu_disable_scheduler: Migrating d0v39 from cpu 28 >>>> (XEN) cpu_disable_scheduler: Migrating d0v3 from cpu 29 >>> Interesting -- what seems to happen here is that as cpus are disabled, >>> vcpus are "shovelled" in an accumulative fashion from one cpu to the >>> next: >>> * v18,34,42 start on cpu 24. >>> * When 24 is brought down, they''re all migrated to 25; then when 25 is >>> brougth down, to 26, then to 27 >>> * v24 is running on cpu 27, so when 27 is brought down, v24 is added >>> to the mix >>> * v3 is running on cpu 28, so all of them plus v3 are shoveled onto >>> cpu 29. >>> >>> While that behavior may not be ideal, it should certainly be bug-free. >>> >>> Another interesting thing to note is that the bug happened on pcpu 32, >>> but there were no advertised migrations from that cpu. >>> >>> Andre, can you fold the attached patch into your testing? > Sorry, but that bug (and its output) didn''t trigger on two tries. > Instead I now saw two occasions of the "migration failed, must retry > later" message. Interestingly enough is does not seem to be fatal. The > first time it triggers, the numa-split even completes, then after I roll > it back and repeat it it shows again, but crashes later on that old > BUG_ON(). > > See the attached log for more details. > > Thanks for the try, anyway. > > Regards, > Andre. > > >>> >>> Thanks for all your work on this. > I am glad for all your help. 
I only start to really understand the scheduler, so your support is much appreciated.
>>>
>>> -George
>>>

-- 
Juergen Gross                 Principal Developer Operating Systems
TSP ES&S SWE OS6                       Telephone: +49 (0) 89 3222 2967
Fujitsu Technology Solutions           e-mail: juergen.gross@ts.fujitsu.com
Domagkstr. 28                          Internet: ts.fujitsu.com
D-80807 Muenchen                 Company details: ts.fujitsu.com/imprint.html

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel
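A minimal illustration of the idea discussed above (this is not the patch Juergen attaches in his next mail): arm the pool's master ticker only once the target scheduler's private data is consistent, i.e. from the tail of schedule_cpu_switch() rather than from alloc_pdata(). The helper name and the fields prv->ncpus, prv->master_ticker as well as the tick period macros are assumptions following the credit scheduler's conventions.

/* Sketch only: for the first cpu entering a pool, (re)arm the per-pool
 * accounting timer after schedule_cpu_switch() has installed the cpu's
 * scheduler data, instead of doing it from csched_alloc_pdata().  The
 * timer is assumed to have been initialised with the pool's private data. */
static void csched_start_master_ticker(struct csched_private *prv, unsigned int cpu)
{
    if ( prv->ncpus != 1 )
        return;   /* not the first cpu of this pool */

    migrate_timer(&prv->master_ticker, cpu);
    set_timer(&prv->master_ticker,
              NOW() + MILLISECS(CSCHED_MSECS_PER_TICK) * CSCHED_TICKS_PER_ACCT);
}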
Juergen Gross
2011-Feb-10 06:42 UTC
Re: [Xen-devel] Hypervisor crash(!) on xl cpupool-numa-split
On 02/09/11 15:21, Juergen Gross wrote:> Andre, George, > > > What seems to be interesting: I think the problem did always occur when > a new cpupool was created and the first cpu was moved to it. > > I think my previous assumption regarding the master_ticker was not too bad. > I think somehow the master_ticker of the new cpupool is becoming active > before the scheduler is really initialized properly. This could happen, if > enough time is spent between alloc_pdata for the cpu to be moved and the > critical section in schedule_cpu_switch(). > > The solution should be to activate the timers only if the scheduler is > ready for them. > > George, do you think the master_ticker should be stopped in suspend_ticker > as well? I still see potential problems for entering deep C-States. I think > I''ll prepare a patch which will keep the master_ticker active for the > C-State case and migrate it for the schedule_cpu_switch() case.Okay, here is a patch for this. It ran on my 4-core machine without any problems. Andre, could you give it a try? Juergen -- Juergen Gross Principal Developer Operating Systems TSP ES&S SWE OS6 Telephone: +49 (0) 89 3222 2967 Fujitsu Technology Solutions e-mail: juergen.gross@ts.fujitsu.com Domagkstr. 28 Internet: ts.fujitsu.com D-80807 Muenchen Company details: ts.fujitsu.com/imprint.html _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Andre Przywara
2011-Feb-10 09:25 UTC
Re: [Xen-devel] Hypervisor crash(!) on xl cpupool-numa-split
On 02/10/2011 07:42 AM, Juergen Gross wrote:> On 02/09/11 15:21, Juergen Gross wrote: >> Andre, George, >> >> >> What seems to be interesting: I think the problem did always occur when >> a new cpupool was created and the first cpu was moved to it. >> >> I think my previous assumption regarding the master_ticker was not too bad. >> I think somehow the master_ticker of the new cpupool is becoming active >> before the scheduler is really initialized properly. This could happen, if >> enough time is spent between alloc_pdata for the cpu to be moved and the >> critical section in schedule_cpu_switch(). >> >> The solution should be to activate the timers only if the scheduler is >> ready for them. >> >> George, do you think the master_ticker should be stopped in suspend_ticker >> as well? I still see potential problems for entering deep C-States. I think >> I''ll prepare a patch which will keep the master_ticker active for the >> C-State case and migrate it for the schedule_cpu_switch() case. > > Okay, here is a patch for this. It ran on my 4-core machine without any > problems. > Andre, could you give it a try?Did, but unfortunately it crashed as always. Tried twice and made sure I booted the right kernel. Sorry. The idea with the race between the timer and the state changing sounded very appealing, actually that was suspicious to me from the beginning. I will add some code to dump the state of all cpupools to the BUG_ON to see in which situation we are when the bug triggers. Regards, Andre. -- Andre Przywara AMD-OSRC (Dresden) Tel: x29712 _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
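What such a debug dump could look like (a sketch under the assumption that the cpupools are kept in a simple linked list with id, domain count, scheduler and cpu-mask fields; Andre's actual patch is not reproduced here, but its output appears in his next mail):

/* Sketch of a cpupool state dump to be called right before the failing
 * BUG_ON: print pool id, number of domains, scheduler name and the first
 * word of the cpu mask for every pool.  List and field names are assumed. */
static void dump_cpupools(void)
{
    struct cpupool *c;

    for ( c = cpupool_list; c != NULL; c = c->next )
        printk("CPU pool #%d: %d domains (%s), mask: %lx\n",
               c->cpupool_id, c->n_dom, c->sched->name,
               cpus_addr(c->cpu_valid)[0]);
}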
Andre Przywara
2011-Feb-10 14:18 UTC
Re: [Xen-devel] Hypervisor crash(!) on xl cpupool-numa-split
Andre Przywara wrote:> On 02/10/2011 07:42 AM, Juergen Gross wrote: >> On 02/09/11 15:21, Juergen Gross wrote: >>> Andre, George, >>> >>> >>> What seems to be interesting: I think the problem did always occur when >>> a new cpupool was created and the first cpu was moved to it. >>> >>> I think my previous assumption regarding the master_ticker was not too bad. >>> I think somehow the master_ticker of the new cpupool is becoming active >>> before the scheduler is really initialized properly. This could happen, if >>> enough time is spent between alloc_pdata for the cpu to be moved and the >>> critical section in schedule_cpu_switch(). >>> >>> The solution should be to activate the timers only if the scheduler is >>> ready for them. >>> >>> George, do you think the master_ticker should be stopped in suspend_ticker >>> as well? I still see potential problems for entering deep C-States. I think >>> I''ll prepare a patch which will keep the master_ticker active for the >>> C-State case and migrate it for the schedule_cpu_switch() case. >> Okay, here is a patch for this. It ran on my 4-core machine without any >> problems. >> Andre, could you give it a try? > Did, but unfortunately it crashed as always. Tried twice and made sure I > booted the right kernel. Sorry. > The idea with the race between the timer and the state changing sounded > very appealing, actually that was suspicious to me from the beginning. > > I will add some code to dump the state of all cpupools to the BUG_ON to > see in which situation we are when the bug triggers.OK, here is a first try of this, the patch iterates over all CPU pools and outputs some data if the BUG_ON ((sdom->weight * sdom->active_vcpu_count) > weight_left) condition triggers: (XEN) CPU pool #0: 1 domains (SMP Credit Scheduler), mask: fffffffc003f (XEN) CPU pool #1: 0 domains (SMP Credit Scheduler), mask: fc0 (XEN) CPU pool #2: 0 domains (SMP Credit Scheduler), mask: 1000 (XEN) Xen BUG at sched_credit.c:1010 .... The masks look proper (6 cores per node), the bug triggers when the first CPU is about to be(?) inserted. HTH, Andre. -- Andre Przywara AMD-Operating System Research Center (OSRC), Dresden, Germany _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Juergen Gross
2011-Feb-11 06:17 UTC
Re: [Xen-devel] Hypervisor crash(!) on xl cpupool-numa-split
On 02/10/11 15:18, Andre Przywara wrote:> Andre Przywara wrote: >> On 02/10/2011 07:42 AM, Juergen Gross wrote: >>> On 02/09/11 15:21, Juergen Gross wrote: >>>> Andre, George, >>>> >>>> >>>> What seems to be interesting: I think the problem did always occur when >>>> a new cpupool was created and the first cpu was moved to it. >>>> >>>> I think my previous assumption regarding the master_ticker was not >>>> too bad. >>>> I think somehow the master_ticker of the new cpupool is becoming active >>>> before the scheduler is really initialized properly. This could >>>> happen, if >>>> enough time is spent between alloc_pdata for the cpu to be moved and >>>> the >>>> critical section in schedule_cpu_switch(). >>>> >>>> The solution should be to activate the timers only if the scheduler is >>>> ready for them. >>>> >>>> George, do you think the master_ticker should be stopped in >>>> suspend_ticker >>>> as well? I still see potential problems for entering deep C-States. >>>> I think >>>> I''ll prepare a patch which will keep the master_ticker active for the >>>> C-State case and migrate it for the schedule_cpu_switch() case. >>> Okay, here is a patch for this. It ran on my 4-core machine without any >>> problems. >>> Andre, could you give it a try? >> Did, but unfortunately it crashed as always. Tried twice and made sure >> I booted the right kernel. Sorry. >> The idea with the race between the timer and the state changing >> sounded very appealing, actually that was suspicious to me from the >> beginning. >> >> I will add some code to dump the state of all cpupools to the BUG_ON >> to see in which situation we are when the bug triggers. > OK, here is a first try of this, the patch iterates over all CPU pools > and outputs some data if the BUG_ON > ((sdom->weight * sdom->active_vcpu_count) > weight_left) condition > triggers: > (XEN) CPU pool #0: 1 domains (SMP Credit Scheduler), mask: fffffffc003f > (XEN) CPU pool #1: 0 domains (SMP Credit Scheduler), mask: fc0 > (XEN) CPU pool #2: 0 domains (SMP Credit Scheduler), mask: 1000 > (XEN) Xen BUG at sched_credit.c:1010 > .... > The masks look proper (6 cores per node), the bug triggers when the > first CPU is about to be(?) inserted.Sure? I''m missing the cpu with mask 2000. I''ll try to reproduce the problem on a larger machine here (24 cores, 4 numa nodes). Andre, can you give me your xen boot parameters? Which xen changeset are you running, and do you have any additional patches in use? Juergen -- Juergen Gross Principal Developer Operating Systems TSP ES&S SWE OS6 Telephone: +49 (0) 89 3222 2967 Fujitsu Technology Solutions e-mail: juergen.gross@ts.fujitsu.com Domagkstr. 28 Internet: ts.fujitsu.com D-80807 Muenchen Company details: ts.fujitsu.com/imprint.html _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Andre Przywara
2011-Feb-11 07:39 UTC
Re: [Xen-devel] Hypervisor crash(!) on xl cpupool-numa-split
Juergen Gross wrote:> On 02/10/11 15:18, Andre Przywara wrote: >> Andre Przywara wrote: >>> On 02/10/2011 07:42 AM, Juergen Gross wrote: >>>> On 02/09/11 15:21, Juergen Gross wrote: >>>>> Andre, George, >>>>> >>>>> >>>>> What seems to be interesting: I think the problem did always occur when >>>>> a new cpupool was created and the first cpu was moved to it. >>>>> >>>>> I think my previous assumption regarding the master_ticker was not >>>>> too bad. >>>>> I think somehow the master_ticker of the new cpupool is becoming active >>>>> before the scheduler is really initialized properly. This could >>>>> happen, if >>>>> enough time is spent between alloc_pdata for the cpu to be moved and >>>>> the >>>>> critical section in schedule_cpu_switch(). >>>>> >>>>> The solution should be to activate the timers only if the scheduler is >>>>> ready for them. >>>>> >>>>> George, do you think the master_ticker should be stopped in >>>>> suspend_ticker >>>>> as well? I still see potential problems for entering deep C-States. >>>>> I think >>>>> I''ll prepare a patch which will keep the master_ticker active for the >>>>> C-State case and migrate it for the schedule_cpu_switch() case. >>>> Okay, here is a patch for this. It ran on my 4-core machine without any >>>> problems. >>>> Andre, could you give it a try? >>> Did, but unfortunately it crashed as always. Tried twice and made sure >>> I booted the right kernel. Sorry. >>> The idea with the race between the timer and the state changing >>> sounded very appealing, actually that was suspicious to me from the >>> beginning. >>> >>> I will add some code to dump the state of all cpupools to the BUG_ON >>> to see in which situation we are when the bug triggers. >> OK, here is a first try of this, the patch iterates over all CPU pools >> and outputs some data if the BUG_ON >> ((sdom->weight * sdom->active_vcpu_count) > weight_left) condition >> triggers: >> (XEN) CPU pool #0: 1 domains (SMP Credit Scheduler), mask: fffffffc003f >> (XEN) CPU pool #1: 0 domains (SMP Credit Scheduler), mask: fc0 >> (XEN) CPU pool #2: 0 domains (SMP Credit Scheduler), mask: 1000 >> (XEN) Xen BUG at sched_credit.c:1010 >> .... >> The masks look proper (6 cores per node), the bug triggers when the >> first CPU is about to be(?) inserted. > > Sure? I''m missing the cpu with mask 2000. > I''ll try to reproduce the problem on a larger machine here (24 cores, 4 numa > nodes). > Andre, can you give me your xen boot parameters? Which xen changeset are you > running, and do you have any additional patches in use?The grub lines: kernel (hd1,0)/boot/xen-22858_debug_04.gz console=com1,vga com1=115200 module (hd1,0)/boot/vmlinuz-2.6.32.27_pvops console=tty0 console=ttyS0,115200 ro root=/dev/sdb1 xencons=hvc0 All of my experiments are use c/s 22858 as a base. If you use a AMD Magny-Cours box for your experiments (socket C32 or G34), you should add the following patch (removing the line) --- a/xen/arch/x86/traps.c +++ b/xen/arch/x86/traps.c @@ -803,7 +803,6 @@ static void pv_cpuid(struct cpu_user_regs *regs) __clear_bit(X86_FEATURE_SKINIT % 32, &c); __clear_bit(X86_FEATURE_WDT % 32, &c); __clear_bit(X86_FEATURE_LWP % 32, &c); - __clear_bit(X86_FEATURE_NODEID_MSR % 32, &c); __clear_bit(X86_FEATURE_TOPOEXT % 32, &c); break; case 5: /* MONITOR/MWAIT */ This is not necessary (in fact that reverts my patch c/s 22815), but raises the probability to trigger the bug, probably because it increases the pressure of the Dom0 scheduler. 
If you cannot trigger it with Dom0, try to create a guest with many VCPUs and squeeze it into a small CPU-pool. Good luck ;-) Andre. -- Andre Przywara AMD-OSRC (Dresden) Tel: x29712 _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
George Dunlap
2011-Feb-14 17:57 UTC
Re: [Xen-devel] Hypervisor crash(!) on xl cpupool-numa-split
The good news is, I''ve managed to reproduce this on my local test hardware with 1x4x2 (1 socket, 4 cores, 2 threads per core) using the attached script. It''s time to go home now, but I should be able to dig something up tomorrow. To use the script: * Rename cpupool0 to "p0", and create an empty second pool, "p1" * You can modify elements by adding "arg=val" as arguments. * Arguments are: + dryrun={true,false} Do the work, but don''t actually execute any xl arguments. Default false. + left: Number commands to execute. Default 10. + maxcpus: highest numerical value for a cpu. Default 7 (i.e., 0-7 is 8 cpus). + verbose={true,false} Print what you''re doing. Default is true. The script sometimes attempts to remove the last cpu from cpupool0; in this case, libxl will print an error. If the script gets an error under that condition, it will ignore it; under any other condition, it will print diagnostic information. What finally crashed it for me was this command: # ./cpupool-test.sh verbose=false left=1000 -George On Fri, Feb 11, 2011 at 7:39 AM, Andre Przywara <andre.przywara@amd.com> wrote:> Juergen Gross wrote: >> >> On 02/10/11 15:18, Andre Przywara wrote: >>> >>> Andre Przywara wrote: >>>> >>>> On 02/10/2011 07:42 AM, Juergen Gross wrote: >>>>> >>>>> On 02/09/11 15:21, Juergen Gross wrote: >>>>>> >>>>>> Andre, George, >>>>>> >>>>>> >>>>>> What seems to be interesting: I think the problem did always occur >>>>>> when >>>>>> a new cpupool was created and the first cpu was moved to it. >>>>>> >>>>>> I think my previous assumption regarding the master_ticker was not >>>>>> too bad. >>>>>> I think somehow the master_ticker of the new cpupool is becoming >>>>>> active >>>>>> before the scheduler is really initialized properly. This could >>>>>> happen, if >>>>>> enough time is spent between alloc_pdata for the cpu to be moved and >>>>>> the >>>>>> critical section in schedule_cpu_switch(). >>>>>> >>>>>> The solution should be to activate the timers only if the scheduler is >>>>>> ready for them. >>>>>> >>>>>> George, do you think the master_ticker should be stopped in >>>>>> suspend_ticker >>>>>> as well? I still see potential problems for entering deep C-States. >>>>>> I think >>>>>> I''ll prepare a patch which will keep the master_ticker active for the >>>>>> C-State case and migrate it for the schedule_cpu_switch() case. >>>>> >>>>> Okay, here is a patch for this. It ran on my 4-core machine without any >>>>> problems. >>>>> Andre, could you give it a try? >>>> >>>> Did, but unfortunately it crashed as always. Tried twice and made sure >>>> I booted the right kernel. Sorry. >>>> The idea with the race between the timer and the state changing >>>> sounded very appealing, actually that was suspicious to me from the >>>> beginning. >>>> >>>> I will add some code to dump the state of all cpupools to the BUG_ON >>>> to see in which situation we are when the bug triggers. >>> >>> OK, here is a first try of this, the patch iterates over all CPU pools >>> and outputs some data if the BUG_ON >>> ((sdom->weight * sdom->active_vcpu_count) > weight_left) condition >>> triggers: >>> (XEN) CPU pool #0: 1 domains (SMP Credit Scheduler), mask: fffffffc003f >>> (XEN) CPU pool #1: 0 domains (SMP Credit Scheduler), mask: fc0 >>> (XEN) CPU pool #2: 0 domains (SMP Credit Scheduler), mask: 1000 >>> (XEN) Xen BUG at sched_credit.c:1010 >>> .... >>> The masks look proper (6 cores per node), the bug triggers when the >>> first CPU is about to be(?) inserted. >> >> Sure? I''m missing the cpu with mask 2000. 
>> I''ll try to reproduce the problem on a larger machine here (24 cores, 4 >> numa >> nodes). >> Andre, can you give me your xen boot parameters? Which xen changeset are >> you >> running, and do you have any additional patches in use? > > The grub lines: > kernel (hd1,0)/boot/xen-22858_debug_04.gz console=com1,vga com1=115200 > module (hd1,0)/boot/vmlinuz-2.6.32.27_pvops console=tty0 > console=ttyS0,115200 ro root=/dev/sdb1 xencons=hvc0 > > All of my experiments are use c/s 22858 as a base. > If you use a AMD Magny-Cours box for your experiments (socket C32 or G34), > you should add the following patch (removing the line) > --- a/xen/arch/x86/traps.c > +++ b/xen/arch/x86/traps.c > @@ -803,7 +803,6 @@ static void pv_cpuid(struct cpu_user_regs *regs) > __clear_bit(X86_FEATURE_SKINIT % 32, &c); > __clear_bit(X86_FEATURE_WDT % 32, &c); > __clear_bit(X86_FEATURE_LWP % 32, &c); > - __clear_bit(X86_FEATURE_NODEID_MSR % 32, &c); > __clear_bit(X86_FEATURE_TOPOEXT % 32, &c); > break; > case 5: /* MONITOR/MWAIT */ > > This is not necessary (in fact that reverts my patch c/s 22815), but raises > the probability to trigger the bug, probably because it increases the > pressure of the Dom0 scheduler. If you cannot trigger it with Dom0, try to > create a guest with many VCPUs and squeeze it into a small CPU-pool. > > Good luck ;-) > Andre. > > -- > Andre Przywara > AMD-OSRC (Dresden) > Tel: x29712 > > > _______________________________________________ > Xen-devel mailing list > Xen-devel@lists.xensource.com > http://lists.xensource.com/xen-devel >_______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Juergen Gross
2011-Feb-15 07:22 UTC
Re: [Xen-devel] Hypervisor crash(!) on xl cpupool-numa-split
On 02/14/11 18:57, George Dunlap wrote:> The good news is, I''ve managed to reproduce this on my local test > hardware with 1x4x2 (1 socket, 4 cores, 2 threads per core) using the > attached script. It''s time to go home now, but I should be able to > dig something up tomorrow. > > To use the script: > * Rename cpupool0 to "p0", and create an empty second pool, "p1" > * You can modify elements by adding "arg=val" as arguments. > * Arguments are: > + dryrun={true,false} Do the work, but don''t actually execute any xl > arguments. Default false. > + left: Number commands to execute. Default 10. > + maxcpus: highest numerical value for a cpu. Default 7 (i.e., 0-7 is 8 cpus). > + verbose={true,false} Print what you''re doing. Default is true. > > The script sometimes attempts to remove the last cpu from cpupool0; in > this case, libxl will print an error. If the script gets an error > under that condition, it will ignore it; under any other condition, it > will print diagnostic information. > > What finally crashed it for me was this command: > # ./cpupool-test.sh verbose=false left=1000Nice! With your script I finally managed to get the error, too. On my box (2 sockets a 6 cores) I had to use ./cpupool-test.sh verbose=false left=10000 maxcpus=11 to trigger it. Looking for more data now... Juergen> > -George > > On Fri, Feb 11, 2011 at 7:39 AM, Andre Przywara<andre.przywara@amd.com> wrote: >> Juergen Gross wrote: >>> >>> On 02/10/11 15:18, Andre Przywara wrote: >>>> >>>> Andre Przywara wrote: >>>>> >>>>> On 02/10/2011 07:42 AM, Juergen Gross wrote: >>>>>> >>>>>> On 02/09/11 15:21, Juergen Gross wrote: >>>>>>> >>>>>>> Andre, George, >>>>>>> >>>>>>> >>>>>>> What seems to be interesting: I think the problem did always occur >>>>>>> when >>>>>>> a new cpupool was created and the first cpu was moved to it. >>>>>>> >>>>>>> I think my previous assumption regarding the master_ticker was not >>>>>>> too bad. >>>>>>> I think somehow the master_ticker of the new cpupool is becoming >>>>>>> active >>>>>>> before the scheduler is really initialized properly. This could >>>>>>> happen, if >>>>>>> enough time is spent between alloc_pdata for the cpu to be moved and >>>>>>> the >>>>>>> critical section in schedule_cpu_switch(). >>>>>>> >>>>>>> The solution should be to activate the timers only if the scheduler is >>>>>>> ready for them. >>>>>>> >>>>>>> George, do you think the master_ticker should be stopped in >>>>>>> suspend_ticker >>>>>>> as well? I still see potential problems for entering deep C-States. >>>>>>> I think >>>>>>> I''ll prepare a patch which will keep the master_ticker active for the >>>>>>> C-State case and migrate it for the schedule_cpu_switch() case. >>>>>> >>>>>> Okay, here is a patch for this. It ran on my 4-core machine without any >>>>>> problems. >>>>>> Andre, could you give it a try? >>>>> >>>>> Did, but unfortunately it crashed as always. Tried twice and made sure >>>>> I booted the right kernel. Sorry. >>>>> The idea with the race between the timer and the state changing >>>>> sounded very appealing, actually that was suspicious to me from the >>>>> beginning. >>>>> >>>>> I will add some code to dump the state of all cpupools to the BUG_ON >>>>> to see in which situation we are when the bug triggers. 
>>>> >>>> OK, here is a first try of this, the patch iterates over all CPU pools >>>> and outputs some data if the BUG_ON >>>> ((sdom->weight * sdom->active_vcpu_count)> weight_left) condition >>>> triggers: >>>> (XEN) CPU pool #0: 1 domains (SMP Credit Scheduler), mask: fffffffc003f >>>> (XEN) CPU pool #1: 0 domains (SMP Credit Scheduler), mask: fc0 >>>> (XEN) CPU pool #2: 0 domains (SMP Credit Scheduler), mask: 1000 >>>> (XEN) Xen BUG at sched_credit.c:1010 >>>> .... >>>> The masks look proper (6 cores per node), the bug triggers when the >>>> first CPU is about to be(?) inserted. >>> >>> Sure? I''m missing the cpu with mask 2000. >>> I''ll try to reproduce the problem on a larger machine here (24 cores, 4 >>> numa >>> nodes). >>> Andre, can you give me your xen boot parameters? Which xen changeset are >>> you >>> running, and do you have any additional patches in use? >> >> The grub lines: >> kernel (hd1,0)/boot/xen-22858_debug_04.gz console=com1,vga com1=115200 >> module (hd1,0)/boot/vmlinuz-2.6.32.27_pvops console=tty0 >> console=ttyS0,115200 ro root=/dev/sdb1 xencons=hvc0 >> >> All of my experiments are use c/s 22858 as a base. >> If you use a AMD Magny-Cours box for your experiments (socket C32 or G34), >> you should add the following patch (removing the line) >> --- a/xen/arch/x86/traps.c >> +++ b/xen/arch/x86/traps.c >> @@ -803,7 +803,6 @@ static void pv_cpuid(struct cpu_user_regs *regs) >> __clear_bit(X86_FEATURE_SKINIT % 32,&c); >> __clear_bit(X86_FEATURE_WDT % 32,&c); >> __clear_bit(X86_FEATURE_LWP % 32,&c); >> - __clear_bit(X86_FEATURE_NODEID_MSR % 32,&c); >> __clear_bit(X86_FEATURE_TOPOEXT % 32,&c); >> break; >> case 5: /* MONITOR/MWAIT */ >> >> This is not necessary (in fact that reverts my patch c/s 22815), but raises >> the probability to trigger the bug, probably because it increases the >> pressure of the Dom0 scheduler. If you cannot trigger it with Dom0, try to >> create a guest with many VCPUs and squeeze it into a small CPU-pool. >> >> Good luck ;-) >> Andre. >> >> -- >> Andre Przywara >> AMD-OSRC (Dresden) >> Tel: x29712 >> >> >> _______________________________________________ >> Xen-devel mailing list >> Xen-devel@lists.xensource.com >> http://lists.xensource.com/xen-devel >> >> >> >> _______________________________________________ >> Xen-devel mailing list >> Xen-devel@lists.xensource.com >> http://lists.xensource.com/xen-devel-- Juergen Gross Principal Developer Operating Systems TSP ES&S SWE OS6 Telephone: +49 (0) 89 3222 2967 Fujitsu Technology Solutions e-mail: juergen.gross@ts.fujitsu.com Domagkstr. 28 Internet: ts.fujitsu.com D-80807 Muenchen Company details: ts.fujitsu.com/imprint.html _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Juergen Gross
2011-Feb-16 09:47 UTC
Re: [Xen-devel] Hypervisor crash(!) on xl cpupool-numa-split
Okay, I have some more data. I activated cpupool_dprintk() and included checks in sched_credit.c to test for weight inconsistencies. To reduce race possibilities I''ve added my patch to execute cpu assigning/unassigning always in a tasklet on the cpu to be moved. Here is the result: (XEN) cpupool_unassign_cpu(pool=0,cpu=6) (XEN) cpupool_unassign_cpu(pool=0,cpu=6) ret -16 (XEN) cpupool_unassign_cpu(pool=0,cpu=6) (XEN) cpupool_unassign_cpu(pool=0,cpu=6) ret -16 (XEN) cpupool_assign_cpu(pool=0,cpu=1) (XEN) cpupool_assign_cpu(pool=0,cpu=1) ffff83083fff74c0 (XEN) cpupool_assign_cpu(cpu=1) ret 0 (XEN) cpupool_assign_cpu(pool=1,cpu=4) (XEN) cpupool_assign_cpu(pool=1,cpu=4) ffff831002ad5e40 (XEN) cpupool_assign_cpu(cpu=4) ret 0 (XEN) cpu 4, weight 0,prv ffff831002ad5e40, dom 0: (XEN) sdom->weight: 256, sdom->active_vcpu_count: 1 (XEN) Xen BUG at sched_credit.c:570 (XEN) ----[ Xen-4.1.0-rc5-pre x86_64 debug=y Tainted: C ]---- (XEN) CPU: 4 (XEN) RIP: e008:[<ffff82c4801197d7>] csched_tick+0x186/0x37f (XEN) RFLAGS: 0000000000010086 CONTEXT: hypervisor (XEN) rax: 0000000000000000 rbx: ffff830839d3ec30 rcx: 0000000000000000 (XEN) rdx: ffff830839dcff18 rsi: 000000000000000a rdi: ffff82c4802542e8 (XEN) rbp: ffff830839dcfe38 rsp: ffff830839dcfde8 r8: 0000000000000004 (XEN) r9: ffff82c480213520 r10: 00000000fffffffc r11: 0000000000000001 (XEN) r12: 0000000000000004 r13: ffff830839d3ec40 r14: ffff831002ad5e40 (XEN) r15: ffff830839d66f90 cr0: 000000008005003b cr4: 00000000000026f0 (XEN) cr3: 0000001020a98000 cr2: 00007fc5e9b79d98 (XEN) ds: 0000 es: 0000 fs: 0000 gs: 0000 ss: e010 cs: e008 (XEN) Xen stack trace from rsp=ffff830839dcfde8: (XEN) ffff83083ffa3ba0 ffff831002ad5e40 0000000000000246 ffff830839d6c000 (XEN) 0000000000000000 ffff830839dd1100 0000000000000004 ffff82c480119651 (XEN) ffff831002b28018 ffff831002b28010 ffff830839dcfe68 ffff82c480126204 (XEN) 0000000000000002 ffff83083ffa3bb8 ffff830839dd1100 000000cae439ea7e (XEN) ffff830839dcfeb8 ffff82c480126539 00007fc5e9fa5b20 ffff830839dd1100 (XEN) ffff831002b28010 0000000000000004 0000000000000004 ffff82c4802b0880 (XEN) ffff830839dcff18 ffffffffffffffff ffff830839dcfef8 ffff82c480123647 (XEN) ffff830839dcfed8 ffff830077eee000 00007fc5e9b79d98 00007fc5e9fa5b20 (XEN) 0000000000000002 00007fff46826f20 ffff830839dcff08 ffff82c4801236c2 (XEN) 00007cf7c62300c7 ffff82c480206ad6 00007fff46826f20 0000000000000002 (XEN) 00007fc5e9fa5b20 00007fc5e9b79d98 00007fff46827260 00007fff46826f50 (XEN) 0000000000000246 0000000000000032 0000000000000000 00000000ffffffff (XEN) 0000000000000009 00007fc5e9d9de1a 0000000000000003 0000000000004848 (XEN) 00007fc5e9b7a000 0000010000000000 ffffffff800073f0 000000000000e033 (XEN) 0000000000000246 ffff880f97b51fc8 000000000000e02b 0000000000000000 (XEN) 0000000000000000 0000000000000000 0000000000000000 0000000000000004 (XEN) ffff830077eee000 00000043b9afd180 0000000000000000 (XEN) Xen call trace: (XEN) [<ffff82c4801197d7>] csched_tick+0x186/0x37f (XEN) [<ffff82c480126204>] execute_timer+0x4e/0x6c (XEN) [<ffff82c480126539>] timer_softirq_action+0xf6/0x239 (XEN) [<ffff82c480123647>] __do_softirq+0x88/0x99 (XEN) [<ffff82c4801236c2>] do_softirq+0x6a/0x7a (XEN) (XEN) (XEN) **************************************** (XEN) Panic on CPU 4: (XEN) Xen BUG at sched_credit.c:570 (XEN) **************************************** As you can see, a Dom0 vcpus is becoming active on a pool 1 cpu. The BUG_ON triggered in csched_acct() is a logical result of this. How this can happen I don''t know yet. Anyone any idea? I''ll keep searching... 
Juergen On 02/15/11 08:22, Juergen Gross wrote:> On 02/14/11 18:57, George Dunlap wrote: >> The good news is, I''ve managed to reproduce this on my local test >> hardware with 1x4x2 (1 socket, 4 cores, 2 threads per core) using the >> attached script. It''s time to go home now, but I should be able to >> dig something up tomorrow. >> >> To use the script: >> * Rename cpupool0 to "p0", and create an empty second pool, "p1" >> * You can modify elements by adding "arg=val" as arguments. >> * Arguments are: >> + dryrun={true,false} Do the work, but don''t actually execute any xl >> arguments. Default false. >> + left: Number commands to execute. Default 10. >> + maxcpus: highest numerical value for a cpu. Default 7 (i.e., 0-7 is >> 8 cpus). >> + verbose={true,false} Print what you''re doing. Default is true. >> >> The script sometimes attempts to remove the last cpu from cpupool0; in >> this case, libxl will print an error. If the script gets an error >> under that condition, it will ignore it; under any other condition, it >> will print diagnostic information. >> >> What finally crashed it for me was this command: >> # ./cpupool-test.sh verbose=false left=1000 > > Nice! > With your script I finally managed to get the error, too. On my box (2 > sockets > a 6 cores) I had to use > > ./cpupool-test.sh verbose=false left=10000 maxcpus=11 > > to trigger it. > Looking for more data now... > > > Juergen > >> >> -George >> >> On Fri, Feb 11, 2011 at 7:39 AM, Andre >> Przywara<andre.przywara@amd.com> wrote: >>> Juergen Gross wrote: >>>> >>>> On 02/10/11 15:18, Andre Przywara wrote: >>>>> >>>>> Andre Przywara wrote: >>>>>> >>>>>> On 02/10/2011 07:42 AM, Juergen Gross wrote: >>>>>>> >>>>>>> On 02/09/11 15:21, Juergen Gross wrote: >>>>>>>> >>>>>>>> Andre, George, >>>>>>>> >>>>>>>> >>>>>>>> What seems to be interesting: I think the problem did always occur >>>>>>>> when >>>>>>>> a new cpupool was created and the first cpu was moved to it. >>>>>>>> >>>>>>>> I think my previous assumption regarding the master_ticker was not >>>>>>>> too bad. >>>>>>>> I think somehow the master_ticker of the new cpupool is becoming >>>>>>>> active >>>>>>>> before the scheduler is really initialized properly. This could >>>>>>>> happen, if >>>>>>>> enough time is spent between alloc_pdata for the cpu to be moved >>>>>>>> and >>>>>>>> the >>>>>>>> critical section in schedule_cpu_switch(). >>>>>>>> >>>>>>>> The solution should be to activate the timers only if the >>>>>>>> scheduler is >>>>>>>> ready for them. >>>>>>>> >>>>>>>> George, do you think the master_ticker should be stopped in >>>>>>>> suspend_ticker >>>>>>>> as well? I still see potential problems for entering deep C-States. >>>>>>>> I think >>>>>>>> I''ll prepare a patch which will keep the master_ticker active >>>>>>>> for the >>>>>>>> C-State case and migrate it for the schedule_cpu_switch() case. >>>>>>> >>>>>>> Okay, here is a patch for this. It ran on my 4-core machine >>>>>>> without any >>>>>>> problems. >>>>>>> Andre, could you give it a try? >>>>>> >>>>>> Did, but unfortunately it crashed as always. Tried twice and made >>>>>> sure >>>>>> I booted the right kernel. Sorry. >>>>>> The idea with the race between the timer and the state changing >>>>>> sounded very appealing, actually that was suspicious to me from the >>>>>> beginning. >>>>>> >>>>>> I will add some code to dump the state of all cpupools to the BUG_ON >>>>>> to see in which situation we are when the bug triggers. 
>>>>> >>>>> OK, here is a first try of this, the patch iterates over all CPU pools >>>>> and outputs some data if the BUG_ON >>>>> ((sdom->weight * sdom->active_vcpu_count)> weight_left) condition >>>>> triggers: >>>>> (XEN) CPU pool #0: 1 domains (SMP Credit Scheduler), mask: >>>>> fffffffc003f >>>>> (XEN) CPU pool #1: 0 domains (SMP Credit Scheduler), mask: fc0 >>>>> (XEN) CPU pool #2: 0 domains (SMP Credit Scheduler), mask: 1000 >>>>> (XEN) Xen BUG at sched_credit.c:1010 >>>>> .... >>>>> The masks look proper (6 cores per node), the bug triggers when the >>>>> first CPU is about to be(?) inserted. >>>> >>>> Sure? I''m missing the cpu with mask 2000. >>>> I''ll try to reproduce the problem on a larger machine here (24 cores, 4 >>>> numa >>>> nodes). >>>> Andre, can you give me your xen boot parameters? Which xen changeset >>>> are >>>> you >>>> running, and do you have any additional patches in use? >>> >>> The grub lines: >>> kernel (hd1,0)/boot/xen-22858_debug_04.gz console=com1,vga com1=115200 >>> module (hd1,0)/boot/vmlinuz-2.6.32.27_pvops console=tty0 >>> console=ttyS0,115200 ro root=/dev/sdb1 xencons=hvc0 >>> >>> All of my experiments are use c/s 22858 as a base. >>> If you use a AMD Magny-Cours box for your experiments (socket C32 or >>> G34), >>> you should add the following patch (removing the line) >>> --- a/xen/arch/x86/traps.c >>> +++ b/xen/arch/x86/traps.c >>> @@ -803,7 +803,6 @@ static void pv_cpuid(struct cpu_user_regs *regs) >>> __clear_bit(X86_FEATURE_SKINIT % 32,&c); >>> __clear_bit(X86_FEATURE_WDT % 32,&c); >>> __clear_bit(X86_FEATURE_LWP % 32,&c); >>> - __clear_bit(X86_FEATURE_NODEID_MSR % 32,&c); >>> __clear_bit(X86_FEATURE_TOPOEXT % 32,&c); >>> break; >>> case 5: /* MONITOR/MWAIT */ >>> >>> This is not necessary (in fact that reverts my patch c/s 22815), but >>> raises >>> the probability to trigger the bug, probably because it increases the >>> pressure of the Dom0 scheduler. If you cannot trigger it with Dom0, >>> try to >>> create a guest with many VCPUs and squeeze it into a small CPU-pool. >>> >>> Good luck ;-) >>> Andre. >>> >>> -- >>> Andre Przywara >>> AMD-OSRC (Dresden) >>> Tel: x29712 >>> >>> >>> _______________________________________________ >>> Xen-devel mailing list >>> Xen-devel@lists.xensource.com >>> http://lists.xensource.com/xen-devel >>> >>> >>> >>> _______________________________________________ >>> Xen-devel mailing list >>> Xen-devel@lists.xensource.com >>> http://lists.xensource.com/xen-devel > >-- Juergen Gross Principal Developer Operating Systems TSP ES&S SWE OS6 Telephone: +49 (0) 89 3222 2967 Fujitsu Technology Solutions e-mail: juergen.gross@ts.fujitsu.com Domagkstr. 28 Internet: ts.fujitsu.com D-80807 Muenchen Company details: ts.fujitsu.com/imprint.html _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
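The consistency check Juergen mentions is not included in the mail; something along the following lines, run from csched_tick(), would produce the kind of output shown above. This is a sketch only: the helper name, the exact check and the printed fields are assumptions modelled on the quoted log.

/* Sketch: verify that the pool-private weight total matches the weights of
 * the domains currently active in this pool, and dump them if not. */
static void csched_check_weights(struct csched_private *prv, int cpu)
{
    struct csched_dom *sdom;
    unsigned int weight = 0;

    list_for_each_entry( sdom, &prv->active_sdom, active_sdom_elem )
    {
        printk("cpu %d, weight %u, prv %p, dom %d:\n",
               cpu, prv->weight, prv, sdom->dom->domain_id);
        printk("sdom->weight: %u, sdom->active_vcpu_count: %u\n",
               sdom->weight, sdom->active_vcpu_count);
        weight += sdom->weight * sdom->active_vcpu_count;
    }

    /* A domain showing up in this pool's active list while the pool's own
     * weight is still 0 is exactly the cross-pool situation in the log. */
    BUG_ON( weight != prv->weight );
}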
George Dunlap
2011-Feb-16 13:54 UTC
Re: [Xen-devel] Hypervisor crash(!) on xl cpupool-numa-split
Andre (and Juergen), can you try again with the attached patch? What the patch basically does is try to make "cpu_disable_scheduler()" do what it seems to say it does. :-) Namely, the various scheduler-related interrutps (both per-cpu ticks and the master tick) is a part of the scheduler, so disable them before doing anything, and don''t enable them until the cpu is really ready to go again. To be precise: * cpu_disable_scheduler() disables ticks * scheduler_cpu_switch() only enables ticks if adding a cpu to a pool, and does it after inserting the idle vcpu * Modify semantics, s.t., {alloc,free}_pdata() don''t actually start or stop tickers + Call tick_{resume,suspend} in cpu_{up,down}, respectively * Modify credit1''s tick_{suspend,resume} to handle the master ticker as well. With this patch (if dom0 doesn''t get wedged due to all 8 vcpus being on one pcpu), I can perform thousands of operations successfully. (NB this is not ready for application yet, I just wanted to check to see if it fixes Andre''s problem) -George On Wed, Feb 16, 2011 at 9:47 AM, Juergen Gross <juergen.gross@ts.fujitsu.com> wrote:> Okay, I have some more data. > > I activated cpupool_dprintk() and included checks in sched_credit.c to > test for weight inconsistencies. To reduce race possibilities I''ve added > my patch to execute cpu assigning/unassigning always in a tasklet on the > cpu to be moved. > > Here is the result: > > (XEN) cpupool_unassign_cpu(pool=0,cpu=6) > (XEN) cpupool_unassign_cpu(pool=0,cpu=6) ret -16 > (XEN) cpupool_unassign_cpu(pool=0,cpu=6) > (XEN) cpupool_unassign_cpu(pool=0,cpu=6) ret -16 > (XEN) cpupool_assign_cpu(pool=0,cpu=1) > (XEN) cpupool_assign_cpu(pool=0,cpu=1) ffff83083fff74c0 > (XEN) cpupool_assign_cpu(cpu=1) ret 0 > (XEN) cpupool_assign_cpu(pool=1,cpu=4) > (XEN) cpupool_assign_cpu(pool=1,cpu=4) ffff831002ad5e40 > (XEN) cpupool_assign_cpu(cpu=4) ret 0 > (XEN) cpu 4, weight 0,prv ffff831002ad5e40, dom 0: > (XEN) sdom->weight: 256, sdom->active_vcpu_count: 1 > (XEN) Xen BUG at sched_credit.c:570 > (XEN) ----[ Xen-4.1.0-rc5-pre x86_64 debug=y Tainted: C ]---- > (XEN) CPU: 4 > (XEN) RIP: e008:[<ffff82c4801197d7>] csched_tick+0x186/0x37f > (XEN) RFLAGS: 0000000000010086 CONTEXT: hypervisor > (XEN) rax: 0000000000000000 rbx: ffff830839d3ec30 rcx: 0000000000000000 > (XEN) rdx: ffff830839dcff18 rsi: 000000000000000a rdi: ffff82c4802542e8 > (XEN) rbp: ffff830839dcfe38 rsp: ffff830839dcfde8 r8: 0000000000000004 > (XEN) r9: ffff82c480213520 r10: 00000000fffffffc r11: 0000000000000001 > (XEN) r12: 0000000000000004 r13: ffff830839d3ec40 r14: ffff831002ad5e40 > (XEN) r15: ffff830839d66f90 cr0: 000000008005003b cr4: 00000000000026f0 > (XEN) cr3: 0000001020a98000 cr2: 00007fc5e9b79d98 > (XEN) ds: 0000 es: 0000 fs: 0000 gs: 0000 ss: e010 cs: e008 > (XEN) Xen stack trace from rsp=ffff830839dcfde8: > (XEN) ffff83083ffa3ba0 ffff831002ad5e40 0000000000000246 ffff830839d6c000 > (XEN) 0000000000000000 ffff830839dd1100 0000000000000004 ffff82c480119651 > (XEN) ffff831002b28018 ffff831002b28010 ffff830839dcfe68 ffff82c480126204 > (XEN) 0000000000000002 ffff83083ffa3bb8 ffff830839dd1100 000000cae439ea7e > (XEN) ffff830839dcfeb8 ffff82c480126539 00007fc5e9fa5b20 ffff830839dd1100 > (XEN) ffff831002b28010 0000000000000004 0000000000000004 ffff82c4802b0880 > (XEN) ffff830839dcff18 ffffffffffffffff ffff830839dcfef8 ffff82c480123647 > (XEN) ffff830839dcfed8 ffff830077eee000 00007fc5e9b79d98 00007fc5e9fa5b20 > (XEN) 0000000000000002 00007fff46826f20 ffff830839dcff08 ffff82c4801236c2 > (XEN) 00007cf7c62300c7 
ffff82c480206ad6 00007fff46826f20 0000000000000002 > (XEN) 00007fc5e9fa5b20 00007fc5e9b79d98 00007fff46827260 00007fff46826f50 > (XEN) 0000000000000246 0000000000000032 0000000000000000 00000000ffffffff > (XEN) 0000000000000009 00007fc5e9d9de1a 0000000000000003 0000000000004848 > (XEN) 00007fc5e9b7a000 0000010000000000 ffffffff800073f0 000000000000e033 > (XEN) 0000000000000246 ffff880f97b51fc8 000000000000e02b 0000000000000000 > (XEN) 0000000000000000 0000000000000000 0000000000000000 0000000000000004 > (XEN) ffff830077eee000 00000043b9afd180 0000000000000000 > (XEN) Xen call trace: > (XEN) [<ffff82c4801197d7>] csched_tick+0x186/0x37f > (XEN) [<ffff82c480126204>] execute_timer+0x4e/0x6c > (XEN) [<ffff82c480126539>] timer_softirq_action+0xf6/0x239 > (XEN) [<ffff82c480123647>] __do_softirq+0x88/0x99 > (XEN) [<ffff82c4801236c2>] do_softirq+0x6a/0x7a > (XEN) > (XEN) > (XEN) **************************************** > (XEN) Panic on CPU 4: > (XEN) Xen BUG at sched_credit.c:570 > (XEN) **************************************** > > As you can see, a Dom0 vcpus is becoming active on a pool 1 cpu. The BUG_ON > triggered in csched_acct() is a logical result of this. > > How this can happen I don''t know yet. > Anyone any idea? I''ll keep searching... > > > Juergen > > On 02/15/11 08:22, Juergen Gross wrote: >> >> On 02/14/11 18:57, George Dunlap wrote: >>> >>> The good news is, I''ve managed to reproduce this on my local test >>> hardware with 1x4x2 (1 socket, 4 cores, 2 threads per core) using the >>> attached script. It''s time to go home now, but I should be able to >>> dig something up tomorrow. >>> >>> To use the script: >>> * Rename cpupool0 to "p0", and create an empty second pool, "p1" >>> * You can modify elements by adding "arg=val" as arguments. >>> * Arguments are: >>> + dryrun={true,false} Do the work, but don''t actually execute any xl >>> arguments. Default false. >>> + left: Number commands to execute. Default 10. >>> + maxcpus: highest numerical value for a cpu. Default 7 (i.e., 0-7 is >>> 8 cpus). >>> + verbose={true,false} Print what you''re doing. Default is true. >>> >>> The script sometimes attempts to remove the last cpu from cpupool0; in >>> this case, libxl will print an error. If the script gets an error >>> under that condition, it will ignore it; under any other condition, it >>> will print diagnostic information. >>> >>> What finally crashed it for me was this command: >>> # ./cpupool-test.sh verbose=false left=1000 >> >> Nice! >> With your script I finally managed to get the error, too. On my box (2 >> sockets >> a 6 cores) I had to use >> >> ./cpupool-test.sh verbose=false left=10000 maxcpus=11 >> >> to trigger it. >> Looking for more data now... >> >> >> Juergen >> >>> >>> -George >>> >>> On Fri, Feb 11, 2011 at 7:39 AM, Andre >>> Przywara<andre.przywara@amd.com> wrote: >>>> >>>> Juergen Gross wrote: >>>>> >>>>> On 02/10/11 15:18, Andre Przywara wrote: >>>>>> >>>>>> Andre Przywara wrote: >>>>>>> >>>>>>> On 02/10/2011 07:42 AM, Juergen Gross wrote: >>>>>>>> >>>>>>>> On 02/09/11 15:21, Juergen Gross wrote: >>>>>>>>> >>>>>>>>> Andre, George, >>>>>>>>> >>>>>>>>> >>>>>>>>> What seems to be interesting: I think the problem did always occur >>>>>>>>> when >>>>>>>>> a new cpupool was created and the first cpu was moved to it. >>>>>>>>> >>>>>>>>> I think my previous assumption regarding the master_ticker was not >>>>>>>>> too bad. 
>>>>>>>>> I think somehow the master_ticker of the new cpupool is becoming >>>>>>>>> active >>>>>>>>> before the scheduler is really initialized properly. This could >>>>>>>>> happen, if >>>>>>>>> enough time is spent between alloc_pdata for the cpu to be moved >>>>>>>>> and >>>>>>>>> the >>>>>>>>> critical section in schedule_cpu_switch(). >>>>>>>>> >>>>>>>>> The solution should be to activate the timers only if the >>>>>>>>> scheduler is >>>>>>>>> ready for them. >>>>>>>>> >>>>>>>>> George, do you think the master_ticker should be stopped in >>>>>>>>> suspend_ticker >>>>>>>>> as well? I still see potential problems for entering deep C-States. >>>>>>>>> I think >>>>>>>>> I''ll prepare a patch which will keep the master_ticker active >>>>>>>>> for the >>>>>>>>> C-State case and migrate it for the schedule_cpu_switch() case. >>>>>>>> >>>>>>>> Okay, here is a patch for this. It ran on my 4-core machine >>>>>>>> without any >>>>>>>> problems. >>>>>>>> Andre, could you give it a try? >>>>>>> >>>>>>> Did, but unfortunately it crashed as always. Tried twice and made >>>>>>> sure >>>>>>> I booted the right kernel. Sorry. >>>>>>> The idea with the race between the timer and the state changing >>>>>>> sounded very appealing, actually that was suspicious to me from the >>>>>>> beginning. >>>>>>> >>>>>>> I will add some code to dump the state of all cpupools to the BUG_ON >>>>>>> to see in which situation we are when the bug triggers. >>>>>> >>>>>> OK, here is a first try of this, the patch iterates over all CPU pools >>>>>> and outputs some data if the BUG_ON >>>>>> ((sdom->weight * sdom->active_vcpu_count)> weight_left) condition >>>>>> triggers: >>>>>> (XEN) CPU pool #0: 1 domains (SMP Credit Scheduler), mask: >>>>>> fffffffc003f >>>>>> (XEN) CPU pool #1: 0 domains (SMP Credit Scheduler), mask: fc0 >>>>>> (XEN) CPU pool #2: 0 domains (SMP Credit Scheduler), mask: 1000 >>>>>> (XEN) Xen BUG at sched_credit.c:1010 >>>>>> .... >>>>>> The masks look proper (6 cores per node), the bug triggers when the >>>>>> first CPU is about to be(?) inserted. >>>>> >>>>> Sure? I''m missing the cpu with mask 2000. >>>>> I''ll try to reproduce the problem on a larger machine here (24 cores, 4 >>>>> numa >>>>> nodes). >>>>> Andre, can you give me your xen boot parameters? Which xen changeset >>>>> are >>>>> you >>>>> running, and do you have any additional patches in use? >>>> >>>> The grub lines: >>>> kernel (hd1,0)/boot/xen-22858_debug_04.gz console=com1,vga com1=115200 >>>> module (hd1,0)/boot/vmlinuz-2.6.32.27_pvops console=tty0 >>>> console=ttyS0,115200 ro root=/dev/sdb1 xencons=hvc0 >>>> >>>> All of my experiments are use c/s 22858 as a base. >>>> If you use a AMD Magny-Cours box for your experiments (socket C32 or >>>> G34), >>>> you should add the following patch (removing the line) >>>> --- a/xen/arch/x86/traps.c >>>> +++ b/xen/arch/x86/traps.c >>>> @@ -803,7 +803,6 @@ static void pv_cpuid(struct cpu_user_regs *regs) >>>> __clear_bit(X86_FEATURE_SKINIT % 32,&c); >>>> __clear_bit(X86_FEATURE_WDT % 32,&c); >>>> __clear_bit(X86_FEATURE_LWP % 32,&c); >>>> - __clear_bit(X86_FEATURE_NODEID_MSR % 32,&c); >>>> __clear_bit(X86_FEATURE_TOPOEXT % 32,&c); >>>> break; >>>> case 5: /* MONITOR/MWAIT */ >>>> >>>> This is not necessary (in fact that reverts my patch c/s 22815), but >>>> raises >>>> the probability to trigger the bug, probably because it increases the >>>> pressure of the Dom0 scheduler. If you cannot trigger it with Dom0, >>>> try to >>>> create a guest with many VCPUs and squeeze it into a small CPU-pool. 
>>>> >>>> Good luck ;-) >>>> Andre. >>>> >>>> -- >>>> Andre Przywara >>>> AMD-OSRC (Dresden) >>>> Tel: x29712 >>>> >>>> >>>> _______________________________________________ >>>> Xen-devel mailing list >>>> Xen-devel@lists.xensource.com >>>> http://lists.xensource.com/xen-devel >>>> >>>> >>>> >>>> _______________________________________________ >>>> Xen-devel mailing list >>>> Xen-devel@lists.xensource.com >>>> http://lists.xensource.com/xen-devel >> >> > > > -- > Juergen Gross Principal Developer Operating Systems > TSP ES&S SWE OS6 Telephone: +49 (0) 89 3222 2967 > Fujitsu Technology Solutions e-mail: > juergen.gross@ts.fujitsu.com > Domagkstr. 28 Internet: ts.fujitsu.com > D-80807 Muenchen Company details: > ts.fujitsu.com/imprint.html > > _______________________________________________ > Xen-devel mailing list > Xen-devel@lists.xensource.com > http://lists.xensource.com/xen-devel >_______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
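As a rough illustration of the last bullet point above (this is not the attached patch): credit's tick_suspend would stop the pool's master ticker along with the per-cpu ticker when the last cpu leaves, with tick_resume doing the reverse. Names such as prv->ncpus and prv->master_ticker are assumptions about the credit scheduler's private data, and the deep C-state concern raised earlier in the thread applies, since tick_suspend is used on that path too.

/* Sketch only: make the credit scheduler's tick_suspend cover the per-pool
 * master (accounting) ticker in addition to the per-cpu ticker, so that
 * cpu_disable_scheduler() can quiesce all scheduler timers before touching
 * any scheduler state. */
static void csched_tick_suspend(const struct scheduler *ops, unsigned int cpu)
{
    struct csched_private *prv = CSCHED_PRIV(ops);
    struct csched_pcpu *spc = CSCHED_PCPU(cpu);

    stop_timer(&spc->ticker);

    /* Last cpu leaving this pool: park the accounting timer as well.
     * (For the C-state path the master ticker would have to stay alive,
     * or be handled separately, as discussed earlier in the thread.) */
    if ( prv->ncpus == 1 )
        stop_timer(&prv->master_ticker);
}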
Juergen Gross
2011-Feb-16 14:11 UTC
Re: [Xen-devel] Hypervisor crash(!) on xl cpupool-numa-split
On 02/16/11 14:54, George Dunlap wrote:> Andre (and Juergen), can you try again with the attached patch? > > What the patch basically does is try to make "cpu_disable_scheduler()" > do what it seems to say it does. :-) Namely, the various > scheduler-related interrutps (both per-cpu ticks and the master tick) > is a part of the scheduler, so disable them before doing anything, and > don''t enable them until the cpu is really ready to go again. > > To be precise: > * cpu_disable_scheduler() disables ticks > * scheduler_cpu_switch() only enables ticks if adding a cpu to a pool, > and does it after inserting the idle vcpu > * Modify semantics, s.t., {alloc,free}_pdata() don''t actually start or > stop tickers > + Call tick_{resume,suspend} in cpu_{up,down}, respectivelyI tried this before :-) It didn''t work for Andre, but may be there were some bits missing.> * Modify credit1''s tick_{suspend,resume} to handle the master ticker as well. > > With this patch (if dom0 doesn''t get wedged due to all 8 vcpus being > on one pcpu), I can perform thousands of operations successfully.Nice. I''ll try later. In the moment I''m testing another patch (attached for review, if you like). I think I''ve identified two possible races. Juergen> > (NB this is not ready for application yet, I just wanted to check to > see if it fixes Andre''s problem) > > -George > > On Wed, Feb 16, 2011 at 9:47 AM, Juergen Gross > <juergen.gross@ts.fujitsu.com> wrote: >> Okay, I have some more data. >> >> I activated cpupool_dprintk() and included checks in sched_credit.c to >> test for weight inconsistencies. To reduce race possibilities I''ve added >> my patch to execute cpu assigning/unassigning always in a tasklet on the >> cpu to be moved. >> >> Here is the result: >> >> (XEN) cpupool_unassign_cpu(pool=0,cpu=6) >> (XEN) cpupool_unassign_cpu(pool=0,cpu=6) ret -16 >> (XEN) cpupool_unassign_cpu(pool=0,cpu=6) >> (XEN) cpupool_unassign_cpu(pool=0,cpu=6) ret -16 >> (XEN) cpupool_assign_cpu(pool=0,cpu=1) >> (XEN) cpupool_assign_cpu(pool=0,cpu=1) ffff83083fff74c0 >> (XEN) cpupool_assign_cpu(cpu=1) ret 0 >> (XEN) cpupool_assign_cpu(pool=1,cpu=4) >> (XEN) cpupool_assign_cpu(pool=1,cpu=4) ffff831002ad5e40 >> (XEN) cpupool_assign_cpu(cpu=4) ret 0 >> (XEN) cpu 4, weight 0,prv ffff831002ad5e40, dom 0: >> (XEN) sdom->weight: 256, sdom->active_vcpu_count: 1 >> (XEN) Xen BUG at sched_credit.c:570 >> (XEN) ----[ Xen-4.1.0-rc5-pre x86_64 debug=y Tainted: C ]---- >> (XEN) CPU: 4 >> (XEN) RIP: e008:[<ffff82c4801197d7>] csched_tick+0x186/0x37f >> (XEN) RFLAGS: 0000000000010086 CONTEXT: hypervisor >> (XEN) rax: 0000000000000000 rbx: ffff830839d3ec30 rcx: 0000000000000000 >> (XEN) rdx: ffff830839dcff18 rsi: 000000000000000a rdi: ffff82c4802542e8 >> (XEN) rbp: ffff830839dcfe38 rsp: ffff830839dcfde8 r8: 0000000000000004 >> (XEN) r9: ffff82c480213520 r10: 00000000fffffffc r11: 0000000000000001 >> (XEN) r12: 0000000000000004 r13: ffff830839d3ec40 r14: ffff831002ad5e40 >> (XEN) r15: ffff830839d66f90 cr0: 000000008005003b cr4: 00000000000026f0 >> (XEN) cr3: 0000001020a98000 cr2: 00007fc5e9b79d98 >> (XEN) ds: 0000 es: 0000 fs: 0000 gs: 0000 ss: e010 cs: e008 >> (XEN) Xen stack trace from rsp=ffff830839dcfde8: >> (XEN) ffff83083ffa3ba0 ffff831002ad5e40 0000000000000246 ffff830839d6c000 >> (XEN) 0000000000000000 ffff830839dd1100 0000000000000004 ffff82c480119651 >> (XEN) ffff831002b28018 ffff831002b28010 ffff830839dcfe68 ffff82c480126204 >> (XEN) 0000000000000002 ffff83083ffa3bb8 ffff830839dd1100 000000cae439ea7e >> (XEN) ffff830839dcfeb8 ffff82c480126539 
00007fc5e9fa5b20 ffff830839dd1100 >> (XEN) ffff831002b28010 0000000000000004 0000000000000004 ffff82c4802b0880 >> (XEN) ffff830839dcff18 ffffffffffffffff ffff830839dcfef8 ffff82c480123647 >> (XEN) ffff830839dcfed8 ffff830077eee000 00007fc5e9b79d98 00007fc5e9fa5b20 >> (XEN) 0000000000000002 00007fff46826f20 ffff830839dcff08 ffff82c4801236c2 >> (XEN) 00007cf7c62300c7 ffff82c480206ad6 00007fff46826f20 0000000000000002 >> (XEN) 00007fc5e9fa5b20 00007fc5e9b79d98 00007fff46827260 00007fff46826f50 >> (XEN) 0000000000000246 0000000000000032 0000000000000000 00000000ffffffff >> (XEN) 0000000000000009 00007fc5e9d9de1a 0000000000000003 0000000000004848 >> (XEN) 00007fc5e9b7a000 0000010000000000 ffffffff800073f0 000000000000e033 >> (XEN) 0000000000000246 ffff880f97b51fc8 000000000000e02b 0000000000000000 >> (XEN) 0000000000000000 0000000000000000 0000000000000000 0000000000000004 >> (XEN) ffff830077eee000 00000043b9afd180 0000000000000000 >> (XEN) Xen call trace: >> (XEN) [<ffff82c4801197d7>] csched_tick+0x186/0x37f >> (XEN) [<ffff82c480126204>] execute_timer+0x4e/0x6c >> (XEN) [<ffff82c480126539>] timer_softirq_action+0xf6/0x239 >> (XEN) [<ffff82c480123647>] __do_softirq+0x88/0x99 >> (XEN) [<ffff82c4801236c2>] do_softirq+0x6a/0x7a >> (XEN) >> (XEN) >> (XEN) **************************************** >> (XEN) Panic on CPU 4: >> (XEN) Xen BUG at sched_credit.c:570 >> (XEN) **************************************** >> >> As you can see, a Dom0 vcpus is becoming active on a pool 1 cpu. The BUG_ON >> triggered in csched_acct() is a logical result of this. >> >> How this can happen I don''t know yet. >> Anyone any idea? I''ll keep searching... >> >> >> Juergen >> >> On 02/15/11 08:22, Juergen Gross wrote: >>> >>> On 02/14/11 18:57, George Dunlap wrote: >>>> >>>> The good news is, I''ve managed to reproduce this on my local test >>>> hardware with 1x4x2 (1 socket, 4 cores, 2 threads per core) using the >>>> attached script. It''s time to go home now, but I should be able to >>>> dig something up tomorrow. >>>> >>>> To use the script: >>>> * Rename cpupool0 to "p0", and create an empty second pool, "p1" >>>> * You can modify elements by adding "arg=val" as arguments. >>>> * Arguments are: >>>> + dryrun={true,false} Do the work, but don''t actually execute any xl >>>> arguments. Default false. >>>> + left: Number commands to execute. Default 10. >>>> + maxcpus: highest numerical value for a cpu. Default 7 (i.e., 0-7 is >>>> 8 cpus). >>>> + verbose={true,false} Print what you''re doing. Default is true. >>>> >>>> The script sometimes attempts to remove the last cpu from cpupool0; in >>>> this case, libxl will print an error. If the script gets an error >>>> under that condition, it will ignore it; under any other condition, it >>>> will print diagnostic information. >>>> >>>> What finally crashed it for me was this command: >>>> # ./cpupool-test.sh verbose=false left=1000 >>> >>> Nice! >>> With your script I finally managed to get the error, too. On my box (2 >>> sockets >>> a 6 cores) I had to use >>> >>> ./cpupool-test.sh verbose=false left=10000 maxcpus=11 >>> >>> to trigger it. >>> Looking for more data now... 
>>> >>> >>> Juergen >>> >>>> >>>> -George >>>> >>>> On Fri, Feb 11, 2011 at 7:39 AM, Andre >>>> Przywara<andre.przywara@amd.com> wrote: >>>>> >>>>> Juergen Gross wrote: >>>>>> >>>>>> On 02/10/11 15:18, Andre Przywara wrote: >>>>>>> >>>>>>> Andre Przywara wrote: >>>>>>>> >>>>>>>> On 02/10/2011 07:42 AM, Juergen Gross wrote: >>>>>>>>> >>>>>>>>> On 02/09/11 15:21, Juergen Gross wrote: >>>>>>>>>> >>>>>>>>>> Andre, George, >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> What seems to be interesting: I think the problem did always occur >>>>>>>>>> when >>>>>>>>>> a new cpupool was created and the first cpu was moved to it. >>>>>>>>>> >>>>>>>>>> I think my previous assumption regarding the master_ticker was not >>>>>>>>>> too bad. >>>>>>>>>> I think somehow the master_ticker of the new cpupool is becoming >>>>>>>>>> active >>>>>>>>>> before the scheduler is really initialized properly. This could >>>>>>>>>> happen, if >>>>>>>>>> enough time is spent between alloc_pdata for the cpu to be moved >>>>>>>>>> and >>>>>>>>>> the >>>>>>>>>> critical section in schedule_cpu_switch(). >>>>>>>>>> >>>>>>>>>> The solution should be to activate the timers only if the >>>>>>>>>> scheduler is >>>>>>>>>> ready for them. >>>>>>>>>> >>>>>>>>>> George, do you think the master_ticker should be stopped in >>>>>>>>>> suspend_ticker >>>>>>>>>> as well? I still see potential problems for entering deep C-States. >>>>>>>>>> I think >>>>>>>>>> I''ll prepare a patch which will keep the master_ticker active >>>>>>>>>> for the >>>>>>>>>> C-State case and migrate it for the schedule_cpu_switch() case. >>>>>>>>> >>>>>>>>> Okay, here is a patch for this. It ran on my 4-core machine >>>>>>>>> without any >>>>>>>>> problems. >>>>>>>>> Andre, could you give it a try? >>>>>>>> >>>>>>>> Did, but unfortunately it crashed as always. Tried twice and made >>>>>>>> sure >>>>>>>> I booted the right kernel. Sorry. >>>>>>>> The idea with the race between the timer and the state changing >>>>>>>> sounded very appealing, actually that was suspicious to me from the >>>>>>>> beginning. >>>>>>>> >>>>>>>> I will add some code to dump the state of all cpupools to the BUG_ON >>>>>>>> to see in which situation we are when the bug triggers. >>>>>>> >>>>>>> OK, here is a first try of this, the patch iterates over all CPU pools >>>>>>> and outputs some data if the BUG_ON >>>>>>> ((sdom->weight * sdom->active_vcpu_count)> weight_left) condition >>>>>>> triggers: >>>>>>> (XEN) CPU pool #0: 1 domains (SMP Credit Scheduler), mask: >>>>>>> fffffffc003f >>>>>>> (XEN) CPU pool #1: 0 domains (SMP Credit Scheduler), mask: fc0 >>>>>>> (XEN) CPU pool #2: 0 domains (SMP Credit Scheduler), mask: 1000 >>>>>>> (XEN) Xen BUG at sched_credit.c:1010 >>>>>>> .... >>>>>>> The masks look proper (6 cores per node), the bug triggers when the >>>>>>> first CPU is about to be(?) inserted. >>>>>> >>>>>> Sure? I''m missing the cpu with mask 2000. >>>>>> I''ll try to reproduce the problem on a larger machine here (24 cores, 4 >>>>>> numa >>>>>> nodes). >>>>>> Andre, can you give me your xen boot parameters? Which xen changeset >>>>>> are >>>>>> you >>>>>> running, and do you have any additional patches in use? >>>>> >>>>> The grub lines: >>>>> kernel (hd1,0)/boot/xen-22858_debug_04.gz console=com1,vga com1=115200 >>>>> module (hd1,0)/boot/vmlinuz-2.6.32.27_pvops console=tty0 >>>>> console=ttyS0,115200 ro root=/dev/sdb1 xencons=hvc0 >>>>> >>>>> All of my experiments are use c/s 22858 as a base. 
>>>>> If you use a AMD Magny-Cours box for your experiments (socket C32 or >>>>> G34), >>>>> you should add the following patch (removing the line) >>>>> --- a/xen/arch/x86/traps.c >>>>> +++ b/xen/arch/x86/traps.c >>>>> @@ -803,7 +803,6 @@ static void pv_cpuid(struct cpu_user_regs *regs) >>>>> __clear_bit(X86_FEATURE_SKINIT % 32,&c); >>>>> __clear_bit(X86_FEATURE_WDT % 32,&c); >>>>> __clear_bit(X86_FEATURE_LWP % 32,&c); >>>>> - __clear_bit(X86_FEATURE_NODEID_MSR % 32,&c); >>>>> __clear_bit(X86_FEATURE_TOPOEXT % 32,&c); >>>>> break; >>>>> case 5: /* MONITOR/MWAIT */ >>>>> >>>>> This is not necessary (in fact that reverts my patch c/s 22815), but >>>>> raises >>>>> the probability to trigger the bug, probably because it increases the >>>>> pressure of the Dom0 scheduler. If you cannot trigger it with Dom0, >>>>> try to >>>>> create a guest with many VCPUs and squeeze it into a small CPU-pool. >>>>> >>>>> Good luck ;-) >>>>> Andre. >>>>> >>>>> -- >>>>> Andre Przywara >>>>> AMD-OSRC (Dresden) >>>>> Tel: x29712 >>>>> >>>>> >>>>> _______________________________________________ >>>>> Xen-devel mailing list >>>>> Xen-devel@lists.xensource.com >>>>> http://lists.xensource.com/xen-devel >>>>> >>>>> >>>>> >>>>> _______________________________________________ >>>>> Xen-devel mailing list >>>>> Xen-devel@lists.xensource.com >>>>> http://lists.xensource.com/xen-devel >>> >>> >> >> >> -- >> Juergen Gross Principal Developer Operating Systems >> TSP ES&S SWE OS6 Telephone: +49 (0) 89 3222 2967 >> Fujitsu Technology Solutions e-mail: >> juergen.gross@ts.fujitsu.com >> Domagkstr. 28 Internet: ts.fujitsu.com >> D-80807 Muenchen Company details: >> ts.fujitsu.com/imprint.html >> >> _______________________________________________ >> Xen-devel mailing list >> Xen-devel@lists.xensource.com >> http://lists.xensource.com/xen-devel >> >> >> >> _______________________________________________ >> Xen-devel mailing list >> Xen-devel@lists.xensource.com >> http://lists.xensource.com/xen-devel-- Juergen Gross Principal Developer Operating Systems TSP ES&S SWE OS6 Telephone: +49 (0) 89 3222 2967 Fujitsu Technology Solutions e-mail: juergen.gross@ts.fujitsu.com Domagkstr. 28 Internet: ts.fujitsu.com D-80807 Muenchen Company details: ts.fujitsu.com/imprint.html _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
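The inconsistency Juergen reports above (prv weight 0, yet a domain with weight 256 and one
active vcpu on that cpu) is exactly what credit1's accounting check guards against: every vcpu
that becomes active adds its domain's weight to the owning pool's private total, and
csched_acct() later relies on no single domain contributing more than that total (the quoted
condition sdom->weight * sdom->active_vcpu_count > weight_left). Below is a minimal,
self-contained toy model of that bookkeeping -- invented structure and function names, not the
real sched_credit.c -- showing how a vcpu accounted in one pool but checked against another
pool's private data trips the check:

#include <assert.h>
#include <stdio.h>

/* Toy stand-ins for csched_private / csched_dom; only the fields that
 * appear in the trace above are modelled. */
struct pool_priv { unsigned int total_weight; };          /* prv->weight */
struct dom_acct  { unsigned int weight, active_vcpus; };  /* sdom        */

static void vcpu_goes_active(struct pool_priv *pool, struct dom_acct *dom)
{
    /* credit1 makes the pool total "per-vcpu": each vcpu of the domain
     * that becomes active adds the domain weight to the pool total. */
    dom->active_vcpus++;
    pool->total_weight += dom->weight;
}

static void check_invariant(const struct pool_priv *pool,
                            const struct dom_acct *dom)
{
    /* Simplified analogue of the quoted BUG_ON condition: one domain
     * may never account for more weight than the whole pool has. */
    assert(dom->weight * dom->active_vcpus <= pool->total_weight);
}

int main(void)
{
    struct pool_priv pool0 = { 0 }, pool1 = { 0 };
    struct dom_acct  dom0  = { .weight = 256, .active_vcpus = 0 };

    vcpu_goes_active(&pool0, &dom0);    /* accounted in pool 0          */
    check_invariant(&pool0, &dom0);     /* fine: 256 * 1 <= 256         */

    /* The reported situation: the same vcpu shows up on a cpu whose
     * private data belongs to pool 1 (total_weight still 0), so
     * 256 * 1 > 0 and the assertion fires -- the analogue of the BUG_ON. */
    check_invariant(&pool1, &dom0);

    puts("not reached");
    return 0;
}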
Juergen Gross
2011-Feb-16 14:28 UTC
Re: [Xen-devel] Hypervisor crash(!) on xl cpupool-numa-split
On 02/16/11 15:11, Juergen Gross wrote:
> On 02/16/11 14:54, George Dunlap wrote:
>> Andre (and Juergen), can you try again with the attached patch?
>>
>> What the patch basically does is try to make "cpu_disable_scheduler()"
>> do what it seems to say it does. :-) Namely, the various
>> scheduler-related interrupts (both per-cpu ticks and the master tick)
>> are a part of the scheduler, so disable them before doing anything, and
>> don't enable them until the cpu is really ready to go again.
>>
>> To be precise:
>> * cpu_disable_scheduler() disables ticks
>> * scheduler_cpu_switch() only enables ticks if adding a cpu to a pool,
>> and does it after inserting the idle vcpu
>> * Modify semantics, s.t., {alloc,free}_pdata() don't actually start or
>> stop tickers
>> + Call tick_{resume,suspend} in cpu_{up,down}, respectively
>
> I tried this before :-)
> It didn't work for Andre, but maybe there were some bits missing.
>
>> * Modify credit1's tick_{suspend,resume} to handle the master ticker
>> as well.
>>
>> With this patch (if dom0 doesn't get wedged due to all 8 vcpus being
>> on one pcpu), I can perform thousands of operations successfully.
>
> Nice. I'll try later. At the moment I'm testing another patch (attached
> for review, if you like). I think I've identified two possible races.

My patch works for me. I think I have to rework the locking for credit1,
but that shouldn't be too hard.
My machine survived 10000 iterations of your script with additional
consistency checks in the scheduler. Without my patch the machine crashed
after less than 500 iterations.


Juergen

--
Juergen Gross                  Principal Developer Operating Systems
TSP ES&S SWE OS6               Telephone: +49 (0) 89 3222 2967
Fujitsu Technology Solutions   e-mail: juergen.gross@ts.fujitsu.com
Domagkstr. 28                  Internet: ts.fujitsu.com
D-80807 Muenchen               Company details: ts.fujitsu.com/imprint.html

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel
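The bullet list quoted above amounts to an ordering rule: a cpu's scheduler ticks may only run
while the cpu's per-pool scheduler data is fully set up, so ticks are suspended before a cpu is
torn down and resumed only after the new pool's data (and idle vcpu) are in place. A small
standalone sketch of that rule -- plain C with invented names, sequential instead of
timer-driven, so only the ordering is modelled, not the real schedule_cpu_switch():

#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Invented stand-ins: one pool owns per-cpu scheduler data, and a tick
 * that is only allowed to run while that data is valid. */
struct pcpu_sched_data { int pool_id; };

struct cpu_state {
    struct pcpu_sched_data *pdata; /* alloc_pdata()/free_pdata() result  */
    int tick_enabled;              /* tick_resume()/tick_suspend() state */
};

static void tick_handler(struct cpu_state *cpu)
{
    /* The real csched_tick() walks per-cpu/per-pool structures; a tick
     * that fires while pdata is gone or half-switched is exactly the
     * kind of inconsistency the BUG_ONs in this thread catch. */
    assert(cpu->pdata != NULL);
    printf("tick on pool %d\n", cpu->pdata->pool_id);
}

/* Removing a cpu from a pool: suspend the tick *first*. */
static void cpu_leave_pool(struct cpu_state *cpu)
{
    cpu->tick_enabled = 0;   /* tick_suspend() */
    cpu->pdata = NULL;       /* free_pdata()   */
}

/* Adding a cpu to a pool: resume the tick *last*. */
static void cpu_join_pool(struct cpu_state *cpu, struct pcpu_sched_data *pd)
{
    cpu->pdata = pd;         /* alloc_pdata(), insert idle vcpu, ... */
    cpu->tick_enabled = 1;   /* tick_resume() */
}

int main(void)
{
    struct pcpu_sched_data pool1_data = { .pool_id = 1 };
    struct cpu_state cpu = { .pdata = NULL, .tick_enabled = 0 };

    cpu_join_pool(&cpu, &pool1_data);
    if ( cpu.tick_enabled )
        tick_handler(&cpu);  /* safe: pdata was set before the resume  */

    cpu_leave_pool(&cpu);
    if ( cpu.tick_enabled )
        tick_handler(&cpu);  /* never runs: tick was suspended first   */

    return 0;
}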
André Przywara
2011-Feb-17 00:05 UTC
Re: [Xen-devel] Hypervisor crash(!) on xl cpupool-numa-split
Am 16.02.2011 15:11, schrieb Juergen Gross:> On 02/16/11 14:54, George Dunlap wrote: >> Andre (and Juergen), can you try again with the attached patch?George, Juergen, thanks for all your work on this! I will try the patch as soon as I am back in the office today afternoon. Regards, Andre.>> >> What the patch basically does is try to make "cpu_disable_scheduler()" >> do what it seems to say it does. :-) Namely, the various >> scheduler-related interrutps (both per-cpu ticks and the master tick) >> is a part of the scheduler, so disable them before doing anything, and >> don''t enable them until the cpu is really ready to go again. >> >> To be precise: >> * cpu_disable_scheduler() disables ticks >> * scheduler_cpu_switch() only enables ticks if adding a cpu to a pool, >> and does it after inserting the idle vcpu >> * Modify semantics, s.t., {alloc,free}_pdata() don''t actually start or >> stop tickers >> + Call tick_{resume,suspend} in cpu_{up,down}, respectively > > I tried this before :-) > It didn''t work for Andre, but may be there were some bits missing. > >> * Modify credit1''s tick_{suspend,resume} to handle the master ticker as well. >> >> With this patch (if dom0 doesn''t get wedged due to all 8 vcpus being >> on one pcpu), I can perform thousands of operations successfully. > > Nice. I''ll try later. In the moment I''m testing another patch (attached > for review, if you like). I think I''ve identified two possible races. > > > Juergen > >> >> (NB this is not ready for application yet, I just wanted to check to >> see if it fixes Andre''s problem) >> >> -George >> >> On Wed, Feb 16, 2011 at 9:47 AM, Juergen Gross >> <juergen.gross@ts.fujitsu.com> wrote: >>> Okay, I have some more data. >>> >>> I activated cpupool_dprintk() and included checks in sched_credit.c to >>> test for weight inconsistencies. To reduce race possibilities I''ve added >>> my patch to execute cpu assigning/unassigning always in a tasklet on the >>> cpu to be moved. 
>>> >>> Here is the result: >>> >>> (XEN) cpupool_unassign_cpu(pool=0,cpu=6) >>> (XEN) cpupool_unassign_cpu(pool=0,cpu=6) ret -16 >>> (XEN) cpupool_unassign_cpu(pool=0,cpu=6) >>> (XEN) cpupool_unassign_cpu(pool=0,cpu=6) ret -16 >>> (XEN) cpupool_assign_cpu(pool=0,cpu=1) >>> (XEN) cpupool_assign_cpu(pool=0,cpu=1) ffff83083fff74c0 >>> (XEN) cpupool_assign_cpu(cpu=1) ret 0 >>> (XEN) cpupool_assign_cpu(pool=1,cpu=4) >>> (XEN) cpupool_assign_cpu(pool=1,cpu=4) ffff831002ad5e40 >>> (XEN) cpupool_assign_cpu(cpu=4) ret 0 >>> (XEN) cpu 4, weight 0,prv ffff831002ad5e40, dom 0: >>> (XEN) sdom->weight: 256, sdom->active_vcpu_count: 1 >>> (XEN) Xen BUG at sched_credit.c:570 >>> (XEN) ----[ Xen-4.1.0-rc5-pre x86_64 debug=y Tainted: C ]---- >>> (XEN) CPU: 4 >>> (XEN) RIP: e008:[<ffff82c4801197d7>] csched_tick+0x186/0x37f >>> (XEN) RFLAGS: 0000000000010086 CONTEXT: hypervisor >>> (XEN) rax: 0000000000000000 rbx: ffff830839d3ec30 rcx: 0000000000000000 >>> (XEN) rdx: ffff830839dcff18 rsi: 000000000000000a rdi: ffff82c4802542e8 >>> (XEN) rbp: ffff830839dcfe38 rsp: ffff830839dcfde8 r8: 0000000000000004 >>> (XEN) r9: ffff82c480213520 r10: 00000000fffffffc r11: 0000000000000001 >>> (XEN) r12: 0000000000000004 r13: ffff830839d3ec40 r14: ffff831002ad5e40 >>> (XEN) r15: ffff830839d66f90 cr0: 000000008005003b cr4: 00000000000026f0 >>> (XEN) cr3: 0000001020a98000 cr2: 00007fc5e9b79d98 >>> (XEN) ds: 0000 es: 0000 fs: 0000 gs: 0000 ss: e010 cs: e008 >>> (XEN) Xen stack trace from rsp=ffff830839dcfde8: >>> (XEN) ffff83083ffa3ba0 ffff831002ad5e40 0000000000000246 ffff830839d6c000 >>> (XEN) 0000000000000000 ffff830839dd1100 0000000000000004 ffff82c480119651 >>> (XEN) ffff831002b28018 ffff831002b28010 ffff830839dcfe68 ffff82c480126204 >>> (XEN) 0000000000000002 ffff83083ffa3bb8 ffff830839dd1100 000000cae439ea7e >>> (XEN) ffff830839dcfeb8 ffff82c480126539 00007fc5e9fa5b20 ffff830839dd1100 >>> (XEN) ffff831002b28010 0000000000000004 0000000000000004 ffff82c4802b0880 >>> (XEN) ffff830839dcff18 ffffffffffffffff ffff830839dcfef8 ffff82c480123647 >>> (XEN) ffff830839dcfed8 ffff830077eee000 00007fc5e9b79d98 00007fc5e9fa5b20 >>> (XEN) 0000000000000002 00007fff46826f20 ffff830839dcff08 ffff82c4801236c2 >>> (XEN) 00007cf7c62300c7 ffff82c480206ad6 00007fff46826f20 0000000000000002 >>> (XEN) 00007fc5e9fa5b20 00007fc5e9b79d98 00007fff46827260 00007fff46826f50 >>> (XEN) 0000000000000246 0000000000000032 0000000000000000 00000000ffffffff >>> (XEN) 0000000000000009 00007fc5e9d9de1a 0000000000000003 0000000000004848 >>> (XEN) 00007fc5e9b7a000 0000010000000000 ffffffff800073f0 000000000000e033 >>> (XEN) 0000000000000246 ffff880f97b51fc8 000000000000e02b 0000000000000000 >>> (XEN) 0000000000000000 0000000000000000 0000000000000000 0000000000000004 >>> (XEN) ffff830077eee000 00000043b9afd180 0000000000000000 >>> (XEN) Xen call trace: >>> (XEN) [<ffff82c4801197d7>] csched_tick+0x186/0x37f >>> (XEN) [<ffff82c480126204>] execute_timer+0x4e/0x6c >>> (XEN) [<ffff82c480126539>] timer_softirq_action+0xf6/0x239 >>> (XEN) [<ffff82c480123647>] __do_softirq+0x88/0x99 >>> (XEN) [<ffff82c4801236c2>] do_softirq+0x6a/0x7a >>> (XEN) >>> (XEN) >>> (XEN) **************************************** >>> (XEN) Panic on CPU 4: >>> (XEN) Xen BUG at sched_credit.c:570 >>> (XEN) **************************************** >>> >>> As you can see, a Dom0 vcpus is becoming active on a pool 1 cpu. The BUG_ON >>> triggered in csched_acct() is a logical result of this. >>> >>> How this can happen I don''t know yet. >>> Anyone any idea? I''ll keep searching... 
>>> >>> >>> Juergen >>> >>> On 02/15/11 08:22, Juergen Gross wrote: >>>> >>>> On 02/14/11 18:57, George Dunlap wrote: >>>>> >>>>> The good news is, I''ve managed to reproduce this on my local test >>>>> hardware with 1x4x2 (1 socket, 4 cores, 2 threads per core) using the >>>>> attached script. It''s time to go home now, but I should be able to >>>>> dig something up tomorrow. >>>>> >>>>> To use the script: >>>>> * Rename cpupool0 to "p0", and create an empty second pool, "p1" >>>>> * You can modify elements by adding "arg=val" as arguments. >>>>> * Arguments are: >>>>> + dryrun={true,false} Do the work, but don''t actually execute any xl >>>>> arguments. Default false. >>>>> + left: Number commands to execute. Default 10. >>>>> + maxcpus: highest numerical value for a cpu. Default 7 (i.e., 0-7 is >>>>> 8 cpus). >>>>> + verbose={true,false} Print what you''re doing. Default is true. >>>>> >>>>> The script sometimes attempts to remove the last cpu from cpupool0; in >>>>> this case, libxl will print an error. If the script gets an error >>>>> under that condition, it will ignore it; under any other condition, it >>>>> will print diagnostic information. >>>>> >>>>> What finally crashed it for me was this command: >>>>> # ./cpupool-test.sh verbose=false left=1000 >>>> >>>> Nice! >>>> With your script I finally managed to get the error, too. On my box (2 >>>> sockets >>>> a 6 cores) I had to use >>>> >>>> ./cpupool-test.sh verbose=false left=10000 maxcpus=11 >>>> >>>> to trigger it. >>>> Looking for more data now... >>>> >>>> >>>> Juergen >>>> >>>>> >>>>> -George >>>>> >>>>> On Fri, Feb 11, 2011 at 7:39 AM, Andre >>>>> Przywara<andre.przywara@amd.com> wrote: >>>>>> >>>>>> Juergen Gross wrote: >>>>>>> >>>>>>> On 02/10/11 15:18, Andre Przywara wrote: >>>>>>>> >>>>>>>> Andre Przywara wrote: >>>>>>>>> >>>>>>>>> On 02/10/2011 07:42 AM, Juergen Gross wrote: >>>>>>>>>> >>>>>>>>>> On 02/09/11 15:21, Juergen Gross wrote: >>>>>>>>>>> >>>>>>>>>>> Andre, George, >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> What seems to be interesting: I think the problem did always occur >>>>>>>>>>> when >>>>>>>>>>> a new cpupool was created and the first cpu was moved to it. >>>>>>>>>>> >>>>>>>>>>> I think my previous assumption regarding the master_ticker was not >>>>>>>>>>> too bad. >>>>>>>>>>> I think somehow the master_ticker of the new cpupool is becoming >>>>>>>>>>> active >>>>>>>>>>> before the scheduler is really initialized properly. This could >>>>>>>>>>> happen, if >>>>>>>>>>> enough time is spent between alloc_pdata for the cpu to be moved >>>>>>>>>>> and >>>>>>>>>>> the >>>>>>>>>>> critical section in schedule_cpu_switch(). >>>>>>>>>>> >>>>>>>>>>> The solution should be to activate the timers only if the >>>>>>>>>>> scheduler is >>>>>>>>>>> ready for them. >>>>>>>>>>> >>>>>>>>>>> George, do you think the master_ticker should be stopped in >>>>>>>>>>> suspend_ticker >>>>>>>>>>> as well? I still see potential problems for entering deep C-States. >>>>>>>>>>> I think >>>>>>>>>>> I''ll prepare a patch which will keep the master_ticker active >>>>>>>>>>> for the >>>>>>>>>>> C-State case and migrate it for the schedule_cpu_switch() case. >>>>>>>>>> >>>>>>>>>> Okay, here is a patch for this. It ran on my 4-core machine >>>>>>>>>> without any >>>>>>>>>> problems. >>>>>>>>>> Andre, could you give it a try? >>>>>>>>> >>>>>>>>> Did, but unfortunately it crashed as always. Tried twice and made >>>>>>>>> sure >>>>>>>>> I booted the right kernel. Sorry. 
>>>>>>>>> The idea with the race between the timer and the state changing >>>>>>>>> sounded very appealing, actually that was suspicious to me from the >>>>>>>>> beginning. >>>>>>>>> >>>>>>>>> I will add some code to dump the state of all cpupools to the BUG_ON >>>>>>>>> to see in which situation we are when the bug triggers. >>>>>>>> >>>>>>>> OK, here is a first try of this, the patch iterates over all CPU pools >>>>>>>> and outputs some data if the BUG_ON >>>>>>>> ((sdom->weight * sdom->active_vcpu_count)> weight_left) condition >>>>>>>> triggers: >>>>>>>> (XEN) CPU pool #0: 1 domains (SMP Credit Scheduler), mask: >>>>>>>> fffffffc003f >>>>>>>> (XEN) CPU pool #1: 0 domains (SMP Credit Scheduler), mask: fc0 >>>>>>>> (XEN) CPU pool #2: 0 domains (SMP Credit Scheduler), mask: 1000 >>>>>>>> (XEN) Xen BUG at sched_credit.c:1010 >>>>>>>> .... >>>>>>>> The masks look proper (6 cores per node), the bug triggers when the >>>>>>>> first CPU is about to be(?) inserted. >>>>>>> >>>>>>> Sure? I''m missing the cpu with mask 2000. >>>>>>> I''ll try to reproduce the problem on a larger machine here (24 cores, 4 >>>>>>> numa >>>>>>> nodes). >>>>>>> Andre, can you give me your xen boot parameters? Which xen changeset >>>>>>> are >>>>>>> you >>>>>>> running, and do you have any additional patches in use? >>>>>> >>>>>> The grub lines: >>>>>> kernel (hd1,0)/boot/xen-22858_debug_04.gz console=com1,vga com1=115200 >>>>>> module (hd1,0)/boot/vmlinuz-2.6.32.27_pvops console=tty0 >>>>>> console=ttyS0,115200 ro root=/dev/sdb1 xencons=hvc0 >>>>>> >>>>>> All of my experiments are use c/s 22858 as a base. >>>>>> If you use a AMD Magny-Cours box for your experiments (socket C32 or >>>>>> G34), >>>>>> you should add the following patch (removing the line) >>>>>> --- a/xen/arch/x86/traps.c >>>>>> +++ b/xen/arch/x86/traps.c >>>>>> @@ -803,7 +803,6 @@ static void pv_cpuid(struct cpu_user_regs *regs) >>>>>> __clear_bit(X86_FEATURE_SKINIT % 32,&c); >>>>>> __clear_bit(X86_FEATURE_WDT % 32,&c); >>>>>> __clear_bit(X86_FEATURE_LWP % 32,&c); >>>>>> - __clear_bit(X86_FEATURE_NODEID_MSR % 32,&c); >>>>>> __clear_bit(X86_FEATURE_TOPOEXT % 32,&c); >>>>>> break; >>>>>> case 5: /* MONITOR/MWAIT */ >>>>>> >>>>>> This is not necessary (in fact that reverts my patch c/s 22815), but >>>>>> raises >>>>>> the probability to trigger the bug, probably because it increases the >>>>>> pressure of the Dom0 scheduler. If you cannot trigger it with Dom0, >>>>>> try to >>>>>> create a guest with many VCPUs and squeeze it into a small CPU-pool. >>>>>> >>>>>> Good luck ;-) >>>>>> Andre. >>>>>> >>>>>> -- >>>>>> Andre Przywara >>>>>> AMD-OSRC (Dresden) >>>>>> Tel: x29712 >>>>>> >>>>>> >>>>>> _______________________________________________ >>>>>> Xen-devel mailing list >>>>>> Xen-devel@lists.xensource.com >>>>>> http://lists.xensource.com/xen-devel >>>>>> >>>>>> >>>>>> >>>>>> _______________________________________________ >>>>>> Xen-devel mailing list >>>>>> Xen-devel@lists.xensource.com >>>>>> http://lists.xensource.com/xen-devel >>>> >>>> >>> >>> >>> -- >>> Juergen Gross Principal Developer Operating Systems >>> TSP ES&S SWE OS6 Telephone: +49 (0) 89 3222 2967 >>> Fujitsu Technology Solutions e-mail: >>> juergen.gross@ts.fujitsu.com >>> Domagkstr. 
28 Internet: ts.fujitsu.com >>> D-80807 Muenchen Company details: >>> ts.fujitsu.com/imprint.html >>> >>> _______________________________________________ >>> Xen-devel mailing list >>> Xen-devel@lists.xensource.com >>> http://lists.xensource.com/xen-devel >>> >>> >>> >>> _______________________________________________ >>> Xen-devel mailing list >>> Xen-devel@lists.xensource.com >>> http://lists.xensource.com/xen-devel > > > -- > Juergen Gross Principal Developer Operating Systems > TSP ES&S SWE OS6 Telephone: +49 (0) 89 3222 2967 > Fujitsu Technology Solutions e-mail: juergen.gross@ts.fujitsu.com > Domagkstr. 28 Internet: ts.fujitsu.com > D-80807 Muenchen Company details: ts.fujitsu.com/imprint.html-- Andre Przywara AMD-Operating System Research Center (OSRC), Dresden, Germany _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Juergen Gross
2011-Feb-17 07:05 UTC
Re: [Xen-devel] Hypervisor crash(!) on xl cpupool-numa-split
On 02/16/11 14:54, George Dunlap wrote:> Andre (and Juergen), can you try again with the attached patch? > > What the patch basically does is try to make "cpu_disable_scheduler()" > do what it seems to say it does. :-) Namely, the various > scheduler-related interrutps (both per-cpu ticks and the master tick) > is a part of the scheduler, so disable them before doing anything, and > don''t enable them until the cpu is really ready to go again. > > To be precise: > * cpu_disable_scheduler() disables ticks > * scheduler_cpu_switch() only enables ticks if adding a cpu to a pool, > and does it after inserting the idle vcpu > * Modify semantics, s.t., {alloc,free}_pdata() don''t actually start or > stop tickers > + Call tick_{resume,suspend} in cpu_{up,down}, respectively > * Modify credit1''s tick_{suspend,resume} to handle the master ticker as well. > > With this patch (if dom0 doesn''t get wedged due to all 8 vcpus being > on one pcpu), I can perform thousands of operations successfully. > > (NB this is not ready for application yet, I just wanted to check to > see if it fixes Andre''s problem)After some thousand iterations the machine hang and after dumping Dom0 registers to console it continued running and crashed about a second later: (XEN) cpupool_unassign_cpu(pool=0,cpu=9) (XEN) cpupool_unassign_cpu(pool=0,cpu=9) ffff83083fff74c0 (XEN) cpupool_unassign_cpu ret=0 (XEN) cpupool_unassign_cpu(pool=0,cpu=4) (XEN) cpupool_unassign_cpu(pool=0,cpu=4) ffff83083fff74c0 (XEN) cpupool_unassign_cpu ret=0 (XEN) cpupool_assign_cpu(pool=1,cpu=9) (XEN) cpupool_assign_cpu(pool=1,cpu=9) ffff83083002de40 (XEN) Assertion ''timer->status >= TIMER_STATUS_inactive'' failed at timer.c:279 (XEN) ----[ Xen-4.1.0-rc5-pre x86_64 debug=y Tainted: C ]---- (XEN) CPU: 9 (XEN) RIP: e008:[<ffff82c480126100>] active_timer+0xc/0x37 (XEN) RFLAGS: 0000000000010046 CONTEXT: hypervisor (XEN) rax: 0000000000000000 rbx: 0000000000000000 rcx: 0000000000000000 (XEN) rdx: ffff830839d8ff18 rsi: 0000010dbb628a80 rdi: ffff83083ffbcf98 (XEN) rbp: ffff830839d8fd50 rsp: ffff830839d8fd50 r8: ffff83083ffbcf90 (XEN) r9: ffff82c480213680 r10: 00000000ffffffff r11: 0000000000000010 (XEN) r12: ffff82c4802d3f80 r13: ffff82c4802d3f80 r14: ffff83083ffbcf98 (XEN) r15: ffff83083ffbcfc0 cr0: 000000008005003b cr4: 00000000000026f0 (XEN) cr3: 000000007809c000 cr2: 0000000000620048 (XEN) ds: 002b es: 002b fs: 0000 gs: 0000 ss: e010 cs: e008 (XEN) Xen stack trace from rsp=ffff830839d8fd50: (XEN) ffff830839d8fda0 ffff82c480126ef9 0000000000000000 0000010dbb628a80 (XEN) 0000000000000086 0000000000000009 ffff83083002de40 ffff83083002dd50 (XEN) 0000000000000009 0000000000000009 ffff830839d8fdc0 ffff82c480117906 (XEN) ffff83083ffa3b40 ffff83083ffa5d70 ffff830839d8fe30 ffff82c4801214fa (XEN) ffff83083002dd00 0000000900000100 0000000000000286 ffff8300780da000 (XEN) ffff83083ffbcf80 ffff83083ffbcf90 ffff82c480247e00 0000000000000009 (XEN) 00000000fffffff0 ffff83083002dd00 0000000000000000 ffff8300781cc198 (XEN) ffff830839d8fe60 ffff82c4801019ff 0000000000000009 0000000000000009 (XEN) ffff8300781cc198 ffff830839d990d0 ffff830839d8fe80 ffff82c480101bd9 (XEN) ffff83107e80c5b0 ffff8300781cc000 ffff830839d8fea0 ffff82c480104f21 (XEN) 0000000000000009 ffff830839d990e0 ffff830839d8fee0 ffff82c480125b6c (XEN) ffff82c48024a020 ffff830839d8ff18 ffff82c48024a020 ffff830839d8ff18 (XEN) ffff830839d99060 ffff830839d99040 ffff830839d8ff10 ffff82c48015645a (XEN) 0000000000000000 ffff8300780da000 ffff8300780da000 ffffffffffffffff (XEN) ffff830839d8fe00 0000000000000000 
0000000000000000 0000000000000000 (XEN) 0000000000000000 ffffffff8062bda0 ffff880fbb1e5fd8 0000000000000246 (XEN) 0000000000000000 000000010003347d 0000000000000000 0000000000000000 (XEN) ffffffff800033aa 00000000deadbeef 00000000deadbeef 00000000deadbeef (XEN) 0000010000000000 ffffffff800033aa 000000000000e033 0000000000000246 (XEN) ffff880fbb1e5f08 000000000000e02b 0000000000000000 0000000000000000 (XEN) Xen call trace: (XEN) [<ffff82c480126100>] active_timer+0xc/0x37 (XEN) [<ffff82c480126ef9>] set_timer+0x102/0x218 (XEN) [<ffff82c480117906>] csched_tick_resume+0x53/0x75 (XEN) [<ffff82c4801214fa>] schedule_cpu_switch+0x1f1/0x25c (XEN) [<ffff82c4801019ff>] cpupool_assign_cpu_locked+0x61/0xd6 (XEN) [<ffff82c480101bd9>] cpupool_assign_cpu_helper+0x9f/0xcd (XEN) [<ffff82c480104f21>] continue_hypercall_tasklet_handler+0x51/0xc3 (XEN) [<ffff82c480125b6c>] do_tasklet+0xe1/0x155 (XEN) [<ffff82c48015645a>] idle_loop+0x5f/0x67 (XEN) (XEN) (XEN) **************************************** (XEN) Panic on CPU 9: (XEN) Assertion ''timer->status >= TIMER_STATUS_inactive'' failed at timer.c:279 (XEN) **************************************** Juergen -- Juergen Gross Principal Developer Operating Systems TSP ES&S SWE OS6 Telephone: +49 (0) 89 3222 2967 Fujitsu Technology Solutions e-mail: juergen.gross@ts.fujitsu.com Domagkstr. 28 Internet: ts.fujitsu.com D-80807 Muenchen Company details: ts.fujitsu.com/imprint.html _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
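The failed assertion means set_timer() was handed a timer whose status is still below
TIMER_STATUS_inactive, i.e. a timer that has not been initialised (or has already been killed);
in this trace it is the tick resumed by csched_tick_resume() from schedule_cpu_switch(),
apparently before the target pool's per-cpu tick timer has been set up. A toy model of that
lifecycle rule -- invented enum, struct and function names, not Xen's xen/common/timer.c:

#include <assert.h>
#include <stdio.h>

/* Toy timer lifecycle: a timer must be initialised before it may be armed. */
enum timer_status { TS_invalid = 0, TS_killed, TS_inactive, TS_active };

struct toy_timer {
    enum timer_status status;
    unsigned long expires;
};

static void toy_init_timer(struct toy_timer *t)
{
    t->status = TS_inactive;          /* init_timer(): now legal to arm */
}

static void toy_set_timer(struct toy_timer *t, unsigned long expires)
{
    /* Analogue of "ASSERT(timer->status >= TIMER_STATUS_inactive)":
     * arming a timer that was never initialised is a bug. */
    assert(t->status >= TS_inactive);
    t->expires = expires;
    t->status  = TS_active;
}

int main(void)
{
    struct toy_timer tick = { .status = TS_invalid };

    /* Correct order: initialise the per-cpu tick, then resume it. */
    toy_init_timer(&tick);
    toy_set_timer(&tick, 1000);
    printf("tick armed, expires=%lu\n", tick.expires);

    /* The crash above corresponds to skipping the init step: the new
     * pool's tick is resumed before its timer exists. */
    struct toy_timer uninitialised = { .status = TS_invalid };
    toy_set_timer(&uninitialised, 1000);   /* assertion fires here */

    return 0;
}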
Juergen Gross
2011-Feb-17 09:11 UTC
Re: [Xen-devel] Hypervisor crash(!) on xl cpupool-numa-split
On 02/17/11 08:05, Juergen Gross wrote:> On 02/16/11 14:54, George Dunlap wrote: >> Andre (and Juergen), can you try again with the attached patch? >> >> What the patch basically does is try to make "cpu_disable_scheduler()" >> do what it seems to say it does. :-) Namely, the various >> scheduler-related interrutps (both per-cpu ticks and the master tick) >> is a part of the scheduler, so disable them before doing anything, and >> don''t enable them until the cpu is really ready to go again. >> >> To be precise: >> * cpu_disable_scheduler() disables ticks >> * scheduler_cpu_switch() only enables ticks if adding a cpu to a pool, >> and does it after inserting the idle vcpu >> * Modify semantics, s.t., {alloc,free}_pdata() don''t actually start or >> stop tickers >> + Call tick_{resume,suspend} in cpu_{up,down}, respectively >> * Modify credit1''s tick_{suspend,resume} to handle the master ticker >> as well. >> >> With this patch (if dom0 doesn''t get wedged due to all 8 vcpus being >> on one pcpu), I can perform thousands of operations successfully. >> >> (NB this is not ready for application yet, I just wanted to check to >> see if it fixes Andre''s problem)Tried again, this time with the following patch: diff -r 72470de157ce xen/common/sched_credit.c --- a/xen/common/sched_credit.c Wed Feb 16 09:49:33 2011 +0000 +++ b/xen/common/sched_credit.c Wed Feb 16 15:09:54 2011 +0100 @@ -1268,7 +1268,8 @@ csched_load_balance(struct csched_privat /* * Any work over there to steal? */ - speer = csched_runq_steal(peer_cpu, cpu, snext->pri); + speer = cpu_isset(peer_cpu, *online) ? + csched_runq_steal(peer_cpu, cpu, snext->pri) : NULL; pcpu_schedule_unlock(peer_cpu); if ( speer != NULL ) { Worked without any flaw for 30000 iterations. Juergen> > After some thousand iterations the machine hang and after dumping Dom0 > registers to console it continued running and crashed about a second later: > > (XEN) cpupool_unassign_cpu(pool=0,cpu=9) > (XEN) cpupool_unassign_cpu(pool=0,cpu=9) ffff83083fff74c0 > (XEN) cpupool_unassign_cpu ret=0 > (XEN) cpupool_unassign_cpu(pool=0,cpu=4) > (XEN) cpupool_unassign_cpu(pool=0,cpu=4) ffff83083fff74c0 > (XEN) cpupool_unassign_cpu ret=0 > (XEN) cpupool_assign_cpu(pool=1,cpu=9) > (XEN) cpupool_assign_cpu(pool=1,cpu=9) ffff83083002de40 > (XEN) Assertion ''timer->status >= TIMER_STATUS_inactive'' failed at > timer.c:279 > (XEN) ----[ Xen-4.1.0-rc5-pre x86_64 debug=y Tainted: C ]---- > (XEN) CPU: 9 > (XEN) RIP: e008:[<ffff82c480126100>] active_timer+0xc/0x37 > (XEN) RFLAGS: 0000000000010046 CONTEXT: hypervisor > (XEN) rax: 0000000000000000 rbx: 0000000000000000 rcx: 0000000000000000 > (XEN) rdx: ffff830839d8ff18 rsi: 0000010dbb628a80 rdi: ffff83083ffbcf98 > (XEN) rbp: ffff830839d8fd50 rsp: ffff830839d8fd50 r8: ffff83083ffbcf90 > (XEN) r9: ffff82c480213680 r10: 00000000ffffffff r11: 0000000000000010 > (XEN) r12: ffff82c4802d3f80 r13: ffff82c4802d3f80 r14: ffff83083ffbcf98 > (XEN) r15: ffff83083ffbcfc0 cr0: 000000008005003b cr4: 00000000000026f0 > (XEN) cr3: 000000007809c000 cr2: 0000000000620048 > (XEN) ds: 002b es: 002b fs: 0000 gs: 0000 ss: e010 cs: e008 > (XEN) Xen stack trace from rsp=ffff830839d8fd50: > (XEN) ffff830839d8fda0 ffff82c480126ef9 0000000000000000 0000010dbb628a80 > (XEN) 0000000000000086 0000000000000009 ffff83083002de40 ffff83083002dd50 > (XEN) 0000000000000009 0000000000000009 ffff830839d8fdc0 ffff82c480117906 > (XEN) ffff83083ffa3b40 ffff83083ffa5d70 ffff830839d8fe30 ffff82c4801214fa > (XEN) ffff83083002dd00 0000000900000100 0000000000000286 ffff8300780da000 
> (XEN) ffff83083ffbcf80 ffff83083ffbcf90 ffff82c480247e00 0000000000000009 > (XEN) 00000000fffffff0 ffff83083002dd00 0000000000000000 ffff8300781cc198 > (XEN) ffff830839d8fe60 ffff82c4801019ff 0000000000000009 0000000000000009 > (XEN) ffff8300781cc198 ffff830839d990d0 ffff830839d8fe80 ffff82c480101bd9 > (XEN) ffff83107e80c5b0 ffff8300781cc000 ffff830839d8fea0 ffff82c480104f21 > (XEN) 0000000000000009 ffff830839d990e0 ffff830839d8fee0 ffff82c480125b6c > (XEN) ffff82c48024a020 ffff830839d8ff18 ffff82c48024a020 ffff830839d8ff18 > (XEN) ffff830839d99060 ffff830839d99040 ffff830839d8ff10 ffff82c48015645a > (XEN) 0000000000000000 ffff8300780da000 ffff8300780da000 ffffffffffffffff > (XEN) ffff830839d8fe00 0000000000000000 0000000000000000 0000000000000000 > (XEN) 0000000000000000 ffffffff8062bda0 ffff880fbb1e5fd8 0000000000000246 > (XEN) 0000000000000000 000000010003347d 0000000000000000 0000000000000000 > (XEN) ffffffff800033aa 00000000deadbeef 00000000deadbeef 00000000deadbeef > (XEN) 0000010000000000 ffffffff800033aa 000000000000e033 0000000000000246 > (XEN) ffff880fbb1e5f08 000000000000e02b 0000000000000000 0000000000000000 > (XEN) Xen call trace: > (XEN) [<ffff82c480126100>] active_timer+0xc/0x37 > (XEN) [<ffff82c480126ef9>] set_timer+0x102/0x218 > (XEN) [<ffff82c480117906>] csched_tick_resume+0x53/0x75 > (XEN) [<ffff82c4801214fa>] schedule_cpu_switch+0x1f1/0x25c > (XEN) [<ffff82c4801019ff>] cpupool_assign_cpu_locked+0x61/0xd6 > (XEN) [<ffff82c480101bd9>] cpupool_assign_cpu_helper+0x9f/0xcd > (XEN) [<ffff82c480104f21>] continue_hypercall_tasklet_handler+0x51/0xc3 > (XEN) [<ffff82c480125b6c>] do_tasklet+0xe1/0x155 > (XEN) [<ffff82c48015645a>] idle_loop+0x5f/0x67 > (XEN) > (XEN) > (XEN) **************************************** > (XEN) Panic on CPU 9: > (XEN) Assertion ''timer->status >= TIMER_STATUS_inactive'' failed at > timer.c:279 > (XEN) **************************************** > > > Juergen >-- Juergen Gross Principal Developer Operating Systems TSP ES&S SWE OS6 Telephone: +49 (0) 89 3222 2967 Fujitsu Technology Solutions e-mail: juergen.gross@ts.fujitsu.com Domagkstr. 28 Internet: ts.fujitsu.com D-80807 Muenchen Company details: ts.fujitsu.com/imprint.html _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
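The one-line change above re-checks, at the moment of stealing, that the peer cpu is still in
this scheduler instance's online mask: csched_load_balance() cycles over peer cpus looking for
runnable vcpus to pull, and while a cpu is being moved between pools it can still turn up as a
peer even though its runqueue no longer belongs to this pool -- pulling from it is one way a
Dom0 vcpu can surface on a pool-1 cpu. A self-contained toy version of that guard, with made-up
helpers and bitmasks standing in for cpu_isset()/csched_runq_steal():

#include <stdio.h>

#define NR_CPUS 8

/* Per-cpu runqueues exist independently of pools; each pool only owns a
 * subset of cpus, described by an online mask (toy model). One
 * "stealable" vcpu id per cpu, 0 meaning nothing runnable there. */
static int runq_vcpu[NR_CPUS];

static int cpu_in_mask(unsigned int mask, int cpu)
{
    return (mask >> cpu) & 1;          /* stand-in for cpu_isset() */
}

/* Stand-in for csched_runq_steal(): take whatever is queued on peer. */
static int runq_steal(int peer_cpu)
{
    int v = runq_vcpu[peer_cpu];
    runq_vcpu[peer_cpu] = 0;
    return v;
}

/* Load balancing for 'cpu', whose pool owns the cpus in 'online': only
 * steal from peers inside the same mask -- the point of the patch. */
static int load_balance(unsigned int online, int cpu)
{
    for ( int peer = 0; peer < NR_CPUS; peer++ )
    {
        if ( peer == cpu )
            continue;
        int v = cpu_in_mask(online, peer) ? runq_steal(peer) : 0;
        if ( v )
            return v;
    }
    return 0;
}

int main(void)
{
    unsigned int pool1_online = 0xf0;  /* pool 1 owns cpus 4-7 */

    runq_vcpu[1] = 1;   /* a Dom0 vcpu queued on cpu 1, which is in pool 0 */
    runq_vcpu[6] = 9;   /* a legitimate candidate inside pool 1            */

    /* cpu 4 (pool 1) balances: with the guard it takes vcpu 9 from cpu 6
     * and never pulls the Dom0 vcpu across the pool boundary. */
    printf("stolen vcpu: %d (expect 9, never 1)\n",
           load_balance(pool1_online, 4));
    return 0;
}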
Andre Przywara
2011-Feb-21 10:00 UTC
Re: [Xen-devel] Hypervisor crash(!) on xl cpupool-numa-split
George Dunlap wrote:> Andre (and Juergen), can you try again with the attached patch?I applied this patch on top of 22931 and it did _not_ work. The crash occurred almost immediately after I started my script, so the same behaviour as without the patch. (attached my script for reference, though it will most likely only make sense on bigger NUMA machines) Regards, Andre.> What the patch basically does is try to make "cpu_disable_scheduler()" > do what it seems to say it does. :-) Namely, the various > scheduler-related interrutps (both per-cpu ticks and the master tick) > is a part of the scheduler, so disable them before doing anything, and > don''t enable them until the cpu is really ready to go again. > > To be precise: > * cpu_disable_scheduler() disables ticks > * scheduler_cpu_switch() only enables ticks if adding a cpu to a pool, > and does it after inserting the idle vcpu > * Modify semantics, s.t., {alloc,free}_pdata() don''t actually start or > stop tickers > + Call tick_{resume,suspend} in cpu_{up,down}, respectively > * Modify credit1''s tick_{suspend,resume} to handle the master ticker as well. > > With this patch (if dom0 doesn''t get wedged due to all 8 vcpus being > on one pcpu), I can perform thousands of operations successfully. > > (NB this is not ready for application yet, I just wanted to check to > see if it fixes Andre''s problem) > > -George > > On Wed, Feb 16, 2011 at 9:47 AM, Juergen Gross > <juergen.gross@ts.fujitsu.com> wrote: >> Okay, I have some more data. >> >> I activated cpupool_dprintk() and included checks in sched_credit.c to >> test for weight inconsistencies. To reduce race possibilities I''ve added >> my patch to execute cpu assigning/unassigning always in a tasklet on the >> cpu to be moved. >> >> Here is the result: >> >> (XEN) cpupool_unassign_cpu(pool=0,cpu=6) >> (XEN) cpupool_unassign_cpu(pool=0,cpu=6) ret -16 >> (XEN) cpupool_unassign_cpu(pool=0,cpu=6) >> (XEN) cpupool_unassign_cpu(pool=0,cpu=6) ret -16 >> (XEN) cpupool_assign_cpu(pool=0,cpu=1) >> (XEN) cpupool_assign_cpu(pool=0,cpu=1) ffff83083fff74c0 >> (XEN) cpupool_assign_cpu(cpu=1) ret 0 >> (XEN) cpupool_assign_cpu(pool=1,cpu=4) >> (XEN) cpupool_assign_cpu(pool=1,cpu=4) ffff831002ad5e40 >> (XEN) cpupool_assign_cpu(cpu=4) ret 0 >> (XEN) cpu 4, weight 0,prv ffff831002ad5e40, dom 0: >> (XEN) sdom->weight: 256, sdom->active_vcpu_count: 1 >> (XEN) Xen BUG at sched_credit.c:570 >> (XEN) ----[ Xen-4.1.0-rc5-pre x86_64 debug=y Tainted: C ]---- >> (XEN) CPU: 4 >> (XEN) RIP: e008:[<ffff82c4801197d7>] csched_tick+0x186/0x37f >> (XEN) RFLAGS: 0000000000010086 CONTEXT: hypervisor >> (XEN) rax: 0000000000000000 rbx: ffff830839d3ec30 rcx: 0000000000000000 >> (XEN) rdx: ffff830839dcff18 rsi: 000000000000000a rdi: ffff82c4802542e8 >> (XEN) rbp: ffff830839dcfe38 rsp: ffff830839dcfde8 r8: 0000000000000004 >> (XEN) r9: ffff82c480213520 r10: 00000000fffffffc r11: 0000000000000001 >> (XEN) r12: 0000000000000004 r13: ffff830839d3ec40 r14: ffff831002ad5e40 >> (XEN) r15: ffff830839d66f90 cr0: 000000008005003b cr4: 00000000000026f0 >> (XEN) cr3: 0000001020a98000 cr2: 00007fc5e9b79d98 >> (XEN) ds: 0000 es: 0000 fs: 0000 gs: 0000 ss: e010 cs: e008 >> (XEN) Xen stack trace from rsp=ffff830839dcfde8: >> (XEN) ffff83083ffa3ba0 ffff831002ad5e40 0000000000000246 ffff830839d6c000 >> (XEN) 0000000000000000 ffff830839dd1100 0000000000000004 ffff82c480119651 >> (XEN) ffff831002b28018 ffff831002b28010 ffff830839dcfe68 ffff82c480126204 >> (XEN) 0000000000000002 ffff83083ffa3bb8 ffff830839dd1100 000000cae439ea7e >> (XEN) 
ffff830839dcfeb8 ffff82c480126539 00007fc5e9fa5b20 ffff830839dd1100 >> (XEN) ffff831002b28010 0000000000000004 0000000000000004 ffff82c4802b0880 >> (XEN) ffff830839dcff18 ffffffffffffffff ffff830839dcfef8 ffff82c480123647 >> (XEN) ffff830839dcfed8 ffff830077eee000 00007fc5e9b79d98 00007fc5e9fa5b20 >> (XEN) 0000000000000002 00007fff46826f20 ffff830839dcff08 ffff82c4801236c2 >> (XEN) 00007cf7c62300c7 ffff82c480206ad6 00007fff46826f20 0000000000000002 >> (XEN) 00007fc5e9fa5b20 00007fc5e9b79d98 00007fff46827260 00007fff46826f50 >> (XEN) 0000000000000246 0000000000000032 0000000000000000 00000000ffffffff >> (XEN) 0000000000000009 00007fc5e9d9de1a 0000000000000003 0000000000004848 >> (XEN) 00007fc5e9b7a000 0000010000000000 ffffffff800073f0 000000000000e033 >> (XEN) 0000000000000246 ffff880f97b51fc8 000000000000e02b 0000000000000000 >> (XEN) 0000000000000000 0000000000000000 0000000000000000 0000000000000004 >> (XEN) ffff830077eee000 00000043b9afd180 0000000000000000 >> (XEN) Xen call trace: >> (XEN) [<ffff82c4801197d7>] csched_tick+0x186/0x37f >> (XEN) [<ffff82c480126204>] execute_timer+0x4e/0x6c >> (XEN) [<ffff82c480126539>] timer_softirq_action+0xf6/0x239 >> (XEN) [<ffff82c480123647>] __do_softirq+0x88/0x99 >> (XEN) [<ffff82c4801236c2>] do_softirq+0x6a/0x7a >> (XEN) >> (XEN) >> (XEN) **************************************** >> (XEN) Panic on CPU 4: >> (XEN) Xen BUG at sched_credit.c:570 >> (XEN) **************************************** >> >> As you can see, a Dom0 vcpus is becoming active on a pool 1 cpu. The BUG_ON >> triggered in csched_acct() is a logical result of this. >> >> How this can happen I don''t know yet. >> Anyone any idea? I''ll keep searching... >> >> >> Juergen >> >> On 02/15/11 08:22, Juergen Gross wrote: >>> On 02/14/11 18:57, George Dunlap wrote: >>>> The good news is, I''ve managed to reproduce this on my local test >>>> hardware with 1x4x2 (1 socket, 4 cores, 2 threads per core) using the >>>> attached script. It''s time to go home now, but I should be able to >>>> dig something up tomorrow. >>>> >>>> To use the script: >>>> * Rename cpupool0 to "p0", and create an empty second pool, "p1" >>>> * You can modify elements by adding "arg=val" as arguments. >>>> * Arguments are: >>>> + dryrun={true,false} Do the work, but don''t actually execute any xl >>>> arguments. Default false. >>>> + left: Number commands to execute. Default 10. >>>> + maxcpus: highest numerical value for a cpu. Default 7 (i.e., 0-7 is >>>> 8 cpus). >>>> + verbose={true,false} Print what you''re doing. Default is true. >>>> >>>> The script sometimes attempts to remove the last cpu from cpupool0; in >>>> this case, libxl will print an error. If the script gets an error >>>> under that condition, it will ignore it; under any other condition, it >>>> will print diagnostic information. >>>> >>>> What finally crashed it for me was this command: >>>> # ./cpupool-test.sh verbose=false left=1000 >>> Nice! >>> With your script I finally managed to get the error, too. On my box (2 >>> sockets >>> a 6 cores) I had to use >>> >>> ./cpupool-test.sh verbose=false left=10000 maxcpus=11 >>> >>> to trigger it. >>> Looking for more data now... 
>>> >>> >>> Juergen >>> >>>> -George >>>> >>>> On Fri, Feb 11, 2011 at 7:39 AM, Andre >>>> Przywara<andre.przywara@amd.com> wrote: >>>>> Juergen Gross wrote: >>>>>> On 02/10/11 15:18, Andre Przywara wrote: >>>>>>> Andre Przywara wrote: >>>>>>>> On 02/10/2011 07:42 AM, Juergen Gross wrote: >>>>>>>>> On 02/09/11 15:21, Juergen Gross wrote: >>>>>>>>>> Andre, George, >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> What seems to be interesting: I think the problem did always occur >>>>>>>>>> when >>>>>>>>>> a new cpupool was created and the first cpu was moved to it. >>>>>>>>>> >>>>>>>>>> I think my previous assumption regarding the master_ticker was not >>>>>>>>>> too bad. >>>>>>>>>> I think somehow the master_ticker of the new cpupool is becoming >>>>>>>>>> active >>>>>>>>>> before the scheduler is really initialized properly. This could >>>>>>>>>> happen, if >>>>>>>>>> enough time is spent between alloc_pdata for the cpu to be moved >>>>>>>>>> and >>>>>>>>>> the >>>>>>>>>> critical section in schedule_cpu_switch(). >>>>>>>>>> >>>>>>>>>> The solution should be to activate the timers only if the >>>>>>>>>> scheduler is >>>>>>>>>> ready for them. >>>>>>>>>> >>>>>>>>>> George, do you think the master_ticker should be stopped in >>>>>>>>>> suspend_ticker >>>>>>>>>> as well? I still see potential problems for entering deep C-States. >>>>>>>>>> I think >>>>>>>>>> I''ll prepare a patch which will keep the master_ticker active >>>>>>>>>> for the >>>>>>>>>> C-State case and migrate it for the schedule_cpu_switch() case. >>>>>>>>> Okay, here is a patch for this. It ran on my 4-core machine >>>>>>>>> without any >>>>>>>>> problems. >>>>>>>>> Andre, could you give it a try? >>>>>>>> Did, but unfortunately it crashed as always. Tried twice and made >>>>>>>> sure >>>>>>>> I booted the right kernel. Sorry. >>>>>>>> The idea with the race between the timer and the state changing >>>>>>>> sounded very appealing, actually that was suspicious to me from the >>>>>>>> beginning. >>>>>>>> >>>>>>>> I will add some code to dump the state of all cpupools to the BUG_ON >>>>>>>> to see in which situation we are when the bug triggers. >>>>>>> OK, here is a first try of this, the patch iterates over all CPU pools >>>>>>> and outputs some data if the BUG_ON >>>>>>> ((sdom->weight * sdom->active_vcpu_count)> weight_left) condition >>>>>>> triggers: >>>>>>> (XEN) CPU pool #0: 1 domains (SMP Credit Scheduler), mask: >>>>>>> fffffffc003f >>>>>>> (XEN) CPU pool #1: 0 domains (SMP Credit Scheduler), mask: fc0 >>>>>>> (XEN) CPU pool #2: 0 domains (SMP Credit Scheduler), mask: 1000 >>>>>>> (XEN) Xen BUG at sched_credit.c:1010 >>>>>>> .... >>>>>>> The masks look proper (6 cores per node), the bug triggers when the >>>>>>> first CPU is about to be(?) inserted. >>>>>> Sure? I''m missing the cpu with mask 2000. >>>>>> I''ll try to reproduce the problem on a larger machine here (24 cores, 4 >>>>>> numa >>>>>> nodes). >>>>>> Andre, can you give me your xen boot parameters? Which xen changeset >>>>>> are >>>>>> you >>>>>> running, and do you have any additional patches in use? >>>>> The grub lines: >>>>> kernel (hd1,0)/boot/xen-22858_debug_04.gz console=com1,vga com1=115200 >>>>> module (hd1,0)/boot/vmlinuz-2.6.32.27_pvops console=tty0 >>>>> console=ttyS0,115200 ro root=/dev/sdb1 xencons=hvc0 >>>>> >>>>> All of my experiments are use c/s 22858 as a base. 
>>>>> If you use a AMD Magny-Cours box for your experiments (socket C32 or >>>>> G34), >>>>> you should add the following patch (removing the line) >>>>> --- a/xen/arch/x86/traps.c >>>>> +++ b/xen/arch/x86/traps.c >>>>> @@ -803,7 +803,6 @@ static void pv_cpuid(struct cpu_user_regs *regs) >>>>> __clear_bit(X86_FEATURE_SKINIT % 32,&c); >>>>> __clear_bit(X86_FEATURE_WDT % 32,&c); >>>>> __clear_bit(X86_FEATURE_LWP % 32,&c); >>>>> - __clear_bit(X86_FEATURE_NODEID_MSR % 32,&c); >>>>> __clear_bit(X86_FEATURE_TOPOEXT % 32,&c); >>>>> break; >>>>> case 5: /* MONITOR/MWAIT */ >>>>> >>>>> This is not necessary (in fact that reverts my patch c/s 22815), but >>>>> raises >>>>> the probability to trigger the bug, probably because it increases the >>>>> pressure of the Dom0 scheduler. If you cannot trigger it with Dom0, >>>>> try to >>>>> create a guest with many VCPUs and squeeze it into a small CPU-pool. >>>>> >>>>> Good luck ;-) >>>>> Andre. >>>>> >>>>> -- >>>>> Andre Przywara >>>>> AMD-OSRC (Dresden) >>>>> Tel: x29712 >>>>> >>>>> >>>>> _______________________________________________ >>>>> Xen-devel mailing list >>>>> Xen-devel@lists.xensource.com >>>>> http://lists.xensource.com/xen-devel >>>>> >>>>> >>>>> >>>>> _______________________________________________ >>>>> Xen-devel mailing list >>>>> Xen-devel@lists.xensource.com >>>>> http://lists.xensource.com/xen-devel >>> >> >> -- >> Juergen Gross Principal Developer Operating Systems >> TSP ES&S SWE OS6 Telephone: +49 (0) 89 3222 2967 >> Fujitsu Technology Solutions e-mail: >> juergen.gross@ts.fujitsu.com >> Domagkstr. 28 Internet: ts.fujitsu.com >> D-80807 Muenchen Company details: >> ts.fujitsu.com/imprint.html >> >> _______________________________________________ >> Xen-devel mailing list >> Xen-devel@lists.xensource.com >> http://lists.xensource.com/xen-devel >>-- Andre Przywara AMD-Operating System Research Center (OSRC), Dresden, Germany _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Juergen Gross
2011-Feb-21 13:19 UTC
Re: [Xen-devel] Hypervisor crash(!) on xl cpupool-numa-split
On 02/21/11 11:00, Andre Przywara wrote:> George Dunlap wrote: >> Andre (and Juergen), can you try again with the attached patch? > > I applied this patch on top of 22931 and it did _not_ work. > The crash occurred almost immediately after I started my script, so the > same behaviour as without the patch.Did you try my patch addressing races in the scheduler when moving cpus between cpupools? I''ve attached it again. For me it works quite well, while George''s patch seems not to be enough (machine hanging after some tests with cpupools). OTOH I can''t reproduce an error as fast as you even without any patch :-)> (attached my script for reference, though it will most likely only make > sense on bigger NUMA machines)Yeah, on my 2-node system I need several hundred tries to get an error. But it seems to be more effective than George''s script. Juergen> > Regards, > Andre. > > >> What the patch basically does is try to make "cpu_disable_scheduler()" >> do what it seems to say it does. :-) Namely, the various >> scheduler-related interrutps (both per-cpu ticks and the master tick) >> is a part of the scheduler, so disable them before doing anything, and >> don''t enable them until the cpu is really ready to go again. >> >> To be precise: >> * cpu_disable_scheduler() disables ticks >> * scheduler_cpu_switch() only enables ticks if adding a cpu to a pool, >> and does it after inserting the idle vcpu >> * Modify semantics, s.t., {alloc,free}_pdata() don''t actually start or >> stop tickers >> + Call tick_{resume,suspend} in cpu_{up,down}, respectively >> * Modify credit1''s tick_{suspend,resume} to handle the master ticker >> as well. >> >> With this patch (if dom0 doesn''t get wedged due to all 8 vcpus being >> on one pcpu), I can perform thousands of operations successfully. >> >> (NB this is not ready for application yet, I just wanted to check to >> see if it fixes Andre''s problem) >> >> -George >> >> On Wed, Feb 16, 2011 at 9:47 AM, Juergen Gross >> <juergen.gross@ts.fujitsu.com> wrote: >>> Okay, I have some more data. >>> >>> I activated cpupool_dprintk() and included checks in sched_credit.c to >>> test for weight inconsistencies. To reduce race possibilities I''ve added >>> my patch to execute cpu assigning/unassigning always in a tasklet on the >>> cpu to be moved. 
>>> >>> Here is the result: >>> >>> (XEN) cpupool_unassign_cpu(pool=0,cpu=6) >>> (XEN) cpupool_unassign_cpu(pool=0,cpu=6) ret -16 >>> (XEN) cpupool_unassign_cpu(pool=0,cpu=6) >>> (XEN) cpupool_unassign_cpu(pool=0,cpu=6) ret -16 >>> (XEN) cpupool_assign_cpu(pool=0,cpu=1) >>> (XEN) cpupool_assign_cpu(pool=0,cpu=1) ffff83083fff74c0 >>> (XEN) cpupool_assign_cpu(cpu=1) ret 0 >>> (XEN) cpupool_assign_cpu(pool=1,cpu=4) >>> (XEN) cpupool_assign_cpu(pool=1,cpu=4) ffff831002ad5e40 >>> (XEN) cpupool_assign_cpu(cpu=4) ret 0 >>> (XEN) cpu 4, weight 0,prv ffff831002ad5e40, dom 0: >>> (XEN) sdom->weight: 256, sdom->active_vcpu_count: 1 >>> (XEN) Xen BUG at sched_credit.c:570 >>> (XEN) ----[ Xen-4.1.0-rc5-pre x86_64 debug=y Tainted: C ]---- >>> (XEN) CPU: 4 >>> (XEN) RIP: e008:[<ffff82c4801197d7>] csched_tick+0x186/0x37f >>> (XEN) RFLAGS: 0000000000010086 CONTEXT: hypervisor >>> (XEN) rax: 0000000000000000 rbx: ffff830839d3ec30 rcx: 0000000000000000 >>> (XEN) rdx: ffff830839dcff18 rsi: 000000000000000a rdi: ffff82c4802542e8 >>> (XEN) rbp: ffff830839dcfe38 rsp: ffff830839dcfde8 r8: 0000000000000004 >>> (XEN) r9: ffff82c480213520 r10: 00000000fffffffc r11: 0000000000000001 >>> (XEN) r12: 0000000000000004 r13: ffff830839d3ec40 r14: ffff831002ad5e40 >>> (XEN) r15: ffff830839d66f90 cr0: 000000008005003b cr4: 00000000000026f0 >>> (XEN) cr3: 0000001020a98000 cr2: 00007fc5e9b79d98 >>> (XEN) ds: 0000 es: 0000 fs: 0000 gs: 0000 ss: e010 cs: e008 >>> (XEN) Xen stack trace from rsp=ffff830839dcfde8: >>> (XEN) ffff83083ffa3ba0 ffff831002ad5e40 0000000000000246 >>> ffff830839d6c000 >>> (XEN) 0000000000000000 ffff830839dd1100 0000000000000004 >>> ffff82c480119651 >>> (XEN) ffff831002b28018 ffff831002b28010 ffff830839dcfe68 >>> ffff82c480126204 >>> (XEN) 0000000000000002 ffff83083ffa3bb8 ffff830839dd1100 >>> 000000cae439ea7e >>> (XEN) ffff830839dcfeb8 ffff82c480126539 00007fc5e9fa5b20 >>> ffff830839dd1100 >>> (XEN) ffff831002b28010 0000000000000004 0000000000000004 >>> ffff82c4802b0880 >>> (XEN) ffff830839dcff18 ffffffffffffffff ffff830839dcfef8 >>> ffff82c480123647 >>> (XEN) ffff830839dcfed8 ffff830077eee000 00007fc5e9b79d98 >>> 00007fc5e9fa5b20 >>> (XEN) 0000000000000002 00007fff46826f20 ffff830839dcff08 >>> ffff82c4801236c2 >>> (XEN) 00007cf7c62300c7 ffff82c480206ad6 00007fff46826f20 >>> 0000000000000002 >>> (XEN) 00007fc5e9fa5b20 00007fc5e9b79d98 00007fff46827260 >>> 00007fff46826f50 >>> (XEN) 0000000000000246 0000000000000032 0000000000000000 >>> 00000000ffffffff >>> (XEN) 0000000000000009 00007fc5e9d9de1a 0000000000000003 >>> 0000000000004848 >>> (XEN) 00007fc5e9b7a000 0000010000000000 ffffffff800073f0 >>> 000000000000e033 >>> (XEN) 0000000000000246 ffff880f97b51fc8 000000000000e02b >>> 0000000000000000 >>> (XEN) 0000000000000000 0000000000000000 0000000000000000 >>> 0000000000000004 >>> (XEN) ffff830077eee000 00000043b9afd180 0000000000000000 >>> (XEN) Xen call trace: >>> (XEN) [<ffff82c4801197d7>] csched_tick+0x186/0x37f >>> (XEN) [<ffff82c480126204>] execute_timer+0x4e/0x6c >>> (XEN) [<ffff82c480126539>] timer_softirq_action+0xf6/0x239 >>> (XEN) [<ffff82c480123647>] __do_softirq+0x88/0x99 >>> (XEN) [<ffff82c4801236c2>] do_softirq+0x6a/0x7a >>> (XEN) >>> (XEN) >>> (XEN) **************************************** >>> (XEN) Panic on CPU 4: >>> (XEN) Xen BUG at sched_credit.c:570 >>> (XEN) **************************************** >>> >>> As you can see, a Dom0 vcpus is becoming active on a pool 1 cpu. The >>> BUG_ON >>> triggered in csched_acct() is a logical result of this. 
>>> >>> How this can happen I don''t know yet. >>> Anyone any idea? I''ll keep searching... >>> >>> >>> Juergen >>> >>> On 02/15/11 08:22, Juergen Gross wrote: >>>> On 02/14/11 18:57, George Dunlap wrote: >>>>> The good news is, I''ve managed to reproduce this on my local test >>>>> hardware with 1x4x2 (1 socket, 4 cores, 2 threads per core) using the >>>>> attached script. It''s time to go home now, but I should be able to >>>>> dig something up tomorrow. >>>>> >>>>> To use the script: >>>>> * Rename cpupool0 to "p0", and create an empty second pool, "p1" >>>>> * You can modify elements by adding "arg=val" as arguments. >>>>> * Arguments are: >>>>> + dryrun={true,false} Do the work, but don''t actually execute any xl >>>>> arguments. Default false. >>>>> + left: Number commands to execute. Default 10. >>>>> + maxcpus: highest numerical value for a cpu. Default 7 (i.e., 0-7 is >>>>> 8 cpus). >>>>> + verbose={true,false} Print what you''re doing. Default is true. >>>>> >>>>> The script sometimes attempts to remove the last cpu from cpupool0; in >>>>> this case, libxl will print an error. If the script gets an error >>>>> under that condition, it will ignore it; under any other condition, it >>>>> will print diagnostic information. >>>>> >>>>> What finally crashed it for me was this command: >>>>> # ./cpupool-test.sh verbose=false left=1000 >>>> Nice! >>>> With your script I finally managed to get the error, too. On my box (2 >>>> sockets >>>> a 6 cores) I had to use >>>> >>>> ./cpupool-test.sh verbose=false left=10000 maxcpus=11 >>>> >>>> to trigger it. >>>> Looking for more data now... >>>> >>>> >>>> Juergen >>>> >>>>> -George >>>>> >>>>> On Fri, Feb 11, 2011 at 7:39 AM, Andre >>>>> Przywara<andre.przywara@amd.com> wrote: >>>>>> Juergen Gross wrote: >>>>>>> On 02/10/11 15:18, Andre Przywara wrote: >>>>>>>> Andre Przywara wrote: >>>>>>>>> On 02/10/2011 07:42 AM, Juergen Gross wrote: >>>>>>>>>> On 02/09/11 15:21, Juergen Gross wrote: >>>>>>>>>>> Andre, George, >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> What seems to be interesting: I think the problem did always >>>>>>>>>>> occur >>>>>>>>>>> when >>>>>>>>>>> a new cpupool was created and the first cpu was moved to it. >>>>>>>>>>> >>>>>>>>>>> I think my previous assumption regarding the master_ticker >>>>>>>>>>> was not >>>>>>>>>>> too bad. >>>>>>>>>>> I think somehow the master_ticker of the new cpupool is becoming >>>>>>>>>>> active >>>>>>>>>>> before the scheduler is really initialized properly. This could >>>>>>>>>>> happen, if >>>>>>>>>>> enough time is spent between alloc_pdata for the cpu to be moved >>>>>>>>>>> and >>>>>>>>>>> the >>>>>>>>>>> critical section in schedule_cpu_switch(). >>>>>>>>>>> >>>>>>>>>>> The solution should be to activate the timers only if the >>>>>>>>>>> scheduler is >>>>>>>>>>> ready for them. >>>>>>>>>>> >>>>>>>>>>> George, do you think the master_ticker should be stopped in >>>>>>>>>>> suspend_ticker >>>>>>>>>>> as well? I still see potential problems for entering deep >>>>>>>>>>> C-States. >>>>>>>>>>> I think >>>>>>>>>>> I''ll prepare a patch which will keep the master_ticker active >>>>>>>>>>> for the >>>>>>>>>>> C-State case and migrate it for the schedule_cpu_switch() case. >>>>>>>>>> Okay, here is a patch for this. It ran on my 4-core machine >>>>>>>>>> without any >>>>>>>>>> problems. >>>>>>>>>> Andre, could you give it a try? >>>>>>>>> Did, but unfortunately it crashed as always. Tried twice and made >>>>>>>>> sure >>>>>>>>> I booted the right kernel. Sorry. 
>>>>>>>>> The idea with the race between the timer and the state changing >>>>>>>>> sounded very appealing, actually that was suspicious to me from >>>>>>>>> the >>>>>>>>> beginning. >>>>>>>>> >>>>>>>>> I will add some code to dump the state of all cpupools to the >>>>>>>>> BUG_ON >>>>>>>>> to see in which situation we are when the bug triggers. >>>>>>>> OK, here is a first try of this, the patch iterates over all CPU >>>>>>>> pools >>>>>>>> and outputs some data if the BUG_ON >>>>>>>> ((sdom->weight * sdom->active_vcpu_count)> weight_left) condition >>>>>>>> triggers: >>>>>>>> (XEN) CPU pool #0: 1 domains (SMP Credit Scheduler), mask: >>>>>>>> fffffffc003f >>>>>>>> (XEN) CPU pool #1: 0 domains (SMP Credit Scheduler), mask: fc0 >>>>>>>> (XEN) CPU pool #2: 0 domains (SMP Credit Scheduler), mask: 1000 >>>>>>>> (XEN) Xen BUG at sched_credit.c:1010 >>>>>>>> .... >>>>>>>> The masks look proper (6 cores per node), the bug triggers when the >>>>>>>> first CPU is about to be(?) inserted. >>>>>>> Sure? I''m missing the cpu with mask 2000. >>>>>>> I''ll try to reproduce the problem on a larger machine here (24 >>>>>>> cores, 4 >>>>>>> numa >>>>>>> nodes). >>>>>>> Andre, can you give me your xen boot parameters? Which xen changeset >>>>>>> are >>>>>>> you >>>>>>> running, and do you have any additional patches in use? >>>>>> The grub lines: >>>>>> kernel (hd1,0)/boot/xen-22858_debug_04.gz console=com1,vga >>>>>> com1=115200 >>>>>> module (hd1,0)/boot/vmlinuz-2.6.32.27_pvops console=tty0 >>>>>> console=ttyS0,115200 ro root=/dev/sdb1 xencons=hvc0 >>>>>> >>>>>> All of my experiments are use c/s 22858 as a base. >>>>>> If you use a AMD Magny-Cours box for your experiments (socket C32 or >>>>>> G34), >>>>>> you should add the following patch (removing the line) >>>>>> --- a/xen/arch/x86/traps.c >>>>>> +++ b/xen/arch/x86/traps.c >>>>>> @@ -803,7 +803,6 @@ static void pv_cpuid(struct cpu_user_regs *regs) >>>>>> __clear_bit(X86_FEATURE_SKINIT % 32,&c); >>>>>> __clear_bit(X86_FEATURE_WDT % 32,&c); >>>>>> __clear_bit(X86_FEATURE_LWP % 32,&c); >>>>>> - __clear_bit(X86_FEATURE_NODEID_MSR % 32,&c); >>>>>> __clear_bit(X86_FEATURE_TOPOEXT % 32,&c); >>>>>> break; >>>>>> case 5: /* MONITOR/MWAIT */ >>>>>> >>>>>> This is not necessary (in fact that reverts my patch c/s 22815), but >>>>>> raises >>>>>> the probability to trigger the bug, probably because it increases the >>>>>> pressure of the Dom0 scheduler. If you cannot trigger it with Dom0, >>>>>> try to >>>>>> create a guest with many VCPUs and squeeze it into a small CPU-pool. >>>>>> >>>>>> Good luck ;-) >>>>>> Andre. >>>>>> >>>>>> -- >>>>>> Andre Przywara >>>>>> AMD-OSRC (Dresden) >>>>>> Tel: x29712 >>>>>> >>>>>> >>>>>> _______________________________________________ >>>>>> Xen-devel mailing list >>>>>> Xen-devel@lists.xensource.com >>>>>> http://lists.xensource.com/xen-devel >>>>>> >>>>>> >>>>>> >>>>>> _______________________________________________ >>>>>> Xen-devel mailing list >>>>>> Xen-devel@lists.xensource.com >>>>>> http://lists.xensource.com/xen-devel >>>> >>> >>> -- >>> Juergen Gross Principal Developer Operating Systems >>> TSP ES&S SWE OS6 Telephone: +49 (0) 89 3222 2967 >>> Fujitsu Technology Solutions e-mail: >>> juergen.gross@ts.fujitsu.com >>> Domagkstr. 
>>> 28 Internet: ts.fujitsu.com
>>> D-80807 Muenchen Company details: ts.fujitsu.com/imprint.html

--
Juergen Gross                 Principal Developer Operating Systems
TSP ES&S SWE OS6                       Telephone: +49 (0) 89 3222 2967
Fujitsu Technology Solutions           e-mail: juergen.gross@ts.fujitsu.com
Domagkstr. 28                          Internet: ts.fujitsu.com
D-80807 Muenchen                       Company details: ts.fujitsu.com/imprint.html

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel
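For readers following along, here is a minimal, self-contained C sketch of the invariant behind the BUG_ON quoted above. The structures and the accounting function below are simplified stand-ins, not the real sched_credit.c code: every domain counted as active in a pool contributes weight * active_vcpu_count to that pool's recorded total weight, and the accounting pass subtracts these shares from weight_left one by one. A vcpu that becomes active under another pool's private data was never added to that total, so the subtraction no longer works out and the check fires, matching the log above (pool weight 0, dom0 weight 256, one active vcpu).

#include <assert.h>
#include <stdio.h>

/* Simplified stand-ins, not the real Xen structures. */
struct sdom {                      /* per-domain scheduler data */
    int weight;
    int active_vcpu_count;
};

struct csched_private {            /* per-pool scheduler data */
    int weight;                    /* sum of weight * active_vcpu_count
                                    * over the pool's own active domains */
    struct sdom *active[8];
    int nr_active;
};

/* Analogue of the accounting pass: each active domain's share is taken
 * out of weight_left; the assert mirrors the BUG_ON in the log above. */
static void acct_sketch(struct csched_private *prv)
{
    int weight_left = prv->weight;

    for (int i = 0; i < prv->nr_active; i++) {
        struct sdom *sdom = prv->active[i];
        assert(sdom->weight * sdom->active_vcpu_count <= weight_left);
        weight_left -= sdom->weight * sdom->active_vcpu_count;
    }
    printf("accounting consistent, weight_left=%d\n", weight_left);
}

int main(void)
{
    struct sdom dom0 = { .weight = 256, .active_vcpu_count = 1 };
    struct csched_private pool1 = { .weight = 0, .nr_active = 0 };

    /* The racy cpu move: dom0 becomes active under pool1's private data,
     * although pool1.weight never accounted for it, so the assert fires. */
    pool1.active[pool1.nr_active++] = &dom0;
    acct_sketch(&pool1);
    return 0;
}

Compiled and run, the sketch aborts on the assert, which plays the role of the hypervisor BUG in the trace.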
Andre Przywara
2011-Feb-21 14:45 UTC
Re: [Xen-devel] Hypervisor crash(!) on xl cpupool-numa-split
Juergen Gross wrote:
> On 02/21/11 11:00, Andre Przywara wrote:
>> George Dunlap wrote:
>>> Andre (and Juergen), can you try again with the attached patch?
>> I applied this patch on top of 22931 and it did _not_ work.
>> The crash occurred almost immediately after I started my script, so the
>> same behaviour as without the patch.
>
> Did you try my patch addressing races in the scheduler when moving cpus
> between cpupools?

Sorry, I tried yours first, but it didn't apply cleanly on my particular
tree (sched_jg_fix ;-). So I tested George's first.

> I've attached it again. For me it works quite well, while George's patch
> seems not to be enough (machine hanging after some tests with cpupools).

OK, it now applied after a rebase.
And yes, I didn't see a crash! At least until the script stopped while a
lot of these messages appeared:
(XEN) do_IRQ: 0.89 No irq handler for vector (irq -1)

That is what I reported before and is most probably totally unrelated to
this issue.
So I consider this fix working!
I will try to match my recent theories and debug results with your patch
to see whether this fits.

> OTOH I can't reproduce an error as fast as you even without any patch :-)
>
>> (attached my script for reference, though it will most likely only make
>> sense on bigger NUMA machines)
>
> Yeah, on my 2-node system I need several hundred tries to get an error.
> But it seems to be more effective than George's script.

I consider the large over-provisioning the reason. With Dom0 having 48
VCPUs finally squashed together onto 6 pCPUs, my script triggered the
crash by the second run at the latest.
With your patch it made 24 iterations before the other bug kicked in.

Thanks very much!
Andre.

>
>
> Juergen
>
>> Regards,
>> Andre.
>>
>>
>>> What the patch basically does is try to make "cpu_disable_scheduler()"
>>> do what it seems to say it does. :-) Namely, the various
>>> scheduler-related interrupts (both the per-cpu ticks and the master
>>> tick) are part of the scheduler, so disable them before doing anything,
>>> and don't enable them until the cpu is really ready to go again.
>>>
>>> To be precise:
>>> * cpu_disable_scheduler() disables ticks
>>> * scheduler_cpu_switch() only enables ticks if adding a cpu to a pool,
>>> and does it after inserting the idle vcpu
>>> * Modify semantics, s.t., {alloc,free}_pdata() don't actually start or
>>> stop tickers
>>> + Call tick_{resume,suspend} in cpu_{up,down}, respectively
>>> * Modify credit1's tick_{suspend,resume} to handle the master ticker
>>> as well.
>>>
>>> With this patch (if dom0 doesn't get wedged due to all 8 vcpus being
>>> on one pcpu), I can perform thousands of operations successfully.
>>>
>>> (NB this is not ready for application yet, I just wanted to check to
>>> see if it fixes Andre's problem)
>>>
>>> -George
>>>
>>> On Wed, Feb 16, 2011 at 9:47 AM, Juergen Gross
>>> <juergen.gross@ts.fujitsu.com> wrote:
>>>> Okay, I have some more data.
>>>>
>>>> I activated cpupool_dprintk() and included checks in sched_credit.c to
>>>> test for weight inconsistencies. To reduce race possibilities I've
>>>> added my patch to execute cpu assigning/unassigning always in a
>>>> tasklet on the cpu to be moved.
>>>> [...]

--
Andre Przywara
AMD-OSRC (Dresden)
Tel: x29712

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel
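To make the ordering in George's patch description above concrete, here is a small C sketch. The pool structure and the helpers (tick_suspend, tick_resume, move_cpu) are made-up illustrations, not the real Xen interfaces and not the actual patch; the only point carried over is the ordering: all scheduler tickers are stopped before a cpu's per-pool scheduler data is torn down, and they are restarted only after the target pool's data, including the idle vcpu, is fully in place, so no timer can fire against half-initialized scheduler state.

#include <stdbool.h>
#include <stdio.h>

/* Illustrative only: a simplified pool with a single flag standing in for
 * both the per-cpu ticks and the master ticker. */
struct pool {
    const char *name;
    bool ticks_running;
};

static void tick_suspend(struct pool *p)
{
    p->ticks_running = false;
    printf("tickers stopped for %s\n", p->name);
}

static void tick_resume(struct pool *p)
{
    p->ticks_running = true;
    printf("tickers running for %s\n", p->name);
}

/* Move a cpu between pools without a window in which a ticker could fire
 * against half-initialized scheduler state. */
static void move_cpu(int cpu, struct pool *from, struct pool *to)
{
    tick_suspend(from);                      /* stop all timers first */
    printf("cpu %d: free old pdata, alloc new pdata\n", cpu);
    printf("cpu %d: insert idle vcpu into %s\n", cpu, to->name);
    tick_resume(to);                         /* restart only when ready */
}

int main(void)
{
    struct pool pool0 = { "Pool-0", true };
    struct pool pool1 = { "p1", false };

    move_cpu(4, &pool0, &pool1);
    return 0;
}

The design choice the sketch illustrates is simply that timer activation belongs to the scheduler's own lifecycle, not to the per-cpu data allocation routines.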
Juergen Gross
2011-Feb-21 14:50 UTC
Re: [Xen-devel] Hypervisor crash(!) on xl cpupool-numa-split
On 02/21/11 15:45, Andre Przywara wrote:
> [...]
> So I consider this fix working!
> [...]
> With your patch it made 24 iterations before the other bug kicked in.

Okay, I'll prepare an official patch. It might take a few days, as I'm
not in the office until Thursday.


Juergen

--
Juergen Gross                 Principal Developer Operating Systems
TSP ES&S SWE OS6                       Telephone: +49 (0) 89 3222 2967
Fujitsu Technology Solutions           e-mail: juergen.gross@ts.fujitsu.com
Domagkstr. 28                          Internet: ts.fujitsu.com
D-80807 Muenchen                       Company details: ts.fujitsu.com/imprint.html

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel