Sander Eikelenboom
2012-Dec-14 15:55 UTC
3.8.0-rc0 on xen-unstable: RCU Stall during boot as dom0 kernel after IOAPIC
Hi Konrad,

I just tried to boot a 3.8.0-rc0 kernel (last commit: 7313264b899bbf3988841296265a6e0e8a7b6521) as dom0 on my machine with current xen-unstable.

The boot stalls:

[ 0.000000] ACPI: PM-Timer IO Port: 0x808
[ 0.000000] ACPI: Local APIC address 0xfee00000
[ 0.000000] ACPI: LAPIC (acpi_id[0x01] lapic_id[0x00] enabled)
[ 0.000000] ACPI: LAPIC (acpi_id[0x02] lapic_id[0x01] enabled)
[ 0.000000] ACPI: LAPIC (acpi_id[0x03] lapic_id[0x02] enabled)
[ 0.000000] ACPI: LAPIC (acpi_id[0x04] lapic_id[0x03] enabled)
[ 0.000000] ACPI: LAPIC (acpi_id[0x05] lapic_id[0x04] enabled)
[ 0.000000] ACPI: LAPIC (acpi_id[0x06] lapic_id[0x05] enabled)
[ 0.000000] ACPI: IOAPIC (id[0x06] address[0xfec00000] gsi_base[0])
[ 0.000000] IOAPIC[0]: apic_id 6, version 33, address 0xfec00000, GSI 0-23
[ 0.000000] ACPI: IOAPIC (id[0x07] address[0xfec20000] gsi_base[24])
[ 0.000000] IOAPIC[1]: apic_id 7, version 33, address 0xfec20000, GSI 24-
[ 64.598628] INFO: rcu_preempt detected stalls on CPUs/tasks:
[ 64.598676] 0: (1 GPs behind) idle=aed/140000000000000/0 drain=5 . timer not pending
[ 64.598683] (detected by 1, t=18004 jiffies, g=18446744073709551414, c=18446744073709551413, q=162)
[ 64.598692] sending NMI to all CPUs:
[ 64.598716] xen: vector 0x2 is not implemented

Perhaps an interesting line is the incomplete one (no end of range); the boot also sits there for some time before the kernel reports the stall itself:
[ 0.000000] IOAPIC[1]: apic_id 7, version 33, address 0xfec20000, GSI 24-

The exact same config with 3.7.0 as kernel works fine.
Complete serial log is attached.

--
Sander

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel
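The very large g= and c= values in the stall report above are easier to read as signed 64-bit numbers. The following is only a minimal sketch, assuming that g= and c= are the in-progress and last-completed grace-period numbers and that the kernel prints them as unsigned 64-bit decimals; the helper name as_signed64 is made up for illustration and is not a kernel function.

```python
def as_signed64(value: int) -> int:
    """Reinterpret an unsigned 64-bit integer as two's-complement signed."""
    if not 0 <= value < 2**64:
        raise ValueError("expected an unsigned 64-bit value")
    # Values with the top bit set represent negative numbers.
    return value - 2**64 if value >= 2**63 else value


# Values copied from the "detected by 1, ..." line of the stall report.
g = as_signed64(18446744073709551414)  # assumed: grace period in progress
c = as_signed64(18446744073709551413)  # assumed: last completed grace period

print(g, c, g - c)  # prints: -202 -203 1
```

Read this way, the counters are small negative numbers one apart, which suggests they are simply close to wrap-around rather than evidence of an enormous number of grace periods.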
Konrad Rzeszutek Wilk
2012-Dec-16 17:38 UTC
Re: 3.8.0-rc0 on xen-unstable: RCU Stall during boot as dom0 kernel after IOAPIC
On Fri, Dec 14, 2012 at 04:55:57PM +0100, Sander Eikelenboom wrote:
> Hi Konrad,
>
> I just tried to boot a 3.8.0-rc0 kernel (last commit: 7313264b899bbf3988841296265a6e0e8a7b6521) as dom0 on my machine with current xen-unstable.

Yeah, saw it over the Dec 11->Dec 12 merges and was out on vacation during that time (just got back).

Did you by any chance try to do a git bisect to narrow down which merge it was?

Thanks!
Sander Eikelenboom
2012-Dec-16 19:42 UTC
Re: 3.8.0-rc0 on xen-unstable: RCU Stall during boot as dom0 kernel after IOAPIC
Sunday, December 16, 2012, 6:38:24 PM, you wrote:

> On Fri, Dec 14, 2012 at 04:55:57PM +0100, Sander Eikelenboom wrote:
>> Hi Konrad,
>>
>> I just tried to boot a 3.8.0-rc0 kernel (last commit: 7313264b899bbf3988841296265a6e0e8a7b6521) as dom0 on my machine with current xen-unstable.

> Yeah, saw it over the Dec 11->Dec 12 merges and was out on
> vacation during that time (just got back).

> Did you by any chance try to do a git bisect to narrow down
> which merge it was?

Hi Konrad,

No, I haven't had the time. I only tried resetting to commit 189251705649bdfdf5e5850eb178f8cbfdac5480 on a hunch (it is just before a lot of x86 and rcu commits), but the result didn't boot.

--
Sander
Sander Eikelenboom
2012-Dec-17 14:58 UTC
Re: 3.8.0-rc0 on xen-unstable: RCU Stall during boot as dom0 kernel after IOAPIC
Sunday, December 16, 2012, 6:38:24 PM, you wrote:

> On Fri, Dec 14, 2012 at 04:55:57PM +0100, Sander Eikelenboom wrote:
>> Hi Konrad,
>>
>> I just tried to boot a 3.8.0-rc0 kernel (last commit: 7313264b899bbf3988841296265a6e0e8a7b6521) as dom0 on my machine with current xen-unstable.

> Yeah, saw it over the Dec 11->Dec 12 merges and was out on
> vacation during that time (just got back).

> Did you by any chance try to do a git bisect to narrow down
> which merge it was?

Hi Konrad,

I tried to bisect, but have not succeeded so far. Somehow I have the feeling it is at least partly .config related. After making a new clone and bisecting down by hand, I came back to v3.7, but that also didn't boot. So I will see if I can do it the other way around :S

--
Sander
Sander Eikelenboom
2012-Dec-17 20:32 UTC
Re: 3.8.0-rc0 on xen-unstable: RCU Stall during boot as dom0 kernel after IOAPIC
Sunday, December 16, 2012, 6:38:24 PM, you wrote:

> On Fri, Dec 14, 2012 at 04:55:57PM +0100, Sander Eikelenboom wrote:
>> Hi Konrad,
>>
>> I just tried to boot a 3.8.0-rc0 kernel (last commit: 7313264b899bbf3988841296265a6e0e8a7b6521) as dom0 on my machine with current xen-unstable.

> Yeah, saw it over the Dec 11->Dec 12 merges and was out on
> vacation during that time (just got back).

> Did you by any chance try to do a git bisect to narrow down
> which merge it was?

Hi Konrad,

With some more effort it leads to:

git bisect start
# bad: [fa4c95bfdb85d568ae327d57aa33a4f55bab79c4] Merge branch 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs
git bisect bad fa4c95bfdb85d568ae327d57aa33a4f55bab79c4
# good: [29594404d7fe73cd80eaa4ee8c43dcc53970c60e] Linux 3.7
git bisect good 29594404d7fe73cd80eaa4ee8c43dcc53970c60e
# bad: [98870901cce098bbe94d90d2c41d8d1fa8d94392] mm/bootmem.c: remove unused wrapper function reserve_bootmem_generic()
git bisect bad 98870901cce098bbe94d90d2c41d8d1fa8d94392
# good: [8966961b31c251b854169e9886394c2a20f2cea7] Merge tag 'staging-3.8-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging
git bisect good 8966961b31c251b854169e9886394c2a20f2cea7
# bad: [22a40fd9a60388aec8106b0baffc8f59f83bb1b4] Merge tag 'dlm-3.8' of git://git.kernel.org/pub/scm/linux/kernel/git/teigland/linux-dlm
git bisect bad 22a40fd9a60388aec8106b0baffc8f59f83bb1b4
# good: [aefb058b0c27dafb15072406fbfd92d2ac2c8790] Merge branch 'irq-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
git bisect good aefb058b0c27dafb15072406fbfd92d2ac2c8790
# good: [b64c5fda3868cb29d5dae0909561aa7d93fb7330] Merge branch 'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
git bisect good b64c5fda3868cb29d5dae0909561aa7d93fb7330
# bad: [139353ffbe42ac7abda42f3259c1c374cbf4b779] Merge tag 'please-pull-einj-fix-for-acpi5' of git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras
git bisect bad 139353ffbe42ac7abda42f3259c1c374cbf4b779
# bad: [d07e43d70eef15a44a2c328a913d8d633a90e088] Merge branch 'omap-serial' of git://git.linaro.org/people/rmk/linux-arm
git bisect bad d07e43d70eef15a44a2c328a913d8d633a90e088
# bad: [a05a4e24dcd73c2de4ef3f8d520b8bbb44570c60] Merge branch 'x86-cpu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
git bisect bad a05a4e24dcd73c2de4ef3f8d520b8bbb44570c60
# bad: [a71c8bc5dfefbbf80ef90739791554ef7ea4401b] x86, topology: Debug CPU0 hotplug
git bisect bad a71c8bc5dfefbbf80ef90739791554ef7ea4401b
# bad: [42e78e9719aa0c76711e2731b19c90fe5ae05278] x86-64, hotplug: Add start_cpu0() entry point to head_64.S
git bisect bad 42e78e9719aa0c76711e2731b19c90fe5ae05278
# good: [4d25031a81d3cd32edc00de6596db76cc4010685] x86, topology: Don't offline CPU0 if any PIC irq can not be migrated out of it
git bisect good 4d25031a81d3cd32edc00de6596db76cc4010685
# bad: [209efae12981f3d2d694499b761def10895c078c] x86, hotplug, suspend: Online CPU0 for suspend or hibernate
git bisect bad 209efae12981f3d2d694499b761def10895c078c
# bad: [30106c174311b8cfaaa3186c7f6f9c36c62d17da] x86, hotplug: Support functions for CPU0 online/offline
git bisect bad 30106c174311b8cfaaa3186c7f6f9c36c62d17da

30106c174311b8cfaaa3186c7f6f9c36c62d17da is the first bad commit
commit 30106c174311b8cfaaa3186c7f6f9c36c62d17da
Author: Fenghua Yu <fenghua.yu@intel.com>
Date: Tue Nov 13 11:32:41 2012 -0800

    x86, hotplug: Support functions for CPU0 online/offline

    Add smp_store_boot_cpu_info() to store cpu info for BSP during boot time.

    Now smp_store_cpu_info() stores cpu info for bringing up BSP or AP after
    it's offline.

    Continue to online CPU0 in native_cpu_up().

    Continue to offline CPU0 in native_cpu_disable().

    Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
    Link: http://lkml.kernel.org/r/1352835171-3958-5-git-send-email-fenghua.yu@intel.com
    Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>

:040000 040000 729e56e8eddaaf5d0f55257b82f28006dffb9aab d5c98e50cd92814351ee6c741b7e4c9afa29487c M arch

Which seems to be merged in http://git.kernel.org/?p=linux/kernel/git/torvalds/linux.git;a=commitdiff;h=74b84233458e9db7c160cec67638efdbec748ca9

--
Sander
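The bisect log above needs only 13 good/bad verdicts after the two starting points to go from the whole 3.7..fa4c95bf range down to a single commit. The following is a minimal sketch of why so few tests suffice, assuming a purely linear history (real git bisect works on a commit graph with merges, so this is only an approximation). The function name, the history size, and the position of the culprit are all hypothetical.

```python
import math


def bisect_first_bad(commits, is_bad):
    """Return (first_bad_commit, tests_run) for a linear history.

    commits[0] must be known good and commits[-1] known bad. is_bad(commit)
    stands in for "build the kernel, boot it as dom0, see whether it stalls".
    """
    lo, hi = 0, len(commits) - 1  # lo: newest known good, hi: oldest known bad
    tests = 0
    while hi - lo > 1:
        mid = (lo + hi) // 2
        tests += 1
        if is_bad(commits[mid]):
            hi = mid  # the first bad commit is at mid or earlier
        else:
            lo = mid  # the first bad commit is after mid
    return commits[hi], tests


history = list(range(10_000))  # hypothetical linear history of 10,000 commits
culprit = 6_789                # hypothetical first bad commit
found, tests = bisect_first_bad(history, lambda commit: commit >= culprit)

# The number of tests grows with log2 of the history length.
print(found, tests, math.ceil(math.log2(len(history))))
```

Each verdict halves the remaining range, so a merge window with thousands of commits can be searched in roughly a dozen builds, provided every tested kernel can be classified reliably as good or bad.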
Konrad Rzeszutek Wilk
2012-Dec-17 20:46 UTC
Re: 3.8.0-rc0 on xen-unstable: RCU Stall during boot as dom0 kernel after IOAPIC
On Mon, Dec 17, 2012 at 09:32:17PM +0100, Sander Eikelenboom wrote:
>
> Sunday, December 16, 2012, 6:38:24 PM, you wrote:
>
> > Did you by any chance try to do a git bisect to narrow down
> > which merge it was?
>
> Hi Konrad,

Hey Sander,

Thank you for doing the bisection.

Fenghua - any ideas what might be amiss in the Xen subsystem? I haven't looked at the CPU0 offlining/onlining patchset, so I am not completely up to speed on the particulars of the patches.

> With some more effort it leads to:
>
> 30106c174311b8cfaaa3186c7f6f9c36c62d17da is the first bad commit
> commit 30106c174311b8cfaaa3186c7f6f9c36c62d17da
> Author: Fenghua Yu <fenghua.yu@intel.com>
> Date: Tue Nov 13 11:32:41 2012 -0800
>
> x86, hotplug: Support functions for CPU0 online/offline
>
> Which seems to be merged in http://git.kernel.org/?p=linux/kernel/git/torvalds/linux.git;a=commitdiff;h=74b84233458e9db7c160cec67638efdbec748ca9
>
> --
>
> Sander
Konrad Rzeszutek Wilk
2012-Dec-17 21:12 UTC
Re: 3.8.0-rc0 on xen-unstable: RCU Stall during boot as dom0 kernel after IOAPIC
On Mon, Dec 17, 2012 at 03:46:34PM -0500, Konrad Rzeszutek Wilk wrote:
> On Mon, Dec 17, 2012 at 09:32:17PM +0100, Sander Eikelenboom wrote:
> >
> > Hi Konrad,
>
> Hey Sander,
>
> Thank you for doing the bisection.
>
> Fenghua - any ideas what might be amiss in the Xen subsystem?
> I hadn't looked at the patchset of the CPU0 offlining/onlining
> so I am not completely up to speed on the particulars of the patches.

> > 30106c174311b8cfaaa3186c7f6f9c36c62d17da is the first bad commit
> > commit 30106c174311b8cfaaa3186c7f6f9c36c62d17da
> > Author: Fenghua Yu <fenghua.yu@intel.com>
> > Date: Tue Nov 13 11:32:41 2012 -0800
> >
> > x86, hotplug: Support functions for CPU0 online/offline
> >
> > Add smp_store_boot_cpu_info() to store cpu info for BSP during boot time.
> >
> > Now smp_store_cpu_info() stores cpu info for bringing up BSP or AP after
> > it's offline.
> >
> > Continue to online CPU0 in native_cpu_up().
> >
> > Continue to offline CPU0 in native_cpu_disable().
> >
> > Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
> > Link: http://lkml.kernel.org/r/1352835171-3958-5-git-send-email-fenghua.yu@intel.com
> > Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>

This patch:

diff --git a/arch/x86/xen/smp.c b/arch/x86/xen/smp.c
index 353c50f..4f7d259 100644
--- a/arch/x86/xen/smp.c
+++ b/arch/x86/xen/smp.c
@@ -254,7 +254,7 @@ static void __init xen_smp_prepare_cpus(unsigned int max_cpus)
 	}
 	xen_init_lock_cpu(0);
 
-	smp_store_cpu_info(0);
+	smp_store_boot_cpu_info();
 	cpu_data(0).x86_max_cores = 1;
 
 	for_each_possible_cpu(i) {

would make the change in the Xen subsystem that corresponds to what the above-mentioned commit did. Perhaps that is all that is needed? I will be able to test this and look at it in more detail tomorrow.
Sander Eikelenboom
2012-Dec-17 21:35 UTC
Re: 3.8.0-rc0 on xen-unstable: RCU Stall during boot as dom0 kernel after IOAPIC
Monday, December 17, 2012, 10:12:40 PM, you wrote:

> This patch:

> diff --git a/arch/x86/xen/smp.c b/arch/x86/xen/smp.c
> index 353c50f..4f7d259 100644
> --- a/arch/x86/xen/smp.c
> +++ b/arch/x86/xen/smp.c
> @@ -254,7 +254,7 @@ static void __init xen_smp_prepare_cpus(unsigned int max_cpus)
> }
> xen_init_lock_cpu(0);
>
> - smp_store_cpu_info(0);
> + smp_store_boot_cpu_info();
> cpu_data(0).x86_max_cores = 1;
>
> for_each_possible_cpu(i) {

> Would do the corresponding change in the Xen subsystem that the above
> mentioned commit did. Perhaps that is all that is needed? I am going to
> be able to test this and look in more details tomorrow.

Seems like it. I don't know if there are other things still lurking, but with your patch it boots again as dom0 :-)

Thx!

--
Sander
Konrad Rzeszutek Wilk
2012-Dec-18 01:12 UTC
Re: 3.8.0-rc0 on xen-unstable: RCU Stall during boot as dom0 kernel after IOAPIC
On Mon, Dec 17, 2012 at 10:35:58PM +0100, Sander Eikelenboom wrote:
>
> Monday, December 17, 2012, 10:12:40 PM, you wrote:
>
> > Would do the corresponding change in the Xen subsystem that the above
> > mentioned commit did. Perhaps that is all that is needed? I am going to
> > be able to test this and look in more details tomorrow.
>
> Seems like it, don't know if there are other things still lurking, but with your patch it boots again as dom0 :-)

Excellent. And it seems that it also fixes it on my test machine. Great.

I am going to add Reported-and-Tested-by: Sander Eikelenboom to it and push it to Linus shortly.

Thanks!