Konrad Rzeszutek Wilk
2013-May-06 12:59 UTC
v3.9 - CPU hotplug and microcode earlier loading hits a mutex deadlock (x86_cpu_hotplug_driver_mutex)
Hey,

As I was fixing PV HVM's broken CPU hotplug mechanism I discovered a
deadlock in the microcode and generic code.

I am not sure if the ACPI generic mechanism would expose this, but
looking at the flow (arch_register_cpu, then expecting user-space to call
cpu_up), it should trigger this.

Anyhow, this can easily be triggered if a new CPU is added and from
user-space you do:

  echo 1 > /sys/devices/system/cpu/cpu3/online

on a newly appeared CPU. The deadlock is that "store_online" in
drivers/base/cpu.c takes the cpu_hotplug_driver_lock() lock, then calls
"cpu_up". "cpu_up" eventually ends up calling "save_mc_for_early",
which also takes the cpu_hotplug_driver_lock() lock.

And here is what the kernel thinks of it:

smpboot: Stack at about ffff880075c39f44
smpboot: CPU3: has booted.
microcode: CPU3 sig=0x206a7, pf=0x2, revision=0x25

=============================================
[ INFO: possible recursive locking detected ]
3.9.0upstream-10129-g167af0e #1 Not tainted
---------------------------------------------
sh/2487 is trying to acquire lock:
 (x86_cpu_hotplug_driver_mutex){+.+.+.}, at: [<ffffffff81075512>] cpu_hotplug_driver_lock+0x12/0x20

but task is already holding lock:
 (x86_cpu_hotplug_driver_mutex){+.+.+.}, at: [<ffffffff81075512>] cpu_hotplug_driver_lock+0x12/0x20

other info that might help us debug this:
 Possible unsafe locking scenario:

       CPU0
       ----
  lock(x86_cpu_hotplug_driver_mutex);
  lock(x86_cpu_hotplug_driver_mutex);

 *** DEADLOCK ***

 May be due to missing lock nesting notation

6 locks held by sh/2487:
 #0:  (sb_writers#5){.+.+.+}, at: [<ffffffff811ca48d>] vfs_write+0x17d/0x190
 #1:  (&buffer->mutex){+.+.+.}, at: [<ffffffff812464ef>] sysfs_write_file+0x3f/0x160
 #2:  (s_active#20){.+.+.+}, at: [<ffffffff81246578>] sysfs_write_file+0xc8/0x160
 #3:  (x86_cpu_hotplug_driver_mutex){+.+.+.}, at: [<ffffffff81075512>] cpu_hotplug_driver_lock+0x12/0x20
 #4:  (cpu_add_remove_lock){+.+.+.}, at: [<ffffffff810961c2>] cpu_maps_update_begin+0x12/0x20
 #5:  (cpu_hotplug.lock){+.+.+.}, at: [<ffffffff810962a7>] cpu_hotplug_begin+0x27/0x60

stack backtrace:
CPU: 1 PID: 2487 Comm: sh Not tainted 3.9.0upstream-10129-g167af0e #1
Hardware name: Xen HVM domU, BIOS 4.3-unstable 05/03/2013
 ffffffff8229b710 ffff880064e75538 ffffffff816fe47e ffff880064e75608
 ffffffff81100cb6 ffff880064e75560 ffff88006670e290 ffff880064e75588
 ffffffff00000000 ffff88006670e9b8 21710c800c83a1f5 ffff880064e75598
Call Trace:
 [<ffffffff816fe47e>] dump_stack+0x19/0x1b
 [<ffffffff81100cb6>] __lock_acquire+0x726/0x1890
 [<ffffffff8110b352>] ? is_module_text_address+0x22/0x40
 [<ffffffff810bbff8>] ? __kernel_text_address+0x58/0x80
 [<ffffffff81101eca>] lock_acquire+0xaa/0x190
 [<ffffffff81075512>] ? cpu_hotplug_driver_lock+0x12/0x20
 [<ffffffff816fee1e>] __mutex_lock_common+0x5e/0x450
 [<ffffffff81075512>] ? cpu_hotplug_driver_lock+0x12/0x20
 [<ffffffff810d4225>] ? sched_clock_local+0x25/0x90
 [<ffffffff81075512>] ? cpu_hotplug_driver_lock+0x12/0x20
 [<ffffffff816ff340>] mutex_lock_nested+0x40/0x50
 [<ffffffff81075512>] cpu_hotplug_driver_lock+0x12/0x20
 [<ffffffff81080757>] save_mc_for_early+0x27/0xf0
 [<ffffffff810ffd30>] ? mark_held_locks+0x90/0x150
 [<ffffffff81176a5d>] ? get_page_from_freelist+0x46d/0x8e0
 [<ffffffff8110029d>] ? trace_hardirqs_on+0xd/0x10
 [<ffffffff81056a39>] ? sched_clock+0x9/0x10
 [<ffffffff81177275>] ? __alloc_pages_nodemask+0x165/0xa30
 [<ffffffff810d4225>] ? sched_clock_local+0x25/0x90
 [<ffffffff810d4348>] ? sched_clock_cpu+0xb8/0x110
 [<ffffffff810fb88d>] ? trace_hardirqs_off+0xd/0x10
 [<ffffffff811a5e79>] ? vmap_page_range_noflush+0x279/0x370
 [<ffffffff811a5f9d>] ? map_vm_area+0x2d/0x50
 [<ffffffff811a7dce>] ? __vmalloc_node_range+0x18e/0x260
 [<ffffffff810812a8>] ? generic_load_microcode+0xb8/0x1c0
 [<ffffffff8108135c>] generic_load_microcode+0x16c/0x1c0
 [<ffffffff8110917e>] ? generic_exec_single+0x7e/0xb0
 [<ffffffff81081470>] ? request_microcode_user+0x20/0x20
 [<ffffffff8108142f>] request_microcode_fw+0x7f/0xa0
 [<ffffffff813356ab>] ? kobject_uevent+0xb/0x10
 [<ffffffff81081004>] microcode_init_cpu+0xf4/0x110
 [<ffffffff816f6b8c>] mc_cpu_callback+0x5b/0xb3
 [<ffffffff81706d7c>] notifier_call_chain+0x5c/0x120
 [<ffffffff810c6359>] __raw_notifier_call_chain+0x9/0x10
 [<ffffffff8109616b>] __cpu_notify+0x1b/0x30
 [<ffffffff816f72e1>] _cpu_up+0x103/0x14b
 [<ffffffff816f7404>] cpu_up+0xdb/0xee
 [<ffffffff816eda0a>] store_online+0xba/0x120
 [<ffffffff8145f08b>] dev_attr_store+0x1b/0x20
 [<ffffffff81246591>] sysfs_write_file+0xe1/0x160
 [<ffffffff811ca3ef>] vfs_write+0xdf/0x190
 [<ffffffff811ca96d>] SyS_write+0x5d/0xa0
 [<ffffffff8133f4fe>] ? trace_hardirqs_on_thunk+0x3a/0x3f
 [<ffffffff8170b7a9>] system_call_fastpath+0x16/0x1b

Thoughts?
Konrad Rzeszutek Wilk
2013-May-07 19:00 UTC
Re: v3.9 - CPU hotplug and microcode earlier loading hits a mutex deadlock (x86_cpu_hotplug_driver_mutex)
On Mon, May 06, 2013 at 08:59:37AM -0400, Konrad Rzeszutek Wilk wrote:
> Hey,
>
> As I was fixing PV HVM's broken CPU hotplug mechanism I discovered
> a deadlock in the microcode and generic code.
>
> I am not sure if the ACPI generic mechanism would expose this, but
> looking at the flow (arch_register_cpu, then expecting user-space to call
> cpu_up), it should trigger this.

Fenghua,

I dug deeper in how QEMU does it and it looks to be actually doing the
right thing. It triggers the ACPI SCI, the method that figures out the
CPU online/offline bits kicks off the right OSPM notification, and
everything goes through ACPI (so _STA on the processor is checked,
returns 0x2 (ACPI_STA_DEVICE_PRESENT), and the MADT now has the CPU
marked as enabled).

I am now 99% sure you would be able to reproduce this on baremetal with
ACPI hotplug where the CPUs at bootup are marked as disabled in the MADT
(lapic->lapic_flags == 0).

The comment for calling save_mc_for_early says:

/*
 * If early loading microcode is supported, save this mc into
 * permanent memory. So it will be loaded early when a CPU is hot added
 * or resumes.
 */

Do you by any chance recall the testing you did on that? And how the
ACPI CPU hotplug mechanism worked such that this deadlock would not have
occurred?

Thanks.

> Anyhow, this can easily be triggered if a new CPU is added and
> from user-space you do:
>
>   echo 1 > /sys/devices/system/cpu/cpu3/online
>
> on a newly appeared CPU. The deadlock is that "store_online" in
> drivers/base/cpu.c takes the cpu_hotplug_driver_lock() lock, then
> calls "cpu_up". "cpu_up" eventually ends up calling "save_mc_for_early"
> which also takes the cpu_hotplug_driver_lock() lock.
>
> [... lockdep report snipped; see the original message above ...]
>
> Thoughts?
Borislav Petkov
2013-May-08 12:54 UTC
Re: v3.9 - CPU hotplug and microcode earlier loading hits a mutex deadlock (x86_cpu_hotplug_driver_mutex)
On Tue, May 07, 2013 at 03:00:24PM -0400, Konrad Rzeszutek Wilk wrote:
> I dug deeper in how QEMU does it and it looks to be actually doing
> the right thing. It triggers the ACPI SCI, the method that figures
> out the CPU online/offline bits kicks off the right OSPM notification
> and everything goes through ACPI (so _STA on the processor is
> checked, returns 0x2 (ACPI_STA_DEVICE_PRESENT), MADT has now the CPU
> marked as enabled).

AFAIUC, you mean physical hotplug here which is done with ACPI, right?

And, if so, we actually need an x86 machine which supports that to test
it on. Also, is this how physical hotplug is done? Put the number of
max. supported CPUs in the MADT and mark those which are not present as
disabled?

Then, when they're physically hotplugged, ACPI marks them as enabled?

Questions over questions...?

> I am now 99% sure you would be able to reproduce this on baremetal with
> ACPI hotplug where the CPUs at bootup are marked as disabled in MADT.
> (lapic->lapic_flags == 0).
>
> The comment for calling save_mc_for_early says:

Looks like save_mc_for_early would need another, local mutex to fix that.

-- 
Regards/Gruss,
Boris.

Sent from a fat crate under my desk. Formatting is fine.
Konrad Rzeszutek Wilk
2013-May-08 14:03 UTC
Re: v3.9 - CPU hotplug and microcode earlier loading hits a mutex deadlock (x86_cpu_hotplug_driver_mutex)
On Wed, May 08, 2013 at 02:54:14PM +0200, Borislav Petkov wrote:
> On Tue, May 07, 2013 at 03:00:24PM -0400, Konrad Rzeszutek Wilk wrote:
> > I dug deeper in how QEMU does it and it looks to be actually doing
> > the right thing. It triggers the ACPI SCI, the method that figures
> > out the CPU online/offline bits kicks off the right OSPM notification
> > and everything goes through ACPI (so _STA on the processor is
> > checked, returns 0x2 (ACPI_STA_DEVICE_PRESENT), MADT has now the CPU
> > marked as enabled).
>
> AFAIUC, you mean physical hotplug here which is done with ACPI, right?

Yes.

> And, if so, we actually need an x86 machine which supports that to test
> it on. Also, is this how physical hotplug is done? Put the number of
> max. supported CPUs in MADT and those which are not present are marked
> as disabled?
>
> Then, when they're physically hotplugged, ACPI marks them as enabled?

Yes. The GPE is raised (by QEMU), and the ACPI method PRSC is invoked:

Scope (_GPE)
{
    Method (_L02, 0, NotSerialized)
    {
        Return (\_SB.PRSC ())
    }
}

It iterates over the 0xAF00 I/O region (32 bytes) and checks each bit to
see if it is enabled (so the CPU is on) or disabled. Then, if it differs
from the MADT FLG entry (the 'flags' entry in the MADT), it updates the
MADT entry (to one, or zero if the CPU has been disabled), and then
notifies the Processor.

Here is what the Processor entry looks like:

Processor (PR02, 0x02, 0x0000B010, 0x06)
{
    Name (_HID, "ACPI0007")
    OperationRegion (MATR, SystemMemory, Add (MAPA, 0x10), 0x08)

[MAPA is the physical address of the MADT; the 0x10 offset increases by
eight bytes for each CPU]

    Field (MATR, ByteAcc, NoLock, Preserve)
    {
        MAT, 64
    }

[so it is 64 bits; MAT is used in the _STA method to return the whole
contents of said memory location]

    Field (MATR, ByteAcc, NoLock, Preserve)
    {
        Offset (0x04),
        FLG, 1
    }

[and FLG is at offset 4 (out of 8 bytes), which means it lands on the
lapic->flags entry]

The PRSC method does what I mentioned above:

OperationRegion (PRST, SystemIO, 0xAF00, 0x20)
Field (PRST, ByteAcc, NoLock, Preserve)
{
    PRS, 15
}
Method (PRSC, 0, NotSerialized)
{
    Store (ToBuffer (PRS), Local0)

[Local0 now has the 32 bytes of data]

    Store (DerefOf (Index (Local0, Zero)), Local1)

[Local1 now has the zeroth byte of the 32 bytes. Each bit is one CPU, so
it contains the state of eight CPUs]

    And (Local1, One, Local2)

[Local2 = gpe_state.cpu_sts[i] & 1, aka the first CPU]

    If (LNotEqual (Local2, ^PR00.FLG))
    {
        Store (Local2, ^PR00.FLG)

[write the bit into PR00.FLG, so at offset four in the MADT entry]

        If (LEqual (Local2, One))
        {
            Notify (PR00, One)
            Subtract (MSU, One, MSU)
        }

[if the CPU was disabled and is now enabled, notify with 1 (device
check) and fix up the checksum]

        Else
        {
            Notify (PR00, 0x03)
            Add (MSU, One, MSU)
        }

[if it was enabled and is now disabled, notify with 0x3 (eject request)
and again fix up the checksum]
    }
    ShiftRight (Local1, One, Local1)

[here it shifts and continues on testing each CPU bit]

> Questions over questions...?

I probably went overboard with my answers :-)

> > I am now 99% sure you would be able to reproduce this on baremetal with
> > ACPI hotplug where the CPUs at bootup are marked as disabled in MADT.
> > (lapic->lapic_flags == 0).
> >
> > The comment for calling save_mc_for_early says:
>
> Looks like save_mc_for_early would need another, local mutex to fix that.

Let me try that. Thanks for the suggestion.

> -- 
> Regards/Gruss,
> Boris.
>
> Sent from a fat crate under my desk. Formatting is fine.
Borislav Petkov
2013-May-08 14:29 UTC
Re: v3.9 - CPU hotplug and microcode earlier loading hits a mutex deadlock (x86_cpu_hotplug_driver_mutex)
On Wed, May 08, 2013 at 10:03:42AM -0400, Konrad Rzeszutek Wilk wrote:

[ … snip some funky BIOS code ]

> [here it shifts and continues on testing each CPU bit]
>
> > Questions over questions...?
>
> I probably went overboard with my answers :-)

Konrad, you're killing me! :-) You actually went and looked at the
BIOS disassembly voluntarily. You must be insane, I think you should
immediately go to the doctor now for a thorough checkup. :-)

I think I know who I can sling BIOS issues now to.

> > Looks like save_mc_for_early would need another, local mutex to fix that.
>
> Let me try that. Thanks for the suggestion.

Ok, seriously now: yeah, this was just an idea, it should at least get
the nesting out of the way.

About the BIOS deal: you're probably staring at some BIOS out there
but is this the way that it is actually going to be implemented on
the physical hotplug BIOS? I mean, I've only heard rumors about IVB
supporting physical hotplug but do you even have access to such BIOS to
verify?

All I'm saying is, let's not hurry too much before we actually can
really trigger this on baremetal.

Thanks.

-- 
Regards/Gruss,
Boris.

Sent from a fat crate under my desk. Formatting is fine.
H. Peter Anvin
2013-May-08 15:20 UTC
Re: v3.9 - CPU hotplug and microcode earlier loading hits a mutex deadlock (x86_cpu_hotplug_driver_mutex)
On 05/08/2013 07:29 AM, Borislav Petkov wrote:
> On Wed, May 08, 2013 at 10:03:42AM -0400, Konrad Rzeszutek Wilk wrote:
>
> [ … snip some funky BIOS code ]
>
>> [here it shifts and continues on testing each CPU bit]
>>
>>> Questions over questions...?
>>
>> I probably went overboard with my answers :-)
>
> Konrad, you're killing me! :-) You actually went and looked at the
> BIOS disassembly voluntarily. You must be insane, I think you should
> immediately go to the doctor now for a thorough checkup. :-)
>
> I think I know who I can sling BIOS issues now to.

Nah, that was just the ACPI AML. Real BIOS code is much scarier.

	-hpa
Ross Philipson
2013-May-08 15:41 UTC
RE: [Xen-devel] v3.9 - CPU hotplug and microcode earlier loading hits a mutex deadlock (x86_cpu_hotplug_driver_mutex)
> -----Original Message-----
> From: xen-devel-bounces@lists.xen.org [mailto:xen-devel-bounces@lists.xen.org]
> On Behalf Of Borislav Petkov
> Sent: Wednesday, May 08, 2013 7:30 AM
> To: Konrad Rzeszutek Wilk
> Cc: fenghua.yu@intel.com; xen-devel@lists.xensource.com; x86@kernel.org;
> linux-kernel@vger.kernel.org; mingo@redhat.com; hpa@zytor.com;
> tglx@linutronix.de
> Subject: Re: [Xen-devel] v3.9 - CPU hotplug and microcode earlier
> loading hits a mutex deadlock (x86_cpu_hotplug_driver_mutex)
>
> On Wed, May 08, 2013 at 10:03:42AM -0400, Konrad Rzeszutek Wilk wrote:
>
> [ … snip some funky BIOS code ]
>
> > [here it shifts and continues on testing each CPU bit]
> >
> > > Questions over questions...?
> >
> > I probably went overboard with my answers :-)
>
> Konrad, you're killing me! :-) You actually went and looked at the
> BIOS disassembly voluntarily. You must be insane, I think you should
> immediately go to the doctor now for a thorough checkup. :-)
>
> I think I know who I can sling BIOS issues now to.
>
> > > Looks like save_mc_for_early would need another, local mutex to fix
> > > that.
> >
> > Let me try that. Thanks for the suggestion.
>
> Ok, seriously now: yeah, this was just an idea, it should at least get
> the nesting out of the way.
>
> About the BIOS deal: you're probably staring at some BIOS out there
> but is this the way that it is actually going to be implemented on
> the physical hotplug BIOS? I mean, I've only heard rumors about IVB
> supporting physical hotplug but do you even have access to such BIOS to
> verify?

The ASL Konrad traced through is there to make the ACPI bits of a guest
interoperate with qemu etc. On bare metal it will be much more opaque
and ugly; the processor-related AML will probably burrow down into an
SMI to do the real work.

Ross

> All I'm saying is, let's not hurry too much before we actually can
> really trigger this on baremetal.
>
> Thanks.
>
> -- 
> Regards/Gruss,
> Boris.
>
> Sent from a fat crate under my desk. Formatting is fine.
>
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@lists.xen.org
> http://lists.xen.org/xen-devel
Konrad Rzeszutek Wilk
2013-May-08 16:13 UTC
Re: v3.9 - CPU hotplug and microcode earlier loading hits a mutex deadlock (x86_cpu_hotplug_driver_mutex)
> > > > The comment for calling save_mc_for_early says:
> > >
> > > Looks like save_mc_for_early would need another, local mutex to fix that.
> >
> > Let me try that. Thanks for the suggestion.

Looks to work nicely. Just posted a patch for this.

> > -- 
> > Regards/Gruss,
> > Boris.
> >
> > Sent from a fat crate under my desk. Formatting is fine.
Konrad Rzeszutek Wilk
2013-May-08 16:32 UTC
Re: v3.9 - CPU hotplug and microcode earlier loading hits a mutex deadlock (x86_cpu_hotplug_driver_mutex)
On Wed, May 08, 2013 at 04:29:49PM +0200, Borislav Petkov wrote:
> On Wed, May 08, 2013 at 10:03:42AM -0400, Konrad Rzeszutek Wilk wrote:
>
> [ … snip some funky BIOS code ]
>
> > [here it shifts and continues on testing each CPU bit]
> >
> > > Questions over questions...?
> >
> > I probably went overboard with my answers :-)
>
> Konrad, you're killing me! :-) You actually went and looked at the
> BIOS disassembly voluntarily. You must be insane, I think you should
> immediately go to the doctor now for a thorough checkup. :-)
>
> I think I know who I can sling BIOS issues now to.

Great .. :-)

> > > Looks like save_mc_for_early would need another, local mutex to fix that.
> >
> > Let me try that. Thanks for the suggestion.
>
> Ok, seriously now: yeah, this was just an idea, it should at least get
> the nesting out of the way.
>
> About the BIOS deal: you're probably staring at some BIOS out there
> but is this the way that it is actually going to be implemented on
> the physical hotplug BIOS? I mean, I've only heard rumors about IVB
> supporting physical hotplug but do you even have access to such BIOS to
> verify?

Unfortunately not. I am getting an IvyTown box, so hopefully that has
this support. But I thought that Fenghua did, since he mentioned it in
the patch.

Besides that, I think this can also appear on VMware if one is doing CPU
hotplug, and on some HP machines - let me CC the relevant people
extracted from drivers/acpi/processor_driver.c.
(See https://lkml.org/lkml/2013/5/7/588 for the thread.)

Rafael, do you know of any specific Intel hardware that has this
implemented?

> All I'm saying is, let's not hurry too much before we actually can
> really trigger this on baremetal.
>
> Thanks.
>
> -- 
> Regards/Gruss,
> Boris.
>
> Sent from a fat crate under my desk. Formatting is fine.
Toshi Kani
2013-May-08 18:19 UTC
Re: v3.9 - CPU hotplug and microcode earlier loading hits a mutex deadlock (x86_cpu_hotplug_driver_mutex)
On Wed, 2013-05-08 at 12:32 -0400, Konrad Rzeszutek Wilk wrote:
> On Wed, May 08, 2013 at 04:29:49PM +0200, Borislav Petkov wrote:
> > [...]
> >
> > About the BIOS deal: you're probably staring at some BIOS out there
> > but is this the way that it is actually going to be implemented on
> > the physical hotplug BIOS? I mean, I've only heard rumors about IVB
> > supporting physical hotplug but do you even have access to such BIOS to
> > verify?
>
> Unfortunately not. I am getting an IvyTown box, so hopefully that has
> this support. But I thought that Fenghua did, since he mentioned it in
> the patch.
>
> Besides that, I think this can also appear on VMware if one is doing CPU
> hotplug, and on some HP machines - let me CC the relevant people
> extracted from drivers/acpi/processor_driver.c.
> (See https://lkml.org/lkml/2013/5/7/588 for the thread.)

From the stack trace, it looks like a bug in save_mc_for_early(). This
function may not call cpu_hotplug_driver_lock() during CPU online. I
suppose it intends to protect from CPU offline when the microcode is
updated outside of the boot/CPU-online context. If it indeed supports
updating the microcode without using reboot/CPU hotplug, the lock should
be held when such an update request is made.

Thanks,
-Toshi
Konrad Rzeszutek Wilk
2013-May-08 18:42 UTC
Re: v3.9 - CPU hotplug and microcode earlier loading hits a mutex deadlock (x86_cpu_hotplug_driver_mutex)
On Wed, May 08, 2013 at 12:19:45PM -0600, Toshi Kani wrote:
> On Wed, 2013-05-08 at 12:32 -0400, Konrad Rzeszutek Wilk wrote:
> > [...]
> >
> > Besides that, I think this can also appear on VMware if one is doing CPU
> > hotplug, and on some HP machines - let me CC the relevant people
> > extracted from drivers/acpi/processor_driver.c.
> > (See https://lkml.org/lkml/2013/5/7/588 for the thread.)
>
> From the stack trace, it looks like a bug in save_mc_for_early(). This
> function may not call cpu_hotplug_driver_lock() during CPU online. I
> suppose it intends to protect from CPU offline when the microcode is
> updated outside of the boot/CPU-online context. If it indeed supports
> updating the microcode without using reboot/CPU hotplug, the lock should
> be held when such an update request is made.

Hey Toshi,

The fix for this I have posted, but I am more curious whether you have
seen this issue on baremetal? Meaning using ACPI CPU hotplug on v3.9?

> Thanks,
> -Toshi
Toshi Kani
2013-May-08 18:59 UTC
Re: v3.9 - CPU hotplug and microcode earlier loading hits a mutex deadlock (x86_cpu_hotplug_driver_mutex)
On Wed, 2013-05-08 at 14:42 -0400, Konrad Rzeszutek Wilk wrote:
> On Wed, May 08, 2013 at 12:19:45PM -0600, Toshi Kani wrote:
> > [...]
> >
> > From the stack trace, it looks like a bug in save_mc_for_early(). This
> > function may not call cpu_hotplug_driver_lock() during CPU online. I
> > suppose it intends to protect from CPU offline when the microcode is
> > updated outside of the boot/CPU-online context. If it indeed supports
> > updating the microcode without using reboot/CPU hotplug, the lock should
> > be held when such an update request is made.
>
> Hey Toshi,
>
> The fix for this I have posted, but I am more curious whether you have
> seen this issue on baremetal? Meaning using ACPI CPU hotplug on v3.9?

Oh, I see. No, I do not have a baremetal test machine that supports CPU
hotplug yet. My initial target is to support CPU hotplug on VMs. I did
not hit this issue since my test env was limited to CPU hot-delete
followed by CPU hot-add. In such a case, uci->valid is still set during
CPU hot-add and we do not get into this code path. I am not sure if
uci->valid is supposed to be set after CPU hot-delete, though.

Thanks,
-Toshi
Rafael J. Wysocki
2013-May-09 00:23 UTC
Re: v3.9 - CPU hotplug and microcode earlier loading hits a mutex deadlock (x86_cpu_hotplug_driver_mutex)
On 5/8/2013 6:32 PM, Konrad Rzeszutek Wilk wrote:
> On Wed, May 08, 2013 at 04:29:49PM +0200, Borislav Petkov wrote:
>> [...]
>>
>> About the BIOS deal: you're probably staring at some BIOS out there
>> but is this the way that it is actually going to be implemented on
>> the physical hotplug BIOS? I mean, I've only heard rumors about IVB
>> supporting physical hotplug but do you even have access to such BIOS to
>> verify?
>
> Unfortunately not. I am getting an IvyTown box, so hopefully that has
> this support. But I thought that Fenghua did, since he mentioned it in
> the patch.
>
> Besides that, I think this can also appear on VMware if one is doing CPU
> hotplug, and on some HP machines - let me CC the relevant people
> extracted from drivers/acpi/processor_driver.c.
> (See https://lkml.org/lkml/2013/5/7/588 for the thread.)
>
> Rafael, do you know of any specific Intel hardware that has this
> implemented?

No, I don't have this information.

By the way, here's a general request. Please CC any messages having the
word "ACPI" anywhere in the body or subject to linux-acpi@vger.kernel.org.
I'm really tired of chasing ACPI-related threads on the LKML just because
people don't care about CCing the relevant lists, and I'm going to NAK
all patches touching ACPI in any way if they haven't been CCed to the
ACPI list. Seriously.

This particular item may or may not conflict with some ongoing work
taking place in there (and patches present in my tree). And you've
already had problems with ACPI-related code conflicts, so it's not like
it's never happened.

Thanks,
Rafael