Konrad Rzeszutek Wilk
2013-May-06 12:59 UTC
v3.9 - CPU hotplug and microcode earlier loading hits a mutex deadlock (x86_cpu_hotplug_driver_mutex)
Hey,

As I was fixing PV HVM's broken CPU hotplug mechanism I discovered a
deadlock in the microcode and generic code.

I am not sure if the ACPI generic mechanism would expose this, but
looking at the flow (arch_register_cpu, then expecting user-space to call
cpu_up), it should trigger this.

Anyhow, this can easily be triggered if a new CPU is added and from
user-space you do:

  echo 1 > /sys/devices/system/cpu/cpu3/online

on a newly appeared CPU. The deadlock is that "store_online" in
drivers/base/cpu.c takes the cpu_hotplug_driver_lock() lock, then calls
"cpu_up". "cpu_up" eventually ends up calling "save_mc_for_early",
which also takes the cpu_hotplug_driver_lock() lock.

And here is what the kernel thinks of it:

smpboot: Stack at about ffff880075c39f44
smpboot: CPU3: has booted.
microcode: CPU3 sig=0x206a7, pf=0x2, revision=0x25

=============================================
[ INFO: possible recursive locking detected ]
3.9.0upstream-10129-g167af0e #1 Not tainted
---------------------------------------------
sh/2487 is trying to acquire lock:
 (x86_cpu_hotplug_driver_mutex){+.+.+.}, at: [<ffffffff81075512>] cpu_hotplug_driver_lock+0x12/0x20

but task is already holding lock:
 (x86_cpu_hotplug_driver_mutex){+.+.+.}, at: [<ffffffff81075512>] cpu_hotplug_driver_lock+0x12/0x20

other info that might help us debug this:
 Possible unsafe locking scenario:

       CPU0
       ----
  lock(x86_cpu_hotplug_driver_mutex);
  lock(x86_cpu_hotplug_driver_mutex);

 *** DEADLOCK ***

 May be due to missing lock nesting notation

6 locks held by sh/2487:
 #0:  (sb_writers#5){.+.+.+}, at: [<ffffffff811ca48d>] vfs_write+0x17d/0x190
 #1:  (&buffer->mutex){+.+.+.}, at: [<ffffffff812464ef>] sysfs_write_file+0x3f/0x160
 #2:  (s_active#20){.+.+.+}, at: [<ffffffff81246578>] sysfs_write_file+0xc8/0x160
 #3:  (x86_cpu_hotplug_driver_mutex){+.+.+.}, at: [<ffffffff81075512>] cpu_hotplug_driver_lock+0x12/0x20
 #4:  (cpu_add_remove_lock){+.+.+.}, at: [<ffffffff810961c2>] cpu_maps_update_begin+0x12/0x20
 #5:  (cpu_hotplug.lock){+.+.+.}, at: [<ffffffff810962a7>] cpu_hotplug_begin+0x27/0x60

stack backtrace:
CPU: 1 PID: 2487 Comm: sh Not tainted 3.9.0upstream-10129-g167af0e #1
Hardware name: Xen HVM domU, BIOS 4.3-unstable 05/03/2013
 ffffffff8229b710 ffff880064e75538 ffffffff816fe47e ffff880064e75608
 ffffffff81100cb6 ffff880064e75560 ffff88006670e290 ffff880064e75588
 ffffffff00000000 ffff88006670e9b8 21710c800c83a1f5 ffff880064e75598
Call Trace:
 [<ffffffff816fe47e>] dump_stack+0x19/0x1b
 [<ffffffff81100cb6>] __lock_acquire+0x726/0x1890
 [<ffffffff8110b352>] ? is_module_text_address+0x22/0x40
 [<ffffffff810bbff8>] ? __kernel_text_address+0x58/0x80
 [<ffffffff81101eca>] lock_acquire+0xaa/0x190
 [<ffffffff81075512>] ? cpu_hotplug_driver_lock+0x12/0x20
 [<ffffffff816fee1e>] __mutex_lock_common+0x5e/0x450
 [<ffffffff81075512>] ? cpu_hotplug_driver_lock+0x12/0x20
 [<ffffffff810d4225>] ? sched_clock_local+0x25/0x90
 [<ffffffff81075512>] ? cpu_hotplug_driver_lock+0x12/0x20
 [<ffffffff816ff340>] mutex_lock_nested+0x40/0x50
 [<ffffffff81075512>] cpu_hotplug_driver_lock+0x12/0x20
 [<ffffffff81080757>] save_mc_for_early+0x27/0xf0
 [<ffffffff810ffd30>] ? mark_held_locks+0x90/0x150
 [<ffffffff81176a5d>] ? get_page_from_freelist+0x46d/0x8e0
 [<ffffffff8110029d>] ? trace_hardirqs_on+0xd/0x10
 [<ffffffff81056a39>] ? sched_clock+0x9/0x10
 [<ffffffff81177275>] ? __alloc_pages_nodemask+0x165/0xa30
 [<ffffffff810d4225>] ? sched_clock_local+0x25/0x90
 [<ffffffff810d4348>] ? sched_clock_cpu+0xb8/0x110
 [<ffffffff810fb88d>] ? trace_hardirqs_off+0xd/0x10
 [<ffffffff811a5e79>] ? vmap_page_range_noflush+0x279/0x370
 [<ffffffff811a5f9d>] ? map_vm_area+0x2d/0x50
 [<ffffffff811a7dce>] ? __vmalloc_node_range+0x18e/0x260
 [<ffffffff810812a8>] ? generic_load_microcode+0xb8/0x1c0
 [<ffffffff8108135c>] generic_load_microcode+0x16c/0x1c0
 [<ffffffff8110917e>] ? generic_exec_single+0x7e/0xb0
 [<ffffffff81081470>] ? request_microcode_user+0x20/0x20
 [<ffffffff8108142f>] request_microcode_fw+0x7f/0xa0
 [<ffffffff813356ab>] ? kobject_uevent+0xb/0x10
 [<ffffffff81081004>] microcode_init_cpu+0xf4/0x110
 [<ffffffff816f6b8c>] mc_cpu_callback+0x5b/0xb3
 [<ffffffff81706d7c>] notifier_call_chain+0x5c/0x120
 [<ffffffff810c6359>] __raw_notifier_call_chain+0x9/0x10
 [<ffffffff8109616b>] __cpu_notify+0x1b/0x30
 [<ffffffff816f72e1>] _cpu_up+0x103/0x14b
 [<ffffffff816f7404>] cpu_up+0xdb/0xee
 [<ffffffff816eda0a>] store_online+0xba/0x120
 [<ffffffff8145f08b>] dev_attr_store+0x1b/0x20
 [<ffffffff81246591>] sysfs_write_file+0xe1/0x160
 [<ffffffff811ca3ef>] vfs_write+0xdf/0x190
 [<ffffffff811ca96d>] SyS_write+0x5d/0xa0
 [<ffffffff8133f4fe>] ? trace_hardirqs_on_thunk+0x3a/0x3f
 [<ffffffff8170b7a9>] system_call_fastpath+0x16/0x1b

Thoughts?
Konrad Rzeszutek Wilk
2013-May-07 19:00 UTC
Re: v3.9 - CPU hotplug and microcode earlier loading hits a mutex deadlock (x86_cpu_hotplug_driver_mutex)
On Mon, May 06, 2013 at 08:59:37AM -0400, Konrad Rzeszutek Wilk wrote:
> Hey,
>
> As I was fixing PV HVM's broken CPU hotplug mechanism I discovered
> a deadlock in the microcode and generic code.
>
> I am not sure if the ACPI generic mechanism would expose this, but
> looking at the flow (arch_register_cpu, then expecting user-space to call
> cpu_up), it should trigger this.

Fenghua,

I dug deeper in how QEMU does it and it looks to be actually doing the
right thing. It triggers the ACPI SCI, the method that figures out the
CPU online/offline bits kicks off the right OSPM notification, and
everything goes through ACPI (so _STA on the processor is checked,
returns 0x2 (ACPI_STA_DEVICE_PRESENT), and the MADT now has the CPU
marked as enabled).

I am now 99% sure you would be able to reproduce this on baremetal with
ACPI hotplug where the CPUs at bootup are marked as disabled in the MADT
(lapic->lapic_flags == 0).

The comment for calling save_mc_for_early says:

/*
 * If early loading microcode is supported, save this mc into
 * permanent memory. So it will be loaded early when a CPU is hot added
 * or resumes.
 */

Do you by any chance recall the testing you did on that? And how the
ACPI CPU hotplug mechanism worked such that this deadlock would not have
occurred?

Thanks.

> Anyhow, this can easily be triggered if a new CPU is added and
> from user-space you do:
>
>   echo 1 > /sys/devices/system/cpu/cpu3/online
>
> on a newly appeared CPU. The deadlock is that "store_online" in
> drivers/base/cpu.c takes the cpu_hotplug_driver_lock() lock, then
> calls "cpu_up". "cpu_up" eventually ends up calling "save_mc_for_early"
> which also takes the cpu_hotplug_driver_lock() lock.
>
> [... lockdep report snipped; see the original message above ...]
>
> Thoughts?
Borislav Petkov
2013-May-08 12:54 UTC
Re: v3.9 - CPU hotplug and microcode earlier loading hits a mutex deadlock (x86_cpu_hotplug_driver_mutex)
On Tue, May 07, 2013 at 03:00:24PM -0400, Konrad Rzeszutek Wilk wrote:
> I dug deeper in how QEMU does it and it looks to be actually doing
> the right thing. It triggers the ACPI SCI, the method that figures
> out the CPU online/offline bits kicks off the right OSPM notification
> and everything goes through ACPI (so _STA on the processor is
> checked, returns 0x2 (ACPI_STA_DEVICE_PRESENT), MADT has now the CPU
> marked as enabled).

AFAIUC, you mean physical hotplug here which is done with ACPI, right?

And, if so, we actually need an x86 machine which supports that to test
it on. Also, is this how physical hotplug is done? Put the number of
max. supported CPUs in the MADT and mark those which are not present as
disabled?

Then, when they're physically hotplugged, ACPI marks them as enabled?

Questions over questions...?

> I am now 99% sure you would be able to reproduce this on baremetal with
> ACPI hotplug where the CPUs at bootup are marked as disabled in MADT.
> (lapic->lapic_flags == 0).
>
> The comment for calling save_mc_for_early says:

Looks like save_mc_for_early would need another, local mutex to fix that.

-- 
Regards/Gruss,
Boris.

Sent from a fat crate under my desk. Formatting is fine.
Konrad Rzeszutek Wilk
2013-May-08 14:03 UTC
Re: v3.9 - CPU hotplug and microcode earlier loading hits a mutex deadlock (x86_cpu_hotplug_driver_mutex)
On Wed, May 08, 2013 at 02:54:14PM +0200, Borislav Petkov wrote:
> On Tue, May 07, 2013 at 03:00:24PM -0400, Konrad Rzeszutek Wilk wrote:
> > I dug deeper in how QEMU does it and it looks to be actually doing
> > the right thing. It triggers the ACPI SCI, the method that figures
> > out the CPU online/offline bits kicks off the right OSPM notification
> > and everything goes through ACPI (so _STA on the processor is
> > checked, returns 0x2 (ACPI_STA_DEVICE_PRESENT), MADT has now the CPU
> > marked as enabled).
>
> AFAIUC, you mean physical hotplug here which is done with ACPI, right?

Yes.

> And, if so, we actually need an x86 machine which supports that to test
> it on. Also, is this how physical hotplug is done? Put the number of
> max. supported CPUs in MADT and those which are not present are marked
> as disabled?
>
> Then, when they're physically hotplugged, ACPI marks them as enabled?

Yes. The GPE is raised (by QEMU), and the ACPI method PRSC is invoked:

Scope (_GPE)
{
    Method (_L02, 0, NotSerialized)
    {
        Return (\_SB.PRSC ())
    }
}

It iterates over the 0xAF00 I/O region (32 bytes) and checks each bit to
see if it is enabled (so the CPU is on) or disabled. Then, if it differs
from the MADT FLG entry (the 'flags' entry in the MADT), it updates the
MADT entry (to one, or zero if the CPU has been disabled), and then
notifies the Processor.

Here is what the Processor entry looks like:

Processor (PR02, 0x02, 0x0000B010, 0x06)
{
    Name (_HID, "ACPI0007")
    OperationRegion (MATR, SystemMemory, Add (MAPA, 0x10), 0x08)

[MAPA is the physical address of the MADT; the 0x10 offset increases by
eight bytes for each CPU]

    Field (MATR, ByteAcc, NoLock, Preserve)
    {
        MAT, 64
    }

[so it is 64 bits; MAT is used in the _STA method to return the whole
contents of said memory location]

    Field (MATR, ByteAcc, NoLock, Preserve)
    {
        Offset (0x04),
        FLG, 1
    }

[and FLG is at offset 4 (out of 8 bytes), which means it lands on the
lapic->flags entry]

The PRSC method does what I mentioned above:

OperationRegion (PRST, SystemIO, 0xAF00, 0x20)
Field (PRST, ByteAcc, NoLock, Preserve)
{
    PRS, 15
}
Method (PRSC, 0, NotSerialized)
{
    Store (ToBuffer (PRS), Local0)

[Local0 now has the 32 bytes of data]

    Store (DerefOf (Index (Local0, Zero)), Local1)

[Local1 now has the zeroth byte of the 32 bytes. Each bit is one CPU, so
it contains the state of eight CPUs]

    And (Local1, One, Local2)

[Local2 = gpe_state.cpu_sts[i] & 1, aka the first CPU]

    If (LNotEqual (Local2, ^PR00.FLG))
    {
        Store (Local2, ^PR00.FLG)

[write the bit into PR00.FLG, so at offset four in the MADT entry]

        If (LEqual (Local2, One))
        {
            Notify (PR00, One)
            Subtract (MSU, One, MSU)
        }

[if the CPU was disabled and is now enabled, notify with 1 (device
check) and fix up the checksum]

        Else
        {
            Notify (PR00, 0x03)
            Add (MSU, One, MSU)
        }

[if it was enabled and is now disabled, notify with 0x3 (eject request)
and again fix up the checksum]
    }
    ShiftRight (Local1, One, Local1)

[here it shifts and continues on testing each CPU bit]

> Questions over questions...?

I probably went overboard with my answers :-)

> > I am now 99% sure you would be able to reproduce this on baremetal with
> > ACPI hotplug where the CPUs at bootup are marked as disabled in MADT.
> > (lapic->lapic_flags == 0).
> >
> > The comment for calling save_mc_for_early says:
>
> Looks like save_mc_for_early would need another, local mutex to fix that.

Let me try that. Thanks for the suggestion.

> -- 
> Regards/Gruss,
> Boris.
>
> Sent from a fat crate under my desk. Formatting is fine.
Borislav Petkov
2013-May-08 14:29 UTC
Re: v3.9 - CPU hotplug and microcode earlier loading hits a mutex deadlock (x86_cpu_hotplug_driver_mutex)
On Wed, May 08, 2013 at 10:03:42AM -0400, Konrad Rzeszutek Wilk wrote:

[ … snip some funky BIOS code ]

> [here it shifts and continues on testing each CPU bit]
>
> > Questions over questions...?
>
> I probably went overboard with my answers :-)

Konrad, you're killing me! :-) You actually went and looked at the
BIOS disassembly voluntarily. You must be insane, I think you should
immediately go to the doctor now for a thorough checkup. :-)

I think I know who I can sling BIOS issues now to.

> > Looks like save_mc_for_early would need another, local mutex to fix that.
>
> Let me try that. Thanks for the suggestion.

Ok, seriously now: yeah, this was just an idea, it should at least get
the nesting out of the way.

About the BIOS deal: you're probably staring at some BIOS out there
but is this the way that it is actually going to be implemented on
the physical hotplug BIOS? I mean, I've only heard rumors about IVB
supporting physical hotplug but do you even have access to such BIOS to
verify?

All I'm saying is, let's not hurry too much before we actually can
really trigger this on baremetal.

Thanks.

-- 
Regards/Gruss,
Boris.

Sent from a fat crate under my desk. Formatting is fine.
H. Peter Anvin
2013-May-08 15:20 UTC
Re: v3.9 - CPU hotplug and microcode earlier loading hits a mutex deadlock (x86_cpu_hotplug_driver_mutex)
On 05/08/2013 07:29 AM, Borislav Petkov wrote:
> On Wed, May 08, 2013 at 10:03:42AM -0400, Konrad Rzeszutek Wilk wrote:
>
> [ … snip some funky BIOS code ]
>
>> [here it shifts and continues on testing each CPU bit]
>>
>>> Questions over questions...?
>>
>> I probably went overboard with my answers :-)
>
> Konrad, you're killing me! :-) You actually went and looked at the
> BIOS disassembly voluntarily. You must be insane, I think you should
> immediately go to the doctor now for a thorough checkup. :-)
>
> I think I know who I can sling BIOS issues now to.

Nah, that was just the ACPI AML. Real BIOS code is much scarier.

	-hpa
Ross Philipson
2013-May-08 15:41 UTC
RE: [Xen-devel] v3.9 - CPU hotplug and microcode earlier loading hits a mutex deadlock (x86_cpu_hotplug_driver_mutex)
> -----Original Message-----
> From: xen-devel-bounces@lists.xen.org [mailto:xen-devel-bounces@lists.xen.org]
> On Behalf Of Borislav Petkov
> Sent: Wednesday, May 08, 2013 7:30 AM
> To: Konrad Rzeszutek Wilk
> Cc: fenghua.yu@intel.com; xen-devel@lists.xensource.com; x86@kernel.org;
> linux-kernel@vger.kernel.org; mingo@redhat.com; hpa@zytor.com;
> tglx@linutronix.de
> Subject: Re: [Xen-devel] v3.9 - CPU hotplug and microcode earlier
> loading hits a mutex deadlock (x86_cpu_hotplug_driver_mutex)
>
> On Wed, May 08, 2013 at 10:03:42AM -0400, Konrad Rzeszutek Wilk wrote:
>
> [ … snip some funky BIOS code ]
>
> > [here it shifts and continues on testing each CPU bit]
> >
> > > Questions over questions...?
> >
> > I probably went overboard with my answers :-)
>
> Konrad, you're killing me! :-) You actually went and looked at the
> BIOS disassembly voluntarily. You must be insane, I think you should
> immediately go to the doctor now for a thorough checkup. :-)
>
> I think I know who I can sling BIOS issues now to.
>
> > > Looks like save_mc_for_early would need another, local mutex to fix
> > > that.
> >
> > Let me try that. Thanks for the suggestion.
>
> Ok, seriously now: yeah, this was just an idea, it should at least get
> the nesting out of the way.
>
> About the BIOS deal: you're probably staring at some BIOS out there
> but is this the way that it is actually going to be implemented on
> the physical hotplug BIOS? I mean, I've only heard rumors about IVB
> supporting physical hotplug but do you even have access to such BIOS to
> verify?

The ASL Konrad traced through is there to make the ACPI bits of a guest
interoperate with qemu etc. On bare metal it will be much more opaque
and ugly; the processor-related AML will probably burrow down into an
SMI to do the real work.

Ross

> All I'm saying is, let's not hurry too much before we actually can
> really trigger this on baremetal.
>
> Thanks.
>
> -- 
> Regards/Gruss,
> Boris.
>
> Sent from a fat crate under my desk. Formatting is fine.
>
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@lists.xen.org
> http://lists.xen.org/xen-devel
Konrad Rzeszutek Wilk
2013-May-08 16:13 UTC
Re: v3.9 - CPU hotplug and microcode earlier loading hits a mutex deadlock (x86_cpu_hotplug_driver_mutex)
> > > > The comment for calling save_mc_for_early says:
> > >
> > > Looks like save_mc_for_early would need another, local mutex to fix that.
> >
> > Let me try that. Thanks for the suggestion.

Looks to work nicely. Just posted a patch for this.

> > -- 
> > Regards/Gruss,
> > Boris.
> >
> > Sent from a fat crate under my desk. Formatting is fine.
Konrad Rzeszutek Wilk
2013-May-08 16:32 UTC
Re: v3.9 - CPU hotplug and microcode earlier loading hits a mutex deadlock (x86_cpu_hotplug_driver_mutex)
On Wed, May 08, 2013 at 04:29:49PM +0200, Borislav Petkov wrote:
> On Wed, May 08, 2013 at 10:03:42AM -0400, Konrad Rzeszutek Wilk wrote:
>
> [ … snip some funky BIOS code ]
>
> > [here it shifts and continues on testing each CPU bit]
> >
> > > Questions over questions...?
> >
> > I probably went overboard with my answers :-)
>
> Konrad, you're killing me! :-) You actually went and looked at the
> BIOS disassembly voluntarily. You must be insane, I think you should
> immediately go to the doctor now for a thorough checkup. :-)
>
> I think I know who I can sling BIOS issues now to.

Great .. :-)

> > > Looks like save_mc_for_early would need another, local mutex to fix that.
> >
> > Let me try that. Thanks for the suggestion.
>
> Ok, seriously now: yeah, this was just an idea, it should at least get
> the nesting out of the way.
>
> About the BIOS deal: you're probably staring at some BIOS out there
> but is this the way that it is actually going to be implemented on
> the physical hotplug BIOS? I mean, I've only heard rumors about IVB
> supporting physical hotplug but do you even have access to such BIOS to
> verify?

Unfortunately not. I am getting an IvyTown box, so hopefully that has
this support. But I thought that Fenghua did, since he mentioned it in
the patch.

Besides that, I think this can also appear on VMware if one is doing CPU
hotplug, and on some HP machines - let me CC the relevant people
extracted from drivers/acpi/processor_driver.c.
(See https://lkml.org/lkml/2013/5/7/588 for the thread.)

Rafael, do you know of any specific Intel hardware that has this
implemented?

> All I'm saying is, let's not hurry too much before we actually can
> really trigger this on baremetal.
>
> Thanks.
>
> -- 
> Regards/Gruss,
> Boris.
>
> Sent from a fat crate under my desk. Formatting is fine.
Toshi Kani
2013-May-08 18:19 UTC
Re: v3.9 - CPU hotplug and microcode earlier loading hits a mutex deadlock (x86_cpu_hotplug_driver_mutex)
On Wed, 2013-05-08 at 12:32 -0400, Konrad Rzeszutek Wilk wrote:
> On Wed, May 08, 2013 at 04:29:49PM +0200, Borislav Petkov wrote:
> > [...]
> >
> > About the BIOS deal: you're probably staring at some BIOS out there
> > but is this the way that it is actually going to be implemented on
> > the physical hotplug BIOS? I mean, I've only heard rumors about IVB
> > supporting physical hotplug but do you even have access to such BIOS to
> > verify?
>
> Unfortunately not. I am getting an IvyTown box, so hopefully that has
> this support. But I thought that Fenghua did, since he mentioned it in
> the patch.
>
> Besides that, I think this can also appear on VMware if one is doing CPU
> hotplug, and on some HP machines - let me CC the relevant people
> extracted from drivers/acpi/processor_driver.c.
> (See https://lkml.org/lkml/2013/5/7/588 for the thread.)

From the stack trace, it looks like a bug in save_mc_for_early(). This
function may not call cpu_hotplug_driver_lock() during CPU online. I
suppose it intends to protect from CPU offline when the microcode is
updated outside of the boot/CPU-online context. If it indeed supports
updating the microcode without using reboot/CPU hotplug, the lock should
be held when such an update request is made.

Thanks,
-Toshi
Konrad Rzeszutek Wilk
2013-May-08 18:42 UTC
Re: v3.9 - CPU hotplug and microcode earlier loading hits a mutex deadlock (x86_cpu_hotplug_driver_mutex)
On Wed, May 08, 2013 at 12:19:45PM -0600, Toshi Kani wrote:
> On Wed, 2013-05-08 at 12:32 -0400, Konrad Rzeszutek Wilk wrote:
> > [...]
> >
> > Besides that, I think this can also appear on VMware if one is doing CPU
> > hotplug, and on some HP machines - let me CC the relevant people
> > extracted from drivers/acpi/processor_driver.c.
> > (See https://lkml.org/lkml/2013/5/7/588 for the thread.)
>
> From the stack trace, it looks like a bug in save_mc_for_early(). This
> function may not call cpu_hotplug_driver_lock() during CPU online. I
> suppose it intends to protect from CPU offline when the microcode is
> updated outside of the boot/CPU-online context. If it indeed supports
> updating the microcode without using reboot/CPU hotplug, the lock should
> be held when such an update request is made.

Hey Toshi,

The fix for this I have posted, but I am more curious whether you have
seen this issue on baremetal? Meaning using ACPI CPU hotplug on v3.9?

> Thanks,
> -Toshi
Toshi Kani
2013-May-08 18:59 UTC
Re: v3.9 - CPU hotplug and microcode earlier loading hits a mutex deadlock (x86_cpu_hotplug_driver_mutex)
On Wed, 2013-05-08 at 14:42 -0400, Konrad Rzeszutek Wilk wrote:
> On Wed, May 08, 2013 at 12:19:45PM -0600, Toshi Kani wrote:
> > [...]
> >
> > From the stack trace, it looks like a bug in save_mc_for_early(). This
> > function may not call cpu_hotplug_driver_lock() during CPU online. I
> > suppose it intends to protect from CPU offline when the microcode is
> > updated outside of the boot/CPU-online context. If it indeed supports
> > updating the microcode without using reboot/CPU hotplug, the lock should
> > be held when such an update request is made.
>
> Hey Toshi,
>
> The fix for this I have posted, but I am more curious whether you have
> seen this issue on baremetal? Meaning using ACPI CPU hotplug on v3.9?

Oh, I see. No, I do not have a baremetal test machine that supports CPU
hotplug yet. My initial target is to support CPU hotplug on VMs. I did
not hit this issue since my test env was limited to CPU hot-delete
followed by CPU hot-add. In such a case, uci->valid is still set during
CPU hot-add and we do not get into this code path. I am not sure if
uci->valid is supposed to be set after CPU hot-delete, though.

Thanks,
-Toshi
Rafael J. Wysocki
2013-May-09 00:23 UTC
Re: v3.9 - CPU hotplug and microcode earlier loading hits a mutex deadlock (x86_cpu_hotplug_driver_mutex)
On 5/8/2013 6:32 PM, Konrad Rzeszutek Wilk wrote:
> On Wed, May 08, 2013 at 04:29:49PM +0200, Borislav Petkov wrote:
>> [...]
>>
>> About the BIOS deal: you're probably staring at some BIOS out there
>> but is this the way that it is actually going to be implemented on
>> the physical hotplug BIOS? I mean, I've only heard rumors about IVB
>> supporting physical hotplug but do you even have access to such BIOS to
>> verify?
>
> Unfortunately not. I am getting an IvyTown box, so hopefully that has
> this support. But I thought that Fenghua did, since he mentioned it in
> the patch.
>
> Besides that, I think this can also appear on VMware if one is doing CPU
> hotplug, and on some HP machines - let me CC the relevant people
> extracted from drivers/acpi/processor_driver.c.
> (See https://lkml.org/lkml/2013/5/7/588 for the thread.)
>
> Rafael, do you know of any specific Intel hardware that has this
> implemented?

No, I don't have this information.

By the way, here's a general request. Please CC any messages having the
word "ACPI" anywhere in the body or subject to linux-acpi@vger.kernel.org.
I'm really tired of chasing ACPI-related threads on the LKML just because
people don't care about CCing the relevant lists, and I'm going to NAK
all patches touching ACPI in any way if they haven't been CCed to the
ACPI list. Seriously.

This particular item may or may not conflict with some ongoing work
taking place in there (and patches present in my tree). And you've
already had problems with ACPI-related code conflicts, so it's not like
it's never happened.

Thanks,
Rafael