Haitao Shan
2011-Mar-01 07:42 UTC
[Xen-devel] [Question] Is it safe to call "xmalloc()" with irq disabled?
Hi, Keir,

While debugging CPU offline/online recently, I hit Xen panics several times. The panic is caused by the following code path:

xmalloc ---> alloc_heap_pages ---> flush_area_mask { ASSERT(local_irq_is_enabled()); ... }

This raises the question: is it safe to call xmalloc with local IRQs disabled? Not every call to alloc_heap_pages results in a TLB flush, but whenever one does, the assertion fails.

In my case, xmalloc is called while starting secondary processors. Some initialization code, for example the MCA initialization, runs with local IRQs disabled. Normally this code runs at boot time, when no heap page has a previous owner (no domain has been created yet, I guess), so calling xmalloc is not a problem. But when the same code runs later as part of a CPU online operation, it can trigger the assertion failure.

What's your view on this, Keir? Is xmalloc designed to be called only with local IRQs enabled? As a hack, I removed the assertion, and everything works fine for me. But maybe I just happened not to run into any problems with the hack.

Shan Haitao

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel
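Why the panic is intermittent can be shown with a toy model. This is a minimal sketch, not Xen code: it borrows the names `flush_area_mask` and `local_irq_is_enabled` from the backtrace, while `struct page`, `needs_tlb_flush`, `irqs_enabled` and `alloc_heap_page` are hypothetical. It assumes (as the report suggests) that a flush happens only when a reused page may still have stale TLB entries from a previous owner, and that the IRQ check exists because a cross-CPU flush has to wait for other CPUs to respond, which can deadlock if the waiting CPU has IRQs disabled.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the current CPU's interrupt flag. */
static bool irqs_enabled = true;

static bool local_irq_is_enabled(void) { return irqs_enabled; }

/* Toy model: a cross-CPU TLB flush must wait for other CPUs to answer,
 * so (by assumption) it refuses to run with IRQs disabled. */
static int flushes;
static void flush_area_mask(void)
{
    assert(local_irq_is_enabled());
    flushes++;
}

/* Hypothetical page descriptor: set when a previous owner may still
 * have TLB entries that map this page. */
struct page { bool needs_tlb_flush; };

/* Only pages freed by an earlier owner force a flush. At boot no page
 * has had an owner, so the assertion is never reached; after domains
 * have run, the same caller can hit it. */
static void alloc_heap_page(struct page *pg)
{
    if (pg->needs_tlb_flush) {
        flush_area_mask();
        pg->needs_tlb_flush = false;
    }
}
```

With IRQs disabled, allocating a fresh page passes silently; only a page with `needs_tlb_flush` set reaches the assertion, which matches the "works at boot, panics on CPU online" behaviour described above.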
Keir Fraser
2011-Mar-01 08:16 UTC
[Xen-devel] Re: [Question] Is it safe to call "xmalloc()" with irq disabled?
Haitao,

Both _xmalloc and xfree can only safely be called with irqs enabled. I know there is a somewhat suspicious area during CPU bringup where we temporarily disable spinlock debugging. It would be nice to not need this. And for this particular bug you are dealing with, perhaps we can fix it now -- what is the backtrace for the failing allocation?

 -- Keir
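If _xmalloc and xfree may only be called with IRQs enabled, one option is to check that precondition on every call rather than only on the rare flush path, so a bad caller fails the first time it runs instead of only after pages have been recycled. This is a hypothetical sketch: `checked_xmalloc`, `checked_xfree` and the `irqs_enabled` flag are made-up names, and `malloc`/`free` stand in for the real Xen allocator.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for the current CPU's interrupt flag. */
static bool irqs_enabled = true;
static bool local_irq_is_enabled(void) { return irqs_enabled; }

/* Hypothetical wrapper, not Xen's _xmalloc: assert the calling context
 * unconditionally, so a caller running with IRQs disabled is caught
 * deterministically instead of only when a TLB flush happens. */
static void *checked_xmalloc(size_t size)
{
    assert(local_irq_is_enabled());
    return malloc(size);
}

/* Same rule for freeing. */
static void checked_xfree(void *p)
{
    assert(local_irq_is_enabled());
    free(p);
}
```

The cost is one flag check per call; the benefit is that the MCA bring-up path in this report would have failed on the very first boot, not only after a later CPU online.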
Haitao Shan
2011-Mar-01 08:22 UTC
[Xen-devel] Re: [Question] Is it safe to call "xmalloc()" with irq disabled?
Hi, Keir,

Below is the log from when I hit the issue.

===============
(XEN)    ffff83023e257e40 ffff82c48019cc80 0000000100000000 0000000000000039
(XEN)    ffff82c4802c6a00 0000000000000039 ffff82c4802c6a00 0000000000000039
(XEN)    0000000000000000 0000000000000000 ffff83023e257e70 ffff82c48019f49a
(XEN)    0000000000000039 ffff82c4802c6a00 ffff83023e257e90 ffff82c48019cb8d
(XEN)    0000000000000039 ffff82c4802c6a00 ffff83023e257eb0 ffff82c48017815b
(XEN) Xen call trace:
(XEN)    [<ffff82c480177b56>] flush_area_mask+0x1b/0x127
(XEN)    [<ffff82c480115d69>] alloc_heap_pages+0x5d6/0x61b
(XEN)    [<ffff82c480115e75>] alloc_domheap_pages+0xc7/0x13d
(XEN)    [<ffff82c480115f3b>] alloc_xenheap_pages+0x50/0xd8
(XEN)    [<ffff82c480129e50>] xmalloc_pool_get+0x2b/0x2d
(XEN)    [<ffff82c48012a674>] xmem_pool_alloc+0x26c/0x4c2
(XEN)    [<ffff82c48012a9d0>] _xmalloc+0x106/0x1b6
(XEN)    [<ffff82c48019ec25>] mcabanks_alloc+0x18/0xa4
(XEN)    [<ffff82c4801a27b6>] intel_mcheck_init+0x21/0x64e
(XEN)    [<ffff82c48019f49a>] mcheck_init+0xdd/0x1b2
(XEN)    [<ffff82c48019cb8d>] identify_cpu+0x27d/0x282
(XEN)    [<ffff82c48017815b>] smp_store_cpu_info+0x3b/0xca
(XEN)    [<ffff82c4801782e5>] smp_callin+0x8e/0x157
(XEN)    [<ffff82c4801799b5>] start_secondary+0xab/0x126
(XEN)
(XEN)
(XEN) ****************************************
(XEN) Panic on CPU 57:
(XEN) Assertion 'local_irq_is_enabled()' failed at smp.c:234
(XEN) ****************************************
(XEN)
(XEN) Reboot in five seconds...
(XEN) Resetting with ACPI MEMORY or I/O RESET_REG.
==============
Keir Fraser
2011-Mar-01 08:50 UTC
[Xen-devel] Re: [Question] Is it safe to call "xmalloc()" with irq disabled?
We need to move dynamic allocation into CPU_UP_PREPARE context. Sadly that will need surgery on the machine-check cruft. I'll take a look later today and see if I can do a suitable hatchet job for 4.1.

 -- Keir
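Moving the allocation into CPU_UP_PREPARE context means the memory is allocated by the CPU that drives the bring-up, in normal context with IRQs enabled, before the new CPU starts executing. The new CPU's init path then only fills in memory that already exists. The following is a minimal sketch under stated assumptions: the enum, `struct mca_banks`, `mce_banks[]`, `mce_cpu_callback`, `mcheck_init_secondary`, `NR_CPUS` and `MAX_NR_BANKS` are all hypothetical and only model the idea, not Xen's notifier API or the eventual fix.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

#define NR_CPUS      8   /* hypothetical */
#define MAX_NR_BANKS 32  /* hypothetical upper bound on MCA banks */

/* Hypothetical hotplug phases modeled on CPU_UP_PREPARE; values arbitrary. */
enum cpu_action { CPU_UP_PREPARE, CPU_UP_CANCELED, CPU_DEAD };

/* Hypothetical per-CPU machine-check state, sized for the worst case
 * because the new CPU's bank count is not known before it runs. */
struct mca_banks {
    unsigned int num;
    bool owned[MAX_NR_BANKS];
};

static struct mca_banks *mce_banks[NR_CPUS];

/* Runs on the CPU driving the bring-up, with IRQs enabled, so it may
 * allocate. A nonzero return vetoes bringing the CPU online. */
static int mce_cpu_callback(enum cpu_action action, unsigned int cpu)
{
    switch (action) {
    case CPU_UP_PREPARE:
        if (mce_banks[cpu] == NULL) {
            mce_banks[cpu] = calloc(1, sizeof(*mce_banks[cpu]));
            if (mce_banks[cpu] == NULL)
                return -1;
        }
        break;
    case CPU_UP_CANCELED:
    case CPU_DEAD:
        /* Release on failure or offline so repeated cycles don't leak. */
        free(mce_banks[cpu]);
        mce_banks[cpu] = NULL;
        break;
    }
    return 0;
}

/* Runs on the new CPU (start_secondary path) with IRQs still disabled:
 * no allocation here, only initialization of preallocated memory. */
static int mcheck_init_secondary(unsigned int cpu, unsigned int nr_banks)
{
    struct mca_banks *b = mce_banks[cpu];

    if (b == NULL || nr_banks > MAX_NR_BANKS)
        return -1;
    b->num = nr_banks;
    for (unsigned int i = 0; i < nr_banks; i++)
        b->owned[i] = true;
    return 0;
}
```

A typical sequence is `mce_cpu_callback(CPU_UP_PREPARE, cpu)` on the controlling CPU, then `mcheck_init_secondary(cpu, n)` on the new CPU, and `mce_cpu_callback(CPU_DEAD, cpu)` after it goes offline. Freeing on CPU_DEAD is one choice; keeping the buffer across offline/online cycles would avoid reallocating at the cost of holding the memory.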