Sometime after c/s 19133 (I tried bisecting but there were some indeterminate results), during a shutdown/reboot in the tboot routines, when it creates the 1:1 mapping for itself, the map_pages_to_xen() call ends up in alloc_domheap_pages(), where it triggers the assertion 'ASSERT(!in_irq());'. In addition, and even stranger, is that when resuming from S3 it generates another assertion, 'BUG_ON(unlikely(in_irq()));' in invalidate_shadow_ldt(). From some debugging, the first assertion on entry is because the irq_count is 1 and the second is because it's -1.

Adding an irq_exit() before map_pages_to_xen() fixes the first assertion and causes the second, which is then fixed by an irq_enter() on resume.

But why are these necessary? Even if we say that something has caused the irq_count to go positive before shutdown (but what? It wasn't like this before pulling a more recent tree), the irq_exit() that gets rid of the assertion means that the count has gone to 0, so why is it negative on resume?

Joe

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel
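[The counter arithmetic being described can be replayed with a toy model — a hypothetical simplification for illustration only, not the real Xen code, where the counter is the per-CPU __local_irq_count behind in_irq():]

```c
#include <assert.h>

/* Toy model of the per-CPU IRQ nesting counter behind in_irq().
 * (Hypothetical simplification; not the actual Xen implementation.) */
static int irq_count;

static void irq_enter(void) { irq_count++; }
static void irq_exit(void)  { irq_count--; }
static int  in_irq(void)    { return irq_count != 0; }

/* Replay the sequence reported above on the shutdown path. */
static int replay_shutdown_path(void)
{
    irq_enter();  /* entered via an IPI handler: irq_count == 1, so
                   * ASSERT(!in_irq()) in alloc_domheap_pages() fires  */
    irq_exit();   /* the workaround added before map_pages_to_xen():
                   * silences the first assertion (irq_count == 0)     */
    irq_exit();   /* the handler's own exit path still decrements:
                   * irq_count == -1, so in_irq() is true again, which
                   * is what BUG_ON(in_irq()) then sees on resume      */
    return irq_count;
}
```

[Replayed this way, the model reproduces exactly the two observed counter values: 1 at the shutdown assertion and -1 at the resume one.]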
> From: xen-devel-bounces@lists.xensource.com [mailto:xen-devel-bounces@lists.xensource.com] On
> Behalf Of Cihula, Joseph
> Sent: Thursday, February 26, 2009 5:58 PM
>
> Sometime after c/s 19133 (I tried bisecting but there were some indeterminate results), during
> a shutdown/reboot in the tboot routines when it creates the 1:1 mapping for itself, the
> map_pages_to_xen() call ends up in alloc_domheap_pages() where it triggers the assertion
> 'ASSERT(!in_irq());'. In addition, and even stranger, is that when resuming from S3 it
> generates another assertion, 'BUG_ON(unlikely(in_irq()));' in invalidate_shadow_ldt(). From
> some debugging, the first assertion on entry is because the irq_count is 1 and the second is
> because it's -1.
>
> Adding an irq_exit() before map_pages_to_xen() fixes the first assertion and causes the second,
> which is then fixed by an irq_enter() on resume.
>
> But why are these necessary? Even if we say that something has caused the irq_count to go
> positive before shutdown (but what? It wasn't like this before pulling a more recent tree), the
> irq_exit() that gets rid of the assertion means that the count has gone to 0, so why is it
> negative on resume?

As an additional data point/issue, if I build with debug=y, the map_pages_to_xen() call (on a reboot) generates a BUG_ON(seen == !irq_safe) in check_lock(). But prior to the map_pages_to_xen() call, we call local_irq_disable(), so it should be called as irq_safe. I'm not sure how to fix this.

Joe
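[For reference, the consistency check that is tripping can be modeled roughly like this — a hypothetical reconstruction of the idea behind check_lock(), not the actual xen/common/spinlock.c source: a lock must be taken either always with IRQs enabled or always with IRQs disabled, and the BUG_ON fires on the first mixed use.]

```c
#include <assert.h>

#define UNSEEN (-1)

/* Per-lock debug state: UNSEEN until the first acquisition, then
 * latched to whether IRQs were disabled (irq-safe) at that time.
 * (Hypothetical model of the lock-debug idea, not the real code.) */
struct lock_debug { int irq_safe; };

/* Returns 1 where the real check_lock() would hit
 * BUG_ON(seen == !irq_safe). */
static int check_lock(struct lock_debug *d, int irqs_disabled)
{
    int irq_safe = irqs_disabled;

    if (d->irq_safe != irq_safe) {
        int seen = d->irq_safe;
        if (seen == UNSEEN)
            d->irq_safe = irq_safe;  /* latch first observed mode */
        else if (seen == !irq_safe)
            return 1;                /* mixed usage: would BUG     */
    }
    return 0;
}
```

[Under this model, if the heap lock was first taken during normal operation with IRQs enabled (latching irq_safe = 0), then taking the same lock later with IRQs disabled is itself the inconsistency — which would explain why calling local_irq_disable() beforehand does not satisfy the check.]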
On 27/02/2009 03:52, "Cihula, Joseph" <joseph.cihula@intel.com> wrote:

>> But why are these necessary? Even if we say that something has caused the
>> irq_count to go positive before shutdown (but what? It wasn't like this
>> before pulling a more recent tree), the irq_exit() that gets rid of the
>> assertion means that the count has gone to 0, so why is it negative on
>> resume?
>
> As an additional data point/issue, if I build with debug=y, the
> map_pages_to_xen() call (on a reboot) generates a BUG_ON(seen == !irq_safe) in
> check_lock(). But prior to the map_pages_to_xen() call, we call
> local_irq_disable(), so it should be called as irq_safe. I'm not sure how to
> fix this.

Please provide some backtraces.

-- Keir
> From: Keir Fraser [mailto:keir.fraser@eu.citrix.com]
> Sent: Friday, February 27, 2009 1:33 AM
>
> On 27/02/2009 03:52, "Cihula, Joseph" <joseph.cihula@intel.com> wrote:
>
> >> But why are these necessary? Even if we say that something has caused the
> >> irq_count to go positive before shutdown (but what? It wasn't like this
> >> before pulling a more recent tree), the irq_exit() that gets rid of the
> >> assertion means that the count has gone to 0, so why is it negative on
> >> resume?
> >
> > As an additional data point/issue, if I build with debug=y, the
> > map_pages_to_xen() call (on a reboot) generates a BUG_ON(seen == !irq_safe) in
> > check_lock(). But prior to the map_pages_to_xen() call, we call
> > local_irq_disable(), so it should be called as irq_safe. I'm not sure how to
> > fix this.
>
> Please provide some backtraces.
>
> -- Keir

Trace of assertion at shutdown entry:
=====================================

Restarting system.
(XEN) Domain 0 shutdown: rebooting machine.
(XEN) Assertion '!((irq_stat[(((get_cpu_info()->processor_id)))].__local_irq_count) != 0)' failed at page_alloc.c:843
(XEN) ----[ Xen-3.4-unstable x86_64 debug=y Not tainted ]----
(XEN) CPU: 0
(XEN) RIP: e008:[<ffff828c8011305e>] alloc_domheap_pages+0x5e/0x171
(XEN) RFLAGS: 0000000000010002 CONTEXT: hypervisor
(XEN) rax: 0000000000000000 rbx: ffff830078ce0020 rcx: 0000000000000000
(XEN) rdx: ffff828c8028e900 rsi: 0000000000000000 rdi: 0000000000000000
(XEN) rbp: ffff828c80277d18 rsp: ffff828c80277ce8 r8: 0000000000000010
(XEN) r9: 0000000000000700 r10: ffff828c80156e68 r11: 000000099b1d6973
(XEN) r12: 0000000000000063 r13: 00000000000000ff r14: 0000000000000000
(XEN) r15: 0000000000000000 cr0: 0000000080050033 cr4: 00000000000026f0
(XEN) cr3: 0000000072ca7000 cr2: 0000003801031320
(XEN) ds: 002b es: 002b fs: 0000 gs: 0000 ss: e010 cs: e008
(XEN) Xen stack trace from rsp=ffff828c80277ce8:
(XEN) 0000000000000000 ffff830078ce0020 0000000000000063 0000000000000063
(XEN) 0000000000000803 0000000000803000 ffff828c80277d28 ffff828c801cecd0
(XEN) ffff828c80277d98 ffff828c8014a465 00000000000000e3 000000000000004c
(XEN) 0000000000000803 0000000000000000 0000000000000063 ffff830078ce1000
(XEN) 0000000000000063 0000000000000803 000000000000004c 0000000000000000
(XEN) ffff828c80210660 ffff828c8028ad60 ffff828c80277dc8 ffff828c801734a3
(XEN) ffff828c80277dc8 0000000000000000 ffff828c80277f28 ffff828c8028e900
(XEN) ffff828c80277e08 ffff828c80156aac 00000000000004e8 00000000802a0824
(XEN) ffff828c80277f28 ffff828c80277f28 ffff828c8028e900 ffff828c80210660
(XEN) ffff828c80277e18 ffff828c80156b40 ffff828c80277e28 ffff828c80156e03
(XEN) 00007d737fd881a7 ffff828c80141450 ffff828c8028ad60 ffff828c80210660
(XEN) ffff828c8028e900 ffff828c80277f28 ffff828c80277ee0 ffff828c80277f28
(XEN) 000000099b1d6973 ffff828c802bdaa0 0000000000000005 0000000000000001
(XEN) 0000000000000000 0000000000000001 ffff828c8028e900 ffff828c802305c0
(XEN) 0000000000000000 000000fb00000000 ffff828c8013a4ce 000000000000e008
(XEN) 0000000000000246 ffff828c80277ee0 000000000000e010 ffff828c80277f20
(XEN) ffff828c8013c60a ffff830077598000 ffff828c80230120 ffff830079ede000
(XEN) ffff830079ede000 0000000001c9c380 ffff828c80230100 ffff828c80277e10
(XEN) 0000000000000000 0000000000000000 0000000000000000 0000000000000000
(XEN) 0000000000000000 0000000000000001 0000000000000246 00000000000014e6
(XEN) Xen call trace:
(XEN) [<ffff828c8011305e>] alloc_domheap_pages+0x5e/0x171
(XEN) [<ffff828c801cecd0>] alloc_xen_pagetable+0x21/0x8d
(XEN) [<ffff828c8014a465>] map_pages_to_xen+0x676/0xc67
(XEN) [<ffff828c801734a3>] tboot_shutdown+0x50/0x9ad
(XEN) [<ffff828c80156aac>] machine_restart+0xa4/0x12d
(XEN) [<ffff828c80156b40>] __machine_restart+0xb/0xd
(XEN) [<ffff828c80156e03>] smp_call_function_interrupt+0xc0/0xe8
(XEN) [<ffff828c80141450>] call_function_interrupt+0x30/0x40
(XEN) [<ffff828c8013a4ce>] default_idle+0x2f/0x34
(XEN) [<ffff828c8013c60a>] idle_loop+0xc8/0xe2
(XEN)
(XEN)
(XEN) ****************************************
(XEN) Panic on CPU 0:
(XEN) Assertion '!((irq_stat[(((get_cpu_info()->processor_id)))].__local_irq_count) != 0)' failed at page_alloc.c:843
(XEN) ****************************************

Assertion at S3 resume:
=======================

(XEN) Finishing wakeup from ACPI S3 state.
(XEN) Thawing cpus ...
(XEN) Booting processor 1/1 eip 8c000
TBOOT: cpu 1 waking up, SIPI vector=8c000
(XEN) CPU1: Intel Genuine Intel(R) CPU @ 2.40GHz stepping 04
(XEN) CPU1 is up
(XEN) ioapic_guest_write: apic=0, pin=9, old_irq=9, new_irq=9
(XEN) ioapic_guest_write: old_entry=0000d950, new_entry=00008950
(XEN) ioapic_guest_write: Attempt to modify IO-APIC pin for in-use IRQ!
(XEN) Xen BUG at mm.c:467
(XEN) ----[ Xen-3.4-unstable x86_64 debug=n Not tainted ]----
(XEN) CPU: 0
(XEN) RIP: e008:[<ffff828c8014984f>] invalidate_shadow_ldt+0x3f/0x110
(XEN) RFLAGS: 0000000000010286 CONTEXT: hypervisor
(XEN) rax: 0000000000000000 rbx: 0000000000000005 rcx: 8000000000000000
(XEN) rdx: ffff828c8027e900 rsi: 0000000000000000 rdi: ffff8300775ea000
(XEN) rbp: ffff8300775ea000 rsp: ffff828c80267de8 r8: ffff8284000f8150
(XEN) r9: 8000000000000000 r10: 9800000000000002 r11: 0000000000000000
(XEN) r12: ffff8300775ea000 r13: 00000000ffffffff r14: ffff88006f3f1dc8
(XEN) r15: 0000000000000000 cr0: 000000008005003b cr4: 00000000000026f0
(XEN) cr3: 000000000af66000 cr2: 0000000000000000
(XEN) ds: 0000 es: 0000 fs: 0000 gs: 0000 ss: e010 cs: e008
(XEN) Xen stack trace from rsp=ffff828c80267de8:
(XEN) ffffffff00000200 0000000000000005 ffff8300775ea000 ffff830079e50000
(XEN) 0000000000007c0a ffff88006f3f1dc8 ffffffffffff8000 ffff828c8014d0ea
(XEN) 0000000000000000 0000000000000005 0000000000007c0a 0000000000000000
(XEN) ffff8284000f8140 ffff828c8014d8e8 ffffffff804d593c ffff828c8021d740
(XEN) ffff828c8021d5a0 ffff828c8021d5a8 ffff828c8021d540 ffff828c80267f28
(XEN) ffff828c80267f28 00007ff079e50000 0000000000000000 0000000000000002
(XEN) ffff8300775ea000 ffff830079e50000 0000000000000005 0000000000007c0a
(XEN) ffff88006f3f1df8 0000000000000050 ffff828c00000001 000000008027e900
(XEN) ffff828c8028fc80 ffff8300775ea000 ffff88006f3f1e58 ffff88006e017080
(XEN) ffff8800010044c0 0000000000000000 0000000944d77c74 ffff828c801c0169
(XEN) 0000000944d77c74 0000000000000000 ffff8800010044c0 ffff88006e017080
(XEN) ffff88006f3f1e58 ffff88006ef12080 0000000000000a07 0000000000007ff0
(XEN) ffff8800710d0168 ffff88006f3f0000 000000000000001a ffffffff8020634a
(XEN) 0000000000000000 0000000000000002 ffff88006f3f1dc8 0000010000000000
(XEN) ffffffff8020634a 000000000000e033 0000000000000a07 ffff88006f3f1d70
(XEN) 000000000000e02b 0000000000000000 0000000000000000 0000000000000000
(XEN) 0000000000000000 0000000000000000 ffff8300775ea000
(XEN) Xen call trace:
(XEN) [<ffff828c8014984f>] invalidate_shadow_ldt+0x3f/0x110
(XEN) [<ffff828c8014d0ea>] new_guest_cr3+0x12a/0x1f0
(XEN) [<ffff828c8014d8e8>] do_mmuext_op+0x738/0x10d0
(XEN) [<ffff828c801c0169>] syscall_enter+0xa9/0xae
(XEN)
(XEN)
(XEN) ****************************************
(XEN) Panic on CPU 0:
(XEN) Xen BUG at mm.c:467
(XEN) ****************************************

Debug build shutdown BUG_ON:
============================

(XEN) Xen BUG at spinlock.c:23
(XEN) ----[ Xen-3.4-unstable x86_64 debug=y Not tainted ]----
(XEN) CPU: 0
(XEN) RIP: e008:[<ffff828c8011b414>] check_lock+0x44/0x55
(XEN) RFLAGS: 0000000000010046 CONTEXT: hypervisor
(XEN) rax: 0000000000000000 rbx: ffff828c801fc6d4 rcx: 0000000000000001
(XEN) rdx: 0000000000000000 rsi: 0000000000000001 rdi: ffff828c801fc6d8
(XEN) rbp: ffff828c8027fc50 rsp: ffff828c8027fc50 r8: 0000000000000001
(XEN) r9: 0000ffff0000ffff r10: 00ff00ff00ff00ff r11: 0f0f0f0f0f0f0f0f
(XEN) r12: 0000000000000000 r13: 0000000000000000 r14: 0000000000000027
(XEN) r15: 0000000000000001 cr0: 0000000080050033 cr4: 00000000000026f0
(XEN) cr3: 0000000079e84000 cr2: 0000003801031320
(XEN) ds: 002b es: 002b fs: 0000 gs: 0000 ss: e010 cs: e008
(XEN) Xen stack trace from rsp=ffff828c8027fc50:
(XEN) ffff828c8027fc68 ffff828c8011b456 0000000000000000 ffff828c8027fcd8
(XEN) ffff828c80112afd ffff828c8027fc88 ffff828c8011b69f 000000018027fd98
(XEN) 0000000100000000 0000000000000000 ffff828c80297e8e 0000003000000010
(XEN) 0000000000000027 0000000000000063 00000000000000ff 0000000000000000
(XEN) 0000000000000000 ffff828c8027fd18 ffff828c801131e1 0000000000000000
(XEN) ffff830078ce0020 0000000000000063 0000000000000063 0000000000000803
(XEN) 0000000000803000 ffff828c8027fd28 ffff828c801d0cd0 ffff828c8027fd98
(XEN) ffff828c8014b5a5 00000000000000e3 000000000000004c 0000000000000803
(XEN) 0000000000000000 0000000000000063 ffff830078ce1000 0000000000000063
(XEN) 0000000000000803 000000000000004c 0000000000000000 ffff828c80212860
(XEN) ffff828c80292d60 ffff828c8027fdc8 ffff828c801747a4 ffff828c8027fdc8
(XEN) 0000000000000000 ffff828c8027ff28 ffff828c80296900 ffff828c8027fe08
(XEN) ffff828c80157c6c 00000000000004e8 00000000802a88a4 ffff828c8027ff28
(XEN) ffff828c8027ff28 ffff828c80296900 ffff828c80212860 ffff828c8027fe18
(XEN) ffff828c80157d00 ffff828c8027fe28 ffff828c80157fc3 00007d737fd801a7
(XEN) ffff828c801424b0 ffff828c80292d60 ffff828c80212860 ffff828c80296900
(XEN) ffff828c8027ff28 ffff828c8027fee0 ffff828c8027ff28 000000161034d8e8
(XEN) ffff830077590060 0000000000000005 0000000000000001 0000000000000000
(XEN) 0000000000000001 ffff828c80296900 ffff828c802325c0 0000000000000000
(XEN) 000000fb00000000 ffff828c8013b4fe 000000000000e008 0000000000000246
(XEN) Xen call trace:
(XEN) [<ffff828c8011b414>] check_lock+0x44/0x55
(XEN) [<ffff828c8011b456>] _spin_lock+0x11/0x24
(XEN) [<ffff828c80112afd>] alloc_heap_pages+0xe6/0x3d9
(XEN) [<ffff828c801131e1>] alloc_domheap_pages+0x11c/0x171
(XEN) [<ffff828c801d0cd0>] alloc_xen_pagetable+0x21/0x8d
(XEN) [<ffff828c8014b5a5>] map_pages_to_xen+0x676/0xc67
(XEN) [<ffff828c801747a4>] tboot_shutdown+0xb8/0xa14
(XEN) [<ffff828c80157c6c>] machine_restart+0xa4/0x12d
(XEN) [<ffff828c80157d00>] __machine_restart+0xb/0xd
(XEN) [<ffff828c80157fc3>] smp_call_function_interrupt+0xc0/0xe8
(XEN) [<ffff828c801424b0>] call_function_interrupt+0x30/0x40
(XEN) [<ffff828c8013b4fe>] default_idle+0x2f/0x34
(XEN) [<ffff828c8013d63a>] idle_loop+0xc8/0xe2
(XEN)
(XEN)
(XEN) ****************************************
(XEN) Panic on CPU 0:
(XEN) Xen BUG at spinlock.c:23
(XEN) ****************************************
On 27/02/2009 14:35, "Cihula, Joseph" <joseph.cihula@intel.com> wrote:

>> Please provide some backtraces.
>>
>> -- Keir
>
> Trace of assertion at shutdown entry:

Okay, the ASSERT(!in_irq()) crash on shutdown is due to having IPIed to the BP to make it drive the shutdown. Hence the BP is actually in IRQ context. If you then zero the IRQ counter, it will get decremented on exit from the shutdown code (and the IPI interrupt) on S3 resume... Except I'm confused by this because machine_restart() is not used for S3 suspend/resume. So actually I'm not sure how you end up in IRQ context on S3 suspend, and you didn't send a backtrace of that situation. To get to the BP on S3 suspend we use continue_hypercall_on_cpu(), which does not leave us in IRQ context.

For the check_lock() assertion, either do not local_irq_disable() in tboot_shutdown() (is that needed?) or perhaps even better (solves all the crashes you've seen) why not disable paging before jumping into tboot and hence avoid needing map_pages_to_xen()? I can't see why tboot would care to run on our random pagetables, and implementing a stub to jump into / out of non-paged mode would be very easy. I can explain more how to do this if this is a suitable solution. Another alternative would be to map tboot pages during boot and leave them mapped forever after. That also would avoid these map_pages_to_xen() calls during S3/shutdown, and I suppose is easier than implementing a return-to-no-paging stub.

-- Keir
> From: Keir Fraser [mailto:keir.fraser@eu.citrix.com]
> Sent: Friday, February 27, 2009 7:42 AM
>
> On 27/02/2009 14:35, "Cihula, Joseph" <joseph.cihula@intel.com> wrote:
>
> >> Please provide some backtraces.
> >>
> >> -- Keir
> >
> > Trace of assertion at shutdown entry:
>
> Okay, the ASSERT(!in_irq()) crash on shutdown is due to having IPIed to the
> BP to make it drive the shutdown. Hence the BP is actually in IRQ context.
> If you then zero the IRQ counter, it will get decremented on exit from the
> shutdown code (and the IPI interrupt) on S3 resume... Except I'm confused by
> this because machine_restart() is not used for S3 suspend/resume. So
> actually I'm not sure how you end up in IRQ context on S3 suspend, and you
> didn't send a backtrace of that situation. To get to the BP on S3 suspend we
> use continue_hypercall_on_cpu(), which does not leave us in IRQ context.
>
> For the check_lock() assertion, either do not local_irq_disable() in
> tboot_shutdown() (is that needed?) or perhaps even better (solves all the
> crashes you've seen) why not disable paging before jumping into tboot and
> hence avoid needing map_pages_to_xen()? I can't see why tboot would care to
> run on our random pagetables, and implementing a stub to jump into / out of
> non-paged mode would be very easy. I can explain more how to do this if this
> is a suitable solution. Another alternative would be to map tboot pages
> during boot and leave them mapped forever after. That also would avoid these
> map_pages_to_xen() during S3/shutdown issues, and I suppose is easier than
> implementing a return-to-no-paging stub.

On S3, we also need to create integrity measurements for the xenheap and domheap, which require mapping pages. So I don't think that either moving the 1:1 mapping early or disabling paging would solve the problem (it would just shift it).

I think that I did find a way to avoid the reboot spinlock BUG_ON by using spin_debug_{disable/enable}().
S3 still isn't working, but I haven't been able to track down why yet (unfortunately, serial output stops before it fails).

Joe
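[A sketch of how the spin_debug_{disable/enable}() bracketing would sidestep the check — a hypothetical model in which a global flag gates the lock-debug consistency check, which is presumably what the real spin_debug_disable()/spin_debug_enable() toggle for check_lock():]

```c
#include <assert.h>

/* Global gate for the lock-debug checks (hypothetical model of the
 * flag behind spin_debug_{disable/enable}(); not the real code). */
static int spin_debug = 1;

static void spin_debug_disable(void) { spin_debug = 0; }
static void spin_debug_enable(void)  { spin_debug = 1; }

/* Returns 1 where the debug check would BUG: any attempt to take a
 * lock in a different IRQ mode than the one it was first taken in.
 * latched_mode is -1 until the first acquisition. */
static int check_lock(int *latched_mode, int irqs_disabled)
{
    if (!spin_debug)
        return 0;                       /* checks switched off */
    if (*latched_mode == -1)
        *latched_mode = irqs_disabled;  /* first acquisition latches */
    return *latched_mode != irqs_disabled;
}

/* The described workaround: bracket the shutdown-path allocation so the
 * IRQs-disabled acquisition of the heap lock is never checked. */
static int shutdown_path_allocation(int *heap_lock_mode)
{
    int bug;
    spin_debug_disable();
    bug = check_lock(heap_lock_mode, /* irqs_disabled = */ 1);
    spin_debug_enable();
    return bug;
}
```

[The design trade-off: the inconsistent acquisition still happens, it is just no longer reported, which is acceptable on a one-way path like reboot.]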
> From: Cihula, Joseph
> Sent: Saturday, February 28, 2009 2:55 AM
>
>> From: Keir Fraser [mailto:keir.fraser@eu.citrix.com]
>> Sent: Friday, February 27, 2009 7:42 AM
>>
>> On 27/02/2009 14:35, "Cihula, Joseph" <joseph.cihula@intel.com> wrote:
>>
>> >> Please provide some backtraces.
>> >>
>> >> -- Keir
>> >
>> > Trace of assertion at shutdown entry:
>>
>> Okay, the ASSERT(!in_irq()) crash on shutdown is due to having IPIed to the
>> BP to make it drive the shutdown. Hence the BP is actually in IRQ context.
>> If you then zero the IRQ counter, it will get decremented on exit from the
>> shutdown code (and the IPI interrupt) on S3 resume... Except I'm confused by
>> this because machine_restart() is not used for S3 suspend/resume. So
>> actually I'm not sure how you end up in IRQ context on S3 suspend, and you
>> didn't send a backtrace of that situation. To get to the BP on S3 suspend we
>> use continue_hypercall_on_cpu(), which does not leave us in IRQ context.
>>
>> For the check_lock() assertion, either do not local_irq_disable() in
>> tboot_shutdown() (is that needed?) or perhaps even better (solves all the
>> crashes you've seen) why not disable paging before jumping into tboot and
>> hence avoid needing map_pages_to_xen()? I can't see why tboot would care to
>> run on our random pagetables, and implementing a stub to jump into / out of
>> non-paged mode would be very easy. I can explain more how to do this if this
>> is a suitable solution. Another alternative would be to map tboot pages
>> during boot and leave them mapped forever after. That also would avoid these
>> map_pages_to_xen() during S3/shutdown issues, and I suppose is easier than
>> implementing a return-to-no-paging stub.
>
> On S3, we also need to create integrity measurements for the xenheap and
> domheap, which require mapping pages. So I don't think that either moving
> the 1:1 mapping early or disabling paging would solve the problem (it would
> just shift it).
> I think that I did find a way to avoid the reboot spinlock BUG_ON by using
> spin_debug_{disable/enable}(). S3 still isn't working, but I haven't been
> able to track down why yet (unfortunately, serial output stops before it
> fails).

Just be cautious: current Xen upstream has several S3 bugs which were fixed by Guanqun but not checked in yet. Maybe you can try http://markmail.org/thread/fkcowjzdbqq3qbjp to avoid hunting the same issues.

Thanks,
Kevin
On 01/03/2009 13:48, "Tian, Kevin" <kevin.tian@intel.com> wrote:

> Just be cautious: current Xen upstream has several S3 bugs
> which were fixed by Guanqun but not checked in yet. Maybe you
> can try http://markmail.org/thread/fkcowjzdbqq3qbjp
> to avoid hunting the same issues.

Applied as of c/s 19243.

-- Keir
On Sat, Feb 28, 2009 at 2:55 AM, Cihula, Joseph <joseph.cihula@intel.com> wrote:

>> From: Keir Fraser [mailto:keir.fraser@eu.citrix.com]
>> Sent: Friday, February 27, 2009 7:42 AM
>>
>> On 27/02/2009 14:35, "Cihula, Joseph" <joseph.cihula@intel.com> wrote:
>>
>> >> Please provide some backtraces.
>> >>
>> >> -- Keir
>> >
>> > Trace of assertion at shutdown entry:
>>
>> Okay, the ASSERT(!in_irq()) crash on shutdown is due to having IPIed to the
>> BP to make it drive the shutdown. Hence the BP is actually in IRQ context.
>> If you then zero the IRQ counter, it will get decremented on exit from the
>> shutdown code (and the IPI interrupt) on S3 resume... Except I'm confused by
>> this because machine_restart() is not used for S3 suspend/resume. So
>> actually I'm not sure how you end up in IRQ context on S3 suspend, and you
>> didn't send a backtrace of that situation. To get to the BP on S3 suspend we
>> use continue_hypercall_on_cpu(), which does not leave us in IRQ context.
>>
>> For the check_lock() assertion, either do not local_irq_disable() in
>> tboot_shutdown() (is that needed?) or perhaps even better (solves all the
>> crashes you've seen) why not disable paging before jumping into tboot and
>> hence avoid needing map_pages_to_xen()? I can't see why tboot would care to
>> run on our random pagetables, and implementing a stub to jump into / out of
>> non-paged mode would be very easy. I can explain more how to do this if this
>> is a suitable solution. Another alternative would be to map tboot pages
>> during boot and leave them mapped forever after. That also would avoid these
>> map_pages_to_xen() during S3/shutdown issues, and I suppose is easier than
>> implementing a return-to-no-paging stub.
>
> On S3, we also need to create integrity measurements for the xenheap and domheap, which require mapping pages. So I don't think that either moving the 1:1 mapping early or disabling paging would solve the problem (it would just shift it).
> I think that I did find a way to avoid the reboot spinlock BUG_ON by using
> spin_debug_{disable/enable}(). S3 still isn't working, but I haven't been
> able to track down why yet (unfortunately, serial output stops before it
> fails).

On this serial output thing, you can delay the console suspend a little bit to see more messages (use it for testing and debugging...):

diff -r e5c696aaf2a6 xen/arch/x86/acpi/power.c
--- a/xen/arch/x86/acpi/power.c Sun Mar 01 14:58:07 2009 +0000
+++ b/xen/arch/x86/acpi/power.c Mon Mar 02 21:53:17 2009 +0800
@@ -46,21 +46,23 @@
 static int device_power_down(void)
 {
     iommu_suspend();
+    time_suspend();
+
+    i8259A_suspend();
+
+    ioapic_suspend();
+
+    lapic_suspend();
+
     console_suspend();
-    time_suspend();
-
-    i8259A_suspend();
-
-    ioapic_suspend();
-
-    lapic_suspend();
-
     return 0;
 }

 static void device_power_up(void)
 {
+    console_resume();
+
     lapic_resume();

     ioapic_resume();
@@ -68,8 +70,6 @@ static void device_power_up(void)
     i8259A_resume();

     time_resume();
-
-    console_resume();

     iommu_resume();
 }

> Joe

-- Guanqun