Sometime after c/s 19133 (I tried bisecting but there were some indeterminate results), during a shutdown/reboot in the tboot routines, when it creates the 1:1 mapping for itself, the map_pages_to_xen() call ends up in alloc_domheap_pages(), where it triggers the assertion 'ASSERT(!in_irq());'. In addition, and even stranger, is that when resuming from S3 it generates another assertion, 'BUG_ON(unlikely(in_irq()));' in invalidate_shadow_ldt(). From some debugging, the first assertion on entry is because the irq_count is 1 and the second is because it's -1.

Adding an irq_exit() before map_pages_to_xen() fixes the first assertion and causes the second, which is then fixed by an irq_enter() on resume.

But why are these necessary? Even if we say that something has caused the irq_count to go positive before shutdown (but what? It wasn't like this before pulling a more recent tree), the irq_exit() that gets rid of the assertion means that the count has gone to 0, so why is it negative on resume?

Joe

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel
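[The counter arithmetic being described can be replayed with a toy model — a hypothetical simplification for illustration only, not the real Xen code, where the counter is the per-CPU __local_irq_count behind in_irq():]

```c
#include <assert.h>

/* Toy model of the per-CPU IRQ nesting counter behind in_irq().
 * (Hypothetical simplification; not the actual Xen implementation.) */
static int irq_count;

static void irq_enter(void) { irq_count++; }
static void irq_exit(void)  { irq_count--; }
static int  in_irq(void)    { return irq_count != 0; }

/* Replay the sequence reported above on the shutdown path. */
static int replay_shutdown_path(void)
{
    irq_enter();  /* entered via an IPI handler: irq_count == 1, so
                   * ASSERT(!in_irq()) in alloc_domheap_pages() fires  */
    irq_exit();   /* the workaround added before map_pages_to_xen():
                   * silences the first assertion (irq_count == 0)     */
    irq_exit();   /* the handler's own exit path still decrements:
                   * irq_count == -1, so in_irq() is true again, which
                   * is what BUG_ON(in_irq()) then sees on resume      */
    return irq_count;
}
```

[Replayed this way, the model reproduces exactly the two observed counter values: 1 at the shutdown assertion and -1 at the resume one.]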
> From: xen-devel-bounces@lists.xensource.com [mailto:xen-devel-bounces@lists.xensource.com] On
> Behalf Of Cihula, Joseph
> Sent: Thursday, February 26, 2009 5:58 PM
>
> Sometime after c/s 19133 (I tried bisecting but there were some indeterminate results), during
> a shutdown/reboot in the tboot routines when it creates the 1:1 mapping for itself, the
> map_pages_to_xen() call ends up in alloc_domheap_pages() where it triggers the assertion
> 'ASSERT(!in_irq());'. In addition, and even stranger, is that when resuming from S3 it
> generates another assertion, 'BUG_ON(unlikely(in_irq()));' in invalidate_shadow_ldt(). From
> some debugging, the first assertion on entry is because the irq_count is 1 and the second is
> because it's -1.
>
> Adding an irq_exit() before map_pages_to_xen() fixes the first assertion and causes the second,
> which is then fixed by an irq_enter() on resume.
>
> But why are these necessary? Even if we say that something has caused the irq_count to go
> positive before shutdown (but what? It wasn't like this before pulling a more recent tree), the
> irq_exit() that gets rid of the assertion means that the count has gone to 0, so why is it
> negative on resume?

As an additional data point/issue, if I build with debug=y, the map_pages_to_xen() call (on a reboot) generates a BUG_ON(seen == !irq_safe) in check_lock(). But prior to the map_pages_to_xen() call, we call local_irq_disable(), so it should be called as irq_safe. I'm not sure how to fix this.

Joe
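[For reference, the consistency check that is tripping can be modeled roughly like this — a hypothetical reconstruction of the idea behind check_lock(), not the actual xen/common/spinlock.c source: a lock must be taken either always with IRQs enabled or always with IRQs disabled, and the BUG_ON fires on the first mixed use.]

```c
#include <assert.h>

#define UNSEEN (-1)

/* Per-lock debug state: UNSEEN until the first acquisition, then
 * latched to whether IRQs were disabled (irq-safe) at that time.
 * (Hypothetical model of the lock-debug idea, not the real code.) */
struct lock_debug { int irq_safe; };

/* Returns 1 where the real check_lock() would hit
 * BUG_ON(seen == !irq_safe). */
static int check_lock(struct lock_debug *d, int irqs_disabled)
{
    int irq_safe = irqs_disabled;

    if (d->irq_safe != irq_safe) {
        int seen = d->irq_safe;
        if (seen == UNSEEN)
            d->irq_safe = irq_safe;  /* latch first observed mode */
        else if (seen == !irq_safe)
            return 1;                /* mixed usage: would BUG     */
    }
    return 0;
}
```

[Under this model, if the heap lock was first taken during normal operation with IRQs enabled (latching irq_safe = 0), then taking the same lock later with IRQs disabled is itself the inconsistency — which would explain why calling local_irq_disable() beforehand does not satisfy the check.]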
On 27/02/2009 03:52, "Cihula, Joseph" <joseph.cihula@intel.com> wrote:

>> But why are these necessary? Even if we say that something has caused the
>> irq_count to go positive before shutdown (but what? It wasn't like this
>> before pulling a more recent tree), the irq_exit() that gets rid of the
>> assertion means that the count has gone to 0, so why is it negative on
>> resume?
>
> As an additional data point/issue, if I build with debug=y, the
> map_pages_to_xen() call (on a reboot) generates a BUG_ON(seen == !irq_safe) in
> check_lock(). But prior to the map_pages_to_xen() call, we call
> local_irq_disable(), so it should be called as irq_safe. I'm not sure how to
> fix this.

Please provide some backtraces.

-- Keir
> From: Keir Fraser [mailto:keir.fraser@eu.citrix.com]
> Sent: Friday, February 27, 2009 1:33 AM
>
> On 27/02/2009 03:52, "Cihula, Joseph" <joseph.cihula@intel.com> wrote:
>
> >> But why are these necessary? Even if we say that something has caused the
> >> irq_count to go positive before shutdown (but what? It wasn't like this
> >> before pulling a more recent tree), the irq_exit() that gets rid of the
> >> assertion means that the count has gone to 0, so why is it negative on
> >> resume?
> >
> > As an additional data point/issue, if I build with debug=y, the
> > map_pages_to_xen() call (on a reboot) generates a BUG_ON(seen == !irq_safe) in
> > check_lock(). But prior to the map_pages_to_xen() call, we call
> > local_irq_disable(), so it should be called as irq_safe. I'm not sure how to
> > fix this.
>
> Please provide some backtraces.
>
> -- Keir

Trace of assertion at shutdown entry:
=====================================

Restarting system.
(XEN) Domain 0 shutdown: rebooting machine.
(XEN) Assertion '!((irq_stat[(((get_cpu_info()->processor_id)))].__local_irq_count) != 0)' failed at page_alloc.c:843
(XEN) ----[ Xen-3.4-unstable x86_64 debug=y Not tainted ]----
(XEN) CPU: 0
(XEN) RIP: e008:[<ffff828c8011305e>] alloc_domheap_pages+0x5e/0x171
(XEN) RFLAGS: 0000000000010002 CONTEXT: hypervisor
(XEN) rax: 0000000000000000 rbx: ffff830078ce0020 rcx: 0000000000000000
(XEN) rdx: ffff828c8028e900 rsi: 0000000000000000 rdi: 0000000000000000
(XEN) rbp: ffff828c80277d18 rsp: ffff828c80277ce8 r8: 0000000000000010
(XEN) r9: 0000000000000700 r10: ffff828c80156e68 r11: 000000099b1d6973
(XEN) r12: 0000000000000063 r13: 00000000000000ff r14: 0000000000000000
(XEN) r15: 0000000000000000 cr0: 0000000080050033 cr4: 00000000000026f0
(XEN) cr3: 0000000072ca7000 cr2: 0000003801031320
(XEN) ds: 002b es: 002b fs: 0000 gs: 0000 ss: e010 cs: e008
(XEN) Xen stack trace from rsp=ffff828c80277ce8:
(XEN) 0000000000000000 ffff830078ce0020 0000000000000063 0000000000000063
(XEN) 0000000000000803 0000000000803000 ffff828c80277d28 ffff828c801cecd0
(XEN) ffff828c80277d98 ffff828c8014a465 00000000000000e3 000000000000004c
(XEN) 0000000000000803 0000000000000000 0000000000000063 ffff830078ce1000
(XEN) 0000000000000063 0000000000000803 000000000000004c 0000000000000000
(XEN) ffff828c80210660 ffff828c8028ad60 ffff828c80277dc8 ffff828c801734a3
(XEN) ffff828c80277dc8 0000000000000000 ffff828c80277f28 ffff828c8028e900
(XEN) ffff828c80277e08 ffff828c80156aac 00000000000004e8 00000000802a0824
(XEN) ffff828c80277f28 ffff828c80277f28 ffff828c8028e900 ffff828c80210660
(XEN) ffff828c80277e18 ffff828c80156b40 ffff828c80277e28 ffff828c80156e03
(XEN) 00007d737fd881a7 ffff828c80141450 ffff828c8028ad60 ffff828c80210660
(XEN) ffff828c8028e900 ffff828c80277f28 ffff828c80277ee0 ffff828c80277f28
(XEN) 000000099b1d6973 ffff828c802bdaa0 0000000000000005 0000000000000001
(XEN) 0000000000000000 0000000000000001 ffff828c8028e900 ffff828c802305c0
(XEN) 0000000000000000 000000fb00000000 ffff828c8013a4ce 000000000000e008
(XEN) 0000000000000246 ffff828c80277ee0 000000000000e010 ffff828c80277f20
(XEN) ffff828c8013c60a ffff830077598000 ffff828c80230120 ffff830079ede000
(XEN) ffff830079ede000 0000000001c9c380 ffff828c80230100 ffff828c80277e10
(XEN) 0000000000000000 0000000000000000 0000000000000000 0000000000000000
(XEN) 0000000000000000 0000000000000001 0000000000000246 00000000000014e6
(XEN) Xen call trace:
(XEN) [<ffff828c8011305e>] alloc_domheap_pages+0x5e/0x171
(XEN) [<ffff828c801cecd0>] alloc_xen_pagetable+0x21/0x8d
(XEN) [<ffff828c8014a465>] map_pages_to_xen+0x676/0xc67
(XEN) [<ffff828c801734a3>] tboot_shutdown+0x50/0x9ad
(XEN) [<ffff828c80156aac>] machine_restart+0xa4/0x12d
(XEN) [<ffff828c80156b40>] __machine_restart+0xb/0xd
(XEN) [<ffff828c80156e03>] smp_call_function_interrupt+0xc0/0xe8
(XEN) [<ffff828c80141450>] call_function_interrupt+0x30/0x40
(XEN) [<ffff828c8013a4ce>] default_idle+0x2f/0x34
(XEN) [<ffff828c8013c60a>] idle_loop+0xc8/0xe2
(XEN)
(XEN)
(XEN) ****************************************
(XEN) Panic on CPU 0:
(XEN) Assertion '!((irq_stat[(((get_cpu_info()->processor_id)))].__local_irq_count) != 0)' failed at page_alloc.c:843
(XEN) ****************************************

Assertion at S3 resume:
=======================

(XEN) Finishing wakeup from ACPI S3 state.
(XEN) Thawing cpus ...
(XEN) Booting processor 1/1 eip 8c000
TBOOT: cpu 1 waking up, SIPI vector=8c000
(XEN) CPU1: Intel Genuine Intel(R) CPU @ 2.40GHz stepping 04
(XEN) CPU1 is up
(XEN) ioapic_guest_write: apic=0, pin=9, old_irq=9, new_irq=9
(XEN) ioapic_guest_write: old_entry=0000d950, new_entry=00008950
(XEN) ioapic_guest_write: Attempt to modify IO-APIC pin for in-use IRQ!
(XEN) Xen BUG at mm.c:467
(XEN) ----[ Xen-3.4-unstable x86_64 debug=n Not tainted ]----
(XEN) CPU: 0
(XEN) RIP: e008:[<ffff828c8014984f>] invalidate_shadow_ldt+0x3f/0x110
(XEN) RFLAGS: 0000000000010286 CONTEXT: hypervisor
(XEN) rax: 0000000000000000 rbx: 0000000000000005 rcx: 8000000000000000
(XEN) rdx: ffff828c8027e900 rsi: 0000000000000000 rdi: ffff8300775ea000
(XEN) rbp: ffff8300775ea000 rsp: ffff828c80267de8 r8: ffff8284000f8150
(XEN) r9: 8000000000000000 r10: 9800000000000002 r11: 0000000000000000
(XEN) r12: ffff8300775ea000 r13: 00000000ffffffff r14: ffff88006f3f1dc8
(XEN) r15: 0000000000000000 cr0: 000000008005003b cr4: 00000000000026f0
(XEN) cr3: 000000000af66000 cr2: 0000000000000000
(XEN) ds: 0000 es: 0000 fs: 0000 gs: 0000 ss: e010 cs: e008
(XEN) Xen stack trace from rsp=ffff828c80267de8:
(XEN) ffffffff00000200 0000000000000005 ffff8300775ea000 ffff830079e50000
(XEN) 0000000000007c0a ffff88006f3f1dc8 ffffffffffff8000 ffff828c8014d0ea
(XEN) 0000000000000000 0000000000000005 0000000000007c0a 0000000000000000
(XEN) ffff8284000f8140 ffff828c8014d8e8 ffffffff804d593c ffff828c8021d740
(XEN) ffff828c8021d5a0 ffff828c8021d5a8 ffff828c8021d540 ffff828c80267f28
(XEN) ffff828c80267f28 00007ff079e50000 0000000000000000 0000000000000002
(XEN) ffff8300775ea000 ffff830079e50000 0000000000000005 0000000000007c0a
(XEN) ffff88006f3f1df8 0000000000000050 ffff828c00000001 000000008027e900
(XEN) ffff828c8028fc80 ffff8300775ea000 ffff88006f3f1e58 ffff88006e017080
(XEN) ffff8800010044c0 0000000000000000 0000000944d77c74 ffff828c801c0169
(XEN) 0000000944d77c74 0000000000000000 ffff8800010044c0 ffff88006e017080
(XEN) ffff88006f3f1e58 ffff88006ef12080 0000000000000a07 0000000000007ff0
(XEN) ffff8800710d0168 ffff88006f3f0000 000000000000001a ffffffff8020634a
(XEN) 0000000000000000 0000000000000002 ffff88006f3f1dc8 0000010000000000
(XEN) ffffffff8020634a 000000000000e033 0000000000000a07 ffff88006f3f1d70
(XEN) 000000000000e02b 0000000000000000 0000000000000000 0000000000000000
(XEN) 0000000000000000 0000000000000000 ffff8300775ea000
(XEN) Xen call trace:
(XEN) [<ffff828c8014984f>] invalidate_shadow_ldt+0x3f/0x110
(XEN) [<ffff828c8014d0ea>] new_guest_cr3+0x12a/0x1f0
(XEN) [<ffff828c8014d8e8>] do_mmuext_op+0x738/0x10d0
(XEN) [<ffff828c801c0169>] syscall_enter+0xa9/0xae
(XEN)
(XEN)
(XEN) ****************************************
(XEN) Panic on CPU 0:
(XEN) Xen BUG at mm.c:467
(XEN) ****************************************

Debug build shutdown BUG_ON:
============================

(XEN) Xen BUG at spinlock.c:23
(XEN) ----[ Xen-3.4-unstable x86_64 debug=y Not tainted ]----
(XEN) CPU: 0
(XEN) RIP: e008:[<ffff828c8011b414>] check_lock+0x44/0x55
(XEN) RFLAGS: 0000000000010046 CONTEXT: hypervisor
(XEN) rax: 0000000000000000 rbx: ffff828c801fc6d4 rcx: 0000000000000001
(XEN) rdx: 0000000000000000 rsi: 0000000000000001 rdi: ffff828c801fc6d8
(XEN) rbp: ffff828c8027fc50 rsp: ffff828c8027fc50 r8: 0000000000000001
(XEN) r9: 0000ffff0000ffff r10: 00ff00ff00ff00ff r11: 0f0f0f0f0f0f0f0f
(XEN) r12: 0000000000000000 r13: 0000000000000000 r14: 0000000000000027
(XEN) r15: 0000000000000001 cr0: 0000000080050033 cr4: 00000000000026f0
(XEN) cr3: 0000000079e84000 cr2: 0000003801031320
(XEN) ds: 002b es: 002b fs: 0000 gs: 0000 ss: e010 cs: e008
(XEN) Xen stack trace from rsp=ffff828c8027fc50:
(XEN) ffff828c8027fc68 ffff828c8011b456 0000000000000000 ffff828c8027fcd8
(XEN) ffff828c80112afd ffff828c8027fc88 ffff828c8011b69f 000000018027fd98
(XEN) 0000000100000000 0000000000000000 ffff828c80297e8e 0000003000000010
(XEN) 0000000000000027 0000000000000063 00000000000000ff 0000000000000000
(XEN) 0000000000000000 ffff828c8027fd18 ffff828c801131e1 0000000000000000
(XEN) ffff830078ce0020 0000000000000063 0000000000000063 0000000000000803
(XEN) 0000000000803000 ffff828c8027fd28 ffff828c801d0cd0 ffff828c8027fd98
(XEN) ffff828c8014b5a5 00000000000000e3 000000000000004c 0000000000000803
(XEN) 0000000000000000 0000000000000063 ffff830078ce1000 0000000000000063
(XEN) 0000000000000803 000000000000004c 0000000000000000 ffff828c80212860
(XEN) ffff828c80292d60 ffff828c8027fdc8 ffff828c801747a4 ffff828c8027fdc8
(XEN) 0000000000000000 ffff828c8027ff28 ffff828c80296900 ffff828c8027fe08
(XEN) ffff828c80157c6c 00000000000004e8 00000000802a88a4 ffff828c8027ff28
(XEN) ffff828c8027ff28 ffff828c80296900 ffff828c80212860 ffff828c8027fe18
(XEN) ffff828c80157d00 ffff828c8027fe28 ffff828c80157fc3 00007d737fd801a7
(XEN) ffff828c801424b0 ffff828c80292d60 ffff828c80212860 ffff828c80296900
(XEN) ffff828c8027ff28 ffff828c8027fee0 ffff828c8027ff28 000000161034d8e8
(XEN) ffff830077590060 0000000000000005 0000000000000001 0000000000000000
(XEN) 0000000000000001 ffff828c80296900 ffff828c802325c0 0000000000000000
(XEN) 000000fb00000000 ffff828c8013b4fe 000000000000e008 0000000000000246
(XEN) Xen call trace:
(XEN) [<ffff828c8011b414>] check_lock+0x44/0x55
(XEN) [<ffff828c8011b456>] _spin_lock+0x11/0x24
(XEN) [<ffff828c80112afd>] alloc_heap_pages+0xe6/0x3d9
(XEN) [<ffff828c801131e1>] alloc_domheap_pages+0x11c/0x171
(XEN) [<ffff828c801d0cd0>] alloc_xen_pagetable+0x21/0x8d
(XEN) [<ffff828c8014b5a5>] map_pages_to_xen+0x676/0xc67
(XEN) [<ffff828c801747a4>] tboot_shutdown+0xb8/0xa14
(XEN) [<ffff828c80157c6c>] machine_restart+0xa4/0x12d
(XEN) [<ffff828c80157d00>] __machine_restart+0xb/0xd
(XEN) [<ffff828c80157fc3>] smp_call_function_interrupt+0xc0/0xe8
(XEN) [<ffff828c801424b0>] call_function_interrupt+0x30/0x40
(XEN) [<ffff828c8013b4fe>] default_idle+0x2f/0x34
(XEN) [<ffff828c8013d63a>] idle_loop+0xc8/0xe2
(XEN)
(XEN)
(XEN) ****************************************
(XEN) Panic on CPU 0:
(XEN) Xen BUG at spinlock.c:23
(XEN) ****************************************
On 27/02/2009 14:35, "Cihula, Joseph" <joseph.cihula@intel.com> wrote:

>> Please provide some backtraces.
>>
>> -- Keir
>
> Trace of assertion at shutdown entry:

Okay, the ASSERT(!in_irq()) crash on shutdown is due to having IPIed to the BP to make it drive the shutdown. Hence the BP is actually in IRQ context. If you then zero the IRQ counter, it will get decremented on exit from the shutdown code (and the IPI interrupt) on S3 resume... Except I'm confused by this because machine_restart() is not used for S3 suspend/resume. So actually I'm not sure how you end up in IRQ context on S3 suspend, and you didn't send a backtrace of that situation. To get to the BP on S3 suspend we use continue_hypercall_on_cpu(), which does not leave us in IRQ context.

For the check_lock() assertion, either do not local_irq_disable() in tboot_shutdown() (is that needed?) or perhaps even better (solves all the crashes you've seen) why not disable paging before jumping into tboot and hence avoid needing map_pages_to_xen()? I can't see why tboot would care to run on our random pagetables, and implementing a stub to jump into / out of non-paged mode would be very easy. I can explain more how to do this if this is a suitable solution. Another alternative would be to map tboot pages during boot and leave them mapped forever after. That also would avoid these map_pages_to_xen() calls during S3/shutdown, and I suppose is easier than implementing a return-to-no-paging stub.

-- Keir
> From: Keir Fraser [mailto:keir.fraser@eu.citrix.com]
> Sent: Friday, February 27, 2009 7:42 AM
>
> On 27/02/2009 14:35, "Cihula, Joseph" <joseph.cihula@intel.com> wrote:
>
> >> Please provide some backtraces.
> >>
> >> -- Keir
> >
> > Trace of assertion at shutdown entry:
>
> Okay, the ASSERT(!in_irq()) crash on shutdown is due to having IPIed to the
> BP to make it drive the shutdown. Hence the BP is actually in IRQ context.
> If you then zero the IRQ counter, it will get decremented on exit from the
> shutdown code (and the IPI interrupt) on S3 resume... Except I'm confused by
> this because machine_restart() is not used for S3 suspend/resume. So
> actually I'm not sure how you end up in IRQ context on S3 suspend, and you
> didn't send a backtrace of that situation. To get to the BP on S3 suspend we
> use continue_hypercall_on_cpu(), which does not leave us in IRQ context.
>
> For the check_lock() assertion, either do not local_irq_disable() in
> tboot_shutdown() (is that needed?) or perhaps even better (solves all the
> crashes you've seen) why not disable paging before jumping into tboot and
> hence avoid needing map_pages_to_xen()? I can't see why tboot would care to
> run on our random pagetables, and implementing a stub to jump into / out of
> non-paged mode would be very easy. I can explain more how to do this if this
> is a suitable solution. Another alternative would be to map tboot pages
> during boot and leave them mapped forever after. That also would avoid these
> map_pages_to_xen() during S3/shutdown issues, and I suppose is easier than
> implementing a return-to-no-paging stub.

On S3, we also need to create integrity measurements for the xenheap and domheap, which require mapping pages. So I don't think that either moving the 1:1 mapping early or disabling paging would solve the problem (it would just shift it).

I think that I did find a way to avoid the reboot spinlock BUG_ON by using spin_debug_{disable/enable}().
S3 still isn't working, but I haven't been able to track down why yet (unfortunately, serial output stops before it fails).

Joe
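[A sketch of how the spin_debug_{disable/enable}() bracketing would sidestep the check — a hypothetical model in which a global flag gates the lock-debug consistency check, which is presumably what the real spin_debug_disable()/spin_debug_enable() toggle for check_lock():]

```c
#include <assert.h>

/* Global gate for the lock-debug checks (hypothetical model of the
 * flag behind spin_debug_{disable/enable}(); not the real code). */
static int spin_debug = 1;

static void spin_debug_disable(void) { spin_debug = 0; }
static void spin_debug_enable(void)  { spin_debug = 1; }

/* Returns 1 where the debug check would BUG: any attempt to take a
 * lock in a different IRQ mode than the one it was first taken in.
 * latched_mode is -1 until the first acquisition. */
static int check_lock(int *latched_mode, int irqs_disabled)
{
    if (!spin_debug)
        return 0;                       /* checks switched off */
    if (*latched_mode == -1)
        *latched_mode = irqs_disabled;  /* first acquisition latches */
    return *latched_mode != irqs_disabled;
}

/* The described workaround: bracket the shutdown-path allocation so the
 * IRQs-disabled acquisition of the heap lock is never checked. */
static int shutdown_path_allocation(int *heap_lock_mode)
{
    int bug;
    spin_debug_disable();
    bug = check_lock(heap_lock_mode, /* irqs_disabled = */ 1);
    spin_debug_enable();
    return bug;
}
```

[The design trade-off: the inconsistent acquisition still happens, it is just no longer reported, which is acceptable on a one-way path like reboot.]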
> From: Cihula, Joseph
> Sent: Saturday, February 28, 2009 2:55 AM
>
>> From: Keir Fraser [mailto:keir.fraser@eu.citrix.com]
>> Sent: Friday, February 27, 2009 7:42 AM
>>
>> On 27/02/2009 14:35, "Cihula, Joseph" <joseph.cihula@intel.com> wrote:
>>
>> >> Please provide some backtraces.
>> >>
>> >> -- Keir
>> >
>> > Trace of assertion at shutdown entry:
>>
>> Okay, the ASSERT(!in_irq()) crash on shutdown is due to having IPIed to the
>> BP to make it drive the shutdown. Hence the BP is actually in IRQ context.
>> If you then zero the IRQ counter, it will get decremented on exit from the
>> shutdown code (and the IPI interrupt) on S3 resume... Except I'm confused by
>> this because machine_restart() is not used for S3 suspend/resume. So
>> actually I'm not sure how you end up in IRQ context on S3 suspend, and you
>> didn't send a backtrace of that situation. To get to the BP on S3 suspend we
>> use continue_hypercall_on_cpu(), which does not leave us in IRQ context.
>>
>> For the check_lock() assertion, either do not local_irq_disable() in
>> tboot_shutdown() (is that needed?) or perhaps even better (solves all the
>> crashes you've seen) why not disable paging before jumping into tboot and
>> hence avoid needing map_pages_to_xen()? I can't see why tboot would care to
>> run on our random pagetables, and implementing a stub to jump into / out of
>> non-paged mode would be very easy. I can explain more how to do this if this
>> is a suitable solution. Another alternative would be to map tboot pages
>> during boot and leave them mapped forever after. That also would avoid these
>> map_pages_to_xen() during S3/shutdown issues, and I suppose is easier than
>> implementing a return-to-no-paging stub.
>
> On S3, we also need to create integrity measurements for the xenheap and
> domheap, which require mapping pages. So I don't think that either moving
> the 1:1 mapping early or disabling paging would solve the problem (it would
> just shift it).
> I think that I did find a way to avoid the reboot spinlock BUG_ON by using
> spin_debug_{disable/enable}(). S3 still isn't working, but I haven't been
> able to track down why yet (unfortunately, serial output stops before it
> fails).

Just be cautious: current Xen upstream has several S3 bugs which were fixed by Guanqun but not checked in yet. Maybe you can try http://markmail.org/thread/fkcowjzdbqq3qbjp to avoid hunting the same issues.

Thanks,
Kevin
On 01/03/2009 13:48, "Tian, Kevin" <kevin.tian@intel.com> wrote:

> Just be cautious: current Xen upstream has several S3 bugs
> which were fixed by Guanqun but not checked in yet. Maybe you
> can try http://markmail.org/thread/fkcowjzdbqq3qbjp
> to avoid hunting the same issues.

Applied as of c/s 19243.

-- Keir
On Sat, Feb 28, 2009 at 2:55 AM, Cihula, Joseph <joseph.cihula@intel.com> wrote:

>> From: Keir Fraser [mailto:keir.fraser@eu.citrix.com]
>> Sent: Friday, February 27, 2009 7:42 AM
>>
>> On 27/02/2009 14:35, "Cihula, Joseph" <joseph.cihula@intel.com> wrote:
>>
>> >> Please provide some backtraces.
>> >>
>> >> -- Keir
>> >
>> > Trace of assertion at shutdown entry:
>>
>> Okay, the ASSERT(!in_irq()) crash on shutdown is due to having IPIed to the
>> BP to make it drive the shutdown. Hence the BP is actually in IRQ context.
>> If you then zero the IRQ counter, it will get decremented on exit from the
>> shutdown code (and the IPI interrupt) on S3 resume... Except I'm confused by
>> this because machine_restart() is not used for S3 suspend/resume. So
>> actually I'm not sure how you end up in IRQ context on S3 suspend, and you
>> didn't send a backtrace of that situation. To get to the BP on S3 suspend we
>> use continue_hypercall_on_cpu(), which does not leave us in IRQ context.
>>
>> For the check_lock() assertion, either do not local_irq_disable() in
>> tboot_shutdown() (is that needed?) or perhaps even better (solves all the
>> crashes you've seen) why not disable paging before jumping into tboot and
>> hence avoid needing map_pages_to_xen()? I can't see why tboot would care to
>> run on our random pagetables, and implementing a stub to jump into / out of
>> non-paged mode would be very easy. I can explain more how to do this if this
>> is a suitable solution. Another alternative would be to map tboot pages
>> during boot and leave them mapped forever after. That also would avoid these
>> map_pages_to_xen() during S3/shutdown issues, and I suppose is easier than
>> implementing a return-to-no-paging stub.
>
> On S3, we also need to create integrity measurements for the xenheap and domheap, which require mapping pages. So I don't think that either moving the 1:1 mapping early or disabling paging would solve the problem (it would just shift it).
> I think that I did find a way to avoid the reboot spinlock BUG_ON by using
> spin_debug_{disable/enable}(). S3 still isn't working, but I haven't been
> able to track down why yet (unfortunately, serial output stops before it
> fails).

On this serial output thing, you can delay the console suspend a little bit to see more messages (use it for testing and debugging...):

diff -r e5c696aaf2a6 xen/arch/x86/acpi/power.c
--- a/xen/arch/x86/acpi/power.c Sun Mar 01 14:58:07 2009 +0000
+++ b/xen/arch/x86/acpi/power.c Mon Mar 02 21:53:17 2009 +0800
@@ -46,21 +46,23 @@
 static int device_power_down(void)
 {
     iommu_suspend();
+    time_suspend();
+
+    i8259A_suspend();
+
+    ioapic_suspend();
+
+    lapic_suspend();
+
     console_suspend();
-    time_suspend();
-
-    i8259A_suspend();
-
-    ioapic_suspend();
-
-    lapic_suspend();
-
     return 0;
 }

 static void device_power_up(void)
 {
+    console_resume();
+
     lapic_resume();

     ioapic_resume();
@@ -68,8 +70,6 @@ static void device_power_up(void)
     i8259A_resume();

     time_resume();
-
-    console_resume();

     iommu_resume();
 }

> Joe

-- Guanqun