thr3ads.net - Xen devel - DomU: kernel BUG at arch/x86/xen/enlighten.c:425 [Mar 2013]

If this information is useful, please help other people find it:
Share via:

James Sinclair

2013-Mar-08 02:23 UTC

DomU: kernel BUG at arch/x86/xen/enlighten.c:425

Xen: (XEN) Xen version 4.1.5-pre (maker@(none)) (gcc version 4.4.5 (Debian
4.4.5-8) ) Mon Feb  4 12:59:38 EST 2013
Dom0: 3.7.7-1-x86_64 #1 SMP Thu Feb 14 15:58:35 EST 2013 x86_64 GNU/Linux
DomU: 3.7.10 (32 bit)

We''re seeing this bug pop up fairly regularly. The BUG below is from
the first time it''s triggered after a reboot - subsequent triggers
produce similar tracebacks with the kernel flagged as "Tainted: G     
D" - I can provide more examples if desired.

    ------------[ cut here ]------------
    kernel BUG at arch/x86/xen/enlighten.c:425!
    invalid opcode: 0000 [#1] SMP 
    Modules linked in:
    Pid: 3158, comm: ntpd Not tainted 3.7.10-linode49 #2  
    EIP: 0061:[<c0102e8a>] EFLAGS: 00210282 CPU: 0
    EIP is at set_aliased_prot+0x9a/0x110
    EAX: ffffffea EBX: ee2e6000 ECX: 2c74a063 EDX: 80000019
    ESI: 00000000 EDI: 00002000 EBP: 2c74a063 ESP: ead9da10
     DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0069
    CR0: 8005003b CR2: 4a258bc4 CR3: 2aa7a000 CR4: 00002660
    DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000
    DR6: ffff0ff0 DR7: 00000400
    Process ntpd (pid: 3158, ti=ead9c000 task=eb6e3300 task.ti=ead9c000)
    Stack:
     0007fb6b ee2e6000 80000019 ec1ed000 00000001 ee2e6000 00000200 00002000
     eb6e3300 c0102f27 eb19cc80 eb19cc80 00000000 c010ba5f eb19cc80 eb19cc80
     c013083c c010863b c078c405 00000000 00000000 c091e1c0 c015ade1 00000000
    Call Trace:
     [<c0102f27>] ? xen_free_ldt+0x27/0x40
     [<c010ba5f>] ? destroy_context+0x2f/0xa0
     [<c013083c>] ? __mmdrop+0x1c/0xb0
     [<c010863b>] ? __switch_to+0x13b/0x3a0
     [<c078c405>] ? _raw_spin_lock+0x5/0x10
     [<c015ade1>] ? finish_task_switch+0xb1/0xc0
     [<c078b77d>] ? __schedule+0x1fd/0x590
     [<c0639aa7>] ? get_unique_tuple+0x167/0x200
     [<c06258c2>] ? nf_ct_invert_tuple+0x52/0x70
     [<c078c62f>] ? _raw_spin_lock_bh+0xf/0x20
     [<c078ad9d>] ? schedule_hrtimeout_range_clock+0x10d/0x120
     [<c078c437>] ? _raw_spin_lock_irqsave+0x27/0x40
     [<c014ecca>] ? add_wait_queue+0x1a/0x50
     [<c078c481>] ? _raw_spin_unlock_irqrestore+0x11/0x20
     [<c078adbf>] ? schedule_hrtimeout_range+0xf/0x20
     [<c01e4a47>] ? poll_schedule_timeout+0x37/0x50
     [<c01e57f5>] ? do_select+0x445/0x510
     [<c04f97f5>] ? info_for_irq+0x5/0x20
     [<c04f9eb0>] ? evtchn_from_irq+0x10/0x40
     [<c01e4be0>] ? __pollwait+0xf0/0xf0
     [<c01e4be0>] ? __pollwait+0xf0/0xf0
     [<c01e4be0>] ? __pollwait+0xf0/0xf0
     [<c01e4be0>] ? __pollwait+0xf0/0xf0
     [<c01e4be0>] ? __pollwait+0xf0/0xf0
     [<c01e4be0>] ? __pollwait+0xf0/0xf0
     [<c01e4be0>] ? __pollwait+0xf0/0xf0
     [<c069c55c>] ? udp_sendmsg+0x2ec/0x820
     [<c0678250>] ? ip_append_page+0x4e0/0x4e0
     [<c078c437>] ? _raw_spin_lock_irqsave+0x27/0x40
     [<c078c481>] ? _raw_spin_unlock_irqrestore+0x11/0x20
     [<c05d9cc2>] ? __skb_recv_datagram+0xf2/0x260
     [<c078c62f>] ? _raw_spin_lock_bh+0xf/0x20
     [<c05d1586>] ? lock_sock_fast+0x16/0x50
     [<c069bf76>] ? udp_recvmsg+0x76/0x2e0
     [<c06a293c>] ? inet_recvmsg+0x5c/0xb0
     [<c06a3817>] ? inet_sendmsg+0x47/0xb0
     [<c05ce847>] ? sock_recvmsg+0xe7/0x100
     [<c078c437>] ? _raw_spin_lock_irqsave+0x27/0x40
     [<c078c481>] ? _raw_spin_unlock_irqrestore+0x11/0x20
     [<c0112017>] ? save_xstate_sig+0x297/0x310
     [<c01e5a6d>] ? core_sys_select+0x1ad/0x2a0
     [<c014165f>] ? __set_task_blocked+0x2f/0x80
     [<c01419a9>] ? set_current_blocked+0x49/0x60
     [<c0142fd4>] ? signal_delivered+0x44/0x60
     [<c0111002>] ? fpu_finit+0x52/0x60
     [<c018adb6>] ? __audit_syscall_exit+0x3b6/0x3f0
     [<c0112749>] ? syscall_trace_leave+0xd9/0x110
     [<c01e5dcf>] ? sys_select+0x2f/0xb0
     [<c078c844>] ? syscall_call+0x7/0xb
    Code: d2 0f a4 c2 0c c1 e0 0c 09 da 09 c6 89 f0 ff 15 d8 34 92 c0 31 f6 89
c5 8b 5c 24 04 89 54 24 08 89 c1 e8 3a e3 ff ff 85 c0 74 04 <0f> 0b eb fe
8b 04 24 8b 54 24 0c c1 e0 05 8b 04 10 c1 e8 1e 69
    EIP: [<c0102e8a>] set_aliased_prot+0x9a/0x110 SS:ESP 0069:ead9da10
    ---[ end trace a8da5f670b3c74bb ]---

Several processes have been seen to trigger it - audispd gam java klogd ntpd
perl syslogd
Eventually init will trigger it and cause a kernel panic:

    Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b

I''ve done some Googling to see if this is a known issue. All I could
find was a closed Debian bug from Dec 2010 with no resolution:

    http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=607709

Please let me know if you need further info from me.

Thanks,
James Sinclair



_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

Jan Beulich

2013-Mar-08 08:38 UTC

head link

Re: DomU: kernel BUG at arch/x86/xen/enlighten.c:425

>>> On 08.03.13 at 03:23, James Sinclair
<james.sinclair@linode.com> wrote:
> Xen: (XEN) Xen version 4.1.5-pre (maker@(none)) (gcc version 4.4.5 (Debian 
> 4.4.5-8) ) Mon Feb  4 12:59:38 EST 2013
> Dom0: 3.7.7-1-x86_64 #1 SMP Thu Feb 14 15:58:35 EST 2013 x86_64 GNU/Linux
> DomU: 3.7.10 (32 bit)
> 
> We''re seeing this bug pop up fairly regularly. The BUG below is
from the
> first time it''s triggered after a reboot - subsequent triggers
produce similar
> tracebacks with the kernel flagged as "Tainted: G      D" - I can
provide more
> examples if desired.
Two fundamental things that are missing: For one, this is almost
certainly being accompanied by some hypervisor message(s), so
you will want to provide the hypervisor log (making sure the log
level is high enough). And then, even more with you using a not
yet released hypervisor version, you would want to clarify whether
this is a regression of some sort (i.e. whether a certain hypervisor
and/or kernel version are known not to exhibit this).
>     ------------[ cut here ]------------
>     kernel BUG at arch/x86/xen/enlighten.c:425!
>     invalid opcode: 0000 [#1] SMP 
>     Modules linked in:
>     Pid: 3158, comm: ntpd Not tainted 3.7.10-linode49 #2  
>     EIP: 0061:[<c0102e8a>] EFLAGS: 00210282 CPU: 0
>     EIP is at set_aliased_prot+0x9a/0x110
>     EAX: ffffffea EBX: ee2e6000 ECX: 2c74a063 EDX: 80000019
>     ESI: 00000000 EDI: 00002000 EBP: 2c74a063 ESP: ead9da10
>      DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0069
>     CR0: 8005003b CR2: 4a258bc4 CR3: 2aa7a000 CR4: 00002660
>     DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000
>     DR6: ffff0ff0 DR7: 00000400
>     Process ntpd (pid: 3158, ti=ead9c000 task=eb6e3300 task.ti=ead9c000)
>     Stack:
>      0007fb6b ee2e6000 80000019 ec1ed000 00000001 ee2e6000 00000200
00002000
>      eb6e3300 c0102f27 eb19cc80 eb19cc80 00000000 c010ba5f eb19cc80
eb19cc80
>      c013083c c010863b c078c405 00000000 00000000 c091e1c0 c015ade1
00000000
>     Call Trace:
>      [<c0102f27>] ? xen_free_ldt+0x27/0x40
Quite unusual to have a process with LDT these days. And hence
quite likely for this to be the culprit. The primary guess therefore
would be that the LDT didn''t get cleared yet (MMUEXT_SET_LDT).

Jan
>      [<c010ba5f>] ? destroy_context+0x2f/0xa0
>      [<c013083c>] ? __mmdrop+0x1c/0xb0
>      [<c010863b>] ? __switch_to+0x13b/0x3a0
>      [<c078c405>] ? _raw_spin_lock+0x5/0x10
>      [<c015ade1>] ? finish_task_switch+0xb1/0xc0
>      [<c078b77d>] ? __schedule+0x1fd/0x590
>      [<c0639aa7>] ? get_unique_tuple+0x167/0x200
>      [<c06258c2>] ? nf_ct_invert_tuple+0x52/0x70
>      [<c078c62f>] ? _raw_spin_lock_bh+0xf/0x20
>      [<c078ad9d>] ? schedule_hrtimeout_range_clock+0x10d/0x120
>      [<c078c437>] ? _raw_spin_lock_irqsave+0x27/0x40
>      [<c014ecca>] ? add_wait_queue+0x1a/0x50
>      [<c078c481>] ? _raw_spin_unlock_irqrestore+0x11/0x20
>      [<c078adbf>] ? schedule_hrtimeout_range+0xf/0x20
>      [<c01e4a47>] ? poll_schedule_timeout+0x37/0x50
>      [<c01e57f5>] ? do_select+0x445/0x510
>      [<c04f97f5>] ? info_for_irq+0x5/0x20
>      [<c04f9eb0>] ? evtchn_from_irq+0x10/0x40
>      [<c01e4be0>] ? __pollwait+0xf0/0xf0
>      [<c01e4be0>] ? __pollwait+0xf0/0xf0
>      [<c01e4be0>] ? __pollwait+0xf0/0xf0
>      [<c01e4be0>] ? __pollwait+0xf0/0xf0
>      [<c01e4be0>] ? __pollwait+0xf0/0xf0
>      [<c01e4be0>] ? __pollwait+0xf0/0xf0
>      [<c01e4be0>] ? __pollwait+0xf0/0xf0
>      [<c069c55c>] ? udp_sendmsg+0x2ec/0x820
>      [<c0678250>] ? ip_append_page+0x4e0/0x4e0
>      [<c078c437>] ? _raw_spin_lock_irqsave+0x27/0x40
>      [<c078c481>] ? _raw_spin_unlock_irqrestore+0x11/0x20
>      [<c05d9cc2>] ? __skb_recv_datagram+0xf2/0x260
>      [<c078c62f>] ? _raw_spin_lock_bh+0xf/0x20
>      [<c05d1586>] ? lock_sock_fast+0x16/0x50
>      [<c069bf76>] ? udp_recvmsg+0x76/0x2e0
>      [<c06a293c>] ? inet_recvmsg+0x5c/0xb0
>      [<c06a3817>] ? inet_sendmsg+0x47/0xb0
>      [<c05ce847>] ? sock_recvmsg+0xe7/0x100
>      [<c078c437>] ? _raw_spin_lock_irqsave+0x27/0x40
>      [<c078c481>] ? _raw_spin_unlock_irqrestore+0x11/0x20
>      [<c0112017>] ? save_xstate_sig+0x297/0x310
>      [<c01e5a6d>] ? core_sys_select+0x1ad/0x2a0
>      [<c014165f>] ? __set_task_blocked+0x2f/0x80
>      [<c01419a9>] ? set_current_blocked+0x49/0x60
>      [<c0142fd4>] ? signal_delivered+0x44/0x60
>      [<c0111002>] ? fpu_finit+0x52/0x60
>      [<c018adb6>] ? __audit_syscall_exit+0x3b6/0x3f0
>      [<c0112749>] ? syscall_trace_leave+0xd9/0x110
>      [<c01e5dcf>] ? sys_select+0x2f/0xb0
>      [<c078c844>] ? syscall_call+0x7/0xb
>     Code: d2 0f a4 c2 0c c1 e0 0c 09 da 09 c6 89 f0 ff 15 d8 34 92 c0 31 f6
> 89 c5 8b 5c 24 04 89 54 24 08 89 c1 e8 3a e3 ff ff 85 c0 74 04 <0f>
0b eb fe 8b
> 04 24 8b 54 24 0c c1 e0 05 8b 04 10 c1 e8 1e 69
>     EIP: [<c0102e8a>] set_aliased_prot+0x9a/0x110 SS:ESP
0069:ead9da10
>     ---[ end trace a8da5f670b3c74bb ]---
> 
> Several processes have been seen to trigger it - audispd gam java klogd
ntpd
> perl syslogd
> Eventually init will trigger it and cause a kernel panic:
> 
>     Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b
> 
> I''ve done some Googling to see if this is a known issue. All I
could find
> was a closed Debian bug from Dec 2010 with no resolution:
> 
>     http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=607709 
> 
> Please let me know if you need further info from me.
> 
> Thanks,
> James Sinclair

Jan Beulich

2013-Mar-12 07:45 UTC

head link

Re: DomU: kernel BUG at arch/x86/xen/enlighten.c:425

>>> On 12.03.13 at 04:56, James Sinclair
<james.sinclair@linode.com> wrote:
(re-adding xen-devel to Cc)
> I''ve done further testing and I have confirmed that I can trigger
the bug
> pretty reliably on several Xen versions:
> 
>     * xen-3.4.4
>     * xen-4.1.4
>     * xen-4.1.5-pre
> 
> Dom0 kernels range between 2.6.18 & 3.7.7-1. I''ve also tried
quite a few DomU
> kernels and I can trigger it in everything I''ve tried between
2.6.39.1 and
> 3.7.10. In the end, I haven''t managed to find a combination of
Xen, Dom0 &
> DomU where I cannot trigger it.
Partly because you apparently never tried non-pvops DomU kernels.
But that''s not the point here.
> As for the hypervisor log, I''m not seeing anything get logged at
all. I
> suspect that I''m not capturing them correctly. Using Xen 4.1.4 I
added the
> following to the command line:
> 
>     loglvl=all guest_loglvl=all sync_console console_to_ring
earlyprintk=xen
> debug loglevel=8
Now, this mixture of hypervisor and kernel options already
suggests that you don''t look in the right place. Please put Xen
options on the Xen command line, and kernel ones on the kernel
line.
> I''m seeing extra logs from Xen during boot, but nothing when the
bug is
> triggered in the DomU. I''m assuming any extra logging will show up
on the
> console or in /var/log/xen/xend.log - is there somewhere else I should be 
> looking? Or something I''m not setting correctly?
They show up on the serial console (if in use) or in the output of
"xl dmesg" (or "xm dmesg" if xend is in use). Without extra
precautions, they _won''t_ show up anywhere under /var/log.

Jan

James Sinclair

2013-Mar-12 22:56 UTC

head link

Re: DomU: kernel BUG at arch/x86/xen/enlighten.c:425

On 12/03/2013, at 6:45 PM, Jan Beulich <JBeulich@suse.com> wrote:
>>>> On 12.03.13 at 04:56, James Sinclair
<james.sinclair@linode.com> wrote:
> 
> (re-adding xen-devel to Cc)
> 
>> I''ve done further testing and I have confirmed that I can
trigger the bug
>> pretty reliably on several Xen versions:
>> 
>>    * xen-3.4.4
>>    * xen-4.1.4
>>    * xen-4.1.5-pre
>> 
>> Dom0 kernels range between 2.6.18 & 3.7.7-1. I''ve also
tried quite a few DomU
>> kernels and I can trigger it in everything I''ve tried between
2.6.39.1 and
>> 3.7.10. In the end, I haven''t managed to find a combination of
Xen, Dom0 &
>> DomU where I cannot trigger it.
> 
> Partly because you apparently never tried non-pvops DomU kernels.
> But that''s not the point here.
> 
>> As for the hypervisor log, I''m not seeing anything get logged
at all. I
>> suspect that I''m not capturing them correctly. Using Xen 4.1.4
I added the
>> following to the command line:
>> 
>>    loglvl=all guest_loglvl=all sync_console console_to_ring
earlyprintk=xen
>> debug loglevel=8
> 
> Now, this mixture of hypervisor and kernel options already
> suggests that you don''t look in the right place. Please put Xen
> options on the Xen command line, and kernel ones on the kernel
> line.
The documentation I found that mentioned using "earlyprintk=xen debug
loglevel=8" did not make it clear those are kernel options. I''ve
corrected that now:

    Hypervisor options: loglvl=all guest_loglvl=all sync_console console_to_ring
    Kernel options: earlyprintk=xen debug loglevel=8
> 
>> I''m seeing extra logs from Xen during boot, but nothing when
the bug is
>> triggered in the DomU. I''m assuming any extra logging will
show up on the
>> console or in /var/log/xen/xend.log - is there somewhere else I should
be
>> looking? Or something I''m not setting correctly?
> 
> They show up on the serial console (if in use) or in the output of
> "xl dmesg" (or "xm dmesg" if xend is in use). Without
extra
> precautions, they _won''t_ show up anywhere under /var/log.
I''m not seeing anything extra logged on the serial console nor in
"xl dmesg" when the bug hits. The only log entry I''m getting
is a warning from XendDomainInfo that the domain has crashed, and
that''s only logged when the bug hits a critical process (usually init)
that causes the domU kernel to panic.

I''m working on building a hypervisor with "debug=y" enabled
to see if that gives me anything extra.

James

James Sinclair

2013-Mar-13 00:38 UTC

head link

Re: DomU: kernel BUG at arch/x86/xen/enlighten.c:425

On 13/03/2013, at 9:56 AM, James Sinclair <james.sinclair@linode.com>
wrote:
> On 12/03/2013, at 6:45 PM, Jan Beulich <JBeulich@suse.com> wrote:
> 
>>>>> On 12.03.13 at 04:56, James Sinclair
<james.sinclair@linode.com> wrote:
>> 
>> (re-adding xen-devel to Cc)
>> 
>>> I''ve done further testing and I have confirmed that I can
trigger the bug
>>> pretty reliably on several Xen versions:
>>> 
>>>   * xen-3.4.4
>>>   * xen-4.1.4
>>>   * xen-4.1.5-pre
>>> 
>>> Dom0 kernels range between 2.6.18 & 3.7.7-1. I''ve also
tried quite a few DomU
>>> kernels and I can trigger it in everything I''ve tried
between 2.6.39.1 and
>>> 3.7.10. In the end, I haven''t managed to find a
combination of Xen, Dom0 &
>>> DomU where I cannot trigger it.
>> 
>> Partly because you apparently never tried non-pvops DomU kernels.
>> But that''s not the point here.
>> 
>>> As for the hypervisor log, I''m not seeing anything get
logged at all. I
>>> suspect that I''m not capturing them correctly. Using Xen
4.1.4 I added the
>>> following to the command line:
>>> 
>>>   loglvl=all guest_loglvl=all sync_console console_to_ring
earlyprintk=xen
>>> debug loglevel=8
>> 
>> Now, this mixture of hypervisor and kernel options already
>> suggests that you don''t look in the right place. Please put
Xen
>> options on the Xen command line, and kernel ones on the kernel
>> line.
> 
> The documentation I found that mentioned using "earlyprintk=xen debug
loglevel=8" did not make it clear those are kernel options. I''ve
corrected that now:
> 
>    Hypervisor options: loglvl=all guest_loglvl=all sync_console
console_to_ring
>    Kernel options: earlyprintk=xen debug loglevel=8
> 
>> 
>>> I''m seeing extra logs from Xen during boot, but nothing
when the bug is
>>> triggered in the DomU. I''m assuming any extra logging will
show up on the
>>> console or in /var/log/xen/xend.log - is there somewhere else I
should be
>>> looking? Or something I''m not setting correctly?
>> 
>> They show up on the serial console (if in use) or in the output of
>> "xl dmesg" (or "xm dmesg" if xend is in use).
Without extra
>> precautions, they _won''t_ show up anywhere under /var/log.
> 
> I''m not seeing anything extra logged on the serial console nor in
"xl dmesg" when the bug hits. The only log entry I''m getting
is a warning from XendDomainInfo that the domain has crashed, and
that''s only logged when the bug hits a critical process (usually init)
that causes the domU kernel to panic.
> 
> I''m working on building a hypervisor with "debug=y"
enabled to see if that gives me anything extra.
I''ve managed to set up a debug build of the 4.1.4 hypervisor and, with
logging turned all the way up (as far as I can tell), nothing gets logged when
the domU hits the kernel bug.

I''m not sure what to try next.

James

Ian Campbell

2013-Mar-13 09:50 UTC

head link

Re: DomU: kernel BUG at arch/x86/xen/enlighten.c:425

On Tue, 2013-03-12 at 22:56 +0000, James Sinclair wrote:> The documentation I found that mentioned using "earlyprintk=xen debug
> loglevel=8" did not make it clear those are kernel options.
Which document was this? Perhaps we can update it.
>  I''ve corrected that now:
> 
>     Hypervisor options: loglvl=all guest_loglvl=all sync_console
console_to_ring
>     Kernel options: earlyprintk=xen debug loglevel=8
Ian.

James Sinclair

2013-Mar-18 00:27 UTC

head link

Re: DomU: kernel BUG at arch/x86/xen/enlighten.c:425

On 13/03/2013, at 8:50 PM, Ian Campbell <Ian.Campbell@citrix.com> wrote:
> On Tue, 2013-03-12 at 22:56 +0000, James Sinclair wrote:
>> The documentation I found that mentioned using "earlyprintk=xen
debug
>> loglevel=8" did not make it clear those are kernel options.
> 
> Which document was this? Perhaps we can update it.
I thought it was in the wiki but after going through my browser''s
history of wiki pages I didn''t find it.

Anyway, I''ve corrected my configuration and triggered the bug - still
nothing being logged when it happens. What''s the next step? I there
something else I should try to diagnose the issue?

James

Xen devel - Mar 2013 - DomU: kernel BUG at arch/x86/xen/enlighten.c:425

DomU: kernel BUG at arch/x86/xen/enlighten.c:425

Re: DomU: kernel BUG at arch/x86/xen/enlighten.c:425

Re: DomU: kernel BUG at arch/x86/xen/enlighten.c:425

Re: DomU: kernel BUG at arch/x86/xen/enlighten.c:425

Re: DomU: kernel BUG at arch/x86/xen/enlighten.c:425

Re: DomU: kernel BUG at arch/x86/xen/enlighten.c:425

Re: DomU: kernel BUG at arch/x86/xen/enlighten.c:425