I have seen the following kernel panic 5 times today on three different machines, two of which had been stable for months and one of which is a brand-new install. We are running the x86_64 Xen kernel and userland tools that came in the Xen 3.1.0 tarball from xen.org, on top of Scientific Linux (Red Hat clone) 5.1 or 5.2.

<Aug/28 12:21 pm>Unable to handle kernel NULL pointer dereference at 00000000000000f4 RIP:
<Aug/28 12:21 pm> [<ffffffff88256375>] :ipv6:rt6_select+0x38/0x1f4
<Aug/28 12:21 pm>PGD 70010067 PUD 715bf067 PMD 0
<Aug/28 12:21 pm>Oops: 0000 [1] SMP
<Aug/28 12:21 pm>CPU 0
<Aug/28 12:21 pm>Modules linked in: dell_rbu firmware_class ipmi_devintf ipmi_si ipmi_msghandler mptctl mptbase nls_utf8 nfs lockd nfs_acl xt_physdev iptable_filter ip_tables x_tables bridge ipv6 autofs4 hidp rfcomm l2cap bluetooth sunrpc binfmt_misc dm_multipath video thermal sbs processor i2c_ec i2c_core fan container button battery asus_acpi ac parport_pc lp parport floppy ide_cd cdrom ide_floppy intel_rng joydev tsdev usbkbd usbmouse piix e752x_edac edac_mc sg e1000 usbhid pcspkr serio_raw siimage dm_snapshot dm_zero dm_mirror dm_mod ide_disk ata_piix libata megaraid_mbox sd_mod scsi_mod megaraid_mm ext3 jbd ehci_hcd ohci_hcd uhci_hcd usbcore
<Aug/28 12:21 pm>Pid: 3075, comm: avahi-daemon Tainted: GF 2.6.18-xen #1
<Aug/28 12:21 pm>RIP: e030:[<ffffffff88256375>] [<ffffffff88256375>] :ipv6:rt6_select+0x38/0x1f4
<Aug/28 12:21 pm>RSP: e02b:ffffffff80526b00 EFLAGS: 00010286
<Aug/28 12:21 pm>RAX: ffff88006cbd6000 RBX: ffffffff88283580 RCX: 000000000000000d
<Aug/28 12:21 pm>RDX: 0000000000000001 RSI: 000000000000000d RDI: ffff880070a3d4e0
<Aug/28 12:21 pm>RBP: ffff880070a3d4c0 R08: ffffffff8824f148 R09: ffffffff80526b60
<Aug/28 12:21 pm>R10: ffffffff88293906 R11: ffff880061730180 R12: ffff880053e99780
<Aug/28 12:21 pm>R13: 0000000000000000 R14: 0000000000000003 R15: 0000000000000001
<Aug/28 12:21 pm>FS: 00002b34a5da6370(0000) GS:ffffffff804d3000(0000) knlGS:0000000000000000
<Aug/28 12:21 pm>CS: e033 DS: 0000 ES: 0000
<Aug/28 12:21 pm>Process avahi-daemon (pid: 3075, threadinfo ffff88006fc8a000, task ffff880000b0c860)
<Aug/28 12:21 pm>Stack: 0000000080526bb8 00000000ffffffff 0000000000000000 0000000000000000
<Aug/28 12:21 pm> 0000000d00000001 ffff880070a3d4e0 ffffffff8824f148 ffffffff88283580
<Aug/28 12:21 pm> ffff880070a3d4c0 ffff880053e99780 0000000000000000 0000000000000003
<Aug/28 12:21 pm>Call Trace:
<Aug/28 12:21 pm> <IRQ> [<ffffffff8824f148>] :ipv6:ip6_rcv_finish+0x0/0x28
<Aug/28 12:21 pm> [<ffffffff882568e7>] :ipv6:ip6_route_input+0x70/0x1cf
<Aug/28 12:21 pm> [<ffffffff8824f3c5>] :ipv6:ipv6_rcv+0x255/0x2ba
<Aug/28 12:21 pm> [<ffffffff80395cbc>] netif_receive_skb+0x2d3/0x2f3
<Aug/28 12:21 pm> [<ffffffff8828f9b4>] :bridge:br_pass_frame_up+0x64/0x66
<Aug/28 12:21 pm> [<ffffffff8828fa7a>] :bridge:br_handle_frame_finish+0xc4/0xf6
<Aug/28 12:21 pm> [<ffffffff88292e57>] :bridge:br_nf_pre_routing_finish_ipv6+0xdf/0xe3
<Aug/28 12:21 pm> [<ffffffff882935e6>] :bridge:br_nf_pre_routing+0x39b/0x667
<Aug/28 12:21 pm> [<ffffffff803ad73c>] nf_iterate+0x52/0x79
<Aug/28 12:21 pm> [<ffffffff8828f9b6>] :bridge:br_handle_frame_finish+0x0/0xf6
<Aug/28 12:21 pm> [<ffffffff803ad7d6>] nf_hook_slow+0x73/0xea
<Aug/28 12:21 pm> [<ffffffff8828f9b6>] :bridge:br_handle_frame_finish+0x0/0xf6
<Aug/28 12:21 pm> [<ffffffff8828fc43>] :bridge:br_handle_frame+0x167/0x190
<Aug/28 12:21 pm> [<ffffffff80395c14>] netif_receive_skb+0x22b/0x2f3
<Aug/28 12:21 pm> [<ffffffff88107a79>] :e1000:e1000_clean_rx_irq+0x430/0x4d5
<Aug/28 12:21 pm> [<ffffffff881074ec>] :e1000:e1000_clean+0x82/0x160
<Aug/28 12:21 pm> [<ffffffff80395f51>] net_rx_action+0xe7/0x254
<Aug/28 12:21 pm> [<ffffffff80233d97>] __do_softirq+0x7b/0x10d
<Aug/28 12:21 pm> [<ffffffff8020b094>] call_softirq+0x1c/0x28
<Aug/28 12:21 pm> [<ffffffff8020cdfd>] do_softirq+0x62/0xd9
<Aug/28 12:21 pm> [<ffffffff8020cc9c>] do_IRQ+0x68/0x71
<Aug/28 12:21 pm> [<ffffffff8034b347>] evtchn_do_upcall+0xee/0x165
<Aug/28 12:21 pm> [<ffffffff8020abca>] do_hypervisor_callback+0x1e/0x2c
<Aug/28 12:21 pm> <EOI>

<Aug/28 12:21 pm>Code: 41 8b 85 f4 00 00 00 4d 85 ed 4d 89 ec 89 44 24 0c 0f 84 36
<Aug/28 12:21 pm>RIP [<ffffffff88256375>] :ipv6:rt6_select+0x38/0x1f4
<Aug/28 12:21 pm> RSP <ffffffff80526b00>
<Aug/28 12:21 pm>CR2: 00000000000000f4
<Aug/28 12:21 pm> <0>Kernel panic - not syncing: Aiee, killing interrupt handler

------------------------------------------------

There are different process PIDs that show up as the triggering process, but the base error is the same. A couple of times it is triggered by the swapper.

What is puzzling is the references to ipv6, which I was pretty sure I had disabled everywhere. To be clear, these crashes are from the dom0, and when it happens the dom0 hangs and does not auto-reboot; it requires a reset.

Any ideas? This config has been pretty stable for us on 7 different machines, including these ones. A couple of times it happened just about the time we were shutting down a Xen domU; a couple of other times today it happened on a machine that I wasn't even working on.

Steve Timm

--
------------------------------------------------------------------
Steven C. Timm, Ph.D  (630) 840-8525
timm@fnal.gov  http://home.fnal.gov/~timm/
Fermilab Computing Division, Scientific Computing Facilities,
Grid Facilities Department, FermiGrid Services Group, Assistant Group Leader.

_______________________________________________
Xen-users mailing list
Xen-users@lists.xensource.com
http://lists.xensource.com/xen-users
On Thu, Aug 28, 2008 at 5:52 PM, Steven Timm <timm@fnal.gov> wrote:
> I have seen the following kernel panic 5 times today on
> three different machines, two of which had been stable
> for months and one of which is a brand new install.

I just have to ask the obvious question: what changed? Something in your environment, maybe?

Have you figured out a way to reliably reproduce it? If so, it may be worthwhile to set "debug = y" in Config.mk in the Xen source tree.

From the messages it does look network related...

Hope that helps,
Cheers,
Todd

--
Todd Deshane
http://todddeshane.net
check out our book: http://runningxen.com
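Todd's suggestion amounts to a one-line change at the top of the Xen source tree. A minimal sketch, assuming the 3.1.0 Config.mk carries a `debug ?= n` line; the file below is a throwaway stand-in, not the real tree:

```shell
# Create a stand-in Config.mk; on a real box, edit the one at the top
# of the extracted Xen 3.1.0 source tree instead.
cat > /tmp/Config.mk <<'EOF'
# -*- mode: Makefile; -*-
debug ?= n
verbose ?= n
EOF

# Flip the debug flag; a rebuild and reinstall of the hypervisor and
# tools is then needed for it to take effect.
sed -i 's/^debug ?= n/debug ?= y/' /tmp/Config.mk
grep '^debug' /tmp/Config.mk   # -> debug ?= y
```

A debug build makes the hypervisor emit far more diagnostic output on the serial/console log, which is what you want before the next crash.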
Hi,

I'm also using e1000 on my domUs. I have been keeping track of e1000 internal function-calling sequences (as part of my project). I do not see any ipv6-related calls because it is disabled. I would encourage you to double-check whether ipv6 is actually disabled. I don't remember exactly, but it was a little difficult actually disabling it.

Regards,
Asim

On 8/28/08, Steven Timm <timm@fnal.gov> wrote:
> I have seen the following kernel panic 5 times today on
> three different machines, two of which had been stable
> for months and one of which is a brand new install.
>
> We are running the x86_64 xen kernel and userland tools that came in the
> Xen 3.1.0 tarball from xen.org, on top of scientific linux (redhat clone)
> 5.1 or 5.2.
[full oops trace quoted above, snipped]
On Thu, 28 Aug 2008, Todd Deshane wrote:
> On Thu, Aug 28, 2008 at 5:52 PM, Steven Timm <timm@fnal.gov> wrote:
>> I have seen the following kernel panic 5 times today on
>> three different machines, two of which had been stable
>> for months and one of which is a brand new install.
>
> I just have to ask the obvious question, what changed?
> Something in your environment maybe?

Today we did a heavy rsync load from the old machine to the new machine, and shut down some of the VMs for the first time since we'd deployed the old machine. Two of the crashes happened around that time (on both the source and the destination machine). Another machine also crashed, which we were not working on, but it is the half of the HA-squid server that was supposed to pick up the load. (These machines are predominantly squid servers.)

I'm in the process of changing all my servers from PowerEdge 2850 to 2950. I've deployed 3 other 2950s of near-identical OS configuration that hadn't crashed. But in today's events, both the 2850 I rsync'ed off of, and the 2950 I rsync'ed on to, crashed at one point or another.

> Have you figured out a way to reliably reproduce it?

It seems like the key is lots of ongoing I/O simultaneous with one or more VMs getting restarted. Since all the post-install reboots of the various Xen instances finished, we've been fine.

> If so, it may be worthwhile to set "debug = y" in Config.mk in
> the Xen source tree

I didn't build this source; this is the vanilla 2.6.18-xen kernel that was available in the Xen 3.1.0 tarballs from xen.org.

> From the messages it does look network related...

Any way to check if there is spare ipv6 configuration lying around somewhere, possibly in the domUs?

Steve

> Hope that helps,
> Cheers,
> Todd
>
> --
> Todd Deshane
> http://todddeshane.net
> check out our book: http://runningxen.com
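On a RHEL5-style system, one common place leftover IPv6 configuration hides is /etc/modprobe.conf: unless the module is aliased off there, the kernel autoloads ipv6 the first time anything opens an AF_INET6 socket (avahi-daemon, in the oops above, will do exactly that). A hedged sketch of the check, run here against a throwaway sample file — the `net-pf-10` / `install ipv6` idioms are the conventional RHEL5 ones, assumed rather than quoted from this thread:

```shell
# Sample file standing in for /etc/modprobe.conf on dom0 and each domU.
cat > /tmp/modprobe.conf <<'EOF'
alias eth0 e1000
alias scsi_hostadapter megaraid_mbox
EOF

# ipv6 autoloading is only really off if one of these lines is present.
if grep -Eq '^(alias net-pf-10 off|install ipv6 /bin/true)' /tmp/modprobe.conf; then
    echo "ipv6 autoload is disabled"
else
    echo "ipv6 can still be autoloaded"
fi
# -> ipv6 can still be autoloaded
```

Running the same grep against the real /etc/modprobe.conf on each box (and `lsmod | grep ipv6` for the current state) would answer the "is it really off?" question directly.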
Thanks, but it should be noted that I saw the crash both on the 2850, which has e1000 network, and on the 2950, which has Broadcom. I'll look for the new driver.

Thanks
Steve

On Thu, 28 Aug 2008, Tim Wickberg wrote:
> Steven Timm wrote:
>> I'm in the process of changing all my servers from poweredge 2850
>> to 2950. I've deployed 3 other 2950s of near-identical OS configuration
>> that hadn't crashed. But in today's events, both the 2850 I rsync'ed off
>> of, and the 2950 I rsync'ed on to, crashed at one point or another.
>
> One related issue with the Broadcom network cards in the PowerEdges:
>
> The bnx2 driver in the Xen 2.6.18 tree is out of date and kept crashing
> under heavy load. Installing the bnx2-1.7.1c driver seems to have cleared
> this up. (Don't forget to update-initrd after installing it, or you'll
> reboot into the same crash-y driver.)
>
> Xen bug report here:
> http://bugzilla.xensource.com/bugzilla/show_bug.cgi?id=1294
>
> - Tim
>
> --
> Tim Wickberg
> wickbt@rpi.edu
> Senior System Administrator
> Office of Research - Rensselaer Polytechnic Institute
Does it really matter what you have in /etc/modprobe.conf inside the domUs? (Note that the crash below is a crash of dom0.) On one of the machines my physical network is an e1000, but I have bnx2 in modprobe.conf, and neither module is actually loaded in the domU.

Here's ip6tables from one of the domUs:

[root@fg2x2 ~]# ip6tables -L
Chain INPUT (policy ACCEPT)
target               prot opt source    destination
RH-Firewall-1-INPUT  all      anywhere  anywhere

Chain FORWARD (policy ACCEPT)
target               prot opt source    destination
RH-Firewall-1-INPUT  all      anywhere  anywhere

Chain OUTPUT (policy ACCEPT)
target  prot opt source  destination

Chain RH-Firewall-1-INPUT (2 references)
target  prot       opt source    destination
ACCEPT  all            anywhere  anywhere
ACCEPT  ipv6-icmp      anywhere  anywhere
ACCEPT  esp            anywhere  anywhere
ACCEPT  ah             anywhere  anywhere
ACCEPT  udp            anywhere  ff02::fb/128  udp dpt:mdns
ACCEPT  udp            anywhere  anywhere      udp dpt:ipp
ACCEPT  tcp            anywhere  anywhere      tcp dpt:ipp
ACCEPT  udp            anywhere  anywhere      udp dpts:filenet-tms:61000
ACCEPT  tcp            anywhere  anywhere      tcp dpts:filenet-tms:61000
ACCEPT  tcp            anywhere  anywhere      tcp dpt:ssh
REJECT  all            anywhere  anywhere      reject-with icmp6-port-unreachable

Does that mean it's on?

Steve Timm

> Hi,
>
> I'm also using e1000 on my domUs. I have been keeping track of e1000
> internal function calling sequences (as a part of my project). I do
> not see any ipv6 related calls because it is disabled. I would
> encourage you to double check whether ipv6 is actually disabled. I
> don't remember exactly but it was a little difficult actually
> disabling it.
>
> Regards,
> Asim
> On 8/28/08, Steven Timm <timm@fnal.gov> wrote:
>> I have seen the following kernel panic 5 times today on
>> three different machines, two of which had been stable
>> for months and one of which is a brand new install.
[full oops trace quoted above, snipped]
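A note on the ip6tables listing above: it only proves that an ip6tables policy is loaded, not that the IPv6 stack itself is active. A more direct check (a generic Linux idiom, not something from this thread) is whether /proc/net/if_inet6 exists, since that file is present exactly while the ipv6 code is loaded:

```shell
# /proc/net/if_inet6 only exists while the ipv6 module (or a built-in
# IPv6 stack) is loaded; ip6tables output says nothing either way.
if [ -e /proc/net/if_inet6 ]; then
    echo "ipv6 stack is active"
else
    echo "ipv6 stack is not loaded"
fi
```

Running this in dom0 and in each domU would show which side is actually handling the IPv6 traffic that the bridge is passing up.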
Hi Steve,

On Thu, 2008-08-28 at 16:52 -0500, Steven Timm wrote:
> I have seen the following kernel panic 5 times today on
> three different machines, two of which had been stable
> for months and one of which is a brand new install.

[snip]

> <Aug/28 12:21 pm> [<ffffffff88107a79>] :e1000:e1000_clean_rx_irq+0x430/0x4d5
> <Aug/28 12:21 pm> [<ffffffff881074ec>] :e1000:e1000_clean+0x82/0x160
> <Aug/28 12:21 pm> [<ffffffff80395f51>] net_rx_action+0xe7/0x254
> <Aug/28 12:21 pm> [<ffffffff80233d97>] __do_softirq+0x7b/0x10d
> <Aug/28 12:21 pm> [<ffffffff8020b094>] call_softirq+0x1c/0x28
> <Aug/28 12:21 pm> [<ffffffff8020cdfd>] do_softirq+0x62/0xd9
> <Aug/28 12:21 pm> [<ffffffff8020cc9c>] do_IRQ+0x68/0x71
> <Aug/28 12:21 pm> [<ffffffff8034b347>] evtchn_do_upcall+0xee/0x165
> <Aug/28 12:21 pm> [<ffffffff8020abca>] do_hypervisor_callback+0x1e/0x2c
> <Aug/28 12:21 pm> <EOI>
>
> <Aug/28 12:21 pm>Code: 41 8b 85 f4 00 00 00 4d 85 ed 4d 89 ec 89 44 24 0c 0f 84 36
> <Aug/28 12:21 pm>RIP [<ffffffff88256375>] :ipv6:rt6_select+0x38/0x1f4
> <Aug/28 12:21 pm> RSP <ffffffff80526b00>
> <Aug/28 12:21 pm>CR2: 00000000000000f4
> <Aug/28 12:21 pm> <0>Kernel panic - not syncing: Aiee, killing interrupt handler

It looks like e1000 might be the module being tripped up. From what I gather from your message, the only thing that changed is that you are now putting a much higher I/O demand on the drives (rsyncing everything); by extension this increases the demand on the NIC.

If the e1000 NIC is the one enslaved to the bridge, it could be the cleanup that makes it freak out when a guest stops. If it is ejected uncleanly, the PID next in line with pending I/O for the device will likely be identified as the culprit.

I had a very similar problem with a buggy Areca driver on dom0 a couple of years ago.

Can you post a link to your kernel's .config, or perhaps try the latest stable version of that module from:

http://sourceforge.net/project/showfiles.php?group_id=42302

As for ipv6, if it is being set up you'll see it in /etc/sysconfig or /etc/network (depending on the distro) pretty clearly. However, that shouldn't make a difference .. it should work either way.

Hope this helps :)

Cheers!
--Tim

--
Monkey + Typewriter = Echoreply ( http://echoreply.us )
Hi Steve,

On Thu, 2008-08-28 at 21:29 -0500, Steven Timm wrote:
> Does it really matter what you have in /etc/modprobe.conf inside
> the domUs?

Not really. You only need modules for non-devices, unless you're using PCI passthrough to the guest: for instance iptables, ext3, jfs, etc., if these aren't statically compiled into the kernel. In most cases these would be on an initrd anyway. Unless the netfront/blockfront drivers are not static in the kernel, you shouldn't have to worry about anything else. If you exported a PCI device, you'd need the module for that device.

> (note that the crash below is a crash of dom0).
> In one of the machines, my physical network is an e1000
> but I have bnx2 in modprobe.conf but neither module is actually loaded
> in the domU.

I'm really starting to suspect that e1000 is your issue. However, having those other modules in the domU shouldn't really matter; the crash is on dom0.

> Here's ip6tables from one of the domUs

[snip]

That just shows that you have ipv6 iptables support .. shouldn't matter. You could recompile a special kernel for dom0 or the domUs if you're never going to use ipv6 .. however, just having support for it should not be causing this problem.

Cheers,
--Tim
On Thu, Aug 28, 2008 at 04:52:44PM -0500, Steven Timm wrote:
> <Aug/28 12:21 pm>Pid: 3075, comm: avahi-daemon Tainted: GF 2.6.18-xen #1

Forced module load somewhat voids your warranty; don't do that. The module version information is there for a reason.

Bastian

--
There is a multi-legged creature crawling on your shoulder.
		-- Spock, "A Taste of Armageddon", stardate 3193.9
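Bastian's point hinges on the "Tainted: GF" in the oops header: on 2.6.18, the `F` flag means some module was loaded with force (`modprobe -f` / `insmod -f`). A small sketch of decoding the taint word; the value 2 is hard-coded here to match "GF", whereas on a live dom0 you would substitute `$(cat /proc/sys/kernel/tainted)`:

```shell
# Stand-in for $(cat /proc/sys/kernel/tainted); 2 corresponds to "Tainted: GF".
taint=2

# Bit meanings as printed by the 2.6.18-era print_tainted().
if [ $((taint & 1)) -ne 0 ]; then echo "P: proprietary module loaded"; fi
if [ $((taint & 2)) -ne 0 ]; then echo "F: module was force-loaded"; fi
if [ $((taint & 8)) -ne 0 ]; then echo "R: module was force-removed"; fi
# prints: F: module was force-loaded
```

So the oops itself is telling you that at least one module on these boxes was force-loaded against version checks, which is worth tracking down independently of the crash.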
A couple of people have pointed at the e1000 driver as a possible culprit and given good reasons why that should be the case. My only question is: why did I also get the same kernel panic on the new PowerEdge 2950, which doesn't have an Intel e1000 but Broadcom drivers and NICs?

By the way, all the systems in question have now been up for 18 hours and functioning fine, so once we got first the rsyncing done, and then the squid servers all re-initialized correctly, we have been OK since then.

I am away from the office but I will follow up the thread and post the kernel config later.

Steve Timm

On Fri, 29 Aug 2008, Tim Post wrote:
> Hi Steve,
>
> On Thu, 2008-08-28 at 16:52 -0500, Steven Timm wrote:
>> I have seen the following kernel panic 5 times today on
>> three different machines, two of which had been stable
>> for months and one of which is a brand new install.
[Tim's full analysis quoted above, snipped]
Hi Steve,

On Fri, 2008-08-29 at 09:32 -0500, Steven Timm wrote:
> A couple of people have pointed at the e1000 driver as a possible culprit
> and given good reasons why that should be the case .. my only question
> is why did I also get the same kernel panic on the new PowerEdge 2950,
> which doesn't have an Intel e1000 but Broadcom drivers and NICs?

Well, it _could_ be part of the problem and needs to be eliminated :)

> By the way, all the systems in question have now been up for 18 hours
> and functioning fine, so once we got first the rsyncing done, and
> then the squid servers all re-initialized correctly, we have been
> OK since then.

So this issue is definitely dependent on I/O. Since the NIC is not common, is the onboard chipset/RAID common on both? It would be helpful if you could send links to the output of lsmod, lspci and the .configs for both.

Cheers,
--Tim
Thanks for the advice from all. Some have said that I should get new
versions of the e1000 and Broadcom drivers for the appropriate hardware.
It turns out that once I disabled IPv6 as best I could, that was enough
to solve the problem in question. I rsync'ed big squid servers and
rebooted and tarred and did everything today that I did a couple of
weeks ago, and didn't have any other nodes unexpectedly reboot.

Steve Timm

On Thu, 28 Aug 2008, Asim wrote:
> Hi,
>
> I'm also using e1000 on my domUs. I have been keeping track of e1000
> internal function calling sequences (as part of my project). I do not
> see any ipv6-related calls because it is disabled. I would encourage
> you to double-check whether ipv6 is actually disabled. I don't
> remember exactly, but it was a little difficult actually disabling it.
>
> Regards,
> Asim
>
> On 8/28/08, Steven Timm <timm@fnal.gov> wrote:
>>
>> I have seen the following kernel panic 5 times today on
>> three different machines, two of which had been stable
>> for months and one of which is a brand new install.
>>
>> We are running the x86_64 xen kernel and userland tools that came in
>> the Xen 3.1.0 tarball from xen.org, on top of Scientific Linux (Red
>> Hat clone) 5.1 or 5.2.
>>
>> [full panic trace snipped -- it is the same trace quoted earlier in
>> the thread]
>>
>> ------------------------------------------------
>>
>> There are different process PIDs that show as the triggering process,
>> but the base error is the same. A couple of times it is triggered by
>> the swapper.
>>
>> What is puzzling is the references to ipv6, which I was pretty sure I
>> had disabled everywhere. To be clear, these crashes are from the
>> dom0, and when it happens the dom0 hangs and does not auto-reboot; it
>> requires a reset.
>>
>> Any ideas? This config has been pretty stable for us on 7 different
>> machines, including these ones. A couple of times it happened just
>> about the time we were shutting down a Xen domU; a couple of other
>> times today it happened on a machine that I wasn't even working on.
>>
>> Steve Timm
>>
>> --
>> ------------------------------------------------------------------
>> Steven C. Timm, Ph.D          (630) 840-8525
>> timm@fnal.gov  http://home.fnal.gov/~timm/
>> Fermilab Computing Division, Scientific Computing Facilities,
>> Grid Facilities Department, FermiGrid Services Group,
>> Assistant Group Leader.
>>

--
------------------------------------------------------------------
Steven C. Timm, Ph.D          (630) 840-8525
timm@fnal.gov  http://home.fnal.gov/~timm/
Fermilab Computing Division, Scientific Computing Facilities,
Grid Facilities Department, FermiGrid Services Group,
Assistant Group Leader.
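For reference, the "disabled everywhere" Steve describes usually meant the following RHEL/CentOS 5 settings. This is a config fragment reconstructed from common practice of the time, not copied from Steve's actual configs:

```shell
# /etc/sysconfig/network -- turn the initscripts' IPv6 support off:
#   NETWORKING_IPV6=no
#
# /etc/modprobe.conf -- keep the ipv6 module from being loaded at all
# (the sysconfig knob alone does not unload an already-loaded module):
#   alias net-pf-10 off
#   alias ipv6 off
#
# A reboot (or "rmmod ipv6", if nothing holds a reference) is needed
# afterwards; "lsmod | grep ipv6" should then come back empty.
```

The `alias net-pf-10 off` line is the important one: net-pf-10 is the IPv6 protocol family, so this stops the kernel from auto-loading the module when something (like avahi-daemon, the process in the oops) opens an IPv6 socket.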
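One footnote on the trace itself, decoded after the fact (my reading, not something stated in the thread): the Code: bytes `41 8b 85 f4 00 00 00` disassemble to `mov eax, [r13+0xf4]`, and the register dump shows R13 = 0000000000000000, so rt6_select() read a field at offset 0xf4 through a NULL pointer .. which is exactly the CR2 value reported. The arithmetic, spelled out:

```shell
#!/bin/sh
# Values read straight off the oops: R13 is the (NULL) pointer used by
# the faulting "mov eax,[r13+0xf4]"; 0xf4 is that instruction's
# displacement. Their sum reproduces the reported CR2.
r13=0x0
off=0xf4
printf 'faulting address: %#x\n' $(( r13 + off ))   # matches CR2: 00000000000000f4
```

This is consistent with the ipv6 module (rather than e1000 or the bridge) being where the bad pointer was dereferenced, which fits Steve's result that disabling IPv6 made the panics stop.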