thr3ads.net - LARTC - Kernel panic with load balancing [Mar 2006]

Eduardo Fernández

2006-Mar-08 05:24 UTC

Kernel panic with load balancing

Hi!

I have 5 DSL routers and a Linux router connected to one switch. The Linux
router does NAT for a LAN, but it freezes when I set up load
balancing with this command:

ip route add default proto static\
        nexthop via 192.168.1.10 dev eth1\
        nexthop via 192.168.1.20 dev eth1\
        nexthop via 192.168.1.30 dev eth1\
        nexthop via 192.168.1.40 dev eth1\
        nexthop via 192.168.1.50 dev eth1
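
For readers who want to reproduce the setup with a different number of gateways, the command above can be generated from a list. This is only a sketch: build_multipath_cmd is a hypothetical helper name, and the explicit "weight 1" is an assumption (the original command does not set a weight). The helper prints the command instead of running it, so it can be reviewed first.

```shell
#!/bin/sh
# Print (do not run) a multipath default-route command for one device
# and any number of gateways. build_multipath_cmd is a hypothetical name.
build_multipath_cmd() {
    dev=$1
    shift
    cmd="ip route add default proto static"
    for gw in "$@"; do
        # "weight 1" asks for an equal share per next hop; this is an
        # assumption, the command in the post leaves the weight unset.
        cmd="$cmd nexthop via $gw dev $dev weight 1"
    done
    printf '%s\n' "$cmd"
}

# Example with the five DSL routers from the post:
build_multipath_cmd eth1 192.168.1.10 192.168.1.20 192.168.1.30 192.168.1.40 192.168.1.50
```

The printed line can be inspected and then run as root once it looks right.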

192.168.1.{10,20,30,40,50} are the DSL routers' IPs. The Linux router's
kernel is 2.6.15.6 patched with Julian Anastasov's patches
(http://www.ssi.bg/~ja/#routes). I've also tried other kernel
versions. The kernel panic follows:

CPU: 0
EIP: 0060:[<00000000>] Not tainted VLI
EFLAGS: 00010202       (2.6.15.6)
EIP is at _stext+0x3feffde0/0x1e
eax: d7cd4b00     ebx: c0379280    ecx: 000018ff    edx: c0379280
esi:  00000000     edi:  00000001   ebp:  003b6007 esp: c0341f5c
ds: 007b   es: 007b  ss: 0068
Process swapper (pid: 0, threadinfo=c0340000 task=c02ebb00)
Stack:
c011fb0e d7cd4b00 00000000 c03776e8 0000000a c011fcee c0379280 c03792bc
c0379280 c02f174c c03792b8 c0379240 c0116d7f  00000000 00000001 c0116b6c
c03776e8 00000046 0009f300 c0334800 c0116bd7 0002080b c0104296 c0102e6a
Call trace:
[<c011fb0e>] rcu_do_batch+0x1a/0x59
[<c011fcee>] rcu_process_callbacks+0x28/0x2c
[<c0116d7f>] tasklet_action+0x3a/0x59
[<c0116b6c>] __do_softirq+0x34/0x7d
[<c0116bd7>] do_softirq+0x22/0x26
[<c0104296>] do_IRQ+0x1e/0x24
[<c0102e6a>] common_interrupt+0x1a/0x20
[<c0100a77>] default_idle+0x2b/0x53
[<c0100af3>] cpu_idle+0x40/0x5c
[<c0342691>] start_kernel+0x13e/0x140
Code: Bad EIP value.
<0> Kernel panic - not syncing:Fatal exception in interrupt

Any ideas? Thank you very much!

Edu

Eduardo Fernández

2006-Mar-10 01:50 UTC

Re: Kernel panic with load balancing

Hi there,

I solved the problem by switching back to a 2.4 kernel. I don't know
what's wrong with 2.6.

Greets,

Edu



Sebastian Bork

2006-Mar-23 13:07 UTC

Re: Kernel panic with load balancing

On Wed, 2006-03-08 at 06:24 +0100, Eduardo Fernández wrote:
> <0> Kernel panic - not syncing:Fatal exception in interrupt
>
> Any ideas? Thank you very much!

Is it a SiS chipset and an Intel CPU? Reading some kernel sources, I
finally (after *months* of trouble) found the problem I had at a
customer's site. I used a router with a quad-port 100 Mbit NIC (for three DSL
modems) and three Gigabit NICs: one onboard, two cheap Realtek 8169s (1:
WLAN, 2: about a dozen clients, 3: two servers, one printer).

The driver for the Realtek cards generated one interrupt for each packet
(unlike the NAPI drivers, e.g. e1000, which keep the interrupt count low).
When the network was saturated, the interrupt controller did not handle
every interrupt in time (some chipsets, especially SiS, seem to simply skip
a handful of interrupts under these circumstances), and the whole system
froze with exactly the same message you quote. Every Thursday at about
17:00, everyone screamed because the net broke.
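
One way to check whether a NIC really raises roughly one interrupt per packet is to compare its counters in /proc/interrupts before and after a short interval under load. The following is a minimal sketch, not something used in the original debugging: irq_count and irq_rate are hypothetical helper names, and the parsing assumes the usual layout of an "NN:" label, one numeric column per CPU, then the controller and device names.

```shell
#!/bin/sh
# Sum the per-CPU interrupt counters for one device, reading
# /proc/interrupts-style text on stdin. Hypothetical helper.
irq_count() {
    awk -v dev="$1" '
        {
            hit = 0
            for (i = 2; i <= NF; i++) {
                name = $i
                sub(/,$/, "", name)      # shared IRQ lines look like "eth0, eth1"
                if (name == dev) hit = 1
            }
            if (!hit) next
            # Counter columns follow the "NN:" label and are purely numeric.
            for (i = 2; i <= NF && $i ~ /^[0-9]+$/; i++) total += $i
        }
        END { printf "%d\n", total }'
}

# Approximate interrupts per second for a device over an interval
# (default: 1 second). Hypothetical helper.
irq_rate() {
    dev=$1
    secs=${2:-1}
    before=$(irq_count "$dev" < /proc/interrupts)
    sleep "$secs"
    after=$(irq_count "$dev" < /proc/interrupts)
    echo $(( (after - before) / secs ))
}
```

Running something like `irq_rate eth1 5` during a large transfer and comparing the result with the packet rate gives a rough idea of whether the driver is coalescing interrupts.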

*** A report of my frustrating experience with a computer-illiterate
employee follows; you may skip the next paragraph (it only explains why
catastrophe always struck on Thursday at tea-time) ***

I only found the bug after waiting a whole Thursday afternoon, observing
every user, getting more and more nervous and for once hoping that the
shit *would* hit the fan that day. Reconstructing the last steps of an
accountant, who was the only employee leaving at exactly the time of
the crash, I finally saw the light. She ran a backup of the
accounting database over the LAN, and every time she started it, the
net crashed immediately (after which she switched off her PC and went
home, sent off by the screams of her colleagues, who lost quite a lot of
work every time the database server was suddenly unreachable). Of course
she never told anyone that the net always crashed exactly at the time
when she started her backup. And of course it never occurred to her that
she could, just once, *not* start her backup and see whether the net would
crash at 17:00 anyway, or whether her backup was messing things up. I was
quite frustrated that anyone could be so stupid, week after week trying
a backup which never succeeded. She was lucky she never needed to
restore the data in all those months.

The correct solution is to exchange the mainboard, because the chipset
is crap. My solution was to exchange the NICs, because it was cheaper
and faster in this case. (Of course everyone thought it was my fault
then, because I had originally bought the cheap NICs. I am not sure they
understood my explanation that what was really b0rken was the chipset of
the client PC they gave me for refitting as router/firewall/web proxy/name,
DHCP, VPN and everythingelseunderthesun-server. I learned to
request real server hardware for jobs like this one in the future.)

I replaced the two Realtek gigabit NICs with Intel Pro/1000 GT/MT
(desktop!) adapters (e1000 driver; I believe they use Intel's 82542
chipset). I bought them for 49 € each: not as cheap as the crappy
Realteks at 12.97 €, but not as expensive as the "server" adapters,
which sell for more than 120 €.

This immediately solved the problem for me. I hope this helps you.

Eduardo Fernández

2006-Mar-24 09:02 UTC

Re: Kernel panic with load balancing

Hi!

Thanks for your explanation. It seems my hardware is also buggy; lspci shows:

0000:00:00.0 Host bridge: ServerWorks GCNB-LE Host Bridge (rev 32)
0000:00:00.1 Host bridge: ServerWorks GCNB-LE Host Bridge
0000:00:02.0 Ethernet controller: Intel Corp. 82540EM Gigabit Ethernet Controller (rev 02)
0000:00:04.0 PCI bridge: Intel Corp. 21152 PCI-to-PCI Bridge
0000:00:06.0 Ethernet controller: Intel Corp. 82557/8/9 [Ethernet Pro 100] (rev 0c)
0000:00:0e.0 VGA compatible controller: ATI Technologies Inc Rage XL (rev 27)
0000:00:0f.0 Host bridge: ServerWorks CSB5 South Bridge (rev 93)
0000:00:0f.1 IDE interface: ServerWorks CSB5 IDE Controller (rev 93)
0000:00:0f.2 USB Controller: ServerWorks OSB4/CSB5 OHCI USB Controller (rev 05)
0000:00:0f.3 ISA bridge: ServerWorks CSB5 LPC bridge
0000:00:10.0 Host bridge: ServerWorks CIOB-X2 PCI-X I/O Bridge (rev 05)
0000:00:10.2 Host bridge: ServerWorks CIOB-X2 PCI-X I/O Bridge (rev 05)
0000:01:04.0 Ethernet controller: D-Link System Inc DL10050 Sundance Ethernet (rev 15)
0000:01:05.0 Ethernet controller: D-Link System Inc DL10050 Sundance Ethernet (rev 15)
0000:01:06.0 Ethernet controller: D-Link System Inc DL10050 Sundance Ethernet (rev 15)
0000:01:07.0 Ethernet controller: D-Link System Inc DL10050 Sundance Ethernet (rev 15)
0000:02:04.0 SCSI storage controller: LSI Logic / Symbios Logic 53c1030 PCI-X Fusion-MPT Dual Ultra320 SCSI (rev 07)

As you can see, I tried to buy real server hardware: this is a Dell
server and the NICs are fairly expensive. I've tried other NICs and
I get the same kernel panic. The only solution I've found is
to switch back to 2.4.29, but then I miss features from 2.6, like the
hashlimit module in netfilter.

Greets,

Edu

