Hi!

I have 5 DSL routers and a Linux router connected to the same switch. The Linux router is doing NAT for a LAN network, but it freezes when I set up load balancing with this command:

ip route add default proto static \
    nexthop via 192.168.1.10 dev eth1 \
    nexthop via 192.168.1.20 dev eth1 \
    nexthop via 192.168.1.30 dev eth1 \
    nexthop via 192.168.1.40 dev eth1 \
    nexthop via 192.168.1.50 dev eth1

192.168.1.{10,20,30,40,50} are the DSL routers' IPs. The Linux router's kernel is 2.6.15.6, patched with Julian Anastasov's patches (http://www.ssi.bg/~ja/#routes). I've also tried other kernel versions. The kernel panic follows:

CPU: 0
EIP: 0060:[<00000000>] Not tainted VLI
EFLAGS: 00010202 (2.6.15.6)
EIP is at _stext+0x3feffde0/0x1e
eax: d7cd4b00 ebx: c0379280 ecx: 000018ff edx: c0379280
esi: 00000000 edi: 00000001 ebp: 003b6007 esp: c0341f5c
ds: 007b es: 007b ss: 0068
Process swapper (pid: 0, threadinfo=c0340000 task=c02ebb00)
Stack:
c011fb0e d7cd4b00 00000000 c03776e8 0000000a c011fcee c0379280 c03792bc
c0379280 c02f174c c03792b8 c0379240 c0116d7f 00000000 00000001 c0116b6c
c03776e8 00000046 0009f300 c0334800 c0116bd7 0002080b c0104296 c0102e6a
Call trace:
[<c011fb0e>] rcu_do_batch+0x1a/0x59
[<c011fcee>] rcu_process_callbacks+0x28/0x2c
[<c0116d7f>] tasklet_action+0x3a/0x59
[<c0116b6c>] __do_softirq+0x34/0x7d
[<c0116bd7>] do_softirq+0x22/0x26
[<c0104296>] do_IRQ+0x1e/0x24
[<c0102e6a>] common_interrupt+0x1a/0x20
[<c0100a77>] default_idle+0x2b/0x53
[<c0100af3>] cpu_idle+0x40/0x5c
[<c0342691>] start_kernel+0x13e/0x140
Code: Bad EIP value.
<0> Kernel panic - not syncing: Fatal exception in interrupt

Any ideas? Thank you very much!

Edu
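A minimal sketch, assuming a POSIX shell and the same layout as in the post (all DSL routers reachable via eth1): the helper below, whose name `multipath_route_cmd` is hypothetical, only prints the multipath default route so it can be reviewed before being run as root. It uses `ip route replace` instead of `add`, so re-running it does not fail with "RTNETLINK answers: File exists", and it gives every nexthop an explicit `weight 1`. Both are editorial choices, not part of the original command, and neither works around the panic itself.

```sh
#!/bin/sh
# Sketch: build the equal-weight multipath default route from a gateway list.
# Assumptions (not from the original post): every gateway is reached through
# the same interface, and the helper name is hypothetical.

# multipath_route_cmd DEV GW...
# Prints the "ip route" command line; nothing is executed here.
multipath_route_cmd() {
    dev=$1
    shift
    # "replace" creates the route or overwrites an existing default route
    cmd="ip route replace default proto static"
    for gw in "$@"; do
        # one nexthop per DSL router; weight 1 gives each an equal share
        cmd="$cmd nexthop via $gw dev $dev weight 1"
    done
    printf '%s\n' "$cmd"
}
```

For the five routers in the post, call `multipath_route_cmd eth1 192.168.1.10 192.168.1.20 192.168.1.30 192.168.1.40 192.168.1.50`; in bash, `192.168.1.{10,20,30,40,50}` expands to the same list. Once the printed line looks right, it can be executed with, for example, `eval "$(multipath_route_cmd eth1 192.168.1.10 192.168.1.20)"`.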
Hi there,

I solved the problem by switching back to a 2.4 kernel. I don't know what's wrong with 2.6.

Greets,
Edu

On 3/8/06, Eduardo Fernández <efgonzalez@gmail.com> wrote:
> <0> Kernel panic - not syncing: Fatal exception in interrupt
>
> Any ideas? Thank you very much!
On Wed, 2006-03-08 at 06:24 +0100, Eduardo Fernández wrote:
> <0> Kernel panic - not syncing: Fatal exception in interrupt
>
> Any ideas? Thank you very much!

Is it a SiS chipset and an Intel CPU?

Reading some kernel sources, I finally (after *months* of trouble) found the problem I had at a customer's site. I used a router with a quad-port 100 MBit NIC (for three DSL modems) and three Gigabit NICs: one onboard and two cheap Realtek 8169s (1: WLAN; 2: about a dozen clients; 3: two servers and one printer).

The driver generated one interrupt for each packet (unlike NAPI drivers such as e1000, which keep the interrupt count low). Because the interrupt controller did not handle every interrupt in time when the network was saturated (some chipsets, especially SiS, seem to simply skip handling a handful of interrupts under these circumstances), the whole system froze with exactly the same message you quote. Every Thursday at about 17:00, everyone screamed because the net broke.

*** A report of my frustrating experience with a computer-illiterate employee follows; you may skip the next paragraph (it only explains why catastrophe always struck on Thursday at tea-time). ***

I only found the bug after waiting a whole Thursday afternoon, observing every user, getting more and more nervous and, for once, hoping that the shit *would* hit the fan that day. Reconstructing the last steps of an accountant, who was the only employee leaving at exactly the time of the crash, I finally saw the light. As it turned out, she ran a backup of the accounting database over the LAN, and every time she started it, the net crashed immediately (after which she switched off her PC and went home, sent off by the screams of her colleagues, who lost quite a lot of work every time the database server suddenly became unreachable). Of course she never told anyone that the net always crashed exactly when she started her backup.
And of course it never occurred to her to skip her backup just once, to see whether the net would crash at 17:00 anyway or whether her backup might be what was messing things up. I was quite frustrated that anyone could be so stupid, week after week starting a backup that never succeeded. She was lucky she never needed to restore the data in all those months.

The correct solution is to replace the mainboard, because the chipset is crap. My solution was to replace the NICs, because that was cheaper and faster in this case. (Of course everyone then thought it was my fault, because I had originally bought the cheap NICs. I am not sure they understood my explanation that what was really b0rken was the chipset of the client PC they had given me to refit as a router/firewall/web proxy/name, DHCP, VPN and everythingelseunderthesun server. I have learned to request real server hardware for jobs like this one.)

I replaced the two Realtek Gigabit NICs with Intel Pro/1000 GT/MT (desktop!) adapters (e1000 driver; I believe they use Intel's 82542 chipset). I bought them for 49 € each: not as cheap as the crappy Realteks at 12,97 €, but not as expensive as the "server" adapters, which sell for more than 120 €.

This immediately solved the problem for me. I hope this helps you.
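Sebastian's diagnosis hinges on the interrupt rate: a driver that raises one interrupt per packet makes its counter in /proc/interrupts climb with the packet rate, while a NAPI driver (he names e1000) should stay noticeably lower under heavy load. Because the counters are cumulative since boot, a rough check is to sample them twice. A minimal sketch follows; the helper names are hypothetical, and it assumes the usual x86 /proc/interrupts layout (an `NN:` label, one count per CPU, then the controller type and the comma-separated device names).

```sh
#!/bin/sh
# Sketch: estimate how many interrupts per second a NIC raises.
# Assumes the /proc/interrupts layout described above; helper names are
# hypothetical.

# irq_total FILE DEV: sum the per-CPU counts on every line naming DEV
irq_total() {
    awk -v dev="$2" '
        # skip the CPU header; match DEV as a whole word (eth1, not eth10)
        NR > 1 && $0 ~ ("[ ,]" dev "([ ,]|$)") {
            # fields after "NN:" are per-CPU counters up to the first non-number
            for (i = 2; i <= NF && $i ~ /^[0-9]+$/; i++) total += $i
        }
        END { print total + 0 }
    ' "$1"
}

# irq_rate DEV [SECONDS]: average interrupts per second over the window
irq_rate() {
    secs=${2:-5}
    before=$(irq_total /proc/interrupts "$1")
    sleep "$secs"
    after=$(irq_total /proc/interrupts "$1")
    echo $(( (after - before) / secs ))
}
```

Running `irq_rate eth1 10` once on an idle link and once while the link is saturated gives a feel for whether the card interrupts roughly once per packet. A high number on its own does not prove the chipset theory; it only shows how much interrupt load the controller has to absorb.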
Hi!

Thanks for your explanation. It seems my hardware is also buggy; lspci shows:

0000:00:00.0 Host bridge: ServerWorks GCNB-LE Host Bridge (rev 32)
0000:00:00.1 Host bridge: ServerWorks GCNB-LE Host Bridge
0000:00:02.0 Ethernet controller: Intel Corp. 82540EM Gigabit Ethernet Controller (rev 02)
0000:00:04.0 PCI bridge: Intel Corp. 21152 PCI-to-PCI Bridge
0000:00:06.0 Ethernet controller: Intel Corp. 82557/8/9 [Ethernet Pro 100] (rev 0c)
0000:00:0e.0 VGA compatible controller: ATI Technologies Inc Rage XL (rev 27)
0000:00:0f.0 Host bridge: ServerWorks CSB5 South Bridge (rev 93)
0000:00:0f.1 IDE interface: ServerWorks CSB5 IDE Controller (rev 93)
0000:00:0f.2 USB Controller: ServerWorks OSB4/CSB5 OHCI USB Controller (rev 05)
0000:00:0f.3 ISA bridge: ServerWorks CSB5 LPC bridge
0000:00:10.0 Host bridge: ServerWorks CIOB-X2 PCI-X I/O Bridge (rev 05)
0000:00:10.2 Host bridge: ServerWorks CIOB-X2 PCI-X I/O Bridge (rev 05)
0000:01:04.0 Ethernet controller: D-Link System Inc DL10050 Sundance Ethernet (rev 15)
0000:01:05.0 Ethernet controller: D-Link System Inc DL10050 Sundance Ethernet (rev 15)
0000:01:06.0 Ethernet controller: D-Link System Inc DL10050 Sundance Ethernet (rev 15)
0000:01:07.0 Ethernet controller: D-Link System Inc DL10050 Sundance Ethernet (rev 15)
0000:02:04.0 SCSI storage controller: LSI Logic / Symbios Logic 53c1030 PCI-X Fusion-MPT Dual Ultra320 SCSI (rev 07)

As you can see, I tried to buy real server hardware: this is a Dell server, and the NICs are somewhat expensive. I've tried other NICs and I get the same kernel panic. The only solution I've found is to switch back to 2.4.29, but then I'm missing features from 2.6, such as the hashlimit module in netfilter.

Greets,
Edu

On 3/23/06, Sebastian Bork <sebi@sebi.org> wrote:
> On Wed, 2006-03-08 at 06:24 +0100, Eduardo Fernández wrote:
> > <0> Kernel panic - not syncing: Fatal exception in interrupt
> >
> > Any ideas? Thank you very much!
>
> Is it a SIS chipset and an Intel cpu?
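Edu's lspci output lists three different NIC families (Intel 82540EM, Intel 82557/8/9 and D-Link DL10050 Sundance). When swapping cards to test the NIC theory, as both posters did, it helps to confirm which kernel driver each interface is actually bound to. A minimal sketch, assuming a 2.6-style sysfs in which `/sys/class/net/IFACE/device/driver` is a symlink into the driver's directory; the function names and the `SYSFS_ROOT` override (present only so the functions can be tested against a fake tree) are hypothetical.

```sh
#!/bin/sh
# Sketch: show which kernel driver each network interface is bound to.
# Assumes sysfs exposes /sys/class/net/IFACE/device/driver as a symlink;
# SYSFS_ROOT is a hypothetical override used for testing.

# nic_driver IFACE: print the driver name, or "unknown" if there is none
nic_driver() {
    link="${SYSFS_ROOT:-/sys}/class/net/$1/device/driver"
    if [ -L "$link" ]; then
        # the symlink target ends in .../drivers/<name>
        basename "$(readlink "$link")"
    else
        echo unknown
    fi
}

# list_nic_drivers: print one "IFACE DRIVER" line per interface
list_nic_drivers() {
    for dir in "${SYSFS_ROOT:-/sys}"/class/net/*; do
        # an unmatched glob stays literal; skip it
        [ -e "$dir" ] || continue
        ifc=$(basename "$dir")
        echo "$ifc $(nic_driver "$ifc")"
    done
}
```

Virtual interfaces such as `lo` have no `device` link and are reported as `unknown`. Where ethtool is installed, `ethtool -i IFACE` reports the same driver name.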