thr3ads.net - Linux Ethernet Bridging - [Bridge] Strange problem, please help [Apr 2007]

Ryan McConigley

2007-Apr-18 12:36 UTC

[Bridge] Strange problem, please help

At 09:25 AM 30/05/2005 +0200, Jaime Nebrera wrote:
>   Hi all,
         <snip>
>   So, some questions:
>
>   1) Is this related to running as a bridge? Would this problem
>disappear if we used a pseudo bridge (proxy ARP)?
>
>   2) Can such a beast sustain 8 ethernets as a single bridge? Bear in
>mind they don't have gigabit traffic, they just use gigabit ethernets :)
>What's the limit for a Linux bridge? Would it be better to break it into two
>bridges?
         Just my $0.02 worth, no solutions I'm afraid, just an 
observation.  The behaviour you describe is virtually identical to the 
behaviour I had on the first bridge I constructed which was using tulip 
network cards.  The system would work wonderfully in test, but put it in 
situ on the network it would last a few minutes, then lock up with the CPU 
maxed out.  We ended up changing the tulip cards to Intels which worked 
perfectly.

         The weird thing was on their own, the tulip cards worked fine, but 
couldn't handle a bridge config.  At the time folks suggested that it was a 
combined interrupt/timing/buffering problem, but I didn't have the skills 
or time to track it down.  From what you've said about the problem going 
away when the other network ports are disabled, I wouldn't mind betting it's 
a related issue.  8 Gigabit ports would be a substantial number of 
interrupts, so I wouldn't be surprised if you're starting to max out the
PCI bus, but I don't have any hard numbers to test that theory.
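For rough numbers on the bus theory, a back-of-the-envelope sketch follows. The thread does not say what kind of PCI bus eth4 to eth7 sit on, so the two bus variants below (32-bit/33 MHz and 64-bit/66 MHz) are assumptions, and the figures are theoretical peaks rather than measurements from either box.

```python
# Back-of-the-envelope figures only. The bus widths and clocks passed in
# below are assumptions; the thread does not state the actual bus type.

def pci_peak_mbit(width_bits, clock_mhz):
    """Theoretical peak of a shared parallel PCI bus, in Mbit/s."""
    return width_bits * clock_mhz


def max_frames_per_second(link_bps, frame_bytes=64):
    """Frames per second at line rate for a given frame size.

    Each frame also occupies 8 bytes of preamble and 12 bytes of
    inter-frame gap on the wire.
    """
    wire_bits = (frame_bytes + 8 + 12) * 8
    return link_bps // wire_bits


def bus_report(ports_on_bus=4, port_mbit=1000):
    """Compare one-direction line-rate demand with assumed bus peaks."""
    demand = ports_on_bus * port_mbit
    lines = []
    for name, width, clock in [("32-bit/33 MHz", 32, 33),
                               ("64-bit/66 MHz", 64, 66)]:
        peak = pci_peak_mbit(width, clock)
        lines.append(f"{name}: peak {peak} Mbit/s vs demand {demand} Mbit/s")
    return lines
```

Calling `bus_report()` prints nothing by itself; it returns the comparison lines for four gigabit ports on one bus. `max_frames_per_second(10**9)` gives the worst-case frame rate a single gigabit port can deliver with minimum-size frames, which is the figure that matters for interrupt and softirq load.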

         Cheers,
                 Ryan.
--
           Ryan McConigley - Systems Administrator                  _.-,
      Computer Science   University of Western Australia        .--' '-._
        Tel: (+61 8) 6488 7082 - Fax: (+61 8) 6488 1089       _/`-  _     '.
Ryan[@]csse.uwa.edu.au - http://www.csse.uwa.edu.au/~ryan '----'._`.----. \
                                                                      `     \;
  "You're just jealous because the voices are talking to me"      ;_\

Jaime Nebrera

2007-Apr-18 12:36 UTC

[Bridge] Strange problem, please help

Hi all,

  We are experiencing a very strange problem and need some help.
We have a LEAF-based box (actually a Lince box, kernel 2.4.26) running as
a bridge with 8 gigabit ethernets, Pentium 4 3 GHz, 2 GB RAM. 4 of them share
the same PCI Express and the other 4 a different PCI bus. We have NAPI
enabled on all ethernets and IRQ moderation enabled (dynamic).

  Some ASCII art before proceeding.

     Router 1               Router 2
        |                       |
        --------- Switch --------
                     |
                     |
                  Firewall

 
   WAN  LAN Empty Empty Empty Empty Empty Empty
    |    |     |     |     |     |     |     |
   eth0 eth1 eth2  eth3  eth4   eth5  eth6  eth7
    -----------------      -------------------
          PCI-X                     PCI

  Both routers use HSRP from Cisco to share information about who is
alive. HSRP sends multicast UDP packets to address 224.0.0.1, port
1985.

  The problem is that after a while (1 or 2 minutes) the CPU reaches 100%
(0.99 load, 99% system), with the process ksoftirqd_CPU0 reaching 99%.
Using iptraf we discovered that eth4 to eth7 (the ones that share the PCI
bus) are at full speed. The traffic is on port 1985 and comes from the 2
virtual IPs of the redundant routers. It seems the packets enter an infinite
loop and completely kill the system. BTW, the only ethernets in use are eth0
and eth1, both on the PCI-X bus, and eth2 and eth3 seem unaffected (no
traffic). Bear in mind, real traffic on eth0 and eth1 doesn't exceed
1 Mbps. Also, no service is provided at this point, not even firewalling.
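  If iptraf becomes unusable once the box is saturated, the per-interface packet counters in /proc/net/dev can give the same picture. The following is a minimal sketch, assuming the usual /proc/net/dev layout (two header lines, then one "name: counters" line per interface); the function names are made up for illustration.

```python
def parse_net_dev(text):
    """Return {interface: (rx_packets, tx_packets)} from /proc/net/dev text."""
    counters = {}
    for line in text.splitlines():
        if ":" not in line:
            continue  # the two header lines contain no colon
        name, _, rest = line.partition(":")
        fields = rest.split()
        if len(fields) < 10:
            continue  # not a counter line in the expected layout
        # After the colon: field 1 is RX packets, field 9 is TX packets.
        counters[name.strip()] = (int(fields[1]), int(fields[9]))
    return counters


def packet_rates(before, after, seconds):
    """Per-interface (rx_pps, tx_pps) between two parsed snapshots."""
    rates = {}
    for name, (rx1, tx1) in after.items():
        if name in before:
            rx0, tx0 = before[name]
            rates[name] = ((rx1 - rx0) / seconds, (tx1 - tx0) / seconds)
    return rates


def read_snapshot(path="/proc/net/dev"):
    """Read and parse the live counters (Linux only)."""
    with open(path) as handle:
        return parse_net_dev(handle.read())
```

  Taking one `read_snapshot()`, sleeping a few seconds, taking another and passing both to `packet_rates()` shows which ports are sending or receiving the flood. A high TX rate on eth4 to eth7 with no link partner attached would point at the bridge flooding the multicast frames out of every port.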

  The problem appears with and without STP activated and we have
verified there is no loop in the network.

  If we disable eth4 to eth7 (ip link set ethX down) the
problem seems to disappear, but we are not sure, as we didn't want to
disturb the client any longer (for 15 minutes the problem didn't
appear, whereas with those ports up it appeared in well under 5 minutes). In
this case, even activating things like a Netflow probe on eth0 didn't
disturb the system at all.

  The same problem seems to appear with a VIA 1 GHz box with 4 Realtek
ethernets and around 4 Mbps of traffic (this system was placed under
heavier load, and as the problem appeared, we tested with the big box
the same afternoon). When the problem appeared this box was so slow
we could not even open an ssh session, so we don't know if this is the
same problem (but I bet it is).

  So, some questions:

  1) Is this related to running as a bridge? Would this problem
disappear if we used a pseudo bridge (proxy ARP)?

  2) Can such a beast sustain 8 ethernets as a single bridge? Bear in
mind they don't have gigabit traffic, they just use gigabit ethernets :)
What's the limit for a Linux bridge? Would it be better to break it into two
bridges?

  3) As this traffic is only needed by the two routers and doesn't need to
pass through the firewall, will dropping it on eth0 solve the problem?
(That way the packets cannot reach the other ethernet ports.)
What would happen with other multicast-based apps? Would they need to be
dropped too?
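
  To make question 3 concrete, here is a sketch of the match that a drop rule on eth0 would have to express: IPv4, UDP, destination port 1985. It only classifies raw frames and does not install any filter. It assumes untagged Ethernet II frames (no VLAN header); the function names are hypothetical. The second helper shows the broader "any IPv4 multicast" match that the other multicast apps would fall under.

```python
import struct

HSRP_PORT = 1985        # UDP port reported in this post
ETH_HEADER = 14         # untagged Ethernet II header length (assumption)
ETHERTYPE_IPV4 = 0x0800
IPPROTO_UDP = 17


def is_udp_to_port(frame, port=HSRP_PORT):
    """True if the frame carries an IPv4/UDP datagram to `port`."""
    if len(frame) < ETH_HEADER + 20 + 8:
        return False
    (ethertype,) = struct.unpack("!H", frame[12:14])
    if ethertype != ETHERTYPE_IPV4:
        return False
    ihl = (frame[ETH_HEADER] & 0x0F) * 4          # IPv4 header length
    if ihl < 20 or len(frame) < ETH_HEADER + ihl + 8:
        return False
    if frame[ETH_HEADER + 9] != IPPROTO_UDP:
        return False
    (flags_frag,) = struct.unpack("!H", frame[ETH_HEADER + 6:ETH_HEADER + 8])
    if flags_frag & 0x1FFF:
        return False  # later fragments carry no UDP header
    udp = ETH_HEADER + ihl
    (dport,) = struct.unpack("!H", frame[udp + 2:udp + 4])
    return dport == port


def is_ipv4_multicast(frame):
    """True if the IPv4 destination is in 224.0.0.0/4."""
    if len(frame) < ETH_HEADER + 20:
        return False
    (ethertype,) = struct.unpack("!H", frame[12:14])
    if ethertype != ETHERTYPE_IPV4:
        return False
    return 224 <= frame[ETH_HEADER + 16] <= 239
```

  Matching on the port alone leaves other multicast traffic untouched, while matching on `is_ipv4_multicast` would also drop anything else that relies on multicast crossing the bridge.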

  Very thankful in advance. Regards.

-- 
Jaime Nebrera - jnebrera@eneotecnologia.com
Consultor TI - ENEO Tecnologia SL
Telf.- 95 455 40 62 - 619 04 55 18
