Stephen,

Have you looked into this?

Thank you

On Sat, 27 Sep 2003, Santiago Garcia Mantinan wrote:
> Hi!
>
> Since the change to 2.4.22 I've been experimenting problems here, after
> many tests I have seen what I think is the problem that is causing this.
> [...]
On Sat, 27 Sep 2003 22:22:01 +0200 Santiago Garcia Mantinan <manty@manty.net> wrote:
> [...]
> Hope we can find a fix for this so that it is integrated in 2.4.23 kernel,
> I'll be happy to make any tests you want to track this farther down.

What kind of hardware do you have? Which Ethernet devices are you trying to bridge?

There haven't been many changes at all to the bridging code, and you could try building the 2.4.21 bridge code into a 2.4.22 kernel.

When the CPU goes to 100%, could you get a backtrace (with SysRq-T)?
At 22:22 +0200 on 27.09.2003, Santiago Garcia Mantinan wrote:
>Hi!
>
>Since the change to 2.4.22 I've been experimenting problems here, after
>many tests I have seen what I think is the problem that is causing this.

I use 2.4.23-pre3 + 2 bridging-related patches:

  shemminger 1.1109.2.4  [BRIDGE]: Clear hw checksum flags when bridging.
  karlis     1.1063.40.5 [BRIDGE]: kfree --> kfree_skb.

[both of them are in 2.4.23-pre4]

I was able to reproduce this only once. I pressed ^Z in the bash running the listening 'nc':

  uml:~# ps -o pid,stat,cmd,wchan=WIDE-WCHAN-COLUMN 2179 2188
    PID STAT CMD                 WIDE-WCHAN-COLUMN
   2179 T    nc -n -l -p 8888    signal
   2188 S    nc localhost 888    wait_for_tcp_memory

It didn't continue when I sent SIGCONT, but stayed in the 'T' state. Then I did an 'fg':

  uml:~# ps -o pid,stat,cmd,wchan=WIDE-WCHAN-COLUMN 2179 2188
    PID STAT CMD                 WIDE-WCHAN-COLUMN
   2179 S    nc -n -l -p 8888    read_chan
   2188 S    nc localhost 888    wait_for_tcp_memory

I thought 'read_chan' was strange and hit Return. Then everything went back to normal (i.e. tcpdump shows lots of packets).

I have 3 bridges active and one of them has a blocked port, but I don't think this is related to the bridging code. Perhaps you could try again with 2.4.23-pre[456].

Hannes
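Hannes's ps output shows the listening nc stopped ('T', in signal) and the sender sleeping ('S', in wait_for_tcp_memory). When trying to catch the stall while it happens, the same two columns can be read straight from /proc. The function below is only a minimal sketch, assuming a Linux /proc layout; proc_state is a hypothetical helper name, not something from this thread, and on newer kernels /proc/<pid>/wchan may read as "0" or be unreadable, in which case the sketch returns that value or "?".

```python
import sys


def proc_state(pid):
    """Return (state, wchan) for `pid` from Linux /proc, roughly the STAT
    and WCHAN columns printed by `ps -o pid,stat,cmd,wchan`."""
    with open(f"/proc/{pid}/stat") as f:
        stat = f.read()
    # Field 2 is "(comm)" and comm may contain spaces or ')', so the state
    # letter is taken from just after the LAST ')' in the line.
    state = stat[stat.rindex(")") + 2]
    try:
        with open(f"/proc/{pid}/wchan") as f:
            wchan = f.read().strip() or "-"
    except OSError:
        wchan = "?"  # wchan may be hidden or unreadable on some kernels
    return state, wchan


if __name__ == "__main__":
    # Usage (hypothetical file name): python3 procstate.py PID [PID ...]
    for arg in sys.argv[1:]:
        state, wchan = proc_state(int(arg))
        print(f"{arg:>6} {state} {wchan}")
```

Run under `watch -n1` with the two nc PIDs, this gives a rough once-per-second view of whether either process is parked in the same place when tcpdump goes quiet; it reads the same information ps does and adds nothing beyond that.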
Hi!

Since the change to 2.4.22 I've been experiencing problems here. After many tests I have found what I think is causing them.

The problem I'm seeing is that the loopback starts losing packets; I don't know if this could also happen on other interfaces. I'm testing this by starting:

  tcpdump -n -i lo port

then:

  nc -n -l port >/dev/null

and:

  nc localhost port </dev/zero

If everything is fine, my CPU goes to 100% and I see the packets streaming through in my tcpdump screen, great. But sometimes this doesn't go smoothly: tcpdump starts to show only one or two packets every N seconds, until it ends up showing only retransmissions of the last packet, which is never acknowledged. You can even see that the timing of these repeated packets matches TCP backoff. My CPU load is then really, really low, and nc disconnects after a while, ...

When does this happen?

It took me a while to find this out, but it happens when you have a bridge interface and one of the ports of the bridge is told to drop packets, for example when a loop is detected in the network and an interface is set to the blocking state.

Of course, the loopback is not part of any bridge in any of my setups. I've seen this on a couple of machines, one SMP and the other uniprocessor. 2.4.21 worked OK; at least I could not reproduce this there. If the interfaces have been in the forwarding state all the time since the bridge was set up, without ever being in the blocking state, then this problem does not seem to happen.

I believe the changes the bridge went through from 2.4.21 to 2.4.22 are to blame for this, but this is just a guess.

I hope we can find a fix for this so that it is integrated into the 2.4.23 kernel. I'll be happy to run any tests you want to track this further down.

Regards...
--
Manty/BestiaTester -> http://manty.net
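The reproduction recipe in the report (two nc processes streaming /dev/zero over lo while tcpdump watches) can also be approximated in a single program, which may make it easier to script repeated runs before and after a bridge port has been through the blocking state. This is an illustrative sketch, not something posted in the thread: it uses Python's socket module in place of nc, binds to an ephemeral port, and the function name loopback_stall_test, its parameters, and the "stall" criterion (a sampling interval in which the receiver got no bytes at all) are all assumptions made for illustration.

```python
import socket
import threading
import time


def loopback_stall_test(duration=5.0, interval=0.5, chunk=65536):
    """Stream zeros over a loopback TCP connection, roughly like
    `nc localhost PORT </dev/zero` feeding `nc -n -l PORT >/dev/null`.

    Returns (total_bytes, stalled_intervals). A "stalled" interval is an
    assumed criterion: one in which the receiver got no bytes at all.
    """
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.bind(("127.0.0.1", 0))  # port 0: let the kernel pick a free port
    server.listen(1)
    server.settimeout(5.0)  # don't wait forever if the sender never connects
    port = server.getsockname()[1]

    received = 0
    lock = threading.Lock()
    stop = threading.Event()

    def receiver():
        # Plays the role of `nc -n -l PORT >/dev/null`: read and discard.
        nonlocal received
        try:
            conn, _ = server.accept()
        except OSError:
            return
        conn.settimeout(0.2)  # wake up periodically to check `stop`
        with conn:
            while not stop.is_set():
                try:
                    data = conn.recv(chunk)
                except socket.timeout:
                    continue
                except OSError:
                    break
                if not data:  # sender closed the connection
                    break
                with lock:
                    received += len(data)

    def sender():
        # Plays the role of `nc localhost PORT </dev/zero`.
        zeros = bytes(chunk)
        with socket.create_connection(("127.0.0.1", port)) as s:
            s.settimeout(0.2)
            while not stop.is_set():
                try:
                    s.sendall(zeros)
                except socket.timeout:
                    continue  # send buffer full for a while; re-check `stop`
                except OSError:
                    break  # receiver went away

    threads = [threading.Thread(target=receiver), threading.Thread(target=sender)]
    for t in threads:
        t.start()

    stalled = 0
    last = 0
    deadline = time.monotonic() + duration
    while time.monotonic() < deadline:
        time.sleep(interval)
        with lock:
            current = received
        if current == last:
            stalled += 1  # no progress at all during this interval
        last = current

    stop.set()
    for t in threads:
        t.join()
    server.close()
    return received, stalled
```

For example, running `loopback_stall_test(duration=30)` once on a freshly set-up bridge and once after one of its ports has gone through the blocking state would script the comparison described in the report. It is only a test harness: a non-zero stall count would confirm the symptom, not its cause.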