Ruslan Ermilov wrote:
> Greetings,
>
> Those of you wishing to try your xl(4) card under
> polling(4) are welcome to test this patch:
>

Ruslan,

Yesterday I discovered that polling of the xl interface randomly disrupts an ssh-tunnel of mine. I think there's still a subtle, yet critical problem with xl polling.

I cannot locate the details of the problem, so I will describe the symptoms that I see in my network, and why I suspect xl polling.

Here is a sketch of my two private networks:

    PC1 - GW1 ~~~~ GW2 - PC2

    PC1  = PC on private network, Intel Pro/100 (fxp)
    GW1  = Dual-homed Gateway, 2 x 3Com 3c905B-TX (xl)
    GW2  = Dual-homed Gateway, 2 x RealTek 8139 (rl)
    PC2  = PC on private network, RealTek 8139 (rl)
    ~~~~ = Internet

All computers are running 5-Stable, as of May 10.
All, but PC1 with fxp, use polling, with:

    options DEVICE_POLLING
    options HZ=1000

GW2 redirects (with natd) port 2200 to PC2.

PC1 establishes an ssh-tunnel to PC2:

    PC1$ ssh -p 2200 -N -f -R 2000:localhost:22 GW2

Then on PC2, I can use this tunnel to connect directly to PC1, for example to run xbiff:

    PC2$ ssh -Y -p 2000 localhost xbiff

This works beautifully, but every now and then the ssh-tunnel connection is 'closed' for no reason (the ssh-tunnel itself remains, but the connection is closed). This happens at least once an hour, seemingly at random.

After some trial and error, I discovered that the polling of the xl devices (GW1) is the culprit.

As soon as I disable the polling for the xl devices on GW1, the ssh-tunnel connection is not disrupted anymore.

-----

GW1 is also a production server, so experimenting is rather limited. However, I can run tests, if that would help resolve the problem.

Here you can find some relevant info on GW1 with the xl devices:

    dmesg output:
      http://surfion.snu.ac.kr/~lahaye/dmesg.boot
    kernel configuration:
      http://surfion.snu.ac.kr/~lahaye/MYKERNEL
    /boot/loader.conf:
      http://surfion.snu.ac.kr/~lahaye/loader.conf

Regards,
Rob.
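Because the disruptions are described as random and roughly hourly, it may help to record exactly when a connection through the tunnel ends, so the times can later be compared with what GW1 reports. The following is only a sketch in POSIX sh: the helper name `log_exit` and the log file name are made up for illustration and are not part of the setup described above.

```sh
#!/bin/sh
# log_exit CMD [ARGS...]
# Hypothetical helper: run a long-lived command in the foreground and,
# when it ends, print one line with start time, end time and exit status.
# The command's own exit status is passed through unchanged.
log_exit() {
    start=$(date '+%Y-%m-%d %H:%M:%S')
    "$@"
    status=$?
    echo "started=$start ended=$(date '+%Y-%m-%d %H:%M:%S') status=$status cmd=$*"
    return "$status"
}

# Example use on PC2 (not executed here; tunnel.log is an assumed file name):
#   log_exit ssh -Y -p 2000 localhost xbiff >> tunnel.log
```

ssh exits with status 255 when the connection itself fails, and otherwise with the status of the remote command, so the logged status gives a rough hint about how each session ended.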
On 5/11/2005 8:04, Rob wrote:
> All computers are running 5-Stable, as of May 10.
> All, but PC1 with fxp, use polling, with:
>     options DEVICE_POLLING
>     options HZ=1000

1000 IMHO seems a bit too heavy. Try something lower.

Regards
S.
Hi,

just one experience of mine with the Realtek 8139: I was not able to upgrade SuSE 9.2 over FTP, because the data transfer kept crashing. After some *research* I replaced the 8139 and the problem was solved. The server was on a 3c905B.

Cheers,
Vlado.

On Tue, 10 May 2005, Rob wrote:
> Yesterday I discovered that polling of the xl
> interface randomly disrupts an ssh-tunnel of mine.
> I think there's still a subtle, yet critical problem
> with xl polling.
>
> Here is a sketch of my two private networks:
>
>     PC1 - GW1 ~~~~ GW2 - PC2
>
>     PC1 = PC on private network, Intel Pro/100 (fxp)
>     GW1 = Dual-homed Gateway, 2 x 3Com 3c905B-TX (xl)
>     GW2 = Dual-homed Gateway, 2 x RealTek 8139 (rl)
>     PC2 = PC on private network, RealTek 8139 (rl)
>     ~~~~ = Internet
--- Subhro <subhro.kar@gmail.com> wrote:
> On 5/11/2005 8:04, Rob wrote:
>
> > All computers are running 5-Stable, as of May 10.
> > All, but PC1 with fxp, use polling, with:
> >     options DEVICE_POLLING
> >     options HZ=1000
>
> 1000 IMHO seems a bit too heavy. Try something
> lower.

Same problem. The ssh-tunnel connection is also disrupted with HZ=100. May I conclude that the HZ value is not the culprit? Or should I try once again with HZ=10?

kern.ipc.nmbclusters is 4928 for this PC. Is that good or bad?

"sysctl -a | grep -i polling" gives the following:

    kern.polling.burst: 150
    kern.polling.each_burst: 5
    kern.polling.burst_max: 150
    kern.polling.idle_poll: 0
    kern.polling.poll_in_trap: 0
    kern.polling.user_frac: 50
    kern.polling.reg_frac: 20
    kern.polling.short_ticks: 0
    kern.polling.lost_polls: 6
    kern.polling.pending_polls: 0
    kern.polling.residual_burst: 0
    kern.polling.handlers: 0
    kern.polling.enable: 0
    kern.polling.phase: 0
    kern.polling.suspect: 6
    kern.polling.stalled: 0
    kern.polling.idlepoll_sleeping: 1
    <118>kern.polling.enable:
    <118>xl0: flags=18843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST,POLLING> mtu 1500
    <118>    options=49<RXCSUM,VLAN_MTU,POLLING>
    <118>xl1: flags=18843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST,POLLING> mtu 1500
    <118>    options=49<RXCSUM,VLAN_MTU,POLLING>

I actually doubt whether the default values of these sysctl variables would cause the problem.

Regards,
Rob.
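A single snapshot of kern.polling.lost_polls and kern.polling.suspect says little on its own. Assuming both are counters that only grow while polling runs (an assumption worth checking against polling(4)), comparing two snapshots taken around a disruption would show whether they move. A minimal sketch in POSIX sh; the function names and file names are hypothetical:

```sh
#!/bin/sh
# polling_value NAME
# Read "sysctl kern.polling"-style output on stdin and print the value
# of kern.polling.NAME. Lines that do not match exactly are ignored.
polling_value() {
    awk -F': ' -v key="kern.polling.$1" '$1 == key { print $2 }'
}

# polling_delta NAME BEFORE_FILE AFTER_FILE
# Print how much kern.polling.NAME changed between two saved snapshots.
# A counter missing from a snapshot is treated as 0.
polling_delta() {
    before=$(polling_value "$1" < "$2")
    after=$(polling_value "$1" < "$3")
    echo $(( ${after:-0} - ${before:-0} ))
}

# Example use on GW1 (not executed here; file names are assumptions):
#   sysctl kern.polling > before.txt
#   ... wait until a tunnel connection is dropped ...
#   sysctl kern.polling > after.txt
#   polling_delta lost_polls before.txt after.txt
#   polling_delta suspect before.txt after.txt
```

The snapshots only mean something if kern.polling.enable is 1 while they are taken.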
On 5/11/2005 13:13, Rob wrote:
> --- Subhro <subhro.kar@gmail.com> wrote:
>
>> On 5/11/2005 8:04, Rob wrote:
>>
>>> All computers are running 5-Stable, as of May 10.
>>> All, but PC1 with fxp, use polling, with:
>>>     options DEVICE_POLLING
>>>     options HZ=1000
>>
>> 1000 IMHO seems a bit too heavy. Try something
>> lower.
>
> Same problem. Ssh-tunnel connection is also disrupted
> with HZ=100. May I conclude that the HZ value is not
> the culprit? Or should I try once again with HZ=10?

100 should be fine. 10 would be a bit too much overkill.

> kern.ipc.nmbclusters is 4928 for this PC.
> Is that good or bad?

What is the purpose of the box? Give a description of the network traffic.

> "sysctl -a | grep -i polling" gives following:
> kern.polling.burst: 150
> kern.polling.each_burst: 5
> kern.polling.burst_max: 150
> kern.polling.idle_poll: 0
> kern.polling.poll_in_trap: 0
> kern.polling.user_frac: 50
> kern.polling.reg_frac: 20
> kern.polling.short_ticks: 0
> kern.polling.lost_polls: 6
> kern.polling.pending_polls: 0
> kern.polling.residual_burst: 0
> kern.polling.handlers: 0
> kern.polling.enable: 0
> kern.polling.phase: 0
> kern.polling.suspect: 6
> kern.polling.stalled: 0
> kern.polling.idlepoll_sleeping: 1
> <118>kern.polling.enable:
> <118>xl0: flags=18843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST,POLLING> mtu 1500
> <118>    options=49<RXCSUM,VLAN_MTU,POLLING>
> <118>xl1: flags=18843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST,POLLING> mtu 1500
> <118>    options=49<RXCSUM,VLAN_MTU,POLLING>

Did you use any strange CFLAGS like -O3 or -f* compile time options when you built the system?

Regards
S.
On 5/11/2005 13:13, Rob wrote:
> "sysctl -a | grep -i polling" gives following:
> kern.polling.burst: 150
> kern.polling.each_burst: 5
> kern.polling.burst_max: 150
> kern.polling.idle_poll: 0
> kern.polling.poll_in_trap: 0
> kern.polling.user_frac: 50
> kern.polling.reg_frac: 20
> kern.polling.short_ticks: 0
> kern.polling.lost_polls: 6
> kern.polling.pending_polls: 0
> kern.polling.residual_burst: 0
> kern.polling.handlers: 0
> kern.polling.enable: 0

Force this to be 1. Damn, I should have noted it earlier.

Regards
S.
--- Subhro <subhro.kar@gmail.com> wrote:
> On 5/11/2005 13:13, Rob wrote:
>
> > --- Subhro <subhro.kar@gmail.com> wrote:
> >
> >> On 5/11/2005 8:04, Rob wrote:
> >>
> >>> All computers are running 5-Stable, as of May 10.
> >>> All, but PC1 with fxp, use polling, with:
> >>>     options DEVICE_POLLING
> >>>     options HZ=1000
> >>
> >> 1000 IMHO seems a bit too heavy. Try something
> >> lower.
> >
> > Same problem. Ssh-tunnel connection is also
> > disrupted with HZ=100. May I conclude that the
> > HZ value is not the culprit? Or should I try
> > once again with HZ=10?
>
> 100 should be fine. 10 would be a bit too much
> overkill.
>
> > kern.ipc.nmbclusters is 4928 for this PC.
> > Is that good or bad?
>
> What is the purpose of the box? Give a description
> of the network traffic.

This is a lab in the Chemistry department; the box in question is a dual-homed gateway to eight other PCs in the lab. The box has a tight firewall, and runs an apache server and an SSH server. On the private network, the box also runs as a DHCP server, Samba server and NTP server. OS = 5-Stable.

The other PCs in the lab are two FreeBSD PCs and various flavours of Windows.

> Did you use any strange CFLAGS like -O3 or -f*
> compile time options when you built the system?

No. My /etc/make.conf has:

    CFLAGS= -O -pipe
    NOPROFILE=true
    NO_PF=true

> > kern.polling.enable: 0
>
> Force this to be 1. Damn I should have noted it
> earlier

I took this printout after I changed the value to 0. Of course it is 1 when I test the polling, but when I noticed that the ssh-tunnel connection problem persisted, I changed it to 0; so that my ssh-tunnel connection is not randomly closed :).

Thanks for your elaborate help!

Rob.
On Wed, May 11, 2005 at 12:43:09AM -0700, Rob wrote:
> I actually doubt whether the default values of
> these sysctl variables would cause the problem.

No. Can you observe any broken IP/TCP/UDP checksums?

    netstat -ss -f inet | grep -w bad

Cheers,
--
Ruslan Ermilov
ru@FreeBSD.org
FreeBSD committer
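When the statistics are saved to a file first (for example to compare a run with polling enabled against one with it disabled), the same check can be applied to the saved text. The sketch below is an illustration only: the function name `bad_counters` and the file name are assumptions, and it simply keeps the "bad" lines whose leading count is above zero, which matters when the snapshot was taken with a single -s and still contains zero counters.

```sh
#!/bin/sh
# bad_counters
# Read netstat statistics text on stdin and print only the lines that
# contain the word "bad" and whose first field is a count above zero.
bad_counters() {
    grep -w bad | awk '$1 + 0 > 0'
}

# Example use on GW1 (not executed here; the file name is an assumption):
#   netstat -s -f inet > stats-polling-on.txt
#   bad_counters < stats-polling-on.txt
```

An empty result means no non-zero "bad" counters were present in that snapshot.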