thr3ads.net - freebsd stable - xl(4) & polling [May 2005]

Rob

2005-May-10 19:34 UTC

xl(4) & polling

Ruslan Ermilov wrote:
> Greetings,
> 
> Those of you wishing to try your xl(4) card under
> polling(4) are welcome to test this patch:
> 
Ruslan,

Yesterday I discovered that polling of the xl
interface randomly disrupts an ssh-tunnel of mine.
I think there's still a subtle, yet critical problem
with xl polling.

I cannot locate the details of the problem, so I
will describe the symptoms that I see in my network,
and why I suspect xl polling.

Here is a sketch of my two private networks:

 PC1 - GW1 ~~~~ GW2 - PC2

   PC1 = PC on private network, Intel Pro/100 (fxp)
   GW1 = Dual-homed Gateway, 2 x 3Com 3c905B-TX (xl)
   GW2 = Dual-homed Gateway, 2 x RealTek 8139 (rl)
   PC2 = PC on private network, RealTek 8139 (rl)
   ~~~~ = Internet

All computers are running 5-Stable, as of May 10.
All, but PC1 with fxp, use polling, with:
   options DEVICE_POLLING
   options HZ=1000

GW2 redirects (with natd) port 2200 to PC2.

PC1 establishes an ssh-tunnel to PC2:
  PC1$ ssh -p 2200 -N -f -R 2000:localhost:22 GW2

Then on PC2, I can use this tunnel to connect
directly to PC1, for example to run xbiff:
  PC2$ ssh -Y -p 2000 localhost xbiff

This works beautifully, but every now and then the
ssh-tunnel connection is 'closed' for no reason
(the ssh-tunnel itself remains, but the connection
is closed).
This happens at least once an hour, seemingly at
random.

After some trial and error, I discovered that the
polling of the xl devices (GW1) is the culprit.

As soon as I disable the polling for the xl devices
on GW1, the ssh-tunnel connection is not disrupted
anymore.

-----

GW1 is also a production server, so experimenting
is rather limited. However, I can run tests, if that
would help resolve the problem.

Here you can find some relevant info on GW1 with the
xl devices:

dmesg output:
  http://surfion.snu.ac.kr/~lahaye/dmesg.boot
kernel configuration:
  http://surfion.snu.ac.kr/~lahaye/MYKERNEL
/boot/loader.conf:
  http://surfion.snu.ac.kr/~lahaye/loader.conf

Regards,
Rob.

Subhro

2005-May-10 20:40 UTC

xl(4) & polling

On 5/11/2005 8:04, Rob wrote:
>All computers are running 5-Stable, as of May 10.
>All, but PC1 with fxp, use polling, with:
>   options DEVICE_POLLING
>   options HZ=1000

1000 IMHO seems a bit too heavy. Try something lower.

Regards
S.

Vladimir Botka

2005-May-10 20:53 UTC

xl(4) & polling

Hi,

Just one experience of mine with the RealTek 8139: I was not able to
upgrade SuSE 9.2 via FTP because the data transfer kept crashing. After
some *research* I replaced the 8139 and the problem was solved. The server
was on a 3c905B.

Cheers,
Vlado.

Rob

2005-May-11 00:43 UTC

xl(4) & polling

--- Subhro <subhro.kar@gmail.com> wrote:
> On 5/11/2005 8:04, Rob wrote:
> 
> >All computers are running 5-Stable, as of May 10.
> >All, but PC1 with fxp, use polling, with:
> >   options DEVICE_POLLING
> >   options HZ=1000
> >
> 1000 IMHO seems a bit too heavy. Try something
> lower.

Same problem. Ssh-tunnel connection is also disrupted
with HZ=100. May I conclude that the HZ value is not
the culprit? Or should I try once again with HZ=10?

kern.ipc.nmbclusters is 4928 for this PC.
Is that good or bad?

"sysctl -a | grep -i polling" gives the following:
kern.polling.burst: 150
kern.polling.each_burst: 5
kern.polling.burst_max: 150
kern.polling.idle_poll: 0
kern.polling.poll_in_trap: 0
kern.polling.user_frac: 50
kern.polling.reg_frac: 20
kern.polling.short_ticks: 0
kern.polling.lost_polls: 6
kern.polling.pending_polls: 0
kern.polling.residual_burst: 0
kern.polling.handlers: 0
kern.polling.enable: 0
kern.polling.phase: 0
kern.polling.suspect: 6
kern.polling.stalled: 0
kern.polling.idlepoll_sleeping: 1
<118>kern.polling.enable: 
<118>xl0: flags=18843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST,POLLING> mtu 1500
<118>   options=49<RXCSUM,VLAN_MTU,POLLING>
<118>xl1: flags=18843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST,POLLING> mtu 1500
<118>   options=49<RXCSUM,VLAN_MTU,POLLING>


I actually doubt whether the default values of
these sysctl variables would cause the problem.

Regards,
Rob.
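A small, hypothetical Python helper shows how a dump like the `sysctl -a | grep -i polling` output above could be screened before blaming the driver. The names `parse_sysctl` and `polling_warnings` are invented for illustration, and the choice of counters to flag (`lost_polls`, `suspect`, `stalled`) is an assumption; it also flags `kern.polling.enable` being 0, which Subhro later points out. The sample is an excerpt of Rob's output.

```python
def parse_sysctl(text):
    """Parse 'name: value' lines (as printed by sysctl) into a dict.

    Only names starting with 'kern.polling.' are kept, so console lines
    prefixed with '<118>' are ignored.  A line whose value is empty ends
    in ':' once stripped, has no ': ' separator, and is skipped too.
    Values made only of digits (optionally negative) become ints.
    """
    values = {}
    for line in text.splitlines():
        name, sep, value = line.strip().partition(": ")
        if not sep or not name.startswith("kern.polling."):
            continue
        value = value.strip()
        values[name] = int(value) if value.lstrip("-").isdigit() else value
    return values


def polling_warnings(values):
    """Return messages for settings/counters worth a second look (assumed list)."""
    warnings = []
    if values.get("kern.polling.enable") == 0:
        warnings.append("kern.polling.enable is 0: polling is globally off")
    for counter in ("kern.polling.lost_polls", "kern.polling.suspect",
                    "kern.polling.stalled"):
        if values.get(counter, 0) > 0:
            warnings.append(f"{counter} = {values[counter]}")
    return warnings


# Excerpt of the sysctl output posted in this thread.
SAMPLE = """\
kern.polling.burst: 150
kern.polling.lost_polls: 6
kern.polling.enable: 0
kern.polling.suspect: 6
kern.polling.stalled: 0
<118>kern.polling.enable: 
"""

if __name__ == "__main__":
    for warning in polling_warnings(parse_sysctl(SAMPLE)):
        print(warning)
```

On this excerpt the sketch reports the disabled global switch plus the non-zero `lost_polls` and `suspect` counters; `stalled` is 0 and is not reported.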

Subhro

2005-May-11 00:47 UTC

xl(4) & polling

On 5/11/2005 13:13, Rob wrote:
>--- Subhro <subhro.kar@gmail.com> wrote:
>>On 5/11/2005 8:04, Rob wrote:
>>>All computers are running 5-Stable, as of May 10.
>>>All, but PC1 with fxp, use polling, with:
>>>  options DEVICE_POLLING
>>>  options HZ=1000
>>
>>1000 IMHO seems a bit too heavy. Try something
>>lower.
>
>Same problem. Ssh-tunnel connection is also disrupted
>with HZ=100. May I conclude that the HZ value is not
>the culprit? Or should I try once again with HZ=10?

100 should be fine. 10 would be overkill.

>kern.ipc.nmbclusters is 4928 for this PC.
>Is that good or bad?

What is the purpose of the box? Give a description of the network traffic.

>"sysctl -a | grep -i polling" gives the following:
>kern.polling.burst: 150
>kern.polling.each_burst: 5
>kern.polling.burst_max: 150
>kern.polling.idle_poll: 0
>kern.polling.poll_in_trap: 0
>kern.polling.user_frac: 50
>kern.polling.reg_frac: 20
>kern.polling.short_ticks: 0
>kern.polling.lost_polls: 6
>kern.polling.pending_polls: 0
>kern.polling.residual_burst: 0
>kern.polling.handlers: 0
>kern.polling.enable: 0
>kern.polling.phase: 0
>kern.polling.suspect: 6
>kern.polling.stalled: 0
>kern.polling.idlepoll_sleeping: 1
><118>kern.polling.enable: 
><118>xl0: flags=18843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST,POLLING> mtu 1500
><118>   options=49<RXCSUM,VLAN_MTU,POLLING>
><118>xl1: flags=18843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST,POLLING> mtu 1500
><118>   options=49<RXCSUM,VLAN_MTU,POLLING>

Did you use any strange CFLAGS like -O3 or -f* compile-time options when
you built the system?

Regards
S.

Subhro

2005-May-11 00:48 UTC

xl(4) & polling

On 5/11/2005 13:13, Rob wrote:
>--- Subhro <subhro.kar@gmail.com> wrote:
>>On 5/11/2005 8:04, Rob wrote:
>>>All computers are running 5-Stable, as of May 10.
>>>All, but PC1 with fxp, use polling, with:
>>>  options DEVICE_POLLING
>>>  options HZ=1000
>>
>>1000 IMHO seems a bit too heavy. Try something
>>lower.
>
>Same problem. Ssh-tunnel connection is also disrupted
>with HZ=100. May I conclude that the HZ value is not
>the culprit? Or should I try once again with HZ=10?
>
>kern.ipc.nmbclusters is 4928 for this PC.
>Is that good or bad?
>
>"sysctl -a | grep -i polling" gives the following:
>kern.polling.burst: 150
>kern.polling.each_burst: 5
>kern.polling.burst_max: 150
>kern.polling.idle_poll: 0
>kern.polling.poll_in_trap: 0
>kern.polling.user_frac: 50
>kern.polling.reg_frac: 20
>kern.polling.short_ticks: 0
>kern.polling.lost_polls: 6
>kern.polling.pending_polls: 0
>kern.polling.residual_burst: 0
>kern.polling.handlers: 0
>kern.polling.enable: 0

Force this to be 1. Damn, I should have noticed that earlier.

Regards
S.

Rob

2005-May-11 01:12 UTC

xl(4) & polling

--- Subhro <subhro.kar@gmail.com> wrote:
> On 5/11/2005 13:13, Rob wrote:
> 
> >--- Subhro <subhro.kar@gmail.com> wrote:
> >
> >  
> >
> >>On 5/11/2005 8:04, Rob wrote:
> >>
> >>    
> >>
> >>>All computers are running 5-Stable, as of May 10.
> >>>All, but PC1 with fxp, use polling, with:
> >>>  options DEVICE_POLLING
> >>>  options HZ=1000
> >>> 
> >>>
> >>>      
> >>>
> >>1000 IMHO seems a bit too heavy. Try something
> >>lower.
> >
> > Same problem. Ssh-tunnel connection is also
> > disrupted with HZ=100. May I conclude that the
> > HZ value is not the culprit? Or should I try
> > once again with HZ=10?
> >
> 100 should be fine. 10 would be overkill.
> 
> >kern.ipc.nmbclusters is 4928 for this PC.
> >Is that good or bad?
> >  
> >
> What is the purpose of the box? Give a description
> of the network traffic.

This is a lab in the Chemistry department; the box
in question is a dual-homed gateway to eight other
PCs in the lab. The box has a tight firewall and
runs an Apache server and an SSH server.
On the private network, the box also acts as a DHCP
server, Samba server and NTP server.
OS = 5-Stable.

The other PCs in the lab are two FreeBSD PCs and
various flavours of Windows.

> Did you use any strange CFLAGS like -O3 or -f*
> compile-time options when you built the system?

No. My /etc/make.conf has:

CFLAGS= -O -pipe
NOPROFILE=true
NO_PF=true

>> kern.polling.enable: 0
>
> Force this to be 1. Damn, I should have noticed that
> earlier.

I took this printout after I had changed the value to 0.
Of course it is 1 when I test the polling, but when
I noticed that the ssh-tunnel connection problem
persisted, I changed it to 0 so that my ssh-tunnel
connection would not be randomly closed :).

Thanks for your elaborate help!

Rob.

Ruslan Ermilov

2005-May-11 03:01 UTC

xl(4) & polling

On Wed, May 11, 2005 at 12:43:09AM -0700, Rob wrote:
> I actually doubt whether the default values of
> these sysctl variables would cause the problem.

No.  Can you observe any broken IP/TCP/UDP checksums?

netstat -ss -f inet | grep -w bad


Cheers,
-- 
Ruslan Ermilov
ru@FreeBSD.org
FreeBSD committer
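Ruslan's check can be taken one step further. Because `grep -w bad` may also match counters unrelated to checksums (bad options, bad code fields and the like), a hypothetical Python filter could keep only the checksum lines and total them. The name `bad_checksum_counts` is invented for illustration, and the sketch assumes each statistics line is an integer count followed by a description, the way netstat prints per-protocol statistics. The sample lines are made up and are not output from GW1.

```python
import re

# Assumed line shape: optional indentation, a count, then a description,
# e.g. "\t3 with bad checksum".
LINE = re.compile(r"^\s*(\d+)\s+(.*\S)\s*$")


def bad_checksum_counts(text):
    """Map each checksum-related description to its (summed) count.

    Lines that do not start with a number, or whose description does
    not mention "checksum", are ignored.
    """
    counts = {}
    for line in text.splitlines():
        match = LINE.match(line)
        if match and "checksum" in match.group(2):
            description = match.group(2)
            counts[description] = counts.get(description, 0) + int(match.group(1))
    return counts


# Illustrative input only; not captured from any machine in this thread.
SAMPLE = "\t2 discarded for bad checksums\n\t1 with bad options\n\t3 with bad checksum\n"

if __name__ == "__main__":
    counts = bad_checksum_counts(SAMPLE)
    print(counts, "total:", sum(counts.values()))
```

A non-zero total while polling is enabled, and a zero total with polling off, would point at the receive path of the polled xl(4) interfaces; that interpretation is itself an assumption, not a diagnosis.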