thr3ads.net - freebsd stable - Odd network issue ... *very* slow scp between two servers [Mar 2004]

Marc G. Fournier

2004-Mar-06 09:48 UTC

Odd network issue ... very slow scp between two servers

I have two servers on the same network switch, sitting one on top of the
other ... one is running an em device, the other an fxp device ...

Doing a straight ftp between the two servers, of a 1Meg file, shows:

1038785 bytes received in 85.91 seconds (11.81 KB/s)

Going between two servers, same switch, both running fxp devices, for the
exact same file, shows:

1038785 bytes received in 0.09 seconds (10.64 MB/s)
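
For scale, here is a minimal Python sketch of the arithmetic behind the two ftp figures above. The helper name `kib_per_second` is hypothetical. Taking 1 KB as 1024 bytes reproduces ftp's 11.81 KB/s figure. The fast transfer's 0.09 second timing is rounded too coarsely to recompute from, so the sketch uses the rate ftp reported instead.

```python
# Figures copied from the two ftp transfers quoted above.
FILE_BYTES = 1038785


def kib_per_second(num_bytes, seconds):
    """Average throughput in KB/s, taking 1 KB = 1024 bytes."""
    return num_bytes / seconds / 1024


# em <-> fxp transfer: recomputed from bytes and elapsed time.
slow = kib_per_second(FILE_BYTES, 85.91)

# fxp <-> fxp transfer: use the 10.64 MB/s that ftp itself reported,
# converted to KB/s, because 0.09 s is too heavily rounded.
fast = 10.64 * 1024

slowdown = fast / slow
print(f"slow path: {slow:.2f} KB/s")
print(f"slowdown:  roughly {slowdown:.0f}x")
```

On these numbers the em-to-fxp path is about three orders of magnitude slower than the fxp-to-fxp path for the same file.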

Now, I have ipaudit running on all the servers, to monitor bandwidth ...
the server with the fxp device, the one I just downloaded to from another
fxp server @ 10.64MB/s, did 11535.73M of traffic in total yesterday ...
the one with the em device did 11766.46M ...

Now, in my /var/log/messages file, I am getting the RST lines:

Mar  6 12:35:38 neptune /kernel: Limiting open port RST response from 700 to 200 packets per second
Mar  6 12:35:39 neptune /kernel: Limiting open port RST response from 636 to 200 packets per second
Mar  6 12:35:41 neptune /kernel: Limiting open port RST response from 523 to 200 packets per second
Mar  6 12:35:46 neptune /kernel: Limiting open port RST response from 386 to 200 packets per second
Mar  6 12:35:55 neptune /kernel: Limiting open port RST response from 238 to 200 packets per second
Mar  6 13:34:25 neptune /kernel: Limiting open port RST response from 799 to 200 packets per second
Mar  6 13:34:27 neptune /kernel: Limiting open port RST response from 637 to 200 packets per second
Mar  6 13:34:28 neptune /kernel: Limiting open port RST response from 503 to 200 packets per second
Mar  6 13:34:32 neptune /kernel: Limiting open port RST response from 343 to 200 packets per second
Mar  6 13:34:42 neptune /kernel: Limiting open port RST response from 206 to 200 packets per second

And seems to be quite regular:

neptune# gzcat /var/log/messages.0.gz | grep RST | wc -l
      95

where 0.gz is from Mar  5 14:47:28 -> Mar  6 11:30:52
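
A raw line count hides whether the messages arrive steadily or in bursts. The following is a rough Python sketch, not part of the original post, that groups the rate-limit messages by hour and records the highest attempted RST rate. The function name `summarize_rst` and the regular expression are assumptions written against the log format quoted above, with each message on a single line.

```python
import re
from collections import Counter

# Assumed format, based on the lines quoted above, e.g.
# "Mar  6 12:35:38 neptune /kernel: Limiting open port RST response from 700 to 200 ..."
RST_RE = re.compile(
    r"^(\w{3})\s+(\d+)\s+(\d{2}):\d{2}:\d{2}\s+\S+\s+/kernel: "
    r"Limiting open port RST response from (\d+) to (\d+)"
)


def summarize_rst(lines):
    """Return (message count per (month, day, hour), highest attempted rate)."""
    per_hour = Counter()
    peak = 0
    for line in lines:
        m = RST_RE.match(line)
        if not m:
            continue  # not a rate-limit message
        month, day, hour, attempted, _limit = m.groups()
        per_hour[(month, int(day), int(hour))] += 1
        peak = max(peak, int(attempted))
    return per_hour, peak


# Three of the messages quoted above, used as sample input.
sample = [
    "Mar  6 12:35:38 neptune /kernel: Limiting open port RST response from 700 to 200 packets per second",
    "Mar  6 12:35:39 neptune /kernel: Limiting open port RST response from 636 to 200 packets per second",
    "Mar  6 13:34:25 neptune /kernel: Limiting open port RST response from 799 to 200 packets per second",
]

per_hour, peak = summarize_rst(sample)
for (month, day, hour), count in sorted(per_hour.items()):
    print(f"{month} {day:2d} {hour:02d}:00  {count} message(s)")
print("highest attempted RST rate:", peak)
```

To run it against the rotated log, the lines could be read with `gzip.open("/var/log/messages.0.gz", "rt")` in place of the sample list.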

but, shouldn't:

net.inet.tcp.blackhole: 0 -> 2

help?  or did I read the man page wrong?  If it should, I'm still only
getting ~13k/s on that same file ...

there is nothing else in messages to indicate a problem, either with
processes, or drives, or anything, and load on the machine, right now, is
only 1.3 ...

vmstat -i shows a high rate of interrupts for the em device:

neptune# uptime
 1:43PM  up 57 days,  3:08, 5 users, load averages: 1.38, 1.32, 0.97
neptune# vmstat -i
interrupt                   total       rate
ahd0 irq16                     15          0
ahd1 irq17              932228686        188
em0 irq18              1205773331        244
clk irq0                493596903         99
rtc irq8                631819522        128
Total                  3263418457        661

vs

mars# uptime
 1:43PM  up 77 days,  9:50, 3 users, load averages: 7.44, 7.73, 6.28
mars# vmstat -i
interrupt                   total       rate
fxp0 irq5               499794285         74
ahc0 irq11                     15          0
ahc1 irq15              915710622        136
fdc0 irq6                       4          0
clk irq0                668800403         99
rtc irq8                856196939        128
Total                  2940502268        439
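
One caveat when reading these numbers: the rate column appears to be the total divided by the uptime, so it is an average since boot rather than the current interrupt rate. The Python sketch below checks that reading against the figures above. The function name `average_rate` is hypothetical, and the uptimes are taken from the `uptime` output to the minute.

```python
def average_rate(total_interrupts, days, hours, minutes):
    """Interrupts per second averaged over the whole uptime."""
    uptime_seconds = days * 86400 + hours * 3600 + minutes * 60
    return total_interrupts / uptime_seconds


# Totals and uptimes copied from the output above.
em0 = average_rate(1205773331, days=57, hours=3, minutes=8)    # neptune
fxp0 = average_rate(499794285, days=77, hours=9, minutes=50)   # mars

# Truncated to whole numbers, which matches the rate column shown above.
print(int(em0), int(fxp0))  # prints: 244 74
```

If that reading is right, comparing two `vmstat -i` totals taken a known number of seconds apart would give a better picture of what the em device is doing now.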

the fxp device is running:
        media: Ethernet autoselect (100baseTX <full-duplex>)

the em device is running:
        media: Ethernet 100baseTX <full-duplex>

and, finally, the em server was last upgraded:
	4.9-STABLE #4: Tue Jan  6 00:59:37 AST 2004

while the fxp server is almost ancient:
	4.9-PRERELEASE #2: Sat Sep 20 14:42:25 ADT 2003

I'm going to do a reboot on the server Monday, when a tech is easily
accessible in case of a problem ... but, before I do that, is there
anything I can do to possibly debug this?   Maybe something I can look at
that would show a 'leak', maybe?

Thanks ...

----
Marc G. Fournier           Hub.Org Networking Services (http://www.hub.org)
Email: scrappy@hub.org           Yahoo!: yscrappy              ICQ: 7615664

Mike Tancsa

2004-Mar-06 15:04 UTC

At 12:48 PM 06/03/2004, Marc G. Fournier wrote:
>the fxp device is running:
>         media: Ethernet autoselect (100baseTX <full-duplex>)
>
>the em device is running:
>         media: Ethernet 100baseTX <full-duplex>
What does netstat -ni show on both machines for those NICs?  Is the switch
managed?  If so, see if there are any errors.  Also run the tests when
there is little load going on.  A load of 7 is going to impact something
that needs CPU power (i.e. the ssh encryption).

         ---Mike

Bill Vermillion

2004-Mar-07 06:12 UTC

freebsd-stable-request@freebsd.org, the prominent pundit, on Sun,
Mar 07, 2004 at 05:24 while half mumbling, half-witicized:
> ------------------------------
> 
> Message: 11
> Date: Sat, 6 Mar 2004 21:26:14 -0400 (AST)
> From: "Marc G. Fournier" <scrappy@hub.org>
> Subject: Re: Odd network issue ... *very* slow scp between two servers
> To: Mike Tancsa <mike@sentex.net>
> Cc: freebsd-stable@freebsd.org
> Message-ID: <20040306212430.F13247@ganymede.hub.org>
> Content-Type: TEXT/PLAIN; charset=US-ASCII
> Wow, okay, switching to 10baseT/UTP, full duplex is atrocious too:
> 1038785 bytes received in 74.30 seconds (13.65 KB/s)
> So, bug with full-duplex on the em devices?
> Switching to 100baseTX, half-duplex gives me an error though,
> but seems to work:
I saw something similar when an OS/X machine was having slow
transfers while the BSD's were not.

When I did a traceroute to the adjacent machine, which was on a
separate /24 network, the packets went to the switch, then to the
router, which sent them back to the switch and on to the
destination.   I don't know what prompted me to perform a
traceroute between two machines that were on the same switch, but
it was two hops instead of a direct path.  There was also an
intervening bridge (post switch, pre router) that added delay.

This was on a Cisco 2948.

This may have nothing to do with your problem, and you didn't
indicate whether the machines were on the same subnet or not.
Just throwing this out as a point of interest.

Bill

-- 
Bill Vermillion - bv @ wjv . com
