Marc G. Fournier
2004-Mar-06 09:48 UTC
Odd network issue ... *very* slow scp between two servers
I have two servers on the same network switch, sitting one on top of the other ... one is running an em device, the other an fxp device ...

Doing a straight ftp between the two servers, of a 1Meg file, shows:

  1038785 bytes received in 85.91 seconds (11.81 KB/s)

Going between two servers, same switch, both running fxp devices, for the exact same file, shows:

  1038785 bytes received in 0.09 seconds (10.64 MB/s)

Now, I have ipaudit running on all the servers, to monitor bandwidth ... the server with the fxp device on it, that I just downloaded to from another fxp server @ 10.64MB/s, did 11535.73M of traffic total yesterday ... the one with the em device did 11766.46M ...

Now, in my /var/log/messages file, I am getting the RST lines:

Mar 6 12:35:38 neptune /kernel: Limiting open port RST response from 700 to 200 packets per second
Mar 6 12:35:39 neptune /kernel: Limiting open port RST response from 636 to 200 packets per second
Mar 6 12:35:41 neptune /kernel: Limiting open port RST response from 523 to 200 packets per second
Mar 6 12:35:46 neptune /kernel: Limiting open port RST response from 386 to 200 packets per second
Mar 6 12:35:55 neptune /kernel: Limiting open port RST response from 238 to 200 packets per second
Mar 6 13:34:25 neptune /kernel: Limiting open port RST response from 799 to 200 packets per second
Mar 6 13:34:27 neptune /kernel: Limiting open port RST response from 637 to 200 packets per second
Mar 6 13:34:28 neptune /kernel: Limiting open port RST response from 503 to 200 packets per second
Mar 6 13:34:32 neptune /kernel: Limiting open port RST response from 343 to 200 packets per second
Mar 6 13:34:42 neptune /kernel: Limiting open port RST response from 206 to 200 packets per second

And seems to be quite regular:

neptune# gzcat /var/log/messages.0.gz | grep RST | wc -l
      95

where 0.gz is from Mar 5 14:47:28 -> Mar 6 11:30:52

but, shouldn't:

net.inet.tcp.blackhole: 0 -> 2

help? or did I read the man page wrong?
If it should, I'm still only getting ~13k/s on that same file ... there is nothing else in messages to indicate a problem, either with processes, or drives, or anything, and load on the machine, right now, is only 1.3 ...

vmstat -i shows a high rate of interrupts for the em device:

neptune# uptime
 1:43PM up 57 days, 3:08, 5 users, load averages: 1.38, 1.32, 0.97
neptune# vmstat -i
interrupt        total       rate
ahd0 irq16          15          0
ahd1 irq17   932228686        188
em0 irq18   1205773331        244
clk irq0     493596903         99
rtc irq8     631819522        128
Total       3263418457        661

vs

mars# uptime
 1:43PM up 77 days, 9:50, 3 users, load averages: 7.44, 7.73, 6.28
mars# vmstat -i
interrupt        total       rate
fxp0 irq5    499794285         74
ahc0 irq11          15          0
ahc1 irq15   915710622        136
fdc0 irq6            4          0
clk irq0     668800403         99
rtc irq8     856196939        128
Total       2940502268        439

the fxp device is running:

        media: Ethernet autoselect (100baseTX <full-duplex>)

the em device is running:

        media: Ethernet 100baseTX <full-duplex>

and, finally, the em server was last upgraded:

4.9-STABLE #4: Tue Jan 6 00:59:37 AST 2004

while the fxp server is almost ancient:

4.9-PRERELEASE #2: Sat Sep 20 14:42:25 ADT 2003

I'm going to do a reboot on the server Monday, when a tech is easily accessible in case of a problem ... but, before I do that, is there anything I can do to possibly debug this? Maybe something I can look at that would show a 'leak', maybe?

Thanks ...

----
Marc G. Fournier Hub.Org Networking Services (http://www.hub.org)
Email: scrappy@hub.org Yahoo!: yscrappy ICQ: 7615664
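For anyone wanting more than a line count out of the RST messages quoted in the post above, the following is a minimal sketch (not part of the original thread) that tallies them in Python. The function name `summarize_rst` is hypothetical, and the regular expression assumes the exact kernel message wording shown in the post; the sample lines are copied from the post itself.

```python
import re

# Assumes the message wording quoted in the post:
# "Limiting open port RST response from N to M packets per second"
RST_LINE = re.compile(
    r"Limiting open port RST response from (\d+) to (\d+) packets per second"
)


def summarize_rst(lines):
    """Return (match_count, peak_attempted_rate, configured_limit)."""
    attempted = []
    limit = None
    for line in lines:
        match = RST_LINE.search(line)
        if match:
            attempted.append(int(match.group(1)))
            limit = int(match.group(2))
    if not attempted:
        return 0, None, None
    return len(attempted), max(attempted), limit


# Three of the lines quoted in the post, used as sample input.
sample = [
    "Mar 6 12:35:38 neptune /kernel: Limiting open port RST response from 700 to 200 packets per second",
    "Mar 6 12:35:39 neptune /kernel: Limiting open port RST response from 636 to 200 packets per second",
    "Mar 6 13:34:25 neptune /kernel: Limiting open port RST response from 799 to 200 packets per second",
]

print(summarize_rst(sample))
```

To run it against a rotated log, the lines could be read with `gzip.open(path, "rt")` instead of the inline sample; the peak value gives a rough idea of how far above the limit each burst went.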
Mike Tancsa
2004-Mar-06 15:04 UTC
Odd network issue ... *very* slow scp between two servers
At 12:48 PM 06/03/2004, Marc G. Fournier wrote:
>the fxp device is running:
>        media: Ethernet autoselect (100baseTX <full-duplex>)
>
>the em device is running:
>        media: Ethernet 100baseTX <full-duplex>

What does netstat -ni show on both machines for those NICs? Is the switch managed? If so, see if there are any errors. Also run the tests when there is little load going on. A load of 7 is going to impact something that needs CPU power (i.e. the ssh encryption).

---Mike
Bill Vermillion
2004-Mar-07 06:12 UTC
Odd network issue ... *very* slow scp between two servers
freebsd-stable-request@freebsd.org, the prominent pundit, on Sun, Mar 07, 2004 at 05:24 while half mumbling, half-witicized:

> ------------------------------
>
> Message: 11
> Date: Sat, 6 Mar 2004 21:26:14 -0400 (AST)
> From: "Marc G. Fournier" <scrappy@hub.org>
> Subject: Re: Odd network issue ... *very* slow scp between two servers
> To: Mike Tancsa <mike@sentex.net>
> Cc: freebsd-stable@freebsd.org
> Message-ID: <20040306212430.F13247@ganymede.hub.org>
> Content-Type: TEXT/PLAIN; charset=US-ASCII

> Wow, okay, switching to 10baseT/UTP, full duplex is atrocious too:

> 1038785 bytes received in 74.30 seconds (13.65 KB/s)

> So, bug with full-duplex on the em devices?

> Switching to 100baseTX, half-duplex gives me an error though,
> but seems to work:

I saw something similar when an OS/X machine was having slow transfers while the BSDs were not.

When I did a traceroute to the adjacent machine, which was on a separate /24 network, the packets went to the switch, then to the router, which then sent them back to the switch and on to the destination.

I don't know what prompted me to perform a traceroute between two machines that were on the same switch, but it was two hops instead of just direct. There was also an intervening bridge, post switch / pre router, that added delay. This was on a Cisco 2948.

This may have nothing to do with your problem, and you didn't indicate whether the machines were on the same subnet or not. Just throwing this out as a point of interest.

Bill
--
Bill Vermillion - bv @ wjv . com
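The subnet question raised in the message above can be checked quickly before reaching for traceroute. This is a rough sketch only: the helper name `same_subnet` is hypothetical, the addresses are documentation placeholders (the thread never gives the servers' real addresses), and the /24 prefix is an assumption taken from the "separate /24 network" mentioned above.

```python
import ipaddress


def same_subnet(addr_a, addr_b, prefix_len=24):
    """Return True if addr_b falls inside addr_a's network of the given prefix length."""
    # strict=False lets us pass a host address and have the host bits masked off.
    network = ipaddress.ip_network(f"{addr_a}/{prefix_len}", strict=False)
    return ipaddress.ip_address(addr_b) in network


# Placeholder addresses, NOT the real hosts from the thread.
print(same_subnet("192.0.2.10", "192.0.2.20"))     # same /24
print(same_subnet("192.0.2.10", "198.51.100.20"))  # different /24
```

If the two hosts turn out to be in different subnets, traffic between them would normally be routed even when both are plugged into the same switch, which matches the extra hop described above.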