Hi folks,
I have four identical ITX boards from Jetway here, each with two re(4)
onboard NICs:

re0@pci0:0:9:0: class=0x020000 card=0x10ec16f3 chip=0x816710ec rev=0x10 hdr=0x00
    vendor     = 'Realtek Semiconductor'
    device     = 'RTL8169/8110 Family Gigabit Ethernet NIC'
    class      = network
    subclass   = ethernet
re1@pci0:0:11:0: class=0x020000 card=0x10ec16f3 chip=0x816710ec rev=0x10 hdr=0x00
    vendor     = 'Realtek Semiconductor'
    device     = 'RTL8169/8110 Family Gigabit Ethernet NIC'
    class      = network
    subclass   = ethernet
atapci0@pci0:0:15:0: class=0x01018f card=0x31491106 chip=0x31491106 rev=0x80
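The chip= and card= fields in the pciconf(8) listing each pack two 16-bit PCI IDs into one 32-bit word: chip is (device ID << 16) | vendor ID, and card is (subsystem ID << 16) | subsystem vendor ID. When comparing boards or driver ID tables, the raw device ID (here 0x8167) says more than the generic "RTL8169/8110 Family" description string. A minimal Python sketch of the split; decode_pciconf is a hypothetical helper, not part of any FreeBSD tool.

```python
import re

# Matches the "card=0x........ chip=0x........" fields of a pciconf -l line.
FIELDS = re.compile(r"card=0x([0-9a-fA-F]{8})\s+chip=0x([0-9a-fA-F]{8})")


def decode_pciconf(line: str) -> dict:
    """Split the 32-bit card/chip words into 16-bit PCI IDs.

    chip = (device ID << 16) | vendor ID
    card = (subsystem ID << 16) | subsystem vendor ID
    """
    m = FIELDS.search(line)
    if m is None:
        raise ValueError("no card=/chip= fields found")
    card, chip = (int(g, 16) for g in m.groups())
    return {
        "vendor": chip & 0xFFFF,
        "device": chip >> 16,
        "subvendor": card & 0xFFFF,
        "subdevice": card >> 16,
    }


line = ("re0@pci0:0:9:0: class=0x020000 card=0x10ec16f3 "
        "chip=0x816710ec rev=0x10 hdr=0x00")
ids = decode_pciconf(line)
print({k: f"0x{v:04x}" for k, v in ids.items()})
```

The same call works on any line of the listing, e.g. the atapci0 entry decodes to vendor 0x1106.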
I have been running FreeBSD 7-STABLE from early March 2008 on three of these
machines and have not noticed any networking problems so far.
Some days ago I installed a fourth machine with 7-STABLE from early May
(and, some days later, updated it to May 17th because of the problems
described below). On this new machine I see several networking problems.
The two most prominent are:
- Heavy network traffic (in this case a backup via tar and NFS) causes hangs
  of about 10-30 s and sometimes also leads to watchdog timeouts:
    May 27 09:04:07 protoserve kernel: re0: watchdog timeout
    May 27 09:04:07 protoserve kernel: re0: link state changed to DOWN
    May 27 09:04:10 protoserve kernel: re0: link state changed to UP
- Copying large files (more than a few hundred MB) via ssh/scp drops the
  connection with "Corrupted MAC on input":
    Disconnecting: Corrupted MAC on input.
    lost connection
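To see how often the watchdog timeouts and link flaps actually happen (for example, to correlate them with the backup window), the kernel messages can be tallied from the system log. A minimal Python sketch, assuming the default syslog line format shown above; count_re_events is a hypothetical helper name.

```python
import re
from collections import Counter

# Matches re(4) kernel messages such as
# "May 27 09:04:07 protoserve kernel: re0: watchdog timeout".
EVENT = re.compile(
    r"^(?P<stamp>\w{3}\s+\d+\s[\d:]{8})\s+\S+\s+kernel:\s+"
    r"(?P<ifname>re\d+):\s+"
    r"(?P<event>watchdog timeout|link state changed to (?:UP|DOWN))"
)


def count_re_events(lines):
    """Count watchdog timeouts and link state changes per re(4) interface."""
    counts = Counter()
    for line in lines:
        m = EVENT.match(line)
        if m:
            counts[(m["ifname"], m["event"])] += 1
    return counts


sample = [
    "May 27 09:04:07 protoserve kernel: re0: watchdog timeout",
    "May 27 09:04:07 protoserve kernel: re0: link state changed to DOWN",
    "May 27 09:04:10 protoserve kernel: re0: link state changed to UP",
]
for (ifname, event), n in sorted(count_re_events(sample).items()):
    print(ifname, event, n)
```

On a live system it could be fed the log file directly, e.g. `with open("/var/log/messages") as f: count_re_events(f)`, since only the start of each line is matched.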
In the latter case the network load should not actually be that high:
these are nanobsd systems transferring a new image file (system update,
2 GB) via ssh, so the bottleneck should be the write speed of the CF card
that holds the system.
I do not see these problems with the older code base from March 2008 on my
older machines. CVS shows a large MFC of the re(4) driver in April, so I
guessed that something in it broke things here. I therefore downgraded the
new system to a CVS code base from March 1st, but the problems persist.
They also occur on both interfaces. memtest86 has been running for hours
now without finding anything wrong.

Any hints on what I should do next to find the culprit?
cu
Gerrit
Gerrit Kühn wrote:
> [...]
> Any hints what I should do next to find the culprit?

I'm running 6.3 on the exact same Jetway board at home, and while I haven't
been bitten by the DOWN/UP issue, I have seen the occasional "Corrupted MAC
on input" error during ssh/scp. It seems to have simmered down since moving
from 6.3-RELEASE to 6.3-STABLE (last supped/rebuilt on 5/6/08). Note that
this is using only one of the two onboard NICs; I disabled the second one
in the BIOS as I don't need it at the moment.

-Proto
On Tue, May 27, 2008 at 04:52:32PM +0200, Gerrit Kühn wrote:
> [...]
> Any hints what I should do next to find the culprit?

There were similar reports on this issue. It seems to be very hard to make
re(4) work with so many RTL8168/8169/8111 revisions without documentation,
as different revisions require different workarounds.
Anyway, would you try this one? The patch was generated against HEAD, but
it should apply to STABLE too.

http://people.freebsd.org/~yongari/re/re.HEAD.20080519

--
Regards,
Pyun YongHyeon
Quoting Gerrit Kühn, who wrote on Fri, May 30, 2008 at 02:47:59PM +0200 ..
> On Fri, 30 May 2008 13:49:24 +0200 Wilko Bulte <wb@freebie.xs4all.nl>
> wrote about Re: broken re(4):
>
> WB> > Typing "pci riser card jumper" in Google will give you
> WB> > many more pages with interesting (or frightening) stuff
> WB> > to read.
>
> WB> Well, if you know how the PCI bus electrically works this kind of
> WB> problem is hardly a surprise ;-)
>
> Well, the riser that came with this 1HU-chassis is probably even more
> frightening: it plugs into the pci port and uses a short ribbon cable to
> connect to an extra board which holds the cards.

Hmmm... brr....

--
Wilko Bulte                             wilko@FreeBSD.org
On 27 May 2008, at 16:52, Gerrit Kühn wrote:
> [...]
> - copying large files (more than some 100MB) via ssh/scp drops the
> connection due to "corrupted MAC on input":
> Disconnecting: Corrupted MAC on input.
> lost connection

I had the same problem. I have worked around it (for now) by doing a
buildworld with

    *default date=2008.03.01.00.00.00

in my src csup configuration. I'm not skilled enough to dig into the
sources, but the problem was introduced after this date.

Regards
Daniele Bastianini
On Tue, 10 Jun 2008 20:43:04 +0200 Daniele Bastianini <liste.bsd@gmail.com>
wrote about Re: broken re(4):

DB> > - copying large files (more than some 100MB) via ssh/scp drops the
DB> > connection due to "corrupted MAC on input":
DB> > Disconnecting: Corrupted MAC on input.
DB> > lost connection

DB> I had the same problem.
DB> I fixed it (for now) making a buildworld with
DB> *default date=2008.03.01.00.00.00 in my src csup configuration.
DB> I'm not so skilled to investigate in the sources but the problem is
DB> after this date.

For me, all versions from CVS and all of Pyun's patches work now that I
have solved the issue with the bad riser card. I still find it odd that
the riser causes this kind of trouble for the network chips.

On the other hand, I have not been able to get more than about 10 MByte/s
through the interfaces of this particular system. I have gigabit networking
equipment, and the other systems (which are used as routers) have no
problem reaching a throughput of >20 MByte/s. Even bonding the two
interfaces with lagg(4) does not improve the performance. Where else could
the bottleneck be? The only difference here is that this machine has the
extra SATA controller with disks in it. However, the disks appear to be as
fast as I can expect from a SATA150 interface.

cu
Gerrit
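For scale, a rough conversion of the figures above: about 10 MByte/s is about 80 Mbit/s, roughly 80% of a 100 Mbit/s line rate but only about 8% of gigabit. A result sitting near the Fast Ethernet ceiling is at least consistent with some link in the path having negotiated 100baseTX, which the media line of ifconfig(8) would show; that is a hypothesis, not something established in this thread. The sketch below only does the arithmetic, assuming 1 MByte = 10**6 bytes and ignoring Ethernet/IP/TCP overhead.

```python
# Back-of-the-envelope comparison of the throughput figures in the thread.
# Assumes 1 MByte = 10**6 bytes; with 2**20 the numbers shift by about 5%.

LINK_RATES_MBIT = {"100baseTX": 100, "1000baseT": 1000}


def mbyte_to_mbit(mbyte_per_s: float) -> float:
    """Convert MByte/s to Mbit/s (8 bits per byte)."""
    return mbyte_per_s * 8


def link_utilisation(mbyte_per_s: float, link_mbit: float) -> float:
    """Fraction of the raw line rate used (protocol overhead ignored)."""
    return mbyte_to_mbit(mbyte_per_s) / link_mbit


for observed in (10, 20):  # MByte/s: this box vs. the routers
    for name, rate in LINK_RATES_MBIT.items():
        print(f"{observed} MByte/s = {mbyte_to_mbit(observed):.0f} Mbit/s "
              f"= {link_utilisation(observed, rate):.0%} of {name}")
```

A figure above 100% (as for 20 MByte/s on 100baseTX) just means that rate cannot be reached on that link type.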
Gerrit Kühn wrote:
> On the other hand, I have not been able to get more than about 10MByte/s
> through the interfaces of this particular system. I have 1GBit-networking
> equipment, and the other systems (which are used as router) have no
> problem doing a throughput of >20MB/s. Even bonding the two interfaces
> using lagg(4) does not improve the performance - where else could be the
> bottleneck?
A few questions or hints ...
- What is the CPU usage during your network test (user,
sys, intr, idle)?
- Do you see errors in "netstat -i"?
- Do you use jumbo frames?
- Is polling enabled?
- Are there any network-related sysctls (/etc/sysctl.conf)
or kernel settings? Have you enabled kernel debugging
features (INVARIANTS, WITNESS etc.)?
- Do you have any packet filter rules (PF, IPF, IPFW)?
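Checking "netstat -i" by eye is easy for two interfaces; for repeated checks during a test run, a small parser makes it scriptable. A minimal Python sketch, assuming FreeBSD-style output whose header contains Ierrs and Oerrs columns. Column sets vary between releases, so they are located by name and counted from the right, because the Address column can be empty on some rows. interface_errors is a hypothetical helper, and the sample table is made up for illustration.

```python
def interface_errors(netstat_output: str) -> dict:
    """Return {interface: (Ierrs, Oerrs)} parsed from netstat -i text.

    Rows whose error counters are "-" (typically address rows) are skipped;
    the first numeric row per interface (the <Link#n> row) is kept.
    """
    lines = [ln for ln in netstat_output.splitlines() if ln.strip()]
    header = lines[0].split()
    # Negative offsets of the wanted counters, measured from the line end.
    ierr_off = header.index("Ierrs") - len(header)
    oerr_off = header.index("Oerrs") - len(header)
    errors = {}
    for row in lines[1:]:
        fields = row.split()
        try:
            ierrs = int(fields[ierr_off])
            oerrs = int(fields[oerr_off])
        except (ValueError, IndexError):
            continue  # "-" placeholders or truncated rows
        errors.setdefault(fields[0], (ierrs, oerrs))
    return errors


# Hypothetical sample; addresses and counters are invented.
SAMPLE = """\
Name    Mtu Network       Address              Ipkts Ierrs    Opkts Oerrs  Coll
re0    1500 <Link#1>      00:00:00:00:00:01  1200345     0   980112     0     0
re0    1500 192.168.1.0   192.168.1.10       1100000     -   970000     -     -
re1    1500 <Link#2>      00:00:00:00:00:02        0     0        0     0     0
lo0   16384 <Link#3>                              42     0       42     0     0
"""

bad = {k: v for k, v in interface_errors(SAMPLE).items() if any(v)}
print(bad or "no interface errors")
```

On the machine itself the text could come from `subprocess.run(["netstat", "-i"], capture_output=True, text=True).stdout`.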
Best regards
Oliver
--
Oliver Fromme, secnetix GmbH & Co. KG, Marktplatz 29, 85567 Grafing b. M.
On Wed, 11 Jun 2008 17:26:29 +0200 (CEST) Oliver Fromme
<olli@lurza.secnetix.de> wrote about Re: broken re(4):

OF> > On the other hand, I have not been able to get more than about
OF> > 10MByte/s through the interfaces of this particular system. I have
OF> > 1GBit-networking equipment, and the other systems (which are used
OF> > as router) have no problem doing a throughput of >20MB/s. Even
OF> > bonding the two interfaces using lagg(4) does not improve the
OF> > performance - where else could be the bottleneck?

OF> A few questions or hints ...

OF> - What is the CPU usage during your network test (user,
OF>   sys, intr, idle)?

I will test and report that tomorrow.

OF> - Do you see errors in "netstat -i"?

None.

OF> - Do you use jumbo frames?

No.

OF> - Is polling enabled?

No. I tested polling on a lot of different machines earlier and have never
found it to improve performance (the same goes for jumbo frames, btw).

OF> - Are there any network-related sysctls (/etc/sysctl.conf)
OF>   or kernel settings? Have you enabled kernel debugging
OF>   features (INVARIANTS, WITNESS etc.)?

No, stock GENERIC, only with a lot of things disabled.

OF> - Do you have any packet filter rules (PF, IPF, IPFW)?

No, not on this machine. The faster machines are routers/firewalls and do
filtering, so it should be something different...

cu
Gerrit
On Thu, Jun 12, 2008 at 08:58:10AM +0200, Gerrit Kühn wrote:
> On Thu, 12 Jun 2008 12:22:28 +0900 Pyun YongHyeon <pyunyh@gmail.com> wrote
> about Re: broken re(4):
>
> PY> Before checking performance of network controller you had to rule
> PY> out other factors like disk I/O. Use one of benchmark programs in
> PY> ports/benchmarks.
>
> I already did simple benchmarking by using "dd if=/dev/zero of=file", which
> gave me several 10s of MByte/s under all circumstances.
> Can you recommend one of the benchmarking programs for more detailed
> testing?

Try netperf or iperf in ports/benchmarks.

> cu
> Gerrit

--
Regards,
Pyun YongHyeon
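netperf and iperf are the right tools here. Purely to illustrate the idea they implement (a memory-to-memory TCP transfer, so disk I/O drops out of the measurement), here is a minimal Python sketch. run_sink and send_bytes are hypothetical names; a Python sender may itself be CPU-bound on a small ITX board, so a low number from this sketch does not by itself implicate the NIC.

```python
import socket
import threading
import time

CHUNK = 64 * 1024  # bytes per send()/recv() call


def run_sink(server: socket.socket) -> int:
    """Accept one connection, discard its data, return the byte count."""
    conn, _ = server.accept()
    received = 0
    with conn:
        while True:
            data = conn.recv(CHUNK)
            if not data:
                break
            received += len(data)
    return received


def send_bytes(host: str, port: int, total: int) -> float:
    """Push `total` zero bytes to host:port; return MByte/s (10**6 bytes).

    Only the sender's side is timed: sendall() returning means the data was
    handed to the kernel, so short runs overstate the rate. Use large totals.
    """
    payload = b"\0" * CHUNK
    sent = 0
    start = time.perf_counter()
    with socket.create_connection((host, port)) as s:
        while sent < total:
            chunk = payload[: min(CHUNK, total - sent)]
            s.sendall(chunk)
            sent += len(chunk)
    elapsed = time.perf_counter() - start
    return sent / elapsed / 1e6


if __name__ == "__main__":
    # Loopback demo: measures only the local TCP stack, never the NIC.
    srv = socket.create_server(("127.0.0.1", 0))
    port = srv.getsockname()[1]
    result = {}
    t = threading.Thread(target=lambda: result.update(n=run_sink(srv)))
    t.start()
    rate = send_bytes("127.0.0.1", port, 64 * 1024 * 1024)
    t.join()
    srv.close()
    print(f"received {result['n']} bytes, ~{rate:.0f} MByte/s over loopback")
```

To exercise the re(4) interfaces, run_sink would run on one machine (listening on its LAN address) and send_bytes on another, which is what iperf's server and client modes do.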