Hi folks,

I have four identical ITX boards from Jetway here, each having two re(4)
onboard nics:

re0@pci0:0:9:0: class=0x020000 card=0x10ec16f3 chip=0x816710ec rev=0x10 hdr=0x00
    vendor     = 'Realtek Semiconductor'
    device     = 'RTL8169/8110 Family Gigabit Ethernet NIC'
    class      = network
    subclass   = ethernet
re1@pci0:0:11:0: class=0x020000 card=0x10ec16f3 chip=0x816710ec rev=0x10 hdr=0x00
    vendor     = 'Realtek Semiconductor'
    device     = 'RTL8169/8110 Family Gigabit Ethernet NIC'
    class      = network
    subclass   = ethernet
atapci0@pci0:0:15:0: class=0x01018f card=0x31491106 chip=0x31491106 rev=0x80


I run FreeBSD 7-stable from early March 08 on three of these
machines and noticed no problems with networking with that so far.
Some days ago I installed a fourth machine with 7-stable from early May
(and some days later -because of the problems described below- to May
17th). With this new machine I see several networking problems. The most
prominent are these two:

- heavy networking traffic (in this case backup via tar & NFS) causes hangs
  for about 10s-30s and sometimes also leads to watchdog timeouts:
May 27 09:04:07 protoserve kernel: re0: watchdog timeout
May 27 09:04:07 protoserve kernel: re0: link state changed to DOWN
May 27 09:04:10 protoserve kernel: re0: link state changed to UP

- copying large files (more than some 100MB) via ssh/scp drops the
  connection due to "corrupted MAC on input":
Disconnecting: Corrupted MAC on input.
lost connection

In the latter case the networking traffic should actually not be that
high, because these are nanobsd systems which are transferring a new image
file (system update, 2GB) via ssh (so the bottleneck should be the write
speed of the CF card used to hold the system).


I do not see these problems with the old codebase from March 08 on my old
machines. The cvs shows a large MFC for the re-driver in April, so I
guessed something came in there which broke things here. Therefore I
downgraded the new system to a cvs codebase from March 1st, but the
problems persist. They also exist on both interfaces. memtest86 is running
for hours now without finding something wrong.

Any hints what I should do next to find the culprit?


cu
  Gerrit

Gerrit Kühn wrote:
> Hi folks,
>
> I have four identical ITX boards from Jetway here, each having two re(4)
> onboard nics:
> [...]
> - heavy networking traffic (in this case backup via tar & NFS) causes hangs
> for about 10s-30s and sometimes also leads to watchdog timeouts:
> May 27 09:04:07 protoserve kernel: re0: watchdog timeout
> May 27 09:04:07 protoserve kernel: re0: link state changed to DOWN
> May 27 09:04:10 protoserve kernel: re0: link state changed to UP
>
> - copying large files (more than some 100MB) via ssh/scp drops the
> connection due to "corrupted MAC on input":
> Disconnecting: Corrupted MAC on input.
> lost connection
> [...]
> Any hints what I should do next to find the culprit?
>

I'm running 6.3 on the exact same Jetway board at home, and while I
haven't been bitten by the DOWN/UP issue I have seen the occasional
"corrupted MAC on input" error when doing an ssh/scp. Seems to have
simmered down since moving from 6.3-RELEASE to 6.3-STABLE (last
supped/rebuilt on 5/6/08).

Note this is using only one of the 2 on-board NICs. I disabled the 2nd
one in the BIOS as I don't need it at the moment.

-Proto

On Tue, May 27, 2008 at 04:52:32PM +0200, Gerrit Kühn wrote:
 > Hi folks,
 >
 > I have four identical ITX boards from Jetway here, each having two re(4)
 > onboard nics:
 > [...]
 > I do not see these problems with the old codebase from March 08 on my old
 > machines. The cvs shows a large MFC for the re-driver in April, so I
 > guessed something came in there which broke things here. Therefore I
 > downgraded the new system to a cvs codebase from March 1st, but the
 > problems persist. They also exist on both interfaces. memtest86 is running
 > for hours now without finding something wrong.
 >
 > Any hints what I should do next to find the culprit?
 >

There were similar reports on this issue. It seems that it's very
hard to make re(4) work with so many RTL8168/8169/8111 revisions
without documentation, as different revisions require different
workarounds.
Anyway, would you try this one? The patch was generated against
HEAD but it would apply to STABLE too.
http://people.freebsd.org/~yongari/re/re.HEAD.20080519

-- 
Regards,
Pyun YongHyeon

Quoting Gerrit Kühn, who wrote on Fri, May 30, 2008 at 02:47:59PM +0200 ..
> On Fri, 30 May 2008 13:49:24 +0200 Wilko Bulte <wb@freebie.xs4all.nl>
> wrote about Re: broken re(4):
>
> WB> > Typing "pci riser card jumper" in Google will give you
> WB> > many more pages with interesting (or frightening) stuff
> WB> > to read.
>
> WB> Well, if you know how the PCI bus electrically works this kind of
> WB> problem is hardly a surprise ;-)
>
> Well, the riser that came with this 1HU-chassis is probably even more
> frightening: it plugs into the pci port and uses a short ribbon cable to
> connect to an extra board which holds the cards.

Hmmm... brr....

-- 
Wilko Bulte wilko@FreeBSD.org

On 27 May 2008, at 16:52, Gerrit Kühn wrote:

> [...]
> - copying large files (more than some 100MB) via ssh/scp drops the
> connection due to "corrupted MAC on input":
> Disconnecting: Corrupted MAC on input.
> lost connection

I had the same problem.
I fixed it (for now) making a buildworld with
*default date=2008.03.01.00.00.00 in my src csup configuration.
I'm not so skilled to investigate in the sources but the problem is
after this date.

Regards
Daniele Bastianini

On Tue, 10 Jun 2008 20:43:04 +0200 Daniele Bastianini
<liste.bsd@gmail.com> wrote about Re: broken re(4):

DB> > - copying large files (more than some 100MB) via ssh/scp drops the
DB> > connection due to "corrupted MAC on input":
DB> > Disconnecting: Corrupted MAC on input.
DB> > lost connection

DB> I had the same problem.
DB> I fixed it (for now) making a buildworld with
DB> *default date=2008.03.01.00.00.00 in my src csup configuration.
DB> I'm not so skilled to investigate in the sources but the problem is
DB> after this date.

For me all versions from cvs and all patches from Pyun are working now,
after I have solved the issue with the bad riser card. I still think it's
funny that the riser causes this kind of trouble for the networking chips.

On the other hand, I have not been able to get more than about 10MByte/s
through the interfaces of this particular system. I have 1GBit-networking
equipment, and the other systems (which are used as router) have no
problem doing a throughput of >20MB/s. Even bonding the two interfaces
using lagg(4) does not improve the performance - where else could be the
bottleneck?
The only difference here is that I have the extra SATA-controller with
disks in there. However, the disks appear to be as fast as I can expect
from a SATA150-interface.


cu
  Gerrit
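
[Editorial note] To put the figures above in proportion, the observed
rates can be compared with the nominal line rate. This is a minimal
sketch, assuming 1 MByte = 10^6 bytes and ignoring all protocol overhead;
the function names are made up for illustration and do not come from the
thread.

```python
def mbyte_per_s_to_mbit_per_s(mbyte_per_s):
    """Convert a payload rate in MByte/s to Mbit/s (1 MByte = 10**6 bytes)."""
    return mbyte_per_s * 8.0


def link_utilisation(mbyte_per_s, link_mbit_per_s):
    """Return the fraction of the nominal link rate that the payload rate uses."""
    return mbyte_per_s_to_mbit_per_s(mbyte_per_s) / link_mbit_per_s


if __name__ == "__main__":
    # Rates reported in the thread: ~10 MByte/s on the slow box,
    # >20 MByte/s on the routers. Link speeds are nominal values.
    for rate in (10, 20):
        for link in (100, 1000):
            share = link_utilisation(rate, link)
            print(f"{rate} MByte/s on a {link} Mbit/s link: {share:.0%} of line rate")
```

10 MByte/s corresponds to 80 Mbit/s, which is a small fraction of a
gigabit link but close to what a 100 Mbit/s link can carry. That may be
a coincidence, but it could make the negotiated media reported by
ifconfig worth a look.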

Gerrit Kühn wrote:
 > On the other hand, I have not been able to get more than about 10MByte/s
 > through the interfaces of this particular system. I have 1GBit-networking
 > equipment, and the other systems (which are used as router) have no
 > problem doing a throughput of >20MB/s. Even bonding the two interfaces
 > using lagg(4) does not improve the performance - where else could be the
 > bottleneck?

A few questions or hints ...

 - What is the CPU usage during your network test (user,
   sys, intr, idle)?
 - Do you see errors in "netstat -i"?
 - Do you use jumbo frames?
 - Is polling enabled?
 - Are there any network-related sysctls (/etc/sysctl.conf)
   or kernel settings? Have you enabled kernel debugging
   features (INVARIANTS, WITNESS etc.)?
 - Do you have any packet filter rules (PF, IPF, IPFW)?

Best regards
   Oliver

-- 
Oliver Fromme, secnetix GmbH & Co. KG, Marktplatz 29, 85567 Grafing b. M.
Commercial register: registry court Munich, HRA 74606, Management:
secnetix Verwaltungsgesellsch. mbH, commercial register: registry court
Munich, HRB 125758, Managing directors: Maik Bachmann, Olaf Erb, Ralf Gebhart

FreeBSD services, products and more: http://www.secnetix.de/bsd

"C++ is the only current language making COBOL look good."
        -- Bertrand Meyer

On Wed, 11 Jun 2008 17:26:29 +0200 (CEST) Oliver Fromme
<olli@lurza.secnetix.de> wrote about Re: broken re(4):

OF> > On the other hand, I have not been able to get more than about
OF> > 10MByte/s through the interfaces of this particular system. I have
OF> > 1GBit-networking equipment, and the other systems (which are used
OF> > as router) have no problem doing a throughput of >20MB/s. Even
OF> > bonding the two interfaces using lagg(4) does not improve the
OF> > performance - where else could be the bottleneck?

OF> A few questions or hints ...

OF> - What is the CPU usage during your network test (user,
OF> sys, intr, idle)?

I will test and report that tomorrow.

OF> - Do you see errors in "netstat -i"?

None.

OF> - Do you use jumbo frames?

No.

OF> - Is polling enabled?

No. I tested polling on a lot of different machines earlier and never
found it to improve performance so far (same for jumbo frames, btw).

OF> - Are there any network-related sysctls (/etc/sysctl.conf)
OF> or kernel settings? Have you enabled kernel debugging
OF> features (INVARIANTS, WITNESS etc.)?

No, stock GENERIC, only with a lot of things disabled.

OF> - Do you have any packet filter rules (PF, IPF, IPFW)?

No, not on this machine. The faster machines are router/firewalls, they do
filtering; so it should be something different...


cu
  Gerrit

On Thu, Jun 12, 2008 at 08:58:10AM +0200, Gerrit Kühn wrote:
 > On Thu, 12 Jun 2008 12:22:28 +0900 Pyun YongHyeon <pyunyh@gmail.com> wrote
 > about Re: broken re(4):
 >
 > PY> Before checking performance of network controller you had to rule
 > PY> out other factors like disk I/O. Use one of benchmark programs in
 > PY> ports/benchmark.
 >
 > I already did simple benchmarking by using "dd if=/dev/zero of=file" which
 > gave me several 10s of MByte/s under all circumstances.
 > Can you recommend one of the benchmarking programs for more detailed
 > testing?
 >

Try netperf or iperf in ports/benchmark.

 >
 > cu
 >   Gerrit

-- 
Regards,
Pyun YongHyeon
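
[Editorial note] The advice above is to take disk I/O out of the
measurement. netperf and iperf do this by sending data straight from
memory over the network. The same idea can be sketched in a few lines of
Python. This is a stand-in for illustration only, not a replacement for
either tool: the function names, the chunk size and the default transfer
size are assumptions, and as written both ends run in one process over
the loopback address, so it exercises the TCP stack rather than the NIC.

```python
import socket
import threading
import time

CHUNK = 64 * 1024  # bytes handed to each send/recv call (arbitrary choice)


def _sink(server_sock, result):
    """Accept one connection and count the bytes received until EOF."""
    conn, _ = server_sock.accept()
    total = 0
    with conn:
        while True:
            data = conn.recv(CHUNK)
            if not data:        # empty read: the sender closed the connection
                break
            total += len(data)
    result["bytes"] = total


def measure_tcp_throughput(host="127.0.0.1", total_bytes=32 * 1024 * 1024):
    """Send total_bytes of in-memory data over TCP.

    Returns (bytes_received, rate in MByte/s with 1 MByte = 10**6 bytes).
    No file is read or written, so disk speed does not enter the result.
    """
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.bind((host, 0))      # port 0: let the OS pick a free port
    server.listen(1)
    port = server.getsockname()[1]

    result = {}
    receiver = threading.Thread(target=_sink, args=(server, result))
    receiver.start()

    payload = b"\0" * CHUNK
    sent = 0
    start = time.perf_counter()
    with socket.create_connection((host, port)) as client:
        while sent < total_bytes:
            n = min(CHUNK, total_bytes - sent)
            client.sendall(payload[:n])
            sent += n
    receiver.join()             # wait until the receiver has seen EOF
    elapsed = time.perf_counter() - start
    server.close()

    return result["bytes"], result["bytes"] / elapsed / 1e6


if __name__ == "__main__":
    received, rate = measure_tcp_throughput()
    print(f"received {received} bytes at {rate:.1f} MByte/s")
```

To test a real link, the receiving half would have to run on the other
machine and listen on a fixed port; for that case netperf or iperf, as
recommended above, are the better choice.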