I've spent the past four days or so updating machines here to 4.8/9-stable via cvsup, and have done a complete make buildworld/kernel on each machine (some SMP, some single processor). It seems something is broken with the latest fxp driver: on each machine (different mobos and hardware configs), heavy network traffic with fxp NICs causes timeouts and random kernel panics.

First machine to experience the problem was a single proc PIII-650 with 512M and Adaptec 2940UW, one fxp. While doing a backup via scp, after 10 megs or so it started giving fxp0 timeout errors and dropping the connection (host was not pingable and dropped all arp entries). The only way to restart the scp was to ifconfig fxp0 back up with the same IP and netmask.

Second machine is a dual proc PIII-650 with 512M, MegaRAID, one fxp. After a minute or so of scp'ing, the machine completely locked and had to be hard reset. A second attempt caused a panic that seized the entire machine with an instant reboot.

A few more machines, same problems, all with varying SCSI subsystems and with one fxp NIC. After replacing each machine's fxp with a crappy tulip and/or a $12 kmart linksys NIC, I've had no problems at all.

---------------------------------

Perry Research, Inc.
5450 Bruce B. Downs Blvd #313
Wesley Chapel, FL 33543
p: 813-864-7659 f: 813-862-2015

http://www.PerryResearch.com
At 12:26 PM 12/09/2003, Info Account wrote:
>I've spent the past four days or so updating machines here to 4.8/9-stable via
>cvsup, and have done a complete make buildworld/kernel on each machine (some
>SMP, some single processor). It seems something is broken with the latest fxp
>driver, on each machine (different mobos and hardware configs) heavy network
>traffic with fxp NICs causes timeouts and random kernel panics.

I have a few boxes pushing over 50Mb with fxp cards and haven't seen this problem. What type of fxp cards do you have? What does pciconf -v -l show for the Intel types?

Also, I have found in the past that I would see this behavior if I changed NICs and didn't do a PCI config reset in the MB BIOS. There is something about Intel NICs and Adaptec and 3ware cards that particularly requires this.

Also, make sure that you don't have any duplex mismatches on the NICs. I have seen cases where excessive errors combined with high traffic will cause panics.

Also, please post the actual error messages from each of the machines.

---Mike

>First machine to experience the problem was a single proc PIII-650 with
>512M and
>Adaptec 2940UW, one fxp, doing a backup via scp, after 10 megs or so starting
>giving fxp0 timeout errors and dropping the connection (host was not pingable
>and dropped all arp entries). The only way to restart the scp was to ifconfig
>fxp0 back up with the same IP and netmask.
>
>Second machine is a dual proc PIII-650 with 512M, MegaRAID, one fxp - after a
>minute or so of scp'ing the machine completely locked, had to be hard reset.
>Second attempt caused a panic that seized entire machine with instant reboot.
>
>Few more machines, same problems, all with varying SCSI subsystems and
>with one
>fxp NIC. After replacing each machine's fxp with crappy tulip and/or $12
>kmart
>linksys NIC, I've had no problems at all.
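When collecting the pciconf -v -l output asked for above from several machines, it can help to pull out just the Intel Ethernet entries. The following is a minimal sketch, not a tested tool: the function name intel_ethernet_devices is made up for illustration, and it assumes the 4.x-era header line format "name@pciB:S:F: class=... card=... chip=... rev=...", where the chip value carries the device ID in its upper 16 bits and the vendor ID (0x8086 for Intel) in its lower 16 bits.

```python
import re

# Header line of one pciconf record, e.g.
# fxp0@pci0:1:0: class=0x020000 card=0x00da1028 chip=0x12298086 rev=0x08 hdr=0x00
# (Assumed format; adjust the pattern if your pciconf prints something else.)
HEADER = re.compile(
    r"^(?P<name>[^@\s]+)@pci\S*:\s+"
    r"class=0x(?P<cls>[0-9a-fA-F]+)\s+"
    r"card=0x(?P<card>[0-9a-fA-F]+)\s+"
    r"chip=0x(?P<chip>[0-9a-fA-F]+)\s+"
    r"rev=0x(?P<rev>[0-9a-fA-F]+)"
)

INTEL_VENDOR_ID = 0x8086
ETHERNET_CLASS = 0x0200  # class 0x02 (network), subclass 0x00 (ethernet)


def intel_ethernet_devices(pciconf_output):
    """Return a list of dicts describing Intel Ethernet devices in the output."""
    devices = []
    for line in pciconf_output.splitlines():
        match = HEADER.match(line.strip())
        if match is None:
            continue  # vendor/device/class detail lines and anything else
        chip = int(match.group("chip"), 16)
        vendor_id = chip & 0xFFFF   # low 16 bits
        device_id = chip >> 16      # high 16 bits
        # Drop the programming-interface byte, keep class + subclass.
        class_code = int(match.group("cls"), 16) >> 8
        if vendor_id == INTEL_VENDOR_ID and class_code == ETHERNET_CLASS:
            devices.append({
                "name": match.group("name"),
                "device_id": device_id,
                "revision": int(match.group("rev"), 16),
                "card": match.group("card").lower(),
            })
    return devices


if __name__ == "__main__":
    sample = (
        "fxp0@pci0:1:0: class=0x020000 card=0x00da1028 "
        "chip=0x12298086 rev=0x08 hdr=0x00\n"
        "    vendor = 'Intel Corporation'\n"
    )
    for dev in intel_ethernet_devices(sample):
        print("%(name)s device=0x%(device_id)04x rev=0x%(revision)02x" % dev)
```

Listing the device ID and revision side by side makes it easier to see whether the failing machines share a particular chip revision.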
On 15-Sep-2003 Vivek Khera wrote:
> I've a handful of 1550s as well. None of them exhibit any problems
> speaking to the network as connected to Netgear 10/100 switches (well,
> one did at one time, but it turned out to be a motherboard hardware
> fault). One of the servers' sole duty is to take backups of various
> large files on other machines on the LAN and it works just fine.

Interesting! Thanks for that info.

> Here's the pciconf output from that 'backup' machine:
>
> fxp0@pci0:1:0: class=0x020000 card=0x00da1028 chip=0x12298086 rev=0x08 hdr=0x00
>     vendor = 'Intel Corporation'
>     device = '82557/8/9 EtherExpress PRO/100(B) Ethernet Adapter'
>     class = network
>     subclass = ethernet
> fxp1@pci0:2:0: class=0x020000 card=0x00da1028 chip=0x12298086 rev=0x08 hdr=0x00
>     vendor = 'Intel Corporation'
>     device = '82557/8/9 EtherExpress PRO/100(B) Ethernet Adapter'
>     class = network
>     subclass = ethernet

Yes, that's exactly the same as my 1550 system.

> I wouldn't rule out hardware. The Dell diagnostics are amazingly good
> at finding hardware faults.

I'll check into that. I've never run the Dell diagnostics before.

> It could just as well be your switch/hub. I have an old 5-port hub
> that none of my fxp ports will speak to, but the de and sis ones do.

I don't think anything the switch does should be able to cause SCB timeouts and DMA timeouts. But just to be sure, I tried again using a Dell managed 10/100/1000 Mbit switch. I still get the same failures with that switch, too. I also tried disabling flow control on the switch, but it didn't help.

Doug Ambrisko told me he's had similar problems with certain fxp devices and was able to fix them by patching a few bits in the EEPROMs based on the EEPROM contents of a card that works. It sounds like he found the bits to patch more or less by trial and error. (There's a posting from him about it in the mailing list archives somewhere.)

I'm going to try it, but haven't had time yet to do it safely. I have a Dell desktop machine with exactly the same revision of 82559 in it that works perfectly, so I was hoping to use it as the reference. Unfortunately, its EEPROM contents differ from those on the 1550 in several places, even ignoring the expected differences in the stored MAC addresses. So it's not at all obvious what to change and what to leave alone. If it were a NIC I'd be willing to trash it, but I'm naturally more cautious with devices that are on the motherboard.

John
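The EEPROM comparison described above can be sketched as a word-by-word diff that skips the words expected to differ. This is only an illustration under stated assumptions: the helper names parse_words and diff_eeprom are hypothetical, the dumps are assumed to be available as whitespace-separated 16-bit hex words, and the MAC address is assumed to occupy words 0 to 2. Check the word layout against the Intel documentation for the actual part before relying on it, and note that any checksum word will differ as well. The sample values in the usage section are made up and are not real EEPROM contents.

```python
def parse_words(dump):
    """Parse whitespace-separated 16-bit hex words, e.g. '0200 b3c9 1234'."""
    words = [int(token, 16) for token in dump.split()]
    for word in words:
        if not 0 <= word <= 0xFFFF:
            raise ValueError("not a 16-bit word: %#x" % word)
    return words


def diff_eeprom(reference, target, skip=frozenset({0, 1, 2})):
    """Compare two EEPROM dumps word by word.

    `skip` holds word indices to ignore; the default (words 0-2) is an
    assumption about where the MAC address is stored.

    Returns a list of (index, reference_word, target_word, changed_bits),
    where changed_bits is the XOR of the two words.
    """
    if len(reference) != len(target):
        raise ValueError("dumps have different lengths")
    differences = []
    for index, (ref_word, tgt_word) in enumerate(zip(reference, target)):
        if index in skip or ref_word == tgt_word:
            continue
        differences.append((index, ref_word, tgt_word, ref_word ^ tgt_word))
    return differences


if __name__ == "__main__":
    # Made-up example data, for illustration only.
    working = parse_words("0200 b3c9 1234 0203 0000 0201")
    failing = parse_words("0600 11aa 5678 0203 0001 0201")
    for index, ref_word, tgt_word, bits in diff_eeprom(working, failing):
        print("word %#04x: working=%04x failing=%04x changed bits=%04x"
              % (index, ref_word, tgt_word, bits))
```

Printing the changed bits per word narrows the candidates, but it does not say which of them matter; that still has to come from documentation or from careful experimentation on a card you can afford to lose.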