thr3ads.net - freebsd stable - recent stability problems with fxp driver [Sep 2003]

Info Account

2003-Sep-12 09:25 UTC

recent stability problems with fxp driver

I've spent the past four days or so updating machines here to 4.8/9-stable
via cvsup, and have done a complete make buildworld/kernel on each machine
(some SMP, some single processor).  Something seems to be broken in the
latest fxp driver: on each machine (different mobos and hardware configs),
heavy network traffic on fxp NICs causes timeouts and random kernel panics.

The first machine to show the problem was a single-proc PIII-650 with 512M
and an Adaptec 2940UW, one fxp.  While doing a backup via scp, after 10 megs
or so it started giving fxp0 timeout errors and dropped the connection (the
host was not pingable and dropped all arp entries).  The only way to restart
the scp was to ifconfig fxp0 back up with the same IP and netmask.

The second machine is a dual-proc PIII-650 with 512M, MegaRAID, one fxp.
After a minute or so of scp'ing, the machine locked up completely and had to
be hard reset.  A second attempt caused a panic, and the machine rebooted
instantly.

A few more machines show the same problems, all with varying SCSI subsystems
and one fxp NIC each.  After replacing each machine's fxp with a crappy tulip
and/or a $12 kmart linksys NIC, I've had no problems at all.

---------------------------------

Perry Research, Inc.
5450 Bruce B. Downs Blvd #313
Wesley Chapel, FL 33543
p: 813-864-7659 f: 813-862-2015

http://www.PerryResearch.com

Mike Tancsa

2003-Sep-12 10:43 UTC

At 12:26 PM 12/09/2003, Info Account wrote:
>I've spent the past four days or so updating machines here to 4.8/9-stable via
>cvsup, and have done a complete make buildworld/kernel on each machine (some
>SMP, some single processor).  It seems something is broken with the latest fxp
>driver, on each machine (different mobos and hardware configs) heavy network
>traffic with fxp NICs causes timeouts and random kernel panics.
I have a few boxes pushing over 50Mb with fxp cards and haven't seen this
problem.  What type of fxp cards do you have?  What does
  pciconf -v -l
show for the Intel types?
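
For anyone comparing several machines, the fxp entries can be pulled out of that output mechanically. This is only a rough sketch: the sample text is the 1550 output quoted elsewhere in this thread, the function names (parse_pciconf, fxp_devices, split_chip) are made up for illustration, and the split of the chip= value into device ID (high 16 bits) and vendor ID (low 16 bits) is an assumption about the 4.x output format.

```python
import re

# Sample taken from the pciconf output quoted elsewhere in this thread.
SAMPLE = """\
fxp0@pci0:1:0:  class=0x020000 card=0x00da1028 chip=0x12298086 rev=0x08 hdr=0x00
    vendor   = 'Intel Corporation'
    device   = '82557/8/9 EtherExpress PRO/100(B) Ethernet Adapter'
    class    = network
    subclass = ethernet
"""

# A device header starts in column 0 and looks like "fxp0@pci0:1:0:  key=val ...".
HEADER = re.compile(r"^(?P<name>\w+)@\S+\s+(?P<fields>.*)$")


def parse_pciconf(text):
    """Return one dict per device found in `pciconf -v -l` style text."""
    devices = []
    current = None
    for line in text.splitlines():
        if not line.strip():
            continue
        match = HEADER.match(line)
        if match:
            # Header line: split the key=value pairs (class, card, chip, rev, hdr).
            current = {"name": match.group("name"), "desc": {}}
            for field in match.group("fields").split():
                key, _, value = field.partition("=")
                current[key] = value
            devices.append(current)
        elif current is not None and "=" in line:
            # Indented verbose line; kept separately so the textual
            # "class = network" does not overwrite the numeric class= field.
            key, _, value = line.partition("=")
            current["desc"][key.strip()] = value.strip().strip("'")
    return devices


def fxp_devices(text):
    """Keep only the devices attached to the fxp driver."""
    return [d for d in parse_pciconf(text) if d["name"].startswith("fxp")]


def split_chip(chip):
    """Split chip=0xDDDDVVVV into (device_id, vendor_id); format is assumed."""
    value = int(chip, 16)
    return value >> 16, value & 0xFFFF


if __name__ == "__main__":
    for dev in fxp_devices(SAMPLE):
        device_id, vendor_id = split_chip(dev["chip"])
        print(dev["name"], hex(vendor_id), hex(device_id), dev["rev"])
```

Feeding it the saved output from each affected box would make it easy to see whether the failing machines share a chip ID or revision.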

Also, I have found in the past that I would see this behavior if I changed
NICs and didn't do a PCI config reset in the MB BIOS.  There is something
about Intel NICs and Adaptec and 3ware cards that particularly requires
this.  Also, make sure that you don't have duplex mismatches on the
NICs.  I have seen excessive errors combined with high traffic
cause panics.
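
One way to check the duplex side is to look at the media line that ifconfig prints for the interface and compare it with what the switch port reports. The snippet below is a minimal sketch, not a tested tool: the two sample media lines are illustrative (they are not from any machine in this thread), and duplex_of is a hypothetical helper name.

```python
import re

# Matches the "media:" line in ifconfig output for one interface.
MEDIA = re.compile(r"^\s*media:\s*(?P<rest>.*)$", re.MULTILINE)


def duplex_of(ifconfig_text):
    """Return 'full', 'half', or 'unknown' based on the media line."""
    match = MEDIA.search(ifconfig_text)
    if match is None:
        return "unknown"
    rest = match.group("rest")
    if "full-duplex" in rest:
        return "full"
    if "half-duplex" in rest:
        return "half"
    # No duplex option shown; do not guess.
    return "unknown"


# Illustrative samples only (assumed format, not captured from a real host).
FULL = "fxp0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> mtu 1500\n" \
       "\tmedia: Ethernet autoselect (100baseTX <full-duplex>)\n"
HALF = "fxp0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> mtu 1500\n" \
       "\tmedia: Ethernet 100baseTX <half-duplex>\n"

if __name__ == "__main__":
    print(duplex_of(FULL), duplex_of(HALF))
```

If the host says full and the switch port says half (or the reverse), that is the mismatch to fix before looking further.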

Also, please post the actual error messages on each of the machines.

         ---Mike




John Polstra

2003-Sep-17 18:16 UTC

On 15-Sep-2003 Vivek Khera wrote:
> I've a handful of 1550s as well.  None of them exhibit any problems
> speaking to the network as connected to Netgear 10/100 switches (well,
> one did at one time, but it turned out to be a motherboard hardware
> fault).  One of the servers' sole duty is to take backups of various
> large files on other machines on the LAN and it works just fine.
Interesting!  Thanks for that info.
> Here's the pciconf output from that 'backup' machine:
> 
> fxp0@pci0:1:0:  class=0x020000 card=0x00da1028 chip=0x12298086 rev=0x08 hdr=0x00
>     vendor   = 'Intel Corporation'
>     device   = '82557/8/9 EtherExpress PRO/100(B) Ethernet Adapter'
>     class    = network
>     subclass = ethernet
> fxp1@pci0:2:0:  class=0x020000 card=0x00da1028 chip=0x12298086 rev=0x08 hdr=0x00
>     vendor   = 'Intel Corporation'
>     device   = '82557/8/9 EtherExpress PRO/100(B) Ethernet Adapter'
>     class    = network
>     subclass = ethernet
Yes, that's exactly the same as my 1550 system.
> I wouldn't rule out hardware.  The Dell diagnostics are amazingly good
> at finding hardware faults.
I'll check into that.  I've never run the Dell diagnostics before.
> It could just as well be your switch/hub.  I have an old 5-port hub
> that none of my fxp ports will speak to, but the de and sis ones do.
I don't think anything the switch does should be able to cause SCB
timeouts and DMA timeouts.  But just to be sure, I tried again using a
Dell managed 10/100/1000 Mbit switch.  I still get the same failures
with that switch, too.  I also tried disabling flow control on the
switch, but it didn't help.

Doug Ambrisko told me he's had similar problems with certain fxp
devices and was able to fix them by patching a few bits in the EEPROMs
based on the EEPROM contents of a card that works.  It sounds like he
found the bits to patch more or less by trial and error.  (There's a
posting from him about it in the mailing list archives somewhere.)
I'm going to try it, but haven't had time yet to do it safely.  I have
a Dell desktop machine with exactly the same revision of 82559 in it
that works perfectly, so I was hoping to use it as the reference.
Unfortunately, its EEPROM contents differ from those on the 1550 in
several places, even ignoring the expected differences in the stored
MAC addresses.  So it's not at all obvious what to change and what to
leave alone.  If it were a NIC I'd be willing to trash it, but I'm
naturally more cautious with devices that are on the motherboard.
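
The comparison itself is easy to automate once both EEPROMs are dumped as lists of 16-bit words. A minimal sketch follows, with loud caveats: it assumes the MAC address occupies words 0 through 2, the word values in the example are invented, and eeprom_diff is a hypothetical helper. It only reports differences; it says nothing about which of them are safe to patch.

```python
# Assumption: the stored MAC address lives in EEPROM words 0-2.
MAC_WORDS = frozenset({0, 1, 2})


def eeprom_diff(reference, suspect, ignore=MAC_WORDS):
    """List words that differ between two dumps, skipping ignored indices.

    Each result is (word_index, reference_word, suspect_word, differing_bits),
    where differing_bits is the XOR of the two words.
    """
    if len(reference) != len(suspect):
        raise ValueError("dumps differ in length")
    diffs = []
    for index, (good, bad) in enumerate(zip(reference, suspect)):
        if index in ignore or good == bad:
            continue
        diffs.append((index, good, bad, good ^ bad))
    return diffs


if __name__ == "__main__":
    # Invented word values, for illustration only.
    working = [0x1111, 0x2222, 0x3333, 0x0203, 0x0000, 0x0201]
    failing = [0xAAAA, 0xBBBB, 0xCCCC, 0x0203, 0x0040, 0x0201]
    for index, good, bad, bits in eeprom_diff(working, failing):
        print(f"word {index}: {good:#06x} vs {bad:#06x} (bits {bits:#06x})")
```

The XOR column narrows each differing word down to individual bits, which is the list one would then have to work through by trial and error.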

John
