Royce Williams
2009-Nov-12 20:01 UTC
82573 xfers pause, no watchdog timeouts, DCGDIS ineffective (7.2-R)
We have servers with dual 82573 NICs that work well during low-throughput activity, but during high-volume activity, they pause shortly after transfers start and do not recover. Other sessions to the system are not affected.

These systems are being repurposed, jumping from 6.3 to 7.2. The same system and its kin do not exhibit the symptom under 6.3-RELEASE-p13. The symptoms appear under a freebsd-updated 7.2-RELEASE GENERIC kernel with no tuning.

Previously, we've been using DCGDIS.EXE (from Jack Vogel) for this symptom. The first system to be repurposed accepts DCGDIS with 'Updated' and a subsequent 'update not needed', with no relief.

Notably, there are no watchdog timeout errors - unlike our various Supermicro models still running FreeBSD 6.x. All of our other 7.x Supermicro flavors had already received the flash update and haven't shown the symptom.

Details follow.

Kernel:

rand# uname -a
FreeBSD rand.acsalaska.net 7.2-RELEASE-p4 FreeBSD 7.2-RELEASE-p4 #0: Fri Oct  2 12:21:39 UTC 2009     root@i386-builder.daemonology.net:/usr/obj/usr/src/sys/GENERIC  i386

sysctls:

rand# sysctl dev.em
dev.em.0.%desc: Intel(R) PRO/1000 Network Connection 6.9.6
dev.em.0.%driver: em
dev.em.0.%location: slot=0 function=0
dev.em.0.%pnpinfo: vendor=0x8086 device=0x108c subvendor=0x15d9 subdevice=0x108c class=0x020000
dev.em.0.%parent: pci13
dev.em.0.debug: -1
dev.em.0.stats: -1
dev.em.0.rx_int_delay: 0
dev.em.0.tx_int_delay: 66
dev.em.0.rx_abs_int_delay: 66
dev.em.0.tx_abs_int_delay: 66
dev.em.0.rx_processing_limit: 100
dev.em.1.%desc: Intel(R) PRO/1000 Network Connection 6.9.6
dev.em.1.%driver: em
dev.em.1.%location: slot=0 function=0
dev.em.1.%pnpinfo: vendor=0x8086 device=0x108c subvendor=0x15d9 subdevice=0x108c class=0x020000
dev.em.1.%parent: pci14
dev.em.1.debug: -1
dev.em.1.stats: -1
dev.em.1.rx_int_delay: 0
dev.em.1.tx_int_delay: 66
dev.em.1.rx_abs_int_delay: 66
dev.em.1.tx_abs_int_delay: 66
dev.em.1.rx_processing_limit: 100

kenv:

rand# kenv | grep smbios | egrep -v 'socket|serial|uuid|tag|0123456789'
smbios.bios.reldate="03/05/2008"
smbios.bios.vendor="Phoenix Technologies LTD"
smbios.bios.version="6.00"
smbios.chassis.maker="Supermicro"
smbios.planar.maker="Supermicro"
smbios.planar.product="PDSMi "
smbios.planar.version="PCB Version"
smbios.system.maker="Supermicro"
smbios.system.product="PDSMi"

The system is not yet production, so I can invasively abuse it if needed. The other systems are in production under 6.3-RELEASE-p13 and can also be inspected.

Any pointers appreciated.

Royce
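For reference, a minimal way to trigger and watch the stall (just a sketch; it assumes nc(1) is available on both ends and that em0 is the active interface - adjust names as needed):

```shell
# On the affected 7.2 box: sink a bulk TCP stream on an arbitrary port.
nc -l 5001 > /dev/null &

# On any other host on the LAN, push data as fast as the link allows
# (hostname here is just this box's name):
#   dd if=/dev/zero bs=64k | nc rand.acsalaska.net 5001

# Meanwhile, on the affected box, watch per-second interface counters.
# The pause shows up as input packets flatlining with no matching
# rise in errs or drops:
netstat -I em0 -w 1
```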
Jeremy Chadwick
2009-Nov-12 20:47 UTC
82573 xfers pause, no watchdog timeouts, DCGDIS ineffective (7.2-R)
On Thu, Nov 12, 2009 at 10:36:16AM -0900, Royce Williams wrote:
> We have servers with dual 82573 NICs that work well during
> low-throughput activity, but during high-volume activity, they pause
> shortly after transfers start and do not recover. Other sessions to
> the system are not affected.

Please define "low-throughput" and "high-volume" if you could; it might help folks determine where the threshold is for problems.

> These systems are being repurposed, jumping from 6.3 to 7.2. The same
> system and its kin do not exhibit the symptom under 6.3-RELEASE-p13.
> The symptoms appear under a freebsd-updated 7.2-RELEASE GENERIC
> kernel with no tuning.
>
> Previously, we've been using DCGDIS.EXE (from Jack Vogel) for this
> symptom. The first system to be repurposed accepts DCGDIS with
> 'Updated' and a subsequent 'update not needed', with no relief.
>
> Notably, there are no watchdog timeout errors - unlike our various
> Supermicro models still running FreeBSD 6.x. All of our other 7.x
> Supermicro flavors had already received the flash update and haven't
> shown the symptom.
>
> Details follow.
> [quoted kernel, sysctl, and kenv output snipped]
>
> The system is not yet production, so I can invasively abuse it if
> needed. The other systems are in production under 6.3-RELEASE-p13
> and can also be inspected.
>
> Any pointers appreciated.
> Royce

For what it's worth as a comparison base: we use the following Supermicro SuperServers, and can confirm that no such issues occur for us using RELENG_6 or RELENG_7 on the following hardware:

Supermicro SuperServer 5015B-MTB  - amd64 - Intel 82573V + Intel 82573L
Supermicro SuperServer 5015M-T+B  - amd64 - Intel 82573V + Intel 82573L
Supermicro SuperServer 5015M-T+B  - amd64 - Intel 82573V + Intel 82573L
Supermicro SuperServer 5015M-T+B  - i386  - Intel 82573V + Intel 82573L
Supermicro SuperServer 5015M-T+B  - i386  - Intel 82573V + Intel 82573L

The 5015B-MTB system presently runs RELENG_8 -- no issues there either.

Relevant server configuration and network setup details:

- All machines use pf(4).
- All emX devices are configured for autoneg.
- All emX devices use RXCSUM, TXCSUM, and TSO4.
- We do not use polling.
- All machines use both NICs simultaneously at all times.
- All machines are connected to an HP ProCurve 2626 switch (100mbit,
  full-duplex ports, all autoneg).
- We do not use jumbo frames.
- No add-in cards (PCI, PCI-X, or PCIe) are used in the systems.
- All of the systems had DCGDIS.EXE run on them; no EEPROM settings
  were changed, indicating the from-the-Intel-factory MANC register in
  question was set properly.

Relevant throughput details per box:

- em0 pushes ~600-1000kbit/sec at all times.
- em1 pushes ~100-200kbit/sec at all times.
- During nightly maintenance (backups), em1 pushes ~2-3mbit/sec for a
  variable amount of time.
- For a full level 0 backup (which I've done numerous times), em1
  pushes 60-70mbit/sec without issues.

I've compared your sysctl dev.em output to that of our 5015M-T+B systems (which use the PDSMi+, not the PDSMi, but whatever), and ours is 100% identical. All of our 5015M-T+B systems are using BIOS 1.3, and the 5015B-MTB system is using BIOS 1.30.
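One quick isolation step you could try on the non-production box (a sketch - we haven't needed it ourselves, and it assumes em0 is the affected interface): turn off the offload features, and dump the driver's internal state while the transfer is wedged.

```shell
# Disable TSO and checksum offload; if the stall goes away, an offload
# path in em(4) is the likely suspect (re-enable with: tso txcsum rxcsum).
ifconfig em0 -tso -txcsum -rxcsum

# While wedged, ask the driver to dump its state and counters to the
# kernel message buffer, then read them back:
sysctl dev.em.0.debug=1
sysctl dev.em.0.stats=1
dmesg | tail -n 40
```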
If you'd like, I can provide the exact BIOS settings we use on the machines in question; they do deviate from the factory defaults a slight bit, but none of the adjustments are "tweaks" for performance or otherwise (just disabling things which we don't use, etc.).

-- 
| Jeremy Chadwick                                   jdc@parodius.com |
| Parodius Networking                       http://www.parodius.com/ |
| UNIX Systems Administrator                  Mountain View, CA, USA |
| Making life hard for others since 1977.              PGP: 4BD6C0CB |