Rick Macklem
2014-Sep-12 23:33 UTC
svn commit: r267935 - head/sys/dev/e1000 (with work around?)
I wrote:
> The patches are in 10.1. I thought his report said 10.0 in the message.
>
> If Mike is running a recent stable/10 or releng/10.1, then it has been
> patched for this and NFS should work with TSO enabled. If it doesn't,
> then something else is broken.
Oops, I looked and I see Mike was testing r270560 (which would have both
the patches). I don't have an explanation why TSO and 64K rsize, wsize
would cause a hang, but it does appear it will exist in 10.1 unless it
gets resolved.

Mike, one difference is that, even with the patches, the driver will be
copying the transmit mbuf list via m_defrag() to 32 MCLBYTE clusters
when using 64K rsize, wsize.
If you can reproduce the hang, you might want to look at how many mbuf
clusters are allocated. If you've hit the limit, then I think that
would explain it.

rick
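A minimal sketch of one way to watch the cluster zone while reproducing
the hang (untested; it just loops over the standard vmstat(8) and
netstat(1) counters):

#!/bin/sh
# Untested sketch: sample the mbuf cluster zone every few seconds so it is
# easy to see whether the "in use" count approaches the configured max
# at the point where the NIC wedges.
while true
do
        date
        vmstat -z | grep -i mbuf_cluster
        netstat -m | grep "mbuf clusters"
        sleep 5
done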
Mike Tancsa
2014-Sep-13 00:52 UTC
svn commit: r267935 - head/sys/dev/e1000 (with work around?)
On 9/12/2014 7:33 PM, Rick Macklem wrote:
> I wrote:
>> The patches are in 10.1. I thought his report said 10.0 in the message.
>>
>> If Mike is running a recent stable/10 or releng/10.1, then it has been
>> patched for this and NFS should work with TSO enabled. If it doesn't,
>> then something else is broken.
> Oops, I looked and I see Mike was testing r270560 (which would have both
> the patches). I don't have an explanation why TSO and 64K rsize, wsize
> would cause a hang, but it does appear it will exist in 10.1 unless it
> gets resolved.
>
> Mike, one difference is that, even with the patches, the driver will be
> copying the transmit mbuf list via m_defrag() to 32 MCLBYTE clusters
> when using 64K rsize, wsize.
> If you can reproduce the hang, you might want to look at how many mbuf
> clusters are allocated. If you've hit the limit, then I think that
> would explain it.

I have been running the test for a few hours now with no lockups of the
NIC, so doing the NFS mount with -orsize=32768,wsize=32768 certainly
seems to work around the lockup (an example invocation is shown after
the debug output below).

How do I check the mbuf clusters?

root at backup3:/usr/home/mdtancsa # vmstat -z | grep -i clu
mbuf_cluster: 2048, 760054, 4444, 370, 3088708, 0, 0
root at backup3:/usr/home/mdtancsa #

root at backup3:/usr/home/mdtancsa # netstat -m
3322/4028/7350 mbufs in use (current/cache/total)
2826/1988/4814/760054 mbuf clusters in use (current/cache/total/max)
2430/1618 mbuf+clusters out of packet secondary zone in use (current/cache)
0/4/4/380026 4k (page size) jumbo clusters in use (current/cache/total/max)
0/0/0/112600 9k jumbo clusters in use (current/cache/total/max)
0/0/0/63337 16k jumbo clusters in use (current/cache/total/max)
6482K/4999K/11481K bytes allocated to network (current/cache/total)
0/0/0 requests for mbufs denied (mbufs/clusters/mbuf+clusters)
0/0/0 requests for mbufs delayed (mbufs/clusters/mbuf+clusters)
0/0/0 requests for jumbo clusters delayed (4k/9k/16k)
0/0/0 requests for jumbo clusters denied (4k/9k/16k)
0 requests for sfbufs denied
0 requests for sfbufs delayed
0 requests for I/O initiated by sendfile
root at backup3:/usr/home/mdtancsa #

Interface is RUNNING and ACTIVE
em1: hw tdh = 343, hw tdt = 838
em1: hw rdh = 512, hw rdt = 511
em1: Tx Queue Status = 1
em1: TX descriptors avail = 516
em1: Tx Descriptors avail failure = 1
em1: RX discarded packets = 0
em1: RX Next to Check = 512
em1: RX Next to Refresh = 511

I just tested on the other em nic and I can wedge it as well, so it's
not limited to one particular type of em nic.

em0: Watchdog timeout -- resetting
em0: Queue(0) tdh = 349, hw tdt = 176
em0: TX(0) desc avail = 173, Next TX to Clean = 349
em0: link state changed to DOWN
em0: link state changed to UP

so it does not seem limited to just certain em nics:

em0 at pci0:0:25:0: class=0x020000 card=0x34ec8086 chip=0x10ef8086 rev=0x05 hdr=0x00
    vendor     = 'Intel Corporation'
    device     = '82578DM Gigabit Network Connection'
    class      = network
    subclass   = ethernet
    bar   [10] = type Memory, range 32, base 0xb1a00000, size 131072, enabled
    bar   [14] = type Memory, range 32, base 0xb1a25000, size 4096, enabled
    bar   [18] = type I/O Port, range 32, base 0x2040, size 32, enabled
    cap 01[c8] = powerspec 2  supports D0 D3  current D0
    cap 05[d0] = MSI supports 1 message, 64 bit enabled with 1 message
    cap 13[e0] = PCI Advanced Features: FLR TP
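For reference, the workaround mount mentioned above would be along these
lines (the server name and export path here are placeholders; only the
rsize/wsize options matter):

# hypothetical server and export; /mnt matches the test scripts below
mount -t nfs -o rsize=32768,wsize=32768 server:/export /mnt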
I can lock things up fairly quickly by running these 2 scripts across an
nfs mount.

#!/bin/sh
while true
do
 dd if=/dev/urandom ibs=64k count=1000 | pbzip2 -c -p3 > /mnt/test.bz2
 dd if=/dev/urandom ibs=63k count=1000 | pbzip2 -c -p3 > /mnt/test.bz2
 dd if=/dev/urandom ibs=66k count=1000 | pbzip2 -c -p3 > /mnt/test.bz2
done

root at backup3:/usr/home/mdtancsa # cat i3
#!/bin/sh
while true
do
 dd if=/dev/zero of=/mnt/test2 bs=128k count=2000
 sleep 10
done

        ---Mike

--
-------------------
Mike Tancsa, tel +1 519 651 3400
Sentex Communications, mike at sentex.net
Providing Internet services since 1994 www.sentex.net
Cambridge, Ontario Canada   http://www.tancsa.com/