thr3ads.net - freebsd stable - RELENG_7_1: bce driver change generating too much interrupts ? [Dec 2008]

Geoffroy Desvernay

2008-Dec-02 02:13 UTC

RELENG_7_1: bce driver change generating too much interrupts ?

Since the last upgrade, I see much more CPU time "eaten" by interrupts (at
least 10% CPU in top)
(see http://dgeo.perso.ec-marseille.fr/cpu-week.png)

The server behaves correctly (or seems to), and the high interrupt count
seems to come from the bce cards (source: systat -vmstat).

I just upgraded from
"RELENG_7 Mon Sep  8 12:33:06 CEST 2008"
to
"RELENG_7_1 Sat Nov 29 16:20:35 CET 2008"

We have an identical machine (Dell PE 1950) which has not been upgraded
(production use - the two machines are carp(4)-redundant).

I don't know if it is related to the "SVN rev 184826 on 2008-11-10
22:40:16Z by delphij" patch to sys/dev/bce/if_bce.c


Can I help debug something? These are production machines, but I
can test patches or other changes on the faulty system.



Some clues:

Under the very same load (carp interfaces down on other machine), vmstat
shows:
for newer system:

 procs      memory      page                   disk   faults            cpu
 r b w     avm    fre   flt  re  pi  po    fr  sr mf0    in    sy    cs us sy id
 0 1 1   4806M   460M   649   0   0   0   582   2   0 21770  1270 13653  1 15 85

and for older:

 procs      memory      page                   disk   faults            cpu
 r b w     avm    fre   flt  re  pi  po    fr  sr mf0    in    sy    cs us sy id
 0 1 0   3694M   414M   236   0   0   0   199  17   0   286   317   386  1  1 97
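
To put numbers on the two samples above, here is a minimal sketch in C. It assumes the `in` and `cs` columns are per-second interrupt and context-switch rates, as vmstat(8) reports them. The struct, the function and the two constant names are hypothetical and exist only for this illustration.

```c
#include <assert.h>

/* Hypothetical container for the vmstat columns discussed in this thread. */
struct vmstat_sample {
	unsigned long in;	/* interrupts per second ("in" column) */
	unsigned long cs;	/* context switches per second ("cs" column) */
	unsigned int sys;	/* system CPU time in percent ("sy" under cpu) */
};

/* How many times larger is `newer` than `older`? */
double
rate_ratio(unsigned long newer, unsigned long older)
{
	assert(older > 0);	/* a zero baseline makes the ratio meaningless */
	return ((double)newer / (double)older);
}

/* Values copied from the two vmstat samples above. */
const struct vmstat_sample releng_7_1 = { 21770, 13653, 15 };
const struct vmstat_sample releng_7 = { 286, 386, 1 };
```

With these values, `rate_ratio(releng_7_1.in, releng_7.in)` comes out at roughly 76 and the same call on the `cs` fields at roughly 35, which matches the jump from 1% to 15% system time under the same load.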


bce-related part of dmesg for the newer system:

bce0: <Broadcom NetXtreme II BCM5708 1000Base-T (B2)> mem 0xf4000000-0xf5ffffff irq 16 at device 0.0 on pci9
miibus0: <MII bus> on bce0
bce0: Ethernet address: 00:15:c5:f1:56:f4
bce0: [ITHREAD]
bce0: ASIC (0x57081020); Rev (B2); Bus (PCI-X, 64-bit, 133MHz); F/W (0x02090105); Flags( SPLT MFW MSI )
bce1: <Broadcom NetXtreme II BCM5708 1000Base-T (B2)> mem 0xf8000000-0xf9ffffff irq 16 at device 0.0 on pci5
miibus1: <MII bus> on bce1
bce1: Ethernet address: 00:15:c5:f1:56:f2
bce1: [ITHREAD]
bce1: ASIC (0x57081020); Rev (B2); Bus (PCI-X, 64-bit, 133MHz); F/W (0x02090105); Flags( SPLT MFW MSI )

And on the older system:

bce0: <Broadcom NetXtreme II BCM5708 1000Base-T (B2)> mem 0xf4000000-0xf5ffffff irq 16 at device 0.0 on pci9
miibus0: <MII bus> on bce0
bce0: Ethernet address: 00:15:c5:f1:6a:47
bce0: [ITHREAD]
bce0: ASIC (0x57081020); Rev (B2); Bus (PCI-X, 64-bit, 133MHz); F/W (0x02090105); Flags( MFW MSI )
bce1: <Broadcom NetXtreme II BCM5708 1000Base-T (B2)> mem 0xf8000000-0xf9ffffff irq 16 at device 0.0 on pci5
miibus1: <MII bus> on bce1
bce1: Ethernet address: 00:15:c5:f1:6a:45
bce1: [ITHREAD]
bce1: ASIC (0x57081020); Rev (B2); Bus (PCI-X, 64-bit, 133MHz); F/W (0x02090105); Flags( MFW MSI )

-- 
Geoffroy Desvernay
Ecole Centrale de Marseille

Mike Jakubik

2008-Dec-02 08:01 UTC

On Tue, December 2, 2008 4:57 am, Geoffroy Desvernay wrote:
> Since the last upgrade, I see much more CPU time "eaten" by interrupts (at
> least 10% CPU in top)
> (see http://dgeo.perso.ec-marseille.fr/cpu-week.png)

I am also seeing the same behavior on a farm of Dell servers.

root@web.local:~# vmstat -i
interrupt                          total       rate
irq1: atkbd0                          18          0
irq14: ata0                          176          0
irq16: mfi0                        67924          1
irq20: uhci1 uhci3                     1          0
irq21: uhci0 uhci+                     5          0
cpu0: timer                    132244117       1997
irq257: bce1                  3366039632      50853
cpu1: timer                    132244053       1997
cpu2: timer                    132244053       1997
cpu3: timer                    132244053       1997
Total                         3895084032      58846
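
To see how much of the total interrupt load comes from bce1, a row of the `vmstat -i` output above can be parsed from the right-hand side, since the device name may itself contain spaces (for example "uhci1 uhci3"). The following is a minimal sketch in C. The function names are hypothetical, and the row layout is assumed to be exactly what is shown above: a name followed by two unsigned decimal columns (total and rate).

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper: extract the two rightmost columns (total, rate)
 * from one row of `vmstat -i` output. Returns 0 on success, -1 otherwise.
 */
int
parse_vmstat_i_row(const char *row, unsigned long long *total,
    unsigned long long *rate)
{
	char buf[256];
	char *end, *tok;
	unsigned long long vals[2];
	size_t len = strlen(row);
	int i;

	if (len == 0 || len >= sizeof(buf))
		return (-1);
	memcpy(buf, row, len + 1);

	/* Peel columns off the end of the row: rate first, then total. */
	for (i = 1; i >= 0; i--) {
		/* Trim trailing whitespace. */
		while (len > 0 && isspace((unsigned char)buf[len - 1]))
			buf[--len] = '\0';
		/* Walk back to the start of the last token. */
		while (len > 0 && !isspace((unsigned char)buf[len - 1]))
			len--;
		tok = &buf[len];
		if (!isdigit((unsigned char)*tok))
			return (-1);
		vals[i] = strtoull(tok, &end, 10);
		if (*end != '\0')
			return (-1);
		buf[len] = '\0';	/* drop the token just consumed */
	}
	*total = vals[0];
	*rate = vals[1];
	return (0);
}

/* Fraction of `whole` represented by `part`. */
double
intr_share(unsigned long long part, unsigned long long whole)
{
	assert(whole > 0);
	return ((double)part / (double)whole);
}
```

Fed the "irq257: bce1" and "Total" rows from the output above, `intr_share()` on the two rate columns (50853 and 58846) gives about 0.86, so bce1 alone accounts for roughly 86% of all interrupts on that machine.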

Not only this, but I have also noticed that netstat now reports a number
of input errors. Before the driver update, I did not get these errors.

root@web.local:~# netstat -i
Name    Mtu Network       Address              Ipkts Ierrs    Opkts Oerrs  Coll
bce1   1500 <Link#2>      00:1e:c9:b5:cc:b6  1848959  2197  1357031     0     0

Dmitry Sivachenko

2008-Dec-03 01:03 UTC

On Tue, Dec 02, 2008 at 04:44:46PM -0800, Xin LI wrote:
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
> 
> Hi guys,
> 
> I think I got a real fix.
> 

I tried that patch with a very recent 7-STABLE.
It does fix the problem for me.


Thanks a lot!


> Cheers,
> - --
> Xin LI <delphij@delphij.net>	http://www.delphij.net/
> FreeBSD - The Power to Serve!
> -----BEGIN PGP SIGNATURE-----
> Version: GnuPG v2.0.9 (FreeBSD)
> 
> iEYEARECAAYFAkk11n0ACgkQi+vbBBjt66Dy6wCfSl3eLRhj5TVs24Q+8ao5Mcz0
> FNQAoK8KvziiXFoanhSlWv636o+HfYIj
> =AixC
> -----END PGP SIGNATURE-----
> Index: if_bce.c
> ===================================================================
> --- if_bce.c	(revision 185565)
> +++ if_bce.c	(working copy)
> @@ -7030,13 +7030,14 @@
>  
>  		/* Was it a link change interrupt? */
>  		if ((status_attn_bits & STATUS_ATTN_BITS_LINK_STATE) !=
> -			(sc->status_block->status_attn_bits_ack & STATUS_ATTN_BITS_LINK_STATE))
> +			(sc->status_block->status_attn_bits_ack & STATUS_ATTN_BITS_LINK_STATE)) {
>  			bce_phy_intr(sc);
>  
> -		/* Clear any transient status updates during link state change. */
> -		REG_WR(sc, BCE_HC_COMMAND,
> -			sc->hc_command | BCE_HC_COMMAND_COAL_NOW_WO_INT);
> -		REG_RD(sc, BCE_HC_COMMAND);
> +			/* Clear any transient status updates during link state change. */
> +			REG_WR(sc, BCE_HC_COMMAND,
> +				sc->hc_command | BCE_HC_COMMAND_COAL_NOW_WO_INT);
> +			REG_RD(sc, BCE_HC_COMMAND);
> +		}
>  
>  		/* If any other attention is asserted then the chip is toast. */
>  		if (((status_attn_bits & ~STATUS_ATTN_BITS_LINK_STATE) !=
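
The diff above only adds braces, but that changes which statements the `if` guards. The following is a minimal user-space model in C of that difference, not the driver code: `struct fake_softc`, `LINK_STATE_BIT` and both handler functions are hypothetical stand-ins, and the counters merely record how often each path would run. Whether the extra coalesce-now writes are the sole cause of the interrupt rate reported in this thread is an assumption drawn from the patch, not something the model proves.

```c
#include <assert.h>
#include <stdbool.h>

#define LINK_STATE_BIT	0x1u	/* stand-in for STATUS_ATTN_BITS_LINK_STATE */

/* Hypothetical, heavily reduced model of the driver state. */
struct fake_softc {
	unsigned int attn_bits;		/* models status_attn_bits */
	unsigned int attn_bits_ack;	/* models status_attn_bits_ack */
	unsigned int phy_intrs;		/* times the PHY handler would run */
	unsigned int coal_now_writes;	/* times COAL_NOW_WO_INT would be written */
};

static bool
link_changed(const struct fake_softc *sc)
{
	return ((sc->attn_bits & LINK_STATE_BIT) !=
	    (sc->attn_bits_ack & LINK_STATE_BIT));
}

/* Shape of the code before the patch: only the first statement is guarded. */
void
intr_unbraced(struct fake_softc *sc)
{
	if (link_changed(sc))
		sc->phy_intrs++;
	sc->coal_now_writes++;	/* runs on every interrupt */
}

/* Shape of the code after the patch: both statements are guarded. */
void
intr_braced(struct fake_softc *sc)
{
	if (link_changed(sc)) {
		sc->phy_intrs++;
		sc->coal_now_writes++;
	}
}
```

Running each handler 1000 times with no link change leaves `coal_now_writes` at 1000 for the unbraced version and at 0 for the braced one, which is the behavioral difference the patch introduces.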

Mike Jakubik

2008-Dec-03 07:48 UTC

On Wed, December 3, 2008 3:27 am, Dmitry Sivachenko wrote:
> On Tue, Dec 02, 2008 at 04:44:46PM -0800, Xin LI wrote:
>> -----BEGIN PGP SIGNED MESSAGE-----
>> Hash: SHA1
>>
>> Hi guys,
>>
>> I think I got a real fix.
>>
>
> I tried that patch with a very recent 7-STABLE.
> It does fix the problem for me.

Good to hear. I will have to wait a few days before I update the code, as
these systems are in production.

Thanks guys.

geoffroy desvernay

2008-Dec-03 13:23 UTC

Xin LI wrote:
> Hi guys,
>
> I think I got a real fix.

It seems to "work for me" too.

Server under normal load (smtp/imap/Maildir for ~1000 users, NFS
filer), everything seems OK... (1h uptime for now)

Thank you !
-- 
geoffroy desvernay

Xin LI

2008-Dec-05 13:40 UTC

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

FYI, I have committed the patch as r185653 (stable/7) and r185654
(releng/7.1), so new builds will include the fix.  Thanks go to
David, who reviewed the changes, and to all who tested the earlier
patches.

Cheers,
- --
Xin LI <delphij@delphij.net>	http://www.delphij.net/
FreeBSD - The Power to Serve!
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.9 (FreeBSD)

iEYEARECAAYFAkk5n6YACgkQi+vbBBjt66BToACfTp+1hqno30HTpNfcvMn7SpAF
6XoAn1St590CMK2Lz9jLwlnTLDKGW8cV
=/FVN
-----END PGP SIGNATURE-----

Oleg Gorokhov

2008-Dec-08 01:53 UTC

The committed patch fixes the interrupt issue reported earlier,
but there is one more problem, discussed here:

http://unix.derkeiler.com/Mailing-Lists/FreeBSD/stable/2008-11/msg00144.html

We have also observed similar bad behavior for network (especially
SSL-based) operations: imaps, ssh and smtp STARTTLS connections all
failed to establish after a day of successful operation:

Dec  7 23:32:41 imaps[62530]: accepted connection
Dec  7 23:32:41 imaps[62530]: SSL_accept() incomplete -> wait
Dec  7 23:32:41 imaps[62530]: wrong version number in SSL_accept() -> fail
Dec  7 23:32:41 master[3930]: process 62530 exited, status 75
Dec  7 23:32:41 master[3930]: service imaps pid 62530 in BUSY state: terminated abnormally

Dec  7 23:39:26 imaps[91999]: SSL_accept() incomplete -> wait
Dec  7 23:39:26 imaps[91999]: decryption failed or bad record mac in SSL_accept() -> fail

Dec  7 23:32:44 lmtp[77715]: [lmtpd] STARTTLS failed: gamgee.yandex.ru [77.88.19.54]

We have reverted to the stable code from before the last bce driver
update was committed to the releng branch, and now hope that the system
will run as expected.

-- 
Oleg Gorokhov
System Administrator, Yandex
Tel.: +7 (495) 739-7000 (+7166)

Mike Jakubik

2008-Dec-08 11:46 UTC

On Mon, December 8, 2008 4:29 am, Oleg Gorokhov wrote:
> The committed patch fixes the interrupt issue reported earlier,
> but there is one more problem, discussed here:
>
> http://unix.derkeiler.com/Mailing-Lists/FreeBSD/stable/2008-11/msg00144.html
>
> We have also observed similar bad behavior for network (especially
> SSL-based) operations: imaps, ssh and smtp STARTTLS connections all
> failed to establish after a day of successful operation:

I wonder if my problem is related to this. I have a Java chat service
application that starts dropping connections after about 4 days of uptime.
There is nothing in the application's logs, and I know this works fine on
Linux. I will try updating to the latest bce patch tonight to see if it
helps.

Danny Braniss

2008-Dec-18 02:49 UTC

> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
>
> Hi, Nawfal,
>
> Nawfal bin Mohmad Rouyan wrote:
> > I have been using a Dell machine with 2 bce interfaces as a bridge
> > between my LAN and Firewall to shape the traffic. Since after the
> > update, the machine can only run for a few minutes and after that no
> > more connection can go through.
> >
> > Ping from LAN to Internet is OK but when I telnet say to www.yahoo.com
> > at port 80 and issue "GET / HTTP/1.0" I can see the data of different
> > application including the HTML text.
> >
> > For example, I can see uTorrent packets with binaries and also the HTML
> > page being cut short. It's as if, I'm seeing packets jumbled together
> > from different application.
> >
> > I'm using PF to shape the traffic. If I reboot the server, it will panic
> > and I have about 3 different vmcores in /var/crash and not sure what to
> > do with it :( . I've tested the patch to remove
> > stat_IfInFramesL2FilterDiscards but the problem still occurs.
>
> The last patch is not a functional change, but a behavior change that
> removes the L2FilterDiscards from being counted to match previous behavior.
>
> Would you please do this:
>
> script bt.txt kgdb /boot/kernel/kernel.symbols /var/crash/vmcore.0
>
> Then, do 'bt', press enter until all display has finished, then exit
> kgdb, and send me the result (bt.txt)?
>
> > As for now, I'm not using the server to shape the traffic because I
> > suspect the driver isn't reliable. I'm going to revert back to the
> > previous driver and hopes its going to work.
> >
> > Sorry if there is not much detail since I'm not sure what to provide.
> > Just tell me what to provide and I'd be happy to do so.

I don't know if the following is related, but while stress testing NFS/ZFS
I get many weird messages on the server (dell-2950/bce), for example:
	impossible packet length (33555456) from nfs server fr-01:/vol/system/share
	impossible packet length (1792323116) from nfs server fr-01:/vol/system/share
	...
and things get worse soon after. There are no input errors, so it seems
that some memory starvation is not properly handled ...
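
For context on why such lengths are called impossible: when RPC runs over TCP, each record fragment is preceded by a 4-byte big-endian record mark whose top bit flags the last fragment and whose low 31 bits give the fragment length (ONC RPC record marking, RFC 5531). If the byte stream is corrupted or the reader loses its place in it, arbitrary bytes get interpreted as that header. A minimal sketch in C of the decoding, with hypothetical names and a caller-supplied maximum rather than the NFS client's real limit:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical decoded form of a 4-byte RPC record mark. */
struct rpc_record_mark {
	int last_fragment;	/* 1 if the top bit is set */
	uint32_t length;	/* low 31 bits: fragment length in bytes */
};

/* Decode a record mark from 4 bytes in network (big-endian) order. */
struct rpc_record_mark
decode_record_mark(const unsigned char hdr[4])
{
	uint32_t word = ((uint32_t)hdr[0] << 24) | ((uint32_t)hdr[1] << 16) |
	    ((uint32_t)hdr[2] << 8) | (uint32_t)hdr[3];
	struct rpc_record_mark m;

	m.last_fragment = (word & 0x80000000u) != 0;
	m.length = word & 0x7fffffffu;
	return (m);
}

/* A receiver can only accept lengths up to some sane maximum. */
int
length_is_plausible(uint32_t length, uint32_t max_len)
{
	return (length <= max_len);
}
```

As an illustration, the bytes 02 00 04 00 decode to a length of 33555456, the first value in the log above, which no receiver with a limit of around a megabyte would accept. That is consistent with stray data being read where a record mark was expected, although the thread does not establish the actual cause.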

cheers,
	danny
