Good day

I'm looking for suggestions for tuning my setup in order to get rid of the input errors I'm seeing on em0, em1 and em2 when using VLANs.

[This message (excluding the description of the second machine at the end) was also sent to the freebsd-net mailing list a few days ago.]

I'm running a FreeBSD 7.2 router and I am seeing a lot of input errors on one of the em interfaces (em0), coupled with (at approximately the same times) much fewer errors on em1 and em2. Monitoring is done with SNMP from another machine, and the CPU load as reported via SNMP is mostly below 30%, with a couple of spikes up to 35%.

Software description:

- FreeBSD 7.2-RELEASE-p2, amd64
- bsnmpd with modules: hostres and (from ports) snmp_ucd
- quagga 0.99.12 (running only zebra and bgpd)
- netgraph (ng_ether and ng_netflow)

Hardware description:

- Dell machine, dual Xeon 3.20 GHz, 4 GB RAM
- 2 x built-in gigabit interfaces (em0, em1)
- 1 x dual-port gigabit interface, PCI-X (em2, em3) [see pciconf near the end]

The machine receives the global routing table ("netstat -nr | wc -l" gives 289115 currently).

All of the em interfaces are just configured "up", with various vlan interfaces on them. Note that I use "kpps" to mean "thousands of packets per second"; sorry if that's the wrong shorthand.

- em0 sees 10...22 kpps in and 15...35 kpps out. In bits, that's 30...120 Mbit/s in and 100...210 Mbit/s out. The VLANs configured are vlan100 and vlan200, and most of the traffic is on vlan100 (vlan200 sees at most 4 kpps in / 0.5 kpps out, with the average at about one third of that). em0 is the external interface, and its traffic corresponds to the sum of the traffic through em1 and em2.

- em1 has 5 VLANs and sees about 22 kpps in / 11 kpps out (maximum).

- em2 has a single VLAN and sees about 4...13 kpps both in and out (almost equal in/out during most of the day).

- em3 is a backup interface with 2 VLANs, and is the only one which has seen no errors.

Only the VLANs on em0 are analyzed by ng_netflow, and the errors I'm seeing started appearing days before netgraph was even loaded into the kernel.

Tuning done:

/boot/loader.conf:
hw.em.rxd=4096
hw.em.txd=4096

Without the above we were seeing far more errors; now they are reduced, but they still come in bursts of over 1000 errors on em0.

/etc/sysctl.conf:
net.inet.ip.fastforwarding=1
dev.em.0.rx_processing_limit=300
dev.em.1.rx_processing_limit=300
dev.em.2.rx_processing_limit=300
dev.em.3.rx_processing_limit=300

Still seeing errors; after some searching of the mailing lists we also added:

# the four lines below are repeated for em1, em2, em3
dev.em.0.rx_int_delay=0
dev.em.0.rx_abs_int_delay=0
dev.em.0.tx_int_delay=0
dev.em.0.tx_abs_int_delay=0

Still getting errors, so I also added:

net.inet.ip.intr_queue_maxlen=4096
net.route.netisr_maxqlen=1024

and

kern.ipc.nmbclusters=655360

I also tried with rx_processing_limit set to -1 on all em interfaces; still getting errors.

Looking at the shape of the error and packet graphs, there seems to be a correlation between the number of packets per second on em0 and the height of the error "spikes" on the error graph. These spikes are spread throughout the day, with gaps (zones with no errors) of various lengths (10 minutes ... 2 hours within the last 24 hours), but sometimes there are errors even at the lowest-kpps times of the day.

em0 and em1 error times are correlated: every error spike on the em0 graph has a smaller corresponding error spike on em1 at the same time, and sometimes an error spike on em2.
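For a quicker live view than the SNMP graphs, the per-second counters can also be watched on the router itself with stock tools (a rough sketch, nothing that needs a reboot):

# Per-second in/out packets, errors and bytes for em0 (Ctrl-C to stop):
netstat -I em0 -w 1

# Dump the driver's internal counters to the kernel message buffer;
# the deltas between two dumps show how fast "Missed Packets" and
# "Receive No Buffers" are growing:
sysctl dev.em.0.stats=1
dmesg | tail -n 25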
The old router saw about the same traffic, had em0, em1, re0 and re1 network cards, and was only seeing errors on the em cards. It was running 7.2-PRERELEASE/i386.

Any suggestions would be greatly appreciated. Please note that this is a live router and I can't reboot it (unless absolutely necessary). Tuning that can be applied without rebooting will be tried first.

Here are some more details.

Trimmed output of netstat -ni (sorry if there are line breaks):

Name  Mtu  Network   Address            Ipkts        Ierrs     Opkts        Oerrs  Coll
em0   1500 <Link#1>  00:14:22:xx:xx:xx  19744458839  15494721  24284439443  0      0
em1   1500 <Link#2>  00:14:22:xx:xx:xx  12832245469  123181    10105031790  0      0
em2   1500 <Link#3>  00:04:23:xx:xx:xx  12082552403  10964     10339416865  0      0
em3   1500 <Link#4>  00:04:23:xx:xx:xx  79912337     0         48178737     0      0

Relevant part of pciconf -vl:

em0@pci0:6:7:0: class=0x020000 card=0x016d1028 chip=0x10768086 rev=0x05 hdr=0x00
    vendor   = 'Intel Corporation'
    device   = '82541EI Gigabit Ethernet Controller'
    class    = network
    subclass = ethernet
em1@pci0:7:8:0: class=0x020000 card=0x016d1028 chip=0x10768086 rev=0x05 hdr=0x00
    vendor   = 'Intel Corporation'
    device   = '82541EI Gigabit Ethernet Controller'
    class    = network
    subclass = ethernet
em2@pci0:9:4:0: class=0x020000 card=0x10128086 chip=0x10108086 rev=0x01 hdr=0x00
    vendor   = 'Intel Corporation'
    device   = '82546EB Dual Port Gigabit Ethernet Controller (Copper)'
    class    = network
    subclass = ethernet
em3@pci0:9:4:1: class=0x020000 card=0x10128086 chip=0x10108086 rev=0x01 hdr=0x00
    vendor   = 'Intel Corporation'
    device   = '82546EB Dual Port Gigabit Ethernet Controller (Copper)'
    class    = network
    subclass = ethernet

Kernel messages after sysctl dev.em.0.stats=1 (note that I've removed the lines which only showed zeros from the second and third outputs):

em0: Excessive collisions = 0
em0: Sequence errors = 0
em0: Defer count = 0
em0: Missed Packets = 15435312
em0: Receive No Buffers = 16446113
em0: Receive Length Errors = 0
em0: Receive errors = 1
em0: Crc errors = 2
em0: Alignment errors = 0
em0: Collision/Carrier extension errors = 0
em0: RX overruns = 96826
em0: watchdog timeouts = 0
em0: RX MSIX IRQ = 0 TX MSIX IRQ = 0 LINK MSIX IRQ = 0
em0: XON Rcvd = 0
em0: XON Xmtd = 0
em0: XOFF Rcvd = 0
em0: XOFF Xmtd = 0
em0: Good Packets Rcvd = 19002068797
em0: Good Packets Xmtd = 23168462599
em0: TSO Contexts Xmtd = 0
em0: TSO Contexts Failed = 0

[later]
em0: Excessive collisions = 0
em0: Missed Packets = 15459111
em0: Receive No Buffers = 16447082
em0: Receive errors = 1
em0: Crc errors = 2
em0: RX overruns = 96835
em0: Good Packets Rcvd = 19165047284
em0: Good Packets Xmtd = 23386976960

[later]
em0: Excessive collisions = 0
em0: Missed Packets = 15470583
em0: Receive No Buffers = 16447686
em0: Receive errors = 1
em0: Crc errors = 2
em0: RX overruns = 96840
em0: Good Packets Rcvd = 19255466068
em0: Good Packets Xmtd = 23519004546

Machine #2

I'm also seeing input errors on another machine, with a Core 2 Duo E8200 CPU and 2 em cards, this time connected via PCI Express. This machine handles less traffic, and errors mostly appear on em0, which does NOT use VLANs. All of the traffic goes on to em1, which does use VLANs, but em1 has recorded only about 10% as many errors as em0. Also, netgraph is not used at all on this machine. Only the hw.em.rxd, hw.em.txd and dev.em.*.rx_processing_limit tunables were set for this machine.
Relevant info for machine #2:

Name  Mtu  Network   Address            Ipkts       Ierrs  Opkts       Oerrs  Coll
em0   1500 <Link#1>  00:1b:21:xx:xx:xx  3095638890  12762  2604519812  0      0
em1   1500 <Link#2>  00:1b:21:xx:xx:xx  2608953742  1636   2998185465  0      0

em0@pci0:4:0:0: class=0x020000 card=0x10838086 chip=0x10b98086 rev=0x06 hdr=0x00
    vendor   = 'Intel Corporation'
    device   = '82572EI PRO/1000 PT Desktop Adapter (Copper)'
    class    = network
    subclass = ethernet
em1@pci0:3:0:0: class=0x020000 card=0x10838086 chip=0x10b98086 rev=0x06 hdr=0x00
    vendor   = 'Intel Corporation'
    device   = '82572EI PRO/1000 PT Desktop Adapter (Copper)'
    class    = network
    subclass = ethernet

em0: Excessive collisions = 0
em0: Sequence errors = 0
em0: Defer count = 402
em0: Missed Packets = 12762
em0: Receive No Buffers = 0
em0: Receive Length Errors = 0
em0: Receive errors = 0
em0: Crc errors = 0
em0: Alignment errors = 0
em0: Collision/Carrier extension errors = 0
em0: RX overruns = 237
em0: watchdog timeouts = 0
em0: RX MSIX IRQ = 0 TX MSIX IRQ = 0 LINK MSIX IRQ = 0
em0: XON Rcvd = 249
em0: XON Xmtd = 244
em0: XOFF Rcvd = 402
em0: XOFF Xmtd = 261
em0: Good Packets Rcvd = 3092053709
em0: Good Packets Xmtd = 2622962119
em0: TSO Contexts Xmtd = 12760095
em0: TSO Contexts Failed = 0

Thank you for your time.
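PS: For scale, the deltas between the second and third dev.em.0.stats dumps for the first machine work out as follows (just arithmetic on the counters above):

# Missed Packets:    15470583    - 15459111    =    11472
# Good Packets Rcvd: 19255466068 - 19165047284 = 90418784
# i.e. roughly 0.013% of received packets missed in that window:
echo 'scale=6; 11472 / 90418784 * 100' | bc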
I had a similar problem with bsnmpd + the mibII module running.

v.
alexpalias-bsdstable@yahoo.com wrote:
> Good day
>
> I'm looking for suggestions for tuning my setup in order to get rid of the input errors I'm seeing on em0, em1 and em2 when using VLANs.

<.. snip ..>
Do you have enough mbufs? Look for the "mbuf clusters" value in the netstat -m output: if "current" is near "total", you'll get drops of input packets. Take a look at the "jumbo clusters" lines in netstat -m too.

-- 
SY, Marat
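For reference, the check looks something like this (a sketch; the exact wording of the netstat -m output differs a little between FreeBSD releases):

# Current mbuf / cluster usage versus the configured limits:
netstat -m | egrep 'mbuf clusters|jumbo clusters|denied'
# e.g. "1024/896/1920/655360 mbuf clusters in use (current/cache/total/max)"
# If "current" sits close to the limit (kern.ipc.nmbclusters), incoming
# packets are dropped and show up as Ierrs.
sysctl kern.ipc.nmbclusters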
--- On Wed, 8/5/09, Vitezslav Novy <vnovy@vnovy.net> wrote:
> From: Vitezslav Novy <vnovy@vnovy.net>
> Subject: Re: em driver input errors
> To: freebsd-stable@freebsd.org
> Date: Wednesday, August 5, 2009, 10:48 PM
>
> I had a similar problem with bsnmpd + the mibII module running.
>
> v.

Thanks for the pointer. Did you find a solution to this problem? Maybe I should try switching to net-snmpd or some other snmpd?

Thanks,
Alex

PS: I seem to remember seeing input errors on the other host (the one with em0 and em1 in different PCI-E slots) even before starting bsnmpd.
On Wed, 5 Aug 2009 00:30:20 -0700 (PDT) alexpalias-bsdstable@yahoo.com mentioned:

> dev.em.0.rx_processing_limit=300
> dev.em.1.rx_processing_limit=300
> dev.em.2.rx_processing_limit=300
> dev.em.3.rx_processing_limit=300

These tunables only affect polling mode. Do you use polling with these adapters, or just the standard interrupt-based mode?

<.. snip ..>

> em0: Excessive collisions = 0
> em0: Sequence errors = 0
> em0: Defer count = 402
> em0: Missed Packets = 12762
> em0: Receive No Buffers = 0
> em0: Receive Length Errors = 0
> em0: Receive errors = 0
> em0: Crc errors = 0
> em0: Alignment errors = 0
> em0: Collision/Carrier extension errors = 0
> em0: RX overruns = 237
> em0: watchdog timeouts = 0
> em0: RX MSIX IRQ = 0 TX MSIX IRQ = 0 LINK MSIX IRQ = 0
> em0: XON Rcvd = 249
> em0: XON Xmtd = 244
> em0: XOFF Rcvd = 402
> em0: XOFF Xmtd = 261
> em0: Good Packets Rcvd = 3092053709
> em0: Good Packets Xmtd = 2622962119
> em0: TSO Contexts Xmtd = 12760095
> em0: TSO Contexts Failed = 0

From the output it looks like the driver is unable to process the adapter's input queue fast enough, so it runs out of free descriptors. One of the easiest things you can try is to enable polling mode and see if it improves the situation. You may also want to play with the rx_processing_limit tunables in that case. The em(4) driver in HEAD also includes a multiqueue processing feature, so it is possible that it will improve the situation for you too.

-- 
Stanislav Sedov
ST4096-RIPE
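For reference, enabling polling on em(4) under 7.x looks roughly like this (a sketch; it assumes a kernel built with DEVICE_POLLING, which on a router that can't be rebooted is the catch):

# Kernel config prerequisites (rebuild + reboot required):
#   options DEVICE_POLLING
#   options HZ=1000        # polling(4) suggests an HZ of 1000 or 2000

# With that in place, polling is toggled per interface at runtime:
ifconfig em0 polling
ifconfig em0               # POLLING should appear in the options= line

# Knobs that trade CPU between polling and userland:
sysctl kern.polling.burst_max=300
sysctl kern.polling.user_frac

# Revert to interrupt mode:
ifconfig em0 -polling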