Hi All,

I have a box with 24 e1000 cards in it. They are configured as 12 bridges, each with 2 ports. I have found that when the total traffic on the box gets to around 100Mbps or so, it starts dropping packets. As far as I can tell, it's related to the amount of traffic per card rather than the total throughput on the box. (I'm using 6 x 4-port ethernet cards.)

My best guess at this point is that the receive buffers are either too small or not being emptied quickly enough. (NAPI is enabled.) Can anybody give me any ideas on where to look for issues? And how can I change the size of the receive buffers? Is it just a kernel parameter, or do I need to re-compile?

CPU utilisation is hovering around 50%, and load average is consistently under 0.1, so I don't believe I'm looking at a CPU bottleneck.

Any other ideas?

Regards,
Leigh

Leigh Sharpe
Network Systems Engineer
Pacific Wireless
Ph +61 3 9584 8966
Mob 0408 009 502
Helpdesk 1300 300 616
email lsharpe@pacificwireless.com.au
web www.pacificwireless.com.au
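To answer the receive-buffer question directly: on most setups the e1000 descriptor rings can be inspected and resized from user space, with no recompile. A rough sketch, where the interface name and ring size are illustrative and the supported maximum depends on the particular chip:

  # Show the current and maximum rx/tx descriptor ring sizes:
  ethtool -g eth2

  # Grow the rx ring towards whatever maximum the hardware reports, e.g.:
  ethtool -G eth2 rx 1024

If the driver or ethtool version in use doesn't support -G, the same thing can usually be done via e1000 module options (see later in the thread).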
> Hi All,

Hi,

> I have a box with 24 e1000 cards in it. They are configured as 12
> bridges, each with 2 ports.

24 ports of e1000 nics means 24 interrupts used (or shared). Maybe that's the source of the problem. Did you notice anything unusual in your logs concerning the e1000 nics?

> ...
> CPU utilisation is hovering around 50%, and load average is consistently
> under 0.1, so I don't believe I'm looking at a CPU bottleneck.

Is your box multi-core (or HT-enabled)? Is your kernel SMP? If that's the case, then check per-core CPU utilisation (press "1" when watching top). You may be hitting the roof on only one of the cores while the average utilisation is around 50%. If you're not familiar with "smp_affinity", then you should read the following:
http://bcr2.uwaterloo.ca/~brecht/servers/apic/SMP-affinity.txt

cheers,
Marek Kierdelewicz
KoBa ISP
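For reference, a minimal sketch of checking and pinning interrupt affinity by hand; the IRQ number and CPU mask below are illustrative:

  # Per-CPU interrupt counts (one column per core on an SMP kernel):
  cat /proc/interrupts

  # Current CPU mask for one interrupt line:
  cat /proc/irq/169/smp_affinity

  # Pin that interrupt to CPU1 only (mask 0x2); needs root:
  echo 2 > /proc/irq/169/smp_affinity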
> First, make sure you have enough bus bandwidth!

Shouldn't a PCI bus be up to it? IIRC, PCI has a bus speed of 133MB/s. I'm only doing 100Mb/s of traffic, less than 1/8 of the bus speed. I don't have a PCI-X machine I can test this on at the moment.

> Don't use kernel irq balancing, user space irqbalance daemon is smart

I'll try that.

> It would be useful to see what the kernel profiling (oprofile) shows.

Abridged version as follows:

CPU: P4 / Xeon, speed 2400.36 MHz (estimated)
Counted GLOBAL_POWER_EVENTS events (time during which processor is not
stopped) with a unit mask of 0x01 (mandatory) count 100000
GLOBAL_POWER_E...|
  samples|      %|
------------------
 65889602 40.3276 e1000
 54306736 33.2383 ebtables
 26076156 15.9598 vmlinux
  4490657  2.7485 bridge
  2532733  1.5502 sch_cbq
  2411378  1.4759 libnetsnmp.so.9.0.1
  2120668  1.2979 ide_core
  1391944  0.8519 oprofiled
--------------------------
(There's more, naturally, but I doubt it's very useful.)

> How are you measuring CPU utilization?

As reported by 'top'.

> Andrew Morton wrote a cyclesoaker to do this, if you want it, I'll dig it up.

Please.

> And the dual-port e1000's add a layer of PCI bridge that also hurts latency/bandwidth.

I need bypass-cards in this particular application, so I don't have much choice in the matter.

Thanks,
Leigh

-----Original Message-----
From: bridge-bounces@lists.linux-foundation.org [mailto:bridge-bounces@lists.linux-foundation.org] On Behalf Of Stephen Hemminger
Sent: Wednesday, 14 November 2007 5:05 AM
To: Marek Kierdelewicz
Cc: bridge@lists.linux-foundation.org
Subject: Re: [Bridge] Rx Buffer sizes on e1000

On Tue, 13 Nov 2007 10:12:03 +0100
Marek Kierdelewicz <marek@piasta.pl> wrote:

> [...]

First, make sure you have enough bus bandwidth! What kind of box is it? You really need PCI-express to get better bus throughput. MSI will also help. Memory speeds also matter. And the dual-port e1000's add a layer of PCI bridge that also hurts latency/bandwidth.

Don't use kernel irq balancing; the user space irqbalance daemon is smart enough to recognize network devices and do the right thing (assign them directly to processors).

It would be useful to see what the kernel profiling (oprofile) shows.

How are you measuring CPU utilization? The only accurate way is to measure time with an idle soaker program versus time under load. Andrew Morton wrote a cyclesoaker to do this; if you want it, I'll dig it up.
--
Stephen Hemminger <shemminger@linux-foundation.org>
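For anyone wanting to reproduce the kind of oprofile report shown in this thread, a rough sketch of the legacy opcontrol workflow; the vmlinux path and the event specification are illustrative:

  # Point oprofile at the running kernel and pick an event/count
  # (spec format is event:count:unitmask):
  opcontrol --init
  opcontrol --setup --vmlinux=/boot/vmlinux-`uname -r`
  opcontrol --event=GLOBAL_POWER_EVENTS:100000:0x01

  # Sample while the bridge is under load, then dump and report:
  opcontrol --start
  sleep 300
  opcontrol --dump
  opreport              # per-image summary, like the tables in this thread
  opreport --symbols    # per-symbol breakdown
  opcontrol --shutdown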
Hi Stephen,

> You are using ebtables, so that adds a lot of overhead processing the
> rules. The problem is that each packet means a CPU cache miss. What is
> the memory bus bandwidth of the Xeon's?

I'll re-run that oprofile. The last set of tests I did was with ebtables disabled, and it was still dropping packets. Ultimately, however, I need ebtables (and tc) running.

Memory is DDR333.

Having installed irqbalance as you suggested, initial tests look promising...

Leigh.

-----Original Message-----
From: Stephen Hemminger [mailto:shemminger@linux-foundation.org]
Sent: Wednesday, 14 November 2007 9:47 AM
To: Leigh Sharpe
Cc: bridge@lists.linux-foundation.org
Subject: Re: [Bridge] Rx Buffer sizes on e1000

On Wed, 14 Nov 2007 09:24:18 +1100
"Leigh Sharpe" <lsharpe@pacificwireless.com.au> wrote:

> > First, make sure you have enough bus bandwidth!
>
> Shouldn't a PCI bus be up to it? IIRC, PCI has a bus speed of 133MB/s.
> I'm only doing 100Mb/s of traffic, less than 1/8 of the bus speed. I
> don't have a PCI-X machine I can test this on at the moment.

I find a regular PCI bus (32bit) tops out at about 600 Mbits/sec on most machines. For PCI-X (64 bit/133) a realistic value is 6 Gbits/sec. The problem is arbitration and transfer sizes. Absolute limits are:

  PCI32 33MHz  = 133MB/s
  PCI32 66MHz  = 266MB/s
  PCI64 33MHz  = 266MB/s
  PCI64 66MHz  = 533MB/s
  PCI-X 133MHz = 1066MB/s

That means for normal PCI32, one gigabit card or 6 100Mbit Ethernet interfaces can saturate the bus. Also, all that I/O slows down the CPU and memory interface.

> [...]

You are using ebtables, so that adds a lot of overhead processing the rules. The problem is that each packet means a CPU cache miss. What is the memory bus bandwidth of the Xeon's?

> [...]

--
Stephen Hemminger <shemminger@linux-foundation.org>
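Regarding the extra PCI-to-PCI bridge on the dual-port cards: the bus topology, i.e. which e1000 ports share a bus segment and which sit behind an additional bridge, can be dumped with lspci; a quick sketch:

  # Tree view of the PCI topology, bridges and all:
  lspci -tv

  # Full per-device detail (capabilities, bus width/speed where reported):
  lspci -vv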
> I find a regular PCI bus (32bit) tops out at about 600 Mbits/sec on most
> machines. For PCI-X (64 bit/133) a realistic value is 6 Gbits/sec. The
> problem is arbitration and transfer sizes.

> That means for normal PCI32, one gigabit card or
> 6 100Mbit Ethernet interfaces can saturate the bus. Also, all that
> I/O slows down the CPU and memory interface.

I'm seeing issues with only 110Mbits/sec of traffic. Ultimately I expect to be using far more than that, but I really didn't expect to be overloading the bus yet.

Leigh.
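As a back-of-envelope check on that number, assuming each bridged frame crosses a shared PCI bus twice (DMA in on the receive port, DMA out on the transmit port):

  110 Mbit/s bridged   ->  ~220 Mbit/s of payload DMA on the bus
  220 Mbit/s / 8 bits  =   ~27.5 MB/s, plus descriptor and arbitration overhead

which is still well under the ~600 Mbit/s practical PCI32 figure quoted above, so raw bus bandwidth alone does not obviously explain drops at this traffic level.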
> I'll re-run that oprofile. The last set of tests I did was with
> ebtables disabled, and it was still dropping packets. Ultimately,
> however, I need ebtables (and tc) running.

And here it is. Ebtables is loaded but not running.

----
CPU: P4 / Xeon, speed 2400.15 MHz (estimated)
Counted GLOBAL_POWER_EVENTS events (time during which processor is not
stopped) with a unit mask of 0x01 (mandatory) count 100000
GLOBAL_POWER_E...|
  samples|      %|
------------------
 17230723 56.7305 e1000
  8142695 26.8090 vmlinux
  2439000  8.0302 bridge
  1036770  3.4135 ip_tables
   468590  1.5428 ebtables
   215775  0.7104 oprofiled
----

This was with about 121Mbps of traffic on the entire box (11 bridges, 11Mbps on each of them), with irqbalance. The results are better with irqbalance, but still not quite there yet.
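To narrow down where the remaining drops are being counted, it helps to compare the interface, driver and qdisc counters on one of the busy ports. A sketch, with eth2 as a stand-in; the driver stat names vary a little between e1000 versions, but counters along the lines of rx_no_buffer_count / rx_missed_errors are the ones that point at the rx ring running dry:

  # Interface-level drop/overrun counters:
  ifconfig eth2 | grep -i -e drop -e overrun

  # Driver-level counters from the e1000 itself:
  ethtool -S eth2

  # Drops at the qdisc layer (relevant once tc/cbq is back in the picture):
  tc -s qdisc show dev eth2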
Looking at this a little further, the stats on the ethernet card (from ifconfig) tell me that it has transmitted as many bytes as it has received, which indicates that the bridge itself is not dropping the packets. I'm back to the idea that the card isn't delivering them to the bridge somehow. Hence my first guess about buffer sizes.

As far as MSI goes, I have it enabled in the kernel (looks like it's the default for 2.6.18?), but I see nothing in the logs referring to MSI. Is there any way of checking whether it's enabled and working? Or is it only a PCI-X thing?

Leigh.
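MSI is an optional capability of conventional PCI 2.2 and later, not something PCI-X specific, but the device, chipset and driver all have to support it. Two quick ways to check whether it is actually in use, with the bus address below as a stand-in:

  # An MSI-capable device shows a "Message Signalled Interrupts" capability;
  # "Enable+" means the driver is actually using it:
  lspci -vv -s 02:01.0 | grep -i "message signal"

  # Interrupts delivered via MSI appear with a PCI-MSI type here, instead of
  # shared IO-APIC-level lines:
  grep -i msi /proc/interrupts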
> $ cat /proc/interrupts

leigh@qosbox:~$ cat /proc/interrupts
           CPU0
  0:   15011517    IO-APIC-edge  timer
  4:          6    IO-APIC-edge  serial
  6:          2    IO-APIC-edge  floppy
  7:          0    IO-APIC-edge  parport0
  8:          1    IO-APIC-edge  rtc
  9:          1   IO-APIC-level  acpi
 14:        108    IO-APIC-edge  ide0
 15:     501765    IO-APIC-edge  ide1
169:    8964401   IO-APIC-level  uhci_hcd:usb1, eth3, eth8, eth13, eth14, eth21, eth24
177:    9074694   IO-APIC-level  uhci_hcd:usb2, eth4, eth9, eth10, eth15, eth18, eth25
185:         38   IO-APIC-level  uhci_hcd:usb3
193:   10256409   IO-APIC-level  ehci_hcd:usb4, eth2, eth7, eth12, eth17, eth20, eth23
201:      44119   IO-APIC-level  eth0
217:    9734965   IO-APIC-level  eth5, eth6, eth11, eth16, eth19, eth22
225:          0   IO-APIC-level  Intel 82801DB-ICH4
NMI:          0
LOC:   15012117
ERR:          0
MIS:          0

-----Original Message-----
From: bridge-bounces@lists.linux-foundation.org [mailto:bridge-bounces@lists.linux-foundation.org] On Behalf Of Andy Gospodarek
Sent: Thursday, 15 November 2007 3:10 AM
To: bridge@lists.linux-foundation.org
Subject: Re: [Bridge] Rx Buffer sizes on e1000

On Nov 14, 2007 12:26 AM, Leigh Sharpe <lsharpe@pacificwireless.com.au> wrote:
>
> Looking at this a little further, the stats on the ethernet card (from
> ifconfig) tell me that it has transmitted as many bytes as it has
> received, which indicates that the bridge itself is not dropping the
> packets. I'm back to the idea that the card isn't delivering them to the
> bridge somehow. Hence my first guess about buffer sizes.

The e1000 module does have options for Tx/RxDescriptors (buffers), but I'm not sure that's going to help you too much.

> As far as MSI goes, I have it enabled in the kernel (looks like it's the
> default for 2.6.18?), but I see nothing in the logs referring to MSI. Is
> there any way of checking whether it's enabled and working? Or is it
> only a PCI-X thing?

$ cat /proc/interrupts
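Following up on the Tx/RxDescriptors options mentioned above, a minimal sketch of setting them as module options; the values are illustrative, given per port (comma-separated), and capped by what each chip supports:

  # See which ring-size options this e1000 build exposes:
  modinfo e1000 | grep -i descriptors

  # In /etc/modprobe.conf (or a modprobe.d file), e.g. for the first four ports:
  options e1000 RxDescriptors=1024,1024,1024,1024 TxDescriptors=1024,1024,1024,1024

  # Reload the module (or reboot) for the new ring sizes to take effect; the
  # same change can often be made at runtime with "ethtool -G ethX rx <n>".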