Hello everyone,
I am in the process of upgrading our main PF firewalls from 9.2-RC4 to
10.1-BETA1 (r271684), and as part of that I have been testing the
forwarding capability of FreeBSD 10 (pf firewall disabled) to establish a
baseline and find any bottlenecks on a 10GbE network.
My tests show that for the first 3-4 Gbit/s of traffic things are great, and
then the machine simply "hits the wall" at around 4-5 Gbit/s. There is no
gradual degradation, just a hard drop to 0% idle and a roughly 50/50 split
between system and interrupt CPU load. I ran some diagnostics and I was
hoping someone could point me in the right direction as to what is happening
and what I could do to improve the situation. The details I have collected
so far are below:
I run multiple multithreaded iperf TCP instances to generate traffic that
traverses the firewall. As mentioned above, for the first 3-4 Gbit/s of
traffic the machine doesn't even break a sweat.
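(For reference, the iperf invocations looked roughly like the following; the
address, stream count and duration are placeholders rather than the exact
values used.)

  # on the receivers behind the firewall
  $ iperf -s

  # on several senders on the other side, several parallel TCP streams each
  $ iperf -c 10.0.1.10 -P 8 -t 300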
*top header* during this load when things are good:
CPU 0: 0.0% user, 0.0% nice, 0.0% system, 27.6% interrupt, 72.4% idle
CPU 1: 0.0% user, 0.0% nice, 0.0% system, 27.6% interrupt, 72.4% idle
CPU 2: 0.0% user, 0.0% nice, 0.0% system, 8.7% interrupt, 91.3% idle
CPU 3: 0.0% user, 0.0% nice, 0.0% system, 17.3% interrupt, 82.7% idle
CPU 4: 0.0% user, 0.0% nice, 0.0% system, 12.6% interrupt, 87.4% idle
CPU 5: 0.0% user, 0.0% nice, 0.0% system, 4.7% interrupt, 95.3% idle
CPU 6: 0.0% user, 0.0% nice, 0.0% system, 13.4% interrupt, 86.6% idle
CPU 7: 0.0% user, 0.0% nice, 0.0% system, 26.0% interrupt, 74.0% idle
CPU 8: 0.0% user, 0.0% nice, 0.0% system, 16.5% interrupt, 83.5% idle
CPU 9: 0.0% user, 0.0% nice, 0.0% system, 1.6% interrupt, 98.4% idle
CPU 10: 0.0% user, 0.0% nice, 0.0% system, 19.7% interrupt, 80.3% idle
CPU 11: 0.0% user, 0.0% nice, 0.0% system, 7.1% interrupt, 92.9% idle
Full output at http://pastebin.com/gaaisXV8
*bmon* at the same time:
Interfaces          RX bps       RX pps       TX bps       TX pps
ix0              240.33MiB      242.20K    221.41MiB      236.68K
ix1              246.51MiB      248.80K    261.61MiB      250.95K
lagg0            483.45MiB      492.42K    479.54MiB      488.02K

[bmon rate graphs omitted: lagg0 RX and TX hold steady near the top of the
scale, roughly 480-500 MiB/s and 490-510 Kpps, for the whole 60-second window.]
To profile the system without pushing it into the degraded state, I reduced
traffic to 2 Gbit/s and ran dtrace hotkernel and lock profiling, roughly as
follows.
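(The exact collection commands are from memory; the lock-profiling sysctls
below assume a kernel built with options LOCK_PROFILING.)

  # hotkernel: run the DTraceToolkit script, interrupt it after ~60 seconds
  $ /usr/share/dtrace/toolkit/hotkernel

  # lock profiling: reset counters, sample for 60 seconds, dump the stats
  $ sysctl debug.lock.prof.reset=1
  $ sysctl debug.lock.prof.enable=1
  $ sleep 60
  $ sysctl debug.lock.prof.enable=0
  $ sysctl debug.lock.prof.stats > good-locks

Here are the results: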
*/usr/share/dtrace/toolkit/hotkernel* for 60 seconds:
kernel`__mtx_lock_flags 5812 0.8%
kernel`__mtx_unlock_flags 7200 1.0%
kernel`acpi_cpu_c1 7799 1.1%
kernel`__rw_rlock 11196 1.5%
kernel`spinlock_exit 14547 2.0%
kernel`cpu_idle 166700 22.8%
kernel`sched_idletd 461883 63.1%
Full output at http://pastebin.com/w7WfFwPG
*lock profiling* for 60 seconds:
$ head -n 2 good-locks ; cat good-locks | sort -n -k 4 | tail -n6
debug.lock.prof.stats:
     max  wait_max       total  wait_total       count    avg  wait_avg  cnt_hold  cnt_lock  name
      22        24       94378       18481      264549      0         0         0     18639  /usr/src/sys/kern/sched_ule.c:886 (spin mutex:sched lock 10)
      22        24      112366       20360      219179      0         0         0     17220  /usr/src/sys/kern/sched_ule.c:888 (spin mutex:sched lock 2)
      18       319        3486       22352        4233      0         5         0      1640  /usr/src/sys/kern/subr_taskqueue.c:344 (sleep mutex:taskqueue)
      26        66     3219768      204875    14616220      0         0         0    133154  /usr/src/sys/net/route.c:439 (sleep mutex:rtentry)
      25        90     1923012     2353820    14615738      0         0         0   1562097  /usr/src/sys/netinet/ip_fastfwd.c:593 (sleep mutex:rtentry)
      26        91     1398443     2391458    14616137      0         0         0   1604332  /usr/src/sys/netinet/in_rmx.c:114 (sleep mutex:rtentry)
Full output at http://pastebin.com/qiG3ZuAH
Again, the measurements above show the state of the good/healthy system
under a moderate traffic load of 3-4 Gbit/s with pf disabled and fast
forwarding enabled. Here are the same measurements after I add an additional
1 Gbit/s of traffic. The point at which performance tanks varies between
4 and 5 Gbit/s, but it is always sudden: no gradual degradation, idle simply
drops to 0. Let's take a look:
*top header* during this load when things are bad:
CPU 0: 0.0% user, 0.0% nice, 44.6% system, 55.4% interrupt, 0.0% idle
CPU 1: 0.0% user, 0.0% nice, 15.1% system, 84.9% interrupt, 0.0% idle
CPU 2: 0.0% user, 0.0% nice, 59.0% system, 40.3% interrupt, 0.7% idle
CPU 3: 0.0% user, 0.0% nice, 57.6% system, 42.4% interrupt, 0.0% idle
CPU 4: 0.0% user, 0.0% nice, 34.5% system, 63.3% interrupt, 2.2% idle
CPU 5: 0.0% user, 0.0% nice, 51.8% system, 48.2% interrupt, 0.0% idle
CPU 6: 0.0% user, 0.0% nice, 33.8% system, 66.2% interrupt, 0.0% idle
CPU 7: 0.7% user, 0.0% nice, 49.6% system, 49.6% interrupt, 0.0% idle
CPU 8: 0.0% user, 0.0% nice, 66.2% system, 33.8% interrupt, 0.0% idle
CPU 9: 0.0% user, 0.0% nice, 35.3% system, 64.7% interrupt, 0.0% idle
CPU 10: 0.0% user, 0.0% nice, 54.7% system, 45.3% interrupt, 0.0% idle
CPU 11: 0.0% user, 0.0% nice, 34.5% system, 65.5% interrupt, 0.0% idle
Full output at http://pastebin.com/9an9ZWv2
*bmon* at the same time shows inconsistent performance with big dips:
Interfaces          RX bps       RX pps       TX bps       TX pps
ix0              153.89MiB      151.53K    179.69MiB      159.91K
ix1              176.56MiB      161.29K    145.17MiB      148.13K
lagg0            327.23MiB      333.05K    322.14MiB      328.13K

[bmon rate graphs omitted: instead of the steady line seen above, lagg0
throughput fluctuates heavily, mostly around 100-330 MiB/s (110-335 Kpps),
with occasional spikes towards ~650 MiB/s and frequent dips.]
*/usr/share/dtrace/toolkit/hotkernel* for 60 seconds:
kernel`_rw_runlock_cookie 7709 1.1%
kernel`__rw_rlock 11182 1.6%
kernel`ip_fastforward 12231 1.7%
kernel`__mtx_lock_flags 22004 3.1%
kernel`__mtx_unlock_flags 35614 5.0%
kernel`__mtx_lock_sleep 560768 78.5%
Full output at http://pastebin.com/NurKwkWL
*lock profiling* for 60 seconds:
$ head -n 2 bad-locks ; cat bad-locks | sort -n -k 4 | tail -n6
debug.lock.prof.stats:
     max  wait_max       total  wait_total       count    avg  wait_avg  cnt_hold  cnt_lock  name
  401766    191987     1857397      194162         179  10376      1084         0         2  /usr/src/sys/kern/kern_sysctl.c:1601 (sx:sysctl mem)
   21064    207907       62556      249066         396    157       628         0         3  /usr/src/sys/kern/kern_sysctl.c:1499 (sleep mutex:Giant)
       1    370663          17      372573          86      0      4332         0         2  /usr/src/sys/kern/kern_exit.c:429 (sx:allproc)
      14       648     8856844    46296098    15513956      0         2         0   1467849  /usr/src/sys/net/route.c:439 (sleep mutex:rtentry)
      15       958    13107581   252445472    15513486      0        16         0   9444644  /usr/src/sys/netinet/ip_fastfwd.c:593 (sleep mutex:rtentry)
      12       779    12500816   286324556    15513872      0        18         0   9788497  /usr/src/sys/netinet/in_rmx.c:114 (sleep mutex:rtentry)
Full output at http://pastebin.com/d54QmP13
System hardware: 12 x E5-2440 @ 2.40GHz, 24GB RAM, dual-port fiber
Intel(R) PRO/10GbE in an LACP lagg.
System configuration details available at http://pastebin.com/tPvs1MeD
This looks to me like heavy lock contention, but I don't understand why it
sets in so abruptly once a given threshold is crossed.
Other things I tried:
- Upgrading the ix (ixgbe) driver to the latest from Intel (2.5.25) - for
some reason I then cannot send any packets out
- Enabling flowtables - no difference (toggled roughly as sketched below)
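(A minimal sketch of the flowtable toggle, assuming a kernel built with
options FLOWTABLE; the sysctl name is from memory, so verify it with
sysctl -d before relying on it.)

  $ sysctl net.inet.flowtable.enable=1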
Any help is highly appreciated. I could provide further details and run
additional tests upon request.
Regards,
--
Rumen Telbizov
Unix Systems Administrator <http://telbizov.com>
Hi,

The forwarding paths don't 100% use flowtable, so there are still routing
table lookups. All those highly contended locks are the rtentry locking,
because the forwarding and fast-forwarding paths don't use flowtable.

This shouldn't be a difficult problem to solve; someone just has to go
through the places above, figure out what code is doing the lookups, and
convert it to use flowtable. The one place that's currently a pain is how
IPv4 redirects are handled; you can turn them off with a sysctl (see the
sketch below).

-a
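(The redirect knob referred to here is, I believe, the one below; treat the
name as an assumption and check sysctl -d net.inet.ip.redirect locally.)

  $ sysctl net.inet.ip.redirect=0    # stop generating ICMP redirects on the forwarding path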
--- Original message ---
From: "Rumen Telbizov" <telbizov at gmail.com>
Date: 20 September 2014, 19:12:17

> I am in the process of upgrading our main PF firewalls from 9.2-RC4 to
> 10.1-BETA1 (r271684) and as part of the process I have been testing the
> forwarding capability of FreeBSD 10 (pf firewall disabled) to have a
> base-line and find any bottlenecks on a 10GbE network.

Please show the output of these commands:

  pciconf -lv | grep -A 4 ix\[0-9\]
  netstat -m
  ngctl list | wc -l
  sysctl -a | egrep 'net.(inet.(tcp|udp)|graph|isr)'

--
Vladislav V. Prodan
System & Network Administrator
support.od.ua
On 9/20/14, Rumen Telbizov <telbizov at gmail.com> wrote:
> My tests show that for the first 3-4Gbit/s of traffic things are great and
> then the machine simply "hits the wall" at around 4-5Gbit/s of traffic.
> There's no gradual degradation but a hard drop to 0% idle and 50-50% split
> of system and interrupt CPU load.
> [...]

Try to re-enable the idle ticks. I observed a similar problem with the em
driver:

  sysctl kern.eventtimer.idletick=1
  sysctl kern.eventtimer.periodic=1
On 20.09.2014 20:12, Rumen Telbizov wrote:
> I am in the process of upgrading our main PF firewalls from 9.2-RC4 to
> 10.1-BETA1 (r271684) and as part of the process I have been testing the
> forwarding capability of FreeBSD 10 (pf firewall disabled) to have a
> base-line and find any bottlenecks on a 10GbE network.

Can you try to split the default route into 2 (0/1, 128/1) and see if there
is any difference (a sketch of the route commands is below)? Can you share
"pmcstat -TS instructions -w1"?

> ix0              240.33MiB      242.20K    221.41MiB      236.68K
> ix1              246.51MiB      248.80K    261.61MiB      250.95K
> lagg0            483.45MiB      492.42K    479.54MiB      488.02K

There are some things that can be improved in the case of lagg; are you OK
with testing some patches?
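(A sketch of the suggested default-route split; the gateway address below is
a placeholder for the real next hop.)

  $ route add -net 0.0.0.0/1 192.0.2.1
  $ route add -net 128.0.0.0/1 192.0.2.1

Both /1 routes are more specific than the default route, so they take over
forwarding without the default having to be removed.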