I'm still investigating this, but my quagga is locking hard on FreeBSD 8.0 and not locking hard on 7.2. It seems (at this early point in the investigation) that both bgpd and zebra are wedging, and zebra is listed as being in the "RUN" state.

Curiously, the load is also 4.0 (exactly the number of cores in the machine) even though the machine also reads 100% idle.
I have also seen this with a recent version of FreeBSD 8. (I know 8.0-BETA2 didn't have this problem, and I have an 8.0-RC1 without problems, but I think RC3 did have it, and I'm sure -RELEASE has it.)

A few more details:

It happened on both amd64 and i386. I couldn't debug amd64 (it was a live server and we couldn't afford it), but on i386 flowcleaner was using a LOT of CPU.

It seemed to happen after booting, when quagga was importing global routing tables (~300k routes) from 2 BGP sessions. At least one of the sessions seemed to finish importing routes, but the kernel routing table seemed to be growing very slowly. Doing "netstat -nr | wc -l" took way longer than usual (20-30 seconds versus 9 seconds now), and it only reported about 100k routes. Doing it again after a minute or so showed that the number of routes had grown by around 10k. During this time, both quagga and zebra were very slow to respond to a new telnet session opened to them.

As a workaround, I did sysctl net.inet.flowtable.enable=0. This didn't ease the load on the CPU, but having it in /etc/sysctl.conf and rebooting did help (quagga started up normally and all routes are where they should be).

Hope this helps,
Alex

--- On Fri, 12/4/09, Zaphod Beeblebrox <zbeeble@gmail.com> wrote:

> From: Zaphod Beeblebrox <zbeeble@gmail.com>
> Subject: Quggaa locking hard.
> To: "FreeBSD Stable" <freebsd-stable@freebsd.org>
> Date: Friday, December 4, 2009, 5:46 AM
>
> I'm still investigating this, but my quagga is locking hard on FreeBSD 8.0
> and not locking hard on 7.2. It seems (at this early point in the
> investigation) that both bgpd and zebra are wedging and zebra is listed as
> being in the "RUN" state.
>
> curiously, the load is also 4.0 (exactly the number of cores in the machine)
> even though the machine also reads 100% idle.
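The repeated route-count check described in the message above (running "netstat -nr | wc -l" about a minute apart) could be sketched as a small loop. This is only a sketch: the function names and the ROUTE_CMD, INTERVAL and SAMPLES variables are hypothetical, and the count includes netstat's header lines, just as the one-liner in the message does.

```shell
# Hypothetical knobs (not from the thread): command to list routes,
# seconds between samples, and number of samples to take.
ROUTE_CMD=${ROUTE_CMD:-"netstat -nr"}
INTERVAL=${INTERVAL:-60}
SAMPLES=${SAMPLES:-5}

count_routes() {
    # Rough count, as in the message: number of output lines,
    # header lines included. tr strips the padding some wc versions add.
    $ROUTE_CMD | wc -l | tr -d ' '
}

watch_routes() {
    # Print the line count at each sample and the change since the last one.
    prev=$(count_routes)
    echo "start: $prev lines"
    i=1
    while [ "$i" -le "$SAMPLES" ]; do
        sleep "$INTERVAL"
        cur=$(count_routes)
        echo "sample $i: $cur lines (delta $((cur - prev)))"
        prev=$cur
        i=$((i + 1))
    done
}
```

Calling watch_routes on the affected box would print one line per sample. A delta that stays small while bgpd is still importing routes would match the slow-growing kernel table described above.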
At 10:46 PM 12/3/2009, Zaphod Beeblebrox wrote:
>I'm still investigating this, but my quagga is locking hard on FreeBSD 8.0
>and not locking hard on 7.2. It seems (at this early point in the
>investigation) that both bgpd and zebra are wedging and zebra is listed as
>being in the "RUN" state.
>
>curiously, the load is also 4.0 (exactly the number of cores in the machine)
>even though the machine also reads 100% idle.

I think I am seeing something similar on a test box. I was loading up the box with 200k routes to do testing with. Kernel is default, save for a few unused drivers removed. If I take out

options         FLOWTABLE               # per-cpu routing cache

from the kernel, load avg is back to normal. This issue only seems to have come up in the past week or so as the previous kernel from ~8 days ago was OK.

last pid:  6229;  load averages:  2.00,  2.00,  2.00    up 1+17:33:02  09:39:31
141 processes: 7 running, 106 sleeping, 28 waiting
CPU:  0.0% user,  0.0% nice, 22.2% system,  0.0% interrupt, 77.8% idle
Mem: 98M Active, 2233M Inact, 187M Wired, 36K Cache, 112M Buf, 979M Free
Swap: 8192M Total, 8192M Free

  PID USERNAME  PRI NICE   SIZE    RES STATE   C   TIME    WCPU COMMAND
   22 root       76    -     0K     8K CPU3    3  41.5H 100.00% flowcleaner
   11 root      171 ki31     0K    32K CPU2    2  41.5H 100.00% {idle: cpu2}
   11 root      171 ki31     0K    32K CPU1    1  41.5H 100.00% {idle: cpu1}
   11 root      171 ki31     0K    32K RUN     0  41.4H 100.00% {idle: cpu0}
  869 root        4    0 64860K 64488K select  0   4:12   0.00% bgpd
   11 root      171 ki31     0K    32K RUN     3   2:09   0.00% {idle: cpu3}
   20 root       44    -     0K     8K syncer  0   1:00   0.00% syncer
   12 root      -32    -     0K   224K WAIT    1   0:47   0.00% {swi4: clock}
    0 root      -68    0     0K    80K -       2   0:03   0.00% {fw0_taskq}
 1230 root       76    0  3348K  1160K ttyin   2   0:02   0.00% getty
  863 root       96    0 24640K 24232K RUN     2   0:02   0.00% zebra
   12 root      -32    -     0K   224K WAIT    2   0:01   0.00% {swi4: clock}
   14 root      -16    -     0K     8K -       0   0:01   0.00% yarrow

>_______________________________________________
>freebsd-stable@freebsd.org mailing list
>http://lists.freebsd.org/mailman/listinfo/freebsd-stable
>To unsubscribe, send any mail to "freebsd-stable-unsubscribe@freebsd.org"

--------------------------------------------------------------------
Mike Tancsa, tel +1 519 651 3400
Sentex Communications, mike@sentex.net
Providing Internet since 1994 www.sentex.net
Cambridge, Ontario Canada www.sentex.net/mike
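To pick a thread like flowcleaner out of top output such as the listing in the message above, a small awk filter along these lines might help. It is a sketch only: busy_non_idle is a hypothetical helper, and it assumes the 11-column layout shown in that listing (WCPU in column 10, COMMAND from column 11 on), which can differ with other top options or versions.

```shell
# busy_non_idle [MIN_WCPU]
# Reads top-style process lines on stdin and prints "PID WCPU COMMAND"
# for every non-idle entry whose WCPU is at least MIN_WCPU (default 50).
busy_non_idle() {
    awk -v min="${1:-50}" '
        # Only consider process rows: numeric PID and a percentage in column 10.
        $1 ~ /^[0-9]+$/ && $10 ~ /%$/ {
            wcpu = $10
            sub(/%$/, "", wcpu)
            # Skip the per-CPU idle threads, which always look busy.
            if (wcpu + 0 >= min && $11 !~ /idle/) {
                cmd = $11
                for (i = 12; i <= NF; i++) cmd = cmd " " $i
                print $1, wcpu, cmd
            }
        }'
}
```

Fed the listing above, the filter prints only the flowcleaner row. On a live box it could be given batch-mode top output that includes system processes and threads, though the exact top options to use are left as an assumption here.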
On Sat, Dec 5, 2009 at 9:11 AM, Mike Tancsa <mike@sentex.net> wrote:

> At 04:07 PM 12/4/2009, K. Macy wrote:
>
>> If you have a large number of routes then you will want to disable the
>> flowtable.
>
> Thanks! I will remove from boxes that act as routers / large firewalls.
> However, the high load avg is something new. Even when the box is doing
> nothing, it sits at 2.00 for some reason. This was not happening from the
> code base a week ago or so.

Just to add something really interesting to this, "ifconfig vlan101 unplumb" hangs after this has happened. It seems like it should be related.
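For the advice quoted above to disable the flowtable on boxes with many routes, one way to make the setting persistent is sketched below. It uses the sysctl name reported earlier in this thread (net.inet.flowtable.enable). ensure_sysctl_line is a hypothetical helper, not a FreeBSD tool, and it only handles simple values that contain no "|" or "&" characters.

```shell
# ensure_sysctl_line FILE NAME VALUE
# Idempotently sets "NAME=VALUE" in a sysctl.conf-style file:
# rewrites an existing assignment, or appends one if none is present.
ensure_sysctl_line() {
    file=$1
    name=$2
    value=$3
    # Escape dots so the sysctl name is matched literally by grep and sed.
    pattern=$(printf '%s' "$name" | sed 's/\./\\./g')
    if [ -f "$file" ] && grep -q "^${pattern}=" "$file"; then
        # Replace the existing assignment via a temporary file.
        sed "s|^${pattern}=.*|${name}=${value}|" "$file" > "${file}.tmp" &&
            mv "${file}.tmp" "$file"
    else
        printf '%s\n' "${name}=${value}" >> "$file"
    fi
}
```

Run as root, "ensure_sysctl_line /etc/sysctl.conf net.inet.flowtable.enable 0" followed by a reboot would correspond to the workaround reported earlier in the thread, where changing the sysctl at runtime alone did not ease the CPU load.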