Steven Hartland
2011-Oct-05 00:11 UTC
serious packet routing issue causing ntpd high load?
We just updated a machine to 8-STABLE and I've noticed that ntpd is using noticeable amounts of CPU, 5-7%, which is very high for such a trivial daemon.

8.2-STABLE FreeBSD 8.2-STABLE #16: Tue Oct 4 09:53:17 UTC 2011

truss indicates it's constantly checking and reading from a socket:

0.047297485 select(29,{20 21 22 23 24 25 26 27 28},0x0,0x0,0x0) = 1 (0x1)
0.047513160 clock_gettime(0,{1317770389.969538247 }) = 0 (0x0)
0.047604515 select(29,{20 21 22 23 24 25 26 27 28},0x0,0x0,{0.000000 }) = 1 (0x1)
0.047668212 read(28,"\M-8\0\^E\a\0\0\0\0@\0\0\0\^A\0"...,5120) = 184 (0xb8)
0.049395293 select(29,{20 21 22 23 24 25 26 27 28},0x0,0x0,0x0) = 1 (0x1)
0.049503689 clock_gettime(0,{1317770389.971526820 }) = 0 (0x0)
0.049606219 select(29,{20 21 22 23 24 25 26 27 28},0x0,0x0,{0.000000 }) = 1 (0x1)
0.049669916 read(28,"\M-8\0\^E\a\0\0\0\0@\0\0\0\^A\0"...,5120) = 184 (0xb8)
0.049809882 select(29,{20 21 22 23 24 25 26 27 28},0x0,0x0,0x0) = 1 (0x1)
...

Running with debug enabled, it sits looping, outputting:-
routing message op = 7: ignored
routing message op = 7: ignored
routing message op = 7: ignored
routing message op = 7: ignored
routing message op = 7: ignored
routing message op = 7: ignored
routing message op = 7: ignored
...

It seems socket 28 is a duplicate of an internal routing socket, as seen here in the trace:-
0.044544269 socket(PF_ROUTE,SOCK_RAW,0) = 4 (0x4)
0.044595394 fcntl(4,F_DUPFD,0x14) = 28 (0x1c)
0.044645960 close(4) = 0 (0x0)
0.044695968 fcntl(28,F_SETFL,O_NONBLOCK) = 0 (0x0)

Now this looks like it's RTM_MISS, as defined:-
sys/net/route.h:#define RTM_MISS 0x7 /* Lookup failed on this address */

So the question is: why is the PF_ROUTE socket constantly spamming RTM_MISS?

route -n monitor on this machine shows:-
got message of size 184 on Tue Oct 4 23:46:36 2011
RTM_MISS: Lookup failed on this address: len 184, pid: 0, seq 0, errno 0, flags:<DONE>
locks: inits:
sockaddrs: <DST>
::A.B.C.D

got message of size 184 on Tue Oct 4 23:46:36 2011
RTM_MISS: Lookup failed on this address: len 184, pid: 0, seq 0, errno 0, flags:<DONE>
locks: inits:
sockaddrs: <DST>
::A.B.C.D

This seems very much like the following PR, which was fixed:-
"Remove a bogusly introduced rtalloc_ign() in rev. 1.335/SVN 178029, generating an RTM_MISS for every IP packet forwarded making user space routing daemons unhappy":-
http://www.freebsd.org/cgi/query-pr.cgi?pr=124540

The box is doing no routing; it's a fairly basic install with 1 main IP on em0 + 1 alias + gw address and 1 private IP on em1.

It's running mysql and that's about it.

Any ideas?

Regards
Steve

===============================================
This e.mail is private and confidential between Multiplay (UK) Ltd. and the person or entity to whom it is addressed. In the event of misdirection, the recipient is prohibited from using, copying, printing or otherwise disseminating it or any information contained in it.

In the event of misdirection, illegible or incomplete transmission please telephone +44 845 868 1337 or return the E.mail to postmaster@multiplay.co.uk.
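
The message type can be cross-checked directly from the truss trace in the first message, since truss prints the start of each read() buffer. The sketch below decodes those leading bytes as the start of a struct rt_msghdr. It rests on assumptions not stated in the thread: that the escapes are vis(3)-style (\M-8 = 0xb8, \^E = 0x05, \a = 0x07), that the machine is little-endian, and that rtm_flags follows rtm_index after two bytes of alignment padding. The function name decode_rt_msghdr_prefix is made up for illustration.

```python
import struct

# Leading bytes of the read(28, ...) buffer from the truss trace:
#   "\M-8\0\^E\a\0\0\0\0@\0\0\0\^A\0"...
# Assumed vis(3)-style escapes: \M-8 = 0xb8, \^E = 0x05, \a = 0x07,
# '@' = 0x40, \^A = 0x01.
TRUSS_PREFIX = bytes([0xB8, 0x00, 0x05, 0x07, 0x00, 0x00, 0x00, 0x00,
                      0x40, 0x00, 0x00, 0x00, 0x01, 0x00])

RTM_MISS = 0x7   # sys/net/route.h, as quoted in the thread
RTF_DONE = 0x40  # sys/net/route.h


def decode_rt_msghdr_prefix(buf):
    """Decode the first fields of a struct rt_msghdr.

    Assumed layout (little-endian): u_short rtm_msglen, u_char rtm_version,
    u_char rtm_type, u_short rtm_index, 2 bytes of padding, int rtm_flags.
    """
    msglen, version, msgtype, index, flags = struct.unpack_from("<HBBH2xi", buf)
    return {
        "msglen": msglen,
        "version": version,
        "type": msgtype,
        "index": index,
        "flags": flags,
    }


hdr = decode_rt_msghdr_prefix(TRUSS_PREFIX)
print(hdr)
print("is RTM_MISS:", hdr["type"] == RTM_MISS)
print("RTF_DONE set:", bool(hdr["flags"] & RTF_DONE))
```

Under those assumptions the decoded length agrees with the 184 bytes returned by read() and the "len 184" reported by route -n monitor, and the type field agrees with the "routing message op = 7" debug output from ntpd.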
Steven Hartland
2011-Oct-05 00:27 UTC
serious packet routing issue causing ntpd high load?
----- Original Message -----
From: "Steven Hartland" <killing@multiplay.co.uk>
..
> This seems very much like the following pr which was fixed:-
> "Remove a bogusly introduced rtalloc_ign() in rev. 1.335/SVN 178029,
> generating an RTM_MISS for every IP packet forwarded making user space
> routing daemons unhappy":-
> http://www.freebsd.org/cgi/query-pr.cgi?pr=124540
>
> The box is doing no routing, its fairly basic install with
> 1 main IP on em0 + 1 alias + gw addres and 1 private ip on em1.
>
> Its running mysql and thats about it.
>
> Any ideas?

This may also be causing significantly higher than expected kernel time:-
0 root 161 -8 0 0K 2560K - 1 44:26 23.88% kernel

I've removed the alias on em0 and removed all addresses from em1, leaving just 1 address on em1 + lo0, and I'm still seeing the same thing.

Regards
Steve
Hi,

> RTM_MISS: Lookup failed on this address: len 184, pid: 0, seq 0, errno 0, flags:<DONE>
> locks: inits:
> sockaddrs: <DST>
> ::A.B.C.D

Would it be possible for you to email me what exactly "::A.B.C.D" maps to WRT your system or infrastructure?

And are you able to share your "ifconfig -a" and "netstat -rn" output with me privately?

--Qing
Steven Hartland
2011-Oct-10 10:02 UTC
serious packet routing issue causing ntpd high load?
----- Original Message -----
From: "Li, Qing" <qing.li@bluecoat.com>

>> RTM_MISS: Lookup failed on this address: len 184, pid: 0, seq 0, errno 0, flags:<DONE>
>> locks: inits:
>> sockaddrs: <DST>
>> ::A.B.C.D
>>
>
> Would it be possible for you to email me what exactly does "::A.B.C.D"
> map into WRT your system or infrastructure ?

Sorry for the slow reply, I've been out of the country.

All the hosts are local machines on the same /24 connecting to the server for mysql. It seems that every packet either to or from the mysql server is generating an RTM_MISS.

> And are you able to share your "ifconfig -a" and "netstat -rn" output
> with me privately ?

On its way.

Regards
Steve
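
One way to check the observation that every mysql packet produces a miss is to capture some route -n monitor output to a file and count RTM_MISS messages per destination. The following is a minimal sketch, assuming the output layout matches what was pasted earlier in the thread (an "RTM_MISS:" line, then a "sockaddrs:" line, then the address on the next non-empty line). The function name tally_rtm_miss is hypothetical, and the sample text reuses the redacted ::A.B.C.D placeholder rather than real addresses.

```python
from collections import Counter

# Sample copied from the route -n monitor output earlier in the thread;
# "::A.B.C.D" is the poster's redacted placeholder, not a real address.
SAMPLE = """\
got message of size 184 on Tue Oct 4 23:46:36 2011
RTM_MISS: Lookup failed on this address: len 184, pid: 0, seq 0, errno 0, flags:<DONE>
locks: inits:
sockaddrs: <DST>
::A.B.C.D

got message of size 184 on Tue Oct 4 23:46:36 2011
RTM_MISS: Lookup failed on this address: len 184, pid: 0, seq 0, errno 0, flags:<DONE>
locks: inits:
sockaddrs: <DST>
::A.B.C.D
"""


def tally_rtm_miss(monitor_text):
    """Count RTM_MISS messages per destination address.

    Assumes the address appears on the first non-empty line after the
    "sockaddrs:" line of each RTM_MISS message.
    """
    counts = Counter()
    in_miss = False      # inside an RTM_MISS message
    expect_addr = False  # the next non-empty line is the destination
    for raw in monitor_text.splitlines():
        line = raw.strip()
        if line.startswith("got message of size"):
            in_miss = False
            expect_addr = False
        elif line.startswith("RTM_MISS:"):
            in_miss = True
        elif in_miss and line.startswith("sockaddrs:"):
            expect_addr = True
        elif in_miss and expect_addr and line:
            counts[line.split()[0]] += 1
            in_miss = False
            expect_addr = False
    return counts


misses = tally_rtm_miss(SAMPLE)
for dest, n in misses.most_common():
    print(dest, n)
```

If the destinations with the highest counts turn out to be the mysql clients on the local /24, that would support the per-packet explanation. Other message types in the capture are ignored by this sketch.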
Okay, I just reproduced the problem.

The strange thing is that the routing messages appear to be endless, but if I exit "route monitor" and restart it, the messages disappear. That indicates to me that it is not the kernel generating the routing messages continuously; maybe it is a socket buffer issue? Otherwise, when "route monitor" is resumed the messages should also resume, which is not the case here.

Of course, why the DNS message being generated in FIB <1> triggers an RTM_MISS in FIB <2> is an issue, and I am looking into it as well.

--Qing