Hi Slawa,

On 9/21/16 9:51 PM, Slawa Olhovchenkov wrote:
> On Wed, Sep 21, 2016 at 09:11:24AM +0200, Julien Charbon wrote:
>> You can also use DTrace and lockstat (especially with the lockstat -s
>> option):
>>
>> https://wiki.freebsd.org/DTrace/One-Liners#Kernel_Locks
>> https://www.freebsd.org/cgi/man.cgi?query=lockstat&manpath=FreeBSD+11.0-RELEASE
>>
>> But I am less familiar with the DTrace/lockstat tools.
>
> I am still using the old kernel and got the lockup again.
> I tried lockstat (I saved more output); the interesting part may be this:
>
> R/W writer spin on writer: 190019 events in 1.070 seconds (177571 events/sec)
>
> -------------------------------------------------------------------------------
> Count indv cuml rcnt     nsec Lock   Caller
> 140839  74%  74% 0.00   24659 tcpinp tcp_tw_2msl_scan+0xc6
>
>       nsec ------ Time Distribution ------ count     Stack
>       4096 |                               913       tcp_twstart+0xa3
>       8192 |@@@@@@@@@@@@                   58191     tcp_do_segment+0x201f
>      16384 |@@@@@@                         29594     tcp_input+0xe1c
>      32768 |@@@@                           23447     ip_input+0x15f
>      65536 |@@@                            16197
>     131072 |@                              8674
>     262144 |                               3358
>     524288 |                               456
>    1048576 |                               9
> -------------------------------------------------------------------------------
> Count indv cuml rcnt     nsec Lock   Caller
> 49180  26% 100% 0.00    15929 tcpinp tcp_tw_2msl_scan+0xc6
>
>       nsec ------ Time Distribution ------ count     Stack
>       4096 |                               157       pfslowtimo+0x54
>       8192 |@@@@@@@@@@@@@@@                24796     softclock_call_cc+0x179
>      16384 |@@@@@@                         11223     softclock+0x44
>      32768 |@@@@                           7426      intr_event_execute_handlers+0x95
>      65536 |@@                             3918
>     131072 |                               1363
>     262144 |                               278
>     524288 |                               19
> -------------------------------------------------------------------------------

 This is interesting: it seems that you have two call paths competing
for INP locks here:

 - pfslowtimo()/tcp_tw_2msl_scan(reuse=0) and

 - tcp_input()/tcp_twstart()/tcp_tw_2msl_scan(reuse=1)

 These paths can indeed compete for the same INP lock, as both
tcp_tw_2msl_scan() calls always start with the first inp found in the
twq_2msl list.  In both cases, though, this first inp should be used
quickly and its lock released, so this would only explain your situation
if the TCP stack is doing that all the time.  For example:

 - Let's say you are completely and constantly running out of tcptw;
then every connection transitioning to the TIME_WAIT state is competing
with the TIME_WAIT timeout scan that tries to free all the expired
tcptw.  If the stack is doing that all the time, it can look like a
"live" lock.

 This is just a hypothesis and, as usual, it might be a red herring.
Anyway, could you run:

$ vmstat -z | head -2; vmstat -z | grep -E 'tcp|sock'

 Ideally, run it once when everything is fine and once when you have the
issue, to see the differences (if any).

 If it turns out you are quite low on tcptw and you have enough memory,
could you try increasing the tcptw limit with the sysctl
net.inet.tcp.maxtcptw, and see whether (or not) it improves your
performance?

 My 2 cents.

--
Julien
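 To make the contention point above concrete, here is a simplified sketch
of the scan pattern both call paths go through.  It is paraphrased from the
description in this thread, not the actual FreeBSD source: tw_expired() is
a stand-in for the real expiry check, and the exact reuse handling is an
assumption.

    /*
     * Both pfslowtimo() (reuse == 0) and tcp_twstart() (reuse == 1) walk
     * the TIME_WAIT queue from its head, so two contexts running this at
     * the same time always try to write-lock the same inpcb first -- the
     * "tcpinp" contention seen in the lockstat output above.
     */
    static void
    tw_scan_sketch(int reuse)
    {
            struct tcptw *tw;
            struct inpcb *inp;

            for (;;) {
                    tw = TAILQ_FIRST(&V_twq_2msl);   /* always the head entry */
                    if (tw == NULL)
                            break;
                    if (reuse == 0 && !tw_expired(tw))
                            break;                   /* timer scan stops at the first
                                                        non-expired entry */
                    inp = tw->tw_inpcb;
                    INP_WLOCK(inp);                  /* <- the contended write lock */
                    tcp_twclose(tw, reuse);          /* frees the tcptw, drops the lock */
                    if (reuse)
                            break;                   /* reclaim path only needs one
                                                        entry (assumption) */
            }
    }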
On Wed, Sep 21, 2016 at 11:25:18PM +0200, Julien Charbon wrote:
> This is interesting: it seems that you have two call paths competing
> for INP locks here:
>
> - pfslowtimo()/tcp_tw_2msl_scan(reuse=0) and
>
> - tcp_input()/tcp_twstart()/tcp_tw_2msl_scan(reuse=1)

I think the same.

> These paths can indeed compete for the same INP lock, as both
> tcp_tw_2msl_scan() calls always start with the first inp found in the
> twq_2msl list.  [...]
>
> - Let's say you are completely and constantly running out of tcptw;
> then every connection transitioning to the TIME_WAIT state is competing
> with the TIME_WAIT timeout scan that tries to free all the expired
> tcptw.  If the stack is doing that all the time, it can look like a
> "live" lock.
>
> This is just a hypothesis and, as usual, it might be a red herring.
> Anyway, could you run:
>
> $ vmstat -z | head -2; vmstat -z | grep -E 'tcp|sock'

ITEM                   SIZE  LIMIT     USED     FREE      REQ FAIL SLEEP

socket:                 864, 4192664,   18604,   25348, 49276158,   0,   0
tcp_inpcb:              464, 4192664,   34226,   18702, 49250593,   0,   0
tcpcb:                 1040, 4192665,   18424,   18953, 49250593,   0,   0
tcptw:                   88,   16425,   15802,     623, 14526919,   8,   0
tcpreass:                40,   32800,      15,    2285,   632381,   0,   0

In the normal case tcptw is about 16425/600/900.

And after `sysctl -a | grep tcp` the system got stuck on the serial
console and I had to reset it.

> Ideally, run it once when everything is fine and once when you have the
> issue, to see the differences (if any).
>
> If it turns out you are quite low on tcptw and you have enough memory,
> could you try increasing the tcptw limit with the sysctl
> net.inet.tcp.maxtcptw,

I think this will not eliminate the stuck state, it may just make it
less frequent.

> and see whether (or not) it improves your performance?

I have already played with net.inet.tcp.maxtcptw and it does not affect
performance.
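The tcptw numbers above (USED + FREE equal to the LIMIT, with a non-zero
FAIL count) fit the exhaustion hypothesis: when the tcptw zone allocation
fails, tcp_twstart() falls back to reclaiming the oldest TIME_WAIT entry,
which is the tcp_tw_2msl_scan(reuse=1) call seen in the first lockstat
stack.  A rough sketch of that fallback follows; it is simplified, the
exact reclaim interface and the retry are assumptions, and zone/variable
names are from memory rather than quoted from the source.

    /* Inside tcp_twstart(), roughly: */
    tw = uma_zalloc(V_tcptw_zone, M_NOWAIT);
    if (tw == NULL) {
            /*
             * tcptw zone exhausted (the LIMIT/FAIL numbers above): fall
             * back to reclaiming the oldest TIME_WAIT entry.  Under
             * constant exhaustion every new TIME_WAIT transition takes
             * this path and contends with the timer-driven scan for the
             * head-of-list INP lock.
             */
            tcp_tw_2msl_scan(1);
            tw = uma_zalloc(V_tcptw_zone, M_NOWAIT);  /* retry (assumption) */
    }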
On Wed, Sep 21, 2016 at 11:25:18PM +0200, Julien Charbon wrote:
> This is interesting: it seems that you have two call paths competing
> for INP locks here:
>
> - pfslowtimo()/tcp_tw_2msl_scan(reuse=0) and
>
> - tcp_input()/tcp_twstart()/tcp_tw_2msl_scan(reuse=1)

My current hypothesis: nginx does a write() (or maybe a close()?) on a
socket, the kernel locks the first inp in V_twq_2msl, the callout for
pfslowtimo() fires on the same CPU core, and tcp_tw_2msl_scan() spins
forever on that same inp.  In this case your modification can't help;
before the next attempt we need something like a yield().
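If that hypothesis is right, one way to express "something like a yield"
would be for the timer-driven scan to stop spinning on a busy head entry
and let the other context finish first.  The sketch below is a hypothetical
change for discussion only, not a tested patch: whether INP_TRY_WLOCK() and
kern_yield() are appropriate in the callout context here is an assumption
that would need checking.

    /* In the reuse == 0 scan, instead of blocking in INP_WLOCK(inp): */
    if (INP_TRY_WLOCK(inp) == 0) {
            /*
             * The head inp is currently being handled by tcp_twstart()
             * on another (or the same) CPU.  Back off instead of
             * spinning, so the softclock thread cannot live-lock
             * against the input path.
             */
            kern_yield(PRI_USER);
            return;         /* retry on the next pfslowtimo() tick */
    }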