On Thu, Sep 22, 2016 at 01:20:45PM +0300, Slawa Olhovchenkov wrote:> On Thu, Sep 22, 2016 at 12:04:40PM +0200, Julien Charbon wrote: > > > >> These paths can indeed compete for the same INP lock, as both > > >> tcp_tw_2msl_scan() calls always start with the first inp found in > > >> twq_2msl list. But in both cases, this first inp should be quickly used > > >> and its lock released anyway, thus that could explain your situation it > > >> that the TCP stack is doing that all the time, for example: > > >> > > >> - Let say that you are running out completely and constantly of tcptw, > > >> and then all connections transitioning to TIME_WAIT state are competing > > >> with the TIME_WAIT timeout scan that tries to free all the expired > > >> tcptw. If the stack is doing that all the time, it can appear like > > >> "live" locked. > > >> > > >> This is just an hypothesis and as usual might be a red herring. > > >> Anyway, could you run: > > >> > > >> $ vmstat -z | head -2; vmstat -z | grep -E 'tcp|sock' > > > > > > ITEM SIZE LIMIT USED FREE REQ FAIL SLEEP > > > > > > socket: 864, 4192664, 18604, 25348,49276158, 0, 0 > > > tcp_inpcb: 464, 4192664, 34226, 18702,49250593, 0, 0 > > > tcpcb: 1040, 4192665, 18424, 18953,49250593, 0, 0 > > > tcptw: 88, 16425, 15802, 623,14526919, 8, 0 > > > tcpreass: 40, 32800, 15, 2285, 632381, 0, 0 > > > > > > In normal case tcptw is about 16425/600/900 > > > > > > And after `sysctl -a | grep tcp` system stuck on serial console and I am reset it. > > > > > >> Ideally, once when everything is ok, and once when you have the issue > > >> to see the differences (if any). > > >> > > >> If it appears your are quite low in tcptw, and if you have enough > > >> memory, could you try increase the tcptw limit using sysctl > > > > > > I think this is not eliminate stuck, just may do it less frequency > > > > You are right, it would just be a big hint that the tcp_tw_2msl_scan() > > contention hypothesis is the right one. As I see you have plenty of > > memory on your server, thus could you try with: > > > > net.inet.tcp.maxtcptw=4192665 > > > > And see what happen. Just to validate this hypothesis. > > This is bad way for validate, with maxtcptw=16384 happened is random > and can be waited for month. After maxtcptw=4192665 I am don't know > how long need to wait for verification this hypothesis. > > More frequency (may be 3-5 times per day) happening less traffic drops > (not to zero for minutes). May be this caused also by contention in > tcp_tw_2msl_scan, but fast resolved (stochastic process). By eating > CPU power nginx can't service connection and clients closed > connections and need more TIME_WAIT and can trigered > tcp_tw_2msl_scan(reuse=1). After this we can got live lock. > > May be after I learning to catch and dignostic this validation is more > accurately.Some more bits: socket: 864, 4192664, 30806, 790,28524160, 0, 0 ipq: 56, 32802, 0, 1278, 1022, 0, 0 udp_inpcb: 464, 4192664, 44, 364, 14066, 0, 0 udpcb: 32, 4192750, 44, 3081, 14066, 0, 0 tcp_inpcb: 464, 4192664, 38558, 378,28476709, 0, 0 tcpcb: 1040, 4192665, 30690, 738,28476709, 0, 0 tcptw: 88, 32805, 7868, 772, 8412249, 0, 0 last pid: 49575; load averages: 2.00, 2.05, 3.75 up 1+01:12:08 22:13:42 853 processes: 15 running, 769 sleeping, 35 waiting, 34 lock CPU 0: 0.0% user, 0.0% nice, 0.0% system, 100% interrupt, 0.0% idle CPU 1: 0.0% user, 0.0% nice, 0.0% system, 0.0% interrupt, 100% idle CPU 2: 0.0% user, 0.0% nice, 0.0% system, 0.0% interrupt, 100% idle CPU 3: 0.0% user, 0.0% nice, 0.0% system, 0.0% interrupt, 100% idle CPU 4: 0.0% user, 0.0% nice, 0.0% system, 0.0% interrupt, 100% idle CPU 5: 0.0% user, 0.0% nice, 0.0% system, 0.0% interrupt, 100% idle CPU 6: 0.0% user, 0.0% nice, 0.4% system, 0.0% interrupt, 99.6% idle CPU 7: 0.0% user, 0.0% nice, 0.0% system, 0.0% interrupt, 100% idle CPU 8: 0.0% user, 0.0% nice, 0.0% system, 0.0% interrupt, 100% idle CPU 9: 0.0% user, 0.0% nice, 0.0% system, 0.0% interrupt, 100% idle CPU 10: 0.0% user, 0.0% nice, 0.4% system, 0.0% interrupt, 99.6% idle CPU 11: 0.0% user, 0.0% nice, 0.0% system, 0.0% interrupt, 100% idle Mem: 8659M Active, 8385M Inact, 107G Wired, 1325M Free ARC: 99G Total, 88G MFU, 10G MRU, 32K Anon, 167M Header, 529M Other Swap: 32G Total, 32G Free PID UID PRI NICE SIZE RES STATE C TIME WCPU COMMAND 49566 0 20 0 26264K 7068K CPU10 10 0:00 0.14% top 5 0 -8 - 0K 192K l2arc_ 6 31:07 0.13% zfskern{l2arc_fe 46120 0 20 0 14608K 3868K nanslp 8 0:10 0.10% tcpw 12 0 -76 - 0K 848K WAIT 1 0:00 0.02% intr{swi0: uart total = used + free Max used total used total used total used total 2016-09-23 21:46:41 tcptw 32805 389 1665 0 tcpcb 27788 28557 inpcb 28177 29296 socket 27875 28740 2016-09-23 21:46:42 tcptw 32805 409 1665 0 tcpcb 27723 28557 inpcb 28132 29296 socket 27812 28740 2016-09-23 21:46:43 tcptw 32805 405 1665 0 tcpcb 27750 28557 inpcb 28155 29296 socket 27838 28740 2016-09-23 21:46:44 tcptw 32805 409 1665 0 tcpcb 27765 28557 inpcb 28174 29296 socket 27852 28740 2016-09-23 21:46:45 tcptw 32805 422 1665 0 tcpcb 27737 28557 inpcb 28159 29296 socket 27824 28740 2016-09-23 21:46:46 tcptw 32805 444 1665 0 tcpcb 27729 28557 inpcb 28173 29296 socket 27817 28740 2016-09-23 21:46:47 tcptw 32805 444 1665 0 tcpcb 27712 28557 inpcb 28156 29296 socket 27800 28740 2016-09-23 21:46:48 tcptw 32805 428 1665 0 tcpcb 27693 28557 inpcb 28121 29296 socket 27783 28740 2016-09-23 21:46:49 tcptw 32805 431 1665 0 tcpcb 27677 28557 inpcb 28108 29296 socket 27766 28740 2016-09-23 21:46:50 tcptw 32805 455 1665 0 tcpcb 27721 28557 inpcb 28176 29296 socket 27809 28740 2016-09-23 21:46:51 tcptw 32805 432 1665 0 tcpcb 27739 28557 inpcb 28171 29296 socket 27828 28740 2016-09-23 21:46:52 tcptw 32805 434 1665 0 tcpcb 27731 28557 inpcb 28165 29296 socket 27819 28740 2016-09-23 21:46:53 tcptw 32805 431 1665 0 tcpcb 27742 28557 inpcb 28173 29296 socket 27831 28740 2016-09-23 21:46:54 tcptw 32805 424 1665 0 tcpcb 27738 28557 inpcb 28162 29296 socket 27826 28740 2016-09-23 21:46:55 tcptw 32805 397 1665 0 tcpcb 27740 28557 inpcb 28137 29296 socket 27827 28740 2016-09-23 21:46:56 tcptw 32805 412 1665 0 tcpcb 27742 28557 inpcb 28154 29296 socket 27830 28740 2016-09-23 21:46:57 tcptw 32805 418 1665 0 tcpcb 27749 28557 inpcb 28167 29296 socket 27838 28740 2016-09-23 21:46:58 tcptw 32805 426 1665 0 tcpcb 27740 28557 inpcb 28166 29296 socket 27827 28740 2016-09-23 21:46:59 tcptw 32805 423 1665 0 tcpcb 27687 28557 inpcb 28110 29296 socket 27773 28740 2016-09-23 21:47:00 tcptw 32805 426 1665 0 tcpcb 27716 28557 inpcb 28142 29296 socket 27804 28740 2016-09-23 21:47:01 tcptw 32805 437 1665 0 tcpcb 27732 28557 inpcb 28169 29296 socket 27821 28740 2016-09-23 21:47:02 tcptw 32805 471 1665 0 tcpcb 27672 28557 inpcb 28143 29296 socket 27760 28740 2016-09-23 21:47:03 tcptw 32805 426 1665 0 tcpcb 27752 28557 inpcb 28178 29296 socket 27838 28740 2016-09-23 21:47:04 tcptw 32805 402 1665 0 tcpcb 27760 28557 inpcb 28162 29296 socket 27847 28740 2016-09-23 21:47:05 tcptw 32805 406 1665 0 tcpcb 27757 28557 inpcb 28163 29296 socket 27845 28740 2016-09-23 21:47:06 tcptw 32805 443 1665 0 tcpcb 27783 28557 inpcb 28226 29296 socket 27871 28740 2016-09-23 21:47:07 tcptw 32805 484 1665 0 tcpcb 27707 28557 inpcb 28191 29296 socket 27794 28740 2016-09-23 21:47:08 tcptw 32805 473 1665 0 tcpcb 27721 28557 inpcb 28194 29296 socket 27807 28740 2016-09-23 21:47:09 tcptw 32805 432 1665 0 tcpcb 27749 28557 inpcb 28181 29296 socket 27837 28740 2016-09-23 21:47:10 tcptw 32805 421 1665 0 tcpcb 27780 28557 inpcb 28201 29296 socket 27868 28740 2016-09-23 21:47:11 tcptw 32805 530 1665 0 tcpcb 27814 28557 inpcb 28344 29296 socket 27902 28740 2016-09-23 21:47:12 tcptw 32805 680 1665 0 tcpcb 27874 28557 inpcb 28554 29296 socket 27964 28740 2016-09-23 21:47:13 tcptw 32805 832 1665 0 tcpcb 27881 28557 inpcb 28713 29296 socket 27971 28740 2016-09-23 21:47:14 tcptw 32805 997 1665 0 tcpcb 27880 28557 inpcb 28877 29352 socket 27972 28740 2016-09-23 21:47:15 tcptw 32805 1155 1890 0 tcpcb 27931 28557 inpcb 29086 29560 socket 28023 28740 2016-09-23 21:47:16 tcptw 32805 1322 2250 0 tcpcb 27981 28557 inpcb 29303 29800 socket 28075 28740 2016-09-23 21:47:17 tcptw 32805 1496 2385 0 tcpcb 28065 28557 inpcb 29561 30040 socket 28159 28740 2016-09-23 21:47:18 tcptw 32805 1648 2385 0 tcpcb 28151 28557 inpcb 29799 30280 socket 28245 28740 2016-09-23 21:47:19 tcptw 32805 1790 2655 0 tcpcb 28398 28599 inpcb 30188 30672 socket 28492 28796 2016-09-23 21:47:20 tcptw 32805 1954 2655 0 tcpcb 28712 28923 inpcb 30666 31120 socket 28807 29116 2016-09-23 21:47:21 tcptw 32805 2115 3015 0 tcpcb 29061 29244 inpcb 31176 31576 socket 29156 29468 2016-09-23 21:47:22 tcptw 32805 2265 3150 0 tcpcb 29335 29538 inpcb 31600 32056 socket 29430 29704 2016-09-23 21:47:23 tcptw 32805 2424 3150 0 tcpcb 29553 29775 inpcb 31977 32440 socket 29648 29956 2016-09-23 21:47:24 tcptw 32805 2590 3375 0 tcpcb 29711 29901 inpcb 32301 32744 socket 29807 30112 2016-09-23 21:47:25 tcptw 32805 2760 3780 0 tcpcb 29794 30015 inpcb 32554 33040 socket 29891 30224 2016-09-23 21:47:26 tcptw 32805 2935 3915 0 tcpcb 29879 30111 inpcb 32814 33312 socket 29976 30292 2016-09-23 21:47:27 tcptw 32805 3109 3915 0 tcpcb 29953 30195 inpcb 33062 33584 socket 30050 30392 2016-09-23 21:47:28 tcptw 32805 3264 4140 0 tcpcb 30060 30267 inpcb 33324 33824 socket 30158 30476 2016-09-23 21:47:29 tcptw 32805 3435 4275 0 tcpcb 30137 30363 inpcb 33572 34032 socket 30235 30572 2016-09-23 21:47:30 tcptw 32805 3600 4500 0 tcpcb 30221 30489 inpcb 33821 34304 socket 30320 30644 2016-09-23 21:47:31 tcptw 32805 3775 4635 0 tcpcb 30309 30588 inpcb 34084 34576 socket 30408 30740 2016-09-23 21:47:32 tcptw 32805 3936 4770 0 tcpcb 30534 30741 inpcb 34470 34960 socket 30634 30908 2016-09-23 21:47:33 tcptw 32805 4097 4905 0 tcpcb 30744 30951 inpcb 34841 35352 socket 30844 31160 2016-09-23 21:47:34 tcptw 32805 4233 5040 0 tcpcb 31006 31176 inpcb 35239 35680 socket 31106 31372 2016-09-23 21:47:35 tcptw 32805 4366 5265 0 tcpcb 31160 31386 inpcb 35526 35920 socket 31260 31568 2016-09-23 21:47:36 tcptw 32805 4738 5535 0 tcpcb 29529 31428 inpcb 34267 36016 socket 29629 31596 2016-09-23 21:47:37 tcptw 32805 4879 5625 0 tcpcb 29506 31428 inpcb 34385 36016 socket 29607 31596 2016-09-23 21:47:38 tcptw 32805 5011 5895 0 tcpcb 29590 31428 inpcb 34601 36016 socket 29691 31596 2016-09-23 21:47:39 tcptw 32805 5130 5895 0 tcpcb 29713 31428 inpcb 34843 36016 socket 29815 31596 2016-09-23 21:47:40 tcptw 32805 5259 6165 0 tcpcb 29783 31428 inpcb 35042 36016 socket 29886 31596 2016-09-23 21:47:41 tcptw 32805 5378 6255 0 tcpcb 29606 31428 inpcb 34984 36016 socket 29709 31596 2016-09-23 21:47:42 tcptw 32805 5489 6255 0 tcpcb 29638 31428 inpcb 35127 36016 socket 29741 31596 2016-09-23 21:47:43 tcptw 32805 5629 6390 0 tcpcb 29630 31428 inpcb 35259 36016 socket 29735 31596 2016-09-23 21:47:44 tcptw 32805 5754 6660 0 tcpcb 29593 31428 inpcb 35347 36016 socket 29696 31596 2016-09-23 21:47:45 tcptw 32805 5887 6660 0 tcpcb 29606 31428 inpcb 35493 36016 socket 29709 31596 2016-09-23 21:47:46 tcptw 32805 6011 6750 0 tcpcb 29613 31428 inpcb 35624 36016 socket 29716 31596 2016-09-23 21:47:47 tcptw 32805 6128 7020 0 tcpcb 29642 31428 inpcb 35770 36128 socket 29745 31596 2016-09-23 21:47:48 tcptw 32805 6250 7020 0 tcpcb 29742 31428 inpcb 35992 36416 socket 29845 31596 2016-09-23 21:47:49 tcptw 32805 6378 7155 0 tcpcb 29745 31428 inpcb 36123 36472 socket 29850 31596 2016-09-23 21:47:50 tcptw 32805 6486 7290 0 tcpcb 29756 31428 inpcb 36242 36648 socket 29861 31596 2016-09-23 21:47:51 tcptw 32805 6603 7515 0 tcpcb 29807 31428 inpcb 36410 36792 socket 29912 31596 2016-09-23 21:47:52 tcptw 32805 6736 7515 0 tcpcb 29830 31428 inpcb 36566 36912 socket 29935 31596 2016-09-23 21:47:53 tcptw 32805 6852 7785 0 tcpcb 29892 31428 inpcb 36744 37112 socket 29996 31596 2016-09-23 21:47:54 tcptw 32805 6991 7785 0 tcpcb 29876 31428 inpcb 36867 37288 socket 29981 31596 2016-09-23 21:47:55 tcptw 32805 7102 8010 0 tcpcb 29928 31428 inpcb 37030 37400 socket 30033 31596 2016-09-23 21:47:56 tcptw 32805 7227 8010 0 tcpcb 29960 31428 inpcb 37187 37544 socket 30065 31596 2016-09-23 21:47:57 tcptw 32805 7356 8280 0 tcpcb 30031 31428 inpcb 37387 37752 socket 30136 31596 2016-09-23 21:47:58 tcptw 32805 7505 8415 0 tcpcb 30164 31428 inpcb 37669 38040 socket 30270 31596 2016-09-23 21:47:59 tcptw 32805 7618 8415 0 tcpcb 30302 31428 inpcb 37920 38328 socket 30408 31596 2016-09-23 21:48:00 tcptw 32805 7740 8505 0 tcpcb 30478 31428 inpcb 38218 38560 socket 30584 31596 2016-09-23 21:48:01 tcptw 32805 7858 8640 0 tcpcb 30663 31428 inpcb 38521 38880 socket 30769 31596
On Fri, Sep 23, 2016 at 10:16:56PM +0300, Slawa Olhovchenkov wrote:> On Thu, Sep 22, 2016 at 01:20:45PM +0300, Slawa Olhovchenkov wrote: > > > On Thu, Sep 22, 2016 at 12:04:40PM +0200, Julien Charbon wrote: > > > > > >> These paths can indeed compete for the same INP lock, as both > > > >> tcp_tw_2msl_scan() calls always start with the first inp found in > > > >> twq_2msl list. But in both cases, this first inp should be quickly used > > > >> and its lock released anyway, thus that could explain your situation it > > > >> that the TCP stack is doing that all the time, for example: > > > >> > > > >> - Let say that you are running out completely and constantly of tcptw, > > > >> and then all connections transitioning to TIME_WAIT state are competing > > > >> with the TIME_WAIT timeout scan that tries to free all the expired > > > >> tcptw. If the stack is doing that all the time, it can appear like > > > >> "live" locked. > > > >> > > > >> This is just an hypothesis and as usual might be a red herring. > > > >> Anyway, could you run: > > > >> > > > >> $ vmstat -z | head -2; vmstat -z | grep -E 'tcp|sock' > > > > > > > > ITEM SIZE LIMIT USED FREE REQ FAIL SLEEP > > > > > > > > socket: 864, 4192664, 18604, 25348,49276158, 0, 0 > > > > tcp_inpcb: 464, 4192664, 34226, 18702,49250593, 0, 0 > > > > tcpcb: 1040, 4192665, 18424, 18953,49250593, 0, 0 > > > > tcptw: 88, 16425, 15802, 623,14526919, 8, 0 > > > > tcpreass: 40, 32800, 15, 2285, 632381, 0, 0 > > > > > > > > In normal case tcptw is about 16425/600/900 > > > > > > > > And after `sysctl -a | grep tcp` system stuck on serial console and I am reset it. > > > > > > > >> Ideally, once when everything is ok, and once when you have the issue > > > >> to see the differences (if any). > > > >> > > > >> If it appears your are quite low in tcptw, and if you have enough > > > >> memory, could you try increase the tcptw limit using sysctl > > > > > > > > I think this is not eliminate stuck, just may do it less frequency > > > > > > You are right, it would just be a big hint that the tcp_tw_2msl_scan() > > > contention hypothesis is the right one. As I see you have plenty of > > > memory on your server, thus could you try with: > > > > > > net.inet.tcp.maxtcptw=4192665 > > > > > > And see what happen. Just to validate this hypothesis. > > > > This is bad way for validate, with maxtcptw=16384 happened is random > > and can be waited for month. After maxtcptw=4192665 I am don't know > > how long need to wait for verification this hypothesis. > > > > More frequency (may be 3-5 times per day) happening less traffic drops > > (not to zero for minutes). May be this caused also by contention in > > tcp_tw_2msl_scan, but fast resolved (stochastic process). By eating > > CPU power nginx can't service connection and clients closed > > connections and need more TIME_WAIT and can trigered > > tcp_tw_2msl_scan(reuse=1). After this we can got live lock. > > > > May be after I learning to catch and dignostic this validation is more > > accurately. > > Some more bits: > > socket: 864, 4192664, 30806, 790,28524160, 0, 0 > ipq: 56, 32802, 0, 1278, 1022, 0, 0 > udp_inpcb: 464, 4192664, 44, 364, 14066, 0, 0 > udpcb: 32, 4192750, 44, 3081, 14066, 0, 0 > tcp_inpcb: 464, 4192664, 38558, 378,28476709, 0, 0 > tcpcb: 1040, 4192665, 30690, 738,28476709, 0, 0 > tcptw: 88, 32805, 7868, 772, 8412249, 0, 0 > > last pid: 49575; load averages: 2.00, 2.05, 3.75 up 1+01:12:08 22:13:42 > 853 processes: 15 running, 769 sleeping, 35 waiting, 34 lock > CPU 0: 0.0% user, 0.0% nice, 0.0% system, 100% interrupt, 0.0% idle > CPU 1: 0.0% user, 0.0% nice, 0.0% system, 0.0% interrupt, 100% idle > CPU 2: 0.0% user, 0.0% nice, 0.0% system, 0.0% interrupt, 100% idle > CPU 3: 0.0% user, 0.0% nice, 0.0% system, 0.0% interrupt, 100% idle > CPU 4: 0.0% user, 0.0% nice, 0.0% system, 0.0% interrupt, 100% idle > CPU 5: 0.0% user, 0.0% nice, 0.0% system, 0.0% interrupt, 100% idle > CPU 6: 0.0% user, 0.0% nice, 0.4% system, 0.0% interrupt, 99.6% idle > CPU 7: 0.0% user, 0.0% nice, 0.0% system, 0.0% interrupt, 100% idle > CPU 8: 0.0% user, 0.0% nice, 0.0% system, 0.0% interrupt, 100% idle > CPU 9: 0.0% user, 0.0% nice, 0.0% system, 0.0% interrupt, 100% idle > CPU 10: 0.0% user, 0.0% nice, 0.4% system, 0.0% interrupt, 99.6% idle > CPU 11: 0.0% user, 0.0% nice, 0.0% system, 0.0% interrupt, 100% idle > Mem: 8659M Active, 8385M Inact, 107G Wired, 1325M Free > ARC: 99G Total, 88G MFU, 10G MRU, 32K Anon, 167M Header, 529M Other > Swap: 32G Total, 32G FreeI can collect some more info. First: 1046 - - R 125:10.79 nginx: worker process (nginx) 1047 kqread - I 150:11.98 nginx: worker process (nginx) 1049 kqread - I 145:58.35 nginx: worker process (nginx) 1050 kqread - I 136:33.36 nginx: worker process (nginx) 1051 kqread - I 140:59.73 nginx: worker process (nginx) 1052 kqread - I 137:18.12 nginx: worker process (nginx) pid 1046 is nginx running on CPU0 (affinity mask set). # procstat -k -k 1046 PID TID COMM TDNAME KSTACK 1046 100686 nginx - mi_switch+0xd2 critical_exit+0x7e lapic_handle_timer+0xb1 Xtimerint+0x8c __mtx_lock_sleep+0x168 zone_fetch_slab+0x47 zone_import+0x52 zone_alloc_item+0x36 keg_alloc_slab+0x63 keg_fetch_slab+0x16e zone_fetch_slab+0x6e zone_import+0x52 uma_zalloc_arg+0x36e m_getm2+0x14f m_uiotombuf+0x64 sosend_generic+0x356 soo_write+0x42 dofilewrite+0x87 Tracing command nginx pid 1046 tid 100686 td 0xfffff8014485f500 sched_switch() at 0xffffffff804c956d = sched_switch+0x6ad/frame 0xfffffe20216992a0 /usr/src/sys/kern/sched_ule.c:1973 mi_switch() at 0xffffffff804a8d92 = mi_switch+0xd2/frame 0xfffffe20216992d0 /usr/src/sys/kern/kern_synch.c:465 critical_exit() at 0xffffffff804a6bee = critical_exit+0x7e/frame 0xfffffe20216992f0 /usr/src/sys/kern/kern_switch.c:219 lapic_handle_timer() at 0xffffffff80771701 = lapic_handle_timer+0xb1/frame 0xfffffe2021699330 /usr/src/sys/x86/x86/local_apic.c:1185 Xtimerint() at 0xffffffff806cbbcc = Xtimerint+0x8c/frame 0xfffffe2021699330 /usr/src/sys/amd64/amd64/apic_vector.S:135 --- interrupt, rip = 0xffffffff804de424, rsp = 0xfffffe2021699400, rbp = 0xfffffe2021699420 --- lock_delay() at 0xffffffff804de424 = lock_delay+0x54/frame 0xfffffe2021699420 /usr/src/sys/kern/subr_lock.c:127 __mtx_lock_sleep() at 0xffffffff80484dc8 = __mtx_lock_sleep+0x168/frame 0xfffffe20216994a0 /usr/src/sys/kern/kern_mutex.c:512 zone_fetch_slab() at 0xffffffff806a4257 = zone_fetch_slab+0x47/frame 0xfffffe20216994e0 /usr/src/sys/vm/uma_core.c:2378 zone_import() at 0xffffffff806a4312 = zone_import+0x52/frame 0xfffffe2021699530 /usr/src/sys/vm/uma_core.c:2501 zone_alloc_item() at 0xffffffff806a0986 = zone_alloc_item+0x36/frame 0xfffffe2021699570 /usr/src/sys/vm/uma_core.c:2591 keg_alloc_slab() at 0xffffffff806a2463 = keg_alloc_slab+0x63/frame 0xfffffe20216995d0 /usr/src/sys/vm/uma_core.c:965 keg_fetch_slab() at 0xffffffff806a48ce = keg_fetch_slab+0x16e/frame 0xfffffe2021699620 /usr/src/sys/vm/uma_core.c:2349 zone_fetch_slab() at 0xffffffff806a427e = zone_fetch_slab+0x6e/frame 0xfffffe2021699660 /usr/src/sys/vm/uma_core.c:2375 zone_import() at 0xffffffff806a4312 = zone_import+0x52/frame 0xfffffe20216996b0 /usr/src/sys/vm/uma_core.c:2501 uma_zalloc_arg() at 0xffffffff806a147e = uma_zalloc_arg+0x36e/frame 0xfffffe2021699720 /usr/src/sys/vm/uma_core.c:2531 m_getm2() at 0xffffffff8048231f = m_getm2+0x14f/frame 0xfffffe2021699790 /usr/src/sys/kern/kern_mbuf.c:830 m_uiotombuf() at 0xffffffff80516044 = m_uiotombuf+0x64/frame 0xfffffe20216997e0 /usr/src/sys/kern/uipc_mbuf.c:1535 sosend_generic() at 0xffffffff8051ce56 = sosend_generic+0x356/frame 0xfffffe20216998a0 soo_write() at 0xffffffff804fd872 = soo_write+0x42/frame 0xfffffe20216998d0 dofilewrite() at 0xffffffff804f5c97 = dofilewrite+0x87/frame 0xfffffe2021699920 kern_writev() at 0xffffffff804f5978 = kern_writev+0x68/frame 0xfffffe2021699970 sys_writev() at 0xffffffff804f5be6 = sys_writev+0x36/frame 0xfffffe20216999a0 amd64_syscall() at 0xffffffff806e4051 = amd64_syscall+0x2c1/frame 0xfffffe2021699ab0 Xfast_syscall() at 0xffffffff806cb2bb = Xfast_syscall+0xfb/frame 0xfffffe2021699ab0 --- syscall (121, FreeBSD ELF64, sys_writev), rip = 0x8019cc6ba, rsp = 0x7fffffffd688, rbp = 0x7fffffffd6c0 --- And I am now collect and save kernel dump. # vmstat -M /var/crash/vmcore.1 -z| grep -i mbuf mbuf_packet: 256, 52276395, 26987, 189713,705271653, 0, 0 mbuf: 256, 52276395, 61994, 782439,46250344524, 0, 0 mbuf_cluster: 2048, 1048576, 49207, 34461, 293307, 0, 0 mbuf_jumbo_page: 4096, 4084093, 61839, 613537,8265099417, 0, 0 mbuf_jumbo_9k: 9216, 3630303, 0, 0, 0, 0, 0 mbuf_jumbo_16k: 16384, 2722728, 0, 0, 0, 0, 0
Hi Slawa, On 9/23/16 9:16 PM, Slawa Olhovchenkov wrote:> On Thu, Sep 22, 2016 at 01:20:45PM +0300, Slawa Olhovchenkov wrote: > >> On Thu, Sep 22, 2016 at 12:04:40PM +0200, Julien Charbon wrote: >> >>>>> These paths can indeed compete for the same INP lock, as both >>>>> tcp_tw_2msl_scan() calls always start with the first inp found in >>>>> twq_2msl list. But in both cases, this first inp should be quickly used >>>>> and its lock released anyway, thus that could explain your situation it >>>>> that the TCP stack is doing that all the time, for example: >>>>> >>>>> - Let say that you are running out completely and constantly of tcptw, >>>>> and then all connections transitioning to TIME_WAIT state are competing >>>>> with the TIME_WAIT timeout scan that tries to free all the expired >>>>> tcptw. If the stack is doing that all the time, it can appear like >>>>> "live" locked. >>>>> >>>>> This is just an hypothesis and as usual might be a red herring. >>>>> Anyway, could you run: >>>>> >>>>> $ vmstat -z | head -2; vmstat -z | grep -E 'tcp|sock' >>>> >>>> ITEM SIZE LIMIT USED FREE REQ FAIL SLEEP >>>> >>>> socket: 864, 4192664, 18604, 25348,49276158, 0, 0 >>>> tcp_inpcb: 464, 4192664, 34226, 18702,49250593, 0, 0 >>>> tcpcb: 1040, 4192665, 18424, 18953,49250593, 0, 0 >>>> tcptw: 88, 16425, 15802, 623,14526919, 8, 0 >>>> tcpreass: 40, 32800, 15, 2285, 632381, 0, 0 >>>> >>>> In normal case tcptw is about 16425/600/900 >>>> >>>> And after `sysctl -a | grep tcp` system stuck on serial console and I am reset it. >>>> >>>>> Ideally, once when everything is ok, and once when you have the issue >>>>> to see the differences (if any). >>>>> >>>>> If it appears your are quite low in tcptw, and if you have enough >>>>> memory, could you try increase the tcptw limit using sysctl >>>> >>>> I think this is not eliminate stuck, just may do it less frequency >>> >>> You are right, it would just be a big hint that the tcp_tw_2msl_scan() >>> contention hypothesis is the right one. As I see you have plenty of >>> memory on your server, thus could you try with: >>> >>> net.inet.tcp.maxtcptw=4192665 >>> >>> And see what happen. Just to validate this hypothesis. >> >> This is bad way for validate, with maxtcptw=16384 happened is random >> and can be waited for month. After maxtcptw=4192665 I am don't know >> how long need to wait for verification this hypothesis. >> >> More frequency (may be 3-5 times per day) happening less traffic drops >> (not to zero for minutes). May be this caused also by contention in >> tcp_tw_2msl_scan, but fast resolved (stochastic process). By eating >> CPU power nginx can't service connection and clients closed >> connections and need more TIME_WAIT and can trigered >> tcp_tw_2msl_scan(reuse=1). After this we can got live lock. >> >> May be after I learning to catch and dignostic this validation is more >> accurately. > > Some more bits: > > socket: 864, 4192664, 30806, 790,28524160, 0, 0 > ipq: 56, 32802, 0, 1278, 1022, 0, 0 > udp_inpcb: 464, 4192664, 44, 364, 14066, 0, 0 > udpcb: 32, 4192750, 44, 3081, 14066, 0, 0 > tcp_inpcb: 464, 4192664, 38558, 378,28476709, 0, 0 > tcpcb: 1040, 4192665, 30690, 738,28476709, 0, 0 > tcptw: 88, 32805, 7868, 772, 8412249, 0, 0 > > last pid: 49575; load averages: 2.00, 2.05, 3.75 up 1+01:12:08 22:13:42 > 853 processes: 15 running, 769 sleeping, 35 waiting, 34 lock > CPU 0: 0.0% user, 0.0% nice, 0.0% system, 100% interrupt, 0.0% idle > CPU 1: 0.0% user, 0.0% nice, 0.0% system, 0.0% interrupt, 100% idle > CPU 2: 0.0% user, 0.0% nice, 0.0% system, 0.0% interrupt, 100% idle > CPU 3: 0.0% user, 0.0% nice, 0.0% system, 0.0% interrupt, 100% idle > CPU 4: 0.0% user, 0.0% nice, 0.0% system, 0.0% interrupt, 100% idle > CPU 5: 0.0% user, 0.0% nice, 0.0% system, 0.0% interrupt, 100% idle > CPU 6: 0.0% user, 0.0% nice, 0.4% system, 0.0% interrupt, 99.6% idle > CPU 7: 0.0% user, 0.0% nice, 0.0% system, 0.0% interrupt, 100% idle > CPU 8: 0.0% user, 0.0% nice, 0.0% system, 0.0% interrupt, 100% idle > CPU 9: 0.0% user, 0.0% nice, 0.0% system, 0.0% interrupt, 100% idle > CPU 10: 0.0% user, 0.0% nice, 0.4% system, 0.0% interrupt, 99.6% idle > CPU 11: 0.0% user, 0.0% nice, 0.0% system, 0.0% interrupt, 100% idle > Mem: 8659M Active, 8385M Inact, 107G Wired, 1325M Free > ARC: 99G Total, 88G MFU, 10G MRU, 32K Anon, 167M Header, 529M Other > Swap: 32G Total, 32G Free > > PID UID PRI NICE SIZE RES STATE C TIME WCPU COMMAND > 49566 0 20 0 26264K 7068K CPU10 10 0:00 0.14% top > 5 0 -8 - 0K 192K l2arc_ 6 31:07 0.13% zfskern{l2arc_fe > 46120 0 20 0 14608K 3868K nanslp 8 0:10 0.10% tcpw > 12 0 -76 - 0K 848K WAIT 1 0:00 0.02% intr{swi0: uart > > > total = used + free > Max used total used total used total used total > 2016-09-23 21:46:41 tcptw 32805 389 1665 0 tcpcb 27788 28557 inpcb 28177 29296 socket 27875 28740 > 2016-09-23 21:46:42 tcptw 32805 409 1665 0 tcpcb 27723 28557 inpcb 28132 29296 socket 27812 28740 > 2016-09-23 21:46:43 tcptw 32805 405 1665 0 tcpcb 27750 28557 inpcb 28155 29296 socket 27838 28740 > 2016-09-23 21:46:44 tcptw 32805 409 1665 0 tcpcb 27765 28557 inpcb 28174 29296 socket 27852 28740 > 2016-09-23 21:46:45 tcptw 32805 422 1665 0 tcpcb 27737 28557 inpcb 28159 29296 socket 27824 28740 > 2016-09-23 21:46:46 tcptw 32805 444 1665 0 tcpcb 27729 28557 inpcb 28173 29296 socket 27817 28740 > 2016-09-23 21:46:47 tcptw 32805 444 1665 0 tcpcb 27712 28557 inpcb 28156 29296 socket 27800 28740 > 2016-09-23 21:46:48 tcptw 32805 428 1665 0 tcpcb 27693 28557 inpcb 28121 29296 socket 27783 28740 > 2016-09-23 21:46:49 tcptw 32805 431 1665 0 tcpcb 27677 28557 inpcb 28108 29296 socket 27766 28740 > 2016-09-23 21:46:50 tcptw 32805 455 1665 0 tcpcb 27721 28557 inpcb 28176 29296 socket 27809 28740 > 2016-09-23 21:46:51 tcptw 32805 432 1665 0 tcpcb 27739 28557 inpcb 28171 29296 socket 27828 28740 > 2016-09-23 21:46:52 tcptw 32805 434 1665 0 tcpcb 27731 28557 inpcb 28165 29296 socket 27819 28740 > 2016-09-23 21:46:53 tcptw 32805 431 1665 0 tcpcb 27742 28557 inpcb 28173 29296 socket 27831 28740 > 2016-09-23 21:46:54 tcptw 32805 424 1665 0 tcpcb 27738 28557 inpcb 28162 29296 socket 27826 28740 > 2016-09-23 21:46:55 tcptw 32805 397 1665 0 tcpcb 27740 28557 inpcb 28137 29296 socket 27827 28740 > 2016-09-23 21:46:56 tcptw 32805 412 1665 0 tcpcb 27742 28557 inpcb 28154 29296 socket 27830 28740 > 2016-09-23 21:46:57 tcptw 32805 418 1665 0 tcpcb 27749 28557 inpcb 28167 29296 socket 27838 28740 > 2016-09-23 21:46:58 tcptw 32805 426 1665 0 tcpcb 27740 28557 inpcb 28166 29296 socket 27827 28740 > 2016-09-23 21:46:59 tcptw 32805 423 1665 0 tcpcb 27687 28557 inpcb 28110 29296 socket 27773 28740 > 2016-09-23 21:47:00 tcptw 32805 426 1665 0 tcpcb 27716 28557 inpcb 28142 29296 socket 27804 28740 > 2016-09-23 21:47:01 tcptw 32805 437 1665 0 tcpcb 27732 28557 inpcb 28169 29296 socket 27821 28740 > 2016-09-23 21:47:02 tcptw 32805 471 1665 0 tcpcb 27672 28557 inpcb 28143 29296 socket 27760 28740 > 2016-09-23 21:47:03 tcptw 32805 426 1665 0 tcpcb 27752 28557 inpcb 28178 29296 socket 27838 28740 > 2016-09-23 21:47:04 tcptw 32805 402 1665 0 tcpcb 27760 28557 inpcb 28162 29296 socket 27847 28740 > 2016-09-23 21:47:05 tcptw 32805 406 1665 0 tcpcb 27757 28557 inpcb 28163 29296 socket 27845 28740 > 2016-09-23 21:47:06 tcptw 32805 443 1665 0 tcpcb 27783 28557 inpcb 28226 29296 socket 27871 28740 > 2016-09-23 21:47:07 tcptw 32805 484 1665 0 tcpcb 27707 28557 inpcb 28191 29296 socket 27794 28740 > 2016-09-23 21:47:08 tcptw 32805 473 1665 0 tcpcb 27721 28557 inpcb 28194 29296 socket 27807 28740 > 2016-09-23 21:47:09 tcptw 32805 432 1665 0 tcpcb 27749 28557 inpcb 28181 29296 socket 27837 28740 > 2016-09-23 21:47:10 tcptw 32805 421 1665 0 tcpcb 27780 28557 inpcb 28201 29296 socket 27868 28740 > 2016-09-23 21:47:11 tcptw 32805 530 1665 0 tcpcb 27814 28557 inpcb 28344 29296 socket 27902 28740 > 2016-09-23 21:47:12 tcptw 32805 680 1665 0 tcpcb 27874 28557 inpcb 28554 29296 socket 27964 28740 > 2016-09-23 21:47:13 tcptw 32805 832 1665 0 tcpcb 27881 28557 inpcb 28713 29296 socket 27971 28740 > 2016-09-23 21:47:14 tcptw 32805 997 1665 0 tcpcb 27880 28557 inpcb 28877 29352 socket 27972 28740 > 2016-09-23 21:47:15 tcptw 32805 1155 1890 0 tcpcb 27931 28557 inpcb 29086 29560 socket 28023 28740 > 2016-09-23 21:47:16 tcptw 32805 1322 2250 0 tcpcb 27981 28557 inpcb 29303 29800 socket 28075 28740 > 2016-09-23 21:47:17 tcptw 32805 1496 2385 0 tcpcb 28065 28557 inpcb 29561 30040 socket 28159 28740 > 2016-09-23 21:47:18 tcptw 32805 1648 2385 0 tcpcb 28151 28557 inpcb 29799 30280 socket 28245 28740 > 2016-09-23 21:47:19 tcptw 32805 1790 2655 0 tcpcb 28398 28599 inpcb 30188 30672 socket 28492 28796 > 2016-09-23 21:47:20 tcptw 32805 1954 2655 0 tcpcb 28712 28923 inpcb 30666 31120 socket 28807 29116 > 2016-09-23 21:47:21 tcptw 32805 2115 3015 0 tcpcb 29061 29244 inpcb 31176 31576 socket 29156 29468 > 2016-09-23 21:47:22 tcptw 32805 2265 3150 0 tcpcb 29335 29538 inpcb 31600 32056 socket 29430 29704 > 2016-09-23 21:47:23 tcptw 32805 2424 3150 0 tcpcb 29553 29775 inpcb 31977 32440 socket 29648 29956 > 2016-09-23 21:47:24 tcptw 32805 2590 3375 0 tcpcb 29711 29901 inpcb 32301 32744 socket 29807 30112 > 2016-09-23 21:47:25 tcptw 32805 2760 3780 0 tcpcb 29794 30015 inpcb 32554 33040 socket 29891 30224 > 2016-09-23 21:47:26 tcptw 32805 2935 3915 0 tcpcb 29879 30111 inpcb 32814 33312 socket 29976 30292 > 2016-09-23 21:47:27 tcptw 32805 3109 3915 0 tcpcb 29953 30195 inpcb 33062 33584 socket 30050 30392 > 2016-09-23 21:47:28 tcptw 32805 3264 4140 0 tcpcb 30060 30267 inpcb 33324 33824 socket 30158 30476 > 2016-09-23 21:47:29 tcptw 32805 3435 4275 0 tcpcb 30137 30363 inpcb 33572 34032 socket 30235 30572 > 2016-09-23 21:47:30 tcptw 32805 3600 4500 0 tcpcb 30221 30489 inpcb 33821 34304 socket 30320 30644 > 2016-09-23 21:47:31 tcptw 32805 3775 4635 0 tcpcb 30309 30588 inpcb 34084 34576 socket 30408 30740 > 2016-09-23 21:47:32 tcptw 32805 3936 4770 0 tcpcb 30534 30741 inpcb 34470 34960 socket 30634 30908 > 2016-09-23 21:47:33 tcptw 32805 4097 4905 0 tcpcb 30744 30951 inpcb 34841 35352 socket 30844 31160 > 2016-09-23 21:47:34 tcptw 32805 4233 5040 0 tcpcb 31006 31176 inpcb 35239 35680 socket 31106 31372 > 2016-09-23 21:47:35 tcptw 32805 4366 5265 0 tcpcb 31160 31386 inpcb 35526 35920 socket 31260 31568 > 2016-09-23 21:47:36 tcptw 32805 4738 5535 0 tcpcb 29529 31428 inpcb 34267 36016 socket 29629 31596 > 2016-09-23 21:47:37 tcptw 32805 4879 5625 0 tcpcb 29506 31428 inpcb 34385 36016 socket 29607 31596 > 2016-09-23 21:47:38 tcptw 32805 5011 5895 0 tcpcb 29590 31428 inpcb 34601 36016 socket 29691 31596 > 2016-09-23 21:47:39 tcptw 32805 5130 5895 0 tcpcb 29713 31428 inpcb 34843 36016 socket 29815 31596 > 2016-09-23 21:47:40 tcptw 32805 5259 6165 0 tcpcb 29783 31428 inpcb 35042 36016 socket 29886 31596 > 2016-09-23 21:47:41 tcptw 32805 5378 6255 0 tcpcb 29606 31428 inpcb 34984 36016 socket 29709 31596 > 2016-09-23 21:47:42 tcptw 32805 5489 6255 0 tcpcb 29638 31428 inpcb 35127 36016 socket 29741 31596 > 2016-09-23 21:47:43 tcptw 32805 5629 6390 0 tcpcb 29630 31428 inpcb 35259 36016 socket 29735 31596 > 2016-09-23 21:47:44 tcptw 32805 5754 6660 0 tcpcb 29593 31428 inpcb 35347 36016 socket 29696 31596 > 2016-09-23 21:47:45 tcptw 32805 5887 6660 0 tcpcb 29606 31428 inpcb 35493 36016 socket 29709 31596 > 2016-09-23 21:47:46 tcptw 32805 6011 6750 0 tcpcb 29613 31428 inpcb 35624 36016 socket 29716 31596 > 2016-09-23 21:47:47 tcptw 32805 6128 7020 0 tcpcb 29642 31428 inpcb 35770 36128 socket 29745 31596 > 2016-09-23 21:47:48 tcptw 32805 6250 7020 0 tcpcb 29742 31428 inpcb 35992 36416 socket 29845 31596 > 2016-09-23 21:47:49 tcptw 32805 6378 7155 0 tcpcb 29745 31428 inpcb 36123 36472 socket 29850 31596 > 2016-09-23 21:47:50 tcptw 32805 6486 7290 0 tcpcb 29756 31428 inpcb 36242 36648 socket 29861 31596 > 2016-09-23 21:47:51 tcptw 32805 6603 7515 0 tcpcb 29807 31428 inpcb 36410 36792 socket 29912 31596 > 2016-09-23 21:47:52 tcptw 32805 6736 7515 0 tcpcb 29830 31428 inpcb 36566 36912 socket 29935 31596 > 2016-09-23 21:47:53 tcptw 32805 6852 7785 0 tcpcb 29892 31428 inpcb 36744 37112 socket 29996 31596 > 2016-09-23 21:47:54 tcptw 32805 6991 7785 0 tcpcb 29876 31428 inpcb 36867 37288 socket 29981 31596 > 2016-09-23 21:47:55 tcptw 32805 7102 8010 0 tcpcb 29928 31428 inpcb 37030 37400 socket 30033 31596 > 2016-09-23 21:47:56 tcptw 32805 7227 8010 0 tcpcb 29960 31428 inpcb 37187 37544 socket 30065 31596 > 2016-09-23 21:47:57 tcptw 32805 7356 8280 0 tcpcb 30031 31428 inpcb 37387 37752 socket 30136 31596 > 2016-09-23 21:47:58 tcptw 32805 7505 8415 0 tcpcb 30164 31428 inpcb 37669 38040 socket 30270 31596 > 2016-09-23 21:47:59 tcptw 32805 7618 8415 0 tcpcb 30302 31428 inpcb 37920 38328 socket 30408 31596 > 2016-09-23 21:48:00 tcptw 32805 7740 8505 0 tcpcb 30478 31428 inpcb 38218 38560 socket 30584 31596 > 2016-09-23 21:48:01 tcptw 32805 7858 8640 0 tcpcb 30663 31428 inpcb 38521 38880 socket 30769 31596So my hypothesis was wrong it is not a contention between pfslowtimo()/tcp_tw_2msl_scan(reuse=0) and tcp_input()/tcp_twstart()/tcp_tw_2msl_scan(reuse=1) as you still have plenty of unused tcptw. It is an issue between tcp_tw_2msl_scan(reuse=0) and something else. -- Julien -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 496 bytes Desc: OpenPGP digital signature URL: <http://lists.freebsd.org/pipermail/freebsd-stable/attachments/20160926/939f51fd/attachment.sig>