On Mon, Sep 19, 2016 at 10:32:13PM +0200, Julien Charbon wrote:> > > @ CPU_CLK_UNHALTED_CORE [4653445 samples] > > > > 51.86% [2413083] lock_delay @ /boot/kernel.VSTREAM/kernel > > 100.0% [2413083] __rw_wlock_hard > > 100.0% [2413083] tcp_tw_2msl_scan > > 99.99% [2412958] pfslowtimo > > 100.0% [2412958] softclock_call_cc > > 100.0% [2412958] softclock > > 100.0% [2412958] intr_event_execute_handlers > > 100.0% [2412958] ithread_loop > > 100.0% [2412958] fork_exit > > 00.01% [125] tcp_twstart > > 100.0% [125] tcp_do_segment > > 100.0% [125] tcp_input > > 100.0% [125] ip_input > > 100.0% [125] swi_net > > 100.0% [125] intr_event_execute_handlers > > 100.0% [125] ithread_loop > > 100.0% [125] fork_exit > > The only write lock tcp_tw_2msl_scan() tries to get is a > INP_WLOCK(inp). Thus here, tcp_tw_2msl_scan() seems to be stuck > spinning on INP_WLOCK (or pfslowtimo() is going crazy and calls > tcp_tw_2msl_scan() at high rate but this will be quite unexpected). > > Thus my hypothesis is that something is holding the INP_WLOCK and not > releasing it, and tcp_tw_2msl_scan() is spinning on it. > > If you can, could you compile the kernel with below options: > > options DDB # Support DDB. > options DEADLKRES # Enable the deadlock resolver > options INVARIANTS # Enable calls of extra sanity > checking > options INVARIANT_SUPPORT # Extra sanity checks of internal > structures, required by INVARIANTS > options WITNESS # Enable checks to detect > deadlocks and cycles > options WITNESS_SKIPSPIN # Don't run witness on spinlocks > for speedCurrently this host run with 100% CPU load (on all cores), i.e. enabling WITNESS will be significant drop performance. Can I use only some subset of options? Also, I can some troubles to DDB enter in this case. May be kgdb will be success (not tryed yet)?> And once the issue is reproduce, run in ddb run the below commands: > > show pcpu > show allpcpu > show locks > show alllocks > show lockchain > show allchains > show all trace > > This is to see if the contention is indeed on the tcp_tw_2msl_scan's > INP_WLOCK.
Hi Slawa, On 9/19/16 10:43 PM, Slawa Olhovchenkov wrote:> On Mon, Sep 19, 2016 at 10:32:13PM +0200, Julien Charbon wrote: >> >>> @ CPU_CLK_UNHALTED_CORE [4653445 samples] >>> >>> 51.86% [2413083] lock_delay @ /boot/kernel.VSTREAM/kernel >>> 100.0% [2413083] __rw_wlock_hard >>> 100.0% [2413083] tcp_tw_2msl_scan >>> 99.99% [2412958] pfslowtimo >>> 100.0% [2412958] softclock_call_cc >>> 100.0% [2412958] softclock >>> 100.0% [2412958] intr_event_execute_handlers >>> 100.0% [2412958] ithread_loop >>> 100.0% [2412958] fork_exit >>> 00.01% [125] tcp_twstart >>> 100.0% [125] tcp_do_segment >>> 100.0% [125] tcp_input >>> 100.0% [125] ip_input >>> 100.0% [125] swi_net >>> 100.0% [125] intr_event_execute_handlers >>> 100.0% [125] ithread_loop >>> 100.0% [125] fork_exit >> >> The only write lock tcp_tw_2msl_scan() tries to get is a >> INP_WLOCK(inp). Thus here, tcp_tw_2msl_scan() seems to be stuck >> spinning on INP_WLOCK (or pfslowtimo() is going crazy and calls >> tcp_tw_2msl_scan() at high rate but this will be quite unexpected). >> >> Thus my hypothesis is that something is holding the INP_WLOCK and not >> releasing it, and tcp_tw_2msl_scan() is spinning on it. >> >> If you can, could you compile the kernel with below options: >> >> options DDB # Support DDB. >> options DEADLKRES # Enable the deadlock resolver >> options INVARIANTS # Enable calls of extra sanity >> checking >> options INVARIANT_SUPPORT # Extra sanity checks of internal >> structures, required by INVARIANTS >> options WITNESS # Enable checks to detect >> deadlocks and cycles >> options WITNESS_SKIPSPIN # Don't run witness on spinlocks >> for speed > > Currently this host run with 100% CPU load (on all cores), i.e. > enabling WITNESS will be significant drop performance. > Can I use only some subset of options? > > Also, I can some troubles to DDB enter in this case. > May be kgdb will be success (not tryed yet)?If these kernel options will certainly slow down your kernel, they also might found the root cause of your issue before reaching the point where you have 100% cpu load on all cores (thanks to INVARIANTS). I would suggest: #1. Try above kernel options at least once, and see what you can get. #2. If #1 is a total failure try below patch: It won't solve anything, it just makes tcp_tw_2msl_scan() less greedy when there is contention on the INP write lock. If it makes the debugging more feasible, continue to #3. diff --git a/sys/netinet/tcp_timewait.c b/sys/netinet/tcp_timewait.c index a8b78f9..4206ea3 100644 --- a/sys/netinet/tcp_timewait.c +++ b/sys/netinet/tcp_timewait.c @@ -701,34 +701,42 @@ tcp_tw_2msl_scan(int reuse) in_pcbref(inp); TW_RUNLOCK(V_tw_lock); +retry: if (INP_INFO_TRY_RLOCK(&V_tcbinfo)) { - INP_WLOCK(inp); - tw = intotw(inp); - if (in_pcbrele_wlocked(inp)) { - KASSERT(tw == NULL, ("%s: held last inp " - "reference but tw not NULL", __func__)); - INP_INFO_RUNLOCK(&V_tcbinfo); - continue; - } + if (INP_TRY_WLOCK(inp)) { + tw = intotw(inp); + if (in_pcbrele_wlocked(inp)) { + KASSERT(tw == NULL, ("%s: held last inp " + "reference but tw not NULL", __func__)); + INP_INFO_RUNLOCK(&V_tcbinfo); + continue; + } - if (tw == NULL) { - /* tcp_twclose() has already been called */ - INP_WUNLOCK(inp); - INP_INFO_RUNLOCK(&V_tcbinfo); - continue; - } + if (tw == NULL) { + /* tcp_twclose() has already been called */ + INP_WUNLOCK(inp); + INP_INFO_RUNLOCK(&V_tcbinfo); + continue; + } - tcp_twclose(tw, reuse); - INP_INFO_RUNLOCK(&V_tcbinfo); - if (reuse) - return tw; + tcp_twclose(tw, reuse); + INP_INFO_RUNLOCK(&V_tcbinfo); + if (reuse) + return tw; + } else { + INP_INFO_RUNLOCK(&V_tcbinfo); + goto retry; + } } else { /* INP_INFO lock is busy, continue later. */ - INP_WLOCK(inp); - if (!in_pcbrele_wlocked(inp)) - INP_WUNLOCK(inp); - break; + if (INP_TRY_WLOCK(inp)) { + if (!in_pcbrele_wlocked(inp)) + INP_WUNLOCK(inp); + break; + } else { + goto retry; + } } } #3. Once the issue is reproduced, launch ddb and run the below commands: show pcpu show allpcpu show locks show alllocks show lockchain show allchains show all trace My 2 cents. -- Julien -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 496 bytes Desc: OpenPGP digital signature URL: <http://lists.freebsd.org/pipermail/freebsd-stable/attachments/20160920/cd753a8c/attachment.sig>
On Tue, Sep 20, 2016 at 10:00:25PM +0200, Julien Charbon wrote:> > Hi Slawa, > > On 9/19/16 10:43 PM, Slawa Olhovchenkov wrote: > > On Mon, Sep 19, 2016 at 10:32:13PM +0200, Julien Charbon wrote: > >> > >>> @ CPU_CLK_UNHALTED_CORE [4653445 samples] > >>> > >>> 51.86% [2413083] lock_delay @ /boot/kernel.VSTREAM/kernel > >>> 100.0% [2413083] __rw_wlock_hard > >>> 100.0% [2413083] tcp_tw_2msl_scan > >>> 99.99% [2412958] pfslowtimo > >>> 100.0% [2412958] softclock_call_cc > >>> 100.0% [2412958] softclock > >>> 100.0% [2412958] intr_event_execute_handlers > >>> 100.0% [2412958] ithread_loop > >>> 100.0% [2412958] fork_exit > >>> 00.01% [125] tcp_twstart > >>> 100.0% [125] tcp_do_segment > >>> 100.0% [125] tcp_input > >>> 100.0% [125] ip_input > >>> 100.0% [125] swi_net > >>> 100.0% [125] intr_event_execute_handlers > >>> 100.0% [125] ithread_loop > >>> 100.0% [125] fork_exit > >> > >> The only write lock tcp_tw_2msl_scan() tries to get is a > >> INP_WLOCK(inp). Thus here, tcp_tw_2msl_scan() seems to be stuck > >> spinning on INP_WLOCK (or pfslowtimo() is going crazy and calls > >> tcp_tw_2msl_scan() at high rate but this will be quite unexpected). > >> > >> Thus my hypothesis is that something is holding the INP_WLOCK and not > >> releasing it, and tcp_tw_2msl_scan() is spinning on it. > >> > >> If you can, could you compile the kernel with below options: > >> > >> options DDB # Support DDB. > >> options DEADLKRES # Enable the deadlock resolver > >> options INVARIANTS # Enable calls of extra sanity > >> checking > >> options INVARIANT_SUPPORT # Extra sanity checks of internal > >> structures, required by INVARIANTS > >> options WITNESS # Enable checks to detect > >> deadlocks and cycles > >> options WITNESS_SKIPSPIN # Don't run witness on spinlocks > >> for speed > > > > Currently this host run with 100% CPU load (on all cores), i.e. > > enabling WITNESS will be significant drop performance. > > Can I use only some subset of options? > > > > Also, I can some troubles to DDB enter in this case. > > May be kgdb will be success (not tryed yet)? > > If these kernel options will certainly slow down your kernel, they also > might found the root cause of your issue before reaching the point where > you have 100% cpu load on all cores (thanks to INVARIANTS). I would > suggest:Hmmm, may be I am not clarified. This host run at peak hours with 100% CPU load as normal operation, this is for servering 2x10G, this is CPU load not result of lock issuse, this is not us case. And this is because I am fear to enable WITNESS -- I am fear drop performance. This lock issuse happen irregulary and may be caused by other issuse (nginx crashed). In this case about 1/3 cores have 100% cpu load, perhaps by this lock -- I am can trace only from one core and need more then hour for this (may be on other cores different trace, I can't guaranted anything).> #1. Try above kernel options at least once, and see what you can get.OK, I am try this after some time.> #2. If #1 is a total failure try below patch: It won't solve anything, > it just makes tcp_tw_2msl_scan() less greedy when there is contention on > the INP write lock. If it makes the debugging more feasible, continue > to #3.OK, thanks. What purpose to not skip locked tcptw in this loop?> diff --git a/sys/netinet/tcp_timewait.c b/sys/netinet/tcp_timewait.c > index a8b78f9..4206ea3 100644 > --- a/sys/netinet/tcp_timewait.c > +++ b/sys/netinet/tcp_timewait.c > @@ -701,34 +701,42 @@ tcp_tw_2msl_scan(int reuse) > in_pcbref(inp); > TW_RUNLOCK(V_tw_lock); > > +retry: > if (INP_INFO_TRY_RLOCK(&V_tcbinfo)) { > > - INP_WLOCK(inp); > - tw = intotw(inp); > - if (in_pcbrele_wlocked(inp)) { > - KASSERT(tw == NULL, ("%s: held last inp " > - "reference but tw not NULL", __func__)); > - INP_INFO_RUNLOCK(&V_tcbinfo); > - continue; > - } > + if (INP_TRY_WLOCK(inp)) { > + tw = intotw(inp); > + if (in_pcbrele_wlocked(inp)) { > + KASSERT(tw == NULL, ("%s: held > last inp " > + "reference but tw not NULL", > __func__)); > + INP_INFO_RUNLOCK(&V_tcbinfo); > + continue; > + } > > - if (tw == NULL) { > - /* tcp_twclose() has already been called */ > - INP_WUNLOCK(inp); > - INP_INFO_RUNLOCK(&V_tcbinfo); > - continue; > - } > + if (tw == NULL) { > + /* tcp_twclose() has already > been called */ > + INP_WUNLOCK(inp); > + INP_INFO_RUNLOCK(&V_tcbinfo); > + continue; > + } > > - tcp_twclose(tw, reuse); > - INP_INFO_RUNLOCK(&V_tcbinfo); > - if (reuse) > - return tw; > + tcp_twclose(tw, reuse); > + INP_INFO_RUNLOCK(&V_tcbinfo); > + if (reuse) > + return tw; > + } else { > + INP_INFO_RUNLOCK(&V_tcbinfo); > + goto retry; > + } > } else { > /* INP_INFO lock is busy, continue later. */ > - INP_WLOCK(inp); > - if (!in_pcbrele_wlocked(inp)) > - INP_WUNLOCK(inp); > - break; > + if (INP_TRY_WLOCK(inp)) { > + if (!in_pcbrele_wlocked(inp)) > + INP_WUNLOCK(inp); > + break; > + } else { > + goto retry; > + } > } > } > > #3. Once the issue is reproduced, launch ddb and run the below commands: > > show pcpu > show allpcpu > show locks > show alllocks > show lockchain > show allchains > show all trace > > My 2 cents. > > -- > Julien >