+ jch@

On 09/16/16 at 10:03P, Slawa Olhovchenkov wrote:
> On Fri, Sep 16, 2016 at 11:30:53AM -0700, hiren panchasara wrote:
>
> > On 09/16/16 at 09:18P, Slawa Olhovchenkov wrote:
> > > On Thu, Sep 15, 2016 at 12:06:33PM +0300, Slawa Olhovchenkov wrote:
> > >
> > > > On Thu, Sep 15, 2016 at 11:59:38AM +0300, Konstantin Belousov wrote:
> > > >
> > > > > On Thu, Sep 15, 2016 at 12:35:04AM +0300, Slawa Olhovchenkov wrote:
> > > > > > On Sun, Sep 04, 2016 at 06:46:12PM -0700, hiren panchasara wrote:
> > > > > >
> > > > > > > On 09/05/16 at 12:57P, Slawa Olhovchenkov wrote:
> > > > > > > > I am trying to use 11.0 on a Dual E5-2620 (no X2APIC).
> > > > > > > > Under high network load, and maybe some additional conditions, the
> > > > > > > > system goes into an unresponsive state -- no reaction to network or
> > > > > > > > console (USB IPMI emulation). INVARIANTS gives too much overhead.
> > > > > > > > Is there some way to debug this?
> > > > > > >
> > > > > > > Can you panic it from console to get to db> to get backtrace and other
> > > > > > > info when it goes unresponsive?
> > > > > >
> > > > > > The IPMI console doesn't respond (chassis power diag gets no reaction).
> > > > > > login on the SOL console is stuck in *tcp.
> > > > >
> > > > > Is the 'login' you reference the ipmi client state, or do you mean
> > > > > login(1) on the wedged host?
> > > >
> > > > on the wedged host
> > > >
> > > > > If BMC stops responding simultaneously with the host, I would suspect
> > > > > the hardware platform issues instead of a software problem. Do you have
> > > > > dedicated LAN port for BMC?
> > > >
> > > > Yes.
> > > > But the BMC emulates a USB keyboard, so this may be a lock inside the
> > > > USB system.
> > > > "ipmi console don't respond" must be read as "the ipmi console is running
> > > > and attached, but the system doesn't react to keypresses on this console".
> > > > At the same moment the system responds to `enter` on the IPMI SOL console,
> > > > but after entering `root`, login is stuck in the '*tcp' state (I think
> > > > this is NIS related).
> > >
> > > ~^B doesn't break into the debugger.
> > > But I can log in on the SOL console.
> >
> > You can probably:
> > debug.kdb.enter: set to enter the debugger
> >
> > or force a panic and get vmcore:
> > debug.kdb.panic: set to panic the kernel
>
> I have reset this host.
> PMC samples collected and decoded:
>
> @ CPU_CLK_UNHALTED_CORE [4653445 samples]
>
> 51.86% [2413083] lock_delay @ /boot/kernel.VSTREAM/kernel
>  100.0% [2413083] __rw_wlock_hard
>   100.0% [2413083] tcp_tw_2msl_scan
>    99.99% [2412958] pfslowtimo
>     100.0% [2412958] softclock_call_cc
>      100.0% [2412958] softclock
>       100.0% [2412958] intr_event_execute_handlers
>        100.0% [2412958] ithread_loop
>         100.0% [2412958] fork_exit
>    00.01% [125] tcp_twstart
>     100.0% [125] tcp_do_segment
>      100.0% [125] tcp_input
>       100.0% [125] ip_input
>        100.0% [125] swi_net
>         100.0% [125] intr_event_execute_handlers
>          100.0% [125] ithread_loop
>           100.0% [125] fork_exit
>
> 09.43% [438774] _rw_runlock_cookie @ /boot/kernel.VSTREAM/kernel
>  100.0% [438774] tcp_tw_2msl_scan
>   99.99% [438735] pfslowtimo
>    100.0% [438735] softclock_call_cc
>     100.0% [438735] softclock
>      100.0% [438735] intr_event_execute_handlers
>       100.0% [438735] ithread_loop
>        100.0% [438735] fork_exit
>   00.01% [39] tcp_twstart
>    100.0% [39] tcp_do_segment
>     100.0% [39] tcp_input
>      100.0% [39] ip_input
>       100.0% [39] swi_net
>        100.0% [39] intr_event_execute_handlers
>         100.0% [39] ithread_loop
>          100.0% [39] fork_exit
>
> 08.57% [398970] __rw_wlock_hard @ /boot/kernel.VSTREAM/kernel
>  100.0% [398970] tcp_tw_2msl_scan
>   99.99% [398940] pfslowtimo
>    100.0% [398940] softclock_call_cc
>     100.0% [398940] softclock
>      100.0% [398940] intr_event_execute_handlers
>       100.0% [398940] ithread_loop
>        100.0% [398940] fork_exit
>   00.01% [30] tcp_twstart
>    100.0% [30] tcp_do_segment
>     100.0% [30] tcp_input
>      100.0% [30] ip_input
>       100.0% [30] swi_net
>        100.0% [30] intr_event_execute_handlers
>         100.0% [30] ithread_loop
>          100.0% [30] fork_exit
>
> 05.79% [269224] __rw_try_rlock @ /boot/kernel.VSTREAM/kernel
>  100.0% [269224] tcp_tw_2msl_scan
>   99.99% [269203] pfslowtimo
>    100.0% [269203] softclock_call_cc
>     100.0% [269203] softclock
>      100.0% [269203] intr_event_execute_handlers
>       100.0% [269203] ithread_loop
>        100.0% [269203] fork_exit
>   00.01% [21] tcp_twstart
>    100.0% [21] tcp_do_segment
>     100.0% [21] tcp_input
>      100.0% [21] ip_input
>       100.0% [21] swi_net
>        100.0% [21] intr_event_execute_handlers
>         100.0% [21] ithread_loop
>          100.0% [21] fork_exit
>
> 05.35% [249141] _rw_wlock_cookie @ /boot/kernel.VSTREAM/kernel
>  99.76% [248543] tcp_tw_2msl_scan
>   99.99% [248528] pfslowtimo
>    100.0% [248528] softclock_call_cc
>     100.0% [248528] softclock
>      100.0% [248528] intr_event_execute_handlers
>       100.0% [248528] ithread_loop
>        100.0% [248528] fork_exit
>   00.01% [15] tcp_twstart
>    100.0% [15] tcp_do_segment
>     100.0% [15] tcp_input
>      100.0% [15] ip_input
>       100.0% [15] swi_net
>        100.0% [15] intr_event_execute_handlers
>         100.0% [15] ithread_loop
>          100.0% [15] fork_exit
>  00.24% [598] pfslowtimo
>   100.0% [598] softclock_call_cc
>    100.0% [598] softclock
>     100.0% [598] intr_event_execute_handlers
>      100.0% [598] ithread_loop
>       100.0% [598] fork_exit
>

As I suspected, this looks like a hang trying to lock V_tcbinfo.

I'm cc'ing Julien here, who worked on the WLOCK -> RLOCK transition to improve
performance for short-lived connections. I am not sure that's the problem,
but it looks like a similar area, so he may be able to provide some insights.

Cheers,
Hiren
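As a rough cross-check of the profile quoted above, the sample counts can be added up to see how much of the CPU time went to rwlock routines called from tcp_tw_2msl_scan. The sketch below is only arithmetic on the numbers from the decoded output; the names PROFILE, TOTAL_SAMPLES and scan_share are made up for illustration and are not part of pmcstat or the kernel.

```python
# Arithmetic on the decoded CPU_CLK_UNHALTED_CORE profile quoted above.
# All names here are hypothetical helpers, not pmcstat or kernel APIs.

TOTAL_SAMPLES = 4653445

# function -> (samples in the function, samples reached via tcp_tw_2msl_scan)
PROFILE = {
    "lock_delay": (2413083, 2413083),
    "_rw_runlock_cookie": (438774, 438774),
    "__rw_wlock_hard": (398970, 398970),
    "__rw_try_rlock": (269224, 269224),
    # 598 of these samples come from pfslowtimo directly, not from the scan.
    "_rw_wlock_cookie": (249141, 248543),
}


def scan_share(profile, total):
    """Fraction of all samples that are lock routines under tcp_tw_2msl_scan."""
    via_scan = sum(via for _, via in profile.values())
    return via_scan / total


if __name__ == "__main__":
    share = scan_share(PROFILE, TOTAL_SAMPLES)
    print(f"lock routines under tcp_tw_2msl_scan: {share:.1%} of samples")
```

Run as a script, this reports about 81% of all samples, so roughly four fifths of the sampled CPU time is spent in lock operations below tcp_tw_2msl_scan, almost all of it on the pfslowtimo path.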
On Fri, Sep 16, 2016 at 12:11:55PM -0700, hiren panchasara wrote:

> As I suspected, this looks like a hang trying to lock V_tcbinfo.
>
> I'm cc'ing Julien here, who worked on the WLOCK -> RLOCK transition to improve
> performance for short-lived connections. I am not sure that's the problem,
> but it looks like a similar area, so he may be able to provide some insights.

No, this is a different case. In my case there had been no network traffic for more than an hour at that point. This is some sort of deadlock or something similar.
On Fri, Sep 16, 2016 at 12:11:55PM -0700, hiren panchasara wrote:

> As I suspected, this looks like a hang trying to lock V_tcbinfo.
>
> I'm cc'ing Julien here, who worked on the WLOCK -> RLOCK transition to improve
> performance for short-lived connections. I am not sure that's the problem,
> but it looks like a similar area, so he may be able to provide some insights.

I am pointing at tcp_tw_2msl_scan. I expected it to walk the V_twq_2msl list.
But I see only endless attempts to lock the first element of this list. Is this correct?
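To make the question concrete, here is a toy model in Python, not the kernel code. It assumes a scan loop that always re-reads the head of the queue and only moves on once the head entry has been closed and removed. The names scan, try_close and max_iterations are hypothetical, and max_iterations exists only so that the example terminates. Whether tcp_tw_2msl_scan really behaves like this is exactly what is being asked.

```python
from collections import deque


def scan(queue, try_close, max_iterations=1000):
    """Process entries from the head of `queue` (toy model, not kernel code).

    `try_close(entry)` returns True if the entry could be closed and may be
    removed, False if it could not be processed on this attempt.
    Returns (closed_entries, iterations_used).
    """
    closed = []
    iterations = 0
    while queue and iterations < max_iterations:
        iterations += 1
        entry = queue[0]  # always look at the first element
        if try_close(entry):
            closed.append(queue.popleft())
        # Otherwise retry: the head is unchanged, so the next iteration
        # picks the very same entry again.
    return closed, iterations


if __name__ == "__main__":
    # Every entry can be closed: the loop walks the whole list.
    print(scan(deque(["a", "b", "c"]), lambda e: True))
    # The first entry can never be closed: the loop spins on it.
    print(scan(deque(["a", "b", "c"]), lambda e: e != "a", max_iterations=50))
```

With a try_close that always succeeds, the loop visits every entry once. If it keeps failing for the first entry, every iteration picks that same entry again and nothing behind it is ever reached, which would look like an endless attempt to lock the first element rather than a walk over the list.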