Hi Slawa,

On 9/16/16 9:03 PM, Slawa Olhovchenkov wrote:
> On Fri, Sep 16, 2016 at 11:30:53AM -0700, hiren panchasara wrote:
>
>> On 09/16/16 at 09:18P, Slawa Olhovchenkov wrote:
>>> On Thu, Sep 15, 2016 at 12:06:33PM +0300, Slawa Olhovchenkov wrote:
>>>
>>>> On Thu, Sep 15, 2016 at 11:59:38AM +0300, Konstantin Belousov wrote:
>>>>
>>>>> On Thu, Sep 15, 2016 at 12:35:04AM +0300, Slawa Olhovchenkov wrote:
>>>>>> On Sun, Sep 04, 2016 at 06:46:12PM -0700, hiren panchasara wrote:
>>>>>>
>>>>>>> On 09/05/16 at 12:57P, Slawa Olhovchenkov wrote:
>>>>>>>> I am try using 11.0 on Dual E5-2620 (no X2APIC).
>>>>>>>> Under high network load and may be addtional conditional system go to
>>>>>>>> unresponsible state -- no reaction to network and console (USB IPMI
>>>>>>>> emulation). INVARIANTS give to high overhad. Is this exist some way to
>>>>>>>> debug this?
>>>>>>>
>>>>>>> Can you panic it from console to get to db> to get backtrace and other
>>>>>>> info when it goes unresponsive?
>>>>>>
>>>>>> ipmi console don't respond (chassis power diag don't react)
>>>>>> login on sol console stuck on *tcp.
>>>>>
>>>>> Is 'login' you reference is the ipmi client state, or you mean login(1)
>>>>> on the wedged host ?
>>>>
>>>> on the wedged host
>>>>
>>>>> If BMC stops responding simultaneously with the host, I would suspect
>>>>> the hardware platform issues instead of a software problem. Do you have
>>>>> dedicated LAN port for BMC ?
>>>>
>>>> Yes.
>>>> But BMC emulate USB keyboard and this is may be lock inside USB
>>>> system.
>>>> "ipmi console don't respond" must be read as "ipmi console runnnig and
>>>> attached but system don't react to keypress on this console".
>>>> at the sime moment system respon to `enter` on ipmi sol console, but
>>>> after enter `root` stuck in login in the '*tcp' state (I think this is
>>>> NIS related).
>>>
>>> ~^B don't break to debuger.
>>> But I can login to sol console.
>>
>> You can probably:
>> debug.kdb.enter: set to enter the debugger
>>
>> or force a panic and get vmcore:
>> debug.kdb.panic: set to panic the kernel
>
> I am reset this host.
> PMC samples collected and decoded:
>
> @ CPU_CLK_UNHALTED_CORE [4653445 samples]
>
> 51.86% [2413083] lock_delay @ /boot/kernel.VSTREAM/kernel
>  100.0% [2413083] __rw_wlock_hard
>   100.0% [2413083] tcp_tw_2msl_scan
>    99.99% [2412958] pfslowtimo
>     100.0% [2412958] softclock_call_cc
>      100.0% [2412958] softclock
>       100.0% [2412958] intr_event_execute_handlers
>        100.0% [2412958] ithread_loop
>         100.0% [2412958] fork_exit
>    00.01% [125] tcp_twstart
>     100.0% [125] tcp_do_segment
>      100.0% [125] tcp_input
>       100.0% [125] ip_input
>        100.0% [125] swi_net
>         100.0% [125] intr_event_execute_handlers
>          100.0% [125] ithread_loop
>           100.0% [125] fork_exit

The only write lock tcp_tw_2msl_scan() tries to take is an
INP_WLOCK(inp). So tcp_tw_2msl_scan() seems to be stuck spinning on
INP_WLOCK (or pfslowtimo() is going crazy and calling
tcp_tw_2msl_scan() at a high rate, but that would be quite unexpected).

My hypothesis is therefore that something is holding the INP_WLOCK
without releasing it, and tcp_tw_2msl_scan() is spinning on it.

If you can, could you compile the kernel with the options below:

options DDB                # Support DDB.
options DEADLKRES          # Enable the deadlock resolver
options INVARIANTS         # Enable calls of extra sanity checking
options INVARIANT_SUPPORT  # Extra sanity checks of internal structures, required by INVARIANTS
options WITNESS            # Enable checks to detect deadlocks and cycles
options WITNESS_SKIPSPIN   # Don't run witness on spinlocks for speed
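
One way to add these options without touching the existing config is a small wrapper config. This is only a sketch under assumptions: it supposes the current kernel config is named VSTREAM (guessed from /boot/kernel.VSTREAM in the PMC output), that the machine is amd64, and that the sources live in /usr/src. The name VSTREAM-DEBUG is made up for illustration.

```
# Hypothetical file: /usr/src/sys/amd64/conf/VSTREAM-DEBUG
# Assumes a config named VSTREAM exists in the same directory.
include VSTREAM
ident   VSTREAM-DEBUG

# DDB depends on KDB; add "options KDB" here if VSTREAM lacks it.
options DDB
options DEADLKRES
options INVARIANTS
options INVARIANT_SUPPORT
options WITNESS
options WITNESS_SKIPSPIN

# Then, from /usr/src (standard build targets):
#   make buildkernel KERNCONF=VSTREAM-DEBUG
#   make installkernel KERNCONF=VSTREAM-DEBUG
```

Keeping the debug options in a separate file makes it easy to go back to the production kernel afterwards.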

Once the issue is reproduced, run the commands below in ddb:

show pcpu
show allpcpu
show locks
show alllocks
show lockchain
show allchains
show all trace

This is to see whether the contention is indeed on the INP_WLOCK taken
by tcp_tw_2msl_scan().
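
Since the console was hard to reach during the hang, it may help to let ddb run these commands by itself when the kernel panics (for example when DEADLKRES fires). The fragment below is a sketch, not something tested on this host. It assumes the stock /etc/ddb.conf mechanism (loaded at boot when ddb_enable="YES" is set in /etc/rc.conf) and a configured dump device so that the captured output ends up in a textdump.

```
# /etc/ddb.conf fragment (illustrative)
# Runs when ddb is entered because of a panic: capture the output of the
# commands listed above, then write it out as a textdump.
script kdb.enter.panic=textdump set; capture on; show pcpu; show allpcpu; show locks; show alllocks; show lockchain; show allchains; show all trace; capture off; call doadump
```

The capture buffer (sysctl debug.ddb.capture.bufsize) may need to be raised for the full "show all trace" output to fit.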
--
Julien