Hi Slawa,

On 10/12/16 2:13 PM, Slawa Olhovchenkov wrote:
> On Wed, Oct 12, 2016 at 02:06:59PM +0200, Julien Charbon wrote:
>>>>>>> sofree() calls tcp_usr_detach() and in tcp_usr_detach() we have
>>>>>>> unexpected INP_TIMEWAIT.
>>>>>>
>>>>>> I see, thus just for the context: The TCP stack in sys/dev/cxgb* is a
>>>>>> TOE (TCP Offload Engine?) TCP stack for Chelsio NICs; it is a
>>>>>> separate/side TCP stack that is used only with the TCP_OFFLOAD option.
>>>>>>
>>>>>> This TOE TCP stack actually has its own set of detach()/input()
>>>>>> functions and seems to check the INP_DROPPED flag properly. I guess @np
>>>>>> checks fixes in the socket TCP stack and decides which ones can also
>>>>>> impact the Chelsio TOE TCP stack. Some bugs are only in the socket TCP
>>>>>> stack, some are only in the TOE TCP stack.
>>>>>
>>>>> I am worried about the other direction: setting INP_TIMEWAIT in the
>>>>> Chelsio TOE TCP stack and its impact on the
>>>>> tcp_timer_2msl()/tcp_close()/sofree()/tcp_usr_detach() path.
>>>>
>>>> I see, I expect no problem on this side as tcp_timer_2msl() checks the
>>>> INP_TIMEWAIT flag and does not call tcp_close() if it is set.
>>>
>>> I mean the case where, at the time of the first INP_WUNLOCK(),
>>> tcp_timer_2msl() does not see INP_TIMEWAIT and calls tcp_close();
>>> tcp_close() does INP_WUNLOCK(), and now the Chelsio TOE takes INP_WLOCK,
>>> does tcp_twstart() and sets INP_TIMEWAIT. After this, tcp_timer_2msl()
>>> resumes and hits the unexpected INP_TIMEWAIT in tcp_usr_detach().
>>
>> Sure, basically the same bug as in the classic TCP stack. If you think
>> it can happen, send an email describing that to np@ and he will check
>> and fix that. He is a TOE TCP stack expert and I am not. In any case,
>> if this issue is possible in the TOE TCP stack context, the patch will be
>> straightforward: if the INP_DROPPED flag is set, do not call
>> tcp_twstart().
>>
>> The current patch focuses only on the classic TCP stack.
>
> Maybe the current workaround (with logging) in tcp_usr_detach() is a good
> solution for preventing system lockout by similar bugs?

Good question. The quick workaround in tcp_usr_detach() does not handle
all the cases: even if it reduces the number of crashes, you can still
find scenarios where it has unexpected side effects.

The long-term solution is to enforce this rule: if the inp has the
INP_DROPPED flag set, just stop processing it and return. If you grep for
INP_DROPPED in the kernel sources, you can see that this test is already
done in almost all tcp_*() processing functions except tcp_input().

I would say that even without this issue, tcp_input() should check the
INP_DROPPED flag after INP_WLOCK anyway. The same goes for the TOE TCP
stack: you are simply not supposed to process an inp with the INP_DROPPED
flag set.
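
To make the rule concrete, here is a minimal userland sketch of the
"check INP_DROPPED right after taking the lock" pattern. It is not the
kernel code: the sk_* names, the flag values and the boolean that models
INP_WLOCK are all hypothetical stand-ins, and the real tcp_input() and
tcp_twstart() do far more than this.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the inpcb flags; the values are illustrative. */
#define SK_INP_TIMEWAIT 0x1
#define SK_INP_DROPPED  0x2

struct sk_inpcb {
    int  flags;
    bool wlocked;   /* models INP_WLOCK being held */
};

static void sk_wlock(struct sk_inpcb *inp)
{
    assert(!inp->wlocked);
    inp->wlocked = true;
}

static void sk_wunlock(struct sk_inpcb *inp)
{
    assert(inp->wlocked);
    inp->wlocked = false;
}

/* Models the close path: mark the inp as dropped while holding the lock. */
static void sk_drop(struct sk_inpcb *inp)
{
    sk_wlock(inp);
    inp->flags |= SK_INP_DROPPED;
    sk_wunlock(inp);
}

/*
 * Models an input path that wants to move the connection to TIME_WAIT.
 * Returns true if the inp was processed, false if it was skipped because
 * it had already been dropped.
 */
static bool sk_input_twstart(struct sk_inpcb *inp)
{
    sk_wlock(inp);
    if (inp->flags & SK_INP_DROPPED) {
        /* Already dropped by another path: stop processing and return. */
        sk_wunlock(inp);
        return false;
    }
    inp->flags |= SK_INP_TIMEWAIT;
    sk_wunlock(inp);
    return true;
}
```

In this sketch, calling sk_drop() and then sk_input_twstart() on the same
inp leaves SK_INP_TIMEWAIT unset, which is the behavior the rule above
asks for: the late path notices the drop and backs off instead of setting
a flag the detach path does not expect.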
--
Julien