Hi Slawa,
On 10/12/16 5:42 PM, Slawa Olhovchenkov wrote:
> On Wed, Oct 12, 2016 at 05:17:35PM +0200, Julien Charbon wrote:
>
>>>>>>>>>> I see, thus just for the context: The TCP stack in
>>>>>>>>>> sys/dev/cxgb* is a TOE (TCP Offload Engine?) TCP stack for
>>>>>>>>>> Chelsio NICs, it is a separate/side TCP stack that is used
>>>>>>>>>> only with TCP_OFFLOAD option.
>>>>>>>>>>
>>>>>>>>>> This TOE TCP stack actually has its own set of detach()/input()
>>>>>>>>>> functions and seems to check INP_DROPPED flag properly. I guess @np
>>>>>>>>>> checks fixes in socket TCP stack and decides which ones can also impact
>>>>>>>>>> the Chelsio TOE TCP stack. Some bugs are only in socket TCP stack, some
>>>>>>>>>> are only in TOE TCP stack.
>>>>>>>>>
>>>>>>>>> I am worried about the other direction: the Chelsio TOE TCP stack
>>>>>>>>> setting INP_TIMEWAIT and this affecting the
>>>>>>>>> tcp_timer_2msl()/tcp_close()/sofree()/tcp_usr_detach() path.
>>>>>>>>
>>>>>>>> I see, I expect no problem on this side as tcp_timer_2msl() checks the
>>>>>>>> INP_TIMEWAIT flag and does not call tcp_close() if it is set.
>>>>>>>
>>>>>>> I mean the case where, at the time of the first INP_WUNLOCK(),
>>>>>>> tcp_timer_2msl() does not see INP_TIMEWAIT and calls tcp_close();
>>>>>>> tcp_close() does INP_WUNLOCK(), and now the Chelsio TOE takes INP_WLOCK,
>>>>>>> does tcp_twstart() and sets INP_TIMEWAIT. After this, tcp_timer_2msl()
>>>>>>> resumes and finds an unexpected INP_TIMEWAIT in tcp_usr_detach().
>>>>>>
>>>>>> Sure, basically the same bug as in the classic TCP stack. If you think
>>>>>> it can happen, send an email describing that to np@ and he will check
>>>>>> and fix that. He is a TOE TCP stack expert and I am not. In all cases,
>>>>>> if this issue is possible in the TOE TCP stack context, the patch will be
>>>>>> straightforward: If the INP_DROPPED flag is set do not call tcp_twstart().
>
> I am emailing np@.
>
>>>>>> The current patch focuses only on the classic TCP stack.
>>>>>
>>>>> Maybe the current workaround (with logging) in tcp_usr_detach() is a good
>>>>> solution for preventing system lockout by similar bugs?
>>>>
>>>> Good question, the quick workaround in tcp_usr_detach() does not handle
>>>> all the cases. Even if it reduces the number of crashes, you can still find
>>>> scenarios where it can have unexpected side effects.
>>>
>>> This is better than a guaranteed lockout.
>>>
>>>> Long term solution is to enforce: If the inp has the INP_DROPPED flag
>>>> just stop processing it and return. If you grep the INP_DROPPED flag in
>>>> kernel sources, you can see that this test is already done in almost all
>>>> tcp_*() processing functions except tcp_input().
>>>>
>>>> I would say that even without this issue tcp_input() should check
>>>> the INP_DROPPED flag after INP_WLOCK anyway. Same for the TOE TCP stack,
>>>> you are simply not supposed to process an inp with the INP_DROPPED flag.
>>>
>>> Absolutely agreed!
>>> My point is: more checks, and good handling of the check results, are
>>> best for stability.
>>>
>>> I.e. check INP_DROPPED in tcp_input AND work around INP_TIMEWAIT in
>>> tcp_usr_detach (with logging) AND check some possible cases in XXX TOE.
>>>
>>> The current TCP stack is too complex and has many corner cases. It needs
>>> additional guards where possible (ones that do not cause a kernel panic).
>>
>> I see your point: Even if this issue is caught by this assert:
>>
>>   KASSERT(tp == NULL, ("tcp_detach: INP_TIMEWAIT && "
>>       "INP_DROPPED && tp != NULL"));
>>
>> https://github.com/freebsd/freebsd/blob/release/11.0.0/sys/netinet/tcp_usrreq.c#L213
>>
>> you might not have the INVARIANTS option, then you will get a lockout quite
>> difficult to debug. Thus what we can do is:
>>
>> - If INVARIANTS is set: kernel panic to get all the details in the core.
>> - If INVARIANTS is not set: Log this error with an explicit kernel
>>   log(LOG_ERR) describing the issue, and then use the workaround to avoid
>>   the double-free and leave the system in a good enough state.
>>
>> Something like:
>
> Yes, thanks!
Proposed changes added in the review:
https://reviews.freebsd.org/D8211
Tell me when you have had three days without issues with this change.
>> tcp_detach() {
>>
>>   ...
>>   if (inp->inp_flags & INP_TIMEWAIT) {
>>
>>     ...
>>     if (inp->inp_flags & INP_DROPPED) {
>>
>>       in_pcbdetach(inp);
>>       if (__predict_true(tp == NULL)) {
>>         in_pcbfree(inp);
>>       } else {
>> #ifdef INVARIANTS
>>         panic("tcp_detach: tp != NULL, That's not good because 'blah'\n");
>> #else
>>         log(LOG_ERR, "tcp_detach: tp != NULL, That's not good because 'blah'\n");
>
> Maybe some more info in the log can help to detect the root cause of the issue?
> I don't know what info; maybe flags or the number of references?
For this kind of issue, the useful part is the stack trace. INVARIANTS
will give you that trace in the core, and without INVARIANTS it is
better to use dtrace:
$ cat tcp-twstart-dropped.d
fbt::tcp_twstart:entry
/args[0]->t_inpcb->inp_flags & 0x04000000/
{
        stack();
        printf("INP_DROPPED in tcp_twstart: %x",
            args[0]->t_inpcb->inp_flags);
}
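
The script prints inp_flags as a raw hex word. If it helps when reading the output, a small userland helper along these lines could decode the two flags discussed in this thread. It is only a sketch: the function name is hypothetical, the INP_DROPPED value is the one used in the predicate above, and the INP_TIMEWAIT value is what I believe in_pcb.h defines, so please check it against your sources.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

#define SKETCH_INP_TIMEWAIT 0x01000000u /* assumed value, verify in in_pcb.h */
#define SKETCH_INP_DROPPED  0x04000000u /* value from the D predicate above */

/*
 * Write the names of the two flags of interest found in `flags` into buf.
 * Returns what snprintf() returns: the length of the resulting text.
 * Hypothetical helper, not part of any FreeBSD tool.
 */
static int
describe_inp_flags(uint32_t flags, char *buf, size_t len)
{
	assert(buf != NULL && len > 0);
	return (snprintf(buf, len, "%s%s",
	    (flags & SKETCH_INP_DROPPED) ? "DROPPED " : "",
	    (flags & SKETCH_INP_TIMEWAIT) ? "TIMEWAIT " : ""));
}
```

A value with both bits set, such as 0x05000000, is the combination that matters for the tcp_detach() case above.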
--
Julien