thr3ads.net - freebsd stable - 11.0 stuck on high network load [Oct 2016]

If this information is useful, please help other people find it:
Share via:

Slawa Olhovchenkov

2016-Oct-12 08:40 UTC

11.0 stuck on high network load

On Wed, Oct 12, 2016 at 10:18:18AM +0200, Julien Charbon wrote:
> 
>  Hi Slawa,
> 
> On 10/11/16 2:11 PM, Slawa Olhovchenkov wrote:
> > On Tue, Oct 11, 2016 at 09:20:17AM +0200, Julien Charbon wrote:
> >>  Then threads are competing for the INP_WLOCK lock.  For the
example,
> >> let's say the thread A wants to run
tcp_input()/in_pcblookup_mbuf() and
> >> racing for this INP_WLOCK:
> >>
> >>
https://github.com/freebsd/freebsd/blob/release/11.0.0/sys/netinet/in_pcb.c#L1964
> >>
> >>  And thread B wants to run
tcp_timer_2msl()/tcp_close()/in_pcbdrop() and
> >> racing for this INP_WLOCK:
> >>
> >>
https://github.com/freebsd/freebsd/blob/release/11.0.0/sys/netinet/tcp_timer.c#L323
> >>
> >>  That leads to two cases:
> >>
> >>  o Thread A wins the race:
> >>
> >>   Thread A will continue tcp_input() as usal and INP_DROPPED flags
is
> >> not set and inp is still in TCP hash table.
> >>   Thread B is waiting on thread A to release INP_WLOCK after
finishing
> >> tcp_input() processing, and thread B will continue
> >> tcp_timer_2msl()/tcp_close()/in_pcbdrop() processing.
> >>
> >>  o Thread B wins the race:
> >>
> >>   Thread B runs tcp_timer_2msl()/tcp_close()/in_pcbdrop() and inp
> >> INP_DROPPED is set and inp being removed from TCP hash table.
> >>   In parallel, thread A has found the inp in TCP hash before is
was
> >> removed, and waiting on the found inp INP_WLOCK lock.
> >>   Once thread B has released the INP_WLOCK lock, thread A gets
this lock
> >> and sees the INP_DROPPED flag and do "goto findpcb" but
here because the
> >> inp is not more in TCP hash table and it will not be find again by
> >> in_pcblookup_mbuf().
> >>
> >>  Hopefully I am clear enough here.
> > 
> > Thanks, very clear.
> > Small qeustion: when both thread run on same CPU core, INP_WLOCK will
> > be re-schedule?
> 
>  Hmm, a thread can re-scheduled but not a lock. Thus no sure I
> understand your question here.  :)
I am don't know how work INP_WLOCK in this case (all on same cpu):

thread1: INP_WLOCK
-interrupt--
thread2: INP_WLOCK

if INP_WLOCK is like spinlock -- this is dead lock.
if INP_WLOCK is like mutex -- thread1 resheduled.
> > As I remeber race created by call tcp_twstart() at time of end
> > tcp_close(), at path sofree()-tcp_usr_detach() and unexpected
> > INP_TIMEWAIT state in the tcp_usr_detach(). INP_TIMEWAIT set in
tcp_twstart()
> 
>  Exactly, thus the current fix is:  If you already have the INP_DROPPED
> flag set you are not allowed to call tcp_twstart(), actually it is a
> good candidate for a new INVARIANT.  Let me add that.
> 
> > After check source code I am found invocation of tcp_twstart() in
> > sys/netinet/tcp_stacks/fastpath.c, sys/netinet/tcp_input.c,
> > sys/dev/cxgb/ulp/tom/cxgb_cpl_io.c, sys/dev/cxgbe/tom/t4_cpl_io.c.
> > 
> > Invocation from sys/netinet/tcp_stacks/fastpath.c and
> > sys/netinet/tcp_input.c guarded by INP_WLOCK in tcp_input(), and now
> > will be OK.
> > 
> > Invocation from sys/dev/cxgb/ulp/tom/cxgb_cpl_io.c and
> > sys/dev/cxgbe/tom/t4_cpl_io.c is not clear to me, I am see independed
> > INP_WLOCK. Is this OK?
> > 
> > Can be thread A wants do_peer_close() directed from chelsio IRQ
> > handler, bypass tcp_input()?
> 
>  If you look carefully INP_WLOCK is used in cxgb_cpl_io.c and
> t4_cpl_io.c before calling tcp_twstart().
Yes, and you remeber: sys/netinet/tcp_subr.c

  1535  struct tcpcb *
  1536  tcp_close(struct tcpcb *tp)
  1537  {
...
  1569                  INP_WUNLOCK(inp);
  1570                  ACCEPT_LOCK();
  1571                  SOCK_LOCK(so);
  1572                  so->so_state &= ~SS_PROTOREF;
  1573                  sofree(so);
  1574                  return (NULL);

sofree() call tcp_usr_detach() and in tcp_usr_detach() we have
unexpected INP_TIMEWAIT.

Julien Charbon

2016-Oct-12 09:19 UTC

head link

11.0 stuck on high network load

Hi Slawa,

On 10/12/16 10:40 AM, Slawa Olhovchenkov wrote:> On Wed, Oct 12, 2016 at 10:18:18AM +0200, Julien Charbon wrote:
>> On 10/11/16 2:11 PM, Slawa Olhovchenkov wrote:
>>> On Tue, Oct 11, 2016 at 09:20:17AM +0200, Julien Charbon wrote:
>>>>  Then threads are competing for the INP_WLOCK lock.  For the
example,
>>>> let's say the thread A wants to run
tcp_input()/in_pcblookup_mbuf() and
>>>> racing for this INP_WLOCK:
>>>>
>>>>
https://github.com/freebsd/freebsd/blob/release/11.0.0/sys/netinet/in_pcb.c#L1964
>>>>
>>>>  And thread B wants to run
tcp_timer_2msl()/tcp_close()/in_pcbdrop() and
>>>> racing for this INP_WLOCK:
>>>>
>>>>
https://github.com/freebsd/freebsd/blob/release/11.0.0/sys/netinet/tcp_timer.c#L323
>>>>
>>>>  That leads to two cases:
>>>>
>>>>  o Thread A wins the race:
>>>>
>>>>   Thread A will continue tcp_input() as usal and INP_DROPPED
flags is
>>>> not set and inp is still in TCP hash table.
>>>>   Thread B is waiting on thread A to release INP_WLOCK after
finishing
>>>> tcp_input() processing, and thread B will continue
>>>> tcp_timer_2msl()/tcp_close()/in_pcbdrop() processing.
>>>>
>>>>  o Thread B wins the race:
>>>>
>>>>   Thread B runs tcp_timer_2msl()/tcp_close()/in_pcbdrop() and
inp
>>>> INP_DROPPED is set and inp being removed from TCP hash table.
>>>>   In parallel, thread A has found the inp in TCP hash before is
was
>>>> removed, and waiting on the found inp INP_WLOCK lock.
>>>>   Once thread B has released the INP_WLOCK lock, thread A gets
this lock
>>>> and sees the INP_DROPPED flag and do "goto findpcb"
but here because the
>>>> inp is not more in TCP hash table and it will not be find again
by
>>>> in_pcblookup_mbuf().
>>>>
>>>>  Hopefully I am clear enough here.
>>>
>>> Thanks, very clear.
>>> Small qeustion: when both thread run on same CPU core, INP_WLOCK
will
>>> be re-schedule?
>>
>>  Hmm, a thread can re-scheduled but not a lock. Thus no sure I
>> understand your question here.  :)
> 
> I am don't know how work INP_WLOCK in this case (all on same cpu):
> 
> thread1: INP_WLOCK
> -interrupt--
> thread2: INP_WLOCK
> 
> if INP_WLOCK is like spinlock -- this is dead lock.
> if INP_WLOCK is like mutex -- thread1 resheduled.
 Thanks, I understand you question now.  No an interrupt cannot bypass a
lock:  Here INP_WLOCK is like mutex -- thread1 resheduled.
>>> As I remeber race created by call tcp_twstart() at time of end
>>> tcp_close(), at path sofree()-tcp_usr_detach() and unexpected
>>> INP_TIMEWAIT state in the tcp_usr_detach(). INP_TIMEWAIT set in
tcp_twstart()
>>
>>  Exactly, thus the current fix is:  If you already have the INP_DROPPED
>> flag set you are not allowed to call tcp_twstart(), actually it is a
>> good candidate for a new INVARIANT.  Let me add that.
>>
>>> After check source code I am found invocation of tcp_twstart() in
>>> sys/netinet/tcp_stacks/fastpath.c, sys/netinet/tcp_input.c,
>>> sys/dev/cxgb/ulp/tom/cxgb_cpl_io.c, sys/dev/cxgbe/tom/t4_cpl_io.c.
>>>
>>> Invocation from sys/netinet/tcp_stacks/fastpath.c and
>>> sys/netinet/tcp_input.c guarded by INP_WLOCK in tcp_input(), and
now
>>> will be OK.
>>>
>>> Invocation from sys/dev/cxgb/ulp/tom/cxgb_cpl_io.c and
>>> sys/dev/cxgbe/tom/t4_cpl_io.c is not clear to me, I am see
independed
>>> INP_WLOCK. Is this OK?
>>>
>>> Can be thread A wants do_peer_close() directed from chelsio IRQ
>>> handler, bypass tcp_input()?
>>
>>  If you look carefully INP_WLOCK is used in cxgb_cpl_io.c and
>> t4_cpl_io.c before calling tcp_twstart().
> 
> Yes, and you remeber: sys/netinet/tcp_subr.c
> 
>   1535  struct tcpcb *
>   1536  tcp_close(struct tcpcb *tp)
>   1537  {
> ...
>   1569                  INP_WUNLOCK(inp);
>   1570                  ACCEPT_LOCK();
>   1571                  SOCK_LOCK(so);
>   1572                  so->so_state &= ~SS_PROTOREF;
>   1573                  sofree(so);
>   1574                  return (NULL);
> 
> sofree() call tcp_usr_detach() and in tcp_usr_detach() we have
> unexpected INP_TIMEWAIT.
 I see, thus just for the context:  The TCP stack in sys/dev/cxgb* is a
TOE (TCP Offload Engine?) TCP stack for Chelsio NICs, it is a
separate/side TCP stack that is used only with TCP_OFFLOAD option.

 This TOE TCP stack actually has its own set of detach()/input()
functions and seems to check INP_DROPPED flag properly.  I guess @np
check fixes in socket TCP stack and decides which one can also impact
the Chelsio TOE TCP stack.  Some bugs are only in socket TCP stack, some
are only in TOE TCP stack.

--
Julien

-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 496 bytes
Desc: OpenPGP digital signature
URL:
<http://lists.freebsd.org/pipermail/freebsd-stable/attachments/20161012/656e3df0/attachment.sig>

Slawa Olhovchenkov

2016-Oct-12 09:29 UTC

head link

11.0 stuck on high network load

On Wed, Oct 12, 2016 at 11:19:48AM +0200, Julien Charbon wrote:
> > if INP_WLOCK is like spinlock -- this is dead lock.
> > if INP_WLOCK is like mutex -- thread1 resheduled.
> 
>  Thanks, I understand you question now.  No an interrupt cannot bypass a
> lock:  Here INP_WLOCK is like mutex -- thread1 resheduled.
Thanks, nice.
> >>> As I remeber race created by call tcp_twstart() at time of end
> >>> tcp_close(), at path sofree()-tcp_usr_detach() and unexpected
> >>> INP_TIMEWAIT state in the tcp_usr_detach(). INP_TIMEWAIT set
in tcp_twstart()
> >>
> >>  Exactly, thus the current fix is:  If you already have the
INP_DROPPED
> >> flag set you are not allowed to call tcp_twstart(), actually it is
a
> >> good candidate for a new INVARIANT.  Let me add that.
> >>
> >>> After check source code I am found invocation of tcp_twstart()
in
> >>> sys/netinet/tcp_stacks/fastpath.c, sys/netinet/tcp_input.c,
> >>> sys/dev/cxgb/ulp/tom/cxgb_cpl_io.c,
sys/dev/cxgbe/tom/t4_cpl_io.c.
> >>>
> >>> Invocation from sys/netinet/tcp_stacks/fastpath.c and
> >>> sys/netinet/tcp_input.c guarded by INP_WLOCK in tcp_input(),
and now
> >>> will be OK.
> >>>
> >>> Invocation from sys/dev/cxgb/ulp/tom/cxgb_cpl_io.c and
> >>> sys/dev/cxgbe/tom/t4_cpl_io.c is not clear to me, I am see
independed
> >>> INP_WLOCK. Is this OK?
> >>>
> >>> Can be thread A wants do_peer_close() directed from chelsio
IRQ
> >>> handler, bypass tcp_input()?
> >>
> >>  If you look carefully INP_WLOCK is used in cxgb_cpl_io.c and
> >> t4_cpl_io.c before calling tcp_twstart().
> > 
> > Yes, and you remeber: sys/netinet/tcp_subr.c
> > 
> >   1535  struct tcpcb *
> >   1536  tcp_close(struct tcpcb *tp)
> >   1537  {
> > ...
> >   1569                  INP_WUNLOCK(inp);
> >   1570                  ACCEPT_LOCK();
> >   1571                  SOCK_LOCK(so);
> >   1572                  so->so_state &= ~SS_PROTOREF;
> >   1573                  sofree(so);
> >   1574                  return (NULL);
> > 
> > sofree() call tcp_usr_detach() and in tcp_usr_detach() we have
> > unexpected INP_TIMEWAIT.
> 
>  I see, thus just for the context:  The TCP stack in sys/dev/cxgb* is a
> TOE (TCP Offload Engine?) TCP stack for Chelsio NICs, it is a
> separate/side TCP stack that is used only with TCP_OFFLOAD option.
> 
>  This TOE TCP stack actually has its own set of detach()/input()
> functions and seems to check INP_DROPPED flag properly.  I guess @np
> check fixes in socket TCP stack and decides which one can also impact
> the Chelsio TOE TCP stack.  Some bugs are only in socket TCP stack, some
> are only in TOE TCP stack.
I am fear about other direction -- setting INP_TIMEWAIT in Chelsio TOE
TCP stack and impact this to
tcp_timer_2msl()/tcp_close()/sofree()/tcp_usr_detach() path.

freebsd stable - Oct 2016 - 11.0 stuck on high network load

11.0 stuck on high network load

11.0 stuck on high network load

11.0 stuck on high network load