On Wed, Oct 12, 2016 at 10:18:18AM +0200, Julien Charbon wrote:> > Hi Slawa, > > On 10/11/16 2:11 PM, Slawa Olhovchenkov wrote: > > On Tue, Oct 11, 2016 at 09:20:17AM +0200, Julien Charbon wrote: > >> Then threads are competing for the INP_WLOCK lock. For the example, > >> let's say the thread A wants to run tcp_input()/in_pcblookup_mbuf() and > >> racing for this INP_WLOCK: > >> > >> https://github.com/freebsd/freebsd/blob/release/11.0.0/sys/netinet/in_pcb.c#L1964 > >> > >> And thread B wants to run tcp_timer_2msl()/tcp_close()/in_pcbdrop() and > >> racing for this INP_WLOCK: > >> > >> https://github.com/freebsd/freebsd/blob/release/11.0.0/sys/netinet/tcp_timer.c#L323 > >> > >> That leads to two cases: > >> > >> o Thread A wins the race: > >> > >> Thread A will continue tcp_input() as usal and INP_DROPPED flags is > >> not set and inp is still in TCP hash table. > >> Thread B is waiting on thread A to release INP_WLOCK after finishing > >> tcp_input() processing, and thread B will continue > >> tcp_timer_2msl()/tcp_close()/in_pcbdrop() processing. > >> > >> o Thread B wins the race: > >> > >> Thread B runs tcp_timer_2msl()/tcp_close()/in_pcbdrop() and inp > >> INP_DROPPED is set and inp being removed from TCP hash table. > >> In parallel, thread A has found the inp in TCP hash before is was > >> removed, and waiting on the found inp INP_WLOCK lock. > >> Once thread B has released the INP_WLOCK lock, thread A gets this lock > >> and sees the INP_DROPPED flag and do "goto findpcb" but here because the > >> inp is not more in TCP hash table and it will not be find again by > >> in_pcblookup_mbuf(). > >> > >> Hopefully I am clear enough here. > > > > Thanks, very clear. > > Small qeustion: when both thread run on same CPU core, INP_WLOCK will > > be re-schedule? > > Hmm, a thread can re-scheduled but not a lock. Thus no sure I > understand your question here. :)I am don't know how work INP_WLOCK in this case (all on same cpu): thread1: INP_WLOCK -interrupt-- thread2: INP_WLOCK if INP_WLOCK is like spinlock -- this is dead lock. if INP_WLOCK is like mutex -- thread1 resheduled.> > As I remeber race created by call tcp_twstart() at time of end > > tcp_close(), at path sofree()-tcp_usr_detach() and unexpected > > INP_TIMEWAIT state in the tcp_usr_detach(). INP_TIMEWAIT set in tcp_twstart() > > Exactly, thus the current fix is: If you already have the INP_DROPPED > flag set you are not allowed to call tcp_twstart(), actually it is a > good candidate for a new INVARIANT. Let me add that. > > > After check source code I am found invocation of tcp_twstart() in > > sys/netinet/tcp_stacks/fastpath.c, sys/netinet/tcp_input.c, > > sys/dev/cxgb/ulp/tom/cxgb_cpl_io.c, sys/dev/cxgbe/tom/t4_cpl_io.c. > > > > Invocation from sys/netinet/tcp_stacks/fastpath.c and > > sys/netinet/tcp_input.c guarded by INP_WLOCK in tcp_input(), and now > > will be OK. > > > > Invocation from sys/dev/cxgb/ulp/tom/cxgb_cpl_io.c and > > sys/dev/cxgbe/tom/t4_cpl_io.c is not clear to me, I am see independed > > INP_WLOCK. Is this OK? > > > > Can be thread A wants do_peer_close() directed from chelsio IRQ > > handler, bypass tcp_input()? > > If you look carefully INP_WLOCK is used in cxgb_cpl_io.c and > t4_cpl_io.c before calling tcp_twstart().Yes, and you remeber: sys/netinet/tcp_subr.c 1535 struct tcpcb * 1536 tcp_close(struct tcpcb *tp) 1537 { ... 1569 INP_WUNLOCK(inp); 1570 ACCEPT_LOCK(); 1571 SOCK_LOCK(so); 1572 so->so_state &= ~SS_PROTOREF; 1573 sofree(so); 1574 return (NULL); sofree() call tcp_usr_detach() and in tcp_usr_detach() we have unexpected INP_TIMEWAIT.
Hi Slawa, On 10/12/16 10:40 AM, Slawa Olhovchenkov wrote:> On Wed, Oct 12, 2016 at 10:18:18AM +0200, Julien Charbon wrote: >> On 10/11/16 2:11 PM, Slawa Olhovchenkov wrote: >>> On Tue, Oct 11, 2016 at 09:20:17AM +0200, Julien Charbon wrote: >>>> Then threads are competing for the INP_WLOCK lock. For the example, >>>> let's say the thread A wants to run tcp_input()/in_pcblookup_mbuf() and >>>> racing for this INP_WLOCK: >>>> >>>> https://github.com/freebsd/freebsd/blob/release/11.0.0/sys/netinet/in_pcb.c#L1964 >>>> >>>> And thread B wants to run tcp_timer_2msl()/tcp_close()/in_pcbdrop() and >>>> racing for this INP_WLOCK: >>>> >>>> https://github.com/freebsd/freebsd/blob/release/11.0.0/sys/netinet/tcp_timer.c#L323 >>>> >>>> That leads to two cases: >>>> >>>> o Thread A wins the race: >>>> >>>> Thread A will continue tcp_input() as usal and INP_DROPPED flags is >>>> not set and inp is still in TCP hash table. >>>> Thread B is waiting on thread A to release INP_WLOCK after finishing >>>> tcp_input() processing, and thread B will continue >>>> tcp_timer_2msl()/tcp_close()/in_pcbdrop() processing. >>>> >>>> o Thread B wins the race: >>>> >>>> Thread B runs tcp_timer_2msl()/tcp_close()/in_pcbdrop() and inp >>>> INP_DROPPED is set and inp being removed from TCP hash table. >>>> In parallel, thread A has found the inp in TCP hash before is was >>>> removed, and waiting on the found inp INP_WLOCK lock. >>>> Once thread B has released the INP_WLOCK lock, thread A gets this lock >>>> and sees the INP_DROPPED flag and do "goto findpcb" but here because the >>>> inp is not more in TCP hash table and it will not be find again by >>>> in_pcblookup_mbuf(). >>>> >>>> Hopefully I am clear enough here. >>> >>> Thanks, very clear. >>> Small qeustion: when both thread run on same CPU core, INP_WLOCK will >>> be re-schedule? >> >> Hmm, a thread can re-scheduled but not a lock. Thus no sure I >> understand your question here. :) > > I am don't know how work INP_WLOCK in this case (all on same cpu): > > thread1: INP_WLOCK > -interrupt-- > thread2: INP_WLOCK > > if INP_WLOCK is like spinlock -- this is dead lock. > if INP_WLOCK is like mutex -- thread1 resheduled.Thanks, I understand you question now. No an interrupt cannot bypass a lock: Here INP_WLOCK is like mutex -- thread1 resheduled.>>> As I remeber race created by call tcp_twstart() at time of end >>> tcp_close(), at path sofree()-tcp_usr_detach() and unexpected >>> INP_TIMEWAIT state in the tcp_usr_detach(). INP_TIMEWAIT set in tcp_twstart() >> >> Exactly, thus the current fix is: If you already have the INP_DROPPED >> flag set you are not allowed to call tcp_twstart(), actually it is a >> good candidate for a new INVARIANT. Let me add that. >> >>> After check source code I am found invocation of tcp_twstart() in >>> sys/netinet/tcp_stacks/fastpath.c, sys/netinet/tcp_input.c, >>> sys/dev/cxgb/ulp/tom/cxgb_cpl_io.c, sys/dev/cxgbe/tom/t4_cpl_io.c. >>> >>> Invocation from sys/netinet/tcp_stacks/fastpath.c and >>> sys/netinet/tcp_input.c guarded by INP_WLOCK in tcp_input(), and now >>> will be OK. >>> >>> Invocation from sys/dev/cxgb/ulp/tom/cxgb_cpl_io.c and >>> sys/dev/cxgbe/tom/t4_cpl_io.c is not clear to me, I am see independed >>> INP_WLOCK. Is this OK? >>> >>> Can be thread A wants do_peer_close() directed from chelsio IRQ >>> handler, bypass tcp_input()? >> >> If you look carefully INP_WLOCK is used in cxgb_cpl_io.c and >> t4_cpl_io.c before calling tcp_twstart(). > > Yes, and you remeber: sys/netinet/tcp_subr.c > > 1535 struct tcpcb * > 1536 tcp_close(struct tcpcb *tp) > 1537 { > ... > 1569 INP_WUNLOCK(inp); > 1570 ACCEPT_LOCK(); > 1571 SOCK_LOCK(so); > 1572 so->so_state &= ~SS_PROTOREF; > 1573 sofree(so); > 1574 return (NULL); > > sofree() call tcp_usr_detach() and in tcp_usr_detach() we have > unexpected INP_TIMEWAIT.I see, thus just for the context: The TCP stack in sys/dev/cxgb* is a TOE (TCP Offload Engine?) TCP stack for Chelsio NICs, it is a separate/side TCP stack that is used only with TCP_OFFLOAD option. This TOE TCP stack actually has its own set of detach()/input() functions and seems to check INP_DROPPED flag properly. I guess @np check fixes in socket TCP stack and decides which one can also impact the Chelsio TOE TCP stack. Some bugs are only in socket TCP stack, some are only in TOE TCP stack. -- Julien -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 496 bytes Desc: OpenPGP digital signature URL: <http://lists.freebsd.org/pipermail/freebsd-stable/attachments/20161012/656e3df0/attachment.sig>
On Wed, Oct 12, 2016 at 11:19:48AM +0200, Julien Charbon wrote:> > if INP_WLOCK is like spinlock -- this is dead lock. > > if INP_WLOCK is like mutex -- thread1 resheduled. > > Thanks, I understand you question now. No an interrupt cannot bypass a > lock: Here INP_WLOCK is like mutex -- thread1 resheduled.Thanks, nice.> >>> As I remeber race created by call tcp_twstart() at time of end > >>> tcp_close(), at path sofree()-tcp_usr_detach() and unexpected > >>> INP_TIMEWAIT state in the tcp_usr_detach(). INP_TIMEWAIT set in tcp_twstart() > >> > >> Exactly, thus the current fix is: If you already have the INP_DROPPED > >> flag set you are not allowed to call tcp_twstart(), actually it is a > >> good candidate for a new INVARIANT. Let me add that. > >> > >>> After check source code I am found invocation of tcp_twstart() in > >>> sys/netinet/tcp_stacks/fastpath.c, sys/netinet/tcp_input.c, > >>> sys/dev/cxgb/ulp/tom/cxgb_cpl_io.c, sys/dev/cxgbe/tom/t4_cpl_io.c. > >>> > >>> Invocation from sys/netinet/tcp_stacks/fastpath.c and > >>> sys/netinet/tcp_input.c guarded by INP_WLOCK in tcp_input(), and now > >>> will be OK. > >>> > >>> Invocation from sys/dev/cxgb/ulp/tom/cxgb_cpl_io.c and > >>> sys/dev/cxgbe/tom/t4_cpl_io.c is not clear to me, I am see independed > >>> INP_WLOCK. Is this OK? > >>> > >>> Can be thread A wants do_peer_close() directed from chelsio IRQ > >>> handler, bypass tcp_input()? > >> > >> If you look carefully INP_WLOCK is used in cxgb_cpl_io.c and > >> t4_cpl_io.c before calling tcp_twstart(). > > > > Yes, and you remeber: sys/netinet/tcp_subr.c > > > > 1535 struct tcpcb * > > 1536 tcp_close(struct tcpcb *tp) > > 1537 { > > ... > > 1569 INP_WUNLOCK(inp); > > 1570 ACCEPT_LOCK(); > > 1571 SOCK_LOCK(so); > > 1572 so->so_state &= ~SS_PROTOREF; > > 1573 sofree(so); > > 1574 return (NULL); > > > > sofree() call tcp_usr_detach() and in tcp_usr_detach() we have > > unexpected INP_TIMEWAIT. > > I see, thus just for the context: The TCP stack in sys/dev/cxgb* is a > TOE (TCP Offload Engine?) TCP stack for Chelsio NICs, it is a > separate/side TCP stack that is used only with TCP_OFFLOAD option. > > This TOE TCP stack actually has its own set of detach()/input() > functions and seems to check INP_DROPPED flag properly. I guess @np > check fixes in socket TCP stack and decides which one can also impact > the Chelsio TOE TCP stack. Some bugs are only in socket TCP stack, some > are only in TOE TCP stack.I am fear about other direction -- setting INP_TIMEWAIT in Chelsio TOE TCP stack and impact this to tcp_timer_2msl()/tcp_close()/sofree()/tcp_usr_detach() path.