On Mon, Feb 15, 2016 at 05:02:36PM +0100, Giuseppe Lettieri wrote:
> On 15/02/2016 16:13, Slawa Olhovchenkov wrote:
> > On Mon, Feb 15, 2016 at 04:10:30PM +0100, Giuseppe Lettieri wrote:
> >
> >> Hi Slawa,
> >>
> >> I think WITNESS is seeing a false positive, since those two are always
> >> different mutexes.
> >>
> >> The actual deadlock you experience should be caused by something else. I
> >
> > Are you sure? When the deadlock occurs I see threads waiting on
> > nm_kn_lock.
>
> The deadlock I mentioned still involves nm_kn_locks, sorry if I was not
> clear about that. I am just saying that we never try to take the same
> lock that we are already holding.
>
> Nonetheless, there are indeed problems in the path that WITNESS has
> seen. The problem is that pipes have to notify the other end while
> called by kevent. kevent holds the nm_kn_lock on the TX src ring and the
> notification takes the nm_kn_lock on the RX dst ring.
Thanks for the clarification.
> >
> >> have not been able to reproduce it locally (I have not tried that hard,
> >> to be honest). I am pretty sure that there is a lock inversion - one
> >> that may cause real deadlocks - when you use netmap pipes+kqueue and you
> >> don't pass NETMAP_NO_TX_POLL at NIOCREGIF time. The attached patch
> >> should solve this particular problem, but there may be others. Could you
> >> please try it?
> >
> > Should I try it with or without WITNESS?
>
> I am trying to see if the actual deadlock disappears, so disable WITNESS
> if it slows down the system and masks the real deadlock. Otherwise,
> leave it on.
OK. Both with and without WITNESS I currently don't see the deadlock.
Just for the record, here are two LORs, which may already be well-known:
lock order reversal:
1st 0xfffffe0172c6fa78 bufwait (bufwait) @ /usr/src/sys/kern/vfs_bio.c:3130
2nd 0xfffff8005ca81000 dirhash (dirhash) @ /usr/src/sys/ufs/ufs/ufs_dirhash.c:280
KDB: stack backtrace:
#0 0xffffffff809702b0 at kdb_backtrace+0x60
#1 0xffffffff8098825e at witness_checkorder+0xc7e
#2 0xffffffff8093e137 at _sx_xlock+0x47
#3 0xffffffff80b75d6a at ufsdirhash_add+0x3a
#4 0xffffffff80b78b40 at ufs_direnter+0x6a0
#5 0xffffffff80b815ab at ufs_makeinode+0x56b
#6 0xffffffff80b7d5dd at ufs_create+0x2d
#7 0xffffffff80e33311 at VOP_CREATE_APV+0xa1
#8 0xffffffff809e2009 at vn_open_cred+0x3b9
#9 0xffffffff809db30f at kern_openat+0x26f
#10 0xffffffff80d0e8a4 at amd64_syscall+0x2d4
#11 0xffffffff80cf4f5b at Xfast_syscall+0xfb
lock order reversal:
1st 0xfffff80049138d50 ufs (ufs) @ /usr/src/sys/kern/vfs_subr.c:2415
2nd 0xfffffe0172cb1b80 bufwait (bufwait) @ /usr/src/sys/ufs/ffs/ffs_vnops.c:262
3rd 0xfffff800a6832d50 ufs (ufs) @ /usr/src/sys/kern/vfs_subr.c:2415
KDB: stack backtrace:
#0 0xffffffff809702b0 at kdb_backtrace+0x60
#1 0xffffffff8098825e at witness_checkorder+0xc7e
#2 0xffffffff80918dd8 at __lockmgr_args+0x738
#3 0xffffffff80b71594 at ffs_lock+0x84
#4 0xffffffff80e3512b at VOP_LOCK1_APV+0xab
#5 0xffffffff809e28f3 at _vn_lock+0x43
#6 0xffffffff809d42fb at vget+0x5b
#7 0xffffffff809c8c51 at vfs_hash_get+0xe1
#8 0xffffffff80b6d0a0 at ffs_vgetf+0x40
#9 0xffffffff80b64c50 at softdep_sync_buf+0x300
#10 0xffffffff80b72296 at ffs_syncvnode+0x226
#11 0xffffffff80b4b6b3 at ffs_truncate+0x683
#12 0xffffffff80b78c99 at ufs_direnter+0x7f9
#13 0xffffffff80b808eb at ufs_mkdir+0x86b
#14 0xffffffff80e34987 at VOP_MKDIR_APV+0xa7
#15 0xffffffff809dfca9 at kern_mkdirat+0x209
#16 0xffffffff80d0e8a4 at amd64_syscall+0x2d4
#17 0xffffffff80cf4f5b at Xfast_syscall+0xfb
> >
> >> Cheers,
> >> Giuseppe
> >>
> >> On 11/02/2016 14:34, Slawa Olhovchenkov wrote:
> >>> On Thu, Feb 11, 2016 at 10:11:59AM +0100, Giuseppe Lettieri wrote:
> >>>
> >>>> On 10/02/2016 14:53, Slawa Olhovchenkov wrote:
> >>>>> On Wed, Feb 10, 2016 at 02:33:20PM +0100, Giuseppe Lettieri wrote:
> >>>>>
> >>>>>> On 10/02/2016 12:59, Slawa Olhovchenkov wrote:
> >>>>>>> Can you also look at the second issue?
> >>>>>>>
> >>>>>>> PS: What do you need from me? Maybe I should open a PR?
> >>>>>>
> >>>>>> Could you provide some example code that triggers the issue?
> >>>>>
> >>>>> This is about 700 lines of code (not very clear); maybe I can
> >>>>> describe it instead?
> >>>>
> >>>> I just need some code to trigger the problem locally. Don't worry about
> >>>> the clarity and the line count, unless you cannot share the code for
> >>>> other reasons.
> >>>
> >>> I have attached the source.
> >>> Run it as "prog if1 if2".
> >>> You get `acquiring duplicate lock of same type: "nm_kn_lock"`
> >>> immediately after start.
> >>> The deadlock may occur immediately after start, or it may require
> >>> traffic flooding.
> >>>
> >>
> >>
> >> --
> >> Dr. Ing. Giuseppe Lettieri
> >> Dipartimento di Ingegneria della Informazione
> >> Universita' di Pisa
> >> Largo Lucio Lazzarino 1, 56122 Pisa - Italy
> >> Ph. : (+39) 050-2217.649 (direct) .599 (switch)
> >> Fax : (+39) 050-2217.600
> >> e-mail: g.lettieri at iet.unipi.it
> >
> >> Index: dev/netmap/netmap.c
> >> ===================================================================
> >> --- dev/netmap/netmap.c	(revision 287671)
> >> +++ dev/netmap/netmap.c	(working copy)
> >> @@ -2378,7 +2378,7 @@
> >> * XXX should also check cur != hwcur on the tx rings.
> >> * Fortunately, normal tx mode has np_txpoll set.
> >> */
> >> - if (priv->np_txpoll || want_tx) {
> >> + if ((priv->np_txpoll && !is_kevent) || want_tx) {
> >> /*
> >> * The first round checks if anyone is ready, if not
> >> * do a selrecord and another round to handle races.
> >
>
>