Waiman Long
2014-Mar-12 19:08 UTC
[PATCH v6 04/11] qspinlock: Optimized code path for 2 contending tasks
On 03/12/2014 02:54 PM, Waiman Long wrote:
> +
> +	/*
> +	 * Now wait until the lock bit is cleared
> +	 */
> +	while (smp_load_acquire(&qlock->qlcode) & _QSPINLOCK_LOCKED)
> +		arch_mutex_cpu_relax();
> +
> +	/*
> +	 * Set the lock bit & clear the waiting bit simultaneously
> +	 * It is assumed that there is no lock stealing with this
> +	 * quick path active.
> +	 *
> +	 * A direct memory store of _QSPINLOCK_LOCKED into the
> +	 * lock_wait field causes problem with the lockref code, e.g.
> +	 *	ACCESS_ONCE(qlock->lock_wait) = _QSPINLOCK_LOCKED;
> +	 *
> +	 * It is not currently clear why this happens. A workaround
> +	 * is to use atomic instruction to store the new value.
> +	 */
> +	{
> +		u16 lw = xchg(&qlock->lock_wait, _QSPINLOCK_LOCKED);
> +		BUG_ON(lw != _QSPINLOCK_WAITING);
> +	}
> +	return 1;
>

It was found that when I used a direct memory store instead of an atomic op,
the following kernel crash might happen at filesystem dismount time:

Red Hat Enterprise Linux Server 7.0 (Maipo)
Kernel 3.14.0-rc6-qlock on an x86_64

h11-kvm20 login: [ 1529.934047] BUG: Dentry ffff883f4c048480{i=30181e9e,n=libopcodes-2.23.52.0.1-15.el7.so} still in use (-1) [unmount of xfs dm-1]
[ 1529.935762] ------------[ cut here ]------------
[ 1529.936331] kernel BUG at fs/dcache.c:1343!
[ 1529.936714] invalid opcode: 0000 [#1] SMP
[ 1529.936714] Modules linked in: ext4 mbcache jbd2 binfmt_misc brd ip6t_rpfilter cfg80211 ip6t_REJECT rfkill ipt_REJECT xt_conntrack ebtable_nat ebtable_broute bridge stp llc ebtable_filter ebtables ip6table_nat nf_conntrack_ipv6 nf_defrag_ipv6 nf_nat_ipv6 ip6table_mangle ip6table_security ip6table_raw ip6table_filter ip6_tables iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack iptable_mangle iptable_security iptable_raw iptable_filter ip_tables sg ppdev snd_hda_codec_generic snd_hda_intel snd_hda_codec snd_hwdep snd_seq snd_seq_device snd_pcm snd_timer snd parport_pc parport soundcore serio_raw i2c_piix4 virtio_console virtio_balloon microcode pcspkr nfsd auth_rpcgss nfs_acl lockd sunrpc uinput xfs libcrc32c sr_mod cdrom ata_generic pata_acpi qxl virtio_blk virtio_net drm_kms_helper ttm drm ata_piix libata virtio_pci virtio_ring floppy i2c_core virtio dm_mirror dm_region_hash dm_log dm_mod
[ 1529.936714] CPU: 12 PID: 11106 Comm: umount Not tainted 3.14.0-rc6-qlock #1
[ 1529.936714] Hardware name: Red Hat KVM, BIOS Bochs 01/01/2011
[ 1529.936714] task: ffff881f9183b540 ti: ffff881f920fa000 task.ti: ffff881f920fa000
[ 1529.936714] RIP: 0010:[<ffffffff811c185c>]  [<ffffffff811c185c>] umount_collect+0xec/0x110
[ 1529.936714] RSP: 0018:ffff881f920fbdc8  EFLAGS: 00010282
[ 1529.936714] RAX: 0000000000000073 RBX: ffff883f4c048480 RCX: 0000000000000000
[ 1529.936714] RDX: 0000000000000001 RSI: 0000000000000046 RDI: 0000000000000246
[ 1529.936714] RBP: ffff881f920fbde0 R08: ffffffff819e42e0 R09: 0000000000000396
[ 1529.936714] R10: 0000000000000000 R11: ffff881f920fbb06 R12: ffff881f920fbe60
[ 1529.936714] R13: ffff883f8d458460 R14: ffff883f4c048480 R15: ffff883f8d4583c0
[ 1529.936714] FS:  00007f6027b0c880(0000) GS:ffff88403fc40000(0000) knlGS:0000000000000000
[ 1529.936714] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[ 1529.936714] CR2: 00007f60276c4900 CR3: 0000003f421c0000 CR4: 00000000000006e0
[ 1529.936714] Stack:
[ 1529.936714]  ffff883f8edf4ac8 ffff883f4c048510 ffff883f910a02d0 ffff881f920fbe50
[ 1529.936714]  ffffffff811c2d03 0000000000000000 00ff881f920fbe50 0000896600000000
[ 1529.936714]  ffff883f8d4587d8 ffff883f8d458780 ffffffff811c1770 ffff881f920fbe60
[ 1529.936714] Call Trace:
[ 1529.936714]  [<ffffffff811c2d03>] d_walk+0xc3/0x260
[ 1529.936714]  [<ffffffff811c1770>] ? check_and_collect+0x30/0x30
[ 1529.936714]  [<ffffffff811c3985>] shrink_dcache_for_umount+0x75/0x120
[ 1529.936714]  [<ffffffff811adf21>] generic_shutdown_super+0x21/0xf0
[ 1529.936714]  [<ffffffff811ae207>] kill_block_super+0x27/0x70
[ 1529.936714]  [<ffffffff811ae4ed>] deactivate_locked_super+0x3d/0x60
[ 1529.936714]  [<ffffffff811aea96>] deactivate_super+0x46/0x60
[ 1529.936714]  [<ffffffff811ca277>] mntput_no_expire+0xa7/0x140
[ 1529.936714]  [<ffffffff811cb6ce>] SyS_umount+0x8e/0x100
[ 1529.936714]  [<ffffffff815d2c29>] system_call_fastpath+0x16/0x1b
[ 1529.936714] Code: 00 00 48 8b 40 28 4c 8b 08 48 8b 43 30 48 85 c0 74 2a 48 8b 50 40 48 89 34 24 48 c7 c7 e0 4a 7f 81 48 89 de 31 c0 e8 03 cb 3f 00 <0f> 0b 66 90 48 89 f7 e8 c8 fc ff ff e9 66 ff ff ff 31 d2 90 eb
[ 1529.936714] RIP  [<ffffffff811c185c>] umount_collect+0xec/0x110
[ 1529.936714]  RSP <ffff881f920fbdc8>
[ 1529.976523] ---[ end trace 6c8ce7cee0969bbb ]---
[ 1529.977137] Kernel panic - not syncing: Fatal exception
[ 1529.978119] Kernel Offset: 0x0 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffff9fffffff)
[ 1529.978119] drm_kms_helper: panic occurred, switching back to text console

It was more readily reproducible in a KVM guest. It was harder to reproduce
on a bare metal machine, but the kernel crash still happened after several
tries.

I am not sure what exactly causes this crash, but it has something to do
with the interaction between the lockref and the qspinlock code. I would
like more eyes on that to find the root cause of it.

-Longman
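[Editor's note] For readers without the full patch in front of them, the quick path under discussion boils down to the following self-contained sketch. The field and constant names come from the quoted patch, but the union layout, the header choices, and the exact value of _QSPINLOCK_WAITING (waiting bit in the second byte of the lock word) are assumptions made for illustration; this is not the posted patch itself.

/* Kernel-context sketch; headers are indicative of a 3.14-era x86 build. */
#include <linux/types.h>	/* u16, u32                          */
#include <linux/compiler.h>	/* ACCESS_ONCE()                     */
#include <linux/mutex.h>	/* arch_mutex_cpu_relax()            */
#include <linux/bug.h>		/* BUG_ON()                          */
#include <asm/barrier.h>	/* smp_load_acquire()                */
#include <asm/cmpxchg.h>	/* xchg()                            */

#define _QSPINLOCK_LOCKED	1U	/* lock bit, byte 0 (assumed layout)    */
#define _QSPINLOCK_WAITING	0x100U	/* waiting bit, byte 1 (assumed layout) */

/* Assumed overlay of the 32-bit lock word on x86 (little-endian). */
union arch_qspinlock {
	u32 qlcode;	/* complete lock word                */
	u16 lock_wait;	/* lock byte + waiting byte together */
};

static inline int quick_path_take_lock(union arch_qspinlock *qlock)
{
	u16 lw;

	/* Spin until the current owner drops the lock bit. */
	while (smp_load_acquire(&qlock->qlcode) & _QSPINLOCK_LOCKED)
		arch_mutex_cpu_relax();

	/*
	 * Variant that produced the dcache crash above: a direct store
	 * over both bytes, setting the lock bit and clearing the
	 * waiting bit in one shot:
	 *
	 *	ACCESS_ONCE(qlock->lock_wait) = _QSPINLOCK_LOCKED;
	 *
	 * Workaround used in the patch: make the same waiting -> locked
	 * transition with an atomic exchange, and sanity-check that we
	 * really were the designated waiter.
	 */
	lw = xchg(&qlock->lock_wait, _QSPINLOCK_LOCKED);
	BUG_ON(lw != _QSPINLOCK_WAITING);
	return 1;
}

Semantically the commented-out store and the xchg perform the same transition, which is why the need for the atomic instruction was surprising; the rest of the thread is about why the store variant misbehaves once lockref starts doing cmpxchg operations on the same word.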
Peter Zijlstra
2014-Mar-13 13:57 UTC
[PATCH v6 04/11] qspinlock: Optimized code path for 2 contending tasks
On Wed, Mar 12, 2014 at 03:08:24PM -0400, Waiman Long wrote:
> On 03/12/2014 02:54 PM, Waiman Long wrote:
> >+	/*
> >+	 * Set the lock bit & clear the waiting bit simultaneously
> >+	 * It is assumed that there is no lock stealing with this
> >+	 * quick path active.
> >+	 *
> >+	 * A direct memory store of _QSPINLOCK_LOCKED into the
> >+	 * lock_wait field causes problem with the lockref code, e.g.
> >+	 *	ACCESS_ONCE(qlock->lock_wait) = _QSPINLOCK_LOCKED;
> >+	 *
> >+	 * It is not currently clear why this happens. A workaround
> >+	 * is to use atomic instruction to store the new value.
> >+	 */
> >+	{
> >+		u16 lw = xchg(&qlock->lock_wait, _QSPINLOCK_LOCKED);
> >+		BUG_ON(lw != _QSPINLOCK_WAITING);
> >+	}
>
> It was found that when I used a direct memory store instead of an atomic op,
> the following kernel crash might happen at filesystem dismount time:
>
> [ 1529.936714] Call Trace:
> [ 1529.936714]  [<ffffffff811c2d03>] d_walk+0xc3/0x260
> [ 1529.936714]  [<ffffffff811c1770>] ? check_and_collect+0x30/0x30
> [ 1529.936714]  [<ffffffff811c3985>] shrink_dcache_for_umount+0x75/0x120
> [ 1529.936714]  [<ffffffff811adf21>] generic_shutdown_super+0x21/0xf0
> [ 1529.936714]  [<ffffffff811ae207>] kill_block_super+0x27/0x70
> [ 1529.936714]  [<ffffffff811ae4ed>] deactivate_locked_super+0x3d/0x60
> [ 1529.936714]  [<ffffffff811aea96>] deactivate_super+0x46/0x60
> [ 1529.936714]  [<ffffffff811ca277>] mntput_no_expire+0xa7/0x140
> [ 1529.936714]  [<ffffffff811cb6ce>] SyS_umount+0x8e/0x100
> [ 1529.936714]  [<ffffffff815d2c29>] system_call_fastpath+0x16/0x1b
>
> It was more readily reproducible in a KVM guest. It was harder to reproduce
> on a bare metal machine, but the kernel crash still happened after several
> tries.
>
> I am not sure what exactly causes this crash, but it has something to do
> with the interaction between the lockref and the qspinlock code. I would
> like more eyes on that to find the root cause of it.

I cannot reproduce with my series that has the one word write.

What I did was I made my swap partition (who needs that anyway on a
machine with 16G of memory) into an XFS partition.

Then I copied my linux.git onto it and unmounted.

I'll try a few more times; the above trace seems to suggest it happens
during dcache cleanup, so I suppose I should read the filesystem some
and unmount again.

Is there anything specific you did to make it go bang?
Waiman Long
2014-Mar-17 17:23 UTC
[PATCH v6 04/11] qspinlock: Optimized code path for 2 contending tasks
On 03/13/2014 09:57 AM, Peter Zijlstra wrote:
> On Wed, Mar 12, 2014 at 03:08:24PM -0400, Waiman Long wrote:
>> On 03/12/2014 02:54 PM, Waiman Long wrote:
>>> +	/*
>>> +	 * Set the lock bit & clear the waiting bit simultaneously
>>> +	 * It is assumed that there is no lock stealing with this
>>> +	 * quick path active.
>>> +	 *
>>> +	 * A direct memory store of _QSPINLOCK_LOCKED into the
>>> +	 * lock_wait field causes problem with the lockref code, e.g.
>>> +	 *	ACCESS_ONCE(qlock->lock_wait) = _QSPINLOCK_LOCKED;
>>> +	 *
>>> +	 * It is not currently clear why this happens. A workaround
>>> +	 * is to use atomic instruction to store the new value.
>>> +	 */
>>> +	{
>>> +		u16 lw = xchg(&qlock->lock_wait, _QSPINLOCK_LOCKED);
>>> +		BUG_ON(lw != _QSPINLOCK_WAITING);
>>> +	}
>> It was found that when I used a direct memory store instead of an atomic op,
>> the following kernel crash might happen at filesystem dismount time:
>>
>> [ 1529.936714] Call Trace:
>> [ 1529.936714]  [<ffffffff811c2d03>] d_walk+0xc3/0x260
>> [ 1529.936714]  [<ffffffff811c1770>] ? check_and_collect+0x30/0x30
>> [ 1529.936714]  [<ffffffff811c3985>] shrink_dcache_for_umount+0x75/0x120
>> [ 1529.936714]  [<ffffffff811adf21>] generic_shutdown_super+0x21/0xf0
>> [ 1529.936714]  [<ffffffff811ae207>] kill_block_super+0x27/0x70
>> [ 1529.936714]  [<ffffffff811ae4ed>] deactivate_locked_super+0x3d/0x60
>> [ 1529.936714]  [<ffffffff811aea96>] deactivate_super+0x46/0x60
>> [ 1529.936714]  [<ffffffff811ca277>] mntput_no_expire+0xa7/0x140
>> [ 1529.936714]  [<ffffffff811cb6ce>] SyS_umount+0x8e/0x100
>> [ 1529.936714]  [<ffffffff815d2c29>] system_call_fastpath+0x16/0x1b
>> It was more readily reproducible in a KVM guest. It was harder to reproduce
>> on a bare metal machine, but the kernel crash still happened after several
>> tries.
>>
>> I am not sure what exactly causes this crash, but it has something to do
>> with the interaction between the lockref and the qspinlock code. I would
>> like more eyes on that to find the root cause of it.
> I cannot reproduce with my series that has the one word write.
>
> What I did was I made my swap partition (who needs that anyway on a
> machine with 16G of memory) into an XFS partition.
>
> Then I copied my linux.git onto it and unmounted.
>
> I'll try a few more times; the above trace seems to suggest it happens
> during dcache cleanup, so I suppose I should read the filesystem some
> and unmount again.
>
> Is there anything specific you did to make it go bang?

I have found the reason for the crash: it has to do with my original
definition of the queue_spin_value_unlocked() function. When I extended it
to cover the first 2 bytes of the lock word (the lock bit and the waiting
bit), the problem went away.

-Longman
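[Editor's note] To make that fix concrete, here is a hedged sketch of what the change amounts to. The function name and the atomic_t qlcode field follow the patch series, but the masks, the bit layout, and the "old" definition are reconstructed from this thread rather than taken from the posted code.

#include <linux/types.h>	/* bool                      */
#include <linux/atomic.h>	/* atomic_t, atomic_read()   */

#define _QSPINLOCK_LOCKED	1U	/* lock bit, byte 0 (assumed)    */
#define _QSPINLOCK_WAITING	0x100U	/* waiting bit, byte 1 (assumed) */
#define _QSPINLOCK_LW_MASK	0xffffU	/* first 2 bytes: lock + waiting */

struct qspinlock {
	atomic_t qlcode;	/* lock byte, waiting byte, queue code */
};

/*
 * Presumed original definition: only the lock bit is examined, so a
 * lock whose waiting bit is already set still reports "unlocked".
 * lockref's cmpxchg-based fast path relies on this predicate, which
 * is the suspected interaction behind the dcache crash above.
 */
static inline bool queue_spin_value_unlocked_old(struct qspinlock lock)
{
	return !(atomic_read(&lock.qlcode) & _QSPINLOCK_LOCKED);
}

/*
 * Fixed definition per the message above: the lock only counts as
 * unlocked when the first two bytes (lock bit and waiting bit) are
 * both clear.
 */
static inline bool queue_spin_value_unlocked(struct qspinlock lock)
{
	return !(atomic_read(&lock.qlcode) & _QSPINLOCK_LW_MASK);
}

If that reconstruction is right, the wider check stops lockref from treating a lock with a spinning waiter as free, which fits the lockref/qspinlock interaction suspected earlier in the thread.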