Joseph Qi
2015-Dec-23 09:31 UTC
[Ocfs2-devel] [PATCH] NFS hangs in __ocfs2_cluster_lock due to race with ocfs2_unblock_lock
Hi Tariq,

As I know, the global bitmap inode mutex is held when doing allocation.
So how can it happen that EX and PR requests come concurrently on the
same node?

Thanks,
Joseph

On 2015/8/26 4:55, Tariq Saeed wrote:
> Orabug: 20933419
>
> NFS on a two-node ocfs2 cluster, each node exporting a dir. The lock causing
> the hang is the global bitmap inode lock. Node 1 is master and has
> the lock granted in PR mode; node 2 is in the converting list (PR ->
> EX). There are no holders of the lock on the master node, so it should
> downconvert to NL and grant EX to node 2, but that does not happen.
> BLOCKED + QUEUED are set in the lock res and it is on the osb blocked list.
> Threads are waiting in __ocfs2_cluster_lock on BLOCKED. One thread wants
> EX, the rest want PR. So it is as though the downconvert thread needs to be
> kicked to complete the conversion.
>
> The hang is caused by an EX req coming into __ocfs2_cluster_lock on
> the heels of a PR req after it sets BUSY (drops l_lock, releasing the EX
> thread), forcing the incoming EX to wait on BUSY without doing anything.
> PR has called ocfs2_dlm_lock, which converts the node 1 lock from NL ->
> PR and queues an ast.
>
> At this time, an upconvert (PR -> EX) arrives from node 2 and finds a conflict
> with the node 1 lock in PR, so the lock res is put on the dlm thread's dirty list.
>
> After returning from ocfs2_dlm_lock, the PR thread now waits behind EX on BUSY
> till awoken by the ast.
>
> Now it is the dlm thread that serially runs dlm_shuffle_lists, ast, bast,
> in that order. dlm_shuffle_lists queues a bast on behalf of node 2
> (which will be run by the dlm thread right after the ast). The ast does its
> part, sets UPCONVERT_FINISHING, clears BUSY and wakes its waiters. Next,
> the dlm thread runs the bast. It sets BLOCKED and kicks the dc thread. The dc
> thread runs ocfs2_unblock_lock, but since UPCONVERT_FINISHING is set, it skips
> doing anything and requeues.
>
> Inside __ocfs2_cluster_lock, since EX has been waiting on BUSY ahead
> of PR, it wakes up first, finds BLOCKED set and skips doing anything
> but clearing UPCONVERT_FINISHING (which was actually "meant" for the
> PR thread), and this time waits on BLOCKED. Next, the PR thread comes
> out of its wait, but since UPCONVERT_FINISHING is not set, it skips updating
> l_ro_holders and goes straight to waiting on BLOCKED. So there, we
> have a hang! Threads in __ocfs2_cluster_lock wait on BLOCKED, and the lock
> res is on the osb blocked list. Only when the dc thread is awoken will it run
> ocfs2_unblock_lock and things will unhang.
>
> One way to fix this is to wake the dc thread on the flag after clearing
> UPCONVERT_FINISHING.
>
> Signed-off-by: Tariq Saeed <tariq.x.saeed at oracle.com>
> Reviewed-by: Wengang Wang <wen.gang.wang at oracle.com>
> Signed-off-by: Santosh Shilimkar <santosh.shilimkar at oracle.com>
> ---
>  fs/ocfs2/dlmglue.c | 6 ++++++
>  1 files changed, 6 insertions(+), 0 deletions(-)
>
> diff --git a/fs/ocfs2/dlmglue.c b/fs/ocfs2/dlmglue.c
> index 8b23aa2..313c816 100644
> --- a/fs/ocfs2/dlmglue.c
> +++ b/fs/ocfs2/dlmglue.c
> @@ -1390,6 +1390,7 @@ static int __ocfs2_cluster_lock(struct ocfs2_super *osb,
>  	unsigned int gen;
>  	int noqueue_attempted = 0;
>  	int dlm_locked = 0;
> +	int kick_dc = 0;
>  
>  	if (!(lockres->l_flags & OCFS2_LOCK_INITIALIZED)) {
>  		mlog_errno(-EINVAL);
> @@ -1524,7 +1525,12 @@ update_holders:
>  unlock:
>  	lockres_clear_flags(lockres, OCFS2_LOCK_UPCONVERT_FINISHING);
>  
> +	/* ocfs2_unblock_lock requeues on seeing OCFS2_LOCK_UPCONVERT_FINISHING */
> +	kick_dc = (lockres->l_flags & OCFS2_LOCK_BLOCKED);
> +
>  	spin_unlock_irqrestore(&lockres->l_lock, flags);
> +	if (kick_dc)
> +		ocfs2_wake_downconvert_thread(osb);
>  out:
>  	/*
>  	 * This is helping work around a lock inversion between the page lock
>
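The event ordering in the commit message can be replayed outside the kernel. Below is a minimal userspace C model, a sketch only: the flag names BUSY, BLOCKED, UPCONVERT_FINISHING and the kick_dc check mirror the patch, but the bit values, the show() helper and the single-threaded replay are illustrative assumptions, not the real dlmglue code or its locking.

/* Minimal userspace replay of the flag sequence from the commit message.
 * Illustrative only; does not model l_lock, waiters or the dlm thread. */
#include <stdio.h>

#define BUSY                 0x01  /* assumed value, stands in for OCFS2_LOCK_BUSY */
#define BLOCKED              0x02  /* stands in for OCFS2_LOCK_BLOCKED */
#define UPCONVERT_FINISHING  0x04  /* stands in for OCFS2_LOCK_UPCONVERT_FINISHING */

static unsigned flags;

static void show(const char *event)
{
	printf("%-50s flags: %s%s%s\n", event,
	       (flags & BUSY) ? "BUSY " : "",
	       (flags & BLOCKED) ? "BLOCKED " : "",
	       (flags & UPCONVERT_FINISHING) ? "UPCONVERT_FINISHING " : "");
}

int main(void)
{
	flags |= BUSY;
	show("PR thread sets BUSY, calls ocfs2_dlm_lock");
	/* EX thread arrives, sees BUSY, waits without doing anything. */

	flags |= UPCONVERT_FINISHING;
	flags &= ~BUSY;
	show("ast: set UPCONVERT_FINISHING, clear BUSY, wake waiters");

	flags |= BLOCKED;
	show("bast: set BLOCKED, dc thread kicked once (it requeues)");

	flags &= ~UPCONVERT_FINISHING;
	show("EX waiter wakes first, clears UPCONVERT_FINISHING");

	/* Both waiters now sleep on BLOCKED and nothing kicks the dc thread
	 * again -> hang. The fix: after clearing UPCONVERT_FINISHING, check
	 * BLOCKED and wake the downconvert thread. */
	int kick_dc = (flags & BLOCKED) != 0;
	printf("kick_dc = %d -> %s\n", kick_dc,
	       kick_dc ? "wake downconvert thread (patched behaviour)"
	               : "no wakeup (hang without the patch)");
	return 0;
}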
Tariq Saeed
2015-Dec-23 19:20 UTC
[Ocfs2-devel] [PATCH] NFS hangs in __ocfs2_cluster_lock due to race with ocfs2_unblock_lock
Hi Joseph,

Many threads doing this:

PID: 6352  TASK: ffff8801a0a30100  CPU: 3  COMMAND: "nfsd"   // blocked on same lock as 6323, gen 0
 #0 [ffff8801a0a338d0] __schedule at ffffffff8150a954
 #1 [ffff8801a0a33978] schedule at ffffffff8150b0ef
 #2 [ffff8801a0a33988] schedule_timeout at ffffffff8150b59d
 #3 [ffff8801a0a33a38] wait_for_common at ffffffff8150af7a
 #4 [ffff8801a0a33ad8] wait_for_completion at ffffffff8150b0ad
 #5 [ffff8801a0a33ae8] __ocfs2_cluster_lock at ffffffffa03febd8 [ocfs2]
 #6 [ffff8801a0a33c48] ocfs2_inode_lock_full_nested at ffffffffa0404086 [ocfs2]
 #7 [ffff8801a0a33cc8] ocfs2_statfs at ffffffffa045462e [ocfs2]   <<<< does not take the global bitmap inode mutex
 #8 [ffff8801a0a33d28] statfs_by_dentry at ffffffff81199b29
 #9 [ffff8801a0a33d48] vfs_statfs at ffffffff81199b6b
#10 [ffff8801a0a33d68] nfsd_statfs at ffffffffa036bd73 [nfsd]
#11 [ffff8801a0a33da8] nfsd3_proc_fsstat at ffffffffa0374f0d [nfsd]
#12 [ffff8801a0a33dd8] nfsd_dispatch at ffffffffa036743e [nfsd]
#13 [ffff8801a0a33e18] svc_process_common at ffffffffa0278324 [sunrpc]
#14 [ffff8801a0a33e98] svc_process at ffffffffa0278970 [sunrpc]
#15 [ffff8801a0a33eb8] nfsd at ffffffffa0367b92 [nfsd]
#16 [ffff8801a0a33ee8] kthread at ffffffff81090ec6
#17 [ffff8801a0a33f48] kernel_thread_helper at ffffffff815166c4

and the one doing the allocation:

PID: 6323  TASK: ffff8801a7054640  CPU: 0  COMMAND: "nfsd"
 #0 [ffff8801a7056f10] __schedule at ffffffff8150a954
 #1 [ffff8801a7056fb8] schedule at ffffffff8150b0ef
 #2 [ffff8801a7056fc8] schedule_timeout at ffffffff8150b59d
 #3 [ffff8801a7057078] wait_for_common at ffffffff8150af7a
 #4 [ffff8801a7057118] wait_for_completion at ffffffff8150b0ad
 #5 [ffff8801a7057128] __ocfs2_cluster_lock at ffffffffa03febd8 [ocfs2]
 #6 [ffff8801a7057288] ocfs2_inode_lock_full_nested at ffffffffa0404086 [ocfs2]
 #7 [ffff8801a7057308] ocfs2_reserve_suballoc_bits at ffffffffa043f199 [ocfs2]   <<<< takes the global bitmap inode mutex
 #8 [ffff8801a7057388] ocfs2_reserve_cluster_bitmap_bits at ffffffffa043f5f5 [ocfs2]
 #9 [ffff8801a70573b8] ocfs2_local_alloc_reserve_for_window at ffffffffa04198a4 [ocfs2]
#10 [ffff8801a7057418] ocfs2_local_alloc_slide_window at ffffffffa041b5a0 [ocfs2]
#11 [ffff8801a7057498] ocfs2_reserve_local_alloc_bits at ffffffffa041bc67 [ocfs2]
#12 [ffff8801a70574f8] ocfs2_reserve_clusters_with_limit at ffffffffa043e8c5 [ocfs2]
#13 [ffff8801a7057548] ocfs2_reserve_clusters at ffffffffa043f7f8 [ocfs2]
#14 [ffff8801a7057558] ocfs2_lock_allocators at ffffffffa044021e [ocfs2]
#15 [ffff8801a70575b8] ocfs2_write_begin_nolock at ffffffffa03ed713 [ocfs2]
#16 [ffff8801a7057758] ocfs2_write_begin at ffffffffa03ee0c6 [ocfs2]
#17 [ffff8801a70577e8] generic_perform_write at ffffffff8110ffc3
#18 [ffff8801a7057888] generic_file_buffered_write_iter at ffffffff811100eb
#19 [ffff8801a70578b8] ocfs2_file_write_iter at ffffffffa040a837 [ocfs2]
#20 [ffff8801a70579c8] ocfs2_file_aio_write at ffffffffa040af28 [ocfs2]
#21 [ffff8801a7057a38] do_sync_readv_writev at ffffffff8116cf7b
#22 [ffff8801a7057b48] do_readv_writev at ffffffff8116ddf4
#23 [ffff8801a7057c28] vfs_writev at ffffffff8116df46
#24 [ffff8801a7057c38] nfsd_vfs_write at ffffffffa036c7f3 [nfsd]
#25 [ffff8801a7057d08] nfsd_write at ffffffffa036de77 [nfsd]
#26 [ffff8801a7057d68] nfsd3_proc_write at ffffffffa03762ff [nfsd]
#27 [ffff8801a7057dd8] nfsd_dispatch at ffffffffa036743e [nfsd]
#28 [ffff8801a7057e18] svc_process_common at ffffffffa0278324 [sunrpc]
#29 [ffff8801a7057e98] svc_process at ffffffffa0278970 [sunrpc]
#30 [ffff8801a7057eb8] nfsd at ffffffffa0367b92 [nfsd]
#31 [ffff8801a7057ee8] kthread at ffffffff81090ec6
#32 [ffff8801a7057f48] kernel_thread_helper at ffffffff815166c4

Thanks,
-Tariq

On 12/23/2015 01:31 AM, Joseph Qi wrote:
> Hi Tariq,
> As I know, the global bitmap inode mutex is held when doing allocation.
> So how can it happen that EX and PR requests come concurrently on the
> same node?
>
> Thanks,
> Joseph
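The two backtraces show why the i_mutex alone does not serialize the requests: the allocation path takes the global bitmap inode mutex before asking for the cluster lock at EX, while the statfs path asks for the same cluster lock at PR without ever touching that mutex. The sketch below is a hedged, stand-alone illustration of that point; gb_i_mutex, alloc_path, statfs_path and cluster_lock_request are made-up stand-ins, not the real ocfs2 call chain.

/* Two paths racing to __ocfs2_cluster_lock(); only one holds the mutex.
 * Compile with: gcc -pthread race_sketch.c */
#include <pthread.h>
#include <stdio.h>

/* Stand-in for the global bitmap inode's i_mutex. */
static pthread_mutex_t gb_i_mutex = PTHREAD_MUTEX_INITIALIZER;

static void cluster_lock_request(const char *who, const char *level)
{
	/* In the real code this would be ocfs2_inode_lock() ->
	 * __ocfs2_cluster_lock() on the global bitmap inode. */
	printf("%s: requesting cluster lock at %s\n", who, level);
}

static void *alloc_path(void *arg)
{
	(void)arg;
	/* The allocator (ocfs2_reserve_suballoc_bits in trace #7) takes the
	 * mutex, then asks for the cluster lock at EX. */
	pthread_mutex_lock(&gb_i_mutex);
	cluster_lock_request("allocator (nfsd write, PID 6323)", "EX");
	pthread_mutex_unlock(&gb_i_mutex);
	return NULL;
}

static void *statfs_path(void *arg)
{
	(void)arg;
	/* ocfs2_statfs (trace #7 of PID 6352) never takes gb_i_mutex, so this
	 * PR request is not serialized against the EX request above. */
	cluster_lock_request("statfs (nfsd fsstat, PID 6352)", "PR");
	return NULL;
}

int main(void)
{
	pthread_t a, s;

	/* Both requests can reach __ocfs2_cluster_lock() back to back on the
	 * same node, which is the window the patch closes. */
	pthread_create(&a, NULL, alloc_path, NULL);
	pthread_create(&s, NULL, statfs_path, NULL);
	pthread_join(a, NULL);
	pthread_join(s, NULL);
	return 0;
}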