This email is also sent to cluster-devel at redhat.com. Since this issue involves
both dlm and ocfs2, I am sending it here as well to ask for help from upstream.

This is an already known issue.

On ocfs2 with the userspace cluster stack, run the test script from
http://people.redhat.com/~teigland/make_panic on the mounted ocfs2 volume from
2 nodes simultaneously; access to the ocfs2 volume on both nodes will hang.

This issue is also described in Novell bugzilla #492055
(https://bugzilla.novell.com/show_bug.cgi?id=492055). On the current upstream
kernel the dead hang is no longer reproducible, but access still gets blocked
from time to time.

"Blocked from time to time" means make_panic can run for several minutes and
then blocks on both nodes. The blocking time varies from dozens of seconds to
dozens of minutes; the longest I observed was 25 minutes. After that, make_panic
on both nodes continues to run.

I also observed that running make_panic in the same directory of the ocfs2
volume from both nodes increases the chance of reproducing the blocking
considerably.

For further debugging, I added some printk output to fs/ocfs2/dlmglue.c and
gathered statistics. Here is the call frequency of each function during a
4-second window while both nodes were blocked:

1352 lockres_set_flags
 728 lockres_or_flags
 624 lockres_clear_flags
 312 __lockres_clear_pending
 213 ocfs2_process_blocked_lock
 213 ocfs2_locking_ast
 213 ocfs2_downconvert_thread_do_work
 213 lockres_set_pending
 213 lockres_clear_pending
 213 lockres_add_mask_waiter
 156 ocfs2_prepare_downconvert
 156 ocfs2_blocking_ast
 104 ocfs2_unblock_lock
 104 ocfs2_schedule_blocked_lock
 104 ocfs2_generic_handle_downconvert_action
 104 ocfs2_generic_handle_convert_action
 104 ocfs2_generic_handle_bast
 104 ocfs2_downconvert_thread
 104 ocfs2_downconvert_lock
 104 ocfs2_data_convert_worker
 104 ocfs2_cluster_lock

From the above data, lockres_set_flags is called 1352 times in the 4 seconds,
followed by lockres_or_flags with 728 calls and lockres_clear_flags with 624.

When I add more printk calls to the code, the blocking becomes very hard to
reproduce. Therefore I suspect there is some kind of race inside.

I have worked on this issue for many days and still have no idea how it happens
or how to fix it. Many people here might know this issue already; I hope
upstream developers can take a look and provide hints on a fix.

Thanks in advance.

--
Coly Li
SuSE Labs
On Wed, Aug 19, 2009 at 03:26:59AM +0800, Coly Li wrote:
> When I add more printk inside the code, the blocking will very hard to
> reproduce. Therefore, I suspect there is kind of race inside.
>
> I work on this issue for quite many days, still no idea how this issue comes
> and how to fix it. Many people here might know this issue already, wish
> upstream developers can watch on it and provide hints on the fix.

I've tried to dig into this a couple of times, and each time get stopped trying
to sort out the dlmglue.c logic; I think that's where the problem lies.

Dave
Read this thread for some background. There are others like this.
http://oss.oracle.com/pipermail/ocfs2-devel/2009-April/004313.html

David had run into a similar issue with two nodes. The symptoms were the
same. In that case, we were failing to kick the downconvert thread under
one situation.

Bottomline, the reason for the hang is that a node is not downconverting
its lock. It could be a race in dlmglue or something else.

The node has a PR and another node wants an EX. Unless the node downconverts
to a NL, the master cannot upconvert the other node to EX. Hang. Also,
cancel converts are in the mix. For example:

Lockres: M0000000000000000046e2a00000000   Mode: Protected Read
Flags: Initialized Attached Blocked Needs Refresh Queued
RO Holders: 0  EX Holders: 0
Pending Action: None  Pending Unlock Action: None
Requested Mode: Protected Read  Blocking Mode: Exclusive
PR > Gets: 463  Fails: 0    Waits (usec) Total: 1052625  Max: 41985
EX > Gets: 37   Fails: 0    Waits (usec) Total: 990652   Max: 79971
Disk Refreshes: 0

You can see the lockres has a PR and is blocking an EX. The downconvert
thread should downconvert it to a NL. One reason it won't is if there are
any holders. But we can see 0 RO and EX holders.

DownCnvt => Pid: 2759  Count: 1  WakeSeq: 14838  WorkSeq: 14838

The downcnvt shows 1 lockres is queued. We have to assume it is this one.
If not, then we have a bigger problem. Maybe add a quick/dirty hack to dump
the lockres in this queue.

Maybe we are forgetting to kick it like last time. I did scan the code
for that but came up empty handed.

To solve this mystery, you have to find out why the dc thread is not acting
on the lockres. Forget stats. Just add printks in that thread. Starting from
say ocfs2_downconvert_thread_do_work().

Sunil

Coly Li wrote:
> This email is also sent to cluster-devel at redhat.com. Since this issue is about
> both dlm and ocfs2, I send the email here to look for help from upstream.
>
> This is an already known issue.
> [snip]
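Background on the mode conflict described above: the behaviour follows the
standard VMS-style DLM compatibility table. Below is a minimal, self-contained
sketch (an illustration only, not ocfs2 or fs/dlm code); the mode numbers match
the levels seen in the traces later in this thread (NL = 0, PR = 3, EX = 5).
---------------------------------
/* dlm_compat.c -- illustrative only.  Standard VMS-style DLM mode
 * compatibility: a request can be granted only if it is compatible with
 * every mode already granted on the resource. */
#include <stdio.h>

enum dlm_mode { NL = 0, CR = 1, CW = 2, PR = 3, PW = 4, EX = 5 };

/* compat[held][wanted]: 1 if a lock held at 'held' allows another lock
 * at 'wanted' to be granted on the same resource. */
static const int compat[6][6] = {
        /*         NL CR CW PR PW EX */
        /* NL */ {  1, 1, 1, 1, 1, 1 },
        /* CR */ {  1, 1, 1, 1, 1, 0 },
        /* CW */ {  1, 1, 1, 0, 0, 0 },
        /* PR */ {  1, 1, 0, 1, 0, 0 },
        /* PW */ {  1, 1, 0, 0, 0, 0 },
        /* EX */ {  1, 0, 0, 0, 0, 0 },
};

int main(void)
{
        /* The hang scenario above: this node holds PR, another node asks
         * for EX.  PR blocks EX, so the EX request waits until the PR
         * holder downconverts to NL (compatible with everything). */
        printf("PR vs EX: %s\n", compat[PR][EX] ? "compatible" : "conflict");
        printf("NL vs EX: %s\n", compat[NL][EX] ? "compatible" : "conflict");
        return 0;
}
---------------------------------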
Sunil Mushran Wrote:
> Read this thread for some background. There are others like this.
> http://oss.oracle.com/pipermail/ocfs2-devel/2009-April/004313.html
> [snip]
> To solve this mystery, you have to find out as to why the dc thread is
> not acting on the lockres. Forget stats. Just add printks in that thread.
> Starting from say ocfs2_downconvert_thread_do_work().

I simplified the original perl script to a simple bash script,
---------------------------------
#!/bin/sh

prefix=`hostname`
i=1
while [ 1 ];do
        f="$prefix"_"$i"
        echo $f
        touch $f
        i=`expr $i + 1`
        if [ $i -ge 1000 ];then
                i=1
                rm -f "$prefix"_*
        fi
done
---------------------------------

Running the above script on both nodes also reproduces the blocking issue.

When the blocking happens, ocfs2_downconvert_thread_do_work() still gets called
again and again.

I added a printk to display osb->blocked_lock_count before the while(1) loop
inside ocfs2_downconvert_thread_do_work(). Here is what I observed:

1) Before the blocking happens, the number sequence is,
ocfs2_downconvert_thread_do_work:3725: osb->blocked_lock_count: 0
ocfs2_downconvert_thread_do_work:3725: osb->blocked_lock_count: 1
ocfs2_downconvert_thread_do_work:3725: osb->blocked_lock_count: 0
ocfs2_downconvert_thread_do_work:3725: osb->blocked_lock_count: 1
ocfs2_downconvert_thread_do_work:3725: osb->blocked_lock_count: 0
ocfs2_downconvert_thread_do_work:3725: osb->blocked_lock_count: 1
ocfs2_downconvert_thread_do_work:3725: osb->blocked_lock_count: 0
ocfs2_downconvert_thread_do_work:3725: osb->blocked_lock_count: 2
ocfs2_downconvert_thread_do_work:3725: osb->blocked_lock_count: 1
(the count can be 1, 0, 2, in an irregular sequence)

2) When the blocking happens, the number sequence of osb->blocked_lock_count is
always like this,
ocfs2_downconvert_thread_do_work:3725: osb->blocked_lock_count: 0
ocfs2_downconvert_thread_do_work:3725: osb->blocked_lock_count: 1
ocfs2_downconvert_thread_do_work:3725: osb->blocked_lock_count: 0
ocfs2_downconvert_thread_do_work:3725: osb->blocked_lock_count: 1
ocfs2_downconvert_thread_do_work:3725: osb->blocked_lock_count: 0
ocfs2_downconvert_thread_do_work:3725: osb->blocked_lock_count: 1
ocfs2_downconvert_thread_do_work:3725: osb->blocked_lock_count: 0
ocfs2_downconvert_thread_do_work:3725: osb->blocked_lock_count: 1
ocfs2_downconvert_thread_do_work:3725: osb->blocked_lock_count: 0
ocfs2_downconvert_thread_do_work:3725: osb->blocked_lock_count: 1
ocfs2_downconvert_thread_do_work:3725: osb->blocked_lock_count: 0
ocfs2_downconvert_thread_do_work:3725: osb->blocked_lock_count: 1
ocfs2_downconvert_thread_do_work:3725: osb->blocked_lock_count: 0
ocfs2_downconvert_thread_do_work:3725: osb->blocked_lock_count: 1
ocfs2_downconvert_thread_do_work:3725: osb->blocked_lock_count: 0
ocfs2_downconvert_thread_do_work:3725: osb->blocked_lock_count: 1
ocfs2_downconvert_thread_do_work:3725: osb->blocked_lock_count: 0
ocfs2_downconvert_thread_do_work:3725: osb->blocked_lock_count: 1
(all are 0-1-0-1-0-1-... in a regular sequence)

Continue to track...

--
Coly Li
SuSE Labs
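For reference, the output above comes from a printk of roughly this shape (a
hypothetical fragment for illustration; the exact placement inside
ocfs2_downconvert_thread_do_work() and the integer type of blocked_lock_count
are assumptions, not taken from the posted debugging patch):
---------------------------------
/* Hypothetical debug fragment, placed before the processing loop in
 * ocfs2_downconvert_thread_do_work().  The cast avoids assuming the
 * exact type of osb->blocked_lock_count. */
printk(KERN_INFO "%s:%d: osb->blocked_lock_count: %lu\n",
       __func__, __LINE__,
       (unsigned long)osb->blocked_lock_count);
---------------------------------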
So the thread is removing the lockres from the list (thus making the count 0),
and then calling ocfs2_process_blocked_lock() which is adding that lockres back
in the list (thus 1).

Trace ocfs2_process_blocked_lock() to see as to why it is putting it back on
the list.

Coly Li wrote:
> Sunil Mushran Wrote:
>> To solve this mystery, you have to find out as to why the dc thread is
>> not acting on the lockres. Forget stats. Just add printks in that thread.
>> Starting from say ocfs2_downconvert_thread_do_work().
>
> [snip]
>
> 2) when the blocking happens, the number sequence of osb->blocked_lock_count is
> always like this,
> ocfs2_downconvert_thread_do_work:3725: osb->blocked_lock_count: 0
> ocfs2_downconvert_thread_do_work:3725: osb->blocked_lock_count: 1
> [snip]
> (all are 0-1-0-1-0-1-... in a regular sequence)
>
> Continue to track...
Sunil Mushran Wrote:
> So the thread is removing the lockres from the list (thus making
> the count 0), and then calling ocfs2_process_blocked_lock() which
> is adding that lockres back in the list (thus 1).
>
> Trace ocfs2_process_blocked_lock() to see as to why it is putting it
> back on the list.
>
>> 2) when the blocking happens, the number sequence of
>> osb->blocked_lock_count is always like this,
>> ocfs2_downconvert_thread_do_work:3725: osb->blocked_lock_count: 0
>> ocfs2_downconvert_thread_do_work:3725: osb->blocked_lock_count: 1
>> [snip]
>> (all are 0-1-0-1-0-1-... in a regular sequence)

I also observed that even when no file is created or unlinked, and I do
nothing, the above messages keep repeating forever in the dmesg output, on
both nodes.

--
Coly Li
SuSE Labs
You will have to trace thru process_blocked_lock() to make sense of this.

Coly Li wrote:
> Sunil Mushran Wrote:
>> Trace ocfs2_process_blocked_lock() to see as to why it is putting it
>> back on the list.
> [snip]
> I also observed even when no file is created or unlinked, I did nothing. From
> dmesg output, the above information still infinite running, on both nodes.
Thanks, Coly.

So dlmglue calls dlmlock to downconvert the lock from EX => NL. However, in the
next iteration, we see the lock at PR. Strange.

Anycase, I have created a patch that should provide more info. The patch is
against the current mainline. It should apply against older trees, but may
require a bit of tlc.

Please re-run with the following patch.
http://oss.oracle.com/~smushran/0001-ocfs2-Patch-to-debug-hang-in-dlmglue-when-running-d.patch

Coly Li wrote:
> Sunil Mushran Wrote:
>> Can you send me the raw trace output.
>
> Here is the compressed trace info.
>
> Thanks.
The full trace is available here.
http://oss.oracle.com/~smushran/calltrace_x1

So one sees the following block repeated. It shows that the lock is being
downconverted from EX to NL but also upconverted, presumably to EX.

=============================================================
[412.378525] (12732,0):ocfs2_blocking_ast:1025 BAST fired for lockres M0000000000000000085e0200000000, blocking 5, level 5 type Meta
[412.378532] (12731,0):ocfs2_process_blocked_lock:3839 lockres M0000000000000000085e0200000000 blocked.
[412.378537] (12731,0):ocfs2_prepare_downconvert:3232 lock M0000000000000000085e0200000000, new_level = 0, l_blocking = 5
[412.378542] (12731,0):ocfs2_downconvert_lock:3252 lock M0000000000000000085e0200000000, level 5 => 0
[412.378554] (12731,0):__lockres_clear_pending:978 lock M0000000000000000085e0200000000 Clear; flg=0x557, lpg=268, gen=268
[412.378558] (12731,0):ocfs2_process_blocked_lock:3863 lockres M0000000000000000085e0200000000, requeue = no.
[412.378578] (12732,0):ocfs2_locking_ast:1069 lock M0000000000000000085e0200000000, action 3, unlock 0
[412.378584] (12732,0):__lockres_clear_pending:972 lock M0000000000000000085e0200000000 Abort; flg=0x51, lpg=269, gen=269
[412.378596] (9370,0):__lockres_clear_pending:978 lock M0000000000000000085e0200000000 Clear; flg=0x453, lpg=269, gen=269
[412.578719] (12732,0):ocfs2_locking_ast:1069 lock M0000000000000000085e0200000000, action 2, unlock 0
[412.578727] (12732,0):__lockres_clear_pending:972 lock M0000000000000000085e0200000000 Abort; flg=0x51, lpg=270, gen=270
=============================================================

The BAST:
[412.378525] (12732,0):ocfs2_blocking_ast:1025 BAST fired for lockres M0000000000000000085e0200000000, blocking 5, level 5 type Meta
[412.378532] (12731,0):ocfs2_process_blocked_lock:3839 lockres M0000000000000000085e0200000000 blocked.

Lock is at EX (5) and new level is NL (0):
[412.378537] (12731,0):ocfs2_prepare_downconvert:3232 lock M0000000000000000085e0200000000, new_level = 0, l_blocking = 5
[412.378542] (12731,0):ocfs2_downconvert_lock:3252 lock M0000000000000000085e0200000000, level 5 => 0

Flags: 0x557 => PENDING QUEUED INITIALIZED NEEDS_REFRESH BLOCKED BUSY ATTACHED
[412.378554] (12731,0):__lockres_clear_pending:978 lock M0000000000000000085e0200000000 Clear; flg=0x557, lpg=268, gen=268

"requeue = no" means there is no need to delay the downconvert:
[412.378558] (12731,0):ocfs2_process_blocked_lock:3863 lockres M0000000000000000085e0200000000, requeue = no.

Action 3 means DOWNCONVERT:
[412.378578] (12732,0):ocfs2_locking_ast:1069 lock M0000000000000000085e0200000000, action 3, unlock 0

Flags: 0x51 => INITIALIZED NEEDS_REFRESH ATTACHED
[412.378584] (12732,0):__lockres_clear_pending:972 lock M0000000000000000085e0200000000 Abort; flg=0x51, lpg=269, gen=269

Flags: 0x453 => PENDING INITIALIZED NEEDS_REFRESH BUSY ATTACHED
[412.378596] (9370,0):__lockres_clear_pending:978 lock M0000000000000000085e0200000000 Clear; flg=0x453, lpg=269, gen=269

Action 2 means CONVERT:
[412.578719] (12732,0):ocfs2_locking_ast:1069 lock M0000000000000000085e0200000000, action 2, unlock 0
[412.578727] (12732,0):__lockres_clear_pending:972 lock M0000000000000000085e0200000000 Abort; flg=0x51, lpg=270, gen=270

A quick scan shows the CONVERT is consistently 200 msecs (??) after the
DOWNCONVERT.

Coly, can you map the pids to the process names?

Sunil

Coly Li wrote:
> Sunil Mushran Wrote:
>> Thanks. Can you email me the _full_ trace log.
>
> Hi Sunil,
>
> Here is the full trace log from boot to shutdown, on both nodes.
> On node x1, I run 2 make_panic processes (the simplified bash script).
> On node x2, I run 1 make_panic process.
>
> The above operation can reproduce the blocking very fast. I wait for several
> minutes, when no any new file created on both nodes. Then I shut the nodes.
>
> Please check the attachment for trace log.
>
> Thanks.
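For readers following the flg= values in these traces, below is a small
self-contained decoder. The bit assignments are assumptions inferred from the
flag-name lists shown above for 0x557, 0x51 and 0x453; they are not copied from
fs/ocfs2/ocfs2.h, so treat them as illustrative only.
---------------------------------
/* decode_flags.c -- illustrative decoder for the flg= values in the
 * traces above.  Bit values are inferred from the decoded flag names
 * in this thread (an assumption, not the ocfs2 header). */
#include <stdio.h>

static const struct { unsigned int bit; const char *name; } flags[] = {
        { 0x400, "PENDING" },
        { 0x100, "QUEUED" },
        { 0x040, "INITIALIZED" },
        { 0x010, "NEEDS_REFRESH" },
        { 0x004, "BLOCKED" },
        { 0x002, "BUSY" },
        { 0x001, "ATTACHED" },
};

static void decode(unsigned int flg)
{
        printf("0x%03x =>", flg);
        for (unsigned int i = 0; i < sizeof(flags) / sizeof(flags[0]); i++)
                if (flg & flags[i].bit)
                        printf(" %s", flags[i].name);
        printf("\n");
}

int main(void)
{
        decode(0x557);  /* PENDING QUEUED INITIALIZED NEEDS_REFRESH BLOCKED BUSY ATTACHED */
        decode(0x51);   /* INITIALIZED NEEDS_REFRESH ATTACHED */
        decode(0x453);  /* PENDING INITIALIZED NEEDS_REFRESH BUSY ATTACHED */
        return 0;
}
---------------------------------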
Sunil Mushran Wrote:
> The full trace is available here.
> http://oss.oracle.com/~smushran/calltrace_x1
>
> So one sees the following block repeated. It shows that the lock is
> being downconverted from EX to NL but also upconverted presumably to EX.
> [snip]
> Coly, can you map the pids to the process names?

Hi Sunil,

In the attached trace info, I added current->comm after the pid.

Here are the steps I used to reproduce the blocking:
- This time I only run 1 make_panic process on each node.
- I ran make_panic on x1 first, then on x2.
- On node x1, after creating 3-4 files, I start the make_panic script on node x2.
- On node x2, make_panic blocks immediately, no file created. On node x1, after
  creating 13 files, make_panic blocks too.
- After waiting for several minutes, still blocked. I stop gathering trace info.

Please check the attachment. Thanks.

--
Coly Li
SuSE Labs

-------------- next part --------------
A non-text attachment was scrubbed...
Name: calltrace_x1.bz2
Type: application/x-bzip
Size: 46329 bytes
Desc: not available
Url : http://oss.oracle.com/pipermail/ocfs2-devel/attachments/20090915/f528a475/attachment-0002.bin
-------------- next part --------------
A non-text attachment was scrubbed...
Name: calltrace_x2.bz2
Type: application/x-bzip
Size: 44701 bytes
Desc: not available
Url : http://oss.oracle.com/pipermail/ocfs2-devel/attachments/20090915/f528a475/attachment-0003.bin
So originally my thinking was that the dc thread was not getting kicked.
That is not the case. The lock is getting downconverted. But it is getting
upconverted shortly thereafter. This just could be the case in which dlmglue
is slow to increment the holders to block the dc thread from downconverting
the lock. The snippet shows that the BAST is received 16 usecs after the
upconvert.

Coly, I have another patch. Pop out the older patch before applying this one.
http://oss.oracle.com/~smushran/0001-ocfs2-Patch-to-debug-hang-in-dlmglue-when-running-d.patch

BAST:
[368.807757] (2572,dlm_astd,0):ocfs2_blocking_ast:1025 BAST fired for lockres M0000000000000000085e0200000000, blocking 5, level 3 type Meta
[368.807767] (2571,ocfs2dc,0):ocfs2_process_blocked_lock:3839 lockres M0000000000000000085e0200000000 blocked.
[368.807774] (2571,ocfs2dc,0):ocfs2_prepare_downconvert:3232 lock M0000000000000000085e0200000000, new_level = 0, l_blocking = 5
[368.807779] (2571,ocfs2dc,0):ocfs2_downconvert_lock:3252 lock M0000000000000000085e0200000000, level 3 => 0
[368.807799] (2571,ocfs2dc,0):ocfs2_process_blocked_lock:3863 lockres M0000000000000000085e0200000000, requeue = no.

Downconvert AST:
[368.807806] (2572,dlm_astd,0):ocfs2_locking_ast:1069 lock M0000000000000000085e0200000000, action 3, unlock 0

Upconvert AST:
[369.007930] (2572,dlm_astd,0):ocfs2_locking_ast:1069 lock M0000000000000000085e0200000000, action 2, unlock 0

BAST:
[369.007946] (2572,dlm_astd,0):ocfs2_blocking_ast:1025 BAST fired for lockres M0000000000000000085e0200000000, blocking 5, level 3 type Meta
[369.007956] (2571,ocfs2dc,0):ocfs2_process_blocked_lock:3839 lockres M0000000000000000085e0200000000 blocked.
[369.007962] (2571,ocfs2dc,0):ocfs2_prepare_downconvert:3232 lock M0000000000000000085e0200000000, new_level = 0, l_blocking = 5
[369.007967] (2571,ocfs2dc,0):ocfs2_downconvert_lock:3252 lock M0000000000000000085e0200000000, level 3 => 0
[369.007987] (2571,ocfs2dc,0):ocfs2_process_blocked_lock:3863 lockres M0000000000000000085e0200000000, requeue = no.

Downconvert AST:
[369.007994] (2572,dlm_astd,0):ocfs2_locking_ast:1069 lock M0000000000000000085e0200000000, action 3, unlock 0

Upconvert AST:
[369.208048] (2572,dlm_astd,0):ocfs2_locking_ast:1069 lock M0000000000000000085e0200000000, action 2, unlock 0

Coly Li wrote:
> Sunil Mushran Wrote:
>> The full trace is available here.
>> http://oss.oracle.com/~smushran/calltrace_x1
>> [snip]
>> Coly, can you map the pids to the process names?
>
> Hi Sunil,
>
> In the attached trace info, I add current->comm after the pid.
> [snip]
> Please check the attachment. Thanks.
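To make the "holders" hypothesis above concrete, here is a toy, self-contained
model of the gating rule described earlier in the thread (the downconvert
thread will not drop a lock that still has local holders). The structure and
function names are invented for illustration; this is not ocfs2 code.
---------------------------------
/* holders_model.c -- invented toy model, not ocfs2 code.  Illustrates
 * the rule from earlier in this thread: the dc thread only downconverts
 * when no local holder still needs the current level. */
#include <stdio.h>

struct lockres_model {
        int blocking;     /* level another node is waiting for: 3=PR, 5=EX */
        int ro_holders;   /* local read holders */
        int ex_holders;   /* local write holders */
};

static int can_downconvert(const struct lockres_model *l)
{
        if (l->ex_holders)
                return 0;       /* a local writer pins the lock */
        if (l->blocking == 5 && l->ro_holders)
                return 0;       /* readers pin it against an EX request */
        return 1;
}

int main(void)
{
        /* The stuck lockres dump earlier in the thread: EX wanted by the
         * other node, 0 local holders -- so a downconvert should proceed. */
        struct lockres_model l = { .blocking = 5, .ro_holders = 0, .ex_holders = 0 };
        printf("can_downconvert = %d\n", can_downconvert(&l));
        return 0;
}
---------------------------------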
The patch does not have a fix. Only tracing. We may have to disable a printk
for the 2-node case to reproduce.

For the BUG, can I have the full logs? The oops trace and the tracing from
all nodes.

Thanks
Sunil

Coly Li wrote:
> Hi Sunil,
>
> I tried this patch, on 2 nodes cluster, it works. No blocking observed so far.
> Then I run it on a 4 nodes cluster, run make_panic on each node simultaneously,
> and BUG inside ocfs2_prepare_downconvert() triggered (in line 3224) on one of
> the nodes (I observed the oops on node x4).
> [snip]
> I am trying to understand what you did now :-)
Hi Sunil,

I tried this patch on a 2-node cluster and it works; no blocking observed so
far. Then I ran it on a 4-node cluster, running make_panic on each node
simultaneously, and the BUG inside ocfs2_prepare_downconvert() triggered (in
line 3224) on one of the nodes (I observed the oops on node x4):

3214 static unsigned int ocfs2_prepare_downconvert(struct ocfs2_lock_res *lockres,
3215                                               int new_level)
3216 {
3217         assert_spin_locked(&lockres->l_lock);
3218
3219         BUG_ON(lockres->l_blocking <= DLM_LOCK_NL);
3220
3221         if (lockres->l_level <= new_level) {
3222                 mlog(ML_ERROR, "lockres->l_level (%d) <= new_level (%d)\n",
3223                      lockres->l_level, new_level);
3224                 BUG();
3225         }
3226
3227         mlog(ML_NOTICE, "lock %s, new_level = %d, l_blocking = %d\n",
3228              lockres->l_name, new_level, lockres->l_blocking);
3229
3230         lockres->l_action = OCFS2_AST_DOWNCONVERT;
3231         lockres->l_requested = new_level;
3232         lockres_or_flags(lockres, OCFS2_LOCK_BUSY);
3233         return lockres_set_pending(lockres);
3234 }

I am trying to understand what you did now :-)

Sunil Mushran Wrote:
> So originally my thinking was that the dc thread was not getting kicked.
> That is not the case. The lock is getting downconverted. But it is getting
> upconverted shortly thereafter. This just could be the case in which dlmglue
> is slow to increment the holders to block the dc thread from downconverting
> the lock. The snippet shows that BAST is received 16 usecs after the
> upconvert.
>
> Coly, I have another patch. Pop out the older patch before applying this one.
> http://oss.oracle.com/~smushran/0001-ocfs2-Patch-to-debug-hang-in-dlmglue-when-running-d.patch

--
Coly Li
SuSE Labs
Please could you log a bugzilla (oss.oracle.com/bugzilla) and attach the logs
to it.

Sunil Mushran wrote:
> The patch does not have a fix. Only tracing. We may have to disable
> a printk for the 2 node to reproduce.
>
> For the BUG, can I have the full logs. The oops trace and the tracing
> from all nodes.
>
> Thanks
> Sunil
> [snip]
Sure, I will do it now :-)

Sunil Mushran Wrote:
> Please could you log a bugzilla (oss.oracle.com/bugzilla) and attach
> the logs to it.

--
Coly Li
SuSE Labs
Hi Sunil,

The oops is reported at http://oss.oracle.com/bugzilla/show_bug.cgi?id=1178
The dmesg info can be found in the attachment.

Thanks.

Sunil Mushran Wrote:
> Please could you log a bugzilla (oss.oracle.com/bugzilla) and attach
> the logs to it.

--
Coly Li
SuSE Labs