Hi,

I am in the process of porting the Lustre client 1.8.4 to a recent kernel, 2.6.38.8. This has been a challenge for a variety of reasons, such as the dcache_lock having been removed from the kernel. I am pretty close to having it working, but I can still generate a system lockup. So far, I have only tested Lustre over TCP over ethernet.

Using magic-sysrq, I was able to get a call trace, but I am not sure that I fully understand it. Can someone please validate my analysis?

Here is one of the running threads:

rm            R  running task        0  2039   2030 0x00000088
 ffff88011cfbb578 ffffffffa059c8df 0000000000000000 0000000000000000
 ffff88011caf8880 000000000001ce7d 00000000000000c1 0000000000000000
 ffff88011cfbb548 ffffffffa06bfe34 0000000000000000 ffff88011ce9c800
Call Trace:
 [<ffffffffa059c8df>] ? LNetMDUnlink+0x6f/0x110 [lnet]
 [<ffffffffa06bfe34>] ? lustre_msg_get_slv+0x94/0x100 [ptlrpc]
 [<ffffffff8104ce53>] ? __wake_up+0x53/0x70
 [<ffffffffa0698c29>] ? ldlm_completion_ast+0x349/0x8d0 [ptlrpc]
 [<ffffffffa067af48>] ? ldlm_lock_enqueue+0x228/0xbb0 [ptlrpc]
 [<ffffffffa0675148>] ? lock_res_and_lock+0x58/0xe0 [ptlrpc]
 [<ffffffffa067b9fd>] ? ldlm_lock_change_resource+0x12d/0x3f0 [ptlrpc]
 [<ffffffffa067e6f9>] ? ldlm_resource_get+0xe9/0xc00 [ptlrpc]
 [<ffffffffa067e1c3>] ? ldlm_resource_putref+0x73/0x430 [ptlrpc]
 [<ffffffffa067c7b3>] ? ldlm_lock_match+0x273/0x8f0 [ptlrpc]
 [<ffffffff8116b342>] ? find_inode+0x62/0xb0
 [<ffffffffa097dea2>] ? ll_update_inode+0x3a2/0x1140 [lustre]
 [<ffffffffa099d150>] ? fid_test_inode+0x0/0x80 [lustre]
 [<ffffffff8116c766>] ? ifind+0x66/0xc0
 [<ffffffffa099d150>] ? fid_test_inode+0x0/0x80 [lustre]
 [<ffffffffa06764d1>] ? ldlm_lock_add_to_lru_nolock+0x51/0xe0 [ptlrpc]
 [<ffffffffa0676846>] ? ldlm_lock_add_to_lru+0x46/0x110 [ptlrpc]
 [<ffffffffa067e6f9>] ? ldlm_resource_get+0xe9/0xc00 [ptlrpc]
 [<ffffffffa067664d>] ? ldlm_lock_remove_from_lru_nolock+0x3d/0xe0 [ptlrpc]
 [<ffffffffa0676951>] ? ldlm_lock_remove_from_lru+0x41/0x110 [ptlrpc]
 [<ffffffffa067e1c3>] ? ldlm_resource_putref+0x73/0x430 [ptlrpc]
 [<ffffffffa0676a41>] ? ldlm_lock_addref_internal_nolock+0x21/0xa0 [ptlrpc]
 [<ffffffffa0677866>] ? search_queue+0xc6/0x170 [ptlrpc]
 [<ffffffffa067c62a>] ? ldlm_lock_match+0xea/0x8f0 [ptlrpc]
 [<ffffffffa043369e>] ? cfs_free+0xe/0x10 [libcfs]
 [<ffffffffa06aedad>] ? __ptlrpc_req_finished+0x59d/0xb30 [ptlrpc]
 [<ffffffffa09579f0>] ? ll_lookup_finish_locks+0x80/0x140 [lustre]
 [<ffffffffa06764d1>] ? ldlm_lock_add_to_lru_nolock+0x51/0xe0 [ptlrpc]
 [<ffffffffa0676846>] ? ldlm_lock_add_to_lru+0x46/0x110 [ptlrpc]
 [<ffffffffa067bfb0>] ? ldlm_lock_decref_internal+0x2f0/0x880 [ptlrpc]
 [<ffffffffa0678b8f>] ? __ldlm_handle2lock+0x9f/0x3d0 [ptlrpc]
 [<ffffffffa0678b8f>] ? __ldlm_handle2lock+0x9f/0x3d0 [ptlrpc]
 [<ffffffffa067cfa1>] ? ldlm_lock_decref+0x41/0xb0 [ptlrpc]
 [<ffffffffa08e92f4>] ? mdc_set_lock_data+0xd4/0x270 [mdc]
 [<ffffffffa0957950>] ? ll_intent_drop_lock+0xa0/0xc0 [lustre]
 [<ffffffff8116966a>] ? d_kill+0xaa/0x110
 [<ffffffffa09579f0>] ? ll_lookup_finish_locks+0x80/0x140 [lustre]
 [<ffffffffa099d7ee>] ? ll_prepare_mdc_op_data+0xbe/0x120 [lustre]
 [<ffffffffa04390c3>] ? ts_kernel_list_record_file_line+0x123/0x3c0 [libcfs]
 [<ffffffffa0963f41>] ? __ll_inode_revalidate_it+0x191/0x6e0 [lustre]
 [<ffffffffa099d990>] ? ll_mdc_blocking_ast+0x0/0x890 [lustre]
 [<ffffffff8116a03b>] ? dput+0x9b/0x190
 [<ffffffff8108854f>] ? up+0x2f/0x50
 [<ffffffff814abb8e>] ? common_interrupt+0xe/0x13
 [<ffffffffa099c9bb>] ? ll_stats_ops_tally+0x6b/0xd0 [lustre]
 [<ffffffff8116fb1f>] ? mntput_no_expire+0x4f/0x1c0
 [<ffffffff8116fcad>] ? mntput+0x1d/0x30
 [<ffffffff8115d8b2>] ? path_put+0x22/0x30
 [<ffffffff81157b23>] ? vfs_fstatat+0x73/0x80
 [<ffffffff81157b54>] ? sys_newfstatat+0x24/0x50
 [<ffffffff8100bf82>] ? system_call_fastpath+0x16/0x1b

Am I to understand that mntput_no_expire was running, and then an interrupt arrived (i.e. common_interrupt)? The interrupt handler then made some calls, including a call to ll_mdc_blocking_ast. Is that right?

Is ll_mdc_blocking_ast supposed to run under an interrupt??? Look how deep the stack goes after that. No wonder there is a lockup! Normally, I would expect an interrupt service routine to do something quickly, and then start a worker thread if something more substantial needed to be done.

Perhaps I am completely misunderstanding this stack trace. Can someone please advise me?

Also, how do I interpret a line like this:

 [<ffffffffa099d990>] ? ll_mdc_blocking_ast+0x0/0x890 [lustre]

Why is there a question mark? What are the two numbers separated by the slash?

Thanks.

Roger Spellman
Staff Engineer
Terascala, Inc.
508-588-1501
www.terascala.com
On 2011-08-17, at 2:23 PM, Roger Spellman wrote:
> I am in the process of porting the Lustre client 1.8.4 to a recent kernel, 2.6.38.8.

That is a somewhat unfortunate starting point, since 1.8.6 clients at least work with 2.6.32 kernels.

> This has been a challenge for a variety of reasons, such as the dcache_lock having been removed from the kernel.

Yes, the dcache change is a major obstacle to newer kernels.

> I am pretty close to having it working, but I can still generate a system lockup. So far, I have only tested Lustre over TCP over ethernet.

It's difficult to make any kind of assessment without knowing what changes you have made to the client. It would be useful if you would submit a series of patches so that we can take a look at them.

> Using magic-sysrq, I was able to get a call trace, but I am not sure that I fully understand it. Can someone please validate my analysis?
>
> Here is one of the running threads:
>
> rm            R  running task        0  2039   2030 0x00000088
>  ffff88011cfbb578 ffffffffa059c8df 0000000000000000 0000000000000000
>  ffff88011caf8880 000000000001ce7d 00000000000000c1 0000000000000000
>  ffff88011cfbb548 ffffffffa06bfe34 0000000000000000 ffff88011ce9c800
> Call Trace:
>  [<ffffffffa059c8df>] ? LNetMDUnlink+0x6f/0x110 [lnet]
>  [<ffffffffa06bfe34>] ? lustre_msg_get_slv+0x94/0x100 [ptlrpc]
>  [<ffffffff8104ce53>] ? __wake_up+0x53/0x70
>  [<ffffffffa0698c29>] ? ldlm_completion_ast+0x349/0x8d0 [ptlrpc]
>  [<ffffffffa067af48>] ? ldlm_lock_enqueue+0x228/0xbb0 [ptlrpc]
>  [<ffffffffa0675148>] ? lock_res_and_lock+0x58/0xe0 [ptlrpc]
>  [<ffffffffa067b9fd>] ? ldlm_lock_change_resource+0x12d/0x3f0 [ptlrpc]
>  [<ffffffffa067e6f9>] ? ldlm_resource_get+0xe9/0xc00 [ptlrpc]
>  [<ffffffffa067e1c3>] ? ldlm_resource_putref+0x73/0x430 [ptlrpc]
>  [<ffffffffa067c7b3>] ? ldlm_lock_match+0x273/0x8f0 [ptlrpc]
>  [<ffffffff8116b342>] ? find_inode+0x62/0xb0
>  [<ffffffffa097dea2>] ? ll_update_inode+0x3a2/0x1140 [lustre]
>  [<ffffffffa099d150>] ? fid_test_inode+0x0/0x80 [lustre]
>  [<ffffffff8116c766>] ? ifind+0x66/0xc0
>  [<ffffffffa099d150>] ? fid_test_inode+0x0/0x80 [lustre]
>  [<ffffffffa06764d1>] ? ldlm_lock_add_to_lru_nolock+0x51/0xe0 [ptlrpc]
>  [<ffffffffa0676846>] ? ldlm_lock_add_to_lru+0x46/0x110 [ptlrpc]
>  [<ffffffffa067e6f9>] ? ldlm_resource_get+0xe9/0xc00 [ptlrpc]
>  [<ffffffffa067664d>] ? ldlm_lock_remove_from_lru_nolock+0x3d/0xe0 [ptlrpc]
>  [<ffffffffa0676951>] ? ldlm_lock_remove_from_lru+0x41/0x110 [ptlrpc]
>  [<ffffffffa067e1c3>] ? ldlm_resource_putref+0x73/0x430 [ptlrpc]
>  [<ffffffffa0676a41>] ? ldlm_lock_addref_internal_nolock+0x21/0xa0 [ptlrpc]
>  [<ffffffffa0677866>] ? search_queue+0xc6/0x170 [ptlrpc]
>  [<ffffffffa067c62a>] ? ldlm_lock_match+0xea/0x8f0 [ptlrpc]
>  [<ffffffffa043369e>] ? cfs_free+0xe/0x10 [libcfs]
>  [<ffffffffa06aedad>] ? __ptlrpc_req_finished+0x59d/0xb30 [ptlrpc]
>  [<ffffffffa09579f0>] ? ll_lookup_finish_locks+0x80/0x140 [lustre]
>  [<ffffffffa06764d1>] ? ldlm_lock_add_to_lru_nolock+0x51/0xe0 [ptlrpc]
>  [<ffffffffa0676846>] ? ldlm_lock_add_to_lru+0x46/0x110 [ptlrpc]
>  [<ffffffffa067bfb0>] ? ldlm_lock_decref_internal+0x2f0/0x880 [ptlrpc]
>  [<ffffffffa0678b8f>] ? __ldlm_handle2lock+0x9f/0x3d0 [ptlrpc]
>  [<ffffffffa0678b8f>] ? __ldlm_handle2lock+0x9f/0x3d0 [ptlrpc]
>  [<ffffffffa067cfa1>] ? ldlm_lock_decref+0x41/0xb0 [ptlrpc]
>  [<ffffffffa08e92f4>] ? mdc_set_lock_data+0xd4/0x270 [mdc]
>  [<ffffffffa0957950>] ? ll_intent_drop_lock+0xa0/0xc0 [lustre]
>  [<ffffffff8116966a>] ? d_kill+0xaa/0x110
>  [<ffffffffa09579f0>] ? ll_lookup_finish_locks+0x80/0x140 [lustre]
>  [<ffffffffa099d7ee>] ? ll_prepare_mdc_op_data+0xbe/0x120 [lustre]
>  [<ffffffffa04390c3>] ? ts_kernel_list_record_file_line+0x123/0x3c0 [libcfs]
>  [<ffffffffa0963f41>] ? __ll_inode_revalidate_it+0x191/0x6e0 [lustre]
>  [<ffffffffa099d990>] ? ll_mdc_blocking_ast+0x0/0x890 [lustre]
>  [<ffffffff8116a03b>] ? dput+0x9b/0x190
>  [<ffffffff8108854f>] ? up+0x2f/0x50
>  [<ffffffff814abb8e>] ? common_interrupt+0xe/0x13
>  [<ffffffffa099c9bb>] ? ll_stats_ops_tally+0x6b/0xd0 [lustre]
>  [<ffffffff8116fb1f>] ? mntput_no_expire+0x4f/0x1c0
>  [<ffffffff8116fcad>] ? mntput+0x1d/0x30
>  [<ffffffff8115d8b2>] ? path_put+0x22/0x30
>  [<ffffffff81157b23>] ? vfs_fstatat+0x73/0x80
>  [<ffffffff81157b54>] ? sys_newfstatat+0x24/0x50
>  [<ffffffff8100bf82>] ? system_call_fastpath+0x16/0x1b
>
> Am I to understand that mntput_no_expire was running, and then an interrupt arrived (i.e. common_interrupt)? The interrupt handler then made some calls, including a call to ll_mdc_blocking_ast. Is that right?
>
> Is ll_mdc_blocking_ast supposed to run under an interrupt???

No, the Linux stack traces are terrible: they just print anything that looks like the address of a kernel or module function. That includes function addresses that are passed as function parameters, such as callback functions. The thread must have hit an interrupt at one point, but I think that entry is just random garbage on the stack.

> Look how deep the stack goes after that. No wonder there is a lockup! Normally, I would expect an interrupt service routine to do something quickly, and then start a worker thread if something more substantial needed to be done.
>
> Perhaps I am completely misunderstanding this stack trace. Can someone please advise me?
>
> Also, how do I interpret a line like this:
>
>  [<ffffffffa099d990>] ? ll_mdc_blocking_ast+0x0/0x890 [lustre]
>
> Why is there a question mark?

Not sure.

> What are the two numbers separated by the slash?

That is the relative location within the function. In this case I think this is a function passed as a parameter, since it can hardly have called another function at an offset of 0 bytes. Another example shows:

> ll_lookup_finish_locks+0x80/0x140

which means the trace entry is 0x80 bytes into this 0x140-byte function. This can be useful with gdb:

# gdb /lib/modules/2.6.32.26-175.fc12.x86_64/updates/kernel/fs/lustre/lustre.ko
gdb> list *(ll_lookup_finish_locks+0x80)
0x1900 is in ll_lookup_finish_locks (/root/rpmbuild/BUILD/lustre-1.8.6/lustre/llite/dcache.c:361)

Cheers, Andreas
--
Andreas Dilger
Principal Engineer
Whamcloud, Inc.
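For reference, the same symbol+offset lookup can be done without gdb, using nm and addr2line from binutils. This is a minimal sketch, assuming the module was built with debug info; the nm output shown is illustrative, but the computed address matches the gdb example above (symbol base 0x1880 + offset 0x80 = 0x1900):

# nm /lib/modules/2.6.32.26-175.fc12.x86_64/updates/kernel/fs/lustre/lustre.ko | grep ll_lookup_finish_locks
0000000000001880 t ll_lookup_finish_locks
# addr2line -f -i -e /lib/modules/2.6.32.26-175.fc12.x86_64/updates/kernel/fs/lustre/lustre.ko 0x1900
ll_lookup_finish_locks
/root/rpmbuild/BUILD/lustre-1.8.6/lustre/llite/dcache.c:361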
Andreas,

Thanks for your reply. It was very helpful. See my responses, below.

>> I am in the process of porting the Lustre client 1.8.4 to a recent kernel, 2.6.38.8.
>
> That is a somewhat unfortunate starting point, since 1.8.6 clients at least work with 2.6.32 kernels.

I understand. I started this project before 1.8.6 came out, and I wanted to stick with 1.8.4 in case any problems came up with 1.8.6. As soon as I am done with 1.8.4, I will port my patch to 1.8.6.

> It's difficult to make any kind of assessment without knowing what changes you have made to the client. It would be useful if you would submit a series of patches so that we can take a look at them.

My plan was to get it working, then post the patch to anyone who wanted it. That should be pretty soon. I'm assuming that other people want to run Lustre with recent kernels.

> No, the Linux stack traces are terrible: they just print anything that looks like the address of a kernel or module function. That includes function addresses that are passed as function parameters, such as callback functions. The thread must have hit an interrupt at one point, but I think that entry is just random garbage on the stack.

Too bad. I compiled the kernel with frame pointers, so I had hoped that the kernel could unwind the stack properly.

Now that I know to ignore the stack trace, I can instrument the code to track down this problem.

Thanks.

Roger Spellman
Staff Engineer
Terascala, Inc.
508-588-1501
www.terascala.com
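On the instrumentation point: the 1.8 client already carries a built-in debug log that can often substitute for ad-hoc printk calls. A sketch, assuming the stock libcfs debug interfaces of Lustre 1.8 (dlmtrace/rpctrace are standard mask names; the output file here is arbitrary):

# add DLM and RPC tracing to the client debug mask
echo +dlmtrace > /proc/sys/lnet/debug
echo +rpctrace > /proc/sys/lnet/debug
# reproduce the lockup, then dump the kernel debug buffer to a file
lctl debug_kernel /tmp/lustre-debug.log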
On 2011-08-18, at 7:50 AM, "Roger Spellman" <Roger.Spellman at terascala.com> wrote:
> Andreas,
>
> Thanks for your reply. It was very helpful. See my responses, below.
>
>>> I am in the process of porting the Lustre client 1.8.4 to a recent kernel, 2.6.38.8.
>>
>> That is a somewhat unfortunate starting point, since 1.8.6 clients at least work with 2.6.32 kernels.
>
> I understand. I started this project before 1.8.6 came out, and I wanted to stick with 1.8.4 in case any problems came up with 1.8.6. As soon as I am done with 1.8.4, I will port my patch to 1.8.6.
>
>> It's difficult to make any kind of assessment without knowing what changes you have made to the client. It would be useful if you would submit a series of patches so that we can take a look at them.
>
> My plan was to get it working, then post the patch to anyone who wanted it. That should be pretty soon. I'm assuming that other people want to run Lustre with recent kernels.
>
>> No, the Linux stack traces are terrible: they just print anything that looks like the address of a kernel or module function. That includes function addresses that are passed as function parameters, such as callback functions. The thread must have hit an interrupt at one point, but I think that entry is just random garbage on the stack.
>
> Too bad. I compiled the kernel with frame pointers, so I had hoped that the kernel could unwind the stack properly.

That helps, but AFAIK it isn't 100% correct even then.

> Now that I know to ignore the stack trace, I can instrument the code to track down this problem.

I don't think you need to ignore the stack; just treat it with caution and look for a valid callpath through the listed functions.

Cheers, Andreas
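One plausible way to read the trace above along those lines (an interpretation, not a verified decode, since every frame is marked with '?'): the bottom frames

 [<ffffffff81157b23>] ? vfs_fstatat+0x73/0x80
 [<ffffffff81157b54>] ? sys_newfstatat+0x24/0x50
 [<ffffffff8100bf82>] ? system_call_fastpath+0x16/0x1b

form a genuine stat(2) entry path; higher up, __ll_inode_revalidate_it, ll_prepare_mdc_op_data, and ll_lookup_finish_locks are a consistent Lustre inode-revalidation chain; and near the top, ldlm_lock_enqueue followed by ldlm_completion_ast suggests the thread is waiting for a DLM lock to be granted. On that reading, the interleaved frames such as common_interrupt, dput, and mntput_no_expire would be stale stack contents rather than part of the live call chain.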