Hi,

I am in the process of porting the Lustre client 1.8.4 to a recent kernel, 2.6.38.8. This has been a challenge for a variety of reasons, such as the dcache_lock having been removed from the kernel. I am pretty close to having it working, but I can still generate a system lockup. So far, I have only tested Lustre over TCP over ethernet.

Using magic-sysrq, I was able to get a call trace, but I am not sure that I fully understand it. Can someone please validate my analysis?

Here is one of the running threads:

rm            R  running task        0  2039   2030 0x00000088
 ffff88011cfbb578 ffffffffa059c8df 0000000000000000 0000000000000000
 ffff88011caf8880 000000000001ce7d 00000000000000c1 0000000000000000
 ffff88011cfbb548 ffffffffa06bfe34 0000000000000000 ffff88011ce9c800
Call Trace:
 [<ffffffffa059c8df>] ? LNetMDUnlink+0x6f/0x110 [lnet]
 [<ffffffffa06bfe34>] ? lustre_msg_get_slv+0x94/0x100 [ptlrpc]
 [<ffffffff8104ce53>] ? __wake_up+0x53/0x70
 [<ffffffffa0698c29>] ? ldlm_completion_ast+0x349/0x8d0 [ptlrpc]
 [<ffffffffa067af48>] ? ldlm_lock_enqueue+0x228/0xbb0 [ptlrpc]
 [<ffffffffa0675148>] ? lock_res_and_lock+0x58/0xe0 [ptlrpc]
 [<ffffffffa067b9fd>] ? ldlm_lock_change_resource+0x12d/0x3f0 [ptlrpc]
 [<ffffffffa067e6f9>] ? ldlm_resource_get+0xe9/0xc00 [ptlrpc]
 [<ffffffffa067e1c3>] ? ldlm_resource_putref+0x73/0x430 [ptlrpc]
 [<ffffffffa067c7b3>] ? ldlm_lock_match+0x273/0x8f0 [ptlrpc]
 [<ffffffff8116b342>] ? find_inode+0x62/0xb0
 [<ffffffffa097dea2>] ? ll_update_inode+0x3a2/0x1140 [lustre]
 [<ffffffffa099d150>] ? fid_test_inode+0x0/0x80 [lustre]
 [<ffffffff8116c766>] ? ifind+0x66/0xc0
 [<ffffffffa099d150>] ? fid_test_inode+0x0/0x80 [lustre]
 [<ffffffffa06764d1>] ? ldlm_lock_add_to_lru_nolock+0x51/0xe0 [ptlrpc]
 [<ffffffffa0676846>] ? ldlm_lock_add_to_lru+0x46/0x110 [ptlrpc]
 [<ffffffffa067e6f9>] ? ldlm_resource_get+0xe9/0xc00 [ptlrpc]
 [<ffffffffa067664d>] ? ldlm_lock_remove_from_lru_nolock+0x3d/0xe0 [ptlrpc]
 [<ffffffffa0676951>] ? ldlm_lock_remove_from_lru+0x41/0x110 [ptlrpc]
 [<ffffffffa067e1c3>] ? ldlm_resource_putref+0x73/0x430 [ptlrpc]
 [<ffffffffa0676a41>] ? ldlm_lock_addref_internal_nolock+0x21/0xa0 [ptlrpc]
 [<ffffffffa0677866>] ? search_queue+0xc6/0x170 [ptlrpc]
 [<ffffffffa067c62a>] ? ldlm_lock_match+0xea/0x8f0 [ptlrpc]
 [<ffffffffa043369e>] ? cfs_free+0xe/0x10 [libcfs]
 [<ffffffffa06aedad>] ? __ptlrpc_req_finished+0x59d/0xb30 [ptlrpc]
 [<ffffffffa09579f0>] ? ll_lookup_finish_locks+0x80/0x140 [lustre]
 [<ffffffffa06764d1>] ? ldlm_lock_add_to_lru_nolock+0x51/0xe0 [ptlrpc]
 [<ffffffffa0676846>] ? ldlm_lock_add_to_lru+0x46/0x110 [ptlrpc]
 [<ffffffffa067bfb0>] ? ldlm_lock_decref_internal+0x2f0/0x880 [ptlrpc]
 [<ffffffffa0678b8f>] ? __ldlm_handle2lock+0x9f/0x3d0 [ptlrpc]
 [<ffffffffa0678b8f>] ? __ldlm_handle2lock+0x9f/0x3d0 [ptlrpc]
 [<ffffffffa067cfa1>] ? ldlm_lock_decref+0x41/0xb0 [ptlrpc]
 [<ffffffffa08e92f4>] ? mdc_set_lock_data+0xd4/0x270 [mdc]
 [<ffffffffa0957950>] ? ll_intent_drop_lock+0xa0/0xc0 [lustre]
 [<ffffffff8116966a>] ? d_kill+0xaa/0x110
 [<ffffffffa09579f0>] ? ll_lookup_finish_locks+0x80/0x140 [lustre]
 [<ffffffffa099d7ee>] ? ll_prepare_mdc_op_data+0xbe/0x120 [lustre]
 [<ffffffffa04390c3>] ? ts_kernel_list_record_file_line+0x123/0x3c0 [libcfs]
 [<ffffffffa0963f41>] ? __ll_inode_revalidate_it+0x191/0x6e0 [lustre]
 [<ffffffffa099d990>] ? ll_mdc_blocking_ast+0x0/0x890 [lustre]
 [<ffffffff8116a03b>] ? dput+0x9b/0x190
 [<ffffffff8108854f>] ? up+0x2f/0x50
 [<ffffffff814abb8e>] ? common_interrupt+0xe/0x13
 [<ffffffffa099c9bb>] ? ll_stats_ops_tally+0x6b/0xd0 [lustre]
 [<ffffffff8116fb1f>] ? mntput_no_expire+0x4f/0x1c0
 [<ffffffff8116fcad>] ? mntput+0x1d/0x30
 [<ffffffff8115d8b2>] ? path_put+0x22/0x30
 [<ffffffff81157b23>] ? vfs_fstatat+0x73/0x80
 [<ffffffff81157b54>] ? sys_newfstatat+0x24/0x50
 [<ffffffff8100bf82>] ? system_call_fastpath+0x16/0x1b

Am I to understand that mntput_no_expire was running, and then an interrupt arrived (i.e. common_interrupt)? The interrupt handler then made some calls, including a call to ll_mdc_blocking_ast. Is that right?

Is ll_mdc_blocking_ast supposed to run under an interrupt??? Look how deep the stack goes after that. No wonder there is a lockup! Normally, I would expect an interrupt service routine to do something quickly, and then start a worker thread if something more substantial needed to be done.

Perhaps I am completely misunderstanding this stack trace. Can someone please advise me?

Also, how do I interpret a line like this:

 [<ffffffffa099d990>] ? ll_mdc_blocking_ast+0x0/0x890 [lustre]

Why is there a question mark? What are the two numbers separated by the slash?

Thanks.

Roger Spellman
Staff Engineer
Terascala, Inc.
508-588-1501
www.terascala.com
On 2011-08-17, at 2:23 PM, Roger Spellman wrote:
> I am in the process of porting the Lustre client 1.8.4 to a recent kernel, 2.6.38.8.

That is a somewhat unfortunate starting point, since 1.8.6 clients at least work with 2.6.32 kernels.

> This has been a challenge for a variety of reasons, such as the dcache_lock having been removed from the kernel.

Yes, the dcache change is a major obstacle to newer kernels.

> I am pretty close to having it working, but I can still generate a system lockup. So far, I have only tested Lustre over TCP over ethernet.

It's difficult to make any kind of assessment without knowing what changes you have made to the client. It would be useful if you would submit a series of patches so that we can take a look at them.

> Using magic-sysrq, I was able to get a call trace, but I am not sure that I fully understand it. Can someone please validate my analysis?
>
> Here is one of the running threads:
>
> rm            R  running task        0  2039   2030 0x00000088
>  ffff88011cfbb578 ffffffffa059c8df 0000000000000000 0000000000000000
>  ffff88011caf8880 000000000001ce7d 00000000000000c1 0000000000000000
>  ffff88011cfbb548 ffffffffa06bfe34 0000000000000000 ffff88011ce9c800
> Call Trace:
>  [<ffffffffa059c8df>] ? LNetMDUnlink+0x6f/0x110 [lnet]
>  [<ffffffffa06bfe34>] ? lustre_msg_get_slv+0x94/0x100 [ptlrpc]
>  [<ffffffff8104ce53>] ? __wake_up+0x53/0x70
>  [<ffffffffa0698c29>] ? ldlm_completion_ast+0x349/0x8d0 [ptlrpc]
>  [<ffffffffa067af48>] ? ldlm_lock_enqueue+0x228/0xbb0 [ptlrpc]
>  [<ffffffffa0675148>] ? lock_res_and_lock+0x58/0xe0 [ptlrpc]
>  [<ffffffffa067b9fd>] ? ldlm_lock_change_resource+0x12d/0x3f0 [ptlrpc]
>  [<ffffffffa067e6f9>] ? ldlm_resource_get+0xe9/0xc00 [ptlrpc]
>  [<ffffffffa067e1c3>] ? ldlm_resource_putref+0x73/0x430 [ptlrpc]
>  [<ffffffffa067c7b3>] ? ldlm_lock_match+0x273/0x8f0 [ptlrpc]
>  [<ffffffff8116b342>] ? find_inode+0x62/0xb0
>  [<ffffffffa097dea2>] ? ll_update_inode+0x3a2/0x1140 [lustre]
>  [<ffffffffa099d150>] ? fid_test_inode+0x0/0x80 [lustre]
>  [<ffffffff8116c766>] ? ifind+0x66/0xc0
>  [<ffffffffa099d150>] ? fid_test_inode+0x0/0x80 [lustre]
>  [<ffffffffa06764d1>] ? ldlm_lock_add_to_lru_nolock+0x51/0xe0 [ptlrpc]
>  [<ffffffffa0676846>] ? ldlm_lock_add_to_lru+0x46/0x110 [ptlrpc]
>  [<ffffffffa067e6f9>] ? ldlm_resource_get+0xe9/0xc00 [ptlrpc]
>  [<ffffffffa067664d>] ? ldlm_lock_remove_from_lru_nolock+0x3d/0xe0 [ptlrpc]
>  [<ffffffffa0676951>] ? ldlm_lock_remove_from_lru+0x41/0x110 [ptlrpc]
>  [<ffffffffa067e1c3>] ? ldlm_resource_putref+0x73/0x430 [ptlrpc]
>  [<ffffffffa0676a41>] ? ldlm_lock_addref_internal_nolock+0x21/0xa0 [ptlrpc]
>  [<ffffffffa0677866>] ? search_queue+0xc6/0x170 [ptlrpc]
>  [<ffffffffa067c62a>] ? ldlm_lock_match+0xea/0x8f0 [ptlrpc]
>  [<ffffffffa043369e>] ? cfs_free+0xe/0x10 [libcfs]
>  [<ffffffffa06aedad>] ? __ptlrpc_req_finished+0x59d/0xb30 [ptlrpc]
>  [<ffffffffa09579f0>] ? ll_lookup_finish_locks+0x80/0x140 [lustre]
>  [<ffffffffa06764d1>] ? ldlm_lock_add_to_lru_nolock+0x51/0xe0 [ptlrpc]
>  [<ffffffffa0676846>] ? ldlm_lock_add_to_lru+0x46/0x110 [ptlrpc]
>  [<ffffffffa067bfb0>] ? ldlm_lock_decref_internal+0x2f0/0x880 [ptlrpc]
>  [<ffffffffa0678b8f>] ? __ldlm_handle2lock+0x9f/0x3d0 [ptlrpc]
>  [<ffffffffa0678b8f>] ? __ldlm_handle2lock+0x9f/0x3d0 [ptlrpc]
>  [<ffffffffa067cfa1>] ? ldlm_lock_decref+0x41/0xb0 [ptlrpc]
>  [<ffffffffa08e92f4>] ? mdc_set_lock_data+0xd4/0x270 [mdc]
>  [<ffffffffa0957950>] ? ll_intent_drop_lock+0xa0/0xc0 [lustre]
>  [<ffffffff8116966a>] ? d_kill+0xaa/0x110
>  [<ffffffffa09579f0>] ? ll_lookup_finish_locks+0x80/0x140 [lustre]
>  [<ffffffffa099d7ee>] ? ll_prepare_mdc_op_data+0xbe/0x120 [lustre]
>  [<ffffffffa04390c3>] ? ts_kernel_list_record_file_line+0x123/0x3c0 [libcfs]
>  [<ffffffffa0963f41>] ? __ll_inode_revalidate_it+0x191/0x6e0 [lustre]
>  [<ffffffffa099d990>] ? ll_mdc_blocking_ast+0x0/0x890 [lustre]
>  [<ffffffff8116a03b>] ? dput+0x9b/0x190
>  [<ffffffff8108854f>] ? up+0x2f/0x50
>  [<ffffffff814abb8e>] ? common_interrupt+0xe/0x13
>  [<ffffffffa099c9bb>] ? ll_stats_ops_tally+0x6b/0xd0 [lustre]
>  [<ffffffff8116fb1f>] ? mntput_no_expire+0x4f/0x1c0
>  [<ffffffff8116fcad>] ? mntput+0x1d/0x30
>  [<ffffffff8115d8b2>] ? path_put+0x22/0x30
>  [<ffffffff81157b23>] ? vfs_fstatat+0x73/0x80
>  [<ffffffff81157b54>] ? sys_newfstatat+0x24/0x50
>  [<ffffffff8100bf82>] ? system_call_fastpath+0x16/0x1b
>
> Am I to understand that mntput_no_expire was running, and then an interrupt arrived (i.e. common_interrupt)? The interrupt handler then made some calls, including a call to ll_mdc_blocking_ast. Is that right?
>
> Is ll_mdc_blocking_ast supposed to run under an interrupt???

No, the Linux stack traces are terrible: they just print anything that looks like the address of a kernel or module function. That includes function addresses that are passed as function parameters, such as callback functions. The thread must have hit an interrupt at one point, but I think that entry is just random garbage on the stack.

> Look how deep the stack goes after that. No wonder there is a lockup! Normally, I would expect an interrupt service routine to do something quickly, and then start a worker thread if something more substantial needed to be done.
>
> Perhaps I am completely misunderstanding this stack trace. Can someone please advise me?
>
> Also, how do I interpret a line like this:
>
>  [<ffffffffa099d990>] ? ll_mdc_blocking_ast+0x0/0x890 [lustre]
>
> Why is there a question mark?

Not sure.

> What are the two numbers separated by the slash?

That is the relative location within the function. In this case I think this is a function passed as a parameter, since it can hardly have called another function at an offset of 0 bytes. Another example shows:

> ll_lookup_finish_locks+0x80/0x140

which means the trace entry is 0x80 bytes into this 0x140-byte function. This can be useful with gdb:

# gdb /lib/modules/2.6.32.26-175.fc12.x86_64/updates/kernel/fs/lustre/lustre.ko
gdb> list *(ll_lookup_finish_locks+0x80)
0x1900 is in ll_lookup_finish_locks (/root/rpmbuild/BUILD/lustre-1.8.6/lustre/llite/dcache.c:361)

Cheers, Andreas
--
Andreas Dilger
Principal Engineer
Whamcloud, Inc.
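For reference, the same symbol+offset lookup can be done without gdb, using nm and addr2line from binutils. This is a minimal sketch, assuming the module was built with debug info; the nm output shown is illustrative, but the computed address matches the gdb example above (symbol base 0x1880 + offset 0x80 = 0x1900):

# nm /lib/modules/2.6.32.26-175.fc12.x86_64/updates/kernel/fs/lustre/lustre.ko | grep ll_lookup_finish_locks
0000000000001880 t ll_lookup_finish_locks
# addr2line -f -i -e /lib/modules/2.6.32.26-175.fc12.x86_64/updates/kernel/fs/lustre/lustre.ko 0x1900
ll_lookup_finish_locks
/root/rpmbuild/BUILD/lustre-1.8.6/lustre/llite/dcache.c:361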
Andreas,

Thanks for your reply. It was very helpful. See my responses, below.

>> I am in the process of porting the Lustre client 1.8.4 to a recent kernel, 2.6.38.8.
>
> That is a somewhat unfortunate starting point, since 1.8.6 clients at least work with 2.6.32 kernels.

I understand. I started this project before 1.8.6 came out, and I wanted to stick with 1.8.4 in case any problems came up with 1.8.6. As soon as I am done with 1.8.4, I will port my patch to 1.8.6.

> It's difficult to make any kind of assessment without knowing what changes you have made to the client. It would be useful if you would submit a series of patches so that we can take a look at them.

My plan was to get it working, then post the patch to anyone who wanted it. That should be pretty soon. I'm assuming that other people want to run Lustre with recent kernels.

> No, the Linux stack traces are terrible: they just print anything that looks like the address of a kernel or module function. That includes function addresses that are passed as function parameters, such as callback functions. The thread must have hit an interrupt at one point, but I think that entry is just random garbage on the stack.

Too bad. I compiled the kernel with frame pointers, so I had hoped that the kernel could unwind the stack properly.

Now that I know to ignore the stack trace, I can instrument the code to track down this problem.

Thanks.

Roger Spellman
Staff Engineer
Terascala, Inc.
508-588-1501
www.terascala.com
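On the instrumentation point: the 1.8 client already carries a built-in debug log that can often substitute for ad-hoc printk calls. A sketch, assuming the stock libcfs debug interfaces of Lustre 1.8 (dlmtrace/rpctrace are standard mask names; the output file here is arbitrary):

# add DLM and RPC tracing to the client debug mask
echo +dlmtrace > /proc/sys/lnet/debug
echo +rpctrace > /proc/sys/lnet/debug
# reproduce the lockup, then dump the kernel debug buffer to a file
lctl debug_kernel /tmp/lustre-debug.log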
On 2011-08-18, at 7:50 AM, "Roger Spellman" <Roger.Spellman at terascala.com> wrote:
> Andreas,
>
> Thanks for your reply. It was very helpful. See my responses, below.
>
>>> I am in the process of porting the Lustre client 1.8.4 to a recent kernel, 2.6.38.8.
>>
>> That is a somewhat unfortunate starting point, since 1.8.6 clients at least work with 2.6.32 kernels.
>
> I understand. I started this project before 1.8.6 came out, and I wanted to stick with 1.8.4 in case any problems came up with 1.8.6. As soon as I am done with 1.8.4, I will port my patch to 1.8.6.
>
>> It's difficult to make any kind of assessment without knowing what changes you have made to the client. It would be useful if you would submit a series of patches so that we can take a look at them.
>
> My plan was to get it working, then post the patch to anyone who wanted it. That should be pretty soon. I'm assuming that other people want to run Lustre with recent kernels.
>
>> No, the Linux stack traces are terrible: they just print anything that looks like the address of a kernel or module function. That includes function addresses that are passed as function parameters, such as callback functions. The thread must have hit an interrupt at one point, but I think that entry is just random garbage on the stack.
>
> Too bad. I compiled the kernel with frame pointers, so I had hoped that the kernel could unwind the stack properly.

That helps, but AFAIK it isn't 100% correct even then.

> Now that I know to ignore the stack trace, I can instrument the code to track down this problem.

I don't think you need to ignore the stack; just treat it with caution and look for a valid callpath through the listed functions.

Cheers, Andreas
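One plausible way to read the trace above along those lines (an interpretation, not a verified decode, since every frame is marked with '?'): the bottom frames

 [<ffffffff81157b23>] ? vfs_fstatat+0x73/0x80
 [<ffffffff81157b54>] ? sys_newfstatat+0x24/0x50
 [<ffffffff8100bf82>] ? system_call_fastpath+0x16/0x1b

form a genuine stat(2) entry path; higher up, __ll_inode_revalidate_it, ll_prepare_mdc_op_data, and ll_lookup_finish_locks are a consistent Lustre inode-revalidation chain; and near the top, ldlm_lock_enqueue followed by ldlm_completion_ast suggests the thread is waiting for a DLM lock to be granted. On that reading, the interleaved frames such as common_interrupt, dput, and mntput_no_expire would be stale stack contents rather than part of the live call chain.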