anil kumar
2009-Apr-22 11:20 UTC
[Lustre-discuss] Lustre LBUG Not Healthy on MDT with 1.6.7 & 1.6.5.1
Hi,

We are hitting an LBUG ("not healthy") on the MDT; the error messages logged are below. We installed 1.6.7 and got the following, so we had to revert to 1.6.5.1, but even with 1.6.5.1 I now see the same issue:

Apr 20 06:18:45 dadbdd01 kernel: LustreError: 15772:0:(mds_open.c:1097:mds_open()) ASSERTION(!mds_inode_is_orphan(dchild->d_inode)) failed:dchild 29f8cba:16b5b5ab (ffff8105c4b50660) inode ffff8105c71c3d48/44010682/381007275
Apr 20 06:18:45 dadbdd01 kernel: LustreError: 15772:0:(mds_open.c:1097:mds_open()) LBUG
Apr 20 06:18:45 dadbdd01 kernel: Lustre: 15772:0:(linux-debug.c:167:libcfs_debug_dumpstack()) showing stack for process 15772
Apr 20 06:18:45 dadbdd01 kernel: ll_mdt_50 R running task 0 15772 1 15773 15771 (L-TLB)
Apr 20 06:18:45 dadbdd01 kernel: ffff8107cf93de50 0000000000000046 ffff8107fe1bf1c8 ffffffff8006b6c9
Apr 20 06:18:45 dadbdd01 kernel: ffff8107cfcb8840 ffffffff885e46c1 ffff8107fe1bf000 ffff8107fe1bf0e0
Apr 20 06:18:45 dadbdd01 kernel: ffff8108009fdc40 ffffffff885e23d6 ffff8107fe1bf188 0000000000000000
Apr 20 06:18:45 dadbdd01 kernel: Call Trace:
Apr 20 06:18:45 dadbdd01 kernel: [<ffffffff8006b6c9>] do_gettimeofday+0x50/0x92
Apr 20 06:18:45 dadbdd01 kernel: [<ffffffff885e23d6>] :libcfs:lcw_update_time+0x16/0x100
Apr 20 06:18:45 dadbdd01 kernel: [<ffffffff800868b0>] __wake_up_common+0x3e/0x68
Apr 20 06:18:45 dadbdd01 kernel: [<ffffffff88734efc>] :ptlrpc:ptlrpc_main+0xdcc/0xf50
Apr 20 06:18:45 dadbdd01 kernel: [<ffffffff80088432>] default_wake_function+0x0/0xe
Apr 20 06:18:45 dadbdd01 kernel: [<ffffffff8005bfb1>] child_rip+0xa/0x11
Apr 20 06:18:45 dadbdd01 kernel: [<ffffffff88734130>] :ptlrpc:ptlrpc_main+0x0/0xf50
Apr 20 06:18:45 dadbdd01 kernel: [<ffffffff8005bfa7>] child_rip+0x0/0x11
Apr 20 06:18:45 dadbdd01 kernel:
Apr 20 06:18:45 dadbdd01 kernel: LustreError: dumping log to /tmp/lustre-log.1240233525.15772
Apr 20 06:20:25 dadbdd01 kernel: Lustre: 15763:0:(ldlm_lib.c:525:target_handle_reconnect()) farmres-MDT0000: dc9c3080-04b8-5b51-f82e-9e218a37b675 reconnecting
Apr 20 06:20:25 dadbdd01 kernel: Lustre: 15763:0:(ldlm_lib.c:760:target_handle_connect()) farmres-MDT0000: refuse reconnection from dc9c3080-04b8-5b51-f82e-9e218a37b675@10.229.168.36@tcp to 0xffff8107fa566000; still busy with 2 active RPCs
Apr 20 06:20:25 dadbdd01 kernel: LustreError: 15763:0:(ldlm_lib.c:1536:target_send_reply_msg()) @@@ processing error (-16) req@ffff8107fa72dc00 x277188558/t0 o38->dc9c3080-04b8-5b51-f82e-9e218a37b675@NET_0x200000ae5a824_UUID:0/0 lens 304/200 e 0 to 0 dl 1240233725 ref 1 fl Interpret:/0/0 rc -16/0
Apr 20 06:20:50 dadbdd01 kernel: Lustre: 15769:0:(ldlm_lib.c:525:target_handle_reconnect()) farmres-MDT0000: dc9c3080-04b8-5b51-f82e-9e218a37b675 reconnecting
Apr 20 06:20:50 dadbdd01 kernel: Lustre: 15769:0:(ldlm_lib.c:760:target_handle_connect()) farmres-MDT0000: refuse reconnection from dc9c3080-04b8-5b51-f82e-9e218a37b675@10.229.168.36@tcp to 0xffff8107fa566000; still busy with 2 active RPCs
Apr 20 06:20:50 dadbdd01 kernel: LustreError: 15769:0:(ldlm_lib.c:1536:target_send_reply_msg()) @@@ processing error (-16) req@ffff81055a96ac00 x277188624/t0 o38->dc9c3080-04b8-5b51-f82e-9e218a37b675@NET_0x200000ae5a824_UUID:0/0 lens 304/200 e 0 to 0 dl 1240233750 ref 1 fl Interpret:/0/0 rc -16/0
Apr 20 06:21:15 dadbdd01 kernel: Lustre: 15741:0:(ldlm_lib.c:525:target_handle_reconnect()) farmres-MDT0000: dc9c3080-04b8-5b51-f82e-9e218a37b675 reconnecting
Apr 20 06:21:15 dadbdd01 kernel: Lustre: 15741:0:(ldlm_lib.c:760:target_handle_connect()) farmres-MDT0000: refuse reconnection from dc9c3080-04b8-5b51-f82e-9e218a37b675@10.229.168.36@tcp to 0xffff8107fa566000; still busy with 2 active RPCs
Apr 20 06:21:15 dadbdd01 kernel: LustreError: 15741:0:(ldlm_lib.c:1536:target_send_reply_msg()) @@@ processing error (-16) req@ffff81078039f800 x277188703/t0 o38->dc9c3080-04b8-5b51-f82e-9e218a37b675@NET_0x200000ae5a824_UUID:0/0 lens 304/200 e 0 to 0 dl 1240233775 ref 1 fl Interpret:/0/0 rc -16/0
Apr 20 06:21:40 dadbdd01 kernel: Lustre: 15953:0:(ldlm_lib.c:525:target_handle_reconnect()) farmres-MDT0000: dc9c3080-04b8-5b51-f82e-9e218a37b675 reconnecting
Apr 20 06:21:40 dadbdd01 kernel: Lustre: 15953:0:(ldlm_lib.c:760:target_handle_connect()) farmres-MDT0000: refuse reconnection from dc9c3080-04b8-5b51-f82e-9e218a37b675@10.229.168.36@tcp to 0xffff8107fa566000; still busy with 2 active RPCs
Apr 20 06:21:40 dadbdd01 kernel: LustreError: 15953:0:(ldlm_lib.c:1536:target_send_reply_msg()) @@@ processing error (-16) req@ffff810515284200 x277188781/t0 o38->dc9c3080-04b8-5b51-f82e-9e218a37b675@NET_0x200000ae5a824_UUID:0/0 lens 304/200 e 0 to 0 dl 1240233800 ref 1 fl Interpret:/0/0 rc -16/0
Apr 20 06:21:54 dadbdd01 kernel: LustreError: 0:0:(ldlm_lockd.c:234:waiting_locks_callback()) ### lock callback timer expired after 7485s: evicting client at 10.229.168.36@tcp ns: mds-farmres-MDT0000_UUID lock: ffff8106ba114600/0x7cd8e2e59918c380 lrc: 1/0,0 mode: CR/CR res: 53842204/380418865 bits 0x3 rrc: 3 type: IBT flags: 4000020 remote: 0x6628125da30deb02 expref: 41753 pid 15774
Apr 20 06:22:05 dadbdd01 kernel: Lustre: 0:0:(watchdog.c:130:lcw_cb()) Watchdog triggered for pid 15772: it was inactive for 200s
Apr 20 06:22:05 dadbdd01 kernel: Lustre: 0:0:(linux-debug.c:167:libcfs_debug_dumpstack()) showing stack for process 15772
Apr 20 06:22:05 dadbdd01 kernel: ll_mdt_50 D ffff8107fdb66528 0 15772 1 15773 15771 (L-TLB)
Apr 20 06:22:05 dadbdd01 kernel: ffff8107cf93d700 0000000000000046 ffffffff885e4026 ffffffff885e47f0
Apr 20 06:22:05 dadbdd01 kernel: 000000000000000a ffff810828fe57a0 ffff81011cb24100 000232d0fd32a434
Apr 20 06:22:05 dadbdd01 kernel: 000000000000212e ffff810828fe5988 ffff810700000007 ffffffff8003b127
Apr 20 06:22:05 dadbdd01 kernel: Call Trace:
Apr 20 06:22:05 dadbdd01 kernel: [<ffffffff8003b127>] remove_wait_queue+0x1c/0x2c
Apr 20 06:22:05 dadbdd01 kernel: [<ffffffff80088432>] default_wake_function+0x0/0xe
Apr 20 06:22:05 dadbdd01 kernel: [<ffffffff885d9c6b>] :libcfs:lbug_with_loc+0xbb/0xc0
Apr 20 06:22:05 dadbdd01 kernel: [<ffffffff889a48d7>] :mds:mds_open+0x1f57/0x322b
Apr 20 06:22:05 dadbdd01 kernel: [<ffffffff800868b0>] __wake_up_common+0x3e/0x68
Apr 20 06:22:05 dadbdd01 kernel: [<ffffffff887a52a1>] :ksocklnd:ksocknal_queue_tx_locked+0x4f1/0x550
Apr 20 06:22:05 dadbdd01 kernel: [<ffffffff886e719e>] :ptlrpc:lock_res_and_lock+0xbe/0xe0
Apr 20 06:22:05 dadbdd01 kernel: [<ffffffff88981bf9>] :mds:mds_reint_rec+0x1d9/0x2b0
Apr 20 06:22:05 dadbdd01 kernel: [<ffffffff889a7b43>] :mds:mds_open_unpack+0x2f3/0x410
Apr 20 06:22:05 dadbdd01 kernel: [<ffffffff889a622f>] :mds:mds_update_unpack+0x20f/0x2b0
Apr 20 06:22:05 dadbdd01 kernel: [<ffffffff8897465a>] :mds:mds_reint+0x35a/0x420
Apr 20 06:22:05 dadbdd01 kernel: [<ffffffff88973432>] :mds:fixup_handle_for_resent_req+0x52/0x200
Apr 20 06:22:05 dadbdd01 kernel: [<ffffffff88978303>] :mds:mds_intent_policy+0x453/0xc10
Apr 20 06:22:05 dadbdd01 kernel: [<ffffffff886274ec>] :lnet:LNetMDBind+0x2ac/0x400
Apr 20 06:22:05 dadbdd01 kernel: [<ffffffff886ef386>] :ptlrpc:ldlm_resource_putref+0x1b6/0x3a0
Apr 20 06:22:05 dadbdd01 kernel: [<ffffffff886ecbe6>] :ptlrpc:ldlm_lock_enqueue+0x186/0x990
Apr 20 06:22:05 dadbdd01 kernel: [<ffffffff886e9dad>] :ptlrpc:ldlm_lock_create+0x9ad/0x9e0
Apr 20 06:22:05 dadbdd01 kernel: [<ffffffff8870de20>] :ptlrpc:ldlm_server_completion_ast+0x0/0x5b0
Apr 20 06:22:05 dadbdd01 kernel: [<ffffffff8870b745>] :ptlrpc:ldlm_handle_enqueue+0xc95/0x1280
Apr 20 06:22:05 dadbdd01 kernel: [<ffffffff8870e3d0>] :ptlrpc:ldlm_server_blocking_ast+0x0/0x6b0
Apr 20 06:22:05 dadbdd01 kernel: [<ffffffff8897cb07>] :mds:mds_handle+0x4047/0x4d10
Apr 20 06:22:05 dadbdd01 kernel: [<ffffffff8014090e>] __next_cpu+0x19/0x28
Apr 20 06:22:05 dadbdd01 kernel: [<ffffffff80073331>] smp_send_reschedule+0x4e/0x53
Apr 20 06:22:05 dadbdd01 kernel: [<ffffffff88686d31>] :obdclass:class_handle2object+0xd1/0x160
Apr 20 06:22:05 dadbdd01 kernel: [<ffffffff88725b7f>] :ptlrpc:lustre_msg_get_conn_cnt+0x4f/0x100
Apr 20 06:22:05 dadbdd01 kernel: [<ffffffff8872fe9a>] :ptlrpc:ptlrpc_check_req+0x1a/0x110
Apr 20 06:22:05 dadbdd01 kernel: [<ffffffff88731fe2>] :ptlrpc:ptlrpc_server_handle_request+0x992/0x1030
Apr 20 06:22:05 dadbdd01 kernel: [<ffffffff8006b6c9>] do_gettimeofday+0x50/0x92
Apr 20 06:22:05 dadbdd01 kernel: [<ffffffff885e23d6>] :libcfs:lcw_update_time+0x16/0x100
Apr 20 06:22:05 dadbdd01 kernel: [<ffffffff800868b0>] __wake_up_common+0x3e/0x68
Apr 20 06:22:05 dadbdd01 kernel: [<ffffffff88734efc>] :ptlrpc:ptlrpc_main+0xdcc/0xf50
Apr 20 06:22:05 dadbdd01 kernel: [<ffffffff80088432>] default_wake_function+0x0/0xe
Apr 20 06:22:05 dadbdd01 kernel: [<ffffffff8005bfb1>] child_rip+0xa/0x11
Apr 20 06:22:05 dadbdd01 kernel: [<ffffffff88734130>] :ptlrpc:ptlrpc_main+0x0/0xf50
Apr 20 06:22:05 dadbdd01 kernel: [<ffffffff8005bfa7>] child_rip+0x0/0x11
Apr 20 06:22:05 dadbdd01 kernel:
Apr 20 06:22:05 dadbdd01 kernel: LustreError: dumping log to /tmp/lustre-log.1240233725.15772

Can someone advise on this?

Thanks,
Anil
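When triaging a log flood like the one above, it helps to reduce it to the three recurring events: the LBUG assertion site, the repeatedly refused reconnections, and the lock-callback eviction. The following is only an illustrative sketch, not part of any Lustre tooling; the regexes and the `summarize()` helper are my own, matched to the message shapes seen in this excerpt:

```python
import re

# Hypothetical triage helper (not a Lustre utility). The patterns below
# match the syslog message shapes from the MDT log excerpt above.
LBUG_RE = re.compile(r"LustreError: \d+:0:\((\S+)\) LBUG")
REFUSE_RE = re.compile(r"refuse reconnection from (\S+)")
EVICT_RE = re.compile(r"lock callback timer expired after (\d+)s")

def summarize(lines):
    """Group syslog lines into LBUG sites, refused clients, and eviction timeouts."""
    report = {"lbug_sites": [], "refused_clients": [], "eviction_timeouts": []}
    for line in lines:
        if (m := LBUG_RE.search(line)):
            report["lbug_sites"].append(m.group(1))       # e.g. mds_open.c:1097:mds_open()
        if (m := REFUSE_RE.search(line)):
            report["refused_clients"].append(m.group(1))  # client UUID@NID
        if (m := EVICT_RE.search(line)):
            report["eviction_timeouts"].append(int(m.group(1)))  # seconds
    return report
```

Run against the excerpt above, this would report one LBUG at mds_open.c:1097:mds_open(), four refused reconnections from the same client UUID, and the 7485 s lock-callback timeout that triggered the eviction.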
Brian J. Murrell
2009-Apr-22 12:30 UTC
[Lustre-discuss] Lustre LBUG Not Healthy on MDT with 1.6.7 & 1.6.5.1
On Wed, 2009-04-22 at 16:50 +0530, anil kumar wrote:
> Hi,
>
> We have issues with LBUG not healthy on MDT, following are the error
> messages logged
>
> installed 1.6.7 and got the following, so we had to revert back to
> 1.6.5.1, even with 1.6.5.1 i see the same issue now,
>
> Apr 20 06:18:45 dadbdd01 kernel: LustreError:
> 15772:0:(mds_open.c:1097:mds_open()) ASSERTION(!mds_inode_is_orphan(dchild->d_inode))
> failed:dchild 29f8cba:16b5b5ab (ffff8105c4b50660) inode ffff8105c71c3d48/44010682/381007275
> Apr 20 06:18:45 dadbdd01 kernel: LustreError:
> 15772:0:(mds_open.c:1097:mds_open()) LBUG

This could be bug 17764, which is still open. If you can reproduce this failure, you could try to assist on that bug, as we have been unable to reproduce it ourselves.

b.
Andreas Dilger
2009-Apr-22 18:37 UTC
[Lustre-discuss] Lustre LBUG Not Healthy on MDT with 1.6.7 & 1.6.5.1
On Apr 22, 2009 16:50 +0530, anil kumar wrote:
> We have issues with LBUG not healthy on MDT, following are the error
> messages logged
>
> installed 1.6.7 and got the following, so we had to revert back to 1.6.5.1,
> even with 1.6.5.1 i see the same issue now,

Note that 1.6.7 is not safe to use on the MDS, and 1.6.7.1 was released in its place.

Cheers, Andreas
--
Andreas Dilger
Sr. Staff Engineer, Lustre Group
Sun Microsystems of Canada, Inc.
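Andreas's advice amounts to a version check across the cluster: any MDS still running exactly 1.6.7 should be moved to 1.6.7.1. As a hypothetical sketch (not a Lustre utility; you would feed it the version string reported by your installed packages), the audit test could be written as:

```python
# Hypothetical audit helper (not part of Lustre): flags servers still
# running the withdrawn 1.6.7 release mentioned above.
def needs_mds_upgrade(version: str) -> bool:
    """Return True only for exactly 1.6.7; 1.6.7.1, 1.6.5.1, etc. pass."""
    parts = tuple(int(p) for p in version.strip().split("."))
    return parts == (1, 6, 7)
```

For example, `needs_mds_upgrade("1.6.7")` is True, while `needs_mds_upgrade("1.6.7.1")` is False.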