Hi,

I am seeing the following issue with 1.6.5.1 and 1.6.7.1. Please let us know if there is a workaround or fix for it.
I also regularly notice timeouts on different OSTs/OSSes.

May 10 08:17:57 dadbdd01 kernel: LustreError: 30058:0:(mds_open.c:1097:mds_open()) ASSERTION(!mds_inode_is_orphan(dchild->d_inode)) failed:dchild 57d21b1:7baa128 1 (ffff8106bf064df8) inode ffff8105d85f9668/92086705/2074743425
May 10 08:17:57 dadbdd01 kernel: LustreError: 30058:0:(mds_open.c:1097:mds_open()) LBUG
May 10 08:17:57 dadbdd01 kernel: Lustre: 30058:0:(linux-debug.c:167:libcfs_debug_dumpstack()) showing stack for process 30058
May 10 08:17:57 dadbdd01 kernel: ll_mdt_61 R running task 0 30058 1 30210 30057 (L-TLB)
May 10 08:17:57 dadbdd01 kernel: ffff810183e1fe50 0000000000000046 ffff81081fa21800 ffffffff8006b6c9
May 10 08:17:57 dadbdd01 kernel: ffff810314fdd540 ffffffff885e06c1 ffff8105b9099c00 ffff8105b9099ce0
May 10 08:17:57 dadbdd01 kernel: ffff81065bf8a800 ffffffff885de3d6 ffff8105b9099d88 0000000000000000
May 10 08:17:57 dadbdd01 kernel: Call Trace:
May 10 08:17:57 dadbdd01 kernel: [<ffffffff8006b6c9>] do_gettimeofday+0x50/0x92
May 10 08:17:57 dadbdd01 kernel: [<ffffffff885de3d6>] :libcfs:lcw_update_time+0x16/0x100
May 10 08:17:57 dadbdd01 kernel: [<ffffffff800868b0>] __wake_up_common+0x3e/0x68
May 10 08:17:57 dadbdd01 kernel: [<ffffffff88730efc>] :ptlrpc:ptlrpc_main+0xdcc/0xf50
May 10 08:17:57 dadbdd01 kernel: [<ffffffff80088432>] default_wake_function+0x0/0xe
May 10 08:17:57 dadbdd01 kernel: [<ffffffff8005bfb1>] child_rip+0xa/0x11
May 10 08:17:57 dadbdd01 kernel: [<ffffffff88730130>] :ptlrpc:ptlrpc_main+0x0/0xf50
May 10 08:17:57 dadbdd01 kernel: [<ffffffff8005bfa7>] child_rip+0x0/0x11
May 10 08:17:57 dadbdd01 kernel:
May 10 08:17:57 dadbdd01 kernel: LustreError: dumping log to /tmp/lustre-log.1241968677.30058
May 10 08:19:37 dadbdd01 kernel: Lustre: 29978:0:(ldlm_lib.c:525:target_handle_reconnect()) farmres-MDT0000: f34601eb-c4bd-2d2f-ae73-0dde300eb530 reconnecting
May 10 08:19:37 dadbdd01 kernel: Lustre: 29978:0:(ldlm_lib.c:760:target_handle_connect()) farmres-MDT0000: refuse reconnection from f34601eb-c4bd-2d2f-ae73-0dde300eb530 at 10.229.168.37@tcp to 0xffff8105b4004000; still busy with 2 active RPCs
May 10 08:19:37 dadbdd01 kernel: LustreError: 29978:0:(ldlm_lib.c:1536:target_send_reply_msg()) @@@ processing error (-16) req@ffff8104966c4600 x236853107/t0 o38->f34601eb-c4bd-2d2f-ae73-0dde300eb530@NET_0x200000ae5a825_UUID:0/0 lens 304/200 e 0 to 0 dl 1241968877 ref 1 fl Interpret:/0/0 rc -16/0
May 10 08:19:37 dadbdd01 kernel: LustreError: 29978:0:(ldlm_lib.c:1536:target_send_reply_msg()) Skipped 1 previous similar message
May 10 08:20:02 dadbdd01 kernel: Lustre: 30213:0:(ldlm_lib.c:525:target_handle_reconnect()) farmres-MDT0000: f34601eb-c4bd-2d2f-ae73-0dde300eb530 reconnecting
May 10 08:20:02 dadbdd01 kernel: Lustre: 30213:0:(ldlm_lib.c:760:target_handle_connect()) farmres-MDT0000: refuse reconnection from f34601eb-c4bd-2d2f-ae73-0dde300eb530 at 10.229.168.37@tcp to 0xffff8105b4004000; still busy with 2 active RPCs
May 10 08:20:02 dadbdd01 kernel: LustreError: 30213:0:(ldlm_lib.c:1536:target_send_reply_msg()) @@@ processing error (-16) req@ffff8105fd78de00 x236853112/t0 o38->f34601eb-c4bd-2d2f-ae73-0dde300eb530@NET_0x200000ae5a825_UUID:0/0 lens 304/200 e 0 to 0 dl 1241968902 ref 1 fl Interpret:/0/0 rc -16/0
May 10 08:20:27 dadbdd01 kernel: Lustre: 15599:0:(ldlm_lib.c:525:target_handle_reconnect()) farmres-MDT0000: f34601eb-c4bd-2d2f-ae73-0dde300eb530 reconnecting
May 10 08:20:27 dadbdd01 kernel: Lustre: 15599:0:(ldlm_lib.c:760:target_handle_connect()) farmres-MDT0000: refuse reconnection from f34601eb-c4bd-2d2f-ae73-0dde300eb530 at 10.229.168.37@tcp to 0xffff8105b4004000; still busy with 2 active RPCs
May 10 08:20:27 dadbdd01 kernel: LustreError: 15599:0:(ldlm_lib.c:1536:target_send_reply_msg()) @@@ processing error (-16) req@ffff8101a4eeba00 x236853130/t0 o38->f34601eb-c4bd-2d2f-ae73-0dde300eb530@NET_0x200000ae5a825_UUID:0/0 lens 304/200 e 0 to 0 dl 1241968927 ref 1 fl Interpret:/0/0 rc -16/0
May 10 08:20:52 dadbdd01 kernel: Lustre: 29999:0:(ldlm_lib.c:525:target_handle_reconnect()) farmres-MDT0000: f34601eb-c4bd-2d2f-ae73-0dde300eb530 reconnecting
May 10 08:20:52 dadbdd01 kernel: Lustre: 29999:0:(ldlm_lib.c:760:target_handle_connect()) farmres-MDT0000: refuse reconnection from f34601eb-c4bd-2d2f-ae73-0dde300eb530 at 10.229.168.37@tcp to 0xffff8105b4004000; still busy with 2 active RPCs
May 10 08:20:52 dadbdd01 kernel: LustreError: 29999:0:(ldlm_lib.c:1536:target_send_reply_msg()) @@@ processing error (-16) req@ffff81026a206000 x236853142/t0 o38->f34601eb-c4bd-2d2f-ae73-0dde300eb530@NET_0x200000ae5a825_UUID:0/0 lens 304/200 e 0 to 0 dl 1241968952 ref 1 fl Interpret:/0/0 rc -16/0
May 10 08:21:03 dadbdd01 kernel: LustreError: 0:0:(ldlm_lockd.c:234:waiting_locks_callback()) ### lock callback timer expired after 1244s: evicting client at 10.229.168.37@tcp ns: mds-farmres-MDT0000_UUID lock: ffff81082085e200/0xa87c36a1c15f5013 lrc: 1/0,0 mode: CR/CR res: 93848281/2073576669 bits 0x3 rrc: 7 type: IBT flags: 4000020 remote: 0xff97ebb5aa42e7c4 expref: 69961 pid 15621
May 10 08:21:17 dadbdd01 kernel: Lustre: 0:0:(watchdog.c:130:lcw_cb()) Watchdog triggered for pid 30058: it was inactive for 200s
May 10 08:21:17 dadbdd01 kernel: Lustre: 0:0:(linux-debug.c:167:libcfs_debug_dumpstack()) showing stack for process 30058
May 10 08:21:17 dadbdd01 kernel: ll_mdt_61 D ffff8104fcbb0528 0 30058 1 30210 30057 (L-TLB)
May 10 08:21:17 dadbdd01 kernel: ffff810183e1f700 0000000000000046 ffffffff885e0026 ffffffff885e07f0
May 10 08:21:17 dadbdd01 kernel: 000000000000000a ffff81082b91c7a0 ffff81082fe9e100 0004b26ed485e4e0
May 10 08:21:17 dadbdd01 kernel: 000000000000293f ffff81082b91c988 ffff810100000005 ffffffff8003b127
May 10 08:21:17 dadbdd01 kernel: Call Trace:
May 10 08:21:17 dadbdd01 kernel: [<ffffffff8003b127>] remove_wait_queue+0x1c/0x2c
May 10 08:21:17 dadbdd01 kernel: [<ffffffff80088432>] default_wake_function+0x0/0xe
May 10 08:21:17 dadbdd01 kernel: [<ffffffff885d5c6b>] :libcfs:lbug_with_loc+0xbb/0xc0
May 10 08:21:17 dadbdd01 kernel: [<ffffffff889a28d7>] :mds:mds_open+0x1f57/0x322b
May 10 08:21:17 dadbdd01 kernel: [<ffffffff800868b0>] __wake_up_common+0x3e/0x68
May 10 08:21:17 dadbdd01 kernel: [<ffffffff887a32a1>] :ksocklnd:ksocknal_queue_tx_locked+0x4f1/0x550
May 10 08:21:17 dadbdd01 kernel: [<ffffffff886e319e>] :ptlrpc:lock_res_and_lock+0xbe/0xe0
May 10 08:21:17 dadbdd01 kernel: [<ffffffff8897fbf9>] :mds:mds_reint_rec+0x1d9/0x2b0
May 10 08:21:17 dadbdd01 kernel: [<ffffffff889a5b43>] :mds:mds_open_unpack+0x2f3/0x410
May 10 08:21:17 dadbdd01 kernel: [<ffffffff889a422f>] :mds:mds_update_unpack+0x20f/0x2b0
May 10 08:21:17 dadbdd01 kernel: [<ffffffff8897265a>] :mds:mds_reint+0x35a/0x420
May 10 08:21:17 dadbdd01 kernel: [<ffffffff88971432>] :mds:fixup_handle_for_resent_req+0x52/0x200
May 10 08:21:17 dadbdd01 kernel: [<ffffffff88976303>] :mds:mds_intent_policy+0x453/0xc10
May 10 08:21:17 dadbdd01 kernel: [<ffffffff886eb386>] :ptlrpc:ldlm_resource_putref+0x1b6/0x3a0
May 10 08:21:17 dadbdd01 kernel: [<ffffffff886e8be6>] :ptlrpc:ldlm_lock_enqueue+0x186/0x990
May 10 08:21:17 dadbdd01 kernel: [<ffffffff886e5dad>] :ptlrpc:ldlm_lock_create+0x9ad/0x9e0
May 10 08:21:17 dadbdd01 kernel: [<ffffffff88709e20>] :ptlrpc:ldlm_server_completion_ast+0x0/0x5b0
May 10 08:21:17 dadbdd01 kernel: [<ffffffff88707745>] :ptlrpc:ldlm_handle_enqueue+0xc95/0x1280
May 10 08:21:17 dadbdd01 kernel: [<ffffffff8870a3d0>] :ptlrpc:ldlm_server_blocking_ast+0x0/0x6b0
May 10 08:21:17 dadbdd01 kernel: [<ffffffff8897ab07>] :mds:mds_handle+0x4047/0x4d10
May 10 08:21:17 dadbdd01 kernel: [<ffffffff8014090e>] __next_cpu+0x19/0x28
May 10 08:21:17 dadbdd01 kernel: [<ffffffff80073331>] smp_send_reschedule+0x4e/0x53
May 10 08:21:17 dadbdd01 kernel: [<ffffffff88682d31>] :obdclass:class_handle2object+0xd1/0x160
May 10 08:21:17 dadbdd01 kernel: [<ffffffff88721b7f>] :ptlrpc:lustre_msg_get_conn_cnt+0x4f/0x100
May 10 08:21:17 dadbdd01 kernel: [<ffffffff8872be9a>] :ptlrpc:ptlrpc_check_req+0x1a/0x110
May 10 08:21:17 dadbdd01 kernel: [<ffffffff8872dfe2>] :ptlrpc:ptlrpc_server_handle_request+0x992/0x1030
May 10 08:21:17 dadbdd01 kernel: [<ffffffff8006b6c9>] do_gettimeofday+0x50/0x92
May 10 08:21:17 dadbdd01 kernel: [<ffffffff885de3d6>] :libcfs:lcw_update_time+0x16/0x100
May 10 08:21:17 dadbdd01 kernel: [<ffffffff800868b0>] __wake_up_common+0x3e/0x68
May 10 08:21:17 dadbdd01 kernel: [<ffffffff88730efc>] :ptlrpc:ptlrpc_main+0xdcc/0xf50
May 10 08:21:17 dadbdd01 kernel: [<ffffffff80088432>] default_wake_function+0x0/0xe
May 10 08:21:17 dadbdd01 kernel: [<ffffffff8005bfb1>] child_rip+0xa/0x11
May 10 08:21:17 dadbdd01 kernel: [<ffffffff88730130>] :ptlrpc:ptlrpc_main+0x0/0xf50
May 10 08:21:17 dadbdd01 kernel: [<ffffffff8005bfa7>] child_rip+0x0/0x11
May 10 08:21:17 dadbdd01 kernel:
May 10 08:21:17 dadbdd01 kernel: LustreError: dumping log to /tmp/lustre-log.1241968877.30058

Thanks,
Anil
On Mon, 2009-05-11 at 15:19 +0530, anil kumar wrote:
> Hi,
>
> I have following issues with 1.6.5.1 & 1.6.7.1, please let us know if
> there are any work around or fix for this.
> I notice time out on OST regularly on different OST/OSS
>
>
> May 10 08:17:57 dadbdd01 kernel: LustreError:
> 30058:0:(mds_open.c:1097:mds_open()) ASSERTION(!
> mds_inode_is_orphan(dchild->d_inode)) failed:dchild 57d21b1:7baa128
> 1 (ffff8106bf064df8) inode ffff8105d85f9668/92086705/2074743425

Have you tried searching BZ for this one? It looks familiar and I'd bet you find a bug about it already filed.

b.