uname -a
Linux OSS2_MASTER 2.6.9-67.0.7.EL_lustre.1.6.5smp #1 SMP Mon May 12 22:02:50 EDT 2008 x86_64 x86_64 x86_64 GNU/Linux

Please explain what caused the messages below. Thank you!

Oct 6 20:52:08 OSS2_MASTER kernel: LustreError: 7090:0:(ldlm_lock.c:430:__ldlm_handle2lock()) ASSERTION(lock->l_resource != NULL) failed
Oct 6 20:52:08 OSS2_MASTER kernel: LustreError: 7090:0:(tracefile.c:432:libcfs_assertion_failed()) LBUG
Oct 6 20:52:08 OSS2_MASTER kernel: Lustre: 7090:0:(linux-debug.c:167:libcfs_debug_dumpstack()) showing stack for process 7090
Oct 6 20:52:08 OSS2_MASTER kernel: ldlm_cn_06 R running task 0 7090 1 7091 7089 (L-TLB)
Oct 6 20:52:08 OSS2_MASTER kernel: 0000000000000000 ffffffffa03204c9 000001013b822200 0000000000000000
Oct 6 20:52:08 OSS2_MASTER kernel: 000001013b514100 ffffffffa01f345e 000001013b5efd98 0000000000000001
Oct 6 20:52:08 OSS2_MASTER kernel: 0000010043575ea0 0000000000000000
Oct 6 20:52:08 OSS2_MASTER kernel: Call Trace:<ffffffffa03204c9>{:ptlrpc:ptlrpc_server_handle_request+2457}
Oct 6 20:52:08 OSS2_MASTER kernel: <ffffffffa01f345e>{:libcfs:lcw_update_time+30} <ffffffff80133855>{__wake_up_common+67}
Oct 6 20:52:08 OSS2_MASTER kernel: <ffffffffa0322ba5>{:ptlrpc:ptlrpc_main+3989} <ffffffffa0321110>{:ptlrpc:ptlrpc_retry_rqbds+0}
Oct 6 20:52:08 OSS2_MASTER kernel: <ffffffffa0321110>{:ptlrpc:ptlrpc_retry_rqbds+0} <ffffffffa0321110>{:ptlrpc:ptlrpc_retry_rqbds+0}
Oct 6 20:52:08 OSS2_MASTER kernel: <ffffffff80110de3>{child_rip+8} <ffffffffa0321c10>{:ptlrpc:ptlrpc_main+0}
Oct 6 20:52:08 OSS2_MASTER kernel: <ffffffff80110ddb>{child_rip+0}
Oct 6 20:52:08 OSS2_MASTER kernel: LustreError: dumping log to /tmp/lustre-log.1223297528.7090
Oct 6 20:53:48 OSS2_MASTER kernel: Lustre: 7181:0:(ldlm_lib.c:525:target_handle_reconnect()) lenovo-OST0007: c3eabd86-7217-9168-fc97-ea08b634e1ad reconnecting
Oct 6 20:53:48 OSS2_MASTER kernel: Lustre: 7181:0:(ldlm_lib.c:760:target_handle_connect()) lenovo-OST0007: refuse reconnection from c3eabd86-7217-9168-fc97-ea08b634e1ad@192.168.1.101@tcp to 0x0000010127a9e000; still busy with 2 active RPCs
Oct 6 20:53:48 OSS2_MASTER kernel: LustreError: 7181:0:(ldlm_lib.c:1536:target_send_reply_msg()) @@@ processing error (-16) req@000001013a7ee400 x25502749/t0 o8->c3eabd86-7217-9168-fc97-ea08b634e1ad@NET_0x20000c0a80165_UUID:0/0 lens 304/200 e 0 to 0 dl 1223297728 ref 1 fl Interpret:/0/0 rc -16/0
Oct 6 20:53:48 OSS2_MASTER kernel: LustreError: 7181:0:(ldlm_lib.c:1536:target_send_reply_msg()) Skipped 1 previous similar message
Oct 6 20:54:35 OSS2_MASTER kernel: Lustre: Request x31540325 sent from lenovo-OST0007 to NID 192.168.1.101@tcp 20s ago has timed out (limit 20s).
Oct 6 20:53:48 OSS2_MASTER kernel: LustreError: 7181:0:(ldlm_lib.c:1536:target_send_reply_msg()) Skipped 1 previous similar message
Oct 6 20:54:35 OSS2_MASTER kernel: Lustre: Request x31540325 sent from lenovo-OST0007 to NID 192.168.1.101@tcp 20s ago has timed out (limit 20s).
Oct 6 20:54:35 OSS2_MASTER kernel: LustreError: 138-a: lenovo-OST0007: A client on nid 192.168.1.101@tcp was evicted due to a lock glimpse callback to 192.168.1.101@tcp timed out: rc -110
Oct 6 20:54:40 OSS2_MASTER kernel: Lustre: Request x31540908 sent from lenovo-OST0007 to NID 192.168.1.101@tcp 20s ago has timed out (limit 20s).
Oct 6 20:54:53 OSS2_MASTER kernel: Lustre: Request x31542339 sent from lenovo-OST0007 to NID 192.168.1.101@tcp 20s ago has timed out (limit 20s).
Oct 6 20:54:55 OSS2_MASTER kernel: Lustre: Request x31542458 sent from lenovo-OST0007 to NID 192.168.1.101@tcp 20s ago has timed out (limit 20s).
Oct 6 21:02:08 OSS2_MASTER kernel: Lustre: 0:0:(watchdog.c:130:lcw_cb()) Watchdog triggered for pid 7090: it was inactive for 600s
Oct 6 21:02:08 OSS2_MASTER kernel: Lustre: 0:0:(linux-debug.c:167:libcfs_debug_dumpstack()) showing stack for process 7090
Oct 6 21:02:08 OSS2_MASTER kernel: ldlm_cn_06 D 0000000000000001 0 7090 1 7091 7089 (L-TLB)
Oct 6 21:02:08 OSS2_MASTER kernel: 0000010120021b38 0000000000000046 0000000000000000 ffffffffa0206728
Oct 6 21:02:08 OSS2_MASTER kernel: 0000000000000700 0000010120021ac8 00000000000001b0 00000005a01e9a58
Oct 6 21:02:08 OSS2_MASTER kernel: 000001013341e030 000000000000033a
Oct 6 21:02:08 OSS2_MASTER kernel: Call Trace:<ffffffffa01ee014>{:libcfs:libcfs_debug_dumplog+292}
Oct 6 21:02:08 OSS2_MASTER kernel: <ffffffffa01e9bb6>{:libcfs:lbug_with_loc+182} <ffffffffa01f0b44>{:libcfs:libcfs_assertion_failed+84}
Oct 6 21:02:08 OSS2_MASTER kernel: <ffffffffa02d94e8>{:ptlrpc:__ldlm_handle2lock+328}
Oct 6 21:02:08 OSS2_MASTER kernel: <ffffffffa03191f4>{:ptlrpc:lustre_msg_set_timeout+52}
Oct 6 21:02:08 OSS2_MASTER kernel: <ffffffffa03174c7>{:ptlrpc:lustre_msg_get_flags+87}
Oct 6 21:02:08 OSS2_MASTER kernel: <ffffffffa02f682d>{:ptlrpc:ldlm_request_cancel+525}
Oct 6 21:02:08 OSS2_MASTER kernel: <ffffffffa0314d79>{:ptlrpc:lustre_pack_reply+41} <ffffffffa031a890>{:ptlrpc:lustre_swab_ldlm_request+0}
Oct 6 21:02:08 OSS2_MASTER kernel: <ffffffffa02f7e34>{:ptlrpc:ldlm_handle_cancel+532}
Oct 6 21:02:08 OSS2_MASTER kernel: <ffffffffa0317dcf>{:ptlrpc:lustre_msg_get_opc+95} <ffffffffa03141af>{:ptlrpc:lustre_msg_get_conn_cnt+95}
Oct 6 21:02:08 OSS2_MASTER kernel: <ffffffffa02fa3ba>{:ptlrpc:ldlm_cancel_handler+730}
Oct 6 21:02:08 OSS2_MASTER kernel: <ffffffffa031e2f1>{:ptlrpc:ptlrpc_check_req+17} <ffffffffa0317baf>{:ptlrpc:lustre_msg_get_handle+79}
Oct 6 21:02:08 OSS2_MASTER kernel: <ffffffffa03204c9>{:ptlrpc:ptlrpc_server_handle_request+2457}
Oct 6 21:02:08 OSS2_MASTER kernel: <ffffffffa01f345e>{:libcfs:lcw_update_time+30} <ffffffff80133855>{__wake_up_common+67}
Oct 6 21:02:08 OSS2_MASTER kernel: <ffffffffa0322ba5>{:ptlrpc:ptlrpc_main+3989} <ffffffffa0321110>{:ptlrpc:ptlrpc_retry_rqbds+0}
Oct 6 21:02:08 OSS2_MASTER kernel: <ffffffffa0321110>{:ptlrpc:ptlrpc_retry_rqbds+0} <ffffffffa0321110>{:ptlrpc:ptlrpc_retry_rqbds+0}
Oct 6 21:02:08 OSS2_MASTER kernel: <ffffffffa0321110>{:ptlrpc:ptlrpc_retry_rqbds+0} <ffffffffa0321110>{:ptlrpc:ptlrpc_retry_rqbds+0}
Oct 6 21:02:08 OSS2_MASTER kernel: <ffffffff80110de3>{child_rip+8} <ffffffffa0321c10>{:ptlrpc:ptlrpc_main+0}
Oct 6 21:02:08 OSS2_MASTER kernel: <ffffffff80110ddb>{child_rip+0}
Oct 6 21:02:08 OSS2_MASTER kernel: LustreError: dumping log to /tmp/lustre-log.1223298128.7090

On Wed, Oct 08, 2008 at 12:38:05AM -0700, Johnlya wrote:
> Oct 6 20:52:08 OSS2_MASTER kernel: LustreError: 7090:0:(ldlm_lock.c:430:__ldlm_handle2lock()) ASSERTION(lock->l_resource != NULL) failed

This problem will be fixed in 1.6.6. Meanwhile, you can apply the patch attached to bug 16496.

https://bugzilla.lustre.org/show_bug.cgi?id=16496

Johann