When the system resources of the client are not enough, the OSS displays some errors:

Lustre: lenovo-OST0002: haven't heard from client 24bdc118-cf78-9d56-190c-bb9a2836bd41 (at 192.168.1.251@tcp) in 227 seconds. I think it's dead, and I am evicting it.
Lustre: 6613:0:(ldlm_lib.c:525:target_handle_reconnect()) lenovo-OST0000: 24bdc118-cf78-9d56-190c-bb9a2836bd41 reconnecting
Lustre: 6613:0:(ldlm_lib.c:525:target_handle_reconnect()) Skipped 1 previous similar message
LustreError: 5881:0:(ldlm_resource.c:767:ldlm_resource_add()) lvbo_init failed for resource 2207359: rc -2
LustreError: 6969:0:(ldlm_lock.c:430:__ldlm_handle2lock()) ASSERTION(lock->l_resource != NULL) failed
LustreError: 6969:0:(tracefile.c:432:libcfs_assertion_failed()) LBUG
Lustre: 6969:0:(linux-debug.c:167:libcfs_debug_dumpstack()) showing stack for process 6969
ldlm_cn_13 R running task 0 6969 1 6970 6968 (L-TLB)
0000000000000000 ffffffffa031b4c9 0000010005fe1a00 0000000000000000
       00000100bffab240 ffffffffa01ee45e 0000010005eed598 0000000000000001
       0000010082899ea0 0000000000000000
Call Trace:<ffffffffa031b4c9>{:ptlrpc:ptlrpc_server_handle_request+2457}
       <ffffffffa01ee45e>{:libcfs:lcw_update_time+30}
       <ffffffff80133855>{__wake_up_common+67}
       <ffffffffa031dba5>{:ptlrpc:ptlrpc_main+3989}
       <ffffffffa031c110>{:ptlrpc:ptlrpc_retry_rqbds+0}
       <ffffffffa031c110>{:ptlrpc:ptlrpc_retry_rqbds+0}
       <ffffffffa031c110>{:ptlrpc:ptlrpc_retry_rqbds+0}
       <ffffffff80110de3>{child_rip+8}
       <ffffffffa031cc10>{:ptlrpc:ptlrpc_main+0}
       <ffffffff80110ddb>{child_rip+0}
LustreError: dumping log to /tmp/lustre-log.1216640103.6969
Lustre: 6495:0:(ldlm_lib.c:525:target_handle_reconnect()) lenovo-OST0002: 440eafce-9f15-16a6-4764-7f54d92f9204 reconnecting
Lustre: 6495:0:(ldlm_lib.c:525:target_handle_reconnect()) Skipped 2 previous similar messages
Lustre: 6495:0:(ldlm_lib.c:760:target_handle_connect()) lenovo-OST0002: refuse reconnection from 440eafce-9f15-16a6-4764-7f54d92f9204@192.168.1.102@tcp to 0x0000010058fde000; still busy with 2 active RPCs
LustreError: 6495:0:(ldlm_lib.c:1536:target_send_reply_msg()) @@@ processing error (-16) req@0000010137f4b400 x68793117/t0 o8->440eafce-9f15-16a6-4764-7f54d92f9204@NET_0x20000c0a80166_UUID:0/0 lens 304/200 e 0 to 0 dl 1216640303 ref 1 fl Interpret:/0/0 rc -16/0
Lustre: Request x103723701 sent from lenovo-OST0002 to NID 192.168.1.102@tcp 20s ago has timed out (limit 20s).
Lustre: Skipped 6 previous similar messages
LustreError: 138-a: lenovo-OST0002: A client on nid 192.168.1.102@tcp was evicted due to a lock glimpse callback to 192.168.1.102@tcp timed out: rc -110
Lustre: 0:0:(watchdog.c:130:lcw_cb()) Watchdog triggered for pid 6969: it was inactive for 600s
Lustre: 0:0:(linux-debug.c:167:libcfs_debug_dumpstack()) showing stack for process 6969
ldlm_cn_13 D 0000000000000001 0 6969 1 6970 6968 (L-TLB)
000001008289db38 0000000000000046 0000000000000000 ffffffffa0201728
       0000000000000700 000001008289dac8 00000000000001b0 00000000a01e4a58
       0000010083163030 00000000000002d9
Call Trace:<ffffffffa01e9014>{:libcfs:libcfs_debug_dumplog+292}
       <ffffffffa01e4bb6>{:libcfs:lbug_with_loc+182}
       <ffffffffa01ebb44>{:libcfs:libcfs_assertion_failed+84}
       <ffffffffa02d44e8>{:ptlrpc:__ldlm_handle2lock+328}
       <ffffffffa03141f4>{:ptlrpc:lustre_msg_set_timeout+52}
       <ffffffffa03124c7>{:ptlrpc:lustre_msg_get_flags+87}
       <ffffffffa02f182d>{:ptlrpc:ldlm_request_cancel+525}
       <ffffffffa030fd79>{:ptlrpc:lustre_pack_reply+41}
       <ffffffffa0315890>{:ptlrpc:lustre_swab_ldlm_request+0}
       <ffffffffa02f2e34>{:ptlrpc:ldlm_handle_cancel+532}
       <ffffffffa0312dcf>{:ptlrpc:lustre_msg_get_opc+95}
       <ffffffffa030f1af>{:ptlrpc:lustre_msg_get_conn_cnt+95}
       <ffffffffa02f53ba>{:ptlrpc:ldlm_cancel_handler+730}
       <ffffffffa03192f1>{:ptlrpc:ptlrpc_check_req+17}
       <ffffffffa0312baf>{:ptlrpc:lustre_msg_get_handle+79}
       <ffffffffa031b4c9>{:ptlrpc:ptlrpc_server_handle_request+2457}
       <ffffffffa01ee45e>{:libcfs:lcw_update_time+30}
       <ffffffff80133855>{__wake_up_common+67}
       <ffffffffa031dba5>{:ptlrpc:ptlrpc_main+3989}
       <ffffffffa031c110>{:ptlrpc:ptlrpc_retry_rqbds+0}
       <ffffffffa031c110>{:ptlrpc:ptlrpc_retry_rqbds+0}
       <ffffffffa031c110>{:ptlrpc:ptlrpc_retry_rqbds+0}
       <ffffffff80110de3>{child_rip+8}
       <ffffffffa031cc10>{:ptlrpc:ptlrpc_main+0}
       <ffffffff80110ddb>{child_rip+0}
LustreError: dumping log to /tmp/lustre-log.1216640703.6969
LustreError: 6701:0:(ldlm_resource.c:767:ldlm_resource_add()) lvbo_init failed for resource 3973448: rc -2
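
For anyone reading the log above: the negative rc values are negated Linux errno codes. A minimal Python sketch for decoding them follows, assuming the standard Linux errno numbering. The helper name decode_rc and the small lookup table are illustrative only and are not part of Lustre.

```python
import re

# Linux errno numbers that appear in the log above (assumed standard Linux values).
# Kernel code returns them negated, hence "rc -2", "rc -16", "rc -110".
LINUX_ERRNO = {
    2: ("ENOENT", "No such file or directory"),
    16: ("EBUSY", "Device or resource busy"),
    110: ("ETIMEDOUT", "Connection timed out"),
}

# Matches "rc -N" as printed in the Lustre console messages.
RC_PATTERN = re.compile(r"\brc (-\d+)")


def decode_rc(line):
    """Return a list of (rc, errno_name, description) for each 'rc -N' in a line."""
    results = []
    for match in RC_PATTERN.finditer(line):
        rc = int(match.group(1))
        name, text = LINUX_ERRNO.get(-rc, ("UNKNOWN", "not in this table"))
        results.append((rc, name, text))
    return results


sample = [
    "lvbo_init failed for resource 2207359: rc -2",
    "fl Interpret:/0/0 rc -16/0",
    "timed out: rc -110",
]
for line in sample:
    print(decode_rc(line))
```

Read this way, the log shows a missing object (-2), a refused reconnect while RPCs were still active (-16), and a callback that timed out (-110).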
Lustre version is 1.6.5.1

[root@OSS1_MASTER ~]# uname -a
Linux OSS1_MASTER 2.6.9-67.0.7.EL_lustre.1.6.5smp #1 SMP Mon May 12 22:02:50 EDT 2008 x86_64 x86_64 x86_64 GNU/Linux

On Mon, 2008-08-04, at 03:06 PM, Johnlya <john... at gmail.com> wrote:
> When the system resources of the client are not enough, the OSS displays some errors:
> [...]
On Mon, 2008-08-04 at 00:06 -0700, Johnlya wrote:
> When the system resources of the client are not enough, the OSS displays some errors:
>
> Lustre: lenovo-OST0002: haven't heard from client 24bdc118-cf78-9d56-190c-bb9a2836bd41 (at 192.168.1.251@tcp) in 227 seconds. I think it's dead, and I am evicting it.
> Lustre: 6613:0:(ldlm_lib.c:525:target_handle_reconnect()) lenovo-OST0000: 24bdc118-cf78-9d56-190c-bb9a2836bd41 reconnecting
> Lustre: 6613:0:(ldlm_lib.c:525:target_handle_reconnect()) Skipped 1 previous similar message
> LustreError: 5881:0:(ldlm_resource.c:767:ldlm_resource_add()) lvbo_init failed for resource 2207359: rc -2
> LustreError: 6969:0:(ldlm_lock.c:430:__ldlm_handle2lock()) ASSERTION(lock->l_resource != NULL) failed
> LustreError: 6969:0:(tracefile.c:432:libcfs_assertion_failed()) LBUG

Bug 16496, fixed in 1.6.6.

A (hopefully) helpful hint: if you have specific error content, searching bugzilla will usually get you an answer quicker than posting here and waiting for somebody to answer. This is especially true if you post outside of typical North American daytime hours.

b.
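
As a rough illustration of the hint above, here is a small Python sketch that pulls the assertion and LBUG lines out of a console log and reduces them to compact search strings. It is only one possible approach: the function name search_terms is hypothetical, and dropping the PID and source line number is an assumption made because those tend to differ between nodes and releases.

```python
import re

# Matches the "(file.c:LINE:function())" location tag and captures what follows it.
LOCATION = re.compile(r"\(([\w-]+\.c):\d+:(\w+)\(\)\)\s*(.*)")


def search_terms(log_lines):
    """Build 'function message' strings from LustreError lines with ASSERTION or LBUG."""
    terms = []
    for line in log_lines:
        if not line.startswith("LustreError:"):
            continue
        match = LOCATION.search(line)
        if match is None:
            continue
        function, message = match.group(2), match.group(3)
        if "ASSERTION" in message or "LBUG" in message:
            terms.append(f"{function} {message}")
    return terms


sample_log = [
    "Lustre: 6613:0:(ldlm_lib.c:525:target_handle_reconnect()) Skipped 1 previous similar message",
    "LustreError: 5881:0:(ldlm_resource.c:767:ldlm_resource_add()) lvbo_init failed for resource 2207359: rc -2",
    "LustreError: 6969:0:(ldlm_lock.c:430:__ldlm_handle2lock()) ASSERTION(lock->l_resource != NULL) failed",
    "LustreError: 6969:0:(tracefile.c:432:libcfs_assertion_failed()) LBUG",
]
for term in search_terms(sample_log):
    print(term)
```

The resulting strings can be pasted into the bug tracker's search box by hand; the sketch does not contact any service.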