Hi,

Recently 3 OSTs started showing up as IN when lctl dl is run. I cannot get them to activate and indicate UP; no data is being written to them, but we can read from them.

I've tried lctl conf_param as well as the lctl --device 9 activate method. How else can I activate these? Hopefully it's straightforward, as all other OSTs are at 99%!

Thanks,
Dan
Christopher J. Morrone
2010-Apr-09 20:54 UTC
[Lustre-discuss] Unable to activate inactive OSTs
Dan wrote:
> Hi,
>
> Recently 3 OSTs started showing up as IN when lctl dl is run. I cannot get them to activate and indicate UP; no data is being written to them, but we can read from them.
>
> I've tried lctl conf_param as well as the lctl --device 9 activate method. How else can I activate these? Hopefully it's straightforward, as all other OSTs are at 99%!
>
> Thanks,
>
> Dan

What errors do you see on the console of the MDS? Did you change anything recently, like upgrade from 1.6 to 1.8?
Hi,

I haven't made any changes in some time. Current config is RHEL 4 with Lustre 1.6.7.2.

I echoed "1" to /proc/fs/lustre/osc/feline-OST0013-osc/active and it sticks, but the lctl dl results still show IN (lfs setstripe still will not assign a file or directory to any of the three OST indexes in question). Also, lctl conf_param exits with "invalid argument" when run against one of the 3 OSTs in question; it works against all others.

I'll reply with a more complete list of errors and their details. Here is what I have now:

no handle for file close ino 7208962 (repeated many times)
ksocknal_recv_hello()) unknown protocol version 2.x expected
processing error (-116)
setparam error -22

Dan
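The negative numbers in the messages above (-116, -22), and in the kernel logs later in this thread, appear to be negated Linux errno values, which is how kernel code usually reports failures. A minimal Python sketch for decoding them, assuming standard Linux errno numbering. The table is hard-coded so the result does not depend on the machine running it, and `decode_rc` is a hypothetical helper name, not part of Lustre:

```python
# Linux errno numbers for the return codes that appear in this thread.
# Assumption: the codes follow standard Linux numbering (asm-generic).
LINUX_ERRNO = {
    2: ("ENOENT", "No such file or directory"),
    3: ("ESRCH", "No such process"),
    11: ("EAGAIN", "Resource temporarily unavailable"),
    16: ("EBUSY", "Device or resource busy"),
    19: ("ENODEV", "No such device"),
    22: ("EINVAL", "Invalid argument"),
    110: ("ETIMEDOUT", "Connection timed out"),
    116: ("ESTALE", "Stale file handle"),
}


def decode_rc(rc: int) -> str:
    """Return a readable label for a kernel-style (negative) return code."""
    name, text = LINUX_ERRNO.get(abs(rc), ("UNKNOWN", "not in table"))
    return f"{rc}: {name} ({text})"


if __name__ == "__main__":
    # The two codes quoted in the message above.
    for rc in (-116, -22):
        print(decode_rc(rc))
```

Read this way, the "invalid argument" reported by lctl conf_param and the "setparam error -22" line are the same EINVAL error seen from two places.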
Chris,

I've not upgraded or changed configuration. Running RHEL 4 w/ Lustre 1.6.7.2. An OSS crashed and some OSTs show a failure to recover on the MDT, but the OSS looks fine. Interesting?

There are countless pages of errors. Here is a good sample of what I'm seeing:

Apr 11 04:04:19 gto kernel: LustreError: 4228:0:(mds_open.c:1567:mds_close()) Skipped 5 previous similar messages
Apr 11 04:04:19 gto kernel: LustreError: 4228:0:(ldlm_lib.c:1643:target_send_reply_msg()) @@@ processing error (-116) req@0000010120cd8400 x115633406/t0 o35->dd3dbaa4-fd91-7e4c-a254-6ccc5b050949@NET_0x2000080ae02c6_UUID:0/0 lens 296/1456 e 0 to 0 dl 1270983959 ref 1 fl Interpret/2/0 rc -116/0
Apr 11 04:04:19 gto kernel: LustreError: 4228:0:(ldlm_lib.c:1643:target_send_reply_msg()) Skipped 5 previous similar messages
Apr 11 04:05:59 gto kernel: Lustre: 5309:0:(ldlm_lib.c:541:target_handle_reconnect()) feline-MDT0000: dd3dbaa4-fd91-7e4c-a254-6ccc5b050949 reconnecting
Apr 11 04:05:59 gto kernel: Lustre: 5309:0:(ldlm_lib.c:541:target_handle_reconnect()) Skipped 5 previous similar messages
Apr 11 04:14:19 gto kernel: LustreError: 5911:0:(mds_open.c:1567:mds_close()) @@@ no handle for file close ino 7208962: cookie 0xb9c67340d2497975 req@000001005bb03000 x115633406/t0 o35->dd3dbaa4-fd91-7e4c-a254-6ccc5b050949@NET_0x2000080ae02c6_UUID:0/0 lens 296/1456 e 0 to 0 dl 1270984559 ref 1 fl Interpret/2/0 rc 0/0
Apr 4 15:41:15 gto kernel: LustreError: 32555:0:(lov_request.c:692:lov_update_create_set()) error creating fid 0x7801ce sub-object on OST idx 16/1: rc = -110
Apr 5 11:10:14 gto kernel: LustreError: 32581:0:(llog_obd.c:226:llog_add()) Skipped 2 previous similar messages
Apr 5 11:10:14 gto kernel: LustreError: 32581:0:(lov_log.c:118:lov_llog_origin_add()) Can't add llog (rc = -19) for stripe 0
Apr 5 11:10:14 gto kernel: LustreError: 32581:0:(lov_log.c:118:lov_llog_origin_add()) Skipped 2 previous similar messages
Apr 5 11:10:15 gto kernel: LustreError: 32566:0:(llog_obd.c:226:llog_add()) No ctxt
Apr 5 11:10:15 gto kernel: LustreError: 32566:0:(llog_obd.c:226:llog_add()) Skipped 71 previous similar messages
Apr 5 11:10:15 gto kernel: LustreError: 32566:0:(lov_log.c:118:lov_llog_origin_add()) Can't add llog (rc = -19) for stripe 0
Apr 5 11:10:15 gto kernel: LustreError: 32566:0:(lov_log.c:118:lov_llog_origin_add()) Skipped 71 previous similar messages
Apr 5 11:10:16 gto kernel: LustreError: 32561 No ctxt
Apr 6 15:14:16 gto kernel: LustreError: 32557:0:(ldlm_lib.c:1643:target_send_reply_msg()) @@@ processing error (-16) req@00000101d3976000 x16551/t0 o38->6271429a-1e25-5630-4b4c-42a685104c79@NET_0x2000080ae0297_UUID:0/0 lens 304/200 e 0 to 0 dl 1270592156 ref 1 fl Interpret:/0/0 rc -16/0
Apr 6 15:14:16 gto kernel: Lustre: 32552:0:(service.c:1317:ptlrpc_server_handle_request()) @@@ Request x16479 took longer than estimated (100+50s); client may timeout. req@000001002958d000 x16479/t744881971 o101->6271429a-1e25-5630-4b4c-42a685104c79@NET_0x2000080ae0297_UUID:0/0 lens 512/472 e 0 to 0 dl 1270592006 ref 1 fl Complete:/0/0 rc 301/301
Apr 6 15:50:19 gto kernel: LustreError: 11-0: an error occurred while communicating with 128.174.2.107@tcp. The ost_connect operation failed with -19
Apr 6 15:55:44 gto kernel: LustreError: 32553:0:(lov_request.c:692:lov_update_create_set()) error creating fid 0xd7039b sub-object on OST idx 8/1: rc = -110
Apr 6 15:59:52 gto kernel: LustreError: 32569:0:(ldlm_lib.c:1643:target_send_reply_msg()) @@@ processing error (-16) req@0000010007229800 x18429/t0 o38->6271429a-1e25-5630-4b4c-42a685104c79@NET_0x2000080ae0297_UUID:0/0 lens 304/200 e 0 to 0 dl 1270594892 ref 1 fl Interpret:/0/0 rc -16/0
Apr 6 15:59:52 gto kernel: LustreError: 32569:0:(ldlm_lib.c:1643:target_send_reply_msg()) @@@ processing error (-16) req@0000010007229800 x18429/t0 o38->6271429a-1e25-5630-4b4c-42a685104c79@NET_0x2000080ae0297_UUID:0/0 lens 304/200 e 0 to 0 dl 1270594892 ref 1 fl Interpret:/0/0 rc -16/0
Apr 6 16:56:57 gto kernel: LustreError: 1437:0:(events.c:66:request_out_callback()) @@@ type 4, status req@00000100a4bc6000 x10737832/t0 o8->feline-OST0005_UUID@128.174.2.192@tcp:28/4 lens 304/456 e 0 to 1 dl 1270598222 ref 2 fl Rpc:N/0/0 rc 0/0
Apr 6 16:56:57 gto kernel: LustreError: 1437:0:(events.c:66:request_out_callback()) @@@ type 4, status req@00000100a4bc6000 x10737832/t0 o8->feline-OST0005_UUID@128.174.2.192@tcp:28/4 lens 304/456 e 0 to 1 dl 1270598222 ref 2 fl Rpc:N/0/0 rc 0/0
Apr 6 17:40:09 gto kernel: LustreError: 5203 lov_llog_init err
Apr 6 17:40:09 gto kernel: LustreError: 5203:0:(llog_obd.c:439:llog_cat_initialize()) rc: -2
Apr 6 17:40:09 gto kernel: Lustre: 5301:0:(mds_open.c:841:mds_open_by_fid()) Orphan d286f8:75f5909f found and opened in PENDING directory
Apr 6 17:40:13 gto kernel: Lustre: feline-MDT0000: sending delayed replies to recovered clients
Apr 6 17:40:13 gto kernel: Lustre: 5315:0:(mds_unlink_open.c:266:mds_cleanup_pending()) feline-MDT0000: orphan d286f8:75f5909f re-opened during recovery
Apr 6 17:40:13 gto kernel: Lustre: 5315:0:(quota_master.c:1678:mds_quota_recovery()) Not all osts are active, abort quota recovery
Apr 6 17:40:13 gto kernel: Lustre: feline-MDT0000: recovery complete: rc 0
Apr 6 17:40:13 gto kernel: LustreError: 2:llog_lvfs_create()) error looking up logfile 0x625001a:0x76682f22: rc -2
Apr 6 17:40:13 gto kernel: LustreError: 5480:0:(llog_lvfs.c:612:llog_lvfs_create()) Skipped 1 previous similar message
Apr 6 17:40:13 gto kernel: LustreError: 5480:0:(llog_cat.c:172:llog_cat_id2handle()) error opening log id 0x625001a:76682f22: rc -2
Apr 6 17:40:13 gto kernel: LustreError: 5480:0:(llog_cat.c:172:llog_cat_id2handle()) Skipped 1 previous similar message
Apr 6 17:40:13 gto kernel: LustreError: 5480:0:(llog_obd.c:279:cat_cancel_cb()) Cannot find handle for log 0x625001a
Apr 6 17:40:13 gto kernel: LustreError: 5480:0:(llog_obd.c:279:cat_cancel_cb()) Skipped 1 previous similar message
Apr 6 17:40:13 gto kernel: LustreError: 5479:0:(llog_obd.c:350:llog_obd_origin_setup()) with cat_cancel_cb failed: -2
Apr 6 17:40:13 gto kernel: LustreError: 5479:0:(llog_obd.c:350:llog_obd_origin_setup()) Skipped 1 previous similar message
Apr 6 17:40:13 gto kernel: LustreError: 5479:0:(lov_log.c:243:lov_llog_init()) Skipped 1 previous similar message
Apr 6 17:40:13 gto kernel: LustreError: 5479:0:(mds_log.c:219:mds_llog_init()) lov_llog_init err -2
Apr 6 17:40:13 gto kernel: LustreError: 5479:0:(mds_log.c:219:mds_llog_init()) Skipped 1 previous similar message
Apr 6 17:40:13 gto kernel: LustreError: 5479:0:(llog_obd.c:439:llog_cat_initialize()) rc: -2
Apr 6 17:40:13 gto kernel: LustreError: 5479:0:(llog_obd.c:439:llog_cat_initialize()) Skipped 1 previous similar message
Apr 6 17:40:13 gto kernel: LustreError: 5479:0:(mds_lov.c:918:__mds_lov_synchronize()) feline-OST0013_UUID failed at update_mds: -2
Apr 6 17:40:13 gto kernel: LustreError: 5479:0:(mds_lov.c:960:__mds_lov_synchronize()) feline-OST0013_UUID sync failed -2, deactivating
Apr 6 17:40:13 gto kernel: LustreError: 5460:0:(mds_lov.c:552:mds_lov_update_mds()) Failed to get objid --3
Apr 6 17:40:13 gto kernel: LustreError: 5460:0:(mds_lov.c:918:__mds_lov_synchronize()) feline-OST0000_UUID failed at update_mds: -3
Apr 6 17:40:13 gto kernel: LustreError: 5460:0:(mds_lov.c:960:__mds_lov_synchronize()) feline-OST0000_UUID sync failed -3, deactivating
Apr 6 17:40:13 gto kernel: Lustre: MDS feline-MDT0000: feline-OST0010_UUID now active, resetting orphans
Apr 6 17:40:13 gto kernel: Lustre: MDS feline-MDT0000: feline-OST000c_UUID now active, resetting orphans
Apr 6 17:40:13 gto kernel: LustreError: 5461:0:(mds_lov.c:552:mds_lov_update_mds()) Failed to get objid --3
Apr 6 17:40:13 gto kernel: LustreError: 5464:0:(osc_create.c:362:osc_create()) feline-OST0004-osc: oscc recovery failed: -11
Apr 6 17:40:13 gto kernel: LustreError: 5464:0:(lov_obd.c:1048:lov_clear_orphans()) error in orphan recovery on OST idx 4/20: rc = -11
Apr 6 17:40:13 gto kernel: LustreError: 5464:0:(mds_lov.c:951:__mds_lov_synchronize()) feline-OST0004_UUID failed at mds_lov_clear_orphans: -11
Apr 6 17:40:13 gto kernel: LustreError: 5465:0:(osc_create.c:362:osc_create()) feline-OST0005-osc: oscc recovery failed: -11
Apr 6 17:40:13 gto kernel: LustreError: 5465:0:(lov_obd.c:1048:lov_clear_orphans()) error in orphan recovery on OST idx 5/20: rc = -11
Apr 6 17:40:13 gto kernel: LustreError: 5465:0:(mds_lov.c:951:__mds_lov_synchronize()) feline-OST0005_UUID failed at mds_lov_clear_orphans: -11
Apr 6 17:40:13 gto kernel: LustreError: 5466:0:(osc_create.c:362:osc_create()) feline-OST0006-osc: oscc recovery failed: -11
Apr 6 17:40:13 gto kernel: LustreError: 5467:0:(osc_create.c:362:osc_create()) feline-OST0007-osc: oscc recovery failed: -11
Apr 6 17:40:13 gto kernel: LustreError: 5468:0:(osc_create.c:362:osc_create()) feline-OST0008-osc: oscc recovery failed: -11
Apr 6 17:40:13 gto kernel: LustreError: 5469:0:(osc_create.c:362:osc_create()) feline-OST0009-osc: oscc recovery failed: -11
Apr 6 17:47:18 gto kernel: LustreError: 5865:0:(llog_lvfs.c:612:llog_lvfs_create()) error looking up logfile 0x6250020:0x76682f2b: rc -2
Apr 6 17:47:18 gto kernel: LustreError: 5865:0:(llog_lvfs.c:612:llog_lvfs_create()) Skipped 2 previous similar messages
Apr 6 17:47:18 gto kernel: LustreError: 5865:0:(llog_cat.c:172:llog_cat_id2handle()) error opening log id 0x6250020:76682f2b: rc -2
Apr 6 17:47:18 gto kernel: LustreError: 5865:0:(llog_cat.c:172:llog_cat_id2handle()) Skipped 2 previous similar messages
Apr 6 17:47:18 gto kernel: LustreError: 5865:0:(llog_obd.c:279:cat_cancel_cb()) Cannot find handle for log 0x6250020
Apr 6 17:47:18 gto kernel: LustreError: 5865:0:(llog_obd.c:279:cat_cancel_cb()) Skipped 2 previous similar messages
Apr 6 17:47:18 gto kernel: LustreError: 5863:0:(llog_obd.c:350:llog_obd_origin_setup()) with failed: -2
Apr 6 17:47:18 gto kernel: LustreError: Skipped 2 previous similar messages

Thanks,
Dan
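With this many pages of console output, it can help to pull out only the lines where the MDS deactivated an OST. A minimal Python sketch, assuming the message wording "<target>_UUID sync failed <rc>, deactivating" seen in the log above. The sample lines are cleaned-up transcriptions of entries from this thread, and `deactivated_osts` is a hypothetical helper name, not a Lustre tool:

```python
import re

# Cleaned-up transcriptions of three entries from the log in this thread.
SAMPLE = """\
Apr 6 17:40:13 gto kernel: LustreError: 5479:0:(mds_lov.c:960:__mds_lov_synchronize()) feline-OST0013_UUID sync failed -2, deactivating
Apr 6 17:40:13 gto kernel: LustreError: 5460:0:(mds_lov.c:960:__mds_lov_synchronize()) feline-OST0000_UUID sync failed -3, deactivating
Apr 6 17:40:13 gto kernel: Lustre: MDS feline-MDT0000: feline-OST0010_UUID now active, resetting orphans
"""

# Assumption: the target name is "<fsname>-OST<4 hex digits>" followed by "_UUID".
DEACTIVATED = re.compile(
    r"(\S+-OST[0-9a-fA-F]{4})_UUID sync failed (-\d+), deactivating"
)


def deactivated_osts(log_text: str) -> dict:
    """Map each deactivated OST name to the return code logged with it."""
    return {m.group(1): int(m.group(2)) for m in DEACTIVATED.finditer(log_text)}


if __name__ == "__main__":
    for ost, rc in sorted(deactivated_osts(SAMPLE).items()):
        print(ost, rc)
```

On the sample, this reports feline-OST0000 and feline-OST0013 and ignores the "now active" line. Run against the full MDS log, it would give a short list of targets to compare with the devices that lctl dl shows as IN.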