Patrick Shopbell
2013-Feb-26 15:43 UTC
[Lustre-discuss] OSTs not activating following MGS/MDS move
Hello everyone,

I am having an odd problem here, on our small Lustre installation. We have a single MGS/MDS and 3 OSSs with 7 OSTs total. I just tried moving the MDS/MGS to a faster machine, following the instructions in sections 17.3 and 17.4 of the Lustre manual: with the system offline, I mounted the file systems as "ldiskfs" and then used the Lustre tar command to make a copy of everything. I checked a bunch of the xattrs, and they all appeared to match.

Finally, I reset the system configs on the MDS/MGS with:

> tunefs.lustre --writeconf /dev/md126

and on the OSSs with something like:

> tunefs.lustre --writeconf /dev/sdb
> tunefs.lustre --erase-param --mgsnode=192.168.30.113@tcp --index=0 --writeconf /dev/sdb

where I kept the indices the same as in my original setup.

I can now mount the MGS/MDS, and then mount the OSTs. However, I get these three errors on the MGS when an OST mounts:

Feb 25 22:38:38 yupana kernel: LustreError: 3636:0:(lov_log.c:155:lov_llog_origin_connect()) error osc_llog_connect tgt 6 (-107)
Feb 25 22:38:38 yupana kernel: LustreError: 3636:0:(mds_lov.c:873:__mds_lov_synchronize()) lustre-OST0006_UUID failed at llog_origin_connect: -107
Feb 25 22:38:38 yupana kernel: LustreError: 3636:0:(mds_lov.c:903:__mds_lov_synchronize()) lustre-OST0006_UUID sync failed -107, deactivating

And when I run 'lctl dl', the OSTs are apparently all inactive:

5 IN osc lustre-OST0000-osc-MDT0000 lustre-MDT0000-mdtlov_UUID 5

Any ideas what I need to do to activate these? I am running Lustre 2.3 on all nodes. I can see the file system on a client and, it seems, read files, but I cannot create any new files, presumably because the OSTs are not active.
Thanks for your suggestions,
Patrick

*---------------------------------------------------------------*
| Patrick Shopbell         Department of Astronomy              |
| pls at astro.caltech.edu  Mail Code 249-17                     |
| (626) 395-4097           California Institute of Technology   |
| (626) 568-9352 (FAX)     Pasadena, CA 91125                   |
| WWW: http://www.astro.caltech.edu/~pls/                       |
*---------------------------------------------------------------*
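For readers of the archive, here is a minimal Python sketch (not part of the original post) of how 'lctl dl' output like the line quoted above could be scanned for inactive OSC devices. It assumes the six whitespace-separated columns seen in Patrick's sample line (index, status, type, name, UUID, reference count) and treats a status of "IN" as inactive, as the post suggests. The names Device, parse_lctl_dl and inactive_oscs are hypothetical, and the second sample line is invented purely for contrast.

```python
from typing import List, NamedTuple


class Device(NamedTuple):
    """One row of 'lctl dl' output (column meanings assumed from the sample)."""
    index: int
    status: str
    kind: str
    name: str
    uuid: str
    refcount: int


def parse_lctl_dl(output: str) -> List[Device]:
    """Split each line into six fields; skip anything that does not fit."""
    devices = []
    for line in output.splitlines():
        fields = line.split()
        if len(fields) != 6:
            continue
        if not (fields[0].isdigit() and fields[5].isdigit()):
            continue
        index, status, kind, name, uuid, refcount = fields
        devices.append(Device(int(index), status, kind, name, uuid, int(refcount)))
    return devices


def inactive_oscs(devices: List[Device]) -> List[str]:
    """Names of OSC devices whose status column reads 'IN'."""
    return [d.name for d in devices if d.kind == "osc" and d.status == "IN"]


# First line is from the post; the second line is a made-up active device.
sample = """\
  5 IN osc lustre-OST0000-osc-MDT0000 lustre-MDT0000-mdtlov_UUID 5
  6 UP osc lustre-OST0001-osc-MDT0000 lustre-MDT0000-mdtlov_UUID 5
"""

if __name__ == "__main__":
    for name in inactive_oscs(parse_lctl_dl(sample)):
        print("inactive:", name)
```

In practice the text would come from running 'lctl dl' on the MDS rather than from a hard-coded string.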
Colin Faber
2013-Feb-26 17:30 UTC
[Lustre-discuss] OSTs not activating following MGS/MDS move
Hi,

As a follow up (for archival reasons), the issue Patrick experienced was CATALOG file corruption. Truncation of the CATALOG file on the MDS via ldiskfs mount corrected his problem.

-cf
Christopher J. Walker
2013-Mar-07 15:31 UTC
[Lustre-discuss] OSTs not activating following MGS/MDS move
On 26/02/13 17:30, Colin Faber wrote:
> As a follow up (for archival reasons) the issue Patrick experienced was
> CATALOG file corruption. Truncation of the CATALOG file on the MDS via
> ldiskfs mount corrected his problem.

Thanks for the follow up.

I'm about to undertake a similar move (though on 1.8.8-wc1) and would like to avoid similar problems, so it would be useful to know whether the CATALOG file corruption was caused by the procedure or was a coincidence.

Chris
Colin Faber
2013-Mar-07 15:51 UTC
[Lustre-discuss] OSTs not activating following MGS/MDS move
Hi Christopher,

In general this can happen when your initial remount of the various services is in the wrong order, such as MGS -> OST -> MDT -> Client, or MGS -> MDT -> Clients -> OST, etc.

During initial mount and registration it's critical that your mounts be in the correct order:

MGS -> MDT -> OST(s) -> Client(s)

CATALOG corruption, or an out-of-order sequence, is rarer on an active file system, but is possible. The simple fix, as described earlier in this thread, is to just truncate the CATALOG file, and all should be well again.

-cf
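The ordering rule above can be sketched as a small check. This is an illustrative Python snippet, not a Lustre tool: it assumes you have written down the sequence in which services were mounted and only verifies it against the MGS -> MDT -> OST(s) -> Client(s) order Colin gives for initial mount and registration. The names REGISTRATION_RANK and first_out_of_order are hypothetical.

```python
from typing import List, Optional

# Rank of each service kind in the initial-registration order from the thread.
REGISTRATION_RANK = {"MGS": 0, "MDT": 1, "OST": 2, "Client": 3}


def first_out_of_order(sequence: List[str]) -> Optional[int]:
    """Return the position of the first mount that comes too late, or None.

    A mount is "too late" if a service kind that should follow it
    (per REGISTRATION_RANK) has already been mounted.
    """
    highest_seen = -1
    for position, kind in enumerate(sequence):
        rank = REGISTRATION_RANK[kind]
        if rank < highest_seen:
            return position
        highest_seen = rank
    return None


if __name__ == "__main__":
    good = ["MGS", "MDT", "OST", "OST", "Client"]
    bad = ["MGS", "OST", "MDT", "Client"]  # one of the wrong orders named above
    print(first_out_of_order(good))
    print(first_out_of_order(bad))
```

The function reports only the first offending mount; a sequence with several problems would need to be rechecked after fixing each one.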
Alex Kulyavtsev
2013-Mar-07 16:39 UTC
[Lustre-discuss] lustre startup sequence Re: OSTs not activating following MGS/MDS move
Hi Colin.

This is not what the manual says.

Should it be corrected, then? Or should a description of the startup sequence in different situations (first start, restart) be added?

The manual (or online information) does not describe a graceful shutdown sequence for a separate MGS/MDT configuration; it would be nice to add that too.

Alex.

E.g. http://wiki.lustre.org/manual/LustreManual20_HTML/LustreOperations.html#50438194_24122 and similar http://build.whamcloud.com/job/lustre-manual/lastSuccessfulBuild/artifact/lustre_manual.xhtml#dbdoclet.50438194_24122

> 13.2 Starting Lustre
>
> The startup order of Lustre components depends on whether you have a combined MGS/MDT or these components are separate.
>
> If you have a combined MGS/MDT, the recommended startup order is OSTs, then the MGS/MDT, and then clients.
>
> If the MGS and MDT are separate, the recommended startup order is: MGS, then OSTs, then the MDT, and then clients.
Colin Faber
2013-Mar-07 16:48 UTC
[Lustre-discuss] lustre startup sequence Re: OSTs not activating following MGS/MDS move
Hi, yes.

Thanks for finding this, Alex. The manual should be updated with the correct order.

-cf
Jones, Peter A
2013-Mar-07 16:50 UTC
[Lustre-discuss] lustre startup sequence Re: OSTs not activating following MGS/MDS move
Colin,

Could you please open an LUDOC JIRA ticket to track this correction?

Thanks,
Peter
DEGREMONT Aurelien
2013-Mar-07 16:52 UTC
[Lustre-discuss] lustre startup sequence Re: OSTs not activating following MGS/MDS move
Hello,

AFAIK there are 2 orders:
- If you are starting your filesystem for the first time (or after using --writeconf), the order is: MGS, MDS, OST, Clients
- On a normal start: MGS, OST, MDS, Clients

There is a patch in some recent Lustre releases that makes it possible to use the first order every time, but I would advise using the second one anyway, as it avoids starting the MDS first, lacking connections to the OSTs, and then reconnecting to them once they have really started.

Aurélien
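The two orders described above can be written down as a small planning helper. This is only a sketch in Python under the assumptions stated in the message (which its author qualifies with "AFAIK"); it does not call any Lustre command. The names FIRST_START, NORMAL_START and start_plan, and the target labels in the example, are hypothetical.

```python
from typing import List, Tuple

# Orders as given in the message above.
FIRST_START = ["MGS", "MDS", "OST", "Client"]   # first start, or after --writeconf
NORMAL_START = ["MGS", "OST", "MDS", "Client"]  # ordinary restart

Target = Tuple[str, str]  # (service kind, free-form label)


def start_plan(targets: List[Target], after_writeconf: bool) -> List[Target]:
    """Sort targets into the start order that applies to the situation.

    sorted() is stable, so targets of the same kind keep their given order.
    """
    order = FIRST_START if after_writeconf else NORMAL_START
    return sorted(targets, key=lambda target: order.index(target[0]))


if __name__ == "__main__":
    targets = [("OST", "ost0"), ("Client", "c1"), ("MDS", "mdt0"),
               ("MGS", "mgs"), ("OST", "ost1")]
    for kind, label in start_plan(targets, after_writeconf=True):
        print(kind, label)
```

Passing after_writeconf=False yields the normal-start order instead, with the OSTs ahead of the MDS.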
Colin Faber
2013-Mar-07 16:53 UTC
[Lustre-discuss] lustre startup sequence Re: OSTs not activating following MGS/MDS move
I should make this clear: this is only critical for initial startup. Successive startups don't matter so much, as services have already been registered.

-cf
Patrick Shopbell
2013-Mar-07 21:26 UTC
[Lustre-discuss] lustre startup sequence Re: OSTs not activating following MGS/MDS move
Hi all -

As the original poster of this thread, I should probably just weigh in that it is indeed possible that something was out of order when I brought up our setup with the new MGS+MDS. I *thought* I did it right, since I was following the instructions in section 14.5 of the manual (Changing a Server NID), and that section does indeed advise the proper initial order:

MGS, MDS, OST, Clients

But maybe I got a client or something in there too early. I also had some issues with the NIDs of the OSTs pointing to an old ethernet interface first, so maybe that confused things.

The solution was perfect, though. Thanks to Colin and this list.

--
Patrick