We use Lustre 1.8.7. Our environment has many Lustre clients spread across several networks. When an emergency happens, such as a power outage, and we need to shut down the Lustre servers quickly, we are frequently unable to shut down the clients first. I know that the documentation recommends shutting down Lustre in this order:

  unmount clients
  unmount MDT
  unmount OSTs

So my question is: what is the recommended procedure if one cannot shut down all the clients first? Would it just be

  unmount MDT
  unmount OSTs

Or is there something else that should be done because we cannot get the clients shut down first?

--
K. Scott Rowe -- Linux Group Lead
Array Operations Center, National Radio Astronomy Observatory
krowe-+dJpgsE4VdE@public.gmane.org -- http://www.aoc.nrao.edu/~krowe/
1.575.835.7000 -- 1003 Lopezville Socorro, NM 87801
Scott,

What is preventing the clients from being shut down? Are they unable to unmount the file system? If that is the case, please try "umount -f" instead of umount. This will result in an unclean unmount, but the client *will* be able to unmount Lustre. If it is something else, please advise.

Regarding the order of shutdown, I've been advocating:

  Clients
  MDT
  OSTs

The reason is that the MDS is a client of the OSTs (the MDS runs an OSC for each OST).

However, using this method has seemingly triggered the recovery process on targets when bringing up the file system.

I understand that in version 2.4, OSSs will need to communicate with the MDSs, seemingly creating a two-way dependency (possibly my confusion on the matter, but clarification here is requested). However, as you are using 1.8.7, this latter point should not cause you any problems.

--
Brett Lee
Sr. Systems Engineer
Intel High Performance Data Division

> -----Original Message-----
> From: lustre-discuss-bounces-aLEFhgZF4x6X6Mz3xDxJMA@public.gmane.org [mailto:lustre-discuss-bounces-aLEFhgZF4x6X6Mz3xDxJMA@public.gmane.org] On Behalf Of K. Scott Rowe
> Sent: Tuesday, October 22, 2013 8:51 AM
> To: lustre-discuss-aLEFhgZF4x6X6Mz3xDxJMA@public.gmane.org
> Subject: [Lustre-discuss] Proper shutdown sans clients
>
> So my question is, what would the recommended procedure be if one cannot
> shutdown all the clients first?
>
> _______________________________________________
> Lustre-discuss mailing list
> Lustre-discuss-aLEFhgZF4x6X6Mz3xDxJMA@public.gmane.org
> http://lists.lustre.org/mailman/listinfo/lustre-discuss
The problem is that many of our clients are workstations on people's desks, as well as servers and clusters. I don't have a good list of all the Lustre clients, so I don't know which machines to unmount. I've thought about writing a script that parses lshowmount output and unmounts clients that way, but I haven't had the time. I was hoping that it would be acceptable to ignore the clients and just let them recover once Lustre is back up.

So, I guess it is a question of risk. If the consensus is that unmounting the clients first is very important to prevent data loss and/or corruption, then I will get a script working to do that. But if the risk is minimal, given a recommended way to shut down Lustre without dealing with the clients, I would be interested in that.

On Oct 22 15:06, Lee, Brett wrote:
}What is preventing the clients from being shut down? Unable to unmount the file system?
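The script idea mentioned above could start from something like the following. This is only a minimal sketch under stated assumptions: it assumes client NIDs appear in the lshowmount output in the usual `address@network` form with IPv4 addresses, it does not rely on any other detail of the output layout, and the mount point `/lustre` and the use of `ssh root@...` are hypothetical placeholders. It only builds the commands; it does not run them.

```python
import re

# Assumption: client NIDs look like "<IPv4 address>@<network>", for example
# 10.0.0.5@tcp or 192.168.1.20@o2ib0. Nothing else about the layout is used.
NID_RE = re.compile(r"\b((?:\d{1,3}\.){3}\d{1,3})@([a-z0-9]+)\b")


def client_addresses(lshowmount_text):
    """Return the sorted, de-duplicated client addresses found in the text."""
    return sorted({m.group(1) for m in NID_RE.finditer(lshowmount_text)})


def unmount_commands(addresses, mountpoint="/lustre"):
    """Build (not run) one ssh command per client; mountpoint is hypothetical."""
    return [f"ssh root@{addr} umount -f {mountpoint}" for addr in addresses]


if __name__ == "__main__":
    # Made-up sample text standing in for output collected on the servers.
    sample = """\
lustre-MDT0000:
10.0.0.5@tcp
10.0.0.7@tcp
lustre-OST0000:
10.0.0.5@tcp
192.168.1.20@o2ib0
"""
    for cmd in unmount_commands(client_addresses(sample)):
        print(cmd)
```

Collecting the output from every server and feeding the combined text to `client_addresses` would give one list of clients, since the same client normally shows up on the MDT and on each OST.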
Scott,

Given that the clients may be workstations, we should probably not discuss this in terms of "shutting down the clients" but instead think "unmount the Lustre file system on all the clients."

Since we are discussing the "proper shutdown", the best and recommended way would be to unmount Lustre on all of the clients. So your idea of writing a script seems a valuable tool for determining which machines are Lustre clients (i.e., have mounted the Lustre file system).

If it is not possible to connect to all the workstations to unmount Lustre, and the storage targets (MDT, OSTs) need to be unmounted, then the clients should be "evicted" (documented in the Lustre manual) from all the servers with mounted targets. This way the targets can be unmounted knowing there is no client IO occurring. Note that this is not recommended as a long-term solution.

In either case, especially the latter, I would also use some iptables rules to prevent clients from re-mounting the Lustre file system.

--
Brett Lee
Sr. Systems Engineer
Intel High Performance Data Division

> -----Original Message-----
> From: K. Scott Rowe [mailto:krowe-+dJpgsE4VdE@public.gmane.org]
> Sent: Tuesday, October 22, 2013 9:21 AM
> To: Lee, Brett
> Cc: K. Scott Rowe; lustre-discuss-aLEFhgZF4x6X6Mz3xDxJMA@public.gmane.org
> Subject: Re: [Lustre-discuss] Proper shutdown sans clients
>
> I was hoping that ignoring the clients would be acceptable and just let them
> recover once Lustre is back up.
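As a rough illustration of the iptables suggestion above, the sketch below generates per-client rules for the servers. It is a sketch under assumptions, not a tested procedure: it assumes the clients reach the servers over TCP LNET on the default acceptor port 988, it would have no effect on InfiniBand (o2ib) networks, and the function names are made up for this example. Rules are generated per client address so that traffic between the MDS and the OSSs is left alone.

```python
def block_rules(client_ips, port=988):
    """Return iptables commands that drop LNET TCP traffic from each client.

    Assumption: LNET runs over TCP on its default acceptor port 988.
    """
    return [
        f"iptables -I INPUT -s {ip} -p tcp --dport {port} -j DROP"
        for ip in client_ips
    ]


def unblock_rules(client_ips, port=988):
    """Return the matching delete commands for when clients may mount again."""
    return [
        f"iptables -D INPUT -s {ip} -p tcp --dport {port} -j DROP"
        for ip in client_ips
    ]


if __name__ == "__main__":
    clients = ["10.0.0.5", "10.0.0.7"]  # hypothetical client addresses
    for cmd in block_rules(clients) + unblock_rules(clients):
        print(cmd)
```

Printing the commands rather than executing them leaves room to review the list before anything is applied on a production server.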
There is no particular danger to the filesystem if clients fail to unmount. Clients have no direct ability to modify filesystem metadata, so they should never be able to corrupt the filesystem. This is no different than if clients crash, or if the network fails, or whatever else bad happens to large computers on a regular basis.

If the MDT(s) are unmounted first and then the OSTs, it at least avoids one small bit of recovery.

If you know that the clients will not be coming back (e.g. a power outage with the servers running on UPS), then "umount -f" of the servers will evict all of the clients immediately, and it will avoid recovery when they are remounted. The same can be achieved at mount time with "-o abort_recov".

If you are doing some minor administration on the server, a normal "umount" is enough, and it allows the clients to recover and possibly complete their IO after the servers have restarted. For major releases (e.g. 1.8 to 2.x) the clients need to unmount cleanly or they will automatically be evicted after the upgrade.

Cheers, Andreas

On 2013-10-22, at 8:52, "K. Scott Rowe" <krowe-+dJpgsE4VdE@public.gmane.org> wrote:
> So my question is, what would the recommended procedure be if one
> cannot shutdown all the clients first?
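The two server-side cases described above (clients not coming back versus routine maintenance) can be summarised in a small sketch. Everything here is illustrative: the function names, device paths and mount points are hypothetical, and the code only builds command strings from the options named in this thread ("umount", "umount -f", and "-o abort_recov").

```python
def shutdown_plan(mdt_mounts, ost_mounts, clients_returning):
    """Return umount commands, MDT(s) first and then OSTs.

    clients_returning=True  -> plain umount, clients may recover afterwards.
    clients_returning=False -> umount -f, which evicts clients and avoids
                               recovery at the next mount.
    """
    cmd = "umount" if clients_returning else "umount -f"
    return [f"{cmd} {m}" for m in list(mdt_mounts) + list(ost_mounts)]


def restart_mount(device, mountpoint, abort_recovery=False):
    """Return the mount command, optionally with the abort_recov option."""
    opts = "-o abort_recov " if abort_recovery else ""
    return f"mount -t lustre {opts}{device} {mountpoint}"


if __name__ == "__main__":
    # Emergency case: power outage, clients will not be coming back.
    for line in shutdown_plan(["/mnt/mdt"], ["/mnt/ost0", "/mnt/ost1"], False):
        print(line)
    # Alternative: skip recovery at mount time instead.
    print(restart_mount("/dev/sdb", "/mnt/ost0", abort_recovery=True))
```

Keeping the MDT entries ahead of the OSTs in the returned list mirrors the ordering advice in the message above.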