Hi,

the Lustre manual says:

  2.2.1.5 Stopping a Server
  To stop a server:
  $ umount -f /mnt/test/ost0
  The '-f' flag means "force"; force the server to stop WITHOUT RECOVERY.
  Without the '-f' flag, "failover" is implied, meaning the next time the
  server is started it goes through the recovery procedure.

So we were tempted to use "umount -f" when doing a failover of OSTs, but we see problems (I/O errors on clients) during the failover when doing this. Without the "-f" flag we get no I/O errors.

Is there a recommended way of dealing with the umount at failover?

Best regards,
Erich
Nathaniel Rutman
2008-May-19 20:09 UTC
[Lustre-discuss] forced umount of OST in failover case?
Erich Focht wrote:
> Hi,
>
> the lustre manual says:
>
> 2.2.1.5 Stopping a Server
> To stop a server:
> $ umount -f /mnt/test/ost0
> The '-f' flag means "force"; force the server to stop WITHOUT RECOVERY.
> Without the '-f' flag, "failover" is
> implied, meaning the next time the server is started it goes through the
> recovery procedure.
>
> So we were tempted to use "umount -f" when doing a failover of OSTs, but we
> see problems (I/O errors on clients) during the failover when doing this.
> Without the "-f" flag we get no I/O errors.

yes. That is the difference between "forced" or not. Forced means stop with errors for clients, unforced means take more time and do recovery at restart.

> Is there a recommended way of dealing with the umount at failover?

Don't use -f

> Best regards,
> Erich
>
> _______________________________________________
> Lustre-discuss mailing list
> Lustre-discuss at lists.lustre.org
> http://lists.lustre.org/mailman/listinfo/lustre-discuss
Andreas Dilger
2008-May-20 05:59 UTC
[Lustre-discuss] forced umount of OST in failover case?
On May 19, 2008 13:09 -0700, Nathaniel Rutman wrote:
> Erich Focht wrote:
> > the lustre manual says:
> >
> > 2.2.1.5 Stopping a Server
> > To stop a server:
> > $ umount -f /mnt/test/ost0
> > The '-f' flag means "force"; force the server to stop WITHOUT RECOVERY.
> > Without the '-f' flag, "failover" is
> > implied, meaning the next time the server is started it goes through the
> > recovery procedure.
> >
> > So we were tempted to use "umount -f" when doing a failover of OSTs, but we
> > see problems (I/O errors on clients) during the failover when doing this.
> > Without the "-f" flag we get no I/O errors.
>
> yes. That is the difference between "forced" or not. Forced means stop
> with errors for clients, unforced means take more time and do recovery
> at restart.
>
> > Is there a recommended way of dealing with the umount at failover?
>
> Don't use -f

Perhaps it makes sense to clarify the manual a bit? It doesn't really make sense to have the manual specify "-f" as the default action, IMHO, since this isn't what 99% of users or scripts will do.

Something like:

  To stop a server:

  # umount /mnt/test/ost0

  This preserves the state of the connected clients, and the next time the
  server is started it will wait for clients to reconnect and go through
  the recovery procedure.

  If the '-f' ("force") flag is given, the server will evict all clients and
  stop WITHOUT RECOVERY. The server will not wait for recovery upon restart.
  Any currently connected clients will get IO errors until they reconnect.

Cheers, Andreas
--
Andreas Dilger
Sr. Staff Engineer, Lustre Group
Sun Microsystems of Canada, Inc.
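As a rough illustration of the two behaviours described above, here is a minimal sketch of how a failover script might pick the umount command. The helper name ost_stop_cmd and its "failover"/"force" mode names are hypothetical and not part of Lustre; only umount and its -f flag come from this thread. The helper only prints the command, it does not run it.

```shell
#!/bin/sh
# Sketch only: ost_stop_cmd and its mode names are hypothetical helpers,
# not Lustre tools. Only umount and its -f flag come from the thread.

# Print the umount command for the requested stop mode.
#   failover (default): plain umount; client state is preserved and the
#                       server goes through recovery on its next start.
#   force:              umount -f; clients are evicted, no recovery.
ost_stop_cmd() {
    mnt=${1:-}
    mode=${2:-failover}
    if [ -z "$mnt" ]; then
        echo "usage: ost_stop_cmd MOUNTPOINT [failover|force]" >&2
        return 2
    fi
    case "$mode" in
        failover) printf 'umount %s\n' "$mnt" ;;
        force)    printf 'umount -f %s\n' "$mnt" ;;
        *)        echo "unknown mode: $mode" >&2; return 2 ;;
    esac
}

# Example: show what a failover script would run for one OST.
ost_stop_cmd /mnt/test/ost0
```

Defaulting to the plain umount matches the advice in this thread: a failover script should only reach for "force" when evicting clients is actually intended.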
Thanks very much for the clarification! The improvement of the manual as proposed by Andreas makes things easier to understand.

Best regards,
Erich

On Monday 19 May 2008, Nathaniel Rutman wrote:
> Erich Focht wrote:
> > Hi,
> >
> > the lustre manual says:
> >
> > 2.2.1.5 Stopping a Server
> > To stop a server:
> > $ umount -f /mnt/test/ost0
> > The '-f' flag means "force"; force the server to stop WITHOUT RECOVERY.
> > Without the '-f' flag, "failover" is
> > implied, meaning the next time the server is started it goes through the
> > recovery procedure.
> >
> > So we were tempted to use "umount -f" when doing a failover of OSTs, but we
> > see problems (I/O errors on clients) during the failover when doing this.
> > Without the "-f" flag we get no I/O errors.
>
> yes. That is the difference between "forced" or not. Forced means stop
> with errors for clients, unforced means take more time and do recovery
> at restart.
>
> > Is there a recommended way of dealing with the umount at failover?
>
> Don't use -f
>
> > Best regards,
> > Erich
Sheila Barthel
2008-May-20 13:06 UTC
[Lustre-discuss] forced umount of OST in failover case?
Andreas - I just opened BZ 15854 to track this update to the Lustre manual. It will be included in the next edition.

Sheila

Andreas Dilger wrote:
> On May 19, 2008 13:09 -0700, Nathaniel Rutman wrote:
>> Erich Focht wrote:
>>> the lustre manual says:
>>>
>>> 2.2.1.5 Stopping a Server
>>> To stop a server:
>>> $ umount -f /mnt/test/ost0
>>> The '-f' flag means "force"; force the server to stop WITHOUT RECOVERY.
>>> Without the '-f' flag, "failover" is
>>> implied, meaning the next time the server is started it goes through the
>>> recovery procedure.
>>>
>>> So we were tempted to use "umount -f" when doing a failover of OSTs, but we
>>> see problems (I/O errors on clients) during the failover when doing this.
>>> Without the "-f" flag we get no I/O errors.
>>
>> yes. That is the difference between "forced" or not. Forced means stop
>> with errors for clients, unforced means take more time and do recovery
>> at restart.
>>
>>> Is there a recommended way of dealing with the umount at failover?
>>
>> Don't use -f
>
> Perhaps it makes sense to clarify the manual a bit? It doesn't really
> make sense to have the manual specify "-f" as the default action, IMHO,
> since this isn't what 99% of users or scripts will do.
>
> Something like:
>
> To stop a server:
>
> # umount /mnt/test/ost0
>
> This preserves the state of the connected clients, and the next time the
> server is started it will wait for clients to reconnect and go through
> the recovery procedure.
>
> If the '-f' ("force") flag is given, the server will evict all clients and
> stop WITHOUT RECOVERY. The server will not wait for recovery upon restart.
> Any currently connected clients will get IO errors until they reconnect.
>
> Cheers, Andreas
> --
> Andreas Dilger
> Sr. Staff Engineer, Lustre Group
> Sun Microsystems of Canada, Inc.