Hi,

when shutting down our OSSs and then MDSs we often wait 330s for each set of umounts to finish, eg.

Feb  2 03:20:06 xemds2 kernel: Lustre: Mount still busy with 68 refs, waiting for 330 secs...
Feb  2 03:20:11 xemds2 kernel: Lustre: Mount still busy with 68 refs, waiting for 325 secs...
...

is there a way to speed this up?

we're interested in the (perhaps unusual) case where all clients are gone because the power has failed, and the Lustre servers are running on UPS and need to be shut down ASAP.

the tangible reward for a quick shutdown is that we can buy a lower capacity (cheaper) UPS if we can reliably and cleanly shut down all the Lustre servers in <10 mins, and preferably <3 minutes. if we're tweaking timeouts to do this then hopefully we can tweak them just before the shutdown and avoid running short timeouts in normal operation.

I'm probably missing something obvious, but I have looked through a bunch of /proc/{fs/lustre,sys/lnet,sys/lustre} entries and the Operations Manual and I can't actually see where the default 330s comes from... ??? it seems to be quite repeatable for both OSSs and MDSs.

we're using Lustre 1.6.6 or 1.6.5.1 on servers and patchless 1.6.4.3 on clients, with x86_64 RHEL 5.2 everywhere.

thanks for any help!

cheers,
robin
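[Editor's note: the "tweak timeouts just before shutdown" idea could be scripted along these lines. This is a hedged sketch only: it assumes the 1.6-era /proc/sys/lustre/timeout file (the cluster obd timeout) is a relevant knob, which the thread does not confirm, and the proc root is a parameter purely so the helper can be exercised against a scratch directory instead of a live server.]

```shell
# Hypothetical pre-shutdown tweak: lower the Lustre obd timeout.
# Assumes /proc/sys/lustre/timeout exists (true on 1.6 servers);
# whether this shortens the "Mount still busy" wait is an open question.
set_lustre_timeout() {
    secs="$1"
    proc_root="${2:-/proc/sys/lustre}"   # overridable for dry testing
    echo "$secs" > "$proc_root/timeout"
}

# Just before a UPS-triggered shutdown (value is illustrative):
#   set_lustre_timeout 30
```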
If I'm not mistaken, "umount -f" will unmount your OSTs (in fact any Lustre mount, be it MDS, MGS, OST, or client) without delay.

Ron Jerome
National Research Council Canada

> -----Original Message-----
> From: lustre-discuss-bounces at lists.lustre.org [mailto:lustre-discuss-bounces at lists.lustre.org] On Behalf Of Robin Humble
> Sent: February 8, 2009 11:29 PM
> To: lustre-discuss at lists.lustre.org
> Subject: [Lustre-discuss] speedy server shutdown
>
> when shutting down our OSSs and then MDSs we often wait 330s for each
> set of umounts to finish eg.
> Feb  2 03:20:06 xemds2 kernel: Lustre: Mount still busy with 68 refs, waiting for 330 secs...
> [...]
> is there a way to speed this up?
> [...]

_______________________________________________
Lustre-discuss mailing list
Lustre-discuss at lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss
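[Editor's note: Ron's "umount -f" suggestion could be wrapped in a small shutdown helper. A sketch only, not tested against a live filesystem: it walks the mount table for lustre-type mounts and merely echoes the umount commands (drop the `echo` to run them for real); the mount-table path is a parameter so the logic can be exercised with a fake table.]

```shell
# Sketch of a fast server shutdown based on "umount -f".
force_umount_lustre() {
    mounts_file="${1:-/proc/mounts}"
    # Pick out lustre mounts, last-mounted first, and force-unmount each.
    awk '$3 == "lustre" { print $2 }' "$mounts_file" | tac |
    while read -r mnt; do
        echo umount -f "$mnt"   # drop 'echo' to actually unmount
    done
}
```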
On Feb 08, 2009 23:28 -0500, Robin Humble wrote:
> when shutting down our OSSs and then MDSs we often wait 330s for each
> set of umounts to finish eg.
> Feb  2 03:20:06 xemds2 kernel: Lustre: Mount still busy with 68 refs, waiting for 330 secs...
> [...]
> is there a way to speed this up?

Please search bugzilla for this; I think there was a bug fixed in more recent versions.

Cheers, Andreas
--
Andreas Dilger
Sr. Staff Engineer, Lustre Group
Sun Microsystems of Canada, Inc.