Hi,

when shutting down our OSSs and then MDSs we often wait 330s for each set of umounts to finish, eg.

Feb  2 03:20:06 xemds2 kernel: Lustre: Mount still busy with 68 refs, waiting for 330 secs...
Feb  2 03:20:11 xemds2 kernel: Lustre: Mount still busy with 68 refs, waiting for 325 secs...
...

is there a way to speed this up?

we're interested in the (perhaps unusual) case where all clients are gone because the power has failed, and the Lustre servers are running on UPS and need to be shut down ASAP.

the tangible reward for a quick shutdown is that we can buy a lower capacity (cheaper) UPS if we can reliably and cleanly shut down all the Lustre servers in <10 mins, and preferably <3 minutes. if we're tweaking timeouts to do this then hopefully we can tweak them just before the shutdown and avoid running short timeouts in normal operation.

I'm probably missing something obvious, but I have looked through a bunch of /proc/{fs/lustre,sys/lnet,sys/lustre} entries and the Operations Manual and I can't actually see where the default 330s comes from... ??? it seems to be quite repeatable for both OSSs and MDSs.

we're using Lustre 1.6.6 or 1.6.5.1 on servers and patchless 1.6.4.3 on clients, with x86_64 RHEL 5.2 everywhere.

thanks for any help!

cheers,
robin
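[Editor's note: the "tweak timeouts just before shutdown" idea could be scripted along these lines. This is a hedged sketch only: it assumes the 1.6-era /proc/sys/lustre/timeout file (the cluster obd timeout) is a relevant knob, which the thread does not confirm, and the proc root is a parameter purely so the helper can be exercised against a scratch directory instead of a live server.]

```shell
# Hypothetical pre-shutdown tweak: lower the Lustre obd timeout.
# Assumes /proc/sys/lustre/timeout exists (true on 1.6 servers);
# whether this shortens the "Mount still busy" wait is an open question.
set_lustre_timeout() {
    secs="$1"
    proc_root="${2:-/proc/sys/lustre}"   # overridable for dry testing
    echo "$secs" > "$proc_root/timeout"
}

# Just before a UPS-triggered shutdown (value is illustrative):
#   set_lustre_timeout 30
```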
If I'm not mistaken, "umount -f" will unmount your OSTs (in fact any Lustre mount, be it MDS, MGS, OST, or client) without delay.

Ron Jerome
National Research Council Canada

> -----Original Message-----
> From: lustre-discuss-bounces at lists.lustre.org [mailto:lustre-discuss-bounces at lists.lustre.org] On Behalf Of Robin Humble
> Sent: February 8, 2009 11:29 PM
> To: lustre-discuss at lists.lustre.org
> Subject: [Lustre-discuss] speedy server shutdown
>
> when shutting down our OSSs and then MDSs we often wait 330s for each
> set of umounts to finish eg.
> Feb  2 03:20:06 xemds2 kernel: Lustre: Mount still busy with 68 refs, waiting for 330 secs...
> [...]
> is there a way to speed this up?
> [...]

_______________________________________________
Lustre-discuss mailing list
Lustre-discuss at lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss
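[Editor's note: Ron's "umount -f" suggestion could be wrapped in a small shutdown helper. A sketch only, not tested against a live filesystem: it walks the mount table for lustre-type mounts and merely echoes the umount commands (drop the `echo` to run them for real); the mount-table path is a parameter so the logic can be exercised with a fake table.]

```shell
# Sketch of a fast server shutdown based on "umount -f".
force_umount_lustre() {
    mounts_file="${1:-/proc/mounts}"
    # Pick out lustre mounts, last-mounted first, and force-unmount each.
    awk '$3 == "lustre" { print $2 }' "$mounts_file" | tac |
    while read -r mnt; do
        echo umount -f "$mnt"   # drop 'echo' to actually unmount
    done
}
```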
On Feb 08, 2009 23:28 -0500, Robin Humble wrote:
> when shutting down our OSSs and then MDSs we often wait 330s for each
> set of umounts to finish eg.
> Feb  2 03:20:06 xemds2 kernel: Lustre: Mount still busy with 68 refs, waiting for 330 secs...
> [...]
> is there a way to speed this up?

Please search bugzilla for this; I think there was a bug fixed in more recent versions.

Cheers, Andreas
--
Andreas Dilger
Sr. Staff Engineer, Lustre Group
Sun Microsystems of Canada, Inc.