Lisa Giacchetti
2010-Jul-01 16:29 UTC
[Lustre-discuss] best practice for lustre cluster startup
Hello,

I have recently installed a Lustre cluster which is in a test phase now but will potentially be in 24x7 production if it is accepted. I would like input from the list on what the recommendations/best practices are for configuring a Lustre cluster's startup.

Is it advisable to have Lustre on the various server pieces (MGS/MDT/OSSs) start automatically? If not, why not? If you try to start it and there is a very serious problem, will it abort the startup or just continue on blindly?

Again, this is going to need to be a 24x7 service for a compute facility that has global access (i.e. someone is always up and running something). We'd like to be able to at least get the service back up in an automated way if at all possible, and then debug problems when the support staff are awake/available.

Lisa Giacchetti
Kevin Van Maren
2010-Jul-01 17:17 UTC
[Lustre-discuss] best practice for lustre cluster startup
My (personal) opinion:

Lustre clients should always start (mount) automatically.

Lustre servers should have their services started through heartbeat (or another HA package) if failover is possible (be sure to configure stonith). If heartbeat starts automatically, do ensure auto-failback is NOT enabled: fail the resources back manually after you verify the rebooted server is healthy. Whether heartbeat itself starts automatically seems to be a matter of preference.

While unlikely, it is possible for an issue to cause Lustre to not start successfully, resulting in a node crash or some other problem that prevents a login. So if it does start automatically, you'll want to be prepared to reboot without Lustre (e.g. into single-user mode).

Kevin
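For illustration, a minimal heartbeat v1 sketch of the arrangement Kevin describes; the node names, devices, and mount points are invented placeholders, and any stonith entry must be filled in to match your own power-control hardware:

    # /etc/ha.d/ha.cf (excerpt)
    node            oss01 oss02
    auto_failback   off     # fail resources back by hand after checking the node
    # add a stonith_host entry for whatever power-control hardware you have,
    # so a hung peer is fenced before its targets are taken over

    # /etc/ha.d/haresources -- each OST is a Filesystem resource owned by one node
    oss01  Filesystem::/dev/mapper/ost0000::/mnt/ost0000::lustre
    oss02  Filesystem::/dev/mapper/ost0001::/mnt/ost0001::lustre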
Craig Prescott
2010-Jul-01 17:52 UTC
[Lustre-discuss] best practice for lustre cluster startup
Hi Lisa,

We don't start the services automatically on our servers. We don't have so many Lustre servers that this is a big problem (17 total), and it is pretty rare for one of them to go down unexpectedly.

If one of our Lustre server nodes does go down unexpectedly, we fsck the associated OSTs/MDT before starting up Lustre services again. I think you will want to do the same.

We do the fsck from the command line and look at the output. If there were no filesystem modifications (this is the usual case), we then start the Lustre services interactively. If there were modifications from fsck, we'll generally fsck it again and verify there were no further modifications. If 'fsck -f -p' fails, we'll fsck interactively or just go whole hog and 'fsck -f -y'.

I imagine you could achieve an "automated startup following failure" at least most of the time with an init script that does an 'fsck -f -p' on the associated OSTs/MDT if the node is coming back up from a crash or power outage. If fsck made no modifications, your init script could mount the storage. If 'fsck -f -p' bails out, you might send out an "I need help" email or something.

Cheers,
Craig Prescott
UF HPC Center

We once ran a cluster with lustre
We bought from a guy named Buster
It ran for a year
with nary a tear
A complaint we could not muster
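As a rough, untested sketch of the init-script idea Craig describes (the device names and notification address are made up; adapt before use):

    #!/bin/sh
    # Preen-mode fsck each target; mount only if fsck made no changes at all.
    TARGETS="/dev/mapper/ost0000 /dev/mapper/ost0001"
    for dev in $TARGETS; do
        fsck -f -p "$dev"
        rc=$?
        if [ "$rc" -eq 0 ]; then
            # no modifications: safe to bring the target back into service
            mount -t lustre "$dev" "/mnt/$(basename "$dev")"
        else
            # fsck fixed something or bailed out: leave it down and ask for help
            echo "fsck of $dev exited with status $rc; not mounting" | \
                mail -s "Lustre target on $(hostname) needs attention" admins@example.com
        fi
    done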
Robin Humble
2010-Jul-01 18:21 UTC
[Lustre-discuss] best practice for lustre cluster startup
On Thu, Jul 01, 2010 at 11:17:31AM -0600, Kevin Van Maren wrote:
> My (personal) opinion:
>
> Lustre clients should always start (mount) automatically.

yup

> Lustre servers should have their services started through heartbeat (or
> other HA package), if failover is possible (be sure to configure stonith).

IMHO that's a bad idea. Servers should not start automatically.

My objections to automated mount/failover are not Lustre related, but apply to all the layers underneath. As Kevin well knows, mptsas drivers can and do and have screwed up majorly, and I'm sure other drivers have too. md is far from smart, and disks break in such an infinite number of weird and wonderful ways that no driver or OS can reasonably be expected to deal with them all :-/

If you have the simple setup of singly-attached storage and a Lustre server just crashed, then why wouldn't it just crash again? We have had that happen. Automated startup seems silly in this case - especially if you don't know what the problem was to start with. The worst case is if the hardware started corrupting data and crashed the machine: is it really a good idea to reboot, remount, continue corrupting data some more, and then keep rebooting until dawn?

If you have a more elaborate Lustre setup with HA failover pairs then the above still applies, and additionally there are inherent races in both nodes of a pair trying to mount a set of disks if you do not have a third, impartial member participating in a failover quorum - not a common HA setup for Lustre, although it probably should be. If a software RAID is assembled on both machines at the same time because of an HA race, then it's likely data will be lost. Lustre MMP should save you from multi-mounting the OST, but obviously not from corruption if the underlying RAID is pre-trashed.

Overall, without diagnosing why a machine crashed, I fail to see how an automated reboot or failover can possibly be a safe course of action.

cheers,
robin
Andreas Dilger
2010-Jul-02 07:01 UTC
[Lustre-discuss] best practice for lustre cluster startup
On 2010-07-01, at 11:52, Craig Prescott <prescott at hpc.ufl.edu> wrote:
> We do the fsck from the command line and look at the output. If there
> were no filesystem modifications (this is the usual case), we then start
> the Lustre services interactively.

Note that if you are not running with writeback cache enabled on the disks, then you shouldn't have to run an fsck on the filesystems after a crash. That should only be needed if the storage is faulty, or if it is using writeback cache without mirroring and battery backup.

> If there were modifications from
> fsck, we'll generally fsck it again and verify there were no further
> modifications. If 'fsck -f -p' fails, we'll fsck interactively or just
> go whole hog and 'fsck -f -y'.

It's always a good idea to run fsck in a manner that logs the output, either under 'script' or a similar tool.

> I imagine you could achieve an "automated startup following failure" at
> least most of the time with an init script that does an 'fsck -f -p' on
> the associated OSTs/MDT if the node is coming back up from a crash or
> power outage.

Note that if you do this you should run fsck under the control of the HA manager, to avoid both nodes running fsck at the same time. The Lustre-patched e2fsck will refuse to do this if you have MMP enabled (which is done automatically if the Lustre filesystems are formatted with failover enabled, but can also be enabled manually afterward). Also note that if you are using software RAID or LVM, it should likewise only be configured under the control of the HA manager.

> We once ran a cluster with lustre
> We bought from a guy named Buster
> It ran for a year
> with nary a tear
> A complaint we could not muster

Awesome. :-)
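Two of the suggestions above as concrete commands; the device name is a placeholder, and 'tune2fs -O mmp' assumes an e2fsprogs build with MMP support (e.g. the Lustre-patched one):

    # keep a transcript of the fsck output for later review
    script -c "e2fsck -f -p /dev/mapper/ost0000" /var/log/e2fsck-ost0000.log

    # enable MMP after the fact on a target formatted without failover
    tune2fs -O mmp /dev/mapper/ost0000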
Peter Grandi
2010-Jul-03 21:02 UTC
[Lustre-discuss] best practice for lustre cluster startup
[ ... ]

>> We do the fsck from the command line and look at the output.
>> If there were no filesystem modifications (this is the usual
>> case), we then start the Lustre services interactively.

> Note that if you are not running with writeback cache enabled
> on the disks, then you shouldn't have to run an fsck on the
> filesystems after a crash.

This seems to me extremely bad advice, based on these rather extraordinarily optimistic assumptions:

> That should only be needed if the storage is faulty, or if it
> is using writeback cache without mirroring and battery backup.

This reminds me of the immortal statement "as far as we know, in our datacenter we never had an undetected error". How do you know whether "storage is faulty", or that none of the many other reasons why metadata can get corrupted ever happened?

'fsck' does metadata auditing and garbage collection, and a full scan, at least every now and then, is essential to give some confidence that no hidden problem has been eating the metadata. And if there is a way to at least sample-check data integrity (e.g. run 'gzip -t' on a subset of compressed files), I would run that periodically too.

Experience with storage systems induces distrust, never mind CERN's experiences:

http://storagemojo.com/2007/09/19/cerns-data-corruption-research/

Admittedly "happy go lucky", as the investment banks have shown in the past several years with derivatives, can be a profitable strategy (until it blows up :->).

[ ... ]
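A possible form of the spot check Peter suggests; the path and sample size are arbitrary:

    # test-decompress a random sample of 100 gzip files and report failures
    find /lustre -name '*.gz' -type f | shuf -n 100 | \
        while read f; do gzip -t "$f" || echo "CORRUPT: $f"; done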
Andreas Dilger
2010-Jul-04 05:00 UTC
[Lustre-discuss] best practice for lustre cluster startup
On 2010-07-03, at 15:02, pg_lus at lus.for.sabi.co.UK wrote:
>> Note that if you are not running with writeback cache enabled
>> on the disks, then you shouldn't have to run an fsck on the
>> filesystems after a crash.
>
> This seems to me extremely bad advice, based on these rather
> extraordinarily optimistic assumptions:
>
>> That should only be needed if the storage is faulty, or if it
>> is using writeback cache without mirroring and battery backup.
>
> This reminds me of the immortal statement "as far as we know, in
> our datacenter we never had an undetected error".

I think my record speaks for itself in terms of advocating running fsck on filesystems on a regular basis. I think you are making assumptions about what my statement does or does not say. What it says is that you shouldn't need to run fsck after a crash, if the crash didn't involve, e.g., a RAID controller failure or the loss of a writeback cache. It doesn't say that you should never run fsck, and in fact I always recommend a full fsck in case of RAID failure or if the filesystem has detected inconsistencies.

My point was that if there are uptime requirements, then running a full fsck after an unplanned outage of one node is probably a bad use of time. It would be better to run a full fsck on ALL of the filesystems during scheduled maintenance windows, since they can be run in parallel and wouldn't take longer than fsck of a single node.

I have also written the lvcheck tool to run fsck on LVM snapshots via cron on a regular basis, so that you don't need to wait for a crash before finding out whether your hardware is faulty.

> a full scan, at least every now and then, is essential to give some
> confidence that no hidden problem has been eating the metadata.

I've been a staunch advocate among the ext4 developers for keeping the periodic fsck at mount time, to catch those sites that never fsck on their own. If that bothers people because of the unexpected delay in startup, I point them at the lvcheck script so they can check a snapshot and reset the fsck counters before they expire.
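The snapshot approach is roughly the following; this is only an outline of the idea, not the lvcheck tool itself, and the volume group, snapshot size, and names are invented:

    #!/bin/sh
    # check a read-only snapshot so the OST itself stays in service
    lvcreate -s -L 10G -n ost0000_check /dev/vg_oss/ost0000
    # -n: report problems without fixing anything; a snapshot of a live target
    # may show journal-replay noise, so interpret the output with care
    script -c "e2fsck -fn /dev/vg_oss/ost0000_check" /var/log/lvcheck-ost0000.log
    lvremove -f /dev/vg_oss/ost0000_check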