I have 60 nodes to use as OSSes, and I have done an experiment: I used a disk (iSCSI) as an OST. If I do not define the failnodes in the mkfs.lustre command, I can mount this OST, and on the client node lfs df -h can see this OST; but when I umount it and mount the OST on another OSS, lfs df -h cannot see it any more. If I do define the failnodes in the mkfs.lustre command and repeat the same steps, the client can still see the OST with lfs df -h.

So my question is: if I want an OST to be able to fail over to any OSS (any one of the sixty nodes), do I need to define 60 failnodes when I format the disk? Or can I use Pacemaker to select an OSS and modify something to notify the clients that the disk is now on that OSS?
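[For reference, the second (working) experiment described above might look roughly like the sketch below; the NIDs, fsname, device and mount point are made up for illustration.]

  # format the iSCSI disk as an OST, declaring the second OSS as a failnode
  mkfs.lustre --ost --fsname=testfs --mgsnode=10.0.0.1@tcp \
      --failnode=10.0.0.12@tcp /dev/sdb

  # on OSS1 (10.0.0.11): mount the OST
  mount -t lustre /dev/sdb /mnt/ost0

  # on a client: the OST is listed
  lfs df -h

  # move the OST: umount on OSS1, then mount the same device on OSS2 (10.0.0.12)
  umount /mnt/ost0                       # on OSS1
  mount -t lustre /dev/sdb /mnt/ost0     # on OSS2
  lfs df -h                              # on the client: the OST is still visible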
On Mon, 2009-11-09 at 16:25 +0800, lelustre wrote:
> I have 60 nodes to use as OSSes, and I have done an experiment: I used
> a disk (iSCSI) as an OST. If I do not define the failnodes in the
> mkfs.lustre command, I can mount this OST, and on the client node
> lfs df -h can see this OST; but when I umount it and mount the OST on
> another OSS, lfs df -h cannot see it any more.

Right. That is because you have to tell the client which other nodes might make that OST available so that it can find the one actually serving it. If you don't give the client any alternate nodes, it doesn't know about any other nodes and won't try any but the one node the OST was configured on.

> If I do define the failnodes in the mkfs.lustre command and repeat the
> same steps, the client can still see the OST with lfs df -h.

Right.

> So my question is: if I want an OST to be able to fail over to any OSS
> (any one of the sixty nodes), do I need to define 60 failnodes when I
> format the disk?

Theoretically, yes. I discussed this briefly with another engineer a while ago and, IIRC, the conclusion was that there is nothing inherent in the configuration logic that prevents more than two ("primary" and "failover") OSSes from providing service to an OST. Two nodes per OST is how just about everyone who wants failover configures Lustre, though.

I'm not really sure that 60 failover nodes for every OST is practical. When an OSS does fail, the process of finding the OST on a failover node is serial and linear: the client cycles through the OST's failover list, trying each OSS in turn, until it finds the OST. The time given to each discovery attempt is not trivial (i.e. it is not just a few seconds), so hunting through 60 of them will take considerable time.

> Or can I use Pacemaker to select an OSS and modify something to notify
> the clients that the disk is now on that OSS?

No. There is currently no way to push a client towards a particular OSS for a given OST.

b.
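[To make the "theoretically" concrete: the 60-failnode idea would amount to something like the sketch below, assuming mkfs.lustre accepts a repeated --failnode option; the NIDs, fsname and device are made up. The caveat above about the serial search through the failover list still applies.]

  # build one --failnode option per candidate OSS (purely illustrative)
  FAILNODES=""
  for i in $(seq 11 70); do
      FAILNODES="$FAILNODES --failnode=10.0.0.$i@tcp"
  done
  mkfs.lustre --ost --fsname=testfs --mgsnode=10.0.0.1@tcp $FAILNODES /dev/sdb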
On Monday 09 November 2009, Brian J. Murrell wrote:
> Theoretically, yes. I discussed this briefly with another engineer a
> while ago and, IIRC, the conclusion was that there is nothing inherent
> in the configuration logic that prevents more than two ("primary" and
> "failover") OSSes from providing service to an OST. Two nodes per OST
> is how just about everyone who wants failover configures Lustre,
> though.

Not everyone ;) And in particular, it doesn't make sense to stick to a two-node failover scheme with Pacemaker:

https://bugzilla.lustre.org/show_bug.cgi?id=20964

--
Bernd Schubert
DataDirect Networks
On Monday, 9 November 2009 16:36:15, Bernd Schubert wrote:
> On Monday 09 November 2009, Brian J. Murrell wrote:
> > Theoretically, yes. [...] Two nodes per OST is how just about
> > everyone who wants failover configures Lustre, though.
>
> Not everyone ;) And in particular, it doesn't make sense to stick to a
> two-node failover scheme with Pacemaker:
>
> https://bugzilla.lustre.org/show_bug.cgi?id=20964

The problem is that Pacemaker does not understand the applications it clusters. Pacemaker is made to provide high availability for ANY service, not only for a cluster FS. So if you want to pin a resource (i.e. FS1) to a particular node, you have to add a location constraint. But this contradicts the logic of Pacemaker a little bit: why should a resource run on this node if all nodes are equal?

Basically I had the same problem with my Lustre cluster, and I used the following solution:

- add colocation constraints so that the filesystems prefer not to run on the same node (see the sketch after this message).

And theoretically, with OpenAIS as the cluster stack, the number of nodes is no longer limited to 16 as it was with Heartbeat. You can build larger clusters.

Greetings,

--
Dr. Michael Schwartzkopff
MultiNET Services GmbH
mail: misch at multinet.de
web: www.multinet.de
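[A minimal sketch of such a colocation constraint in the Pacemaker crm shell, assuming two OST mounts managed with the ocf:heartbeat:Filesystem agent; the resource names, device labels, mount points and the -100 score are all made up for illustration. These lines would be entered via "crm configure".]

  # two OST mounts managed as Filesystem resources
  primitive ost0 ocf:heartbeat:Filesystem \
      params device="/dev/disk/by-label/testfs-OST0000" directory="/mnt/ost0" fstype="lustre" \
      op monitor interval="120s" timeout="60s"
  primitive ost1 ocf:heartbeat:Filesystem \
      params device="/dev/disk/by-label/testfs-OST0001" directory="/mnt/ost1" fstype="lustre" \
      op monitor interval="120s" timeout="60s"

  # negative colocation: the two OSTs prefer not to run on the same OSS
  colocation ost0-apart-from-ost1 -100: ost0 ost1

[A negative score tells Pacemaker to keep the two resources apart when possible, without hard-pinning either one to a specific node.]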
On 2009-11-09, at 08:31, Brian J. Murrell wrote:
> On Mon, 2009-11-09 at 16:25 +0800, lelustre wrote:
>> So my question is: if I want an OST to be able to fail over to any
>> OSS (any one of the sixty nodes), do I need to define 60 failnodes
>> when I format the disk?
>
> I'm not really sure that 60 failover nodes for every OST is practical.
> When an OSS does fail, the process of finding the OST on a failover
> node is serial and linear: the client cycles through the OST's
> failover list, trying each OSS in turn, until it finds the OST. The
> time given to each discovery attempt is not trivial, so hunting
> through 60 of them will take considerable time.
>
>> Or can I use Pacemaker to select an OSS and modify something to
>> notify the clients that the disk is now on that OSS?
>
> No. There is currently no way to push a client towards a particular
> OSS for a given OST.

That is what the "Imperative Recovery" feature is for: having the failover server notify the client that it has taken over an OST/MDT filesystem, rather than waiting for the client to time out its RPC and poke around trying to find which of the failover servers is controlling the OST/MDT.

Cheers, Andreas
--
Andreas Dilger
Sr. Staff Engineer, Lustre Group
Sun Microsystems of Canada, Inc.
Hi,

First, thanks everyone. I have thought of an idea: I use Pacemaker for HA, and the iSCSI method to find the SAN disk. The HA service works like this: when an OSS fails, Pacemaker selects another OSS, and the resource agent script on the selected OSS discovers the OST disk and mounts it on a directory. Then, from the script, I run "tunefs.lustre --writeconf <mount point>" on the MDT via pdsh (Lustre manual: changing a server NID), so the client can know where the OST is.

But I really do not know whether writeconf damages the data or the filesystem? I have not tested the idea yet; it is only an idea.
On Tue, 2009-11-10 at 14:13 +0800, lelustre wrote:
> Hi,

Hi,

> The HA service works like this: when an OSS fails, Pacemaker selects
> another OSS, and the resource agent script on the selected OSS
> discovers the OST disk and mounts it on a directory.

Only one of the OSSes that has been configured as a failover server for the OST should mount the OST, or the clients won't be able to find it.

> Then, from the script, I run "tunefs.lustre --writeconf <mount point>"
> on the MDT via pdsh (Lustre manual: changing a server NID), so the
> client can know where the OST is.

No. DO NOT do this. Please don't try to re-invent how Lustre failover works.

> But I really do not know whether writeconf damages the data or the
> filesystem?

You should not use writeconf in this manner. I believe the instructions you are referring to (changing a server NID) explicitly say that you must shut down the entire filesystem before you do any writeconfs, and then you must bring all of the servers back up before you bring any clients up. That is a lot more traumatic for the users than simply configuring failover the way it's supposed to work.

b.
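[For contrast, a minimal sketch of failover "the way it's supposed to work", assuming the OST was formatted with the partner OSS's NID given as a failnode; the device label and mount point are hypothetical.]

  # on the surviving OSS (a NID given via --failnode when the OST was formatted):
  mount -t lustre /dev/disk/by-label/testfs-OST0000 /mnt/ost0

  # on a client, once recovery completes, the OST simply reappears:
  lfs df -h

[No writeconf is needed: the clients already know the failover NID from the target's configuration and reconnect on their own.]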