On Mar 2, 2004, at 06:05, Jan Bruvoll wrote:

> Specifically, I am wondering about the following:
>
> - in the case of a master node disappearing, what is the correct
> procedure for starting the OST on the slave? Anything in particular I
> should think of?

The specific command depends on whether you are storing the config in
LDAP. If you are, you change the currently active server for the OST with
the lactive command, and then start lustre on the active server with a
normal lconf invocation for that node, i.e.:

  lconf --node <node name> --config <config> --ldapurl ldap://url

If you are not using LDAP, you need to specify the active node on the
lconf command line:

  lconf --node <node name> --select <ost-service>=<node name> config.xml

You need to be careful here that lustre has completely stopped on the
failed node before starting the second node. It's a good idea to power
off the failed OST just to be sure it's down.

> - is the case of an OST briefly disappearing from the network and then
> reappearing, with all connections reset, a problem for the cluster, or
> is this scenario already covered?

The clients will reconnect to the current active OST, so this should not
be a problem.

robert
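To make that concrete with purely hypothetical names (oss-a, oss-b, ost1
and fs-config are not from this thread): if the OST service ost1 normally
runs on oss-a and has to be brought up on its failover partner oss-b, the
two variants above become:

  # without LDAP: point the ost1 service at oss-b and start that node
  lconf --node oss-b --select ost1=oss-b config.xml

  # with LDAP: after switching the active server with lactive,
  # start lustre on oss-b against the stored config
  lconf --node oss-b --config fs-config --ldapurl ldap://ldap-host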
Hi Robert,

Robert Read wrote:

> The specific command depends on whether you are storing the config in
> LDAP. If you are, you change the currently active server for the OST
> with the lactive command, and then start lustre on the active server
> with a normal lconf invocation for that node, i.e.:
>
>   lconf --node <node name> --config <config> --ldapurl ldap://url
>
> If you are not using LDAP, you need to specify the active node on the
> lconf command line:
>
>   lconf --node <node name> --select <ost-service>=<node name> config.xml
>
> You need to be careful here that lustre has completely stopped on the
> failed node before starting the second node. It's a good idea to power
> off the failed OST just to be sure it's down.

Does this mean that I will have to alert all clients of the failed node,
i.e. this would not be handled by the cluster itself?

Hmm - one thing I definitely should mention: my set-up is such that I
have, for each "unit" of storage, two machines mirrored using drbd. These
machines also use heartbeat to decide among themselves which one of them
is to serve on the IP address of the "storage unit"; in my case I have
servers 172.16.3.1 and 172.16.3.2, but the lustre service (OST) is to be
found on IP 172.16.3.51. Will this confuse and/or simplify things?

With regards to power-down, the good thing about drbd is that read-only
access is handled transparently to lustre - so I don't -really- have to
know exactly which node is up or not. From the cluster's (lustre's)
perspective, on the other hand, I should keep track of things, I guess.

> The clients will reconnect to the current active OST, so this should
> not be a problem.

Again, I guess somebody has to tell them?

Thanks for your help!
Jan
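A minimal sketch of how such a pair might hand the service address around
with heartbeat v1 (the netmask and the preferred-owner choice are
assumptions, not taken from Jan's actual config):

  # /etc/ha.d/haresources - identical file on node-1a and node-1b
  # node-1a is the preferred owner; heartbeat moves 172.16.3.51 to
  # node-1b if node-1a stops answering, and drbd's own scripts decide
  # which side is primary for the mirrored device.
  node-1a IPaddr::172.16.3.51/24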
On Tue, Mar 02, 2004 at 02:05:38PM +0000, Jan Bruvoll wrote:

> Dear all,
>
> I got in at the deep end and I am now trying to set up a cluster of
> "non-stop" storage nodes using pairs of drbd-ed servers, each providing
> an OST to the cluster, all this without knowing too much about lustre.
> I'm having fun, though!

Sounds very much like what I'm doing. I'm at about the same level as you
at the moment.

> Ah, yes - my goal is to set up a storage cluster for a web farm, where
> the storage requirement is quite high, probably reaching into 50Tb
> quite soon. All storage needs to be accessible at all times, hence the
> use of drbd (awaiting lustre-internal mirroring). The cluster will
> further down the line be completed with an LVS setup to have all
> storage nodes, hot or warm, also contribute with CPU cycles for
> Apaches, application servers, etc.
>
> If you would like me to contribute back in the form of a HOWTO when
> everything's running, please let me know.

This is similar to what I'm doing. Maybe we can help each other out and
work on a HOWTO together? Anyone else interested in this?

I think the current HOWTO, while it gets you going, is lacking in detail.
I'm about half-way through the lustre manual now and only just starting
to get an idea of what's going on.

Cheers,
Paul.
Dear all,

I got in at the deep end and I am now trying to set up a cluster of
"non-stop" storage nodes using pairs of drbd-ed servers, each providing
an OST to the cluster, all this without knowing too much about lustre.
I'm having fun, though!

However, before I start paddling, I have a couple of questions I am
wondering if anybody has answers to. Maybe somebody already tried this -
I am sorry if this has been covered here already, but I couldn't find it
in the archives.

Specifically, I am wondering about the following:

- in the case of a master node disappearing, what is the correct
procedure for starting the OST on the slave? Anything in particular I
should think of?

- is the case of an OST briefly disappearing from the network and then
reappearing, with all connections reset, a problem for the cluster, or
is this scenario already covered?

If you have any other pointers, those would be most appreciated.

Ah, yes - my goal is to set up a storage cluster for a web farm, where
the storage requirement is quite high, probably reaching into 50Tb quite
soon. All storage needs to be accessible at all times, hence the use of
drbd (awaiting lustre-internal mirroring). The cluster will further down
the line be completed with an LVS setup to have all storage nodes, hot
or warm, also contribute with CPU cycles for Apaches, application
servers, etc.

If you would like me to contribute back in the form of a HOWTO when
everything's running, please let me know.

Best regards
Jan

--
Mr Jan Bruvoll        BRVL technology Ltd       Office: +44 7005 94 3430
Managing Director     Unit 303                  Fax:    +44 7005 93 8363
jan@brvl.com          5 King Edward's Road      Mobile: +44 7740 29 1600
www.brvl.com          London E9 7SG, UK
cc'ed the list for anyone else who's interested.

On Tue, Mar 23, 2004 at 12:47:45PM +0100, Sture Lygren wrote:

> Hello to both of you,
>
> I saw your posts on the lustre-discuss list, and it seems like what you
> are doing is exactly what I try to implement for a disk-failover
> solution here.
>
> Now to the question - have any of you had any luck so far? Could you
> let me know how you plan on implementing it? Maybe we could help each
> other out?

I've been thinking about it in a couple of ways. Ideally what I'd like is
some sort of network RAID-5. The problem there, I think, is that
everything has to go through the RAID controller systems, so you lose the
performance benefit of parallel I/O to the OSTs. I'm not sure how you'd
implement RAID-5 and keep that performance benefit, which is essential in
some applications. So, lustre is the way I've picked to go instead.

The main problem I have with lustre, though, is the shared back-end
storage. You have redundancy on the OSTs but not on the storage, so you
use fibre-channel or some sort of RAID box or whatever. I don't have
enough money for those types of solutions, and for a few reasons I'd like
to do it using commodity hardware. The rationale here is that you're not
tied to any particular vendor and off-the-shelf hardware is everywhere.
It's also cheaper in many cases! So, enter DRBD.

My thoughts are to possibly do it the way Jan is planning, though that
seems to have limitations. Namely, you have OST pairs sharing their
drives via DRBD. One is a failover for the other, but lustre also
supports OST failover. In this case, however, lustre's own OST failover
is not used at all: the two boxes appear to lustre as a single system,
with the failover controlled by some sort of heartbeat.

I'd rather have any OST available to fail over for any OST, i.e. I don't
want a specific OST designated as the failover for a particular OST. So,
we come back to the concept of shared storage. Using DRBD I can have
redundant back-end storage on separate boxes from the OSTs. Then, I put
several OSTs in front of each DRBD pair. Okay, it's not quite what I was
aiming for, but it might work until something better comes along (e.g.
RAID-1 in lustre). So, essentially I'm using lustre with the shared
storage model, but cutting costs by using DRBD instead of, say, a RAID
box. I'm also using commodity hardware to do it.

I don't think this is very scalable, though. The problem is that to
increase capacity, you need to set up a new DRBD system with more OSTs in
front of it. It would be better if you could add more DRBD storage
dynamically to an existing pair. I'm not sure what the best way round
that is. Maybe lustre will help there, maybe DRBD or some derivative will
come along with a way to do it. Maybe it can already be done with PVFS or
LVM or something similar.

Sorry for the long response, but I was trying to get my thoughts arranged
as well as replying to your question.

> Appreciate your responses.

Waiting on yours too!

> Best regards,
> Sture
>
> --
> Sture Lygren
> Computer Systems Administrator
> Andoya Rocket Range
> Work: +4776144451 / Fax: +4776144401

Cheers,
Paul.
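Whichever way the boxes are arranged, the DRBD piece itself is just a
two-node mirrored resource. A sketch only, written in the later 0.7-style
drbd.conf syntax rather than whatever release was current at the time,
and with invented hostnames, devices and addresses:

  resource ost-disk {
    protocol C;              # synchronous: a write is acknowledged only
                             # once both sides have it on disk
    on storage-a {
      device    /dev/drbd0;  # the device the OST actually sits on
      disk      /dev/sda3;   # local partition being mirrored
      address   172.16.3.1:7788;
      meta-disk internal;
    }
    on storage-b {
      device    /dev/drbd0;
      disk      /dev/sda3;
      address   172.16.3.2:7788;
      meta-disk internal;
    }
  }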
Sture Lygren wrote:

> Hello to both of you,
>
> I saw your posts on the lustre-discuss list, and it seems like what you
> are doing is exactly what I try to implement for a disk-failover
> solution here.
>
> Now to the question - have any of you had any luck so far? Could you
> let me know how you plan on implementing it? Maybe we could help each
> other out?
>
> Appreciate your responses.
>
> Best regards,
> Sture

Hello Sture,

first of all - hello to my old town - I grew up in Andenes - left about
20 years ago and sadly haven't been back (yet)!

So, back to lustre + DRBD: my setup now consists of 4 test nodes (all
running within VMware), where two and two are fail-over OSTs. The setup
is actually quite simple - the only thing I need to keep track of is the
DRBD failover, and all scripts are ready out of the box for that. On the
active and passive OST I just run the same config - the fact that no
requests come in on the passive node (because the clients go for the
service IP, as handled by heartbeat), as well as DRBD making sure that no
writes could ever take place on the inactive node, takes care of the
fail-over and STONITH bit.

Setup detail:

node-1a: 172.16.3.1, owns a storage device of 500Mb
node-1b: 172.16.3.2, also owns a storage device of 500Mb

These two use heartbeat to agree on who provides the storage device
(DRBD) service on IP address 172.16.3.51.

node-2a: 172.16.254.1, storage device of 500Mb
node-2b: 172.16.254.2, storage device of 500Mb

Again, these two agree on the service on 172.16.254.51.

The two OSTs on 172.16.3.51 and 172.16.254.51 jointly provide a LOV of
1Gb, and as far as I can see, this actually works quite well. Reads and
writes are only slightly delayed when I pull the plug on the master node,
and recoveries are quite smooth.

Next step is to do the same thing with the MDS - so far I've connected
node-1a and node-2a and used the same heartbeat "trick", but for obvious
reasons this needs looking at (mainly because the nodes are not in the
same network). The cleanest solution would most probably be to just use
node pair 1 for this, but that would again create a SPOF on the switch
connecting node pair 1 to the rest of the network. For the final setup
that's not so much of a problem, since all nodes will be hanging off the
same switch anyway.

Unfortunately, I still have a lot to learn about lustre itself, but it
seems I've so far been able to avoid and/or conceal this problem by using
my knowledge about supporting applications, technologies & approaches.
Hopefully this thing will work as well when scaled - this is going to
support a web-based application with large-scale (at least in our
understanding of "large") storage, i.e. in the range of hopefully up to
50-60Tb.

Hope this helps - I'd be happy to help out if there's anything I can do.

Best regards
Jan
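For anyone trying to reproduce this, the lustre side of the layout Jan
describes boils down to a 1.x-era lmc configuration script along these
lines. Everything in it - node names, device paths, sizes, mount point,
stripe settings - is invented for illustration and is not Jan's actual
config; the key point is that lustre only ever sees the heartbeat service
addresses, so the drbd pairs behind them are invisible to it:

  #!/bin/sh
  # build config.xml for one MDS plus a two-OST LOV
  rm -f config.xml

  # MDS (here placed on node-1a; where the MDS should live is still open)
  lmc -m config.xml --add node --node mds-node
  lmc -m config.xml --add net --node mds-node --nid 172.16.3.1 --nettype tcp
  lmc -m config.xml --add mds --node mds-node --mds mds1 --fstype ext3 \
      --dev /dev/drbd1 --size 500000

  # the LOV that stripes across both OSTs
  lmc -m config.xml --add lov --lov lov1 --mds mds1 \
      --stripe_sz 1048576 --stripe_cnt 2 --stripe_pattern 0

  # one "node" per heartbeat service address; whichever physical box
  # currently holds the address answers for that OST
  lmc -m config.xml --add node --node ost-svc-1
  lmc -m config.xml --add net --node ost-svc-1 --nid 172.16.3.51 --nettype tcp
  lmc -m config.xml --add ost --node ost-svc-1 --lov lov1 --ost ost1 \
      --fstype ext3 --dev /dev/drbd0 --size 500000

  lmc -m config.xml --add node --node ost-svc-2
  lmc -m config.xml --add net --node ost-svc-2 --nid 172.16.254.51 --nettype tcp
  lmc -m config.xml --add ost --node ost-svc-2 --lov lov1 --ost ost2 \
      --fstype ext3 --dev /dev/drbd0 --size 500000

  # generic client profile
  lmc -m config.xml --add node --node client
  lmc -m config.xml --add net --node client --nid '*' --nettype tcp
  lmc -m config.xml --add mtpt --node client --path /mnt/lustre \
      --mds mds1 --lov lov1

Because both boxes in a drbd pair run the same config against the same
service node name, a failover only has to move the service IP and the
drbd primary role; nothing in config.xml changes.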