On Tue, 2 May 2006, Andreas Dilger wrote:
> On May 02, 2006 18:51 +0200, Alexander Jolk wrote:
>> I'd configure a pair of these OSSs with two RAID0 sets striped across
>> all six disks, and form two DRBD volumes to export as OST. For the DRBD
>> interconnect I was planning on using a crossover ethernet cable with
>> jumbo frames; connection to the rest of the network is over the other
>> ethernet port with standard MTU.
>
> I'd recommend against RAID0, just because disk failure is by far the
> most common failure mode. You'll have to resync the whole volume for
> each disk failure, opening up the possibility of a double failure.

What if he reversed his scenario, using RAID0 on top of drbd (6 drbd
pairs), essentially making a RAID10 setup? Similarly he could skip the
RAID0, and have each drbd pair be a Lustre OST so that Lustre handles
the striping...
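To make the second option concrete: once each drbd device is formatted as
its own OST, a client can ask Lustre to stripe new files in a directory
across all of the OSTs. A minimal sketch, assuming the positional lfs
setstripe syntax of the Lustre 1.4 era and an invented mount point; check
the lfs man page of the release actually deployed:

  # default striping for new files under /mnt/lustre/scratch:
  # 1MB stripe size, start on any OST (-1), stripe over all OSTs (-1)
  lfs setstripe /mnt/lustre/scratch 1048576 -1 -1

  # show the resulting default layout
  lfs getstripe /mnt/lustre/scratch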
On May 02, 2006 18:51 +0200, Alexander Jolk wrote:
> I'd configure a pair of these OSSs with two RAID0 sets striped across
> all six disks, and form two DRBD volumes to export as OST. For the DRBD
> interconnect I was planning on using a crossover ethernet cable with
> jumbo frames; connection to the rest of the network is over the other
> ethernet port with standard MTU.

I'd recommend against RAID0, just because disk failure is by far the
most common failure mode. You'll have to resync the whole volume for
each disk failure, opening up the possibility of a double failure.

> The MDS would be an identical pair, possibly with smaller SCSI disks
> and/or RAID5 internally.

The MDS should have RAID1, since it is doing almost exclusively small
random IO.

> An LTO-2 backup server (using amanda) would be a lustre client; a few
> OSSs would very possibly serve as additional amanda clients in order to
> speed up the nightly runs. (I'm particularly unsure about this point.)

Running read-only clients on an OSS is believed to be safe (though not
yet fully supported); the problem with client-on-OSS is the potential
for deadlock under heavy write load.

Cheers, Andreas
--
Andreas Dilger
Principal Software Engineer
Cluster File Systems, Inc.
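For the RAID1 suggestion, a minimal Linux software-RAID sketch; the device
names below are examples only, and the resulting md device would hold the
MDT:

  # mirror two internal SCSI disks for the MDS storage
  mdadm --create /dev/md0 --level=1 --raid-devices=2 /dev/sda1 /dev/sdb1

  # watch the initial resync finish before putting the MDT into service
  cat /proc/mdstat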
Brent A Nelson wrote:
> On Tue, 2 May 2006, Andreas Dilger wrote:
>
>> On May 02, 2006 18:51 +0200, Alexander Jolk wrote:
>>
>>> I'd configure a pair of these OSSs with two RAID0 sets striped across
>>> all six disks, and form two DRBD volumes to export as OST. For the DRBD
>>> interconnect I was planning on using a crossover ethernet cable with
>>> jumbo frames; connection to the rest of the network is over the other
>>> ethernet port with standard MTU.
>>
>> I'd recommend against RAID0, just because disk failure is by far the
>> most common failure mode. You'll have to resync the whole volume for
>> each disk failure, opening up the possibility of a double failure.
>
> What if he reversed his scenario, using RAID0 on top of drbd (6 drbd
> pairs), essentially making a RAID10 setup? Similarly he could skip the
> RAID0, and have each drbd pair be a Lustre OST so that Lustre handles
> the striping...

Sounds reasonable to me, thanks for the input. Just to make sure I
follow correctly, if I do 6 DRBD pairs, three of which are exported by
each of the OSSs, what happens if one disk fails? As long as the
heartbeat between the nodes works, the other node won't be tempted to
stonith the first one?

Does anybody have an idea of the I/O bandwidth that I might reasonably
hope to attain with this kind of setup?

Alex
--
Alexander Jolk * BUF Compagnie * alexj@buf.com
Tel +33-1 42 68 18 28 * Fax +33-1 42 68 18 29
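One way to picture the normal three-OSTs-per-OSS split is a heartbeat v1
haresources file using the drbddisk agent that ships with drbd; node and
resource names here are invented, and the command that actually starts each
OST (lconf in Lustre 1.4, mount -t lustre in 1.6) would be appended as a
further resource on each line:

  # /etc/ha.d/haresources -- preferred node followed by its resources
  oss1 drbddisk::ost1 drbddisk::ost2 drbddisk::ost3
  oss2 drbddisk::ost4 drbddisk::ost5 drbddisk::ost6

If oss1 dies, heartbeat makes ost1-ost3 primary on oss2, which then serves
all six OSTs until the pair is healthy again.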
On Wed, 3 May 2006, Alexander Jolk wrote:
> Brent A Nelson wrote:
>> On Tue, 2 May 2006, Andreas Dilger wrote:
>>
>>> On May 02, 2006 18:51 +0200, Alexander Jolk wrote:
>>>
>>>> I'd configure a pair of these OSSs with two RAID0 sets striped across
>>>> all six disks, and form two DRBD volumes to export as OST. For the DRBD
>>>> interconnect I was planning on using a crossover ethernet cable with
>>>> jumbo frames; connection to the rest of the network is over the other
>>>> ethernet port with standard MTU.
>>>
>>> I'd recommend against RAID0, just because disk failure is by far the
>>> most common failure mode. You'll have to resync the whole volume for
>>> each disk failure, opening up the possibility of a double failure.
>>
>> What if he reversed his scenario, using RAID0 on top of drbd (6 drbd
>> pairs), essentially making a RAID10 setup? Similarly he could skip the
>> RAID0, and have each drbd pair be a Lustre OST so that Lustre handles
>> the striping...
>
> Sounds reasonable to me, thanks for the input. Just to make sure I follow
> correctly, if I do 6 DRBD pairs, three of which are exported by each of the
> OSSs, what happens if one disk fails? As long as the heartbeat between the
> nodes works, the other node won't be tempted to stonith the first one?
>
> Does anybody have an idea of the I/O bandwidth that I might reasonably hope
> to attain with this kind of setup?

If you're letting heartbeat handle stonith (which I'm not certain is all
that necessary anymore with drbd, at least not as necessary as it used
to be), the node wouldn't be expected to miss heartbeating and wouldn't
be killed.

From a drbd point of view, I THINK it will be just like RAID1 (at least
with appropriate drbd settings; drbd can be told to panic the whole node
on disk error, which would trigger heartbeat failover): drbd would serve
out the other drive in the pair (across the network from the other
node). From a Lustre point of view, nothing would have happened.

I really, really need to get around to testing this...

Thanks,

Brent
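The disk-error behaviour mentioned above is a per-resource drbd setting. A
sketch in drbd 0.7-style configuration, with invented resource, device and
address values; check drbd.conf(5) for the version actually installed:

  resource ost1 {
    protocol C;
    disk {
      # 'detach' drops to diskless mode and serves I/O from the peer's
      # disk over the network; 'panic' instead takes the whole node down
      # so that heartbeat fails everything over to the other OSS.
      on-io-error detach;
    }
    on oss1 {
      device    /dev/drbd0;
      disk      /dev/sda3;
      address   10.0.0.1:7788;
      meta-disk internal;
    }
    on oss2 {
      device    /dev/drbd0;
      disk      /dev/sda3;
      address   10.0.0.2:7788;
      meta-disk internal;
    }
  }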
On a low-cost, unreliable gigabit switch, with 4 two-machine node pairs
running drbd 0.6 (8 x 250GB SATA disks on each machine) and Lustre 1.2,
we got around 170MB/s (the same switch was used for both drbd and lustre
traffic). Each machine has 1GB of RAM and a P4 at 2.4GHz.

We have since upgraded the switch and are deploying drbd 0.7 with Lustre
1.4. I have no performance data so far.

On Wed, 2006-05-03 at 16:35 +0200, Alexander Jolk wrote:
> Brent A Nelson wrote:
> > On Tue, 2 May 2006, Andreas Dilger wrote:
> >
> >> On May 02, 2006 18:51 +0200, Alexander Jolk wrote:
> >>
> >>> I'd configure a pair of these OSSs with two RAID0 sets striped across
> >>> all six disks, and form two DRBD volumes to export as OST. For the DRBD
> >>> interconnect I was planning on using a crossover ethernet cable with
> >>> jumbo frames; connection to the rest of the network is over the other
> >>> ethernet port with standard MTU.
> >>
> >> I'd recommend against RAID0, just because disk failure is by far the
> >> most common failure mode. You'll have to resync the whole volume for
> >> each disk failure, opening up the possibility of a double failure.
> >
> > What if he reversed his scenario, using RAID0 on top of drbd (6 drbd
> > pairs), essentially making a RAID10 setup? Similarly he could skip the
> > RAID0, and have each drbd pair be a Lustre OST so that Lustre handles
> > the striping...
>
> Sounds reasonable to me, thanks for the input. Just to make sure I
> follow correctly, if I do 6 DRBD pairs, three of which are exported by
> each of the OSSs, what happens if one disk fails? As long as the
> heartbeat between the nodes works, the other node won't be tempted to
> stonith the first one?
>
> Does anybody have an idea of the I/O bandwidth that I might reasonably
> hope to attain with this kind of setup?
>
> Alex
Hi,

we are currently thinking about a new installation of lustre using pairs
of mutually mirrored DRBDs. I found an old mail by Jan Bruvoll from March
2004 where he described more or less exactly the same setup that I'm
thinking about, but I haven't seen much else on this topic. Let me
describe quickly what we are planning to do; I'd very much like some
feedback on my plans.

We are considering lustre because we want a single exported volume with
more than 500MB/s I/O bandwidth; our current crop of NFS servers quite
often saturates on one server while the others are idling. We have about
6TB of online accessible storage, with about 6 more to come, and some
350 client machines, all under Linux (Debian sarge). We are a 24/7
operation, more or less.

Each OSS would be a Dell PowerEdge 2850 or 2650 server with dual Xeon
3GHz and 1GB of RAM, six internal 146GB SCSI disks (five for the PE
2650), and dual 1000Base-T ethernet. I'd configure a pair of these OSSs
with two RAID0 sets striped across all six disks, and form two DRBD
volumes to export as OSTs. For the DRBD interconnect I was planning on
using a crossover ethernet cable with jumbo frames; connection to the
rest of the network is over the other ethernet port with standard MTU.

(Rationale for these ideas: by striping across all disks, I get more
bandwidth for a single OST; operations on different OSTs are less
correlated than on the individual disks for one OST. Dedicating one
ethernet port to DRBD speeds up every single write operation; the
remaining ethernet port should be almost sufficient for the typical
bandwidth of a striped 6-disk RAID0.)

In normal operation, each of the two servers would export one volume and
be DRBD slave of the other; in case a server goes down, the other one
takes over. The MDS would be an identical pair, possibly with smaller
SCSI disks and/or RAID5 internally.

We currently have 15 similar server machines that are each exporting an
individual NFS volume. We would be planning on integrating them
progressively into the lustre volume, using Lustre 1.6.x. The starting
config would consist of 4 OST pairs.

The 350 client machines would all access the same Lustre volume.
Installing a new kernel on all of them is not a big problem; just
installing a kernel module is even easier. (We are using cfengine for
the whole net.)

An LTO-2 backup server (using amanda) would be a lustre client; a few
OSSs would very possibly serve as additional amanda clients in order to
speed up the nightly runs. (I'm particularly unsure about this point.)

Is anybody using this kind of setup, and do you think we are on the
right track? I have run a few tests with lustre, but I haven't done
failover yet.

Sorry for being so long-winded...

Alex
--
Alexander Jolk * BUF Compagnie * alexj@buf.com
Tel +33-1 42 68 18 28 * Fax +33-1 42 68 18 29
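As a rough sketch of the two-RAID0-sets layout and the dedicated
jumbo-frame interconnect described above, with example device and
interface names (assuming two partitions per disk, one for each set, and
a NIC driver that supports a 9000-byte MTU):

  # two RAID0 sets, each striped across all six internal disks
  mdadm --create /dev/md0 --level=0 --raid-devices=6 /dev/sd[a-f]1
  mdadm --create /dev/md1 --level=0 --raid-devices=6 /dev/sd[a-f]2

  # jumbo frames only on the crossover link carrying DRBD traffic;
  # the client-facing port keeps the standard 1500-byte MTU
  ifconfig eth1 mtu 9000

Each md device would then be the backing store for one DRBD resource,
which in turn is exported as one OST.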