Dam Thanh Tung
2009-Oct-07 03:24 UTC
[Lustre-discuss] Lustre-discuss Digest, Vol 45, Issue 6
> Date: Mon, 5 Oct 2009 15:42:25 +0100
> From: pg_lus at lus.for.sabi.co.UK (Peter Grandi)
> Subject: Re: [Lustre-discuss] drbd slow I/O with lustre filesystem
> To: Lustre discussion <lustre-discuss at lists.Lustre.org>
> Message-ID: <19146.1489.532204.6707 at tree.ty.sabi.co.uk>
> Content-Type: text/plain; charset=us-ascii
>
> RAID5 over RAID1? Nahh. Consider http://WWW.BAARF.com/ and that
> the storage system of a Lustre pool over DRBD is ideally suited to
> RAID10 (with each pair a DRBD resource). RAID5 may be contributing
> to your speed problem below, either by itself or because it is being
> rebuilt/syncing.

Poor me, I didn't know that before, so now we can't change anything on
my RAID partition :(

> > After formatting them with lustre format (using mkfs.lustre),
> > I start to copy data to my drbd devices, but:
> >
> > - Its I/O wait, when I monitor it with top or iostat, is too high,
> >   about 25%
>
> This is not much related to anything... After all you are doing a
> lot of IO, and jumping around on the disk, doing a restore.

Could you please tell me in more detail what you mean? I don't really
understand it.

> > - The copy speed from my web client to our OST using drbd
> >   devices is too low, only about 13MB/s, although client and OST
> >   are on the same 1Gb Ethernet LAN.
>
> Too few details about this. Things to check:
>
> * Raw network speed: I like 'nuttcp' to check it. Using the
>   usual tricks (larger send/receive buffers, jumbo frames, ...) may
>   help if there are issues. But then you were getting 70MB/s above.
>   http://lists.centos.org/pipermail/centos/2009-July/079505.html
>
> * If you are using LVM2, bad news.
>   http://archives.free.net.ph/message/20070815.091608.fff62ba9.en.html
>
> * Using RAID5, as argued above, may be detrimental.
>
> * The DRBD must be configured to allow higher sync speeds:
>   http://www.ossramblings.com/drbd_defaults_too_slow
>   http://www.linux-ha.org/DRBD/FAQ#head-e09d2c15ba7ff691ecd5d5d7b848a50d25a3c3eb
>   Your initial sync however seemed to run at 70MB/s, so I wonder.
>   Maybe tune the "unplug" watermark in DRBD or, if you have battery
>   backup, enable no-flush mode.
>   http://archives.free.net.ph/message/20081219.085301.997727d2.en.html
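For the raw network check, a minimal nuttcp run between the two nodes
might look like the sketch below (the host name oss1, the interface
eth0 and the buffer sizes are only placeholders for illustration):

    # on one node, start nuttcp in server mode
    nuttcp -S

    # on the other node, measure throughput in both directions for 10s
    nuttcp -T 10 oss1          # transmit towards oss1
    nuttcp -r -T 10 oss1       # receive from oss1

    # the "usual tricks": larger socket buffers and jumbo frames
    sysctl -w net.core.rmem_max=16777216
    sysctl -w net.core.wmem_max=16777216
    ifconfig eth0 mtu 9000     # only if every switch port on the path supports it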
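And for the sync-speed and flush settings mentioned above, a rough
drbd.conf fragment for DRBD 8.x might look like this (the resource
name r0 and all the values are placeholders, not recommendations;
no-disk-flushes/no-md-flushes should only be used with a
battery-backed write cache):

    resource r0 {
      syncer {
        rate 100M;               # allow a faster resync than the low default
      }
      net {
        unplug-watermark 16;     # push queued requests to the backing device sooner
        max-buffers     8000;    # more in-flight buffers for streaming writes
        max-epoch-size  8000;
      }
      disk {
        no-disk-flushes;         # only safe with battery-backed write cache
        no-md-flushes;
      }
    }

After editing, 'drbdadm adjust r0' applies the new settings to the
running resource.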
> > When I tried using one OST without drbd, it worked quite well.
>
> It might mean that it is mainly a DRBD issue. You might want to
> get the latest DRBD versions, as some earlier versions had
> performance problems. If you have RHEL, ELRepo has got fairly
> recent ones.

> > So, could anyone please tell me where the problem is? In our
> > drbd devices, or because of lustre? Is there anyone who has the
> > same problem as me? :(
>
> All of the above probably -- max performance here means ensuring
> that write requests are issued as fast as possible and back-to-back
> packets/blocks are then possible both on the network and on the
> storage system...
>
> http://www.gossamer-threads.com/lists/drbd/users/17991
> http://lists.linbit.com/pipermail/drbd-user/2007-August/007256.html
> http://lists.linbit.com/pipermail/drbd-user/2009-January/011165.html
> http://lists.linbit.com/pipermail/drbd-user/2009-January/011198.html

That is really great information. I have checked it and will consider
using some of it (e.g. DRBD options like no-disk-flushes and
no-md-flushes; they may be useful for speed tuning, but I am not sure
they won't affect my system's stability). Anyway, many thanks for all
of it :)

> It may conceivably be quicker for you to load all your data first
> on the primary storage half of the pair, and then reactivate the
> secondary and let it resync.

I tried that approach, but the speed increase is not remarkable, about
5-7MB/s.

> My impression is that a problem is unlikely to originate in the
> Lustre side, but more on the underlying layers mentioned above.
> There is a fair bit of material on DRBD optimization, both on its
> site and, more specifically, around the MySQL community, where it
> is very commonly used and they care a lot about performance.

That is also what I guessed, so I posted my questions to both the
lustre and drbd mailing lists and, luckily, I received some useful
information and tips. Again, many thanks for your detailed answer. I
really appreciate it :)
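P.S. For completeness, the "load on the primary first, then let the
secondary resync" approach mentioned above would, with DRBD 8.x, look
roughly like this (the resource name r0 is a placeholder):

    # on the secondary node: drop the replication link while loading data
    drbdadm disconnect r0

    # ... copy/restore all the data on the primary node ...

    # reconnect; DRBD resynchronises the secondary in the background
    drbdadm connect r0

    # watch the resync progress
    cat /proc/drbd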
Andreas Dilger
2009-Oct-07 18:19 UTC
[Lustre-discuss] Lustre-discuss Digest, Vol 45, Issue 6
On Oct 06, 2009  20:24 -0700, Dam Thanh Tung wrote:
> > RAID5 over RAID1? Nahh. Consider http://WWW.BAARF.com/ and that
> > the storage system of a Lustre pool over DRBD is ideally suited to
> > RAID10 (with each pair a DRBD resource). RAID5 may be contributing
> > to your speed problem below, either by itself or because it is
> > being rebuilt/syncing.
>
> Poor me, I didn't know that before, so now we can't change anything
> on my RAID partition :(

It is documented in the Lustre manual that the MDS should be running
on RAID-1 or RAID-1+0. I would suggest shutting down your MDS, making
sure your remote DRBD copy is up-to-date, then reformatting the local
storage as RAID-1+0, copying the remote DRBD mirror back to the local
system, and then reformatting the remote DRBD storage as RAID-1+0 as
well and copying the data back to it.

Cheers, Andreas
--
Andreas Dilger
Sr. Staff Engineer, Lustre Group
Sun Microsystems of Canada, Inc.
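A very rough shell outline of that reformat-and-resync procedure,
assuming Linux md RAID and a hypothetical DRBD resource named "mdt"
(device names, partition layout and RAID geometry are placeholders and
must be adapted), might look like:

    umount /mnt/mdt                  # stop the MDS on this node
    cat /proc/drbd                   # confirm the remote copy is UpToDate first
    drbdadm down mdt                 # take the local DRBD resource down
    mdadm --stop /dev/md0            # stop the old RAID-5 array
    mdadm --create /dev/md0 --level=10 --raid-devices=4 /dev/sd[abcd]1
    drbdadm create-md mdt            # recreate DRBD metadata on the new array
    drbdadm up mdt
    drbdadm invalidate mdt           # full resync from the intact remote copy
    cat /proc/drbd                   # wait until the resync finishes
    # then repeat the same steps on the remote node, which will in turn
    # resync from this (now RAID-1+0) copy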