Dam Thanh Tung
2009-Oct-01 02:23 UTC
[Lustre-discuss] drbd slow I/O with lustre filesystem
Problems followed by problems; sorry if I am about to be a spammer on this mailing list :)

I am currently using DRBD as a network RAID-1 solution for my Lustre storage system. It worked well for 3 months, but now I have had to reformat our devices and re-synchronize the DRBD devices (this took about 2 days for a 6TB RAID-5 partition). After formatting them with mkfs.lustre, I started to copy data to my DRBD devices, but:

- The I/O wait, as monitored with top or iostat, is too high: about 25%.

- The copy speed from my web client to our OST using DRBD devices is too low, only about 13MB/s, although the client and the OST are on the same 1Gb Ethernet LAN.

When I tried using one OST without DRBD, it worked quite well.

So, could anyone please tell me where the problem is? In our DRBD devices, or because of Lustre? Has anyone had the same problem? :(
Just a guess, but possibly the DRBD syncer speed is set too low? I forget the exact syntax, but it's in the manual; it can be set in drbd.conf or on the command line. Or maybe you're using a different sync method now than before (async -> sync)? Those are the first two things that come to my mind.

Jake

On Wed, Sep 30, 2009 at 7:23 PM, Dam Thanh Tung <tungdt at isds.vn> wrote:
> I am currently using DRBD as a network RAID-1 solution for my Lustre
> storage system. It worked well for 3 months, but now I have had to
> reformat our devices and re-synchronize the DRBD devices (this took
> about 2 days for a 6TB RAID-5 partition). After formatting them with
> mkfs.lustre, I started to copy data to my DRBD devices, but:
>
> - The I/O wait, as monitored with top or iostat, is too high: about 25%.
>
> - The copy speed from my web client to our OST using DRBD devices is
> too low, only about 13MB/s, although the client and the OST are on
> the same 1Gb Ethernet LAN.
>
> When I tried using one OST without DRBD, it worked quite well.
>
> So, could anyone please tell me where the problem is? In our DRBD
> devices, or because of Lustre? Has anyone had the same problem? :(
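For reference, a minimal sketch of where that setting lives in drbd.conf (DRBD 8.x syntax; the resource name r0 and the 33M figure are illustrative placeholders, not taken from the original post):

    # /etc/drbd.conf (excerpt), hypothetical resource name
    resource r0 {
      syncer {
        rate 33M;   # cap resync bandwidth at roughly 33 MByte/s
      }
      # ... disk, net, etc. sections as before ...
    }

After editing the file, 'drbdadm adjust r0' should pick up the change without a restart.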
Actually, now that I think about it, the sync speed is for *rebuilds*, and it actually subtracts from the bandwidth available for normal replication. So if you're saturating your disk or network with the rebuild syncer, you won't have much performance left over for normal replication. The DRBD documentation has a rule of thumb; I think it's 1/3rd of your bandwidth for the syncer.

Jake

On Thu, Oct 1, 2009 at 7:50 PM, Jake Maul <jakemaul at gmail.com> wrote:
> Just a guess, but possibly the DRBD syncer speed is set too low? I
> forget the exact syntax, but it's in the manual; it can be set in
> drbd.conf or on the command line. Or maybe you're using a different
> sync method now than before (async -> sync)? Those are the first two
> things that come to my mind.
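A rough worked example of that rule of thumb, assuming a dedicated 1Gb/s replication link (the device name and figures are illustrative):

    1Gb/s link   ~= 110-120 MB/s of usable TCP throughput
    1/3 of that  ~= 35-40 MB/s left for the syncer

    # temporary runtime change, DRBD 8.0/8.3 syntax:
    drbdsetup /dev/drbd0 syncer -r 35M

Setting the rate in drbd.conf, as in the earlier sketch, survives reboots; the drbdsetup form only lasts until the next adjust/restart.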
> I am currently using DRBD as a network RAID-1 solution for my
> Lustre storage system. It worked well for 3 months

Well chosen: that is a popular and quite interesting setup. I think it should be the best/default Lustre setup when some resilience is desired.

> but now I have had to reformat our devices and re-synchronize the
> DRBD devices (this took about 2 days for a 6TB

That's about 35MB/s, which is low by any standard, but then it is over a 1Gb/s link. Consider 10Gb/s.

> RAID-5 partition).

RAID5 under RAID1? Nahh. Consider http://WWW.BAARF.com/ and note that the storage system of a Lustre pool over DRBD is ideally suited to RAID10 (with each pair a DRBD resource). RAID5 may be contributing to your speed problem below, because of the parity-update overhead or because it is itself being rebuilt/synced.

> After formatting them with mkfs.lustre, I started to copy data to
> my DRBD devices, but:
>
> - The I/O wait, as monitored with top or iostat, is too high:
> about 25%.

That is not much related to anything... After all, you are doing a lot of I/O, and jumping around on the disk, while doing a restore.

> - The copy speed from my web client to our OST using DRBD devices
> is too low, only about 13MB/s, although the client and the OST are
> on the same 1Gb Ethernet LAN.

Too few details about this. Things to check:

* Raw network speed: I like 'nuttcp' to check it. The usual tricks
  (larger send/receive buffers, jumbo frames, ...) may help if there
  are issues. But then you were getting ~35MB/s above.
  http://lists.centos.org/pipermail/centos/2009-July/079505.html

* If you are using LVM2: bad news.
  http://archives.free.net.ph/message/20070815.091608.fff62ba9.en.html

* Using RAID5, as argued above, may be detrimental.

* DRBD must be configured to allow higher sync speeds:
  http://www.ossramblings.com/drbd_defaults_too_slow
  http://www.linux-ha.org/DRBD/FAQ#head-e09d2c15ba7ff691ecd5d5d7b848a50d25a3c3eb
  Your initial sync, however, already seemed to run at ~35MB/s, so I
  wonder. Maybe try tuning the "unplug" watermark in DRBD, or, if you
  have a battery-backed cache, enabling no-flush mode.
  http://archives.free.net.ph/message/20081219.085301.997727d2.en.html

> When I tried using one OST without DRBD, it worked quite well.

That might mean it is mainly a DRBD issue. You might want to get the latest DRBD version, as some earlier versions had known performance problems. If you are on RHEL, ELRepo has fairly recent packages.

> So, could anyone please tell me where the problem is? In our DRBD
> devices, or because of Lustre? Has anyone had the same problem? :(

All of the above, probably. Maximum performance here means ensuring that write requests are issued as fast as possible, so that back-to-back packets/blocks are possible both on the network and on the storage system:

http://www.gossamer-threads.com/lists/drbd/users/17991
http://lists.linbit.com/pipermail/drbd-user/2007-August/007256.html
http://lists.linbit.com/pipermail/drbd-user/2009-January/011165.html
http://lists.linbit.com/pipermail/drbd-user/2009-January/011198.html

It may conceivably be quicker for you to load all your data onto the primary half of each pair first, and then reactivate the secondary and let it resync.

My impression is that the problem is unlikely to originate on the Lustre side; it is more likely in the underlying layers mentioned above. There is a fair bit of material on DRBD optimization, both on its own site and, more specifically, around the MySQL community, where DRBD is very commonly used and people care a lot about performance.
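To make the checklist above concrete, here is a rough sketch of the network test and the DRBD disk tuning mentioned. The peer hostname, resource name, and values are placeholders; the no-flush options are only safe with a battery-backed write cache; option names are per DRBD 8.3:

    # 1. Raw network throughput between the DRBD peers with nuttcp:
    #    on one node:        nuttcp -S
    #    on the other node:  nuttcp peer-host

    # 2. /etc/drbd.conf (excerpt), hypothetical disk-section tuning:
    resource r0 {
      disk {
        unplug-watermark 16;   # unplug the request queue sooner
        no-disk-flushes;       # ONLY with battery-backed cache
        no-md-flushes;         # ditto, for DRBD metadata writes
      }
    }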