sss@cray.com
2007-Jan-08 09:13 UTC
[Lustre-devel] [Bug 10744] ior surveys for Catamount at ORNL
Please don't reply to lustre-devel. Instead, comment in Bugzilla by using the following link: https://bugzilla.lustre.org/show_bug.cgi?id=10744

This bug reports performance problems: poor scaling on reads, and erratic and poor performance on shared-file I/O. Alex and Peter Braam have posted some comments, but there have been no fixes. I don't think this bug should be closed until the problems have been resolved.
sss@cray.com
2007-Jan-08 18:51 UTC
[Lustre-devel] [Bug 10744] ior surveys for Catamount at ORNL
I think we need to know how the DDNs were misconfigured. It is not very helpful to say that the performance problems were caused by misconfiguration without saying how they *should* be configured. (It also seems pretty weak to point to misconfiguration six months after the problem report.) Are you saying that if we reconfigure the DDNs and re-run these tests, then the performance will all scale nicely? (I'm thinking of moving to Missouri.)
braam@clusterfs.com
2007-Jan-09 10:02 UTC
[Lustre-devel] [Bug 10744] ior surveys for Catamount at ORNL
The precise settings for DDN 8500 arrays were published recently on the lustre-devel mailing list (with graphs). Also, our wiki was updated at the same time:

https://mail.clusterfs.com/wikis/lustre/LustreDdnTuning

For the DDN 9500 we are still discovering the settings and waiting on DDN.
nic@cray.com
2007-Jan-09 10:15 UTC
[Lustre-devel] [Bug 10744] ior surveys for Catamount at ORNL
(In reply to comment #30)
> The precise settings for DDN 8500 arrays were published recently on the
> lustre-devel mailing list (with graphs). Also our wiki was updated at the
> same time:
> https://mail.clusterfs.com/wikis/lustre/LustreDdnTuning

As far as I can tell, those setting recommendations were made on the DDN 8500s used at ORNL for these surveys. I fail to see a magic bullet in that wiki page that would solve the performance issues we saw in this bug. As I recall, we were told it was the reservation-based allocator that would solve the performance problems here. We've not seen that yet, nor have we verified the claimed performance improvements, so I would want to leave this bug open as well until we can verify any "fix".
scjody@clusterfs.com
2007-Jan-17 14:56 UTC
[Lustre-devel] [Bug 10744] ior surveys for Catamount at ORNL
Created an attachment (id=9365): survey of various software RAID 0 methods
https://bugzilla.lustre.org/attachment.cgi?id=9365&action=view

Here is a survey of various software RAID 0 methods. Since current versions of Lustre typically perform 1 MB I/Os, I only plotted (and in most cases only surveyed) that size. As you can see, the results show modest improvements for reads at high region counts, but not much improvement at lower counts. There is not much performance difference between LVM (DM) and MD RAID: LVM is faster at low region counts, but not by much. Performance is also largely insensitive to chunk size.

Unfortunately, writes are a different story. Both subsystems show consistently worse write performance, even lower than with only one LUN. Again, larger chunk sizes don't improve things much. I will discuss this issue with our IO specialists and see if there is a way to improve write performance.
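For anyone wanting to run a similar comparison, the sketch below shows roughly how such a survey can be driven: several processes each issue sequential 1 MB direct-I/O writes into disjoint regions of the device, and the region count is swept upwards. This is illustrative only and is not the tool used for the attached survey; the device path, region counts, and per-region transfer count are hypothetical, and writing to a raw device is destructive.

#!/usr/bin/env python3
# Illustrative sketch (not the actual survey tool): time 1 MB sequential
# direct-I/O writes to a block device, with several processes each driving
# its own disjoint region. WARNING: this overwrites data on DEV.
import mmap
import os
import time
from multiprocessing import Process

DEV = "/dev/md0"          # hypothetical RAID 0 device under test
IO_SIZE = 1024 * 1024     # 1 MB, the transfer size Lustre typically issues
IOS_PER_REGION = 1024     # 1 GB written per region (hypothetical)

def drive_region(region):
    """Write IOS_PER_REGION sequential 1 MB transfers within one region."""
    fd = os.open(DEV, os.O_WRONLY | os.O_DIRECT)
    buf = mmap.mmap(-1, IO_SIZE)          # page-aligned buffer for O_DIRECT
    base = region * IOS_PER_REGION * IO_SIZE
    for i in range(IOS_PER_REGION):
        os.lseek(fd, base + i * IO_SIZE, os.SEEK_SET)
        os.write(fd, buf)
    os.close(fd)

def survey(regions):
    """Run one data point: 'regions' concurrent writers, report MB/s."""
    start = time.time()
    procs = [Process(target=drive_region, args=(r,)) for r in range(regions)]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
    elapsed = time.time() - start
    mb_total = regions * IOS_PER_REGION
    print(f"{regions:3d} regions: {mb_total / elapsed:8.1f} MB/s")

if __name__ == "__main__":
    for regions in (1, 2, 4, 8, 16):      # region counts to sweep
        survey(regions)

A corresponding read pass (O_RDONLY with os.read into the same buffer) would give the read side of the comparison.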