Has anyone done any testing of modern SSD drives as an MDT for Lustre 1.6?
Searching through the archives, it seems that most of the posts related to
SSDs are either incomplete or slightly dated.

Does anyone have any input as to how they would compare to 15k RPM drives,
and at what deployment size the metadata performance gain would become
noticeable? We are currently using Lustre as a small scratch space, and
initially deployed our MDT as a 4x7200 RPM SATA RAID10 internal to the MDS.
Metadata slowdowns have become apparent during heavy use and/or small-file
operations, so we are currently deliberating which upgrade path to take.

As of now, our deployment is pretty small:
- 4 OSSs, each with a 4x1TB RAID10 OST on disks internal to the OSS. We will
  increase the number of these as the system grows.
- ~50 clients that read/write large files striped across all OSSs. This will
  grow 2-4x in the next several months.
- We are currently on GigE, but will be switching to DDR 4x IB very soon.

Thanks,
Jordan
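For anyone wanting to quantify a metadata slowdown before and after an
upgrade, something like the following sketch is one way to get comparable
numbers. This is an assumption on my part, not a procedure from this thread;
the mount point and parameters are placeholders.

```shell
#!/bin/sh
# Sketch: measure metadata rates from a Lustre client with mdtest.
# Assumes mdtest is installed and /mnt/lustre is a client mount;
# adjust the file count and path for your system.

DIR=/mnt/lustre/mdtest.$$        # hypothetical scratch directory
mkdir -p "$DIR"

# 3 iterations, 10000 files/dirs per task; -u gives each task its
# own working directory so tasks do not contend on one parent dir.
mdtest -d "$DIR" -n 10000 -i 3 -u

# On the MDS itself, the per-operation counters (exposed under
# /proc in Lustre 1.6) show which metadata ops dominate:
# cat /proc/fs/lustre/mds/*/stats

rm -rf "$DIR"
```

Running the same invocation before and after the hardware change (and at
the same client count) makes the create/stat/unlink rates directly
comparable.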
We have played around with that idea in one way or another a few times in
the past. It didn't seem to be cost effective.

We tried a RamSan device (a 300, if I am not mistaken) as an MDT almost two
years ago. We compared its metadata rates with the ones we get from our MDT
on a DDN 9500 with write-back cache on. The DDN (simply a big cache with a
RAID 5 magnetic disk set behind it) turned out to be a more cost-effective
solution for our installation and use cases.

We haven't evaluated any SSDs as an MDT since then, as far as I can
remember.

Sarp

On 4/9/09 6:07 PM, "Jordan Mendler" <jmendler at ucla.edu> wrote:

> Has anyone done any testing of modern SSD drives as an MDT for Lustre
> 1.6? Searching through the archives it seems that most of the posts
> related to SSD are either incomplete or slightly dated. [...]
Hello,

> We compared the metadata rates of that with the ones we get from our MDT
> on a DDN 9500 with write back cache on

This is interesting, since the same topic was raised here. I'm curious,
though, about the details of your MDT on the DDN array. Did you dedicate
one full tier, which would, unfortunately, waste a lot of capacity? Or did
you allocate a relatively small capacity (100GB?) over many tiers, which
would, in turn, compete for I/Os with the OSTs? Any insight would be
appreciated; I'm looking for best practices for configuring the MDT on the
DDN.

thanks,
/pgc

Sarp Oral (oad) wrote:
> We tried a RamSan device (300, if I am not mistaken) as a MDT, almost
> two years ago. We compared the metadata rates of that with the ones we
> get from our MDT on a DDN 9500 with write back cache on. DDN (simply a
> big cache with a RAID 5 magnetic disk set behind it) turned out to be
> a more cost effective solution for our installation and use cases. [...]
We've done some experiments with software RAID10 on Intel X25-E solid state
disks, focused on getting 4K random IOPS up very high.

We were at first greatly encouraged: our current production MDSes, based on
a dedicated DDN 8500 with multiple tiers of FC disks and write cache on (an
older dual-socket Nocona node with 8G RAM), topped out around 700 IOPS read
and 150 IOPS write, while an array of 16 X25-Es (quad-socket, quad-core
Opteron with 32G RAM) hit 120K IOPS read and 64K IOPS write.

Two things curbed our enthusiasm for SSDs in the short term:

1) A 3ware + 32x 15K RPM SAS disk based MDS, which gets approximately 6K
   IOPS read and 4K IOPS write on the same quad-socket Opteron node type,
   was only "about twice as fast" as the production DDN 8500 setup above
   when measured with mdtest workloads. If the disks were still the
   bottleneck, we should have seen around 10X; and in fact the node is a
   lot faster too, so the speedup may be attributable to that rather than
   to the backend disks.

2) Some quick tests of MDS create rates (through Lustre now) on the SSD
   and DDN hardware, where we seemed to get about 2350 creates/sec no
   matter what hardware we used, plus posts from Oleg on this mailing list
   indicating that tests utilizing loopback devices were only getting
   about 5300 creates/sec:
   http://lists.lustre.org/pipermail/lustre-devel/2009-February/002940.html

So we're going with the 3ware setup for newer file systems for now, and
keeping the SSD config in our back pocket for further investigation.

Jim

On Mon, Apr 13, 2009 at 09:31:34AM -0400, Paul Cote wrote:
> This is interesting since the same topic was raised here ... I'm curious
> though about the details of your MDT on the DDN array ... did you
> dedicate one full tier which would, unfortunately, waste a lot of
> capacity? or just allocate relatively small capacity (100GB?) over many
> tiers ... which would, in turn, compete for I/Os with the OSTs. [...]
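For what it's worth, a raw 4K random-IOPS measurement along the lines Jim
describes can be sketched roughly as below. This is a guess at a comparable
setup, not the exact commands used in those experiments; device names are
placeholders and the mdadm step destroys data on the listed disks.

```shell
#!/bin/sh
# Sketch: build a software RAID10 from 16 SSDs and measure 4K
# random IOPS against the raw md device with fio.
# WARNING: destroys data on the member disks (placeholders below).

mdadm --create /dev/md0 --level=10 --raid-devices=16 /dev/sd[b-q]

# Direct I/O, 4K blocks, deep queue; read pass, then write pass.
fio --name=randread  --filename=/dev/md0 --direct=1 --ioengine=libaio \
    --rw=randread  --bs=4k --iodepth=32 --numjobs=4 --runtime=60 \
    --group_reporting
fio --name=randwrite --filename=/dev/md0 --direct=1 --ioengine=libaio \
    --rw=randwrite --bs=4k --iodepth=32 --numjobs=4 --runtime=60 \
    --group_reporting
```

Note that, as Jim's numbers suggest, a huge raw-IOPS advantage at this
layer does not necessarily translate into a proportional mdtest or Lustre
create-rate gain once the MDS CPU and software stack become the bottleneck.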
Hello!

On Apr 13, 2009, at 12:55 PM, Jim Garlick wrote:
> 2) some quick tests of MDS create rates (through lustre now) on the SSD
> and DDN hardware where we seemed to get about 2350 creates/sec no matter
> what hardware we used, and posts from Oleg on this mailing list
> indicating that tests utilizing loopback devices were only getting about
> 5300 creates/sec

Please note that the 5300 I got was for a single client! (BTW, I hope you
did not use the -y option to mdtest.)

Since that time I had a chance to perform multi-client tests at ORNL (not
SSD based, but this is unimportant, since we turned out to be CPU bound at
a certain point anyway). The result was 18k creates/sec for mkdirs (on
some sort of 16-way fast CPUs), with the patch from bug 18534 applied to
reduce unneeded RPCs during create. Actual open-creates would be slower
right now; I would estimate around 10k-12k creates/sec (assuming your OSTs
can keep up with creation; we are doing some investigation in this area
and have already found some problems in the MDS precreate-requesting code).

> So we're going with the 3ware setup for newer file systems for now and
> keeping the SSD config in our back pocket for further investigation.

When approaching 18k creates/sec total (first with 8 clients), there was a
big dive in the initial test at 16 clients that turned out to be journal
overflow, so the resulting syncing slowed everything down. This should not
be a concern for SSDs, though. Since we did not have an SSD in our back
pocket at the time, we just tried a 2G ramdisk-based journal instead, and
that allowed us to remain at the 18k creates/sec plateau while scaling
from 8 to 32 clients doing creates, at which point we seem to be
overflowing the journal again (I know this is counterintuitive, given that
the rate is the same and the journal just got bigger).

Bye,
Oleg
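Oleg's ramdisk-journal experiment can be reproduced roughly as follows.
This is a sketch under my own assumptions (device names, sizes, and the
exact mkfs.lustre invocation are placeholders), and it is for benchmarking
only: a journal on volatile RAM disappears on power loss, which can leave
the MDT unrecoverable.

```shell
#!/bin/sh
# Sketch: put the MDT's ldiskfs journal on a 2G ramdisk so journal
# flushes stop hitting the metadata disks. FOR TESTING ONLY --
# losing the external journal can corrupt the filesystem.

# Load the ramdisk driver with a 2G device (rd_size is in KB).
modprobe brd rd_size=2097152

# Format the ramdisk as a dedicated external journal device.
mke2fs -O journal_dev -b 4096 /dev/ram0

# Format the MDT, pointing its journal at the ramdisk via the
# underlying mke2fs options. /dev/sdb is a placeholder for the
# real MDT block device.
mkfs.lustre --mdt --mgs --fsname=testfs \
    --mkfsoptions="-J device=/dev/ram0" /dev/sdb
```

On real deployments the same mechanism is used with a small, fast,
persistent device (e.g. an SSD or NVRAM card) as the external journal
rather than a ramdisk.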