Has anyone done any testing of modern SSD drives as an MDT for Lustre 1.6?
Searching through the archives, it seems that most of the posts related to
SSDs are either incomplete or slightly dated.

Does anyone have any input as to how they would compare to 15k RPM drives,
and at what deployment size the metadata performance gain would become
noticeable? We are currently using Lustre as a small scratch space, and
initially deployed our MDT as a 4x7200 RPM SATA RAID10 internal to the MDS.
Metadata slowdowns have become apparent during heavy use and/or small-file
operations, so we are currently deliberating which upgrade path to take.

As of now, our deployment is pretty small:
- 4 OSSs, each with a 4x1TB RAID10 OST on disks internal to the OSS. We will
  increase the number of these as the system grows.
- ~50 clients that read/write large files striped across all OSSs. This will
  grow 2-4x in the next several months.
- We are currently on GigE, but will be switching to DDR 4x IB very soon.

Thanks,
Jordan
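For anyone wanting to quantify a metadata slowdown before and after an
upgrade, something like the following sketch is one way to get comparable
numbers. This is an assumption on my part, not a procedure from this thread;
the mount point and parameters are placeholders.

```shell
#!/bin/sh
# Sketch: measure metadata rates from a Lustre client with mdtest.
# Assumes mdtest is installed and /mnt/lustre is a client mount;
# adjust the file count and path for your system.

DIR=/mnt/lustre/mdtest.$$        # hypothetical scratch directory
mkdir -p "$DIR"

# 3 iterations, 10000 files/dirs per task; -u gives each task its
# own working directory so tasks do not contend on one parent dir.
mdtest -d "$DIR" -n 10000 -i 3 -u

# On the MDS itself, the per-operation counters (exposed under
# /proc in Lustre 1.6) show which metadata ops dominate:
# cat /proc/fs/lustre/mds/*/stats

rm -rf "$DIR"
```

Running the same invocation before and after the hardware change (and at
the same client count) makes the create/stat/unlink rates directly
comparable.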
We have played around with that idea in one way or another a few times in
the past. It didn't seem to be cost effective.

We tried a RamSan device (a 300, if I am not mistaken) as an MDT almost two
years ago. We compared its metadata rates with the ones we get from our MDT
on a DDN 9500 with write-back cache on. The DDN (simply a big cache with a
RAID 5 magnetic disk set behind it) turned out to be a more cost-effective
solution for our installation and use cases.

We haven't evaluated any SSDs as an MDT since then, as far as I can
remember.

Sarp

On 4/9/09 6:07 PM, "Jordan Mendler" <jmendler at ucla.edu> wrote:

> Has anyone done any testing of modern SSD drives as an MDT for Lustre
> 1.6? Searching through the archives it seems that most of the posts
> related to SSD are either incomplete or slightly dated. [...]
Hello,

> We compared the metadata rates of that with the ones we get from our MDT
> on a DDN 9500 with write back cache on

This is interesting, since the same topic was raised here. I'm curious,
though, about the details of your MDT on the DDN array. Did you dedicate
one full tier, which would, unfortunately, waste a lot of capacity? Or did
you allocate a relatively small capacity (100GB?) over many tiers, which
would, in turn, compete for I/Os with the OSTs? Any insight would be
appreciated; I'm looking for best practices for configuring the MDT on the
DDN.

thanks,
/pgc

Sarp Oral (oad) wrote:
> We tried a RamSan device (300, if I am not mistaken) as a MDT, almost
> two years ago. We compared the metadata rates of that with the ones we
> get from our MDT on a DDN 9500 with write back cache on. DDN (simply a
> big cache with a RAID 5 magnetic disk set behind it) turned out to be
> a more cost effective solution for our installation and use cases. [...]
We've done some experiments with software RAID10 on Intel X25-E solid state
disks, focused on getting 4K random IOPS up very high.

We were at first greatly encouraged: our current production MDSes, based on
a dedicated DDN 8500 with multiple tiers of FC disks and write cache on (an
older dual-socket Nocona node with 8G RAM), topped out around 700 IOPS read
and 150 IOPS write, while an array of 16 X25-Es (quad-socket, quad-core
Opteron with 32G RAM) hit 120K IOPS read and 64K IOPS write.

Two things curbed our enthusiasm for SSDs in the short term:

1) A 3ware + 32x 15K RPM SAS disk based MDS, which gets approximately 6K
   IOPS read and 4K IOPS write on the same quad-socket Opteron node type,
   was only "about twice as fast" as the production DDN 8500 setup above
   when measured with mdtest workloads. If the disks were still the
   bottleneck, we should have seen around 10X; and in fact the node is a
   lot faster too, so the speedup may be attributable to that rather than
   to the backend disks.

2) Some quick tests of MDS create rates (through Lustre now) on the SSD
   and DDN hardware, where we seemed to get about 2350 creates/sec no
   matter what hardware we used, plus posts from Oleg on this mailing list
   indicating that tests utilizing loopback devices were only getting
   about 5300 creates/sec:
   http://lists.lustre.org/pipermail/lustre-devel/2009-February/002940.html

So we're going with the 3ware setup for newer file systems for now, and
keeping the SSD config in our back pocket for further investigation.

Jim

On Mon, Apr 13, 2009 at 09:31:34AM -0400, Paul Cote wrote:
> This is interesting since the same topic was raised here ... I'm curious
> though about the details of your MDT on the DDN array ... did you
> dedicate one full tier which would, unfortunately, waste a lot of
> capacity? or just allocate relatively small capacity (100GB?) over many
> tiers ... which would, in turn, compete for I/Os with the OSTs. [...]
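For what it's worth, a raw 4K random-IOPS measurement along the lines Jim
describes can be sketched roughly as below. This is a guess at a comparable
setup, not the exact commands used in those experiments; device names are
placeholders and the mdadm step destroys data on the listed disks.

```shell
#!/bin/sh
# Sketch: build a software RAID10 from 16 SSDs and measure 4K
# random IOPS against the raw md device with fio.
# WARNING: destroys data on the member disks (placeholders below).

mdadm --create /dev/md0 --level=10 --raid-devices=16 /dev/sd[b-q]

# Direct I/O, 4K blocks, deep queue; read pass, then write pass.
fio --name=randread  --filename=/dev/md0 --direct=1 --ioengine=libaio \
    --rw=randread  --bs=4k --iodepth=32 --numjobs=4 --runtime=60 \
    --group_reporting
fio --name=randwrite --filename=/dev/md0 --direct=1 --ioengine=libaio \
    --rw=randwrite --bs=4k --iodepth=32 --numjobs=4 --runtime=60 \
    --group_reporting
```

Note that, as Jim's numbers suggest, a huge raw-IOPS advantage at this
layer does not necessarily translate into a proportional mdtest or Lustre
create-rate gain once the MDS CPU and software stack become the bottleneck.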
Hello!

On Apr 13, 2009, at 12:55 PM, Jim Garlick wrote:
> 2) some quick tests of MDS create rates (through lustre now) on the SSD
> and DDN hardware where we seemed to get about 2350 creates/sec no matter
> what hardware we used, and posts from Oleg on this mailing list
> indicating that tests utilizing loopback devices were only getting about
> 5300 creates/sec

Please note that the 5300 I got was for a single client! (BTW, I hope you
did not use the -y option to mdtest.)

Since that time I had a chance to perform multi-client tests at ORNL (not
SSD based, but this is unimportant, since we turned out to be CPU bound at
a certain point anyway). The result was 18k creates/sec for mkdirs (on
some sort of 16-way fast CPUs), with the patch from bug 18534 applied to
reduce unneeded RPCs during create. Actual open-creates would be slower
right now; I would estimate around 10k-12k creates/sec (assuming your OSTs
can keep up with creation; we are doing some investigation in this area
and have already found some problems in the MDS precreate-requesting code).

> So we're going with the 3ware setup for newer file systems for now and
> keeping the SSD config in our back pocket for further investigation.

When approaching 18k creates/sec total (first with 8 clients), there was a
big dive in the initial test at 16 clients that turned out to be journal
overflow, so the resulting syncing slowed everything down. This should not
be a concern for SSDs, though. Since we did not have an SSD in our back
pocket at the time, we just tried a 2G ramdisk-based journal instead, and
that allowed us to remain at the 18k creates/sec plateau while scaling
from 8 to 32 clients doing creates, at which point we seem to be
overflowing the journal again (I know this is counterintuitive, given that
the rate is the same and the journal just got bigger).

Bye,
Oleg
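Oleg's ramdisk-journal experiment can be reproduced roughly as follows.
This is a sketch under my own assumptions (device names, sizes, and the
exact mkfs.lustre invocation are placeholders), and it is for benchmarking
only: a journal on volatile RAM disappears on power loss, which can leave
the MDT unrecoverable.

```shell
#!/bin/sh
# Sketch: put the MDT's ldiskfs journal on a 2G ramdisk so journal
# flushes stop hitting the metadata disks. FOR TESTING ONLY --
# losing the external journal can corrupt the filesystem.

# Load the ramdisk driver with a 2G device (rd_size is in KB).
modprobe brd rd_size=2097152

# Format the ramdisk as a dedicated external journal device.
mke2fs -O journal_dev -b 4096 /dev/ram0

# Format the MDT, pointing its journal at the ramdisk via the
# underlying mke2fs options. /dev/sdb is a placeholder for the
# real MDT block device.
mkfs.lustre --mdt --mgs --fsname=testfs \
    --mkfsoptions="-J device=/dev/ram0" /dev/sdb
```

On real deployments the same mechanism is used with a small, fast,
persistent device (e.g. an SSD or NVRAM card) as the external journal
rather than a ramdisk.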