Andrei Maslennikov
2007-Jun-21 11:00 UTC
[Lustre-discuss] Performance hints (1 OSS/3 OST configuration)
We are testing a would-be low-cost Lustre head (a black-box disk server with an InfiniBand outlet). The box contains 3 standalone RAID-6 controllers, each capable of delivering 300 MB/sec. The box has 4 cores at 3 GHz, so 3 parallel dd processes deliver 3x300=900 MB/sec aggregate without any problem.

Following the advice obtained on this list, we are therefore configuring a single OSS with 3 OSTs (one OST per controller) and striping over these 3 OSTs to get the best performance over IB. This configuration works perfectly, but we can only achieve a maximum of 336 MB/sec for a striped file on a standalone IB client.

Our next step will be to play with ost_num_threads and/or mds_num_threads, cache segment sizes, maxcmds etc. Before doing that, however, I would like a guru's comment on the following: is it not the case that we will *never* be able to exceed the performance of a single controller, because our MDT and MGS use areas served by only one of the three controllers?

If the answer is "yes", then the better bet would probably be to go back to an LVM-based solution, which we had previously discarded because it started at only 750 MB/sec, and to place the MDT, MGS and 1 OST on 3 separate logical volumes, each striped over the 3 controllers.

Thanks in advance to anyone who comments on this.

Andrei.

PS We had the "lnet" option set to "networks=o2ib". Is there any chance that we were using IPoIB instead of RDMA? That could also explain the low performance. A.
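As a quick sanity check on the figures quoted in this message, the arithmetic can be sketched as follows. This is only an illustration: the function and variable names are assumptions, and the three input numbers are the ones stated in the post.

```python
# Figures quoted in the post, all in MB/sec.
PER_CONTROLLER = 300      # one RAID-6 controller, local dd
CONTROLLERS = 3           # one OST per controller
OBSERVED_STRIPED = 336    # striped file, single IB client


def summarize(per_controller, controllers, observed):
    """Compare the observed striped rate with the local aggregate."""
    aggregate = per_controller * controllers
    return {
        "local_aggregate": aggregate,
        "fraction_of_aggregate": observed / aggregate,
        "ratio_to_one_controller": observed / per_controller,
    }


summary = summarize(PER_CONTROLLER, CONTROLLERS, OBSERVED_STRIPED)
for name, value in summary.items():
    print(f"{name}: {value:.3f}")
```

On these numbers the striped result is a little over a third of the local aggregate, and only slightly above what one controller delivers by itself.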
Andreas Dilger
2007-Jun-21 17:03 UTC
[Lustre-discuss] Performance hints (1 OSS/3 OST configuration)
On Jun 21, 2007 19:00 +0200, Andrei Maslennikov wrote:
> We are in the process of testing of a would-be low cost Lustre head
> (a black box disk server with an Infiniband outlet). The box contains
> 3 standalone RAID-6 controllers capable to deliver 300 MB/sec each.
> The box has 4 cores at 3GHz, so 3 parallel dd processes are delivering
> 3x300=900 MB/sec aggregate without any problem.

Is this locally, or from the Lustre client?

> This configuration perfectly works, but we are only able to achieve max
> 336 MB/sec for a striped file on a standalone IB client.

Is that a single-threaded test?

> Our further actions
> will be to play with the ost_num_threads and/or mds_num_threads,
> cache segment sizes, maxcmds etc. Before doing that, I however would
> seek for a guru's comment on the following: not that we will *never* be
> able to detach from the performance of a single controller due to the fact
> that our MDT and MGS are using their areas served by only one of the
> three controllers?

Could you rephrase the question?

> If the answer is "yes", then the better bet would probably be to come back
> to an LVM-based solution which we have previously discarded as it was
> starting only at 750 MB/sec. And to place MDT, MGS and 1 OST on 3
> separate logical volumes each striped over the 3 controllers.

You hardly need a separate LV/controller for just the MGS. It would be better to have 2 OSTs and put the MGS on a small LV on the same controller as the MDS.

> PS We had the "lnet" option set as "networks=o2ib". Is there any
> chance that we were using IPoIB in the place of RDMA? This
> could explain low performance, as well. A.

No, that would happen only if you had "networks=tcp".

Cheers, Andreas
--
Andreas Dilger
Principal Software Engineer
Cluster File Systems, Inc.
Andrei Maslennikov
2007-Jun-21 23:51 UTC
[Lustre-discuss] Performance hints (1 OSS/3 OST configuration)
On 6/22/07, Andreas Dilger <adilger@clusterfs.com> wrote:
> On Jun 21, 2007 19:00 +0200, Andrei Maslennikov wrote:
> > We are in the process of testing of a would-be low cost Lustre head
> > (a black box disk server with an Infiniband outlet). The box contains
> > 3 standalone RAID-6 controllers capable to deliver 300 MB/sec each.
> > The box has 4 cores at 3GHz, so 3 parallel dd processes are delivering
> > 3x300=900 MB/sec aggregate without any problem.
>
> Is this locally, or from the lustre client?

Locally on the box, before it was configured as an OSS.

> > This configuration perfectly works, but we are only able to achieve max
> > 336 MB/sec for a striped file on a standalone IB client.
>
> Is that a single-threaded test.

Yes, it is a single-threaded test. The goal is to provide the highest possible peak throughput for just one process on the client. The OSS has 3 OSTs, and the file is striped via IB over these three OSTs, i.e. all 3 controllers are used simultaneously.

> > Our further actions
> > will be to play with the ost_num_threads and/or mds_num_threads,
> > cache segment sizes, maxcmds etc. Before doing that, I however would
> > seek for a guru's comment on the following: not that we will *never* be
> > able to detach from the performance of a single controller due to the fact
> > that our MDT and MGS are using their areas served by only one of the
> > three controllers?
>
> Could you rephrase the question?

In our case the MDT and MGS use the bandwidth of only one controller (300 MB/sec) to store their data, while they serve three OSTs spread over three such controllers. I suspect that in this configuration one cannot get an aggregate of 3x300 out of all OSTs together, because the MDT and MGS cannot operate at more than 1x300.

So probably the only remedy is to use LVM for the MDT and MGS, and spread their data over all controllers.

> > If the answer is "yes", then the better bet would probably be to come back
> > to an LVM-based solution which we have previously discarded as it was
> > starting only at 750 MB/sec. And to place MDT, MGS and 1 OST on 3
> > separate logical volumes each striped over the 3 controllers.
>
> You hardly need to have a separate LV/controller for just the MGS. It would
> be better to have 2 OSTs and put the MGS on a small LV on the same controller
> with the MDS.

I tried this, but there was no improvement...

Andrei.
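The thread compares a local run of 3 parallel dd processes with a single-threaded test on the client. A minimal sketch of how one might measure both cases with the same tool is given below. It is not the method used in the thread: it uses Python threads rather than dd processes, the helper names `write_stream` and `measure` are hypothetical, and the target directory is a temporary one chosen for illustration. The page cache can inflate results for small files, which is why each stream is fsynced before timing stops.

```python
import os
import tempfile
import time
from concurrent.futures import ThreadPoolExecutor


def write_stream(path, total_bytes, block_size=1 << 20):
    """Sequentially write total_bytes of zeros to path, then fsync."""
    block = b"\0" * block_size
    written = 0
    with open(path, "wb") as f:
        while written < total_bytes:
            chunk = min(block_size, total_bytes - written)
            f.write(block[:chunk])
            written += chunk
        f.flush()
        os.fsync(f.fileno())
    return written


def measure(paths, bytes_per_stream):
    """Write all paths concurrently; return the aggregate rate in MB/sec."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=len(paths)) as pool:
        totals = list(
            pool.map(lambda p: write_stream(p, bytes_per_stream), paths)
        )
    elapsed = time.perf_counter() - start
    return sum(totals) / elapsed / 1e6


if __name__ == "__main__":
    # Hypothetical target: a temporary directory. Point it at the file
    # system under test to get meaningful numbers.
    with tempfile.TemporaryDirectory() as target:
        size = 8 * 1024 * 1024  # small demo size per stream
        one = [os.path.join(target, "single")]
        three = [os.path.join(target, f"stream{i}") for i in range(3)]
        print(f"1 stream : {measure(one, size):.1f} MB/sec")
        print(f"3 streams: {measure(three, size):.1f} MB/sec")
```

Running the one-stream and three-stream cases against the same mount gives a like-for-like comparison, which the local dd figure and the client figure in this thread do not.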