Yujun Wu
2009-May-28 13:40 UTC
[Lustre-discuss] OST selection considering load and qos_threshold_rr
Hello,

Could somebody give me a hint on this?

I am trying to observe how OSTs are selected when files are created. One thing I find is that a file can be placed on an OST that is already busy handling other files while other OSTs sit idle. Does Lustre take an OST's current load into account when selecting it?

I looked at the Lustre manual and thought I could at least make some adjustment with qos_threshold_rr for round-robin (RR) allocation:

http://manual.lustre.org/manual/LustreManual16_HTML/LustreProc.html#50401407_pgfId-1290499

But I couldn't find the qos_threshold_rr proc tunable on the Lustre MDS I have installed. My installation is version 1.6.7. Am I missing something, or has it changed?

Thanks in advance for your help.

Regards,
Yujun
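One way to check whether a tunable such as qos_threshold_rr exists on an installed server is to search the Lustre proc tree for a file of that name. The following is a minimal sketch, assuming the tunables are exposed under /proc/fs/lustre; the function name find_tunable is hypothetical and not part of any Lustre tool.

```python
import os


def find_tunable(name, root="/proc/fs/lustre"):
    """Return the sorted paths of every file called `name` below `root`.

    The default root is an assumption about where the proc tunables live;
    pass a different directory if your system exposes them elsewhere.
    """
    matches = []
    # os.walk silently yields nothing if `root` does not exist.
    for dirpath, _dirnames, filenames in os.walk(root):
        if name in filenames:
            matches.append(os.path.join(dirpath, name))
    return sorted(matches)


if __name__ == "__main__":
    paths = find_tunable("qos_threshold_rr")
    if paths:
        for path in paths:
            print(path)
    else:
        print("qos_threshold_rr not found under /proc/fs/lustre")
```

An empty result on the MDS suggests that the installed release does not provide the tunable at all, rather than that it is hidden somewhere else in the tree.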
Andreas Dilger
2009-May-28 18:06 UTC
[Lustre-discuss] OST selection considering load and qos_threshold_rr
On May 28, 2009 09:40 -0400, Yujun Wu wrote:
> I am trying to watch over how the OSTs are selected for
> putting files on. One thing I find is that, the files
> can be put on an OST although it is already busy handling
> other files while some other OSTs are idling. Does Lustre
> select an OST considering its current load or not?

No, currently there are only two modes for OST selection:
- round robin, used most of the time because it distributes the
  space usage/load best in most situations
- space-weighted random selection, which selects OSTs with more free
  space more often than OSTs with less free space

Per bug 18547 we would like to replace the space-weighted random OST selection with a weighted round-robin algorithm, to avoid the random collisions (due to the birthday paradox) that result in the same OST being selected more often than it should be.

Once we have a better weighted round-robin OST selector we can add in other parameters such as throughput, IOPS rate, etc. There are reserved fields in the obd_statfs structure that can be used to pass a number of performance metrics from the OST to the MDS.

The danger of using the _current_ IO load for OST object selection is that this information may be somewhat stale by the time the MDS does OST selection, which might result in poor load balancing. The MDS only gets updated statfs information every 5s at most. At a minimum, the MDS should estimate the load it is placing on each OST as it allocates objects there, so that its calculations don't become completely irrelevant after the first few allocations.

> I looked at the Lustre document and thought I can do
> some adjustment with qos_threshold_rr at least with Round-Robin(RR):
>
> http://manual.lustre.org/manual/LustreManual16_HTML/LustreProc.html#50401407_pgfId-1290499
>
> But I couldn't find qos_threshold_rr proc tunable on the Lustre MDS I have
> installed. The version of my installation is 1.6.7. Am I missing
> something or it has changed?

Per bug 18334 this patch was not landed in the 1.6.7 release, only in 1.6.8 and 1.8.0. This should probably be made clear in the 1.6 manual.

Cheers, Andreas
--
Andreas Dilger
Sr. Staff Engineer, Lustre Group
Sun Microsystems of Canada, Inc.
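The difference between space-weighted random selection and a weighted round-robin can be illustrated with a small simulation. This is only a sketch under stated assumptions: the "smooth" weighted round-robin below is one well-known variant and not necessarily the design discussed in bug 18547, and the OST names, free-space figures, and function names are all made up for illustration.

```python
import random
from collections import Counter


def weighted_random(free_space, n, rng):
    """Pick n OSTs at random, with probability proportional to free space."""
    osts = list(free_space)
    weights = [free_space[ost] for ost in osts]
    return rng.choices(osts, weights=weights, k=n)


def weighted_round_robin(free_space, n):
    """Smooth weighted round-robin: deterministic and proportional to weight."""
    total = sum(free_space.values())
    credit = {ost: 0 for ost in free_space}
    picks = []
    for _ in range(n):
        # Every OST earns credit equal to its weight on each round.
        for ost, weight in free_space.items():
            credit[ost] += weight
        # The OST with the most credit is chosen and pays back the total.
        best = max(credit, key=credit.get)
        credit[best] -= total
        picks.append(best)
    return picks


def worst_window_excess(picks, free_space, window):
    """Largest excess over the ideal share in any non-overlapping window."""
    total = sum(free_space.values())
    worst = 0.0
    for start in range(0, len(picks) - window + 1, window):
        counts = Counter(picks[start:start + window])
        for ost, weight in free_space.items():
            ideal = window * weight / total
            worst = max(worst, counts[ost] - ideal)
    return worst


if __name__ == "__main__":
    # Hypothetical free space per OST, in arbitrary units.
    free_space = {"OST0000": 400, "OST0001": 300, "OST0002": 200, "OST0003": 100}
    rnd = weighted_random(free_space, 1000, random.Random(0))
    rr = weighted_round_robin(free_space, 1000)
    print("random excess in a 10-pick window:", worst_window_excess(rnd, free_space, 10))
    print("round-robin excess in a 10-pick window:", worst_window_excess(rr, free_space, 10))
```

Both selectors favour OSTs with more free space in the long run. The random selector, however, can hand one OST far more than its share within a short burst of allocations, which is the collision effect mentioned above, while the round-robin variant keeps every window of ten picks at exactly the 4:3:2:1 ratio.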
Yujun Wu
2009-May-28 18:58 UTC
[Lustre-discuss] OST selection considering load and qos_threshold_rr
Hello Andreas,

Thanks a lot for your detailed info.

On Thu, 28 May 2009, Andreas Dilger wrote:
> On May 28, 2009 09:40 -0400, Yujun Wu wrote:
> > I am trying to watch over how the OSTs are selected for
> > putting files on. One thing I find is that, the files
> > can be put on an OST although it is already busy handling
> > other files while some other OSTs are idling. Does Lustre
> > select an OST considering its current load or not?
>
> No, currently there are only two modes for OST selection:
> - round robin, used most of the time because it distributes the
>   space usage/load best in most situations
> - space-weighted random selection that selects OSTs with more free
>   space more often than OSTs with less free space
>
> Per bug 18547 we would like to change the space-weighted random OST
> selection with a weighted round-robin algorithm to avoid the random
> collisions (due to birthday paradox) that result in the same OST being
> selected more often than it should.

Yes, it would be good to avoid this.

> Once we have a better weighted round-robin OST selector we can add in
> other parameters such as throughput, iops rate, etc. There are reserved
> fields in the obd_statfs structure that can be used to pass a number of
> performance metrics from the OST to the MDS.

Looking forward to seeing features like this.

> The danger of using the _current_ IO load for the OST object selection
> is that this information may be somewhat stale by the time the MDS does
> OST selection and it might result in poor load balancing. The MDS only
> gets updated statfs information every 5s at most. At a minimum the
> MDS should estimate the load it is placing on each OST as it allocates
> objects there so that its calculations don't become completely irrelevant
> after the first few allocations.

I think some study is needed on how to use dynamic information such as the "_current_ IO load". It does not necessarily have to be up to date all the time. I think it could still be useful for keeping some OSTs from becoming too heavily used while others are idling.

> > I looked at the Lustre document and thought I can do
> > some adjustment with qos_threshold_rr at least with Round-Robin(RR):
> >
> > http://manual.lustre.org/manual/LustreManual16_HTML/LustreProc.html#50401407_pgfId-1290499
> >
> > But I couldn't find qos_threshold_rr proc tunable on the Lustre MDS I have
> > installed. The version of my installation is 1.6.7. Am I missing
> > something or it has changed?
>
> Per bug 18334 this patch was not landed in the 1.6.7 release, only for
> 1.6.8 and 1.8.0. This should probably be made clear in the 1.6 manual.

Thanks for the info. Maybe I should try a later version of Lustre.

Regards,
Yujun
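The suggestion in this thread that the MDS should estimate the load it places on each OST between statfs updates can be sketched as below. This is a hypothetical illustration, not Lustre code: the class name, the load units, and the fixed cost charged per allocated object are all assumptions.

```python
class LoadAwareSelector:
    """Pick the least-loaded OST, adding a local estimate to stale reports."""

    def __init__(self, reported_load, cost_per_object=1.0):
        # Load last reported by each OST (hypothetical units).
        self.reported = dict(reported_load)
        # Load this selector believes it has added since the last report.
        self.pending = {ost: 0.0 for ost in reported_load}
        self.cost = cost_per_object

    def refresh(self, reported_load):
        """Fresh statfs-style data arrived: trust it and drop local estimates."""
        self.reported = dict(reported_load)
        self.pending = {ost: 0.0 for ost in reported_load}

    def estimate(self, ost):
        """Reported load plus the load added locally since the last refresh."""
        return self.reported[ost] + self.pending[ost]

    def allocate(self):
        """Choose the OST with the lowest estimated load and charge it."""
        ost = min(self.reported, key=self.estimate)
        self.pending[ost] += self.cost
        return ost


if __name__ == "__main__":
    report = {"OST0000": 5.0, "OST0001": 0.0, "OST0002": 2.0}

    # Without a local estimate, every allocation until the next report
    # lands on the OST that looked idle in the stale data.
    naive = LoadAwareSelector(report, cost_per_object=0.0)
    print([naive.allocate() for _ in range(6)])

    # With a local estimate, allocations start to spread once the idle
    # OST has caught up with its neighbours.
    aware = LoadAwareSelector(report, cost_per_object=1.0)
    print([aware.allocate() for _ in range(6)])
```

The first selector shows the risk of relying on stale load figures alone: the OST that was idle at the last report receives every new object until the next update. Charging an estimated cost per allocation keeps the choice meaningful across the interval between reports, although a realistic cost model would need the kind of study mentioned above.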