Hello!

In our service we use Lustre for a very large storage pool. We don't need high performance, but we do need storage that is easy to expand. With the current Lustre version (1.5.97), we noticed a problem when OSTs have different capacities, or when OSTs are added to the system at different times (e.g. we add new OSTs once some existing OSTs reach 90% usage). Under these conditions Lustre may report that the storage is full, even though only some OSTs are full and others still have free capacity.

We would like an OST to be marked FULL once its used rate reaches 90% or 95%, so that it is no longer used. We plan to add a FULL state to the structure lov_tgt_desc, set it when the OST's used rate reaches 90% or 95%, and skip FULL OSTs when creating new objects.

We want to know how to calculate the used rate of an OST. Is it like this:

used rate = lov->lov_tgts[i]->ltd_exp->exp_obd->obd_osfs.os_bavail / lov->lov_tgts[i]->ltd_exp->exp_obd->obd_osfs.os_blocks ?

Thanks!
swin wang
2007-Apr-18 02:28 UTC
[Lustre-discuss] Re: How to calculate used rate of a OST ?
Sorry, my mistake. It should be:

used rate = 1 - (lov->lov_tgts[i]->ltd_exp->exp_obd->obd_osfs.os_bavail / lov->lov_tgts[i]->ltd_exp->exp_obd->obd_osfs.os_blocks)

Is that right?
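The corrected formula can be sketched in C as follows. This is a minimal illustration, not Lustre code: `struct ost_space` and `ost_used_percent()` are hypothetical stand-ins for the two statfs fields named above, and the block counts are assumed to be unsigned 64-bit integers. Under that assumption, writing `os_bavail / os_blocks` directly would truncate to 0 for any OST that is not completely empty, so the sketch scales to whole percent before dividing.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the two statfs fields discussed in the thread. */
struct ost_space {
    uint64_t os_blocks; /* total blocks on the OST */
    uint64_t os_bavail; /* blocks still available for new data */
};

/*
 * Used rate in whole percent, i.e. 100 * (1 - os_bavail / os_blocks),
 * computed with integer arithmetic only.
 * Assumes used * 100 does not overflow 64 bits (true for realistic sizes).
 */
unsigned int ost_used_percent(const struct ost_space *s)
{
    assert(s->os_bavail <= s->os_blocks);

    if (s->os_blocks == 0)
        return 100; /* no capacity reported: treat the target as full */

    uint64_t used = s->os_blocks - s->os_bavail;
    return (unsigned int)(used * 100 / s->os_blocks);
}
```

With 1000 total blocks and 100 available, this returns 90, which is the kind of value a 90% or 95% FULL threshold would be compared against.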
Andreas Dilger
2007-Apr-18 08:02 UTC
[Lustre-discuss] How to calculate used rate of a OST ?
On Apr 18, 2007 16:19 +0800, swin wang wrote:
> We hope that if an OST's used rate up to 90% or 95%, it can be marked
> FULL, and avoid to use this OST. we plan to add a state FULL in structure
> lov_tgt_desc, when the OST used rate up to 90% or 95%, set its state to
> FULL, and when create new object, don't use these FULL OST.
> We want to know how to calculate used rate of a OST, is it like this:
> used rate = lov->lov_tgts[i]->ltd_exp->exp_obd->obd_osfs.os_bavail /
> ov->lov_tgts[i]->ltd_exp->exp_obd->obd_osfs.os_blocks ??

This is already done on the OSTs in filter_precreate(), though the limit is only 0.1%. I'd be happy to accept a patch which made this limit a tunable in /proc that can be set by the sysadmin.

There is also in 1.6 (and betas) some free space management that is tunable on the MDS, /proc/fs/lustre/lov/*/qos_priofree, which is a percentage of weight which you want to give to "free" OSTs. In your case you want this to be relatively high (90% is the default in later betas). If this is not sufficient to have the MDS allocate very preferentially on free OSTs, please provide more details on your configuration.

Cheers, Andreas
--
Andreas Dilger
Principal Software Engineer
Cluster File Systems, Inc.
We have tested 1.5.97 and set qos_priofree, but it didn't work. I think qos_priofree just applies a different penalty to each OST; it cannot prevent objects from being allocated on a full OST. Are you sure qos_priofree can do this?
Nathaniel Rutman
2007-Apr-18 10:05 UTC
[Lustre-discuss] How to calculate used rate of a OST ?
swin wang wrote:
> We have tested the 1.5.97, and set qos_priofree, but it didn't work,
> I think qos_priofree just make different penalty to OST, it can not avoid
> alloc object on the full OST.
> Are you sure qos_priofree can do it?

You are correct; it just changes the penalty. QOS does not ever guarantee that any particular OST will or won't be used - it only changes the likelihood.

You can prevent the MDT from assigning new stripes to a particular OST by deactivating the corresponding OSC on the MDT using "lctl deactivate", but this has to be done every time the MDT is restarted. You could also use

    mount -o exclude=testfs-OST0000 -t lustre /dev/mdt /mnt/mdt

at MDT mount time for the same result.

You could also change the line in filter_precreate()

    if (rc == 0 && osfs->os_bavail < (osfs->os_blocks >> 10))

blocks >> 10 = blocks / 1024, which is roughly 0.1%. You could just change it to os_blocks / 10 for an automatic 10% free space limit.
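To make the two thresholds concrete, here is a small standalone C sketch of the comparison, together with a percentage-based variant of the kind a /proc tunable might feed. Everything here is illustrative: `ost_is_full_shift()`, `ost_is_full_pct()` and the `min_free_pct` parameter are hypothetical names, not existing Lustre functions or tunables.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Same comparison as the filter_precreate() line quoted above:
 * "full" when fewer than 1/1024 of the blocks are still available.
 */
int ost_is_full_shift(uint64_t os_blocks, uint64_t os_bavail)
{
    return os_bavail < (os_blocks >> 10);
}

/*
 * Hypothetical tunable variant: "full" when less than min_free_pct
 * percent of the blocks are available. Dividing before multiplying
 * avoids overflow at the cost of rounding down by up to 99 blocks.
 */
int ost_is_full_pct(uint64_t os_blocks, uint64_t os_bavail,
                    unsigned int min_free_pct)
{
    assert(min_free_pct <= 100);
    return os_bavail < os_blocks / 100 * min_free_pct;
}
```

Note that with the shift-based check a target of fewer than 1024 blocks can never be reported full, because `os_blocks >> 10` is then 0; this only matters for toy sizes, but it is worth keeping in mind when testing.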
It didn't work when we set it to

    if (rc == 0 && osfs->os_bavail < (osfs->os_blocks / 10))

This just returns an error, but we don't want any error or failure; we want another OST to be chosen automatically (our stripe count is 1).

Our other change is in lov_qos. After

    if (!lov->lov_tgts[i] || !lov->lov_tgts[i]->ltd_active)
            continue;

we added:

    if (lov->lov_tgts[i]->ltd_exp->exp_obd->obd_osfs.os_bavail <
        lov->lov_tgts[i]->ltd_exp->exp_obd->obd_osfs.os_blocks / 10) {
            lov->lov_tgts[i]->ltd_qos.ltq_usable = 0;
            continue;
    }

But it sometimes still writes to an OST whose used rate is above 90%. Any suggestions?
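For reference, the intended behaviour (skip inactive or nearly full targets and fall through to another one instead of failing) can be written as a small userspace sketch. This is not the lov_qos code and does not explain why the patched allocator still picks a full OST; `struct tgt`, `pick_target()` and `min_free_div` are hypothetical, and choosing the eligible target with the most available blocks is just one simple policy assumed for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the per-target state used in the thread. */
struct tgt {
    int      active;    /* non-zero if the target may be used at all */
    uint64_t os_blocks; /* total blocks */
    uint64_t os_bavail; /* available blocks */
};

/*
 * Return the index of the eligible target with the most available blocks,
 * or -1 if every target is inactive or below the free-space limit.
 * A target is skipped when os_bavail < os_blocks / min_free_div
 * (min_free_div = 10 corresponds to the 10% limit discussed above).
 */
int pick_target(const struct tgt *tgts, size_t n, unsigned int min_free_div)
{
    int best = -1;

    assert(min_free_div > 0);

    for (size_t i = 0; i < n; i++) {
        if (!tgts[i].active)
            continue;
        if (tgts[i].os_bavail < tgts[i].os_blocks / min_free_div)
            continue; /* treated as FULL: never selected */
        if (best < 0 || tgts[i].os_bavail > tgts[best].os_bavail)
            best = (int)i;
    }
    return best;
}
```

The caller only sees a failure (-1) when no target at all passes the filter, which matches the stated goal of choosing another OST automatically for a stripe count of 1.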