Folks,

Never having had to deal with quotas on our scratch filesystems before, I'm
puzzled about why we're seeing messages like the following:

Oct 22 09:29:00 widow-oss3c2 kernel: kernel: Lustre: widow3-OST00b1: slow quota init 35s due to heavy IO load

We're (I think) not doing quotas. I've run through the 1.8 manual and it's
unclear to me how to detect whether Lustre is in fact calculating quotas
underneath the covers. I'm extremely hesitant to run lfs quotacheck - we've
only got 400T and 30-ish million files in the filesystem currently, but there
are 372 OSTs on 48 OSS nodes - and I'm concerned about the amount of time it
would take, as I've heard the initial lfs quotacheck command can take quite a
while to build the "mapping".

So, the question is: if we see messages like "slow quota init", are quotas
being calculated in the background? And as a follow-up: how do we turn them
off?

Thanks,

--
-Jason

-------------------------------------------------
// Jason J. Hill                               //
// HPC Systems Administrator                   //
// National Center for Computational Sciences  //
// Oak Ridge National Laboratory               //
// e-mail: hilljj at ornl.gov                  //
// Phone: (865) 576-5867                       //
-------------------------------------------------
On 10/22/10 9:37 PM, Jason Hill wrote:
> Folks,
>
> Never having had to deal with quotas on our scratch filesystems before, I'm
> puzzled about why we're seeing messages like the following:
>
> Oct 22 09:29:00 widow-oss3c2 kernel: kernel: Lustre: widow3-OST00b1: slow quota init 35s due to heavy IO load
>
> We're (I think) not doing quotas. I've run through the 1.8 manual and it's
> unclear to me how to detect whether Lustre is in fact calculating quotas
> underneath the covers. I'm extremely hesitant to run lfs quotacheck - we've
> only got 400T and 30-ish million files in the filesystem currently, but there
> are 372 OSTs on 48 OSS nodes - and I'm concerned about the amount of time it
> would take, as I've heard the initial lfs quotacheck command can take quite a
> while to build the "mapping".
>
> So, the question is: if we see messages like "slow quota init", are quotas
> being calculated in the background? And as a follow-up: how do we turn them
> off?

No. I think you have been misled by the message "slow quota init 35s due to
heavy IO load"; it does not mean quota is being recalculated (initially
calculated) in the background. In fact, this message is printed just before an
obdfilter write, at the point where the OST tries to acquire enough quota for
the write operation. The OST checks locally whether the remaining quota for
the uid/gid (of this OST object) is sufficient; if not, the quota slave on
this OST acquires more quota from the quota master on the MDS. This process
may take a long time on a heavily loaded system, especially when the remaining
quota on the quota master (MDS) is also very limited. The message you saw
simply reports that. There is no good way to disable these messages as long as
a quota is set for the uid/gid.

Cheers,
Nasf
On Fri, 2010-10-22 at 22:56 +0800, Fan Yong wrote:
> On 10/22/10 9:37 PM, Jason Hill wrote:
> > Folks,
> >
> > Never having had to deal with quotas on our scratch filesystems before, I'm
> > puzzled about why we're seeing messages like the following:
> >
> > Oct 22 09:29:00 widow-oss3c2 kernel: kernel: Lustre: widow3-OST00b1: slow quota init 35s due to heavy IO load
> >
> > We're (I think) not doing quotas.
[ ... ]
> > So, the question is: if we see messages like "slow quota init", are quotas
> > being calculated in the background? And as a follow-up: how do we turn them
> > off?

> No. I think you have been misled by the message "slow quota init 35s due to
> heavy IO load"; it does not mean quota is being recalculated (initially
> calculated) in the background. In fact, this message is printed just before
> an obdfilter write, at the point where the OST tries to acquire enough quota
> for the write operation. The OST checks locally whether the remaining quota
> for the uid/gid (of this OST object) is sufficient; if not, the quota slave
> on this OST acquires more quota from the quota master on the MDS. This
> process may take a long time on a heavily loaded system, especially when the
> remaining quota on the quota master (MDS) is also very limited. The message
> you saw simply reports that. There is no good way to disable these messages
> as long as a quota is set for the uid/gid.

This is the heart of Jason's question -- he has done nothing to his knowledge
to enable quotas at all, so why is he getting a message about quotas? Are they
actually enabled on the FS, and how would he be able to verify that?

Or does it always process quotas, even if they are not enabled?

--
Dave Dillow
National Center for Computational Science
Oak Ridge National Laboratory
(865) 241-6602 office
David Dillow wrote:
> On Fri, 2010-10-22 at 22:56 +0800, Fan Yong wrote:
>> On 10/22/10 9:37 PM, Jason Hill wrote:
>>> We're (I think) not doing quotas.
[ ... ]
>> No. I think you have been misled by the message "slow quota init 35s due to
>> heavy IO load"; it does not mean quota is being recalculated (initially
>> calculated) in the background.
[ ... ]
> This is the heart of Jason's question -- he has done nothing to his
> knowledge to enable quotas at all, so why is he getting a message about
> quotas? Are they actually enabled on the FS, and how would he be able to
> verify that?
>
> Or does it always process quotas, even if they are not enabled?

That message, from lustre/obdfilter/filter_io_26.c, is the result of the
thread taking 35 seconds from when it entered filter_commitrw_write() until
after it called lquota_chkquota() to check the quota.

However, it is certainly plausible that the thread was delayed by something
other than quotas, such as an allocation (e.g., it could have been stuck in
filter_iobuf_get).

Kevin
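If the delay really was plain I/O pressure rather than quota traffic, that
should be visible in the per-OST block I/O statistics. A minimal sketch,
assuming the standard Lustre 1.8 obdfilter parameter names (the OST name is
taken from Jason's log line; treat the exact parameter path as an assumption
on other versions):

    # On the OSS that logged the warning; obdfilter.*.brw_stats is the
    # usual 1.8 location for the per-OST I/O histograms.
    lctl get_param obdfilter.widow3-OST00b1.brw_stats

    # Deep "disk I/Os in flight" and long "I/O time" histograms point at
    # disk saturation rather than anything quota-related.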
Kevin/Dave/(and Dave from DDN):

Thanks for your replies. From tunefs.lustre --dryrun it is very apparent that
we are not running quotas.

Thanks for your assistance.

--
-Jason

On Fri, Oct 22, 2010 at 04:39:29PM -0400, Kevin Van Maren wrote:
> That message, from lustre/obdfilter/filter_io_26.c, is the result of the
> thread taking 35 seconds from when it entered filter_commitrw_write() until
> after it called lquota_chkquota() to check the quota.
>
> However, it is certainly plausible that the thread was delayed by something
> other than quotas, such as an allocation (e.g., it could have been stuck in
> filter_iobuf_get).
>
> Kevin
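For reference, a sketch of that check; the device path is a placeholder, and
the quota_type parameter shown is the usual 1.8 convention (an assumption
here, since the exact parameters depend on how the target was set up):

    # --dryrun prints the on-disk Lustre configuration without changing it.
    tunefs.lustre --dryrun /dev/sdX

    # If quotas had been enabled at format or tunefs time, the Parameters
    # line would carry something like (1.8-style, hypothetical values):
    #   Parameters: ost.quota_type=ug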
Hello Jason,

please note that it is also possible to enable quotas using lctl, and that
would not be visible via tunefs.lustre. I think the only real option for
checking whether quotas are enabled is to check whether the quota files exist.
For an online filesystem, 'debugfs -c /dev/device' is probably the safest way
(there is also a 'secret' way to bind-mount the underlying ldiskfs to another
directory, but I only use that for test filesystems and never in production,
as I have not verified the kernel code path yet).

Either way, you should check for lquota files, such as:

root at rhel5-nfs@phys-oss0:~# mount -t ldiskfs /dev/mapper/ost_demofs_2 /mnt
root at rhel5-nfs@phys-oss0:~# ll /mnt
[...]
-rw-r--r-- 1 root root  7168 Oct 23 09:48 lquota_v2.group
-rw-r--r-- 1 root root 71680 Oct 23 09:48 lquota_v2.user

(Of course, you should check this on those OSTs which have reported the slow
quota messages.)

I just poked around a bit in the code, and above the fsfilt_check_slow()
check there is also a loop that calls filter_range_is_mapped(). This function
calls fs_bmap(), and when that eventually goes down to ext3, it might get a
bit slow if another thread modifies the same file (check out
linux/fs/inode.c):

/*
 * bmap() is special. It gets used by applications such as lilo and by
 * the swapper to find the on-disk block of a specific piece of data.
 *
 * Naturally, this is dangerous if the block concerned is still in the
 * journal. If somebody makes a swapfile on an ext3 data-journaling
 * filesystem and enables swap, then they may get a nasty shock when the
 * data getting swapped to that swapfile suddenly gets overwritten by
 * the original zero's written out previously to the journal and
 * awaiting writeback in the kernel's buffer cache.
 *
 * So, if we see any bmap calls here on a modified, data-journaled file,
 * take extra steps to flush any blocks which might be in the cache.
 */

I don't know, though, whether it can happen that several threads write to the
same file. But if it happens, it gets slow. I wonder whether a possible swap
file is worth the effort here... In fact, the reason for calling
filter_range_is_mapped() certainly does not require a journal flush in that
loop. I will check myself next week whether journal flushes are ever made
because of that, and open a Lustre bugzilla then. Avoiding all of that should
not be difficult.

Cheers,
Bernd

On Saturday, October 23, 2010, Jason Hill wrote:
> Kevin/Dave/(and Dave from DDN):
>
> Thanks for your replies. From tunefs.lustre --dryrun it is very apparent
> that we are not running quotas.
>
> Thanks for your assistance.
[ ... ]

--
Bernd Schubert
DataDirect Networks
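If mounting the OST as ldiskfs is not an option, debugfs can do the same
check without a mount. A minimal sketch along the lines Bernd describes (the
device path is a placeholder):

    # -c opens the device in catastrophic (read-only) mode, -R runs a
    # single debugfs command; grep for the lquota files shown above.
    debugfs -c -R 'ls -l /' /dev/mapper/ost_demofs_2 2>/dev/null | grep lquota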
On 10/23/10 4:39 PM, Bernd Schubert wrote:
> Hello Jason,
>
> please note that it is also possible to enable quotas using lctl, and that
> would not be visible via tunefs.lustre. I think the only real option for
> checking whether quotas are enabled is to check whether the quota files
> exist.
[ ... ]
> Either way, you should check for lquota files, such as:
>
> root at rhel5-nfs@phys-oss0:~# mount -t ldiskfs /dev/mapper/ost_demofs_2 /mnt
> root at rhel5-nfs@phys-oss0:~# ll /mnt
> [...]
> -rw-r--r-- 1 root root  7168 Oct 23 09:48 lquota_v2.group
> -rw-r--r-- 1 root root 71680 Oct 23 09:48 lquota_v2.user
>
> (Of course, you should check this on those OSTs which have reported the
> slow quota messages.)

In fact, once you have performed "lfs quotacheck -ug $MNT" on any client, the
two files you mentioned are created on each OST and MDT; even if you perform
"lfs quotaoff -ug $MNT" later, those two files are not removed. So you cannot
tell whether quota is on or off for your system just from whether those two
files exist. (If quota is off for your system, then lquota_chkquota(), called
in filter_commitrw_write(), is bypassed directly.)

Since you want quota disabled on your system, why not perform
"lfs quotaoff -ug $MNT" on a client directly? That command can be performed
even if quota is already off, without any harm. If you want to make sure
quota is off, try "lfs quota -u $UID $MNT" on a client; if quota is off, it
will report "user quotas are not enabled.".

- Nasf
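Putting Nasf's suggestion into concrete commands (run on any client; $MNT is
the client mount point and $UID a numeric user id):

    # Turn user and group quota enforcement off; harmless if already off.
    lfs quotaoff -ug $MNT

    # Verify: with quota off, this reports "user quotas are not enabled."
    lfs quota -u $UID $MNT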