Folks,

Never having had to deal with quotas on our scratch filesystems before, I'm
puzzled about why we're seeing messages like the following:

Oct 22 09:29:00 widow-oss3c2 kernel: kernel: Lustre: widow3-OST00b1: slow quota init 35s due to heavy IO load

We're (I think) not doing quotas. I've run through the 1.8 manual and it's
unclear to me how to detect whether Lustre is in fact calculating quotas
underneath the covers. I'm extremely hesitant to run lfs quotacheck - we've
only got 400T and 30-ish million files in the filesystem currently, but there
are 372 OSTs on 48 OSS nodes - and I'm concerned about the amount of time it
would take, as I've heard the initial lfs quotacheck command can take quite a
while to build the "mapping".

So, the question is: if we see messages like "slow quota init", are quotas
being calculated in the background? And as a follow-up: how do we turn them
off?

Thanks,

--
-Jason

-------------------------------------------------
// Jason J. Hill                               //
// HPC Systems Administrator                   //
// National Center for Computational Sciences  //
// Oak Ridge National Laboratory               //
// e-mail: hilljj at ornl.gov                  //
// Phone: (865) 576-5867                       //
-------------------------------------------------
On 10/22/10 9:37 PM, Jason Hill wrote:
> Folks,
>
> Never having had to deal with quotas on our scratch filesystems before, I'm
> puzzled about why we're seeing messages like the following:
>
> Oct 22 09:29:00 widow-oss3c2 kernel: kernel: Lustre: widow3-OST00b1: slow quota init 35s due to heavy IO load
>
> We're (I think) not doing quotas. I've run through the 1.8 manual and it's
> unclear to me how to detect whether Lustre is in fact calculating quotas
> underneath the covers. I'm extremely hesitant to run lfs quotacheck - we've
> only got 400T and 30-ish million files in the filesystem currently, but there
> are 372 OSTs on 48 OSS nodes - and I'm concerned about the amount of time it
> would take, as I've heard the initial lfs quotacheck command can take quite a
> while to build the "mapping".
>
> So, the question is: if we see messages like "slow quota init", are quotas
> being calculated in the background? And as a follow-up: how do we turn them
> off?

No. I think you have been misled by the message "slow quota init 35s due to
heavy IO load"; it does not mean quota is being recalculated (initially
calculated) in the background. In fact, this message is printed just before an
obdfilter write, at the point where the OST tries to acquire enough quota for
the write operation. The OST checks locally whether the remaining quota for
the uid/gid (of this OST object) is sufficient; if not, the quota slave on
this OST acquires more quota from the quota master on the MDS. This process
may take a long time on a heavily loaded system, especially when the remaining
quota on the quota master (MDS) is also very limited. The message you saw
simply reports that. There is no good way to disable these messages as long as
a quota is set for the uid/gid.

Cheers,
Nasf
On Fri, 2010-10-22 at 22:56 +0800, Fan Yong wrote:
> On 10/22/10 9:37 PM, Jason Hill wrote:
> > Folks,
> >
> > Never having had to deal with quotas on our scratch filesystems before, I'm
> > puzzled about why we're seeing messages like the following:
> >
> > Oct 22 09:29:00 widow-oss3c2 kernel: kernel: Lustre: widow3-OST00b1: slow quota init 35s due to heavy IO load
> >
> > We're (I think) not doing quotas.
[ ... ]
> > So, the question is: if we see messages like "slow quota init", are quotas
> > being calculated in the background? And as a follow-up: how do we turn them
> > off?

> No. I think you have been misled by the message "slow quota init 35s due to
> heavy IO load"; it does not mean quota is being recalculated (initially
> calculated) in the background. In fact, this message is printed just before
> an obdfilter write, at the point where the OST tries to acquire enough quota
> for the write operation. The OST checks locally whether the remaining quota
> for the uid/gid (of this OST object) is sufficient; if not, the quota slave
> on this OST acquires more quota from the quota master on the MDS. This
> process may take a long time on a heavily loaded system, especially when the
> remaining quota on the quota master (MDS) is also very limited. The message
> you saw simply reports that. There is no good way to disable these messages
> as long as a quota is set for the uid/gid.

This is the heart of Jason's question -- he has done nothing to his knowledge
to enable quotas at all, so why is he getting a message about quotas? Are they
actually enabled on the FS, and how would he be able to verify that?

Or does it always process quotas, even if they are not enabled?

--
Dave Dillow
National Center for Computational Science
Oak Ridge National Laboratory
(865) 241-6602 office
David Dillow wrote:
> On Fri, 2010-10-22 at 22:56 +0800, Fan Yong wrote:
>> On 10/22/10 9:37 PM, Jason Hill wrote:
>>> We're (I think) not doing quotas.
[ ... ]
>> No. I think you have been misled by the message "slow quota init 35s due to
>> heavy IO load"; it does not mean quota is being recalculated (initially
>> calculated) in the background.
[ ... ]
> This is the heart of Jason's question -- he has done nothing to his
> knowledge to enable quotas at all, so why is he getting a message about
> quotas? Are they actually enabled on the FS, and how would he be able to
> verify that?
>
> Or does it always process quotas, even if they are not enabled?

That message, from lustre/obdfilter/filter_io_26.c, is the result of the
thread taking 35 seconds from when it entered filter_commitrw_write() until
after it called lquota_chkquota() to check the quota.

However, it is certainly plausible that the thread was delayed by something
other than quotas, such as an allocation (e.g., it could have been stuck in
filter_iobuf_get).

Kevin
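If the delay really was plain I/O pressure rather than quota traffic, that
should be visible in the per-OST block I/O statistics. A minimal sketch,
assuming the standard Lustre 1.8 obdfilter parameter names (the OST name is
taken from Jason's log line; treat the exact parameter path as an assumption
on other versions):

    # On the OSS that logged the warning; obdfilter.*.brw_stats is the
    # usual 1.8 location for the per-OST I/O histograms.
    lctl get_param obdfilter.widow3-OST00b1.brw_stats

    # Deep "disk I/Os in flight" and long "I/O time" histograms point at
    # disk saturation rather than anything quota-related.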
Kevin/Dave/(and Dave from DDN):

Thanks for your replies. From tunefs.lustre --dryrun it is very apparent that
we are not running quotas.

Thanks for your assistance.

--
-Jason

On Fri, Oct 22, 2010 at 04:39:29PM -0400, Kevin Van Maren wrote:
> That message, from lustre/obdfilter/filter_io_26.c, is the result of the
> thread taking 35 seconds from when it entered filter_commitrw_write() until
> after it called lquota_chkquota() to check the quota.
>
> However, it is certainly plausible that the thread was delayed by something
> other than quotas, such as an allocation (e.g., it could have been stuck in
> filter_iobuf_get).
>
> Kevin
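For reference, a sketch of that check; the device path is a placeholder, and
the quota_type parameter shown is the usual 1.8 convention (an assumption
here, since the exact parameters depend on how the target was set up):

    # --dryrun prints the on-disk Lustre configuration without changing it.
    tunefs.lustre --dryrun /dev/sdX

    # If quotas had been enabled at format or tunefs time, the Parameters
    # line would carry something like (1.8-style, hypothetical values):
    #   Parameters: ost.quota_type=ug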
Hello Jason,

please note that it is also possible to enable quotas using lctl, and that
would not be visible via tunefs.lustre. I think the only real option for
checking whether quotas are enabled is to check whether the quota files exist.
For an online filesystem, 'debugfs -c /dev/device' is probably the safest way
(there is also a 'secret' way to bind-mount the underlying ldiskfs to another
directory, but I only use that for test filesystems and never in production,
as I have not verified the kernel code path yet).

Either way, you should check for lquota files, such as:

root at rhel5-nfs@phys-oss0:~# mount -t ldiskfs /dev/mapper/ost_demofs_2 /mnt
root at rhel5-nfs@phys-oss0:~# ll /mnt
[...]
-rw-r--r-- 1 root root  7168 Oct 23 09:48 lquota_v2.group
-rw-r--r-- 1 root root 71680 Oct 23 09:48 lquota_v2.user

(Of course, you should check this on those OSTs which have reported the slow
quota messages.)

I just poked around a bit in the code, and above the fsfilt_check_slow()
check there is also a loop that calls filter_range_is_mapped(). This function
calls fs_bmap(), and when that eventually goes down to ext3, it might get a
bit slow if another thread modifies the same file (check out
linux/fs/inode.c):

/*
 * bmap() is special. It gets used by applications such as lilo and by
 * the swapper to find the on-disk block of a specific piece of data.
 *
 * Naturally, this is dangerous if the block concerned is still in the
 * journal. If somebody makes a swapfile on an ext3 data-journaling
 * filesystem and enables swap, then they may get a nasty shock when the
 * data getting swapped to that swapfile suddenly gets overwritten by
 * the original zero's written out previously to the journal and
 * awaiting writeback in the kernel's buffer cache.
 *
 * So, if we see any bmap calls here on a modified, data-journaled file,
 * take extra steps to flush any blocks which might be in the cache.
 */

I don't know, though, whether it can happen that several threads write to the
same file. But if it happens, it gets slow. I wonder whether a possible swap
file is worth the effort here... In fact, the reason for calling
filter_range_is_mapped() certainly does not require a journal flush in that
loop. I will check myself next week whether journal flushes are ever made
because of that, and open a Lustre bugzilla then. Avoiding all of that should
not be difficult.

Cheers,
Bernd

On Saturday, October 23, 2010, Jason Hill wrote:
> Kevin/Dave/(and Dave from DDN):
>
> Thanks for your replies. From tunefs.lustre --dryrun it is very apparent
> that we are not running quotas.
>
> Thanks for your assistance.
[ ... ]

--
Bernd Schubert
DataDirect Networks
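If mounting the OST as ldiskfs is not an option, debugfs can do the same
check without a mount. A minimal sketch along the lines Bernd describes (the
device path is a placeholder):

    # -c opens the device in catastrophic (read-only) mode, -R runs a
    # single debugfs command; grep for the lquota files shown above.
    debugfs -c -R 'ls -l /' /dev/mapper/ost_demofs_2 2>/dev/null | grep lquota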
On 10/23/10 4:39 PM, Bernd Schubert wrote:
> Hello Jason,
>
> please note that it is also possible to enable quotas using lctl, and that
> would not be visible via tunefs.lustre. I think the only real option for
> checking whether quotas are enabled is to check whether the quota files
> exist.
[ ... ]
> Either way, you should check for lquota files, such as:
>
> root at rhel5-nfs@phys-oss0:~# mount -t ldiskfs /dev/mapper/ost_demofs_2 /mnt
> root at rhel5-nfs@phys-oss0:~# ll /mnt
> [...]
> -rw-r--r-- 1 root root  7168 Oct 23 09:48 lquota_v2.group
> -rw-r--r-- 1 root root 71680 Oct 23 09:48 lquota_v2.user
>
> (Of course, you should check this on those OSTs which have reported the
> slow quota messages.)

In fact, once you have performed "lfs quotacheck -ug $MNT" on any client, the
two files you mentioned are created on each OST and MDT; even if you perform
"lfs quotaoff -ug $MNT" later, those two files are not removed. So you cannot
tell whether quota is on or off for your system just from whether those two
files exist. (If quota is off for your system, then lquota_chkquota(), called
in filter_commitrw_write(), is bypassed directly.)

Since you want quota disabled on your system, why not perform
"lfs quotaoff -ug $MNT" on a client directly? That command can be performed
even if quota is already off, without any harm. If you want to make sure
quota is off, try "lfs quota -u $UID $MNT" on a client; if quota is off, it
will report "user quotas are not enabled.".

- Nasf
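Putting Nasf's suggestion into concrete commands (run on any client; $MNT is
the client mount point and $UID a numeric user id):

    # Turn user and group quota enforcement off; harmless if already off.
    lfs quotaoff -ug $MNT

    # Verify: with quota off, this reports "user quotas are not enabled."
    lfs quota -u $UID $MNT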