On 11/09/2010 19:27, Robin Humble wrote:
> Hey Dr Stu,
>
> On Sat, Sep 11, 2010 at 04:27:43PM +0800, Stuart Midgley wrote:
>> We are getting jobs that fail due to no space left on device.
>> BUT none of our Lustre servers are full (as reported by lfs df -h on a
>> client and by df -h on the OSSs).
>> They are all close to being full, but are not actually full (still have
>> ~300 GB of space left).
> sounds like a grant problem.
>
>> I've tried playing around with tune2fs -m {0,1,2,3} and
>> tune2fs -r 1024 etc., and nothing appears to help.
>> Anyone have a similar problem? We are running 1.8.3
> there are a couple of grant leaks that are fixed in 1.8.4, e.g.
> https://bugzilla.lustre.org/show_bug.cgi?id=22755
> or see the 1.8.4 release notes.
>
> however the overall grant revoking problem is still unresolved AFAICT
> https://bugzilla.lustre.org/show_bug.cgi?id=12069
> and you'll hit that issue more frequently with many clients and small
> OSTs, or when any OST starts getting full.
>
> in your case ~300 GB per OST should be enough headroom unless you have
> ~4k clients now (assuming 32-64 MB of grant per client), so it's
> probably grant leaks. there's a recipe for adding up client grants and
> comparing them to server grants to see if they've gone wrong in bz 22755.
>
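For reference, the grant accounting Robin mentions can be checked through
the proc interface. A rough sketch, assuming the Lustre 1.8 parameter names
osc.*.cur_grant_bytes (client side) and obdfilter.*.tot_granted (OSS side);
verify the names against your release:

  # on every client: sum the grant each OSC currently holds (bytes)
  lctl get_param -n osc.*.cur_grant_bytes | awk '{sum += $1} END {print sum}'

  # on every OSS: how much grant the server believes is outstanding, per OST
  lctl get_param obdfilter.*.tot_granted

If the per-OST client totals, summed across all clients, drift well away
from the matching tot_granted on the server, grant has leaked; bz 22755
walks through this accounting in more detail.
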
Per BZ 22755, comment #96
(https://bugzilla.lustre.org/show_bug.cgi?id=22755#c96), you can arrest
the grant leak by setting the "grant shrink interval" to a large value
(if you also want to reset the server-side grant reservation, you will
have to remount the OSTs); a sketch follows below. We have applied this
workaround to our system with good results: we have been monitoring our
file systems with Nagios and have not seen a repeat of this problem.
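On each client the change amounts to something like the following (the
parameter name follows bz 22755; the exact name and units may vary between
releases, so treat this as a sketch and check your version):

  # push the grant shrink interval (seconds) so high that shrinking
  # effectively never runs
  lctl set_param osc.*.grant_shrink_interval=2147483647

followed by an unmount/remount of the OSTs if the servers' existing grant
reservations also need to be cleared.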
Malcolm.