Chad Kerner
2009-Jan-22 18:11 UTC
[Lustre-discuss] Question about No space left on device
Hello,

We are running Lustre 1.6.6 and are seeing a weird error on space usage. The
filesystem is not anywhere near full, but writes are failing if they hit OST 23.

If I do an lfs setstripe -i 23 chad, and then do

# dd if=/dev/zero of=chad
dd: writing to `chad': No space left on device
26+0 records in
25+0 records out
#

The actual device is fairly full.

# df /lustre/home/ost_h_24
Filesystem            1K-blocks       Used  Available Use% Mounted on
/dev/mapper/ost_h_24  564172088  509452368   26061444  96% /lustre/home/ost_h_24
#

The striping is set as:

stripe_count: 2 stripe_size: 1048576 stripe_offset: -1

So there should be enough space to still create that stripe. I was wondering
if anyone else sees this type of error and, if so, how you can get around it?

# lfs df /u
UUID                  1K-blocks        Used   Available Use% Mounted on
home-MDT0000_UUID     250741088     7437416   228974536   2% /u[MDT:0]
home-OST0000_UUID     564172088   465752856    69742520  82% /u[OST:0]
home-OST0001_UUID     564172088   481188596    54304660  85% /u[OST:1]
home-OST0002_UUID     564172088   476474964    59007024  84% /u[OST:2]
home-OST0003_UUID     564172088   477867116    57604028  84% /u[OST:3]
home-OST0004_UUID     564172088   467160140    68338168  82% /u[OST:4]
home-OST0005_UUID     564172088   460648340    74822768  81% /u[OST:5]
home-OST0006_UUID     564172088   461577612    73936192  81% /u[OST:6]
home-OST0007_UUID     564172088   463904716    71582888  82% /u[OST:7]
home-OST0008_UUID     564172088   490786452    67629072  86% /u[OST:8]
home-OST0009_UUID     564172088   494448544    41036872  87% /u[OST:9]
home-OST000a_UUID     564172088   469107600    66406208  83% /u[OST:10]
home-OST000b_UUID     564172088   471937184    63550148  83% /u[OST:11]
home-OST000c_UUID     564172088   470872456    64614456  83% /u[OST:12]
home-OST000d_UUID     564172088   468447640    67066156  83% /u[OST:13]
home-OST000e_UUID     564172088   461661040    73823812  81% /u[OST:14]
home-OST000f_UUID     564172088   472427452    63086356  83% /u[OST:15]
home-OST0010_UUID     564172088   465760292    69712320  82% /u[OST:16]
home-OST0011_UUID     564172088   463973452    71515716  82% /u[OST:17]
home-OST0012_UUID     564172088   478922600    56568572  84% /u[OST:18]
home-OST0013_UUID     564172088   488618676    46890016  86% /u[OST:19]
home-OST0014_UUID     564172088   472108048    63382768  83% /u[OST:20]
home-OST0015_UUID     564172088   489621560    45874056  86% /u[OST:21]
home-OST0016_UUID     564172088   475357600    83082828  84% /u[OST:22]
home-OST0017_UUID     564172088   509437736    26069524  90% /u[OST:23]

filesystem summary: 13540130112 11398062672  1499647128  84% /u

Thanks,
Chad
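P.S. For completeness, the layout of the test file can be confirmed from a
client with lfs getstripe (output omitted here; "chad" is simply the test
file created by the reproduction above):

# lfs getstripe chad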
Brian J. Murrell
2009-Jan-22 19:36 UTC
[Lustre-discuss] Question about No space left on device
On Thu, 2009-01-22 at 12:11 -0600, Chad Kerner wrote:
> Hello,
>
> We are running Lustre 1.6.6 and are seeing a weird error on space
> usage. The filesystem is not anywhere near full, but writes are failing
> if they hit OST 23.
>
> If I do an lfs setstripe -i 23 chad, and then do
> # dd if=/dev/zero of=chad
> dd: writing to `chad': No space left on device
> 26+0 records in
> 25+0 records out
> #
>
> The actual device is fairly full.
> # df /lustre/home/ost_h_24
> Filesystem            1K-blocks       Used  Available Use% Mounted on
> /dev/mapper/ost_h_24  564172088  509452368   26061444  96% /lustre/home/ost_h_24

At 4% free, unless you have changed the "reserved space" on the OSTs'
filesystem (see the ops manual), you are into the space that a normal user is
not allowed to write to (and gets ENOSPC when he does). By default, 5% of
every device is reserved for root.

That said, you really are running that OST quite full. Historically (I'm not
sure if this still applies -- maybe one of our ext3 experts can comment), if
you run an ext3 filesystem above 80% full you start to see performance
degradation.

Are you getting any ENOSPC (-28) errors other than by trying to force a write
to that full OST? I ask because, unless directed specifically to use a
particular OST (as your example does), the MDS should avoid using a full OST.
If it's not, that's a bug.

b.
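P.S. If you want to inspect or change that reserved percentage, it is just
the normal ext3 reserved-blocks setting on the backing device. A rough sketch
only (run on the OSS that serves the OST; the device name below is simply the
one from your df output, and lowering the root reserve is a trade-off, not a
recommendation):

# dumpe2fs -h /dev/mapper/ost_h_24 | grep -i "reserved block"
# tune2fs -m 1 /dev/mapper/ost_h_24

(the second command drops the root reserve to 1% of the device)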
Chad Kerner
2009-Jan-22 19:41 UTC
[Lustre-discuss] Question about No space left on device
Brian, yes, the users are not specifying any stripe parameters, and they are
getting this error.

Chad

On Thu, 22 Jan 2009, Brian J. Murrell wrote:
> On Thu, 2009-01-22 at 12:11 -0600, Chad Kerner wrote:
>> Hello,
>>
>> We are running Lustre 1.6.6 and are seeing a weird error on space
>> usage. The filesystem is not anywhere near full, but writes are failing
>> if they hit OST 23.
>>
>> If I do an lfs setstripe -i 23 chad, and then do
>> # dd if=/dev/zero of=chad
>> dd: writing to `chad': No space left on device
>> 26+0 records in
>> 25+0 records out
>> #
>>
>> The actual device is fairly full.
>> # df /lustre/home/ost_h_24
>> Filesystem            1K-blocks       Used  Available Use% Mounted on
>> /dev/mapper/ost_h_24  564172088  509452368   26061444  96% /lustre/home/ost_h_24
>
> At 4% free, unless you have changed the "reserved space" on the OSTs'
> filesystem (see the ops manual), you are into the space that a normal user
> is not allowed to write to (and gets ENOSPC when he does). By default, 5%
> of every device is reserved for root.
>
> That said, you really are running that OST quite full. Historically (I'm
> not sure if this still applies -- maybe one of our ext3 experts can
> comment), if you run an ext3 filesystem above 80% full you start to see
> performance degradation.
>
> Are you getting any ENOSPC (-28) errors other than by trying to force a
> write to that full OST? I ask because, unless directed specifically to use
> a particular OST (as your example does), the MDS should avoid using a full
> OST. If it's not, that's a bug.
>
> b.
Brian J. Murrell
2009-Jan-22 20:02 UTC
[Lustre-discuss] Question about No space left on device
On Thu, 2009-01-22 at 13:41 -0600, Chad Kerner wrote:
> Brian, yes, the users are not specifying any stripe parameters, and they
> are getting this error.

Are you absolutely positive there is no striping policy on the dir (or the
parent dir, in the absence of anything specific on a given dir, or its
parent, and so on) they are creating the file in? Understand that if a given
dir has no specific striping policy it inherits its parent's policy, and that
recurses all the way up to the root of the filesystem.

Also, attempted "appends" to any objects (i.e. files) on that OST will fail
with ENOSPC. Maybe that is what you are seeing.

If you are sure of the striping, and that the ENOSPC is not from trying to
append to existing files, then you have found a bug. Please file a bug at our
bugzilla. If you can show a test case and prove the striping, all the better.

If the ENOSPC is exclusively from file appends, then you will need to
rebalance the OST using the poor-man's migrate/rebalance procedure that we
have described here previously. I am sure you could cook one up. It's
basically a cp/mv of enough files (to reallocate objects) on that full OST
(lfs find) to get your full OST's usage down. You have to deactivate the OST
on the MDS first, to be sure not to allocate new objects to it until it's
back in balance.

b.
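P.S. A rough sketch of both checks, in case it saves someone some typing. The
paths, the OST index and the device numbers below are only placeholders --
check "lctl dl" on your MDS for the real OSC device -- and the loop is not
meant as a polished script (it rewrites files, so skip anything that is in
use):

Check whether a directory carries a striping policy that its children would
inherit:

# lfs getstripe /u/some/dir

Poor-man's rebalance of the full OST (index 23, i.e. home-OST0017).

On the MDS, stop new object allocation to that OST:

# lctl dl | grep OST0017
# lctl --device <devno-from-lctl-dl> deactivate

On a client, rewrite enough of the files that have objects there:

# lfs find -r --obd home-OST0017_UUID /u > /tmp/ost23.list
# while read f; do cp -a "$f" "$f.migr" && mv "$f.migr" "$f"; done < /tmp/ost23.list

Once usage is back down, re-enable the OST on the MDS:

# lctl --device <devno-from-lctl-dl> activate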
Heald, Nathan T.
2009-Jan-22 22:07 UTC
[Lustre-discuss] Question about bug 18018 (external journal bug)
I seem to understand that patches are available for Lustre 1.6.7, but I don't
see any specific patches for affected versions below that -- am I missing
something? I am looking for a patch for 1.6.4.3 on RHEL4; would this fall
under the RHEL4 patch (comment 34)?

https://bugzilla.lustre.org/show_bug.cgi?id=18018

Thanks,
-Nathan
Brian J. Murrell
2009-Jan-22 22:58 UTC
[Lustre-discuss] Question about bug 18018 (external journal bug)
On Thu, 2009-01-22 at 17:07 -0500, Heald, Nathan T. wrote:
> I seem to understand that patches are available for Lustre 1.6.7 but I
> don't see any specific patches for affected versions below that,

We typically only create patches for the "upcoming release(s)". Those patches
are what make the "current release" the "next release". The amount of work it
would take to create a version of every patch for every past release is just
far, far too much. This is one of the reasons we urge people to keep up with
our release cycle.

> I am looking for a patch for 1.6.4.3 on RHEL4, would this fall under
> the RHEL4 patch? (Comment 34):
> https://bugzilla.lustre.org/show_bug.cgi?id=18018

Given that this particular patch is for the kernel itself, it escapes the
above clause with regard to Lustre versions but falls under a similar clause,
in that we are only going to produce that patch for the particular RHEL4
kernel that is released with the "next release". If you want that patch for
previous RHEL4 kernels you will have to try to port it (if it even needs
porting -- it might just apply cleanly) yourself, maybe with help from people
here in the community.

b.
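P.S. If you do try it against your current kernel, a dry run will tell you
quickly whether the patch applies as-is. A sketch only -- the paths and the
patch file name below are placeholders for your own source tree and for
whichever attachment you pull from the bug:

# cd /path/to/your/rhel4-kernel-source
# patch -p1 --dry-run < bug18018-rhel4.patch

If that reports no rejects, re-run it without --dry-run and rebuild.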
Heald, Nathan T.
2009-Jan-22 23:33 UTC
[Lustre-discuss] Question about bug 18018 (external journal bug)
That's fine, I just wanted to make sure I wasn't missing something. We can
just plan to reboot after every unmount until we are ready to upgrade.

Thanks,
-Nathan

On 1/22/09 5:58 PM, "Brian J. Murrell" <Brian.Murrell at Sun.COM> wrote:

> On Thu, 2009-01-22 at 17:07 -0500, Heald, Nathan T. wrote:
>> I seem to understand that patches are available for Lustre 1.6.7 but I
>> don't see any specific patches for affected versions below that,
>
> We typically only create patches for the "upcoming release(s)". Those
> patches are what make the "current release" the "next release". The amount
> of work it would take to create a version of every patch for every past
> release is just far, far too much. This is one of the reasons we urge
> people to keep up with our release cycle.
>
>> I am looking for a patch for 1.6.4.3 on RHEL4, would this fall under
>> the RHEL4 patch? (Comment 34):
>> https://bugzilla.lustre.org/show_bug.cgi?id=18018
>
> Given that this particular patch is for the kernel itself, it escapes the
> above clause with regard to Lustre versions but falls under a similar
> clause, in that we are only going to produce that patch for the particular
> RHEL4 kernel that is released with the "next release". If you want that
> patch for previous RHEL4 kernels you will have to try to port it (if it
> even needs porting -- it might just apply cleanly) yourself, maybe with
> help from people here in the community.
>
> b.