Tatjana S Heuser
2006-Jul-03 12:08 UTC
[zfs-discuss] [raidz] file not removed: No space left on device
On a system still running nv_30, I've a small RaidZ filled to the brim:

2 3 root at mir pts/9 ~ 78# uname -a
SunOS mir 5.11 snv_30 sun4u sparc SUNW,UltraAX-MP

0 3 root at mir pts/9 ~ 50# zfs list
NAME               USED  AVAIL  REFER  MOUNTPOINT
mirpool1          33.6G      0   137K  /mirpool1
mirpool1/home     12.3G      0  12.3G  /export/home
mirpool1/install  12.9G      0  12.9G  /export/install
mirpool1/local    1.86G      0  1.86G  /usr/local
mirpool1/opt      4.76G      0  4.76G  /opt
mirpool1/sfw       752M      0   752M  /usr/sfw

Trying to free some space is meeting a lot of reluctance, though:

0 3 root at mir pts/9 ~ 51# rm debug.log
rm: debug.log not removed: No space left on device
0 3 root at mir pts/9 ~ 55# rm -f debug.log
2 3 root at mir pts/9 ~ 56# ls -l debug.log
-rw-r--r--   1 th122    420    27048 Jun 29 23:24 debug.log
0 3 root at mir pts/9 ~ 58# :> debug.log
debug.log: No space left on device.
0 3 root at mir pts/9 ~ 63# ls -l debug.log
-rw-r--r--   1 th122    420    27048 Jun 29 23:24 debug.log

There are no snapshots, so removing/clearing the files /should/ be a way
to free some space there.

Of course this is the same filesystem where zdb dumps core - see:

*Synopsis*: zdb dumps core - bad checksum
http://bt2ws.central.sun.com/CrPrint?id=6437157
*Change Request ID*: 6437157

(zpool reports the RaidZ pool as healthy while zdb crashes with a
'bad checksum' message.)
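(For completeness: one way to double-check that no snapshots are pinning
space in the pool. The commands here are illustrative, not part of the
session above:)

    # an empty listing means no snapshots are holding deleted blocks
    zfs list -t snapshot

    # per-dataset space accounting for the pool's top-level dataset
    zfs get used,available,referenced mirpool1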
Constantin Gonzalez
2006-Jul-03 12:23 UTC
[zfs-discuss] [raidz] file not removed: No space left on device
Hi,

of course, the reason for this is the copy-on-write approach: ZFS has
to write new blocks first before the modification of the FS structure
can reflect the state with the deleted blocks removed.

The only way out of this is of course to grow the pool. Once ZFS learns
how to free up vdevs this may become a more attractive option, because
you can then shrink the pool again after the rm'ing.

I expect many customers to run into similar problems and I've already
gotten a number of "what if the pool is full" questions. My answer has
always been "no file system should be filled more than 90%, for a number
of reasons", but in practice this is hard to ensure.

Perhaps this is a good opportunity for an RFE: ZFS should reserve enough
blocks in a pool in order to always be able to rm and destroy stuff.

Best regards,
   Constantin

P.S.: Most US Sun employees are on vacation this week, so don't be alarmed
if the really good answers take some time :).

Tatjana S Heuser wrote:
> On a system still running nv_30, I've a small RaidZ filled to the brim:
>
> 2 3 root at mir pts/9 ~ 78# uname -a
> SunOS mir 5.11 snv_30 sun4u sparc SUNW,UltraAX-MP
>
> 0 3 root at mir pts/9 ~ 50# zfs list
> NAME               USED  AVAIL  REFER  MOUNTPOINT
> mirpool1          33.6G      0   137K  /mirpool1
> mirpool1/home     12.3G      0  12.3G  /export/home
> mirpool1/install  12.9G      0  12.9G  /export/install
> mirpool1/local    1.86G      0  1.86G  /usr/local
> mirpool1/opt      4.76G      0  4.76G  /opt
> mirpool1/sfw       752M      0   752M  /usr/sfw
>
> Trying to free some space is meeting a lot of reluctance, though:
> 0 3 root at mir pts/9 ~ 51# rm debug.log
> rm: debug.log not removed: No space left on device
> 0 3 root at mir pts/9 ~ 55# rm -f debug.log
> 2 3 root at mir pts/9 ~ 56# ls -l debug.log
> -rw-r--r--   1 th122    420    27048 Jun 29 23:24 debug.log
> 0 3 root at mir pts/9 ~ 58# :> debug.log
> debug.log: No space left on device.
> 0 3 root at mir pts/9 ~ 63# ls -l debug.log
> -rw-r--r--   1 th122    420    27048 Jun 29 23:24 debug.log
>
> There are no snapshots, so removing/clearing the files /should/
> be a way to free some space there.
>
> Of course this is the same filesystem where zdb dumps core
> - see:
>
> *Synopsis*: zdb dumps core - bad checksum
> http://bt2ws.central.sun.com/CrPrint?id=6437157
> *Change Request ID*: 6437157
>
> (zpool reports the RaidZ pool as healthy while
> zdb crashes with a 'bad checksum' message.)

-- 
Constantin Gonzalez                             Sun Microsystems GmbH, Germany
Platform Technology Group, Client Solutions                  http://www.sun.de/
Tel.: +49 89/4 60 08-25 91                    http://blogs.sun.com/constantin/
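(A workaround along the lines of that RFE, until ZFS reserves such slop
itself, is to park a small reservation on an otherwise empty dataset and
release it in an emergency. A minimal sketch only -- the mirpool1/slop
dataset name is made up for illustration:)

    # set aside ~200 MB that normal writes cannot consume
    zfs create mirpool1/slop
    zfs set reservation=200m mirpool1/slop

    # when the pool fills up, hand the space back so rm/destroy can proceed
    zfs set reservation=none mirpool1/slop

Releasing the reservation gives rm and zfs destroy the copy-on-write
headroom they need without touching any data.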
Eric Schrock
2006-Jul-03 16:19 UTC
[zfs-discuss] [raidz] file not removed: No space left on device
You don't need to grow the pool. You should always be able to truncate
the file without consuming more space, provided you don't have snapshots.
Mark has a set of fixes in testing which do a much better job of
estimating space, allowing us to always unlink files in full pools
(provided there are no snapshots, of course). This provides much more
logical behavior by reserving some extra slop.

- Eric

On Mon, Jul 03, 2006 at 02:23:06PM +0200, Constantin Gonzalez wrote:
> Hi,
>
> of course, the reason for this is the copy-on-write approach: ZFS has
> to write new blocks first before the modification of the FS structure
> can reflect the state with the deleted blocks removed.
>
> The only way out of this is of course to grow the pool. Once ZFS learns
> how to free up vdevs this may become a more attractive option, because
> you can then shrink the pool again after the rm'ing.
>
> I expect many customers to run into similar problems and I've already
> gotten a number of "what if the pool is full" questions. My answer has
> always been "no file system should be filled more than 90%, for a number
> of reasons", but in practice this is hard to ensure.
>
> Perhaps this is a good opportunity for an RFE: ZFS should reserve enough
> blocks in a pool in order to always be able to rm and destroy stuff.
>
> Best regards,
>    Constantin
>
> P.S.: Most US Sun employees are on vacation this week, so don't be alarmed
> if the really good answers take some time :).

-- 
Eric Schrock, Solaris Kernel Development       http://blogs.sun.com/eschrock
Nathan Kroenert
2006-Jul-03 23:14 UTC
[zfs-discuss] [raidz] file not removed: No space left on device
That's excellent news. Given how often customers' applications go feral and
write a whole heap of crap (or nobody watches closely enough as the pool
gradually fills), we will forever be getting calls if this functionality is
*anything* but transparent...

Most explorers I see have filesystem 100% full messages in them...

It will be interesting to see how the current S10_u2 bits go. :)

Nathan.

On Tue, 2006-07-04 at 02:19, Eric Schrock wrote:
> You don't need to grow the pool. You should always be able to truncate
> the file without consuming more space, provided you don't have snapshots.
> Mark has a set of fixes in testing which do a much better job of
> estimating space, allowing us to always unlink files in full pools
> (provided there are no snapshots, of course). This provides much more
> logical behavior by reserving some extra slop.
>
> - Eric
>
> On Mon, Jul 03, 2006 at 02:23:06PM +0200, Constantin Gonzalez wrote:
> > Hi,
> >
> > of course, the reason for this is the copy-on-write approach: ZFS has
> > to write new blocks first before the modification of the FS structure
> > can reflect the state with the deleted blocks removed.
> >
> > The only way out of this is of course to grow the pool. Once ZFS learns
> > how to free up vdevs this may become a more attractive option, because
> > you can then shrink the pool again after the rm'ing.
> >
> > I expect many customers to run into similar problems and I've already
> > gotten a number of "what if the pool is full" questions. My answer has
> > always been "no file system should be filled more than 90%, for a number
> > of reasons", but in practice this is hard to ensure.
> >
> > Perhaps this is a good opportunity for an RFE: ZFS should reserve enough
> > blocks in a pool in order to always be able to rm and destroy stuff.
> >
> > Best regards,
> >    Constantin
> >
> > P.S.: Most US Sun employees are on vacation this week, so don't be alarmed
> > if the really good answers take some time :).
>
> -- 
> Eric Schrock, Solaris Kernel Development       http://blogs.sun.com/eschrock
Constantin Gonzalez
2006-Jul-04 07:10 UTC
[zfs-discuss] [raidz] file not removed: No space left on device
Hi Eric,

Eric Schrock wrote:
> You don't need to grow the pool. You should always be able to truncate
> the file without consuming more space, provided you don't have snapshots.
> Mark has a set of fixes in testing which do a much better job of
> estimating space, allowing us to always unlink files in full pools
> (provided there are no snapshots, of course). This provides much more
> logical behavior by reserving some extra slop.

is this planned but not yet implemented functionality, or why did Tatjana
see the "not able to rm" behaviour? Or should she use unlink(1M) in these
cases?

Best regards,
   Constantin

> - Eric
>
> On Mon, Jul 03, 2006 at 02:23:06PM +0200, Constantin Gonzalez wrote:
>> Hi,
>>
>> of course, the reason for this is the copy-on-write approach: ZFS has
>> to write new blocks first before the modification of the FS structure
>> can reflect the state with the deleted blocks removed.
>>
>> The only way out of this is of course to grow the pool. Once ZFS learns
>> how to free up vdevs this may become a more attractive option, because
>> you can then shrink the pool again after the rm'ing.
>>
>> I expect many customers to run into similar problems and I've already
>> gotten a number of "what if the pool is full" questions. My answer has
>> always been "no file system should be filled more than 90%, for a number
>> of reasons", but in practice this is hard to ensure.
>>
>> Perhaps this is a good opportunity for an RFE: ZFS should reserve enough
>> blocks in a pool in order to always be able to rm and destroy stuff.
>>
>> Best regards,
>>    Constantin
>>
>> P.S.: Most US Sun employees are on vacation this week, so don't be alarmed
>> if the really good answers take some time :).
>
> -- 
> Eric Schrock, Solaris Kernel Development       http://blogs.sun.com/eschrock

-- 
Constantin Gonzalez                             Sun Microsystems GmbH, Germany
Platform Technology Group, Client Solutions                  http://www.sun.de/
Tel.: +49 89/4 60 08-25 91                    http://blogs.sun.com/constantin/
Eric Schrock
2006-Jul-04 16:47 UTC
[zfs-discuss] [raidz] file not removed: No space left on device
On Tue, Jul 04, 2006 at 09:10:11AM +0200, Constantin Gonzalez wrote:
> Hi Eric,
>
> Eric Schrock wrote:
> > You don't need to grow the pool. You should always be able to truncate
> > the file without consuming more space, provided you don't have snapshots.
> > Mark has a set of fixes in testing which do a much better job of
> > estimating space, allowing us to always unlink files in full pools
> > (provided there are no snapshots, of course). This provides much more
> > logical behavior by reserving some extra slop.
>
> is this planned but not yet implemented functionality, or why did Tatjana
> see the "not able to rm" behaviour?

As I mentioned, Mark has a set of fixes in testing. They should be
available sometime in the near future. In the meantime, you can truncate
large files to free up space instead - because this doesn't involve
rewriting the parent directory pointers, it should always work.

- Eric

-- 
Eric Schrock, Solaris Kernel Development       http://blogs.sun.com/eschrock
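(For readers following along, a few equivalent ways to truncate a file in
place from the shell -- the path is illustrative; whether this succeeds on
a completely full pool is exactly what the next messages explore:)

    :> /export/home/th122/debug.log                 # shell no-op plus redirection
    cat /dev/null > /export/home/th122/debug.log
    cp /dev/null /export/home/th122/debug.log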
Matthew C Aycock
2006-Jul-05 13:54 UTC
[zfs-discuss] Re: [raidz] file not removed: No space left on device
Eric,

To ask the obvious but crucial question :)

What is the best way to truncate the file on ZFS?
Tatjana S Heuser
2006-Jul-06 10:07 UTC
[zfs-discuss] Re: [raidz] file not removed: No space left on device
Unfortunately, truncating files doesn't work either.

>> Eric Schrock wrote:
>>> You don't need to grow the pool. You should always be able to truncate
>>> the file without consuming more space, provided you don't have snapshots.
> [..] In the meantime, you can truncate large files to free up space
> instead - because this doesn't involve rewriting the parent directory
> pointers, it should always work.

As quoted in the original posting, truncating did not work. It does "work"
for files of size 0, though ;) (i.e. it returns 0, while a failed truncate
returns 1):

0 3 root at mir pts/9 ~ 211# find . -type f -size 0
./tex/latexmk/latexmk.txt
^C
1 3 root at mir pts/9 ~ 212# :> ./tex/latexmk/latexmk.txt
0 3 root at mir pts/9 ~ 213# find . -type f -size 1
[..]
./tex/labels/test/test2.tex
^C
1 3 root at mir pts/9 ~ 214# ls -l ./tex/labels/test/test2.tex
-rw-r--r--   1 th122    420      121 Nov  7  2005 ./tex/labels/test/test2.tex
0 3 root at mir pts/9 ~ 215# :> ./tex/labels/test/test2.tex
./tex/labels/test/test2.tex: No space left on device.
1 3 root at mir pts/9 ~ 216# :> /export/install/SunDownload/sol-nv-b42a-x86-v2-iso.zip
/export/install/SunDownload/sol-nv-b42a-x86-v2-iso.zip: No space left on device.

Other, more verbose means (than the shell NOP) of truncating a file give
the same result. Unlink doesn't work either - returning 255:

0 3 root at mir pts/9 ~ 219# unlink /export/install/SunDownload/sol-nv-b42a-x86-v2-iso.zip
unlink: No space left on device
255 3 root at mir pts/9 ~ 220#

Interesting situation.
-- 
Tatjana
Tatjana S Heuser
2006-Jul-08 01:42 UTC
[zfs-discuss] Summary: [raidz] file not removed: No space left on device
Thanks to Constantin Gonzalez and Eric Schrock for answering my initial report.

- Truncating files to free up some space had worked in the past, but not
  this time. From my experiment it seems to be possible to fill up a
  filesystem beyond that point, for even truncating was met by
  "No space left on device."

- I eventually got out of that squeeze by dissolving one of the smaller
  filesystems in that pool, using zfs destroy to get rid of mirpool1/sfw
  (/usr/sfw).

- Afterwards, the system was upgraded from nv30 to nv42a. (Exporting the
  pool and reimporting went smoothly. Great! - I'll need the backup anyway
  though, since I want to give double parity a try :)

- I had found the free space guesstimate of zfs to be quite fluctuating on
  that filesystem, and upgrading was no different. Is this indicating that
  the calculations are already a bit more on the conservative side, to
  avoid situations like the one experienced?

  Before:

  0 3 root at mir pts/9 ~ 50# zfs list
  NAME               USED  AVAIL  REFER  MOUNTPOINT
  mirpool1          33.6G      0   137K  /mirpool1
  mirpool1/home     12.3G      0  12.3G  /export/home
  mirpool1/install  12.9G      0  12.9G  /export/install
  mirpool1/local    1.86G      0  1.86G  /usr/local
  mirpool1/opt      4.76G      0  4.76G  /opt
  mirpool1/sfw       752M      0   752M  /usr/sfw

  - after dissolving the sfw filesystem, free space was indicated as
    600-odd MB (I forgot to take a log).

  - after exporting the pool and reimporting under nv42a, free space is
    shown as:

  0 3 root at mir pts/4 ~ 30# zfs list
  NAME               USED  AVAIL  REFER  MOUNTPOINT
  mirpool1          32.9G   372M   137K  /mirpool1
  mirpool1/home     12.3G   372M  12.3G  /export/home
  mirpool1/install  12.9G   372M  12.9G  /export/install
  mirpool1/local    1.86G   372M  1.86G  /usr/local
  mirpool1/opt      4.76G   372M  4.76G  /opt

And, finally:

- Under nv42a, zdb goes a little bit further before throwing its core at
  the sight of that pool:

  Traversing all blocks to verify checksums and verify nothing leaked ...
  Assertion failed: dmu_read(os, smo->smo_object, offset, size, entry_map) == 0 (0x5 == 0x0),
  file ../../../uts/common/fs/zfs/space_map.c, line 327
  Abort (core dumped)

  Core and description at http://opensolaris.in-berlin.de/core/
  (currently uploading)

  Should I file a new bug, or is there a way to append the new information
  to bug 6437157 (filed against nv30)?

Next step was to upgrade the pool. This time zdb gets even further. Right
now it's still happily running, accumulating lines and lines of:

  zdb_blkptr_cb: Got error 50 reading <114, 0, 1, 5>  -- skipping

Looks like it may take a while to chew on that.

--Tatjana
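(Summarizing the recovery steps above as commands, for anyone hitting the
same wall. Pool and dataset names follow the listings in this thread; treat
it as a sketch, not a transcript of the actual session:)

    # free space by destroying an expendable dataset (no snapshots involved)
    zfs destroy mirpool1/sfw

    # move the pool to the newer build: export on the old system,
    # import again after booting nv42a
    zpool export mirpool1
    zpool import mirpool1

    # once satisfied, upgrade the on-disk format (a one-way operation)
    zpool upgrade mirpool1

    # re-run zdb against the pool to see how far it gets now
    zdb mirpool1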