Brad Diggs
2008-Jun-04 16:40 UTC
[zfs-discuss] Can't rm file when "No space left on device"...
Hello,

A customer recently brought to my attention that ZFS can get into a situation where the filesystem is full but no files can be removed. The workaround is to remove a snapshot, after which there should be enough free space to remove a file. Here is a sample series of commands to reproduce the problem.

# mkfile 1g /tmp/disk.raw
# zpool create -f zFullPool /tmp/disk.raw
# sz=`df -k /zFullPool | awk '{ print $2 }' | tail -1`
# mkfile $((${sz}-1024))k /zFullPool/f1
# zfs snapshot zFullPool@snap
# sz=`df -k /zFullPool | awk '{ print $2 }' | tail -1`
# mkfile ${sz}k /zFullPool/f2
/zFullPool/f2: initialized 401408 of 1031798784 bytes: No space left on device
# df -k /zFullPool
Filesystem            kbytes    used   avail capacity  Mounted on
zFullPool            1007659 1007403       0   100%    /zFullPool
# rm -f /zFullPool/f1
# ls -al /zFullPool
total 2014797
drwxr-xr-x   2 root     sys            4 Jun  4 12:15 .
drwxr-xr-x  31 root     root       18432 Jun  4 12:14 ..
-rw------T   1 root     root  1030750208 Jun  4 12:15 f1
-rw-------   1 root     root  1031798784 Jun  4 12:15 f2
# rm -f /zFullPool/f2
# ls -al /zFullPool
total 2014797
drwxr-xr-x   2 root     sys            4 Jun  4 12:15 .
drwxr-xr-x  31 root     root       18432 Jun  4 12:14 ..
-rw------T   1 root     root  1030750208 Jun  4 12:15 f1
-rw-------   1 root     root  1031798784 Jun  4 12:15 f2

At this point, the only way in which I can free up sufficient space to remove either file is to first remove the snapshot.

# zfs destroy zFullPool@snap
# rm -f /zFullPool/f1
# ls -al /zFullPool
total 1332
drwxr-xr-x   2 root     sys            3 Jun  4 12:17 .
drwxr-xr-x  31 root     root       18432 Jun  4 12:14 ..
-rw-------   1 root     root  1031798784 Jun  4 12:15 f2

Is there an existing bug on this that is going to address enabling the removal of a file without first having to remove a snapshot?

Thanks in advance,
Brad

--
Brad Diggs
Communications Area Market
Senior Directory Architect
Sun Microsystems
Office: 972-992-0002
E-Mail: Bradley.Diggs at Sun.Com
Keith Bierman
2008-Jun-04 17:16 UTC
[zfs-discuss] Can't rm file when "No space left on device"...
On Jun 4, 2008, at 10:40 AM, Brad Diggs wrote:
> At this point, the only way in which I can free up sufficient
> space to remove either file is to first remove the snapshot.

Can't you just truncate a large file or two?

Sadly I lack the time to try your example right now, but I'd have guessed that cat > bigfile might do the trick.

--
Keith H. Bierman   khbkhb at gmail.com | AIM kbiermank
5430 Nassau Circle East |
Cherry Hills Village, CO 80113 | 303-997-2749
<speaking for myself*> Copyright 2008
Brad Diggs
2008-Jun-06 02:58 UTC
[zfs-discuss] Can't rm file when "No space left on device"...
Hi Keith,

Sure, you can truncate some files, but that effectively corrupts the files in our case and would cause more harm than good. The only files in our volume are data files.

Brad

On Wed, 2008-06-04 at 11:16 -0600, Keith Bierman wrote:
> On Jun 4, 2008, at 10:40 AM, Brad Diggs wrote:
>
> > At this point, the only way in which I can free up sufficient
> > space to remove either file is to first remove the snapshot.
>
> Can't you just truncate a large file or two?
>
> Sadly I lack the time to try your example right now, but I'd have
> guessed that cat > bigfile might do the trick.
Keith Bierman
2008-Jun-06 03:13 UTC
[zfs-discuss] Can't rm file when "No space left on device"...
On Jun 5, 2008, at 8:58 PM, Brad Diggs wrote:
> Hi Keith,
>
> Sure you can truncate some files but that effectively corrupts
> the files in our case and would cause more harm than good. The
> only files in our volume are data files.

So an rm is OK, but a truncation is not?

Seems odd to me, but if that's your constraint, so be it.

--
Keith H. Bierman
Nicolas Williams
2008-Jun-06 03:21 UTC
[zfs-discuss] Can't rm file when "No space left on device"...
On Thu, Jun 05, 2008 at 09:13:24PM -0600, Keith Bierman wrote:
> On Jun 5, 2008, at 8:58 PM, Brad Diggs wrote:
> > Hi Keith,
> >
> > Sure you can truncate some files but that effectively corrupts
> > the files in our case and would cause more harm than good. The
> > only files in our volume are data files.
>
> So an rm is ok, but a truncation is not?
>
> Seems odd to me, but if that's your constraint so be it.

Neither will help, since before the space can be freed a transaction must be written, which in turn requires free space.

(So you say "let ZFS save some just-in-case-space for this," but how much is enough?)
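One admin-side way to approximate "just-in-case space" is to park a reservation on an otherwise empty dataset and release it when the pool fills. The sketch below is a minimal illustration under assumptions: the dataset name "reserve", the function names, and the ZFS_CMD override (for dry runs) are all hypothetical, and the thread does not confirm that releasing a reservation succeeds on a completely full pool for the releases discussed here.

```shell
# ZFS_CMD is an assumed override so the functions can be dry-run (ZFS_CMD=echo).
ZFS_CMD=${ZFS_CMD:-zfs}

# Create an empty dataset whose reservation keeps space out of general use.
# $1 = pool name, $2 = reservation size (for example 100m).
# The dataset name "reserve" is an illustrative assumption.
make_reserve() {
    $ZFS_CMD create -o reservation="$2" "$1/reserve"
}

# When the pool fills up, drop the reservation to hand the space back.
# $1 = pool name.
release_reserve() {
    $ZFS_CMD set reservation=none "$1/reserve"
}
```

How large the reservation needs to be is exactly the "how much is enough?" question raised above; the sketch does not answer it.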
Richard L. Hamilton
2008-Jun-06 07:33 UTC
[zfs-discuss] Can't rm file when "No space left on device"...
> On Thu, Jun 05, 2008 at 09:13:24PM -0600, Keith Bierman wrote:
> > On Jun 5, 2008, at 8:58 PM, Brad Diggs wrote:
> > > Hi Keith,
> > >
> > > Sure you can truncate some files but that effectively corrupts
> > > the files in our case and would cause more harm than good. The
> > > only files in our volume are data files.
> >
> > So an rm is ok, but a truncation is not?
> >
> > Seems odd to me, but if that's your constraint so be it.
>
> Neither will help since before the space can be freed a transaction must
> be written, which in turn requires free space.
>
> (So you say "let ZFS save some just-in-case-space for this," but, how
> much is enough?)

If you make it a parameter, that's the admin's problem. Although, since each rm of a file also present in a snapshot just increases the divergence, only an rm of a file _not_ present in a snapshot would actually recover space, right? So in some circumstances, even if it's the admin's problem, there might be no amount that's enough to do what one wants to do without removing a snapshot.

Specifically, take a snapshot of a filesystem that's very nearly full, and then use dd or whatever to create a single new file that fills up the filesystem. At that point, only removing that single new file will help, and even that's not possible without a just-in-case reserve large enough to handle the worst-case metadata update (including system attributes, if any) + transaction log + any other fudge I forgot, for at least one file's worth.

Maybe that's a simplistic view of the scenario, I dunno...

This message posted from opensolaris.org
Brad Diggs
2008-Jun-10 21:24 UTC
[zfs-discuss] Can't rm file when "No space left on device"...
Great point. Hadn't thought of it in that way. I haven't tried truncating a file prior to trying to remove it. Either way, though, I think it is a bug if you can't remove a file once the filesystem fills up.

Brad

On Thu, 2008-06-05 at 21:13 -0600, Keith Bierman wrote:
> On Jun 5, 2008, at 8:58 PM, Brad Diggs wrote:
>
> > Hi Keith,
> >
> > Sure you can truncate some files but that effectively corrupts
> > the files in our case and would cause more harm than good. The
> > only files in our volume are data files.
>
> So an rm is ok, but a truncation is not?
>
> Seems odd to me, but if that's your constraint so be it.
Richard L. Hamilton
2008-Jun-13 10:07 UTC
[zfs-discuss] Can't rm file when "No space left on device"...
I wonder if one couldn't reduce (but probably not eliminate) the likelihood of this sort of situation by setting refreservation significantly lower than reservation?

Along those lines, I don't see any property that would restrict the number of concurrent snapshots of a dataset :-( I think that would be real handy, along with one that would say whether to refuse another snapshot when the limit was reached, or to automatically delete the oldest snapshot. Yes, one can script the rotation of snapshots, but it might be nice to just make it policy for a given dataset instead, particularly together with delegated snapshot permission (provided that that didn't also delegate the ability to change the maximum number of allowed snapshots).
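The scripted rotation mentioned above could look roughly like the following. This is a minimal sketch, not an established tool: the function names and the ZFS_CMD override are hypothetical, and it assumes `zfs list -s creation` returns snapshots oldest first. The selection logic is kept in a separate function so it can be checked without touching a pool.

```shell
# ZFS_CMD is an assumed override so the destroy step can be dry-run.
ZFS_CMD=${ZFS_CMD:-zfs}

# Read snapshot names on stdin (oldest first) and print every name
# except the newest $1, i.e. the ones that should be destroyed.
snapshots_to_prune() {
    awk -v keep="$1" '{ name[NR] = $0 }
        END { for (i = 1; i <= NR - keep; i++) print name[i] }'
}

# Destroy the oldest snapshots of dataset $1 so that at most $2 remain.
# The index() filter drops snapshots that belong to child datasets,
# which "zfs list -r" would otherwise include.
rotate_snapshots() {
    $ZFS_CMD list -H -t snapshot -o name -s creation -r "$1" |
        awk -v p="$1@" 'index($0, p) == 1' |
        snapshots_to_prune "$2" |
        while read -r snap; do
            $ZFS_CMD destroy "$snap"
        done
}
```

Calling `rotate_snapshots tank/data 7` from cron would keep the seven newest snapshots of the (hypothetical) dataset tank/data. Unlike a dataset property, nothing here stops a user with delegated snapshot permission from creating more snapshots between runs.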
Lance
2008-Jun-19 05:52 UTC
[zfs-discuss] Can't rm file when "No space left on device"...
It's probably this bug:

http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=6453407

We've been affected by the same problem on our X4500 Thumpers. Although the bug report claims a fix was delivered in solaris_nevada (snv_70), I've yet to see an official patch released for it (we run Solaris, not OpenSolaris, in production). Our workaround is to undercommit the zpool on quotas or teach users to truncate files via the shell (e.g., 'cat /dev/null >! FILE' in tcsh).
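For users on a Bourne-style shell rather than tcsh, the truncation workaround can be sketched as below. The function name is hypothetical. As discussed elsewhere in the thread, truncating only frees space for blocks that are not also held by a snapshot.

```shell
# Truncate file $1 to zero length while keeping its directory entry.
# ": > file" is the POSIX sh spelling; the ">!" form quoted above is
# what tcsh needs when noclobber is set.
truncate_file() {
    : > "$1"
}
```

After `truncate_file /path/to/FILE` the file still exists but has length zero, so a subsequent rm has far less to do.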
Rudolf Potucek
2009-Oct-01 18:03 UTC
[zfs-discuss] Can't rm file when "No space left on device"...
Hmm ... I understand this is a bug, but only in the sense that the message is not sufficiently descriptive. Removing the file from the source filesystem will not necessarily free any space because the blocks have to be retained in the snapshots. The same problem exists for zeroing the file with >file as suggested earlier.

It seems like the appropriate solution would be to have a tool that allows removing a file from one or more snapshots at the same time as removing the source ...

Rudolf
Nicolas Williams
2009-Oct-01 18:09 UTC
[zfs-discuss] Can't rm file when "No space left on device"...
On Thu, Oct 01, 2009 at 11:03:06AM -0700, Rudolf Potucek wrote:
> Hmm ... I understand this is a bug, but only in the sense that the
> message is not sufficiently descriptive. Removing the file from the
> source filesystem will not necessarily free any space because the
> blocks have to be retained in the snapshots. The same problem exists
> for zeroing the file with >file as suggested earlier.
>
> It seems like the appropriate solution would be to have a tool that
> allows removing a file from one or more snapshots at the same time as
> removing the source ...

That would make them not really snapshots. And such a tool would have to "fix" clones too.

Snapshots and clones are great. They are also great ways to consume too much space. One must do some spring cleaning once in a while.

Nico
--
Andrew Gabriel
2009-Oct-01 18:34 UTC
[zfs-discuss] Can't rm file when "No space left on device"...
Rudolf Potucek wrote:
> Hmm ... I understand this is a bug, but only in the sense that the
> message is not sufficiently descriptive. Removing the file from the
> source filesystem will not necessarily free any space because the
> blocks have to be retained in the snapshots.

And if it's in a snapshot, it might need more blocks, because you now need a copy of the parent directory with that file removed, whilst the snapshot's version of the parent directory still has it in.

> The same problem exists for zeroing the file with >file as suggested
> earlier.

Pick a file which isn't in a snapshot (either because it's been created since the most recent snapshot, or because it's been rewritten since the most recent snapshot so it's no longer sharing blocks with the snapshot version).

> It seems like the appropriate solution would be to have a tool that
> allows removing a file from one or more snapshots at the same time as
> removing the source ...
>
> Rudolf

--
Andrew
Chris Ridd
2009-Oct-01 19:50 UTC
[zfs-discuss] Can't rm file when "No space left on device"...
On 1 Oct 2009, at 19:34, Andrew Gabriel wrote:
> Pick a file which isn't in a snapshot (either because it's been
> created since the most recent snapshot, or because it's been
> rewritten since the most recent snapshot so it's no longer sharing
> blocks with the snapshot version).

Out of curiosity, is there an easy way to find such a file?

Cheers,

Chris
Robert Milkowski
2009-Oct-02 12:48 UTC
[zfs-discuss] Can't rm file when "No space left on device"...
Chris Ridd wrote:
>
> On 1 Oct 2009, at 19:34, Andrew Gabriel wrote:
>
>> Pick a file which isn't in a snapshot (either because it's been
>> created since the most recent snapshot, or because it's been
>> rewritten since the most recent snapshot so it's no longer sharing
>> blocks with the snapshot version).
>
> Out of curiosity, is there an easy way to find such a file?

Find files with a modification or creation time later than the creation of the last snapshot. Files which were modified afterwards may still have most of their blocks referenced by a snapshot, though.

--
Robert Milkowski
http://milek.blogspot.com
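The search described above can be sketched with find. This is a minimal illustration that assumes a reference file whose modification time matches the snapshot's creation time, for example one created with touch immediately after taking the snapshot; both the reference file and the function name are hypothetical.

```shell
# Print regular files under directory $2 whose modification time is
# newer than that of reference file $1.
files_newer_than() {
    find "$2" -type f -newer "$1"
}
```

For example, `files_newer_than /var/tmp/snap.ref /zFullPool` would list candidates for removal. As noted above, a hit is only a candidate: a file modified after the snapshot may still share most of its blocks with the snapshot's copy.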
Rudolf Potucek
2009-Oct-02 15:32 UTC
[zfs-discuss] Can't rm file when "No space left on device"...
> > It seems like the appropriate solution would be to have a tool that
> > allows removing a file from one or more snapshots at the same time as
> > removing the source ...
>
> That would make them not really snapshots. And such a tool would have
> to "fix" clones too.

While I concur that being able to remove files from snapshots is somewhat against the concept behind snapshots, I feel that there is a tradeoff here for the administrator:

Let's say we accidentally snapshotted a very large temporary file. We don't need the file and we don't need its snapshot. Yet the only way to free the space taken up by this accidentally snapshotted file is to delete the WHOLE snapshot, including all the files of which snapshots may be required. To paraphrase: that would make this snapshot not really a snapshot ANYMORE.

At this point, having a separate tool that allows you to do "spring cleaning" by deleting files from snapshots would quite possibly be more in the spirit of snapshotting than having to delete snapshots.

Just my $.02,

Rudolf
Erik Trimble
2009-Oct-02 21:38 UTC
[zfs-discuss] Can't rm file when "No space left on device"...
Rudolf Potucek wrote:
>>> It seems like the appropriate solution would be to have a tool that
>>> allows removing a file from one or more snapshots at the same time as
>>> removing the source ...
>>
>> That would make them not really snapshots. And such a tool would have
>> to "fix" clones too.
>
> While I concur that being able to remove files from snapshots is
> somewhat against the concept behind snapshots, I feel that there is a
> tradeoff here for the administrator:
>
> Let's say we accidentally snapshotted a very large temporary file. We
> don't need the file and we don't need its snapshot. Yet the only way to
> free the space taken up by this accidentally snapshotted file is to
> delete the WHOLE snapshot, including all the files of which snapshots
> may be required. To paraphrase: that would make this snapshot not
> really a snapshot ANYMORE.
>
> At this point having a separate tool that allows you to do "spring
> cleaning" and deleting files from snapshots would quite possibly be
> more in the spirit of snapshotting than having to delete snapshots.
>
> Just my $.02,
>
> Rudolf

NO. Snapshotting is sacred - once you break the model where a snapshot is a point-in-time picture, all sorts of bad things can happen. You've changed a fundamental assumption of snapshots, and this then impacts how we view them from all sorts of angles; it's a huge loss to trade away for a very small gain.

Should you want to modify a snapshot for some reason, that's what the 'zfs clone' function is for. Clone your snapshot, promote it, and make your modifications.

--
Erik Trimble
Java System Support
Mailstop: usca22-123
Phone: x17195
Santa Clara, CA
Timezone: US/Pacific (GMT-0800)
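The clone-and-promote sequence suggested above can be sketched as follows. The function name, the dataset names in the usage note, and the ZFS_CMD override (for dry runs) are hypothetical; `zfs clone` and `zfs promote` are the subcommands the message refers to. Note that creating a clone itself needs some free space in the pool.

```shell
# ZFS_CMD is an assumed override so the sequence can be dry-run.
ZFS_CMD=${ZFS_CMD:-zfs}

# Clone snapshot $1 into the new dataset $2, then promote the clone so
# that it no longer depends on the filesystem it was cloned from.
# The promote step only runs if the clone succeeded.
clone_and_promote() {
    $ZFS_CMD clone "$1" "$2" && $ZFS_CMD promote "$2"
}
```

With `clone_and_promote tank/fs@snap tank/fs_edit`, the writable dataset tank/fs_edit starts out with the snapshot's contents and can then be modified, leaving the snapshot itself untouched.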
Rudolf Potucek
2009-Oct-02 22:46 UTC
[zfs-discuss] Can't rm file when "No space left on device"...
> NO. Snapshotting is sacred

LOL! Ok, ok, I admit that snapshotting the whole ZFS root filesystem (yes, we have ZFS root in production, oops) instead of creating individual snapshots for *each* individual ZFS filesystem is against the code of good sysadmin-ing. I bow to the developer gods and will only follow the approved gospel in the future ;)

> once you break the model where a snapshot is a point-in-time picture,
> all sorts of bad things can happen. You've changed a fundamental
> assumption of snapshots, and this then impacts how we view them from
> all sorts of angles; it's a huge loss to trade away for a very small
> gain.

Hmm ... I can see how the assumption of a snapshot being unalterable could provide some programming shortcuts and opportunities for optimization of ZFS code. Not sure that I understand the "huge loss" perspective though. I think at the point where I am desperately scrabbling to free 30% of my root FS held hostage by an accidental snapshot while keeping my on-line backup strategy intact, I won't be too worried about performance ;)

> Should you want to modify a snapshot for some reason, that's what the
> 'zfs clone' function is for. clone your snapshot, promote it, and make
> your modifications.

Err ... hello ... filesystem already full ... hello?
Robert Milkowski
2009-Oct-03 10:13 UTC
[zfs-discuss] Can't rm file when "No space left on device"...
Rudolf Potucek wrote:
>> once you break the model where a snapshot is a point-in-time picture,
>> all sorts of bad things can happen. You've changed a fundamental
>> assumption of snapshots, and this then impacts how we view them from
>> all sorts of angles; it's a huge loss to trade away for a very small
>> gain.
>
> Hmm ... I can see how the assumption of a snapshot being unalterable
> could provide some programming shortcuts and opportunities for
> optimization of ZFS code. Not sure that I understand the "huge loss"
> perspective though. I think at the point where I am desperately
> scrabbling to free 30% of my root FS held hostage by an accidental
> snapshot while keeping on-line backup strategy in tact, I won't be too
> worried about performance ;)

I don't think Erik meant the code. IMHO it is more about the assumptions made by all the technologies around ZFS. For example, you want a guarantee that if you boot back into an old BE it will be the *same* as before the upgrade. Then there is replication based on zfs send|recv... what happens if a snapshot is modified on one side only? Or, for example, we use snapshot+clone to clone dev environments - can I be sure that if I clone again from the same snapshot I will end up with the same system?

I'm all for flexibility and leaving a choice to sysadmins, and I know that developers can sometimes be very stubborn about a "the right way or nothing" approach, but I don't think this one is such a case, as you do have clones.

In your case you are concerned with files you would like to delete to regain disk space while they are still in a snapshot... in most cases it is relatively easy to plan for that with dedicated filesystem(s) for temporary files, etc.

--
Robert Milkowski
http://milek.blogspot.com