Giuseppe Scrivano wrote:
> Jim Meyering <jim@meyering.net> writes:
>
>>> Another possible issue with this I can think of is
>>> depending on the modification pattern of the COW files,
>>> the modification processes could fragment the file or
>>> more seriously be given ENOSPC errors.
>> I hope btrfs takes care of this behind the scene.
>>
>> How does the clone work wrt space consumed, a la df?
>> If copying a 1GB file this way does not update usage
>> stats to reflect the additional 1GB of space used, ...
>
> I tried to clone a big file and df reported a different "used blocks"
> stat than it was before the clone operation.

How different exactly?
OK I tried this myself on F11 with inconclusive results.

$ uname -r
2.6.29.6-213.fc11.i586
$ sudo yum install btrfs-progs
# dd bs=1M count=300 if=/dev/zero of=/btrfs.img #min size?
# mkfs.btrfs /btrfs.img
# mkdir /btrfs
# mount -o loop /btrfs.img /btrfs
# cd /btrfs
# dd bs=1M count=100 if=/dev/zero of=alloc.test
# df -h .
Filesystem            Size  Used Avail Use% Mounted on
/dev/loop0            300M   28K  300M   1% /btrfs
# df -h . #only allocated about 30s later
Filesystem            Size  Used Avail Use% Mounted on
/dev/loop0            300M  101M  200M  34% /btrfs
# /home/padraig/clone_file alloc.test alloc.test.clone
# umount /btrfs
# mount -o loop /btrfs.img /btrfs
# cd btrfs
# df -h .
Filesystem            Size  Used Avail Use% Mounted on
/dev/loop0            300M  101M  200M  34% /btrfs

OK the above suggests that the clone doesn't take any space as I would expect.
Then it starts getting confusing...

# du -h *
100M    alloc.test
244M    alloc.test.clone #wha?
# dd bs=1M count=200 if=/dev/zero of=use.space
dd: writing `use.space': No space left on device
101+0 records in
100+0 records out
# ls -l
total 454656
-rw-r--r-- 1 root root 104857600 2009-07-28 00:06 alloc.test
-rw-r--r-- 1 root root 104857600 2009-07-28 00:07 alloc.test.clone
-rw-r--r-- 1 root root 104857600 2009-07-28 00:18 use.space
# df -h .
Filesystem            Size  Used Avail Use% Mounted on
/dev/loop0            300M  184M  117M  62% /btrfs

The above suggests that the clone does actually allocate space
but btrfs isn't reporting it through statvfs correctly?

If the clone does allocate space, then how can one
clone without allocation which could be very useful
for snapshotting for example?

Also I tried the above twice and both times got:
http://www.kerneloops.org/submitresult.php?number=578993

cheers,
Pádraig.
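The df and du figures compared above come from different system calls: df(1) derives file-system-wide usage from statvfs(), while du(1) sums each file's allocation from stat()'s st_blocks. A minimal C sketch of both calculations, assuming Linux (where st_blocks counts 512-byte units); fs_used_bytes() and file_alloc_bytes() are illustrative names, not coreutils functions.

```c
#define _XOPEN_SOURCE 700
#include <sys/stat.h>
#include <sys/statvfs.h>

/* Bytes in use on the file system holding `path`, the way df(1)
   computes "Used": (total blocks - free blocks) * fragment size. */
long long fs_used_bytes(const char *path)
{
    struct statvfs sv;
    if (statvfs(path, &sv) != 0)
        return -1;
    return (long long)(sv.f_blocks - sv.f_bfree) * (long long)sv.f_frsize;
}

/* Bytes allocated to a single file, as du(1) reports by default.
   Assumes Linux semantics: st_blocks is in 512-byte units. */
long long file_alloc_bytes(const char *path)
{
    struct stat st;
    if (stat(path, &st) != 0)
        return -1;
    return (long long)st.st_blocks * 512;
}
```

Because du charges shared extents to every file that references them, a reflinked copy generally looks full-size to du even when the statvfs() numbers behind df barely move.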
Hi Pádraig,

Pádraig Brady <P@draigBrady.com> writes:

> How different exactly?
> OK I tried this myself on F11 with inconclusive results.

I can't replicate it now; all tests I am doing report that blocks used
before and after the clone are the same. Probably yesterday the
difference I noticed was in reality the original file flushed to the
disk.

> The above suggests that the clone does actually allocate space
> but btrfs isn't reporting it through statvfs correctly?

The same message appeared here too some days ago, though I cloned only
a few KB of files, not enough to fill the entire partition.

> If the clone does allocate space, then how can one
> clone without allocation which could be very useful
> for snapshotting for example?

I don't know if snapshotting is handled in the same way as a "clone",
but in this case it seems more obvious to me that no additional space
should be reported.

> Also I tried the above twice and both times got:
> http://www.kerneloops.org/submitresult.php?number=578993

I didn't get these errors. I am using the btrfs git version.

Regards,
Giuseppe
On Tue, Jul 28, 2009 at 10:06:35PM +0200, Giuseppe Scrivano wrote:
> Hi Pádraig,
>
> Pádraig Brady <P@draigBrady.com> writes:
>
> > How different exactly?
> > OK I tried this myself on F11 with inconclusive results.
>
> I can't replicate it now; all tests I am doing report that blocks used
> before and after the clone are the same. Probably yesterday the
> difference I noticed was in reality the original file flushed to the
> disk.

The clone will use some additional space for the metadata required to
point to the cloned blocks. It isn't exactly O(1); it is O(metadata for
the file).

>
> > The above suggests that the clone does actually allocate space
> > but btrfs isn't reporting it through statvfs correctly?
>
> The same message appeared here too some days ago, though I cloned only
> a few KB of files, not enough to fill the entire partition.
>
> > If the clone does allocate space, then how can one
> > clone without allocation which could be very useful
> > for snapshotting for example?
>
> I don't know if snapshotting is handled in the same way as a "clone",
> but in this case it seems more obvious to me that no additional space
> should be reported.

The COW for snapshotting and a clone are the same, but the way we get
there is a little different.

For a snapshot, we have two btree roots pointing to the same nodes, and
we've incremented the reference count on each of the nodes they both
point to. No matter how big the subvolume is, this will always be
O(number of pointers in the root block).

Cloning a file is done by walking the file metadata and taking a
reference on each extent pointed to by the file. The file data is never
read in, but all of the file metadata is read in.

>
> > Also I tried the above twice and both times got:
> > http://www.kerneloops.org/submitresult.php?number=578993
>
> I didn't get these errors. I am using the btrfs git version.

These have been fixed.
-chris
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
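Chris's description of a clone (walk the source's metadata and take a reference on each extent, never reading the data) is what a single ioctl on the destination file requests. At the time of this thread that was the btrfs-specific BTRFS_IOC_CLONE; Linux 4.5 later generalised the same request as FICLONE in <linux/fs.h>. The sketch below assumes a kernel and headers with FICLONE; clone_file() is a hypothetical helper, not the code in the cp patch under discussion.

```c
#define _XOPEN_SOURCE 700
#include <errno.h>
#include <fcntl.h>
#include <linux/fs.h>   /* FICLONE (Linux >= 4.5) */
#include <sys/ioctl.h>
#include <unistd.h>

/* Create dst as a clone of src: dst's extent pointers reference src's
   data extents, so no file data is read or written.
   Returns 0 on success, or -1 with errno set. */
int clone_file(const char *src, const char *dst)
{
    int in = open(src, O_RDONLY);
    if (in < 0)
        return -1;

    /* O_EXCL: never clobber an existing file, so the cleanup is safe. */
    int out = open(dst, O_WRONLY | O_CREAT | O_EXCL, 0644);
    if (out < 0) {
        int err = errno;
        close(in);
        errno = err;
        return -1;
    }

    /* Ask the file system to share every extent of `in` with `out`. */
    int rc = ioctl(out, FICLONE, in);
    int err = errno;
    close(in);
    close(out);
    if (rc != 0) {
        unlink(dst);    /* remove the empty file created above */
        errno = err;
        return -1;
    }
    return 0;
}
```

On a file system that cannot share extents the ioctl fails (typically EOPNOTSUPP, or EXDEV across mounts) and a caller can fall back to an ordinary copy.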
Chris Mason wrote:
> On Tue, Jul 28, 2009 at 10:06:35PM +0200, Giuseppe Scrivano wrote:
>>
>> I can't replicate it now; all tests I am doing report that blocks used
>> before and after the clone are the same. Probably yesterday the
>> difference I noticed was in reality the original file flushed to the
>> disk.
>
> The clone will use some additional space for the metadata required to
> point to the cloned blocks. It isn't exactly O(1); it is O(metadata for
> the file).

Thanks for the clarification Chris.
So the just committed change in cp will
link the destination file to the extents of the source.

We may need to play around with fallocate()
if we want to get back to the original
cp semantics of actually allocating space
on the file system for the new file.

I'll test this when I get an up to date btrfs
and when the fallocate interface in glibc settles down.

cheers,
Pádraig.
On Wed, Jul 29, 2009 at 03:14:49PM +0100, Pádraig Brady wrote:
> Chris Mason wrote:
> > On Tue, Jul 28, 2009 at 10:06:35PM +0200, Giuseppe Scrivano wrote:
> >>
> >> I can't replicate it now; all tests I am doing report that blocks used
> >> before and after the clone are the same. Probably yesterday the
> >> difference I noticed was in reality the original file flushed to the
> >> disk.
> >
> > The clone will use some additional space for the metadata required to
> > point to the cloned blocks. It isn't exactly O(1); it is O(metadata for
> > the file).
>
> Thanks for the clarification Chris.
> So the just committed change in cp will
> link the destination file to the extents of the source.
>
> We may need to play around with fallocate()
> if we want to get back to the original
> cp semantics of actually allocating space
> on the file system for the new file.

Well, best to just use the original cp code. I was talking with
Giuseppe about this as well, I think we should have the option to do regular
cp via a flag.

There will soon be a reflink system call that can be used on ocfs2 and
btrfs as well. Thanks for adding this to glibc!

-chris
On Wed, Jul 29, 2009 at 12:10:14PM -0400, Chris Mason wrote:
> On Wed, Jul 29, 2009 at 03:14:49PM +0100, Pádraig Brady wrote:
> > Chris Mason wrote:
> > > On Tue, Jul 28, 2009 at 10:06:35PM +0200, Giuseppe Scrivano wrote:
> > >>
> > >> I can't replicate it now; all tests I am doing report that blocks used
> > >> before and after the clone are the same. Probably yesterday the
> > >> difference I noticed was in reality the original file flushed to the
> > >> disk.
> > >
> > > The clone will use some additional space for the metadata required to
> > > point to the cloned blocks. It isn't exactly O(1); it is O(metadata for
> > > the file).
> >
> > Thanks for the clarification Chris.
> > So the just committed change in cp will
> > link the destination file to the extents of the source.
> >
> > We may need to play around with fallocate()
> > if we want to get back to the original
> > cp semantics of actually allocating space
> > on the file system for the new file.
>
> Well, best to just use the original cp code. I was talking with
> Giuseppe about this as well, I think we should have the option to do regular
> cp via a flag.
>
> There will soon be a reflink system call that can be used on ocfs2 and
> btrfs as well. Thanks for adding this to glibc!

Um, cp, not glibc, sorry ;)

-chris
Chris Mason wrote:
> On Wed, Jul 29, 2009 at 03:14:49PM +0100, Pádraig Brady wrote:
>>
>> We may need to play around with fallocate()
>> if we want to get back to the original
>> cp semantics of actually allocating space
>> on the file system for the new file.
>
> Well, best to just use the original cp code. I was talking with
> Giuseppe about this as well, I think we should have the option to do regular
> cp via a flag.

Right. Well we can turn off this cloning by doing --sparse={never,always}
but that has side effects. If we need an option then maybe we should have
it turn on cloning rather than restore default cp behaviour?
The side effects I thought of earlier, of COW without corresponding allocation
were possible fragmentation on write or unexpected/mishandled ENOSPC.
Also for endangered mechanical disks, subsequent processing could
be slowed as the head seeks between the old and new data to be copied.
Perhaps these are a small price to pay, especially considering that
solid state disks will only be affected by the write()=ENOSPC issue.

At the moment we have these linking options:

cp -l, --link #for hardlinks
cp -s, --symbolic-link #for symlinks

So perhaps we should support:

cp --link={soft,hard,cow}
for symlink(), link() and reflink() respectively?
I.E. link to the name, inode or extents respectively.

> There will soon be a reflink system call that can be used on ocfs2 and
> btrfs as well. Thanks for adding this to glibc!

I was thinking there would be a generic syscall for this.
So cp should call reflink() instead when it becomes available.

thanks for the info!
Pádraig.
On Wed, Jul 29, 2009 at 07:14:37PM +0100, Pádraig Brady wrote:
> Chris Mason wrote:
> > On Wed, Jul 29, 2009 at 03:14:49PM +0100, Pádraig Brady wrote:
> >>
> >> We may need to play around with fallocate()
> >> if we want to get back to the original
> >> cp semantics of actually allocating space
> >> on the file system for the new file.
> >
> > Well, best to just use the original cp code. I was talking with
> > Giuseppe about this as well, I think we should have the option to do regular
> > cp via a flag.
>
> Right. Well we can turn off this cloning by doing --sparse={never,always}
> but that has side effects. If we need an option then maybe we should have
> it turn on cloning rather than restore default cp behaviour?
> The side effects I thought of earlier, of COW without corresponding allocation
> were possible fragmentation on write or unexpected/mishandled ENOSPC.
> Also for endangered mechanical disks, subsequent processing could
> be slowed as the head seeks between the old and new data to be copied.
> Perhaps these are a small price to pay, especially considering that
> solid state disks will only be affected by the write()=ENOSPC issue.
>
> At the moment we have these linking options:
>
> cp -l, --link #for hardlinks
> cp -s, --symbolic-link #for symlinks
>
> So perhaps we should support:
>
> cp --link={soft,hard,cow}
> for symlink(), link() and reflink() respectively?
> I.E. link to the name, inode or extents respectively.

I've cooked up 'ln -r' for reflinks, which works for ln(1) but
not for cp(1).

I have a git tree with the (in-flux) code on oss.oracle.com:

[View] http://oss.oracle.com/git/?p=jlbec/reflink.git;a=summary
[Pull] git://oss.oracle.com/git/jlbec/reflink.git master

This repository isn't designed to be an authoritative patch for
coreutils. Instead it provides a reflink(1) program that is actually
ln -r in disguise. Later work would be to get coreutils updated
"properly".

Joel

--

"This is the end, beautiful friend.
 This is the end, my only friend the end
 Of our elaborate plans, the end
 Of everything that stands, the end
 No safety or surprise, the end
 I'll never look into your eyes again."

Joel Becker
Principal Software Developer
Oracle
E-mail: joel.becker@oracle.com
Phone: (650) 506-8127
Joel Becker wrote:
> On Wed, Jul 29, 2009 at 07:14:37PM +0100, Pádraig Brady wrote:
>> Chris Mason wrote:
>> > On Wed, Jul 29, 2009 at 03:14:49PM +0100, Pádraig Brady wrote:
>> >>
>> >> We may need to play around with fallocate()
>> >> if we want to get back to the original
>> >> cp semantics of actually allocating space
>> >> on the file system for the new file.
>> >
>> > Well, best to just use the original cp code. I was talking with
>> > Giuseppe about this as well, I think we should have the option to do regular
>> > cp via a flag.
>>
>> Right. Well we can turn off this cloning by doing --sparse={never,always}
>> but that has side effects. If we need an option then maybe we should have
>> it turn on cloning rather than restore default cp behaviour?
>> The side effects I thought of earlier, of COW without corresponding allocation
>> were possible fragmentation on write or unexpected/mishandled ENOSPC.
>> Also for endangered mechanical disks, subsequent processing could
>> be slowed as the head seeks between the old and new data to be copied.
>> Perhaps these are a small price to pay, especially considering that
>> solid state disks will only be affected by the write()=ENOSPC issue.
>>
>> At the moment we have these linking options:
>>
>> cp -l, --link #for hardlinks
>> cp -s, --symbolic-link #for symlinks
>>
>> So perhaps we should support:
>>
>> cp --link={soft,hard,cow}
>> for symlink(), link() and reflink() respectively?
>> I.E. link to the name, inode or extents respectively.
>
> I've cooked up 'ln -r' for reflinks, which works for ln(1) but
> not for cp(1).

Thanks. I haven't looked, but after reading about the reflink syscall
[http://lwn.net/Articles/332802/] I had come to the same conclusion:
this feature belongs with ln rather than with cp.

Besides, putting the new behavior on a new option avoids
the current semantic change we would otherwise induce in cp.
On Thu, Jul 30, 2009 at 09:39:17AM +0200, Jim Meyering wrote:
> Joel Becker wrote:
> > I've cooked up 'ln -r' for reflinks, which works for ln(1) but
> > not for cp(1).
>
> Thanks. I haven't looked, but after reading about the reflink syscall
> [http://lwn.net/Articles/332802/] I had come to the same conclusion:
> this feature belongs with ln rather than with cp.
>
> Besides, putting the new behavior on a new option avoids
> the current semantic change we would otherwise induce in cp.

Well, I don't see any reason cp(1) can't take advantage of
reflink(2). I just think that cp(1) should look at reflink(2) as an
optimization, not a specific methodology.

What do I mean? If you want to say "I know what a reflink is,
and that's exactly what I want", you want "ln -r". But say you want a
"cp --snap" that tries to take a snapshot regardless of the backend. It
could use reflink(2) on filesystems that support it, or perhaps a
passthrough call to the underlying storage, or who knows what. I can
also imagine a "cp --shallow" that is "if you can cow, do it, otherwise
do a normal cp".

Joel

--

"I think it would be a good idea."
 - Mahatma Gandhi, when asked what he thought of Western civilization
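Joel's "cp --shallow" (share extents when the file system can, otherwise copy normally) amounts to trying a clone first and falling back to read()/write(). A minimal sketch, assuming Linux FICLONE as the clone primitive; copy_shallow() and copy_data() are hypothetical names, and this version falls back on any clone failure rather than inspecting errno.

```c
#define _XOPEN_SOURCE 700
#include <errno.h>
#include <fcntl.h>
#include <linux/fs.h>   /* FICLONE */
#include <sys/ioctl.h>
#include <unistd.h>

/* Classic cp path: read() from in and write() all of it to out. */
static int copy_data(int in, int out)
{
    char buf[65536];
    ssize_t n;
    while ((n = read(in, buf, sizeof buf)) != 0) {
        if (n < 0) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        for (char *p = buf; n > 0; ) {
            ssize_t w = write(out, p, (size_t)n);
            if (w < 0) {
                if (errno == EINTR)
                    continue;   /* retry the same write */
                return -1;
            }
            p += w;
            n -= w;
        }
    }
    return 0;
}

/* Returns 1 if the data is shared via a clone, 0 if it was copied
   with read()/write(), or -1 on error. */
int copy_shallow(const char *src, const char *dst)
{
    int in = open(src, O_RDONLY);
    if (in < 0)
        return -1;
    int out = open(dst, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (out < 0) {
        close(in);
        return -1;
    }

    int result;
    if (ioctl(out, FICLONE, in) == 0)
        result = 1;                               /* extents shared */
    else
        result = copy_data(in, out) == 0 ? 0 : -1; /* ordinary copy */

    if (close(out) != 0)
        result = -1;
    close(in);
    return result;
}
```

Returning which path was taken lets a caller report or log whether it produced an independent copy, which matters for the space and fault-tolerance trade-offs debated in this thread.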
Jim Meyering wrote:
> Joel Becker wrote:
>
>> On Wed, Jul 29, 2009 at 07:14:37PM +0100, Pádraig Brady wrote:
>>>
>>> At the moment we have these linking options:
>>>
>>> cp -l, --link #for hardlinks
>>> cp -s, --symbolic-link #for symlinks
>>>
>>> So perhaps we should support:
>>>
>>> cp --link={soft,hard,cow}
>>> for symlink(), link() and reflink() respectively?
>>> I.E. link to the name, inode or extents respectively.
>>
>> I've cooked up 'ln -r' for reflinks, which works for ln(1) but
>> not for cp(1).
>
> Thanks. I haven't looked, but after reading about the reflink syscall
> [http://lwn.net/Articles/332802/] I had come to the same conclusion:
> this feature belongs with ln rather than with cp.

Right. It definitely should be in ln anyway.

> Besides, putting the new behavior on a new option avoids
> the current semantic change we would otherwise induce in cp.

Yes doing reflink() in cp by default currently can
be problematic as discussed, especially on mechanical hard disks.
Though in future I can see most users of cp preferring
reflink() to be done, rather than read()/write(). Ponder...

In any case putting --link=cow or --reflink or whatever in cp
could be very useful for creating writeable snapshot branches.

cheers,
Pádraig.
Jim Meyering <jim@meyering.net> writes:
>
> Thanks. I haven't looked, but after reading about the reflink syscall
> [http://lwn.net/Articles/332802/] I had come to the same conclusion:
> this feature belongs with ln rather than with cp.

cp already has -l so it would make sense to extend that too.

> Besides, putting the new behavior on a new option avoids
> the current semantic change we would otherwise induce in cp.

I don't see how semantics change in a user visible way.

-Andi

--
ak@linux.intel.com -- Speaking for myself only.
Andi Kleen wrote:
> Jim Meyering <jim@meyering.net> writes:
>> Thanks. I haven't looked, but after reading about the reflink syscall
>> [http://lwn.net/Articles/332802/] I had come to the same conclusion:
>> this feature belongs with ln rather than with cp.
>
> cp already has -l so it would make sense to extend that too.
>
>> Besides, putting the new behavior on a new option avoids
>> the current semantic change we would otherwise induce in cp.
>
> I don't see how semantics change in a user visible way.

I was thinking that doing reflink() in cp has the following
user visible advantages/disadvantages:

Advantages:
  very quick copy
  less space used

Disadvantages:
  disk head seeking deferred to modification process
  possible fragmentation on write
  possible ENOSPC on write

The disk head seeking issue will go away with time.
I'm not sure if the other disadvantages exist or whether
they could be alleviated with fallocate() or something.

cheers,
Pádraig.
Andi Kleen wrote:
> Jim Meyering <jim@meyering.net> writes:
>>
>> Thanks. I haven't looked, but after reading about the reflink syscall
>> [http://lwn.net/Articles/332802/] I had come to the same conclusion:
>> this feature belongs with ln rather than with cp.
>
> cp already has -l so it would make sense to extend that too.

Good point.

>> Besides, putting the new behavior on a new option avoids
>> the current semantic change we would otherwise induce in cp.
>
> I don't see how semantics change in a user visible way.

With classic cp, if I copy a 1GB non-sparse file and there's less
space than that available, cp fails with ENOSPC.
With this new feature, it succeeds even if there are
just a few blocks available.

Also, consider (buggy!) code that then depends on being able to modify
that file in-place, and that "knows" it doesn't need to check for ENOSPC.
Sure, they should always check for write failure, but still. It is
a change.
Jim Meyering wrote:
> With classic cp, if I copy a 1GB non-sparse file and there's less
> space than that available, cp fails with ENOSPC.
> With this new feature, it succeeds even if there are
> just a few blocks available.

Is it good or bad?

> Also, consider (buggy!) code that then depends on being able to modify
> that file in-place, and that "knows" it doesn't need to check for ENOSPC.
> Sure, they should always check for write failure, but still. It is
> a change.

On a multiuser system, that (buggy) tool would fail anyway if something
else adds enough new data to the filesystem in the meantime.

But sure, it's a change.

--
Tomasz Chmielewski
http://wpkg.org
>
> With classic cp, if I copy a 1GB non-sparse file and there's less
> space than that available, cp fails with ENOSPC.
> With this new feature, it succeeds even if there are
> just a few blocks available.
>
> Also, consider (buggy!) code that then depends on being able to modify
> that file in-place, and that "knows" it doesn't need to check for ENOSPC.
> Sure, they should always check for write failure, but still. It is
> a change.

Fair point, although I suspect there are cases where ENOSPC
on non-extending write can already happen on specific file systems. e.g. on
btrfs it might happen when the tree gets rebalanced? Or perhaps on nilfs2
when the garbage collector doesn't run in time. Wouldn't surprise
me if there weren't more cases already.

-Andi
On 07/30/2009 04:40 AM, Pádraig Brady wrote:
> Jim Meyering wrote:
>
>> Joel Becker wrote:
>>
>>> On Wed, Jul 29, 2009 at 07:14:37PM +0100, Pádraig Brady wrote:
>>>
>>>> At the moment we have these linking options:
>>>>
>>>> cp -l, --link #for hardlinks
>>>> cp -s, --symbolic-link #for symlinks
>>>>
>>>> So perhaps we should support:
>>>>
>>>> cp --link={soft,hard,cow}
>>>> for symlink(), link() and reflink() respectively?
>>>> I.E. link to the name, inode or extents respectively.
>>>>
>>> I've cooked up 'ln -r' for reflinks, which works for ln(1) but
>>> not for cp(1).
>>>
>> Thanks. I haven't looked, but after reading about the reflink syscall
>> [http://lwn.net/Articles/332802/] I had come to the same conclusion:
>> this feature belongs with ln rather than with cp.
>>
>
> Right. It definitely should be in ln anyway.
>
>> Besides, putting the new behavior on a new option avoids
>> the current semantic change we would otherwise induce in cp.
>>
>
> Yes doing reflink() in cp by default currently can
> be problematic as discussed, especially on mechanical hard disks.
> Though in future I can see most users of cp preferring
> reflink() to be done, rather than read()/write(). Ponder...
>

I think that doing reflink by default would be a horrible idea - one
good reason to copy a file is to increase your level of fault
tolerance and reflink magically avoids that :-)

reflink is a neat feature, but should be used on purpose in my opinion,

ric

> In any case putting --link=cow or --reflink or whatever in cp
> could be very useful for creating writeable snapshot branches.
>
> cheers,
> Pádraig.
>
Ric Wheeler wrote:
> I think that doing reflink by default would be a horrible idea - one
> good reason to copy a file is to increase your level of fault
> tolerance and reflink magically avoids that :-)

Good point. This would constitute another user-visible semantic change
in cp: a disk fault that affects any non-metadata block of a ref-linked
file affects both copies.

GNU cp will soon attempt this only when a --reflink option is specified.
Hi Pádraig,

thanks for the comments.

Pádraig Brady <P@draigBrady.com> writes:

> # 300MB seems to be the minimum size for a btrfs with default
> parameters.

Actually, it seems the minimum space required is 256MB. Using a 255MB
image I get: "device btrfs.img is too small (must be at least 256 MB)"

> # FIXME: use `truncate --allocate` when it becomes available, which
> # may allow unmarking this as an expensive test.

Are you sure that this feature will make the test less expensive? The
test files must still be written there, so in the best case (assuming
the fallocate takes 0s) only the dd cost will be saved, and it still
looks like an expensive test.

In the version I attached, I am using a sparse file (truncate --size)
and it seems to work fine. Is it correct or am I missing something?
I haven't looked yet, but there are probably other tests that can take
advantage of sparse files instead of using "dd".

I am also considering Jim's note about doing the umount in the
cleanup_ function.

Cheers,
Giuseppe


From 7add4b337b7db0a63bca0dd0fe0f146f175163f8 Mon Sep 17 00:00:00 2001
From: Giuseppe Scrivano <gscrivano@gnu.org>
Date: Wed, 29 Jul 2009 20:31:20 +0200
Subject: [PATCH] tests: add a test for btrfs' copy-on-write file clone operation

* tests/Makefile.am: Consider the new test.
* tests/cp/file-clone: New file.
---
 tests/Makefile.am   |    1 +
 tests/cp/file-clone |   58 +++++++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 59 insertions(+), 0 deletions(-)
 create mode 100755 tests/cp/file-clone

diff --git a/tests/Makefile.am b/tests/Makefile.am
index 59737a0..9841aa3 100644
--- a/tests/Makefile.am
+++ b/tests/Makefile.am
@@ -20,6 +20,7 @@ EXTRA_DIST = \
 
 root_tests = \
   chown/basic \
+  cp/file-clone \
   cp/cp-a-selinux \
   cp/preserve-gid \
   cp/special-bits \
diff --git a/tests/cp/file-clone b/tests/cp/file-clone
new file mode 100755
index 0000000..c65b9cb
--- /dev/null
+++ b/tests/cp/file-clone
@@ -0,0 +1,58 @@
+#!/bin/sh
+# Make sure file-clone on a btrfs file system works properly.
+
+# Copyright (C) 2009 Free Software Foundation, Inc.
+
+# This program is free software: you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation, either version 3 of the License, or
+# (at your option) any later version.
+
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+# GNU General Public License for more details.
+
+# You should have received a copy of the GNU General Public License
+# along with this program. If not, see <http://www.gnu.org/licenses/>.
+
+
+if test "$VERBOSE" = yes; then
+  set -x
+  cp --version
+fi
+
+. $srcdir/test-lib.sh
+
+require_root_
+require_sparse_support_
+#expensive_
+
+cleanup_(){ umount btrfs; }
+
+fail=0
+
+mkfs.btrfs --version || skip_test_ "btrfs userland tools not installed"
+
+# 256MB seems to be the minimum size for a btrfs with default parameters.
+truncate --size=256M btrfs.img || framework_failure
+
+mkfs.btrfs btrfs.img || framework_failure
+
+mkdir btrfs || framework_failure
+
+mount -t btrfs -o loop btrfs.img btrfs || framework_failure
+
+dd bs=1M count=200 if=/dev/zero of=btrfs/alloc.test || framework_failure
+
+# If the file is cloned, only additional space for metadata is required.
+# Two 200MB files can be present even if the total file system space is 256MB.
+cp btrfs/alloc.test btrfs/clone.test || fail=1
+rm btrfs/clone.test
+
+# When --sparse={always,never} is used, the file is copied without any cloning.
+# Use --sparse=never to be sure the file is copied without holes; that copy
+# must fail, since there is not enough free space.
+cp --sparse=never btrfs/alloc.test btrfs/clone.test && fail=1
+
+Exit $fail
-- 
1.6.3.3
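On Giuseppe's question about the sparse image: truncate --size only sets the file length, so the 256MB backing file initially occupies almost no disk space, and require_sparse_support_ guards against file systems where that does not hold. A C sketch of the same operation plus a sparseness check; make_sparse() and is_sparse() are hypothetical names, and the check assumes Linux's 512-byte st_blocks units.

```c
#define _XOPEN_SOURCE 700
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Like `truncate --size=SIZE path`: create the file if needed and set
   its length without writing any data. Returns 0 or -1. */
int make_sparse(const char *path, off_t size)
{
    int fd = open(path, O_WRONLY | O_CREAT, 0644);
    if (fd < 0)
        return -1;
    int rc = ftruncate(fd, size);
    if (close(fd) != 0)
        rc = -1;
    return rc;
}

/* Nonzero if the file occupies fewer bytes on disk than its length,
   i.e. it contains holes. Assumes 512-byte st_blocks units (Linux). */
int is_sparse(const char *path)
{
    struct stat st;
    if (stat(path, &st) != 0)
        return 0;
    return (off_t)st.st_blocks * 512 < st.st_size;
}
```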
On Thu, Jul 30, 2009 at 12:54:16PM +0200, Andi Kleen wrote:
> > With classic cp, if I copy a 1GB non-sparse file and there's less
> > space than that available, cp fails with ENOSPC.
> > With this new feature, it succeeds even if there are
> > just a few blocks available.
> >
> > Also, consider (buggy!) code that then depends on being able to modify
> > that file in-place, and that "knows" it doesn't need to check for ENOSPC.
> > Sure, they should always check for write failure, but still. It is
> > a change.
>
> Fair point, although I suspect there are cases where ENOSPC
> on non-extending write can already happen on specific file systems. e.g. on
> btrfs it might happen when the tree gets rebalanced? Or perhaps on nilfs2
> when the garbage collector doesn't run in time. Wouldn't surprise
> me if there weren't more cases already.

In some sense, using btrfs, nilfs2, ocfs2 with refcount trees
enabled, or any other CoW-ish filesystem is a tacit approval of the
delayed ENOSPC. The same can be said of "thin provisioning" LUNs.
However, the other concerns are still valid. A user invoking vanilla
cp(1) expects two independent storage regions for the data.
(Oh, and what about future support of de-duping in filesystems?
:-)

Joel

--

"Anything that is too stupid to be spoken is sung."
 - Voltaire
Joel Becker wrote:
> In some sense, using btrfs, nilfs2, ocfs2 with refcount trees
> enabled, or any other CoW-ish filesystem is a tacit approval of the
> delayed ENOSPC. The same can be said of "thin provisioning" LUNs.
> However, the other concerns are still valid. A user invoking vanilla
> cp(1) expects two independent storage regions for the data.
> (Oh, and what about future support of de-duping in filesystems?
> :-)

I maintain an app to de-dupe at http://www.pixelbeat.org/fslint/
and I'll be adding reflink support as soon as it becomes available.

From a filesystem point of view, one thing that would help
speed this up (and many other things like rsync etc.)
would be to allow one to associate, say, a SHA-3 hash
or whatever with the file, which the filesystem would
automatically clear when the file data changes.

So in general having a special set of extended attributes
that were auto cleared on file modification would be
very useful for lots of stuff.

cheers,
Pádraig.
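Before a de-duplication tool replaces one file with a reflink of another, it has to establish that the contents really are identical. A cached hash that the file system clears on modification, as proposed above, would let it skip most of that reading; without one, the fallback is a full comparison. A minimal C sketch of that fallback; files_identical() is a hypothetical name and is not fslint's implementation.

```c
#define _XOPEN_SOURCE 700
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>

/* Returns 1 if the two files have identical contents, 0 if they differ,
   -1 on error. Sizes are compared first, so files of different length
   are rejected without reading any data. */
int files_identical(const char *a, const char *b)
{
    struct stat sa, sb;
    if (stat(a, &sa) != 0 || stat(b, &sb) != 0)
        return -1;
    if (sa.st_size != sb.st_size)
        return 0;

    FILE *fa = fopen(a, "rb");
    FILE *fb = fopen(b, "rb");
    if (!fa || !fb) {
        if (fa) fclose(fa);
        if (fb) fclose(fb);
        return -1;
    }

    static char ba[65536], bb[65536];
    int result = 1;
    for (;;) {
        size_t na = fread(ba, 1, sizeof ba, fa);
        size_t nb = fread(bb, 1, sizeof bb, fb);
        if (na != nb || memcmp(ba, bb, na) != 0) {
            result = 0;             /* contents (or lengths) diverge */
            break;
        }
        if (na == 0) {              /* EOF on both, or a read error */
            if (ferror(fa) || ferror(fb))
                result = -1;
            break;
        }
    }
    fclose(fa);
    fclose(fb);
    return result;
}
```

Only after this returns 1 would it be safe to swap a duplicate for a reflinked copy of the other file.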