thr3ads.net - Btrfs devel - Re: BTRFS file clone support for cp [Jul 2009]

Pádraig Brady

2009-Jul-27 23:40 UTC

Re: BTRFS file clone support for cp

Giuseppe Scrivano wrote:
> Jim Meyering <jim@meyering.net> writes:
> 
>>> Another possible issue with this I can think of is
>>> depending on the modification pattern of the COW files,
>>> the modification processes could fragment the file or
>>> more seriously be given ENOSPC errors.
>> I hope btrfs takes care of this behind the scene.
>>
>> How does the clone work wrt to space consumed, a la df?
>> If copying a 1GB file this way does not update usage
>> stats to reflect the additional 1GB of space used, ...
> 
> I tried to clone a big file and df reported a different "used blocks"
> stat than it was before the clone operation.
How different exactly?
OK I tried this myself on F11 with inconclusive results.

$ uname -r
2.6.29.6-213.fc11.i586
$ sudo yum install btrfs-progs
# dd bs=1M count=300 if=/dev/zero of=/btrfs.img #min size?
# mkfs.btrfs /btrfs.img
# mkdir /btrfs
# mount -o loop /btrfs.img /btrfs
# cd /btrfs
# dd bs=1M count=100 if=/dev/zero of=alloc.test
# df -h .
Filesystem            Size  Used Avail Use% Mounted on
/dev/loop0            300M   28K  300M   1% /btrfs
# df -h . #only allocated about 30s later
Filesystem            Size  Used Avail Use% Mounted on
/dev/loop0            300M  101M  200M  34% /btrfs
# /home/padraig/clone_file alloc.test alloc.test.clone
# umount /btrfs
# mount -o loop /btrfs.img /btrfs
# cd btrfs
# df -h .
Filesystem            Size  Used Avail Use% Mounted on
/dev/loop0            300M  101M  200M  34% /btrfs

OK the above suggests that the clone doesn't take
any space as I would expect. Then it starts getting confusing...

# du -h *
100M    alloc.test
244M    alloc.test.clone #wha?
# dd bs=1M count=200 if=/dev/zero of=use.space
dd: writing `use.space': No space left on device
101+0 records in
100+0 records out
# ls -l
total 454656
-rw-r--r-- 1 root root 104857600 2009-07-28 00:06 alloc.test
-rw-r--r-- 1 root root 104857600 2009-07-28 00:07 alloc.test.clone
-rw-r--r-- 1 root root 104857600 2009-07-28 00:18 use.space
# df -h .
Filesystem            Size  Used Avail Use% Mounted on
/dev/loop0            300M  184M  117M  62% /btrfs

The above suggests that the clone does actually allocate space
but btrfs isn't reporting it through statvfs correctly?
If the clone does allocate space, then how can one
clone without allocation which could be very useful
for snapshotting for example?

Also I tried the above twice and both times got:
http://www.kerneloops.org/submitresult.php?number=578993

cheers,
Pádraig.
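
The df figures above only changed about 30 seconds after the write, which makes before/after comparisons easy to misread. A minimal sketch of a helper that flushes dirty data before sampling usage, assuming the lag comes from data not yet written out; `used_kb` is a hypothetical name, not part of btrfs-progs or coreutils:

```shell
# used_kb DIR: print the used space, in 1K blocks, of the file system
# that holds DIR.  sync is called first so that data still sitting in
# the page cache is written out before df samples the counters.
used_kb() {
  sync
  # -P forces the POSIX one-line-per-file-system layout; field 3 is "Used".
  df -Pk "$1" | awk 'NR == 2 { print $3 }'
}

# Example use around a clone (paths taken from the session above):
# before=$(used_kb /btrfs)
# /home/padraig/clone_file alloc.test alloc.test.clone
# after=$(used_kb /btrfs)
# echo "clone consumed $((after - before)) KB"
```

Sampling with the same helper before and after the clone keeps the two numbers comparable; it does not explain the du discrepancy, which is a separate question.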

Giuseppe Scrivano

2009-Jul-28 20:06 UTC

Re: BTRFS file clone support for cp

Hi Pádraig,


Pádraig Brady <P@draigBrady.com> writes:
> How different exactly?
> OK I tried this myself on F11 with inconclusive results.
I can't replicate it now, all tests I am doing report that blocks used
before and after the clone are the same.  Probably yesterday the
difference I noticed was in reality the original file flushed to the
disk.

> The above suggests that the clone does actually allocate space
> but btrfs isn't reporting it through statvfs correctly?
The same message appeared here too some days ago, though I only cloned
files of a few KB, not enough to fill the entire partition.

> If the clone does allocate space, then how can one
> clone without allocation which could be very useful
> for snapshotting for example?
I don't know if snapshotting is handled in the same way as a "clone",
but in this case it seems more obvious to me that no additional space
should be reported.

> Also I tried the above twice and both times got:
> http://www.kerneloops.org/submitresult.php?number=578993
I didn't get these errors.  I am using the btrfs git version.


Regards,
Giuseppe

Chris Mason

2009-Jul-29 13:01 UTC

Re: BTRFS file clone support for cp

On Tue, Jul 28, 2009 at 10:06:35PM +0200, Giuseppe Scrivano wrote:
> Hi Pádraig,
> 
> 
> Pádraig Brady <P@draigBrady.com> writes:
> 
> > How different exactly?
> > OK I tried this myself on F11 with inconclusive results.
> 
> I can't replicate it now, all tests I am doing report that blocks used
> before and after the clone are the same.  Probably yesterday the
> difference I noticed was in reality the original file flushed to the
> disk.
The clone will use some additional space for the metadata required to
point to the cloned blocks.  It isn't exactly O(1) it is O(metadata for
the file).
> 
> 
> > The above suggests that the clone does actually allocate space
> > but btrfs isn't reporting it through statvfs correctly?
> 
> The same message appeared here too some days ago, though I cloned only
> few Kb files, not much to fill the entire partition.
> 
> 
> > If the clone does allocate space, then how can one
> > clone without allocation which could be very useful
> > for snapshotting for example?
> 
> I don't know if snapshotting is handled in the same way as a "clone",
> but in this case it seems more obvious to me that no additional space
> should be reported.
The COW for snapshotting and a clone are the same, but the way we get
there is a little different.  For a snapshot, we have two btree roots
pointing to the same nodes, and we've incremented the reference count on
each of the nodes they both point to.  No matter how big the subvolume
is, this will always be O(number of pointers in the root block).

Cloning a file is done by walking the file metadata and taking a
reference on each extent pointed to by the file.  The file data is never
read in, but all of the file metadata is read in.
> 
> 
> > Also I tried the above twice and both times got:
> > http://www.kerneloops.org/submitresult.php?number=578993
> 
> I didn't get these errors.  I am using the btrfs git version.
These have been fixed.

-chris

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Pádraig Brady

2009-Jul-29 14:14 UTC

Re: BTRFS file clone support for cp

Chris Mason wrote:
> On Tue, Jul 28, 2009 at 10:06:35PM +0200, Giuseppe Scrivano wrote:
>>
>> I can't replicate it now, all tests I am doing report that blocks used
>> before and after the clone are the same.  Probably yesterday the
>> difference I noticed was in reality the original file flushed to the
>> disk.
> 
> The clone will use some additional space for the metadata required to
> point to the cloned blocks.  It isn't exactly O(1) it is O(metadata for
> the file).
Thanks for the clarification Chris.
So the just committed change in cp will
link the destination file to the extents of the source.

We may need to play around with fallocate()
if we want to get back to the original
cp semantics of actually allocating space
on the file system for the new file.

I'll test this when I get an up to date btrfs
and when the fallocate interface in glibc settles down.

cheers,
Pádraig.

Chris Mason

2009-Jul-29 16:10 UTC

Re: BTRFS file clone support for cp

On Wed, Jul 29, 2009 at 03:14:49PM +0100, Pádraig Brady wrote:
> Chris Mason wrote:
> > On Tue, Jul 28, 2009 at 10:06:35PM +0200, Giuseppe Scrivano wrote:
> >>
> >> I can't replicate it now, all tests I am doing report that blocks used
> >> before and after the clone are the same.  Probably yesterday the
> >> difference I noticed was in reality the original file flushed to the
> >> disk.
> > 
> > The clone will use some additional space for the metadata required to
> > point to the cloned blocks.  It isn't exactly O(1) it is O(metadata for
> > the file).
> 
> Thanks for the clarification Chris.
> So the just committed change in cp will
> link the destination file to the extents of the source.
> 
> We may need to play around with fallocate()
> if we want to get back to the original
> cp semantics of actually allocating space
> on the file system for the new file.
Well, best to just use the original cp code.  I was talking with
Giuseppe about this as well, I think we should have the option to do regular
cp via a flag.

There will soon be a reflink system call that can be used on ocfs2 and
btrfs as well.  Thanks for adding this to glibc!

-chris

Chris Mason

2009-Jul-29 16:18 UTC

Re: BTRFS file clone support for cp

On Wed, Jul 29, 2009 at 12:10:14PM -0400, Chris Mason wrote:
> On Wed, Jul 29, 2009 at 03:14:49PM +0100, Pádraig Brady wrote:
> > Chris Mason wrote:
> > > On Tue, Jul 28, 2009 at 10:06:35PM +0200, Giuseppe Scrivano wrote:
> > >>
> > >> I can't replicate it now, all tests I am doing report that blocks used
> > >> before and after the clone are the same.  Probably yesterday the
> > >> difference I noticed was in reality the original file flushed to the
> > >> disk.
> > > 
> > > The clone will use some additional space for the metadata required to
> > > point to the cloned blocks.  It isn't exactly O(1) it is O(metadata for
> > > the file).
> > 
> > Thanks for the clarification Chris.
> > So the just committed change in cp will
> > link the destination file to the extents of the source.
> > 
> > We may need to play around with fallocate()
> > if we want to get back to the original
> > cp semantics of actually allocating space
> > on the file system for the new file.
> 
> Well, best to just use the original cp code.  I was talking with
> Giuseppe about this as well, I think we should the option to do regular
> cp via a flag.
> 
> There will soon be a reflink system call that can be used on ocfs2 and
> btrfs as well.  Thanks for adding this to glibc!
Um, cp, not glibc, sorry ;)

-chris


Pádraig Brady

2009-Jul-29 18:14 UTC

Re: BTRFS file clone support for cp

Chris Mason wrote:
> On Wed, Jul 29, 2009 at 03:14:49PM +0100, Pádraig Brady wrote:
>>
>> We may need to play around with fallocate()
>> if we want to get back to the original
>> cp semantics of actually allocating space
>> on the file system for the new file.
> 
> Well, best to just use the original cp code.  I was talking with
> Giuseppe about this as well, I think we should the option to do regular
> cp via a flag.
Right. Well we can turn off this cloning by doing --sparse={never,always}
but that has side effects. If we need an option then maybe we should have
it turn on cloning rather than restore default cp behaviour?
The side effects I thought of earlier, of COW without corresponding allocation
were possible fragmentation on write or unexpected/mishandled ENOSPC.
Also for endangered mechanical disks, subsequent processing could
be slowed as the head seeks between the old and new data to be copied.
Perhaps these are a small price to pay, especially considering that
solid state disks will only be affected by the write()=ENOSPC issue.

At the moment we have these linking options:

cp -l, --link #for hardlinks
cp -s, --symbolic-link #for symlinks

So perhaps we should support:

cp --link={soft,hard,cow}
for symlink(), link() and reflink() respectively?
I.E. link to the name, inode or extents respectively.
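
The three kinds of link proposed above can be sketched with commands that exist today. This is only an illustration: `link_copy` is a hypothetical wrapper, not a cp interface, and the `cow` branch assumes a GNU cp new enough to have the `--reflink` option that was added after this thread.

```shell
# link_copy MODE SRC DST: hypothetical wrapper illustrating the three
# kinds of link discussed above.  Not an existing cp interface.
link_copy() {
  mode=$1 src=$2 dst=$3
  case "$mode" in
    soft) ln -s -- "$src" "$dst" ;;                # symlink(): link to the name
    hard) ln -- "$src" "$dst" ;;                   # link(): link to the inode
    cow)  cp --reflink=always -- "$src" "$dst" ;;  # share extents, fail if unsupported
    *)    echo "link_copy: unknown mode '$mode'" >&2; return 2 ;;
  esac
}
```

Note that for `soft` a relative SRC is stored as-is and resolved relative to the directory of DST, as with any `ln -s`.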
> There will soon be a reflink system call that can be used on ocfs2 and
> btrfs as well.  Thanks for adding this to glibc!
I was thinking there would be a generic syscall for this.
So cp should call reflink() instead when it becomes available.

thanks for the info!
Pádraig.

Joel Becker

2009-Jul-30 00:57 UTC

Re: BTRFS file clone support for cp

On Wed, Jul 29, 2009 at 07:14:37PM +0100, Pádraig Brady wrote:
> Chris Mason wrote:
> > On Wed, Jul 29, 2009 at 03:14:49PM +0100, Pádraig Brady wrote:
> >>
> >> We may need to play around with fallocate()
> >> if we want to get back to the original
> >> cp semantics of actually allocating space
> >> on the file system for the new file.
> >
> > Well, best to just use the original cp code.  I was talking with
> > Giuseppe about this as well, I think we should the option to do regular
> > cp via a flag.
> 
> Right. Well we can turn off this cloning by doing --sparse={never,always}
> but that has side effects. If we need an option then maybe we should have
> it turn on cloning rather than restore default cp behaviour?
> The side effects I thought of earlier, of COW without corresponding allocation
> were possible fragmentation on write or unexpected/mishandled ENOSPC.
> Also for endangered mechanical disks, subsequent processing could
> be slowed as the head seeks between the old and new data to be copied.
> Perhaps these are a small price to pay, especially considering that
> solid state disks will only be affected by the write()=ENOSPC issue.
> 
> At the moment we have these linking options:
> 
> cp -l, --link #for hardlinks
> cp -s, --symbolic-link #for symlinks
> 
> So perhaps we should support:
> 
> cp --link={soft,hard,cow}
> for symlink(), link() and reflink() respectively?
> I.E. link to the name, inode or extents respectively.
	I've cooked up 'ln -r' for reflinks, which works for ln(1) but
not for cp(1).  I have a git tree with the (in-flux) code on
oss.oracle.com:

[View]
http://oss.oracle.com/git/?p=jlbec/reflink.git;a=summary
[Pull]
git://oss.oracle.com/git/jlbec/reflink.git master

	This repository isn't designed to be an authoritative patch for
coreutils.  Instead it provides a reflink(1) program that is actually ln
-r in disguise.  Later work would be to get coreutils updated "properly".

Joel

-- 

"This is the end, beautiful friend.
 This is the end, my only friend the end
 Of our elaborate plans, the end
 Of everything that stands, the end
 No safety or surprise, the end
 I'll never look into your eyes again."

Joel Becker
Principal Software Developer
Oracle
E-mail: joel.becker@oracle.com
Phone: (650) 506-8127

Jim Meyering

2009-Jul-30 07:39 UTC

Re: BTRFS file clone support for cp

Joel Becker wrote:
> On Wed, Jul 29, 2009 at 07:14:37PM +0100, Pádraig Brady wrote:
>> Chris Mason wrote:
>> > On Wed, Jul 29, 2009 at 03:14:49PM +0100, Pádraig Brady wrote:
>> >>
>> >> We may need to play around with fallocate()
>> >> if we want to get back to the original
>> >> cp semantics of actually allocating space
>> >> on the file system for the new file.
>> >
>> > Well, best to just use the original cp code.  I was talking with
>> > Giuseppe about this as well, I think we should the option to do regular
>> > cp via a flag.
>>
>> Right. Well we can turn off this cloning by doing --sparse={never,always}
>> but that has side effects. If we need an option then maybe we should have
>> it turn on cloning rather than restore default cp behaviour?
>> The side effects I thought of earlier, of COW without corresponding allocation
>> were possible fragmentation on write or unexpected/mishandled ENOSPC.
>> Also for endangered mechanical disks, subsequent processing could
>> be slowed as the head seeks between the old and new data to be copied.
>> Perhaps these are a small price to pay, especially considering that
>> solid state disks will only be affected by the write()=ENOSPC issue.
>>
>> At the moment we have these linking options:
>>
>> cp -l, --link #for hardlinks
>> cp -s, --symbolic-link #for symlinks
>>
>> So perhaps we should support:
>>
>> cp --link={soft,hard,cow}
>> for symlink(), link() and reflink() respectively?
>> I.E. link to the name, inode or extents respectively.
>
> 	I've cooked up 'ln -r' for reflinks, which works for ln(1) but
> not for cp(1).
Thanks.  I haven't looked, but after reading about the reflink syscall
[http://lwn.net/Articles/332802/] had come to the same conclusion:
this feature belongs with ln rather than with cp.

Besides, putting the new behavior on a new option avoids
the current semantic change we would otherwise induce in cp.

Joel Becker

2009-Jul-30 08:21 UTC

Re: BTRFS file clone support for cp

On Thu, Jul 30, 2009 at 09:39:17AM +0200, Jim Meyering wrote:
> Joel Becker wrote:
> > 	I've cooked up 'ln -r' for reflinks, which works for ln(1) but
> > not for cp(1).
> 
> Thanks.  I haven't looked, but after reading about the reflink syscall
> [http://lwn.net/Articles/332802/] had come to the same conclusion:
> this feature belongs with ln rather than with cp.
> 
> Besides, putting the new behavior on a new option avoids
> the current semantic change we would otherwise induce in cp.
	Well, I don't see any reason cp(1) can't take advantage of
reflink(2).  I just think that cp(1) should look at reflink(2) as an
optimization, not a specific methodology.
	What do I mean?  If you want to say "I know what a reflink is,
and that's exactly what I want", you want "ln -r".  But say you want a
"cp --snap" that tries to take a snapshot regardless of the backend.  It
could use reflink(2) on filesystems that support it, or perhaps a
passthrough call to the underlying storage, or who knows what.  I can
also imagine a "cp --shallow" that is "if you can cow, do it, otherwise
do a normal cp".
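
The "if you can cow, do it, otherwise do a normal cp" behaviour can be approximated in a few lines of shell. A rough sketch, assuming a GNU cp that has the later `--reflink` option; `shallow_copy` is a hypothetical name standing in for the imagined "cp --shallow":

```shell
# shallow_copy SRC DST: hypothetical helper in the spirit of "cp --shallow".
# Try to share extents with the source; if this cp or the file system
# cannot do that, fall back to an ordinary data copy.
shallow_copy() {
  cp --reflink=always -- "$1" "$2" 2>/dev/null || cp -- "$1" "$2"
}
```

GNU cp's own `--reflink=auto` performs the same fallback internally; the explicit form above just makes the two steps visible.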

Joel

-- 

"I think it would be a good idea."  
        - Mahatma Gandhi, when asked what he thought of Western
          civilization

Joel Becker
Principal Software Developer
Oracle
E-mail: joel.becker@oracle.com
Phone: (650) 506-8127

Pádraig Brady

2009-Jul-30 08:40 UTC

Re: BTRFS file clone support for cp

Jim Meyering wrote:
> Joel Becker wrote:
> 
>> On Wed, Jul 29, 2009 at 07:14:37PM +0100, Pádraig Brady wrote:
>>>
>>> At the moment we have these linking options:
>>>
>>> cp -l, --link #for hardlinks
>>> cp -s, --symbolic-link #for symlinks
>>>
>>> So perhaps we should support:
>>>
>>> cp --link={soft,hard,cow}
>>> for symlink(), link() and reflink() respectively?
>>> I.E. link to the name, inode or extents respectively.
>>
>> 	I've cooked up 'ln -r' for reflinks, which works for ln(1) but
>> not for cp(1).
> 
> Thanks.  I haven't looked, but after reading about the reflink syscall
> [http://lwn.net/Articles/332802/] had come to the same conclusion:
> this feature belongs with ln rather than with cp.
Right. It definitely should be in ln anyway.
> Besides, putting the new behavior on a new option avoids
> the current semantic change we would otherwise induce in cp.
Yes doing reflink() in cp by default currently can
be problematic as discussed, especially on mechanical hard disks.
Though in future I can see most users of cp preferring
reflink() to be done, rather than read()/write(). Ponder...

In any case putting --link=cow or --reflink or whatever in cp
could be very useful for creating writeable snapshot branches.

cheers,
Pádraig.

Andi Kleen

2009-Jul-30 09:26 UTC

Re: BTRFS file clone support for cp

Jim Meyering <jim@meyering.net> writes:
>
> Thanks.  I haven't looked, but after reading about the reflink syscall
> [http://lwn.net/Articles/332802/] had come to the same conclusion:
> this feature belongs with ln rather than with cp.
cp already has -l so it would make sense to extend that too.
> Besides, putting the new behavior on a new option avoids
> the current semantic change we would otherwise induce in cp.
I don't see how semantics change in a user visible way.

-Andi

-- 
ak@linux.intel.com -- Speaking for myself only.

Pádraig Brady

2009-Jul-30 10:02 UTC

Re: BTRFS file clone support for cp

Andi Kleen wrote:
> Jim Meyering <jim@meyering.net> writes:
>> Thanks.  I haven't looked, but after reading about the reflink syscall
>> [http://lwn.net/Articles/332802/] had come to the same conclusion:
>> this feature belongs with ln rather than with cp.
> 
> cp already has -l so it would make sense to extend that too.
> 
>> Besides, putting the new behavior on a new option avoids
>> the current semantic change we would otherwise induce in cp.
> 
> I don't see how semantics change in a user visible way.
I was thinking that doing reflink() in cp has the following
user visible advantages/disadvantages:

Advantages:
  very quick copy
  less space used

Disadvantages:
  disk head seeking deferred to modification process
  possible fragmentation on write
  possible ENOSPC on write

The disk head seeking issue will go away with time.
I'm not sure if the other disadvantages exist or whether
they could be alleviated with fallocate() or something.

cheers,
Pádraig.

Jim Meyering

2009-Jul-30 10:16 UTC

Re: BTRFS file clone support for cp

Andi Kleen wrote:
> Jim Meyering <jim@meyering.net> writes:
>>
>> Thanks.  I haven't looked, but after reading about the reflink syscall
>> [http://lwn.net/Articles/332802/] had come to the same conclusion:
>> this feature belongs with ln rather than with cp.
>
> cp already has -l so it would make sense to extend that too.
Good point.
>> Besides, putting the new behavior on a new option avoids
>> the current semantic change we would otherwise induce in cp.
>
> I don't see how semantics change in a user visible way.
With classic cp, if I copy a 1GB non-sparse file and there's less
space than that available, cp fails with ENOSPC.
With this new feature, it succeeds even if there are
just a few blocks available.

Also, consider (buggy!) code that then depends on being able to modify
that file in-place, and that "knows" it doesn't need to check for ENOSPC.
Sure, they should always check for write failure, but still.  It is
a change.
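
To make the point concrete, here is a small sketch of an in-place overwrite that does check for failure instead of assuming a non-extending write cannot hit ENOSPC. `overwrite_head` is a hypothetical helper name used only for illustration:

```shell
# overwrite_head FILE TEXT: overwrite the first bytes of FILE in place
# and report failure (for example ENOSPC on a CoW file system) instead
# of assuming that an in-place write cannot run out of space.
overwrite_head() {
  # conv=notrunc keeps the rest of the file; dd exits non-zero on error.
  if ! printf '%s' "$2" | dd of="$1" conv=notrunc 2>/dev/null; then
    echo "overwrite_head: write to $1 failed" >&2
    return 1
  fi
}
```

A caller that tests the return status behaves the same whether the target was copied the classic way or cloned.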

Tomasz Chmielewski

2009-Jul-30 10:21 UTC

Re: BTRFS file clone support for cp

Jim Meyering wrote:
> With classic cp, if I copy a 1GB non-sparse file and there's less
> space than that available, cp fails with ENOSPC.
> With this new feature, it succeeds even if there are
> just a few blocks available.
Is it good or bad?

> Also, consider (buggy!) code that then depends on being able to modify
> that file in-place, and that "knows" it doesn't need to check for ENOSPC.
> Sure, they should always check for write failure, but still.  It is
> a change.
On a multiuser system, that (buggy) tool would fail anyway if something 
else adds enough new data to the filesystem in the meantime.

But sure, it's a change.


-- 
Tomasz Chmielewski
http://wpkg.org

Andi Kleen

2009-Jul-30 10:54 UTC

Re: BTRFS file clone support for cp

> 
> With classic cp, if I copy a 1GB non-sparse file and there's less
> space than that available, cp fails with ENOSPC.
> With this new feature, it succeeds even if there are
> just a few blocks available.
> 
> Also, consider (buggy!) code that then depends on being able to modify
> that file in-place, and that "knows" it doesn't need to check for ENOSPC.
> Sure, they should always check for write failure, but still.  It is
> a change.
Fair point, although I suspect there are cases where ENOSPC
on non extending write can already happen on specific file systems. e.g. on 
btrfs it might happen when the tree gets rebalanced? Or perhaps on nilfs2
when the garbage collector doesn't run in time. Wouldn't surprise
me if there weren't more cases already.

-Andi

-- 
ak@linux.intel.com -- Speaking for myself only.

Ric Wheeler

2009-Jul-30 16:28 UTC

Re: BTRFS file clone support for cp

On 07/30/2009 04:40 AM, Pádraig Brady wrote:
> Jim Meyering wrote:
>
>> Joel Becker wrote:
>>
>>> On Wed, Jul 29, 2009 at 07:14:37PM +0100, Pádraig Brady wrote:
>>>
>>>> At the moment we have these linking options:
>>>>
>>>> cp -l, --link #for hardlinks
>>>> cp -s, --symbolic-link #for symlinks
>>>>
>>>> So perhaps we should support:
>>>>
>>>> cp --link={soft,hard,cow}
>>>> for symlink(), link() and reflink() respectively?
>>>> I.E. link to the name, inode or extents respectively.
>>>>
>>> 	I've cooked up 'ln -r' for reflinks, which works for ln(1) but
>>> not for cp(1).
>>>
>> Thanks.  I haven't looked, but after reading about the reflink syscall
>> [http://lwn.net/Articles/332802/] had come to the same conclusion:
>> this feature belongs with ln rather than with cp.
>>
>
> Right. It definitely should be in ln anyway.
>
>> Besides, putting the new behavior on a new option avoids
>> the current semantic change we would otherwise induce in cp.
>>
>
> Yes doing reflink() in cp by default currently can
> be problematic as discussed, especially on mechanical hard disks.
> Though in future I can see most users of cp preferring
> reflink() to be done, rather than read()/write(). Ponder...
>
I think that doing reflink by default would be a horrible idea - one 
good reason to copy a file is to increase your level of fault tolerance 
and reflink magically avoids that :-)

reflink is a neat feature, but should be used on purpose in my opinion,

ric
> In any case putting --link=cow or --reflink or whatever in cp
> could be very useful for creating writeable snapshot branches.
>
> cheers,
> Pádraig.
>    

Jim Meyering

2009-Jul-30 16:48 UTC

Re: BTRFS file clone support for cp

Ric Wheeler wrote:
> I think that doing reflink by default would be a horrible idea - one
> good reason to copy a file is to increase your level of fault
> tolerance and reflink magically avoids that :-)
Good point.
This would constitute another user-visible semantic change in cp:
a disk fault that affects any non-metadata block of a ref-linked file
affects both copies.

GNU cp will soon attempt this only when a --reflink option is specified.

Giuseppe Scrivano

2009-Jul-30 17:28 UTC

Re: BTRFS file clone support for cp

Hi Pádraig,

thanks for the comments.

Pádraig Brady <P@draigBrady.com> writes:
> # 300MB seems to be the minimum size for a btrfs with default
> parameters.
Actually, it seems the minimum space required is 256MB.  Using a 255MB
image I get: "device btrfs.img is too small (must be at least 256 MB)"

> # FIXME: use `truncate --allocate` when it becomes available, which
> # may allow unmarking this as an expensive test.
Are you sure that this feature will make the test less expensive?  Still
the test files must be written there, so in the best case (considering
the fallocate done in 0s) only the dd cost will be saved but still it
looks like an expensive test.

In the version I attached, I am using a sparse file (truncate --size)
and it seems to work fine.  Is it correct or am I missing something?

I haven't looked yet but probably there are other tests that can take
advantage of sparse files instead of using "dd".
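
The difference between a sparse image and one written with dd can be checked by comparing apparent size with allocated size. A minimal sketch, assuming GNU truncate and a file system that supports sparse files; both function names are hypothetical and are not part of the test suite:

```shell
# make_sparse_image FILE SIZE: create FILE with an apparent size of SIZE
# without writing any data blocks (truncate is from GNU coreutils).
make_sparse_image() {
  truncate --size="$2" "$1"
}

# report_sizes FILE: print "<apparent bytes> <allocated KB>".
report_sizes() {
  # The arithmetic expansion strips any padding some wc versions emit.
  apparent=$(( $(wc -c < "$1") ))
  allocated=$(du -k "$1" | awk '{ print $1 }')
  echo "$apparent $allocated"
}

# Example:
# make_sparse_image btrfs.img 256M
# report_sizes btrfs.img
```

On a file system with sparse-file support the second number should stay far below the first until mkfs and the test files start filling the image.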

I am also considering Jim's note about doing the umount in the cleanup_
function.

Cheers,
Giuseppe


From 7add4b337b7db0a63bca0dd0fe0f146f175163f8 Mon Sep 17 00:00:00 2001
From: Giuseppe Scrivano <gscrivano@gnu.org>
Date: Wed, 29 Jul 2009 20:31:20 +0200
Subject: [PATCH] tests: add a test for btrfs' copy-on-write file clone operation

* tests/Makefile.am: Consider the new test.
* tests/cp/file-clone: New file.
---
 tests/Makefile.am   |    1 +
 tests/cp/file-clone |   58 +++++++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 59 insertions(+), 0 deletions(-)
 create mode 100755 tests/cp/file-clone

diff --git a/tests/Makefile.am b/tests/Makefile.am
index 59737a0..9841aa3 100644
--- a/tests/Makefile.am
+++ b/tests/Makefile.am
@@ -20,6 +20,7 @@ EXTRA_DIST =		\
 
 root_tests =					\
   chown/basic					\
+  cp/file-clone				\
   cp/cp-a-selinux				\
   cp/preserve-gid				\
   cp/special-bits				\
diff --git a/tests/cp/file-clone b/tests/cp/file-clone
new file mode 100755
index 0000000..c65b9cb
--- /dev/null
+++ b/tests/cp/file-clone
@@ -0,0 +1,58 @@
+#!/bin/sh
+# Make sure file-clone on a btrfs file system works properly.
+
+# Copyright (C) 2009 Free Software Foundation, Inc.
+
+# This program is free software: you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation, either version 3 of the License, or
+# (at your option) any later version.
+
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+
+# You should have received a copy of the GNU General Public License
+# along with this program.  If not, see <http://www.gnu.org/licenses/>.
+
+
+if test "$VERBOSE" = yes; then
+  set -x
+  cp --version
+fi
+
+. $srcdir/test-lib.sh
+
+require_root_
+require_sparse_support_
+#expensive_
+
+cleanup_(){ umount btrfs; }
+
+fail=0
+
+mkfs.btrfs --version || skip_test_ "btrfs userland tools not installed"
+
+# 256MB seems to be the minimum size for a btrfs with default parameters.
+truncate --size=256M btrfs.img  || framework_failure
+
+mkfs.btrfs btrfs.img  || framework_failure
+
+mkdir btrfs || framework_failure
+
+mount -t btrfs -o loop btrfs.img btrfs || framework_failure
+
+dd bs=1M count=200 if=/dev/zero of=btrfs/alloc.test || framework_failure
+
+# If the file is cloned, only additional space for metadata is required.
+# Two 200MB files can be present even if the total file system space is 256MB.
+cp btrfs/alloc.test btrfs/clone.test || fail=1
+rm btrfs/clone.test
+
+# When --sparse={always,never} is used, the file is copied without any cloning.
+# Use --sparse=never to be sure the file is copied without holes and it is not
+# possible since there is not enough free space.
+cp --sparse=never btrfs/alloc.test btrfs/clone.test && fail=1
+
+Exit $fail
-- 
1.6.3.3

Joel Becker

2009-Jul-30 18:05 UTC

Re: BTRFS file clone support for cp

On Thu, Jul 30, 2009 at 12:54:16PM +0200, Andi Kleen wrote:
> > With classic cp, if I copy a 1GB non-sparse file and there's less
> > space than that available, cp fails with ENOSPC.
> > With this new feature, it succeeds even if there are
> > just a few blocks available.
> > 
> > Also, consider (buggy!) code that then depends on being able to modify
> > that file in-place, and that "knows" it doesn't need to check for ENOSPC.
> > Sure, they should always check for write failure, but still.  It is
> > a change.
> 
> Fair point, although I suspect there are cases where ENOSPC
> on non extending write can already happen on specific file systems. e.g. on
> btrfs it might happen when the tree gets rebalanced? Or perhaps on nilfs2
> when the garbage collector doesn't run in time. Wouldn't surprise
> me if there weren't more cases already.
	In some sense, using btrfs, nilfs2, ocfs2 with refcount trees
enabled, or any other CoW-ish filesystem is a tacit approval of the
delayed ENOSPC.  The same can be said of "thin provisioning" LUNs.
However, the other concerns are still valid.  A user invoking vanilla
cp(1) expects two independent storage regions for the data.
	(Oh, and what about future support of de-duping in filesystems?
:-)

Joel

-- 

"Anything that is too stupid to be spoken is sung."  
        - Voltaire

Joel Becker
Principal Software Developer
Oracle
E-mail: joel.becker@oracle.com
Phone: (650) 506-8127

Pádraig Brady

2009-Jul-30 23:28 UTC

Re: BTRFS file clone support for cp

Joel Becker wrote:
> 	In some sense, using btrfs, nilfs2, ocfs2 with refcount trees
> enabled, or any other CoW-ish filesystem is a tacit approval of the
> delayed ENOSPC.  The same can be said of "thin provisioning" LUNs.
> However, the other concerns are still valid.  A user invoking vanilla
> cp(1) expects two independent storage regions for the data.
> 	(Oh, and what about future support of de-duping in filesystems?
> :-)
I maintain an app to de-dupe at http://www.pixelbeat.org/fslint/
and I'll be adding reflink support as soon as it becomes available.
From a filesystem point of view, one thing that would help speed
this up (and many other things like rsync etc.) would be to allow
one to associate say a sha-3 hash or whatever with the file, which
the filesystem would automatically clear when the file data changes.
So in general having a special set of extended attributes that
were auto cleared on file modification would be very useful for
lots of stuff.

cheers,
Pádraig.