Marti Raudsepp
2011-Feb-13  15:49 UTC
btrfs: compression breaks cp and cross-FS mv, FS_IOC_FIEMAP bug?
Hi list!
It seems I have found a serious regression in compressed btrfs in
kernel 2.6.37. When creating a small file (less than the block size)
and then cp/mv it to *another* file system, an appropriate number of
zeroes gets written to the destination file. Case in point:
% echo foobar > foobar
% hexdump -C foobar
00000000  66 6f 6f 62 61 72 0a                              |foobar.|
00000007
% mv foobar /tmp
% hexdump -C /tmp/foobar
00000000  00 00 00 00 00 00 00                              |.......|
00000007
% cp foobar foobar2
% hexdump -C foobar2
00000000  00 00 00 00 00 00 00                              |.......|
00000007
Via strace I found that mv doesn't even attempt to read anything:
open("foobar", O_RDONLY|O_NOFOLLOW)     = 3
fstat(3, {st_mode=S_IFREG|0664, st_size=7, ...}) = 0
open("/tmp/foobar", O_WRONLY|O_CREAT|O_EXCL, 0600) = 4
fstat(4, {st_mode=S_IFREG|0600, st_size=0, ...}) = 0
ioctl(3, FS_IOC_FIEMAP, 0x7fff62f6bfa0) = 0
write(4, "\0\0\0\0\0\0\0", 7)           = 7
What's that, is FS_IOC_FIEMAP telling it that it's a sparse file?
Compare with ext4:
ioctl(3, FS_IOC_FIEMAP, 0x7fff2c576a90) = 0
lseek(3, 0, SEEK_SET)                   = 0
read(3, "foobar\n", 4096)               = 7
write(4, "foobar\n", 7)                 = 7
I'm currently running on 2.6.37, x86_64 using Arch Linux -testing with
coreutils 8.10. Filesystem is mounted from LVM2 to /usr/src with -o
noatime,compress
This only seems to occur with compressed file systems (either zlib or
LZO). A person on IRC also reproduced the same problem in 2.6.28-rc.
I'm pretty sure this used to work correctly around 2.6.35 or 2.6.36.
This is 100% reproducible here. If anyone has trouble reproducing
this, I can dig further and provide information as needed.
Regards,
Marti
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
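The strace output above is consistent with a copy that trusts the extent map and reads only the ranges it reports. The following is a toy model of that behaviour, not the coreutils implementation; the function name `extent_based_copy` and the `(logical, length)` tuples are hypothetical and used for illustration only.

```python
from typing import Iterable, Tuple


def extent_based_copy(data: bytes, extents: Iterable[Tuple[int, int]]) -> bytes:
    """Model a copy that reads only the ranges an extent map reports.

    Anything outside the reported extents is treated as a hole and
    therefore ends up as zero bytes in the destination.
    """
    out = bytearray(len(data))  # destination starts out as all zeroes
    for logical, length in extents:
        end = min(logical + length, len(data))
        out[logical:end] = data[logical:end]  # "read" and "write" one mapped range
    return bytes(out)


src = b"foobar\n"

# Assumed failing case: the map reports no extents, so nothing is read.
zeroed = extent_based_copy(src, [])

# ext4-like case: one extent covers the first block, so the data is read.
copied = extent_based_copy(src, [(0, 4096)])

print(zeroed)
print(copied)
```

Under this model an empty extent map for a 7-byte file yields seven zero bytes, which matches the hexdump of the destination file.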
Josef Bacik
2011-Feb-13  15:57 UTC
Re: btrfs: compression breaks cp and cross-FS mv, FS_IOC_FIEMAP bug?
On Sun, Feb 13, 2011 at 05:49:42PM +0200, Marti Raudsepp wrote:
> It seems I have found a serious regression in compressed btrfs in
> kernel 2.6.37. When creating a small file (less than the block size)
> and then cp/mv it to *another* file system, an appropriate number of
> zeroes gets written to the destination file.

[snip]

> This is 100% reproducible here. If anyone has trouble reproducing
> this, I can dig further and provide information as needed.

Does the same problem happen when you use cp --sparse=never?

Thanks,
Josef
Marti Raudsepp
2011-Feb-13  16:07 UTC
Re: btrfs: compression breaks cp and cross-FS mv, FS_IOC_FIEMAP bug?
On Sun, Feb 13, 2011 at 17:57, Josef Bacik <josef@redhat.com> wrote:
> Does the same problem happen when you use cp --sparse=never?

You are right. cp --sparse=never does not cause data loss.

Regards,
Marti
Josef Bacik
2011-Feb-13  16:13 UTC
Re: btrfs: compression breaks cp and cross-FS mv, FS_IOC_FIEMAP bug?
On Sun, Feb 13, 2011 at 06:07:36PM +0200, Marti Raudsepp wrote:
> On Sun, Feb 13, 2011 at 17:57, Josef Bacik <josef@redhat.com> wrote:
> > Does the same problem happen when you use cp --sparse=never?
>
> You are right. cp --sparse=never does not cause data loss.

So fiemap probably isn't doing the right thing when compression is enabled,
which doesn't surprise me since we don't do the right thing with delalloc
either. I will try and get to this soon.

Thanks,
Josef
Hugo Mills
2011-Feb-13  16:31 UTC
Re: btrfs: compression breaks cp and cross-FS mv, FS_IOC_FIEMAP bug?
On Sun, Feb 13, 2011 at 05:49:42PM +0200, Marti Raudsepp wrote:
> Hi list!
>
> It seems I have found a serious regression in compressed btrfs in
> kernel 2.6.37. When creating a small file (less than the block size)
> and then cp/mv it to *another* file system, an appropriate number of
> zeroes gets written to the destination file. Case in point:

[snip]

> I'm currently running on 2.6.37, x86_64 using Arch Linux -testing with
> coreutils 8.10. Filesystem is mounted from LVM2 to /usr/src with -o
> noatime,compress
>
> This only seems to occur with compressed file systems (either zlib or
> LZO). A person on IRC also reproduced the same problem in 2.6.28-rc.
> I'm pretty sure this used to work correctly around 2.6.35 or 2.6.36.

This would seem to be the same effect that we've had reported on
IRC by at least two Gentoo users, of files full of zeroes in their
build system. We'll follow up with them over there and see if it's the
same bug.

Hugo.

--
=== Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk ==
PGP key: 515C238D from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
--- I must be musical: I've got *loads* of CDs ---
Chris Mason
2011-Feb-14  15:01 UTC
Re: btrfs: compression breaks cp and cross-FS mv, FS_IOC_FIEMAP bug?
Excerpts from Josef Bacik's message of 2011-02-13 11:13:30 -0500:
> On Sun, Feb 13, 2011 at 06:07:36PM +0200, Marti Raudsepp wrote:
> > On Sun, Feb 13, 2011 at 17:57, Josef Bacik <josef@redhat.com> wrote:
> > > Does the same problem happen when you use cp --sparse=never?
> >
> > You are right. cp --sparse=never does not cause data loss.
>
> So fiemap probably isn't doing the right thing when compression is enabled,
> which doesn't surprise me since we don't do the right thing with delalloc
> either. I will try and get to this soon. Thanks,

This might be a bug in the cp code. We're setting the disk extent to
zero but setting different flags to say we're inline and compressed.
The cp fiemap code might be ignoring the flags?

Or, it could just be delalloc ;)

-chris
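To make the point about flags concrete, here is a small sketch that decodes an `fe_flags` value using the bit values from the kernel's `<linux/fiemap.h>`. The helper names `flag_names` and `must_read_range` are hypothetical, and so is the policy in `must_read_range`: treat every reported extent as data to be read through the filesystem unless it is marked unwritten, and never infer a hole from a physical offset of zero. The example flag combination is an assumption about what an inline, compressed extent might carry, not captured output.

```python
# Extent flag bits as defined in the kernel's <linux/fiemap.h>.
FIEMAP_EXTENT_FLAGS = {
    0x00000001: "LAST",
    0x00000002: "UNKNOWN",
    0x00000004: "DELALLOC",
    0x00000008: "ENCODED",
    0x00000080: "DATA_ENCRYPTED",
    0x00000100: "NOT_ALIGNED",
    0x00000200: "DATA_INLINE",
    0x00000400: "DATA_TAIL",
    0x00000800: "UNWRITTEN",
    0x00001000: "MERGED",
    0x00002000: "SHARED",
}
FIEMAP_EXTENT_UNWRITTEN = 0x00000800


def flag_names(fe_flags: int) -> list:
    """Return the names of all known flag bits set in fe_flags."""
    return [name for bit, name in FIEMAP_EXTENT_FLAGS.items() if fe_flags & bit]


def must_read_range(fe_flags: int) -> bool:
    """Assumed policy: a reported extent holds data unless it is unwritten.

    The physical offset is deliberately not consulted, so an inline or
    encoded extent with fe_physical == 0 is still read.
    """
    return not (fe_flags & FIEMAP_EXTENT_UNWRITTEN)


# Hypothetical flags for a small inline, compressed file.
inline_compressed = 0x001 | 0x008 | 0x100 | 0x200

print(flag_names(inline_compressed))
print(must_read_range(inline_compressed))
```

A tool following this policy never looks at the physical offset at all, so it would not matter that the disk extent is reported as zero.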
Marti Raudsepp
2011-Feb-14  17:58 UTC
Re: btrfs: compression breaks cp and cross-FS mv, FS_IOC_FIEMAP bug?
On Mon, Feb 14, 2011 at 17:01, Chris Mason <chris.mason@oracle.com> wrote:
> Or, it could just be delalloc ;)

I suspect delalloc. After creating the file, filefrag reports "1
extent found", but for some reason it doesn't actually print out
details of the extent.

After a "sync" call, the extent appears and "cp" starts working as expected:

% rm -f foo bar
% echo foo > foo
% sync
% filefrag -v foo
Filesystem type is: 9123683e
File size of foo is 4 (1 block, blocksize 4096)
 ext logical physical expected length flags
   0       0        0            4096 not_aligned,inline,eof
foo: 1 extent found
% cp foo bar
% hexdump bar
0000000 6f66 0a6f
0000004

Without sync:

% rm -f foo bar
% echo foo > foo
% filefrag -v foo
Filesystem type is: 9123683e
File size of foo is 4 (1 block, blocksize 4096)
 ext logical physical expected length flags
foo: 1 extent found
% cp foo bar
% hexdump bar
0000000 0000 0000
0000004

Regards,
Marti
Chris Mason
2011-Feb-14  18:01 UTC
Re: btrfs: compression breaks cp and cross-FS mv, FS_IOC_FIEMAP bug?
Excerpts from Marti Raudsepp's message of 2011-02-14 12:58:17 -0500:
> On Mon, Feb 14, 2011 at 17:01, Chris Mason <chris.mason@oracle.com> wrote:
> > Or, it could just be delalloc ;)
>
> I suspect delalloc. After creating the file, filefrag reports "1
> extent found", but for some reason it doesn't actually print out
> details of the extent.
>
> After a "sync" call, the extent appears and "cp" starts working as expected:

Great, that's a ton easier than fixing cp.

-chris
Pádraig Brady
2011-Feb-15  11:30 UTC
Re: btrfs: compression breaks cp and cross-FS mv, FS_IOC_FIEMAP bug?
On 14/02/11 17:58, Marti Raudsepp wrote:
> On Mon, Feb 14, 2011 at 17:01, Chris Mason <chris.mason@oracle.com> wrote:
>> Or, it could just be delalloc ;)
>
> I suspect delalloc. After creating the file, filefrag reports "1
> extent found", but for some reason it doesn't actually print out
> details of the extent.

That's a bug in `filefrag -v` that I noticed independently yesterday.
Without -v it will correctly report 0 extents.
I've already suggested a patch to fix upstream.

> After a "sync" call, the extent appears and "cp" starts working as expected:

About that sync.
I've noticed on ext4 loop back at least (and I suspect BTRFS is the same)
that specifying FIEMAP_FLAG_SYNC (which cp does) is ineffective.
I worked around this for cp tests by explicitly syncing with:
  dd if=/dev/null of=foo conv=notrunc,fdatasync

> % rm -f foo bar
> % echo foo > foo
> % sync
> % filefrag -v foo
> Filesystem type is: 9123683e
> File size of foo is 4 (1 block, blocksize 4096)
>  ext logical physical expected length flags
>    0       0        0            4096 not_aligned,inline,eof
> foo: 1 extent found
> % cp foo bar
> % hexdump bar
> 0000000 6f66 0a6f
> 0000004

OK that's fine for normal files.
cp (from coreutils >= 8.10) may still do the wrong thing
as it currently ignores FIEMAP_EXTENT_DATA_ENCRYPTED
and FIEMAP_EXTENT_ENCODED as I've already reported:
http://www.mail-archive.com/linux-btrfs@vger.kernel.org/msg08356.html

I'd appreciate some `filefrag -v` output from a large compressed file.

cheers,
Pádraig.
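The dd workaround above opens the file for writing without truncating it and then flushes its data. A rough Python equivalent might look like the sketch below. The function name `flush_file_data` is hypothetical, and unlike dd this version does not create the file if it is missing. It falls back to `os.fsync` on platforms where `os.fdatasync` is unavailable.

```python
import os
import tempfile


def flush_file_data(path: str) -> None:
    """Rough equivalent of: dd if=/dev/null of=PATH conv=notrunc,fdatasync

    Opens PATH for writing without O_TRUNC (so the contents are kept)
    and asks the kernel to write the file's data out to storage.
    """
    sync = getattr(os, "fdatasync", os.fsync)  # fdatasync is not available everywhere
    fd = os.open(path, os.O_WRONLY)
    try:
        sync(fd)
    finally:
        os.close(fd)


# Usage: write a small file, flush it, and confirm the contents are untouched.
with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "foo")
    with open(path, "wb") as f:
        f.write(b"foo\n")
    flush_file_data(path)
    with open(path, "rb") as f:
        content = f.read()

print(content)
```

Whether flushing a single file is enough to make its extent visible on a given filesystem is not something this sketch can show; the thread only reports that a global `sync` had that effect on btrfs.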
Josef Bacik
2011-Feb-15  13:18 UTC
Re: btrfs: compression breaks cp and cross-FS mv, FS_IOC_FIEMAP bug?
On Tue, Feb 15, 2011 at 11:30:38AM +0000, Pádraig Brady wrote:
> On 14/02/11 17:58, Marti Raudsepp wrote:
> > On Mon, Feb 14, 2011 at 17:01, Chris Mason <chris.mason@oracle.com> wrote:
> >> Or, it could just be delalloc ;)
> >
> > I suspect delalloc. After creating the file, filefrag reports "1
> > extent found", but for some reason it doesn't actually print out
> > details of the extent.
>
> That's a bug in `filefrag -v` that I noticed independently yesterday.
> Without -v it will correctly report 0 extents.
> I've already suggested a patch to fix upstream.
>
> > After a "sync" call, the extent appears and "cp" starts working as expected:
>
> About that sync.
> I've noticed on ext4 loop back at least (and I suspect BTRFS is the same)
> that specifying FIEMAP_FLAG_SYNC (which cp does) is ineffective.
> I worked around this for cp tests by explicitly syncing with:
>   dd if=/dev/null of=foo conv=notrunc,fdatasync

Well that's not good, that's all taken care of in the generic code before
it gets to the fs. I'll take a look at that when I try and fix delalloc
fiemap for btrfs.

Thanks,
Josef
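For readers unfamiliar with where FIEMAP_FLAG_SYNC sits, the sketch below packs a FIEMAP request buffer the way a userspace caller would before issuing the ioctl. The struct layouts and constants follow `<linux/fiemap.h>`; the helper `build_request` is hypothetical, and the ioctl number is computed with the x86-64 encoding, which is an assumption that does not hold on every architecture. The buffer is only built and inspected here, not passed to the kernel.

```python
import struct

FIEMAP_FLAG_SYNC = 0x00000001           # sync the file before mapping
FIEMAP_MAX_OFFSET = 0xFFFFFFFFFFFFFFFF  # map to the end of the file

# struct fiemap header: fm_start, fm_length (u64);
# fm_flags, fm_mapped_extents, fm_extent_count, fm_reserved (u32).
FIEMAP_HEADER = struct.Struct("=QQLLLL")

# struct fiemap_extent: fe_logical, fe_physical, fe_length, fe_reserved64[2] (u64);
# fe_flags, fe_reserved[3] (u32).
FIEMAP_EXTENT = struct.Struct("=QQQQQLLLL")

# _IOWR('f', 11, struct fiemap) as encoded on x86-64 (assumption for other arches).
FS_IOC_FIEMAP = (3 << 30) | (FIEMAP_HEADER.size << 16) | (ord("f") << 8) | 11


def build_request(extent_count: int, sync: bool = True) -> bytes:
    """Return a zeroed FIEMAP buffer with room for extent_count extents."""
    flags = FIEMAP_FLAG_SYNC if sync else 0
    header = FIEMAP_HEADER.pack(0, FIEMAP_MAX_OFFSET, flags, 0, extent_count, 0)
    return header + bytes(FIEMAP_EXTENT.size * extent_count)


request = build_request(extent_count=1)
fm_flags = FIEMAP_HEADER.unpack_from(request)[2]

print(len(request), hex(FS_IOC_FIEMAP), fm_flags)
```

The flag is a single bit in `fm_flags` of the request header, so a caller that sets it expects the file to be flushed before the extent map is produced.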