Oh hai Sorry if it''s already fixed, but at least in fedora''s 2.6.32.3-10.fc12.i686.PAE btrfs has a regression which can be illustrated by running the following dumb test: === cut here ==#define _GNU_SOURCE #define _FILE_OFFSET_BITS 64 #include <sys/types.h> #include <sys/stat.h> #include <fcntl.h> int main() { int fd = open("testfile", O_CLOEXEC | O_CREAT | O_NOATIME | O_WRONLY, 0666); fallocate(fd, 0, 0, 100); write(fd, "test", 4); close(fd); } === cut here == if you run it on ext4, it will create a 4-byte file with "test" in it. On btrfs, however, the file size would be 4096, and the remaining space will be filled with zeroes. I will try more recent kernels with that though. -- This message represents the official view of the voices in my head -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
> if you run it on ext4, it will create a 4-byte file with "test" in it.> On btrfs, however, the file size would be 4096, and the remaining > space will be filled with zeroes. My fallocate man page says: Because allocation is done in block size chunks, fallocate() may allocate a larger range than that which was specified. so the btrfs behavior seems OK to me. You say this is a regression. What btrfs version behaved differently? - R. -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Thu, Jan 14, 2010 at 5:27 PM, Roland Dreier <rdreier@cisco.com> wrote:> My fallocate man page says: > > Because allocation is done in block size chunks, fallocate() may > allocate a larger range than that which was specified. > > so the btrfs behavior seems OK to me. > > You say this is a regression. What btrfs version behaved differently?Err. My wording may not be very precise. I see this as the regression from every other filesystem, to the extent that rsync --preallocate will give you directory tree full of corrupted files. Also, your man page also says something else: If FALLOC_FL_KEEP_SIZE flag is not specified in mode, the default behavior is almost same as when this flag is specified. The only difference is that on success, the file size will be changed if offset + len is greater than the file size. This default behavior closely resembles the behavior of the posix_fallocate(3) library function, and is intended as a method of optimally implementing that function. See, here nothing says to which value file size will be changed. Also, posix_fallocate man page (which is actually a shortcut to fallocate(fd, 0, ...)) says: If the size of the file is less than offset+len, then the file is increased to this size; otherwise the file size is left unchanged. Nowhere there it is said that size is ceil()''d to a blocksize. Moreover, I never saw an app that does fallocate() and then ftruncate() - every single app out there assumes that posix_fallocate (and fallocate, fwiw), while may allocate more space, will set the file size to something different than offset+len. -- This message represents the official view of the voices in my head -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
2010/1/14 i <i@stingr.net>:> Nowhere there it is said that size is ceil()''d to a blocksize. Moreover, I never saw an app that does fallocate() and then ftruncate() - every single app out there assumes that posix_fallocate (and fallocate, fwiw), while may allocate more space, will set the file size to something different than offset+len.Err, of course it''s to just offset+len. I may need more sleep (this won''t purge away the fallocate problem though). -- This message represents the official view of the voices in my head -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Thu, 14 Jan 2010 09:27:38 -0800, Roland Dreier <rdreier@cisco.com> wrote:> > > if you run it on ext4, it will create a 4-byte file with "test" in it. > > On btrfs, however, the file size would be 4096, and the remaining > > space will be filled with zeroes. > > My fallocate man page says: > > Because allocation is done in block size chunks, fallocate() may > allocate a larger range than that which was specified. >But with zero as the mode value it should not have updated i_size. Looking at the latest linus tree it seems to be doing the right thing -aneesh -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Thu, Jan 14, 2010 at 09:27:38AM -0800, Roland Dreier wrote:> > > if you run it on ext4, it will create a 4-byte file with "test" in it. > > On btrfs, however, the file size would be 4096, and the remaining > > space will be filled with zeroes. > > My fallocate man page says: > > Because allocation is done in block size chunks, fallocate() may > allocate a larger range than that which was specified. > > so the btrfs behavior seems OK to me. > > You say this is a regression. What btrfs version behaved differently?The file size should still be 4 bytes I think, even if we allocate 4096. -chris -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Thu, Jan 14, 2010 at 7:20 PM, Chris Mason <chris.mason@oracle.com> wrote:> The file size should still be 4 bytes I think, even if we allocate 4096.Yes, to clarify - I changed the code to be fallocate(fd, 0, 0, 4); and then posix_fallocate(fd, 0, 4); (which is the same I guess) and tried it on different filesystems, and on btrfs I ended up with entire block (of zeroes). If it''s fixed in latest tree it''s fine, I guess that fix isn''t in fedora''s 2.6.32.3 -- This message represents the official view of the voices in my head -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Thu, Jan 14, 2010 at 8:33 PM, Paul Komkoff <i@stingr.net> wrote:> If it''s fixed in latest tree it''s fine, I guess that fix isn''t in > fedora''s 2.6.32.3Sorry for popping up again, but did anyone fix this/verified there''s no problem in recent kernels? For some reasons I cannot run latest git so I''m stuck with fedora kernels, and every one I have around me (with btrfs) has this problem. Thanks. -- This message represents the official view of the voices in my head -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Tue, 19 Jan 2010 15:18:26 +0000, Paul Komkoff <i@stingr.net> wrote:> On Thu, Jan 14, 2010 at 8:33 PM, Paul Komkoff <i@stingr.net> wrote: > > If it''s fixed in latest tree it''s fine, I guess that fix isn''t in > > fedora''s 2.6.32.3 > > Sorry for popping up again, but did anyone fix this/verified there''s > no problem in recent kernels? For some reasons I cannot run latest git > so I''m stuck with fedora kernels, and every one I have around me (with > btrfs) has this problem. >the below change fixes this for me on btrfs commit f2bc9dd07e3424c4ec5f3949961fe053d47bc825 Author: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Date: Wed Jan 20 12:57:53 2010 +0530 btrfs: Use correct values when updating inode i_size on fallocate Even though we allocate more, we should be updating inode i_size as per the arguments passed Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c index 5440bab..db406a4 100644 --- a/fs/btrfs/inode.c +++ b/fs/btrfs/inode.c @@ -5789,7 +5789,7 @@ out_fail: } static int prealloc_file_range(struct inode *inode, u64 start, u64 end, - u64 alloc_hint, int mode) + u64 alloc_hint, int mode, loff_t actual_len) { struct btrfs_trans_handle *trans; struct btrfs_root *root = BTRFS_I(inode)->root; @@ -5798,6 +5798,7 @@ static int prealloc_file_range(struct inode *inode, u64 start, u64 end, u64 cur_offset = start; u64 num_bytes = end - start; int ret = 0; + u64 i_size; while (num_bytes > 0) { alloc_size = min(num_bytes, root->fs_info->max_extent); @@ -5836,8 +5837,12 @@ static int prealloc_file_range(struct inode *inode, u64 start, u64 end, BTRFS_I(inode)->flags |= BTRFS_INODE_PREALLOC; if (!(mode & FALLOC_FL_KEEP_SIZE) && cur_offset > inode->i_size) { - i_size_write(inode, cur_offset); - btrfs_ordered_update_i_size(inode, cur_offset, NULL); + if (cur_offset > actual_len) + i_size = actual_len; + else + i_size = cur_offset; + i_size_write(inode, i_size); + btrfs_ordered_update_i_size(inode, i_size, NULL); } ret = btrfs_update_inode(trans, root, inode); @@ -5930,7 +5935,7 @@ static long btrfs_fallocate(struct inode *inode, int mode, !test_bit(EXTENT_FLAG_PREALLOC, &em->flags))) { ret = prealloc_file_range(inode, cur_offset, last_byte, - alloc_hint, mode); + alloc_hint, mode, offset+len); if (ret < 0) { free_extent_map(em); break; -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Wed, Jan 20, 2010 at 7:28 AM, Aneesh Kumar K. V <aneesh.kumar@linux.vnet.ibm.com> wrote:> the below change fixes this for me on btrfsThanks a million, now I guess we''re waiting for Chris to pull it. Will it qualify for stable update? -- This message represents the official view of the voices in my head -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html