Hi everyone,

This pull request is pretty beefy, it ended up merging a number of long
running projects and cleanup queues. I've got btrfs patches in the new
kernel.org btrfs repo. There are two different branches with the same
changes. for-linus is against 3.1 and has also been tested against
Linus' tree as of yesterday.

git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs.git for-linus

The next-merged branch is the same btrfs code, but has a merge commit for
the current linux-next tree. There was only a single conflict, linux-next
has a fix for code that no longer exists, so the merge is just to take my
code. I know Linus won't end up using this, it's just to demonstrate
the conflict in case he ends up with that patch.

git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs.git next-merged

The big features in this pull are:

Many cleanups and optimizations from Josef. Many of these fixup the
enospc throttling in btrfs, where we try to start IO to make sure we can
do all the allocations we've promised we'll do. The end result is a
dramatic improvement in random write workloads among many others.

Arne Jansen and Jan Schmidt have improved the scrubber and provided
utilities to walk btrfs' many backrefs. The scrubber is much faster
thanks to extensive btree readahead and instead of just telling you a
specific block is bad, it tells you which btree or which file was
impacted by that bad block.

There are also progs updates to give you the same backref walking from
the command line.

SUSE and Fujitsu have a nice set of error handling fixes, and Li Zefan
also closed out some problems in the mount -o autodefrag mode.

I kicked in an array of backup tree roots. If a given mount fails to go
through because a tree root is bad, you can mount -o recovery and it'll
walk through the array and try older versions of the FS.

I also spent a lot of time refining Fujitsu's log tree improvements.
This code has been around for quite a while, and I really wanted to get
it in this time. But yesterday I hit corruptions when I mixed heavy
fsyncs with heavy snapshotting, and I wasn't able to fix things in time
for this merge window.

Josef Bacik (60) commits (+1847/-1238):
    Btrfs: don't check bytes_pinned to determine if we should commit the transaction (+0/-11)
    Btrfs: allow callers to specify if flushing can occur for btrfs_block_rsv_check (+10/-10)
    Btrfs: be smarter about committing the transaction in reserve_metadata_bytes (+67/-19)
    Btrfs: release metadata from global reserve if we have to fallback for unlink (+4/-1)
    Btrfs: wait for ordered extents if we're in trouble when shrinking delalloc (+17/-8)
    Btrfs: add a io_ctl struct and helpers for dealing with the space cache (+375/-318)
    Btrfs: seperate out btrfs_block_rsv_check out into 2 different functions (+37/-24)
    Btrfs: check the return value of filemap_write_and_wait in the space cache (+5/-2)
    Btrfs: don't increase the block_rsv's size when emergency allocating space (+0/-3)
    Btrfs: use the global reserve when truncating the free space cache inode (+17/-5)
    Btrfs: make sure to unset trans->block_rsv before running delayed refs (+11/-0)
    Btrfs: only reserve space in fallocate if we have to do a preallocate (+16/-6)
    Btrfs: stop passing a trans handle all around the reservation code (+39/-43)
    Btrfs: skip looking for delalloc if we don't have ->fill_delalloc (+5/-1)
    Btrfs: if we have a lot of pinned space, commit the transaction (+15/-0)
    Btrfs: release trans metadata bytes before flushing delayed refs (+3/-8)
    Btrfs: make a delayed_block_rsv for the delayed item insertion (+13/-7)
    Btrfs: ratelimit the generation printk for the free space cache (+7/-5)
    Btrfs: use the global reserve as a backup for deleting inodes (+11/-1)
    Btrfs: allow shrink_delalloc flush the needed reclaimed pages (+3/-2)
    Btrfs: only inherit btrfs specific flags when creating files (+11/-6)
    Btrfs: break out of orphan cleanup if we can't make progress (+11/-0)
    Btrfs: move stuff around in btrfs_inode to get better packing (+3/-3)
    Btrfs: reserve some space for an orphan item when unlinking (+8/-1)
    Btrfs: wait for ordered extents if we didn't reclaim enough (+1/-1)
    Btrfs: check unused against how much space we actually want (+1/-1)
    Btrfs: put the block group cache after we commit the super (+3/-3)
    Btrfs: inline checksums into the disk free space cache (+172/-68)
    Btrfs: use the inode's mapping mask for allocating pages (+18/-6)
    Btrfs: fix space leak when we fail to make an allocation (+14/-6)
    Btrfs: don't skip writing out a empty block groups cache (+6/-4)
    Btrfs: use the transactions block_rsv for the csum root (+10/-6)
    Btrfs: fix call to btrfs_search_slot in free space cache (+1/-1)
    Btrfs: use d_obtain_alias when mounting subvol/subvolid (+1/-24)
    Btrfs: allow us to overcommit our enospc reservations (+88/-18)
    Btrfs: don't get the block_rsv in btrfs_free_tree_block (+0/-4)
    Btrfs: handle enospc accounting for free space inodes (+47/-23)
    Btrfs: reduce the amount of space needed for truncates (+15/-4)
    Btrfs: kill the orphan space calculation for snapshots (+0/-90)
    Btrfs: use bytes_may_use for all ENOSPC reservations (+112/-82)
    Btrfs: optimize how we account for space in truncate (+29/-29)
    Btrfs: fix how we reserve space for deleting inodes (+38/-11)
    Btrfs: don't flush the cache inode before writing it (+0/-4)
    Btrfs: take overflow into account in reserving space (+1/-1)
    Btrfs: don't try to commit in btrfs_block_rsv_check (+4/-25)
    Btrfs: fix the amount of space reserved for unlink (+10/-1)
    Btrfs: fix regression in re-setting a large xattr (+11/-0)
    Btrfs: introduce mount option no_space_cache (+22/-10)
    Btrfs: delay iput when deleting a block group (+1/-1)
    Btrfs: kill btrfs_truncate_reserve_metadata (+0/-34)
    Btrfs: fix how we mount subvol=<whatever> (+135/-64)
    Btrfs: calculate checksum space correctly (+118/-8)
    Btrfs: kill the durable block rsv stuff (+17/-101)
    Btrfs: fix delayed insertion reservation (+49/-8)
    Btrfs: fix orphan cleanup regression (+17/-19)
    Btrfs: kill unused parts of block_rsv (+6/-22)
    Btrfs: introduce convert_extent_bit (+190/-0)
    Btrfs: set truncate block rsv's size (+2/-0)
    Btrfs: kill reserved_bytes in inode (+0/-8)
    Btrfs: stop using write_one_page (+20/-67)

Jan Schmidt (13) commits (+1954/-303):
    btrfs: new ioctls to do logical->inode and inode->path resolving (+162/-0)
    btrfs scrub: add fixup code for errors on nodatasum files (+183/-6)
    btrfs: integrating raid-repair and scrub-fixup-nodatasum (+67/-25)
    btrfs: Moved repair code from inode.c to extent_io.c (+393/-159)
    btrfs: added helper functions to iterate backrefs (+851/-1)
    btrfs: btrfs_multi_bio replaced with btrfs_bio (+90/-78)
    btrfs: Do not use bio->bi_bdev after submission (+1/-1)
    btrfs: add mirror_num to extent_read_full_page (+6/-6)
    btrfs scrub: print paths of corrupted files (+163/-6)
    btrfs scrub: use int for mirror_num, not u64 (+4/-4)
    btrfs scrub: bugfix: mirror_num off by one (+6/-6)
    btrfs scrub: added unverified_errors (+26/-11)
    btrfs: Put mirror_num in bi_bdev (+2/-0)

Chris Mason (10) commits (+490/-67):
    Btrfs: make sure to flush queued bios if write_cache_pages waits (+22/-10)
    Btrfs: don't wait as long for more batches during SSD log commit (+2/-2)
    Btrfs: fix extent_buffer leak in the metadata IO error handling (+1/-0)
    Btrfs: ClearPageError during writepage and clean_tree_block (+10/-1)
    Btrfs: make sure btrfs_remove_free_space doesn't leak EAGAIN (+3/-1)
    Btrfs: fix the new inspection ioctls for 32 bit compat (+15/-16)
    Btrfs: stop the readahead threads on failed mount (+1/-0)
    Btrfs: fix extent pinning bugs in the tree log (+59/-8)
    Btrfs: fix race during transaction joins (+8/-5)
    Btrfs: add a log of past tree roots (+369/-24)

Li Zefan (7) commits (+47/-24):
    Btrfs: fix wrong max_to_defrag in btrfs_defrag_file() (+1/-1)
    Btrfs: honor extent thresh during defragmentation (+26/-11)
    Btrfs: remove BUG_ON() in compress_file_range() (+5/-1)
    Btrfs: use i_size_read() in btrfs_defrag_file() (+4/-3)
    Btrfs: fix defragmentation regression (+4/-2)
    Btrfs: fix direct-io vs nodatacow (+1/-2)
    Btrfs: fix array bound checking (+6/-4)

Ilya Dryomov (6) commits (+22/-21):
    Btrfs: pass the correct root to lookup_free_space_inode() (+1/-1)
    Btrfs: rename btrfs_bio multi -> bbio for consistency (+15/-15)
    Btrfs: fix a potential btrfs_bio leak on scrub fixups (+1/-0)
    Btrfs: stop leaking btrfs_bios on readahead (+2/-0)
    Btrfs: fix a bug when opening seed devices (+1/-1)
    Btrfs: close all bdevs on mount failure (+2/-4)

Arne Jansen (6) commits (+1130/-70):
    btrfs: add an extra wait mode to read_extent_buffer_pages (+9/-6)
    btrfs: initial readahead code and prototypes (+967/-1)
    btrfs: add READAHEAD extent buffer flag (+35/-0)
    btrfs: state information for readahead (+31/-0)
    btrfs: use readahead API for scrub (+50/-62)
    btrfs: hooks for readahead (+38/-1)

David Sterba (3) commits (+105/-77):
    btrfs: do not allow mounting non-subvolumes via subvol option (+19/-0)
    btrfs: separate superblock items out of fs_info (+78/-76)
    btrfs: ratelimit WARN_ON in use_block_rsv (+8/-1)

Zheng Yan (1) commits (+4/-1):
    btrfs: check file extent backref offset underflow

Lukas Czerner (1) commits (+5/-1):
    btrfs: return EINVAL if start > total_bytes in fitrim ioctl

Daniel J Blueman (1) commits (+6/-2):
    btrfs: fix oops on failure path

Liu Bo (1) commits (+1/-1):
    Btrfs: do not set EXTENT_DIRTY along with EXTENT_DELALLOC

Diego Calleja (1) commits (+1/-3):
    btrfs: fix memory leak in btrfs_defrag_file

Jeff Liu (1) commits (+8/-2):
    btrfs: trivial fix, a potential memory leak in btrfs_parse_early_options()

Miao Xie (1) commits (+7/-0):
    Btrfs: fix race between multi-task space allocation and caching space

Tsutomu Itoh (1) commits (+7/-10):
    Btrfs: fix return value of btrfs_get_acl()

Total: (113) commits

 fs/btrfs/Makefile           |    3 +-
 fs/btrfs/acl.c              |   17 +-
 fs/btrfs/backref.c          |  776 +++++++++++++++++++++++++++++++++++
 fs/btrfs/backref.h          |   62 +++
 fs/btrfs/btrfs_inode.h      |   17 +-
 fs/btrfs/compression.c      |    3 +-
 fs/btrfs/ctree.c            |   10 +-
 fs/btrfs/ctree.h            |  198 ++++++++--
 fs/btrfs/delayed-inode.c    |   50 ++-
 fs/btrfs/disk-io.c          |  434 +++++++++++++++++---
 fs/btrfs/disk-io.h          |    4 +-
 fs/btrfs/extent-tree.c      |  848 +++++++++++++++++++++++---------------
 fs/btrfs/extent_io.c        |  614 +++++++++++++++++++++++++++-
 fs/btrfs/extent_io.h        |   23 +-
 fs/btrfs/file-item.c        |   17 +-
 fs/btrfs/file.c             |   25 +-
 fs/btrfs/free-space-cache.c |  926 +++++++++++++++++++++++++----------------
 fs/btrfs/inode-map.c        |    6 +-
 fs/btrfs/inode.c            |  457 +++++++--------------
 fs/btrfs/ioctl.c            |  227 +++++++++--
 fs/btrfs/ioctl.h            |   29 ++
 fs/btrfs/print-tree.c       |    8 +-
 fs/btrfs/reada.c            |  951 +++++++++++++++++++++++++++++++++++++++++++
 fs/btrfs/relocation.c       |   24 +-
 fs/btrfs/scrub.c            |  591 ++++++++++++++++++++++-----
 fs/btrfs/super.c            |  298 +++++++++-----
 fs/btrfs/transaction.c      |  146 +++----
 fs/btrfs/tree-log.c         |   19 +-
 fs/btrfs/volumes.c          |  207 ++++++----
 fs/btrfs/volumes.h          |   18 +-
 fs/btrfs/xattr.c            |   11 +
 31 files changed, 5416 insertions(+), 1603 deletions(-)

2011/11/6 Chris Mason <chris.mason@oracle.com>:

Hi Chris,
   and thanks a lot for your work.

> Arne Jansen and Jan Schmidt have improved the scrubber and provided
> utilities to walk btrfs' many backrefs. The scrubber is much faster
> thanks to extensive btree readahead and instead of just telling you a
> specific block is bad, it tells you which btree or which file was
> impacted by that bad block.

Using your for-linus branch, on latest Linus' git tree, with latest git tools,
I've got this:

root@Q45:/home/gelma/dev/prg/btrfs# ./btrfs scrub start -Br /dev/md126
ERROR: scrubbing /dev/md126 failed for device id 1 (Cannot allocate memory)
scrub canceled for 11827b37-1ba0-4b3e-883d-2746987724ca
        scrub started at Sun Nov 6 20:23:46 2011 and was aborted after 0 seconds
        total bytes scrubbed: 0.00 with 0 errors
root@Q45:/home/gelma/dev/prg/btrfs# ./btrfs scrub start -Br /home/
ERROR: scrubbing /home/ failed for device id 1 (Cannot allocate memory)
scrub canceled for 11827b37-1ba0-4b3e-883d-2746987724ca
        scrub started at Sun Nov 6 20:25:01 2011 and was aborted after 0 seconds
        total bytes scrubbed: 0.00 with 0 errors

Thanks a lot for your time,
Andrea

>
> There are also progs updates to give you the same backref walking from
> the command line.
>
> SUSE and Fujitsu have a nice set of error handling fixes, and Li Zefan
> also closed out some problems in the mount -o autodefrag mode.
>
> I kicked in an array of backup tree roots. If a given mount fails to go
> through because a tree root is bad, you can mount -o recovery and it'll
> walk through the array and try older versions of the FS.
>
> I also spent a lot of time refining Fujitsu's log tree improvements.
> This code has been around for quite a while, and I really wanted to get
> it in this time. But yesterday I hit corruptions when I mixed heavy
> fsyncs with heavy snapshotting, and I wasn't able to fix things in time
> for this merge window.

On Sun, Nov 06, 2011 at 01:38:51PM -0500, Chris Mason wrote:
> Hi everyone,
>
> This pull request is pretty beefy, it ended up merging a number of long
> running projects and cleanup queues. I've got btrfs patches in the new
> kernel.org btrfs repo. There are two different branches with the same
> changes. for-linus is against 3.1 and has also been tested against
> Linus' tree as of yesterday.
>
> git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs.git for-linus

Well, I had one more commit in my working directory that I neglected to
run git commit on. It fixes an oops during log replay. The new head of my
for-linus branch is:

commit 7c7e82a77fe3d89ae50824aa7c897454675eb4c4
Author: Chris Mason <chris.mason@oracle.com>
Date:   Sun Nov 6 18:50:56 2011 -0500

    Btrfs: check for a null fs root when writing to the backup root log

If you're pulling just that one commit it'll have this diffstat:

 disk-io.c |   13 ++++++++++---
 1 file changed, 10 insertions(+), 3 deletions(-)

The diffstat for the whole bunch is below. Sorry for the noise.

-chris

Total: (114) commits

 fs/btrfs/Makefile           |    3 +-
 fs/btrfs/acl.c              |   17 +-
 fs/btrfs/backref.c          |  776 +++++++++++++++++++++++++++++++++++
 fs/btrfs/backref.h          |   62 +++
 fs/btrfs/btrfs_inode.h      |   17 +-
 fs/btrfs/compression.c      |    3 +-
 fs/btrfs/ctree.c            |   10 +-
 fs/btrfs/ctree.h            |  198 ++++++++--
 fs/btrfs/delayed-inode.c    |   50 ++-
 fs/btrfs/disk-io.c          |  441 +++++++++++++++++---
 fs/btrfs/disk-io.h          |    4 +-
 fs/btrfs/extent-tree.c      |  848 +++++++++++++++++++++++---------------
 fs/btrfs/extent_io.c        |  614 +++++++++++++++++++++++++++-
 fs/btrfs/extent_io.h        |   23 +-
 fs/btrfs/file-item.c        |   17 +-
 fs/btrfs/file.c             |   25 +-
 fs/btrfs/free-space-cache.c |  926 +++++++++++++++++++++++++----------------
 fs/btrfs/inode-map.c        |    6 +-
 fs/btrfs/inode.c            |  457 +++++++--------------
 fs/btrfs/ioctl.c            |  227 +++++++++--
 fs/btrfs/ioctl.h            |   29 ++
 fs/btrfs/print-tree.c       |    8 +-
 fs/btrfs/reada.c            |  951 +++++++++++++++++++++++++++++++++++++++++++
 fs/btrfs/relocation.c       |   24 +-
 fs/btrfs/scrub.c            |  591 ++++++++++++++++++++++-----
 fs/btrfs/super.c            |  298 +++++++++-----
 fs/btrfs/transaction.c      |  146 +++----
 fs/btrfs/tree-log.c         |   19 +-
 fs/btrfs/volumes.c          |  207 ++++++----
 fs/btrfs/volumes.h          |   18 +-
 fs/btrfs/xattr.c            |   11 +
 31 files changed, 5423 insertions(+), 1603 deletions(-)

On 06.11.2011 20:29, Andrea Gelmini wrote:
> 2011/11/6 Chris Mason <chris.mason@oracle.com>:
>
> Hi Chris,
>    and thanks a lot for your work.
>
>> Arne Jansen and Jan Schmidt have improved the scrubber and provided
>> utilities to walk btrfs' many backrefs. The scrubber is much faster
>> thanks to extensive btree readahead and instead of just telling you a
>> specific block is bad, it tells you which btree or which file was
>> impacted by that bad block.
>
> Using your for-linus branch, on latest Linus' git tree, with latest
> git tools,
> I've got this:
>
> root@Q45:/home/gelma/dev/prg/btrfs# ./btrfs scrub start -Br /dev/md126
> ERROR: scrubbing /dev/md126 failed for device id 1 (Cannot allocate memory)
> scrub canceled for 11827b37-1ba0-4b3e-883d-2746987724ca
>         scrub started at Sun Nov 6 20:23:46 2011 and was aborted after 0 seconds
>         total bytes scrubbed: 0.00 with 0 errors
> root@Q45:/home/gelma/dev/prg/btrfs# ./btrfs scrub start -Br /home/
> ERROR: scrubbing /home/ failed for device id 1 (Cannot allocate memory)
> scrub canceled for 11827b37-1ba0-4b3e-883d-2746987724ca
>         scrub started at Sun Nov 6 20:25:01 2011 and was aborted after 0 seconds
>         total bytes scrubbed: 0.00 with 0 errors

On what platform are you running this? Can you please try this after
a fresh boot? Maybe there's an allocation that can't be served with
a badly fragmented memory.

Thanks,
Arne

>
> Thanks a lot for your time,
> Andrea

On Mon, 07 Nov 2011 10:37:16 +0100
Arne Jansen <sensille@gmx.net> wrote:

> > I've got this:
> >
> > root@Q45:/home/gelma/dev/prg/btrfs# ./btrfs scrub start -Br /dev/md126
> > ERROR: scrubbing /dev/md126 failed for device id 1 (Cannot allocate memory)
> > scrub canceled for 11827b37-1ba0-4b3e-883d-2746987724ca
> >         scrub started at Sun Nov 6 20:23:46 2011 and was aborted after 0 seconds
> >         total bytes scrubbed: 0.00 with 0 errors
> > root@Q45:/home/gelma/dev/prg/btrfs# ./btrfs scrub start -Br /home/
> > ERROR: scrubbing /home/ failed for device id 1 (Cannot allocate memory)
> > scrub canceled for 11827b37-1ba0-4b3e-883d-2746987724ca
> >         scrub started at Sun Nov 6 20:25:01 2011 and was aborted after 0 seconds
> >         total bytes scrubbed: 0.00 with 0 errors
>
> On what platform are you running this? Can you please try this after
> a fresh boot? Maybe there's an allocation that can't be served with
> a badly fragmented memory.

If so, shouldn't there also be a corresponding dmesg warning about "Unable to allocate....", which would confirm or rule this out?
So before following the "did you try turning it off and on again" advice (and throwing away useful debug info), I'd suggest checking/saving dmesg first.

--
With respect,
Roman

~~~~~~~~~~~~~~~~~~~~~~~~~~~
"Stallman had a printer,
with code he could not see.
So he began to tinker,
and set the software free."
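
Roman's suggestion can be sketched as a small shell helper: save the kernel log to a file first, then search the saved copy. This is only a sketch. The helper name, the file name and the grep patterns are assumptions (common wording around failed allocations), not an exhaustive list of what the kernel may print.

```shell
#!/bin/sh
# Hypothetical helper, not part of btrfs-progs: filter a saved kernel log
# for lines that commonly accompany failed allocations or btrfs errors.
# The patterns are assumptions and may miss differently worded messages.
filter_alloc_failures() {
    grep -i -E 'allocation failure|out of memory|unable to allocate|btrfs' "$@"
}

# Typical use, before any reboot (reading the kernel log may need root):
#   dmesg > dmesg-before-reboot.txt
#   filter_alloc_failures dmesg-before-reboot.txt
```

Saving the log to a file before searching it means the evidence survives the reboot whether or not the patterns match anything.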

On 07.11.2011 10:49, Roman Mamedov wrote:
> On Mon, 07 Nov 2011 10:37:16 +0100
> Arne Jansen <sensille@gmx.net> wrote:
>
>>> I've got this:
>>>
>>> root@Q45:/home/gelma/dev/prg/btrfs# ./btrfs scrub start -Br /dev/md126
>>> ERROR: scrubbing /dev/md126 failed for device id 1 (Cannot allocate memory)
>>> scrub canceled for 11827b37-1ba0-4b3e-883d-2746987724ca
>>>         scrub started at Sun Nov 6 20:23:46 2011 and was aborted after 0 seconds
>>>         total bytes scrubbed: 0.00 with 0 errors
>>> root@Q45:/home/gelma/dev/prg/btrfs# ./btrfs scrub start -Br /home/
>>> ERROR: scrubbing /home/ failed for device id 1 (Cannot allocate memory)
>>> scrub canceled for 11827b37-1ba0-4b3e-883d-2746987724ca
>>>         scrub started at Sun Nov 6 20:25:01 2011 and was aborted after 0 seconds
>>>         total bytes scrubbed: 0.00 with 0 errors
>>
>> On what platform are you running this? Can you please try this after
>> a fresh boot? Maybe there's an allocation that can't be served with
>> a badly fragmented memory.
>
> If so, shouldn't there also be a corresponding dmesg warning about "Unable to allocate....", which would confirm or rule this out?
> So before following the "did you try turning it off and on again" advice (and throwing away useful debug info), I'd suggest checking/saving dmesg first.
>

You're right of course. The advice was not meant as a fix, but as a
means to gather more information.

2011/11/7 Arne Jansen <sensille@gmx.net>:
> On what platform are you running this? Can you please try this after
> a fresh boot? Maybe there's an allocation that can't be served with
> a badly fragmented memory.

Hi Arne,
   and thanks a lot for your reply.
   So:
   a) it's a fresh Ubuntu 11.04, with latest Linus git tree
(31555213f03bca37d2c02e10946296052f4ecfcd), and latest btrfs progs
(13eced9a0c2b6bd6bc38e6f0f46a1977b1167e67), plus Chris btrfs branch
for-linus;
   b) same problem after a fresh boot.

Thanks,
Andrea

Hi Andrea,

On 07.11.2011 13:42, Andrea Gelmini wrote:
> 2011/11/7 Arne Jansen <sensille@gmx.net>:
>> On what platform are you running this? Can you please try this after
>> a fresh boot? Maybe there's an allocation that can't be served with
>> a badly fragmented memory.
>
> Hi Arne,
>    and thanks a lot for your reply.
>    So:
>    a) it's a fresh Ubuntu 11.04, with latest Linus git tree
> (31555213f03bca37d2c02e10946296052f4ecfcd), and latest btrfs progs
> (13eced9a0c2b6bd6bc38e6f0f46a1977b1167e67), plus Chris btrfs branch
> for-linus;
>    b) same problem after a fresh boot.

is it 32 or 64 bit?

Thanks,
Arne

>
> Thanks,
> Andrea

2011/11/7 Roman Mamedov <rm@romanrm.ru>:
> If so, shouldn't there also be a corresponding dmesg warning about "Unable to allocate....", which would confirm or rule this out?
> So before following the "did you try turning it off and on again" advice (and throwing away useful debug info), I'd suggest checking/saving dmesg first.

Hi Roman,
   and thanks a lot for your reply.
   I did check dmesg and /var/log as well, but I didn't find any
complaint about it.
   So, I tried with a loopback device and it works well.
   Maybe the problem is about my device stack?
   Bottom up, my home is on: md + luks + lvm + md + btrfs.

Thanks a lot for your time,
Andrea

2011/11/7 Arne Jansen <sensille@gmx.net>:
> is it 32 or 64 bit?

64bit.
Please take a look at my other reply.

Thanks a lot for your time,
Andrea

On 07.11.2011 13:50, Andrea Gelmini wrote:
> 2011/11/7 Arne Jansen <sensille@gmx.net>:
>> is it 32 or 64 bit?
>
> 64bit.
> Please take a look at my other reply.

Can you please have a look with strace to make sure it's
really the ioctl the ENOMEM originates from?

Thanks,
Arne

>
> Thanks a lot for your time,
> Andrea
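
One way to do the check Arne asks for, as a sketch: trace only ioctl calls, follow threads and children, and search the trace for ENOMEM. The helper names and the trace file name `scrub.strace` are made up for illustration; `-f`, `-e trace=ioctl` and `-o` are standard strace options. That the scrub ioctl may be issued from a worker thread is an assumption, which is why `-f` is included.

```shell
#!/bin/sh
# Hypothetical helpers for illustration only.

# Run the failing scrub under strace. -f follows threads and children
# (assumption: the progs may issue the scrub ioctl from a worker thread),
# -e trace=ioctl keeps only ioctl calls, -o writes the trace to a file.
trace_scrub_ioctls() {
    trace_file=$1
    target=$2
    strace -f -e trace=ioctl -o "$trace_file" ./btrfs scrub start -Br "$target"
}

# Print the traced calls that failed with ENOMEM, with their line numbers.
find_enomem() {
    grep -n 'ENOMEM' "$1"
}

# Typical use, from the btrfs-progs build directory:
#   trace_scrub_ioctls scrub.strace /home/
#   find_enomem scrub.strace
```

If `find_enomem` prints an ioctl line, the error comes from the kernel side of the scrub; if it prints nothing, the ENOMEM is produced somewhere else in the tool.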

On Sun, Nov 6, 2011 at 1:38 PM, Chris Mason <chris.mason@oracle.com> wrote:
> Hi everyone,
>
> This pull request is pretty beefy, it ended up merging a number of long
> running projects and cleanup queues. I've got btrfs patches in the new
> kernel.org btrfs repo. There are two different branches with the same
> changes. for-linus is against 3.1 and has also been tested against
> Linus' tree as of yesterday.

[91795.123286] device label ROOT devid 1 transid 3331 /dev/sdi2
[91795.123538] btrfs: open_ctree failed

FS created on 3.1 (x64), mounted once on 3.2-rc1 (i386), got that
when I tried to mount on 3.1 (x64) again. Format change in 3.2 or
32/64 bit compatibility issues?

On Tue, Nov 08, 2011 at 12:55:40PM -0500, Dan Merillat wrote:
> On Sun, Nov 6, 2011 at 1:38 PM, Chris Mason <chris.mason@oracle.com> wrote:
> > Hi everyone,
> >
> > This pull request is pretty beefy, it ended up merging a number of long
> > running projects and cleanup queues. I've got btrfs patches in the new
> > kernel.org btrfs repo. There are two different branches with the same
> > changes. for-linus is against 3.1 and has also been tested against
> > Linus' tree as of yesterday.
>
> [91795.123286] device label ROOT devid 1 transid 3331 /dev/sdi2
> [91795.123538] btrfs: open_ctree failed
>
> FS created on 3.1 (x64), mounted once on 3.2-rc1 (i386), got that
> when I tried to mount on 3.1 (x64) again. Format change in 3.2 or
> 32/64 bit compatibility issues?

I'm trying to reproduce right now but I did many bounces between 3.2 and
3.1 code before releasing. I didn't try jumping between 32 and 64 bit.

Are there any other messages in dmesg? Could you please see what
btrfs-debug-tree says?

-chris
On Tue, Nov 08, 2011 at 01:27:28PM -0500, Chris Mason wrote:
> On Tue, Nov 08, 2011 at 12:55:40PM -0500, Dan Merillat wrote:
> > On Sun, Nov 6, 2011 at 1:38 PM, Chris Mason <chris.mason@oracle.com> wrote:
> > > Hi everyone,
> > >
> > > This pull request is pretty beefy, it ended up merging a number of long
> > > running projects and cleanup queues. I've got btrfs patches in the new
> > > kernel.org btrfs repo. There are two different branches with the same
> > > changes. for-linus is against 3.1 and has also been tested against
> > > Linus' tree as of yesterday.
> >
> > [91795.123286] device label ROOT devid 1 transid 3331 /dev/sdi2
> > [91795.123538] btrfs: open_ctree failed
> >
> > FS created on 3.1 (x64), mounted once on 3.2-rc1 (i386), got that
> > when I tried to mount on 3.1 (x64) again. Format change in 3.2 or
> > 32/64 bit compatibility issues?
>
> I'm trying to reproduce right now but I did many bounces between 3.2 and
> 3.1 code before releasing. I didn't try jumping between 32 and 64 bit.
>
> Are there any other messages in dmesg? Could you please see what
> btrfs-debug-tree says?

Ok, so I spun the wheel going between 32 and 64 and 3.1 and 3.2. I'm
not having trouble with basic tests.

So, we'll have to dig in and see why the open is failing. btrfsck or
btrfs-debug-tree will help.

-chris
On Tue, Nov 08, 2011 at 08:47:56AM +0100, Arne Jansen wrote:
> On 07.11.2011 13:50, Andrea Gelmini wrote:
> > 2011/11/7 Arne Jansen <sensille@gmx.net>:
> >> is it 32 or 64 bit?
> >
> > 64bit.
> > Please take a look at my other reply.
>
> Can you please have a look with strace to make sure it's
> really the ioctl the ENOMEM originates from?

Looks like bio_add_page() is failing and we're getting the enomem from
there. LVM is only letting us put one page in each bio.

-chris
On Tue, Nov 8, 2011 at 3:17 PM, Chris Mason <chris.mason@oracle.com> wrote:
> On Tue, Nov 08, 2011 at 01:27:28PM -0500, Chris Mason wrote:
>> On Tue, Nov 08, 2011 at 12:55:40PM -0500, Dan Merillat wrote:
>> > On Sun, Nov 6, 2011 at 1:38 PM, Chris Mason <chris.mason@oracle.com> wrote:
>> > > Hi everyone,
>> > >
>> > > This pull request is pretty beefy, it ended up merging a number of long
>> > > running projects and cleanup queues. I've got btrfs patches in the new
>> > > kernel.org btrfs repo. There are two different branches with the same
>> > > changes. for-linus is against 3.1 and has also been tested against
>> > > Linus' tree as of yesterday.
>> >
>> > [91795.123286] device label ROOT devid 1 transid 3331 /dev/sdi2
>> > [91795.123538] btrfs: open_ctree failed
>> >
>> > FS created on 3.1 (x64), mounted once on 3.2-rc1 (i386), got that
>> > when I tried to mount on 3.1 (x64) again. Format change in 3.2 or
>> > 32/64 bit compatibility issues?
>>
>> I'm trying to reproduce right now but I did many bounces between 3.2 and
>> 3.1 code before releasing. I didn't try jumping between 32 and 64 bit.
>>
>> Are there any other messages in dmesg? Could you please see what
>> btrfs-debug-tree says?
>
> Ok, so I spun the wheel going between 32 and 64 and 3.1 and 3.2. I'm
> not having trouble with basic tests.
>
> So, we'll have to dig in and see why the open is failing. btrfsck or
> btrfs-debug-tree will help.

This is on a USB device, however I had used the filesystem quite a bit
on the 64bit machine before moving it to the 32bit 3.2 box.
It's still mountable on the 32bit box even when I get the open_ctree
failed on 3.1

[140865.425067] device label ROOT devid 1 transid 3436 /dev/sdi2
[140865.426291] btrfs: open_ctree failed

harik@fileserver:~/src/3.0/3.2-rc1$ sudo btrfsck /dev/sdi2
[sudo] password for harik:
found 3105894400 bytes used err is 0
total csum bytes: 2916272
total tree bytes: 119631872
total fs tree bytes: 109928448
btree space waste bytes: 33045213
file data blocks allocated: 5391962112
 referenced 2984988672
Btrfs Btrfs v0.19

http://dl.dropbox.com/u/1071112/btrfs-debug-tree.sdi2.bz2

Exact kernel that won't mount is linus 3.1 +

Author: David Sterba <dsterba@suse.cz>
Date:   Wed Aug 3 11:08:02 2011 -0700

    btrfs: allow cross-subvolume file clone

Author: Li Zefan <lizf@cn.fujitsu.com>
Date:   Fri Sep 2 15:56:25 2011 +0800

    Btrfs: fix defragmentation regression
On Tue, Nov 08, 2011 at 08:07:01PM -0500, Chris Mason wrote:
> Looks like bio_add_page() is failing and we're getting the enomem from
> there. LVM is only letting us put one page in each bio.

Yes, at the moment all bio based DM targets only allow single page I/O.
On 09.11.2011 08:48, Christoph Hellwig wrote:
> On Tue, Nov 08, 2011 at 08:07:01PM -0500, Chris Mason wrote:
>> Looks like bio_add_page() is failing and we're getting the enomem from
>> there. LVM is only letting us put one page in each bio.
>
> Yes, at the moment all bio based DM targets only allow single page I/O.

That's... unexpected. I guess this won't change with 3.3? For 3.3 I have
to rework that part from scrub to account for Chris' bigblocks. If it can
wait that long I'd prefer to fix both at once. Chris?

-Arne
On Wed, Nov 09, 2011 at 10:06:55AM +0100, Arne Jansen wrote:
> That's... unexpected. I guess this won't change with 3.3? For 3.3 I have
> to rework that part from scrub to account for Chris' bigblocks. If it can
> wait that long I'd prefer to fix both at once. Chris?

Device mapper has always been like that. It makes splitting requests
over multiple targets a lot easier for them. With increasing I/O sizes
that is something which will have to be fixed sooner or later, though.
On 09.11.2011 08:48, Christoph Hellwig wrote:
> On Tue, Nov 08, 2011 at 08:07:01PM -0500, Chris Mason wrote:
>> Looks like bio_add_page() is failing and we're getting the enomem from
>> there. LVM is only letting us put one page in each bio.
>
> Yes, at the moment all bio based DM targets only allow single page I/O.

Wait. If I got that correctly, each bio_add_page needs special
ENOMEM treatment (assuming the target could always be a device
mapper target), right?

"grep bio_add_page fs/btrfs/*.c" will make you unhappy.

-Jan
On Wed, Nov 09, 2011 at 11:29:31AM +0100, Jan Schmidt wrote:
> On 09.11.2011 08:48, Christoph Hellwig wrote:
> >On Tue, Nov 08, 2011 at 08:07:01PM -0500, Chris Mason wrote:
> >>Looks like bio_add_page() is failing and we're getting the enomem from
> >>there. LVM is only letting us put one page in each bio.
> >
> >Yes, at the moment all bio based DM targets only allow single page I/O.
>
> Wait. If I got that correctly, each bio_add_page needs special
> ENOMEM treatment (assuming the target could always be a device
> mapper target), right?

Each bio_add_page caller needs to expect it can't add more than a page
worth of data. If you look at callers that write large amounts of data
(XFS, mpage code) you'll always see a pattern of:

    bio = bio_alloc();
    while (bytes_left) {
        len = min(bytes_left, PAGE_SIZE);
        bytes = bio_add_page(bio, ..., len);
        if (bytes < len) {
            submit_bio(bio);
            bio = bio_alloc();
        }
        update_indices();
    }
    submit_bio(bio);
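For anyone who wants to poke at that pattern outside the kernel, here is a rough user-space sketch. It is not kernel code and not taken from any filesystem: struct mock_bio, mock_bio_add_page(), write_range() and MOCK_PAGE_SIZE are all made-up names for illustration, and the only thing borrowed from the thread is the contract that the add call returns fewer bytes than requested when the bio is full. Unlike the outline above, the sketch spells out retrying the rejected page in the fresh bio.

```c
#include <assert.h>
#include <stddef.h>

#define MOCK_PAGE_SIZE 4096u    /* assumed page size, illustration only */

/* Hypothetical stand-in for struct bio: it only tracks how full it is. */
struct mock_bio {
    size_t max_bytes;   /* per-bio limit imposed by the (mock) target */
    size_t bytes;       /* bytes queued so far */
};

/* Mimics the contract discussed in the thread: returns the number of
 * bytes added, which is 0 when the page does not fit any more. */
static size_t mock_bio_add_page(struct mock_bio *bio, size_t len)
{
    if (bio->bytes + len > bio->max_bytes)
        return 0;
    bio->bytes += len;
    return len;
}

/* Queue `total` bytes page by page and return how many bios had to be
 * submitted for a target that accepts `max_bytes_per_bio` per bio. */
static unsigned int write_range(size_t total, size_t max_bytes_per_bio)
{
    struct mock_bio bio = { max_bytes_per_bio, 0 };
    unsigned int submitted = 0;
    size_t left = total;

    /* A single page must always fit into an empty bio. */
    assert(max_bytes_per_bio >= MOCK_PAGE_SIZE);

    while (left > 0) {
        size_t len = left < MOCK_PAGE_SIZE ? left : MOCK_PAGE_SIZE;

        if (mock_bio_add_page(&bio, len) < len) {
            /* Bio is full: "submit" it, then retry the same page
             * in a fresh bio. */
            submitted++;
            bio.bytes = 0;
            continue;
        }
        left -= len;
    }
    if (bio.bytes > 0)
        submitted++;    /* submit the last, possibly partial, bio */
    return submitted;
}
```

With a one-page limit, as reported for the bio based DM targets, a 16 page range ends up as 16 submissions, while a target that takes 16 pages per bio needs only one. The caller does not need to know the limit up front; it only has to react to the short return value.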
On 09.11.2011 11:29, Jan Schmidt wrote:
> "grep bio_add_page fs/btrfs/*.c" will make you unhappy.

Phew. Actually, not that unhappy. Many of the lines seeming to have no
return value check are in fact part of a long or-ed if-statement. Seems
okay after a closer look.

-Jan
On Wed, Nov 09, 2011 at 10:06:55AM +0100, Arne Jansen wrote:
> On 09.11.2011 08:48, Christoph Hellwig wrote:
> > On Tue, Nov 08, 2011 at 08:07:01PM -0500, Chris Mason wrote:
> >> Looks like bio_add_page() is failing and we're getting the enomem from
> >> there. LVM is only letting us put one page in each bio.
> >
> > Yes, at the moment all bio based DM targets only allow single page I/O.
>
> That's... unexpected. I guess this won't change with 3.3? For 3.3 I have
> to rework that part from scrub to account for Chris' bigblocks. If it can
> wait that long I'd prefer to fix both at once. Chris?

For 3.2, let's just do one page per bio and crank up the bios per device
to correspond.

-chris
On Wed, Nov 09, 2011 at 11:29:31AM +0100, Jan Schmidt wrote:
> On 09.11.2011 08:48, Christoph Hellwig wrote:
> >On Tue, Nov 08, 2011 at 08:07:01PM -0500, Chris Mason wrote:
> >>Looks like bio_add_page() is failing and we're getting the enomem from
> >>there. LVM is only letting us put one page in each bio.
> >
> >Yes, at the moment all bio based DM targets only allow single page I/O.
>
> Wait. If I got that correctly, each bio_add_page needs special
> ENOMEM treatment (assuming the target could always be a device
> mapper target), right?
>
> "grep bio_add_page fs/btrfs/*.c" will make you unhappy.

You can always add a single page into a bio. We do need to deal better
with mixed devices where some have a low limit and some a high limit.

-chris