clinew@linux.vnet.ibm.com
2012-Dec-18 07:13 UTC
[PATCH] [RFC v2] Btrfs: Subpagesize blocksize (WIP).
From: Wade Cline <clinew@linux.vnet.ibm.com>

v1 -> v2:
    - Added Signed-off-by tag (it's kind of important).

This patch is only an RFC. My internship is ending and I was hoping
to get some feedback and incorporate any suggestions people may
have before my internship ends along with life as we know it (this
Friday).

The filesystem should mount/umount properly but tends towards the
explosive side when writes start happening. My current focus is on
checksumming issues and also an error when releasing extent buffers
when creating a large file with 'dd'... and probably any other
method. There's still a significant amount of work that needs to be
done before this should be incorporated into mainline.

A couple of notes:
    - Based off of Josef's btrfs-next branch, commit
      8d089a86e45b34d7bc534d955e9d8543609f7e42
    - C99-style comments are "meta-comments" where I'd like more
      feedback; they aren't permanent but make 'checkpatch' moan.
    - extent_buffer allocation and freeing need their code paths
      merged; they're currently in separate functions and are both
      very ugly.
    - The patch itself will eventually need to be broken down
      into smaller pieces if at all possible...

Signed-off-by: Wade Cline <clinew@linux.vnet.ibm.com>
---
 fs/btrfs/ctree.h            |  11 +-
 fs/btrfs/disk-io.c          | 110 +++++++--
 fs/btrfs/extent_io.c        | 632 ++++++++++++++++++++++++++++++++++++++-----
 fs/btrfs/extent_io.h        |   7 +
 fs/btrfs/file.c             |   9 +-
 fs/btrfs/free-space-cache.c |   2 +
 fs/btrfs/inode.c            |  38 ++-
 fs/btrfs/ioctl.c            |   4 +-
 8 files changed, 709 insertions(+), 104 deletions(-)

diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index fbaaf20..c786a58 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -1938,14 +1938,19 @@ static inline void btrfs_set_token_##name(struct extent_buffer *eb, \
 #define BTRFS_SETGET_HEADER_FUNCS(name, type, member, bits) \
 static inline u##bits btrfs_##name(struct extent_buffer *eb) \
 { \
-    type *p = page_address(eb->pages[0]); \
-    u##bits res = le##bits##_to_cpu(p->member); \
+    type *p; \
+    u##bits res; \
+ \
+    p = page_address(eb->pages[0]) + (eb->start & (PAGE_SIZE - 1)); \
+    res = le##bits##_to_cpu(p->member); \
     return res; \
 } \
 static inline void btrfs_set_##name(struct extent_buffer *eb, \
                                     u##bits val) \
 { \
-    type *p = page_address(eb->pages[0]); \
+    type *p; \
+ \
+    p = page_address(eb->pages[0]) + (eb->start & (PAGE_SIZE - 1)); \
     p->member = cpu_to_le##bits(val); \
 }

diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index f633af8..00b80b7 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -373,6 +373,24 @@ static int btree_read_extent_buffer_pages(struct btrfs_root *root,
                                               WAIT_COMPLETE,
                                               btree_get_extent, mirror_num);
         if (!ret) {
+            /*
+             * I think that this is bad and should be moved
+             * into btree_readpage_end_io_hook(), but that
+             * it should apply to a single block at a time.
+             * That may be difficult and would make the
+             * function name a misnomer, but mostly I hate
+             * the silly goto.
+             */
+            if (eb->len < PAGE_SIZE &&
+                !extent_buffer_uptodate(eb)) {
+                if (csum_tree_block(root, eb, 1)) {
+                    ret = -EIO;
+                    goto bad;
+                } else {
+                    set_extent_buffer_uptodate(eb);
+                }
+            }
+
             if (!verify_parent_transid(io_tree, eb,
                                        parent_transid, 0))
                 break;
@@ -385,6 +403,7 @@ static int btree_read_extent_buffer_pages(struct btrfs_root *root,
          * there is no reason to read the other copies, they won't be
          * any less wrong.
          */
+bad:
         if (test_bit(EXTENT_BUFFER_CORRUPT, &eb->bflags))
             break;

@@ -416,29 +435,55 @@ static int btree_read_extent_buffer_pages(struct btrfs_root *root,
  * checksum a dirty tree block before IO.  This has extra checks to make sure
  * we only fill in the checksum field in the first page of a multi-page block
  */
-
-static int csum_dirty_buffer(struct btrfs_root *root, struct page *page)
+static int csum_dirty_buffer(struct btrfs_root *root, struct page *page,
+                             unsigned int offset, unsigned int len)
 {
-    struct extent_io_tree *tree;
     u64 start = (u64)page->index << PAGE_CACHE_SHIFT;
     u64 found_start;
     struct extent_buffer *eb;

-    tree = &BTRFS_I(page->mapping->host)->io_tree;
+    if (!PageUptodate(page)) {
+        WARN_ON(1);
+        return 0;
+    }

     eb = (struct extent_buffer *)page->private;
-    if (page != eb->pages[0])
-        return 0;
+    if (eb->len >= PAGE_SIZE) {
+        if (eb->pages[0] != page)
+            return 0;
+    } else {
+        start += offset;
+        while (eb->start != start) {
+            eb = eb->next;
+            BUG_ON(!eb);
+        }
+next:
+        if (!test_bit(EXTENT_BUFFER_WRITEBACK, &eb->bflags))
+            WARN_ON(1);
+        if (!test_bit(EXTENT_BUFFER_UPTODATE, &eb->bflags))
+            WARN_ON(1);
+        if (eb->pages[0] != page)
+            WARN_ON(1);
+    }
+
     found_start = btrfs_header_bytenr(eb);
     if (found_start != start) {
         WARN_ON(1);
         return 0;
     }
-    if (!PageUptodate(page)) {
-        WARN_ON(1);
-        return 0;
-    }
+
     csum_tree_block(root, eb, 0);
+
+    if (eb->len < PAGE_SIZE) {
+        len -= eb->len;
+        BUG_ON(len & (eb->len - 1));
+        if (len) {
+            start += eb->len;
+            eb = eb->next;
+            goto next;
+        }
+    }
+
     return 0;
 }

@@ -579,6 +624,19 @@ static int btree_readpage_end_io_hook(struct page *page, u64 start, u64 end,

     tree = &BTRFS_I(page->mapping->host)->io_tree;
     eb = (struct extent_buffer *)page->private;
+    if (eb->len < PAGE_SIZE) {
+        /* Find the eb that tried to submit a read request. This is
+         * a little bit funky. */
+        do {
+            if (!atomic_read(&eb->io_pages))
+                continue;
+            if (test_bit(EXTENT_BUFFER_DIRTY, &eb->bflags) ||
+                test_bit(EXTENT_BUFFER_WRITEBACK, &eb->bflags))
+                continue;
+            break;
+        } while ((eb = eb->next));
+        BUG_ON(!eb);
+    }

     /* the pending IO might have been the only thing that kept this buffer
      * in memory.  Make sure we have a ref for all this other checks
@@ -615,8 +673,11 @@ static int btree_readpage_end_io_hook(struct page *page, u64 start, u64 end,
     btrfs_set_buffer_lockdep_class(btrfs_header_owner(eb),
                                    eb, found_level);

-    ret = csum_tree_block(root, eb, 1);
-    if (ret) {
+    /*
+     * Subpagesize blocksize checksumming is currently done in
+     * btree_read_extent_buffer_pages().
+     */
+    if (eb->len >= PAGE_SIZE && csum_tree_block(root, eb, 1)) {
         ret = -EIO;
         goto err;
     }
@@ -631,8 +692,15 @@ static int btree_readpage_end_io_hook(struct page *page, u64 start, u64 end,
         ret = -EIO;
     }

-    if (!ret)
+    /*
+     * For subpagesize blocksize, only the page needs to be set
+     * up-to-date; each extent_buffer is set up-to-date when it is
+     * checksummed.
+     */
+    if (eb->len >= PAGE_SIZE)
         set_extent_buffer_uptodate(eb);
+    else
+        SetPageUptodate(eb->pages[0]);
 err:
     if (test_bit(EXTENT_BUFFER_READAHEAD, &eb->bflags)) {
         clear_bit(EXTENT_BUFFER_READAHEAD, &eb->bflags);
@@ -828,7 +896,8 @@ static int btree_csum_one_bio(struct bio *bio)
     WARN_ON(bio->bi_vcnt <= 0);
     while (bio_index < bio->bi_vcnt) {
         root = BTRFS_I(bvec->bv_page->mapping->host)->root;
-        ret = csum_dirty_buffer(root, bvec->bv_page);
+        ret = csum_dirty_buffer(root, bvec->bv_page, bvec->bv_offset,
+                                bvec->bv_len);
         if (ret)
             break;
         bio_index++;
@@ -1007,9 +1076,13 @@ static int btree_set_page_dirty(struct page *page)
     BUG_ON(!PagePrivate(page));
     eb = (struct extent_buffer *)page->private;
     BUG_ON(!eb);
-    BUG_ON(!test_bit(EXTENT_BUFFER_DIRTY, &eb->bflags));
-    BUG_ON(!atomic_read(&eb->refs));
-    btrfs_assert_tree_locked(eb);
+    /* There doesn't seem to be a method for passing the correct eb
+     * to this function, so no sanity checks for subpagesize blocksize. */
+    if (eb->len >= PAGE_SIZE) {
+        BUG_ON(!test_bit(EXTENT_BUFFER_DIRTY, &eb->bflags));
+        BUG_ON(!atomic_read(&eb->refs));
+        btrfs_assert_tree_locked(eb);
+    }
 #endif
     return __set_page_dirty_nobuffers(page);
 }
@@ -2400,11 +2473,14 @@ int open_ctree(struct super_block *sb,
         goto fail_sb_buffer;
     }

+#if 0
+    // Hmm. How to deal with this for subpagesize blocksize?
     if (sectorsize != PAGE_SIZE) {
         printk(KERN_WARNING "btrfs: Incompatible sector size(%lu) "
                "found on %s\n", (unsigned long)sectorsize, sb->s_id);
         goto fail_sb_buffer;
     }
+#endif

     mutex_lock(&fs_info->chunk_mutex);
     ret = btrfs_read_sys_array(tree_root);
diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index 1b319df..c1e052e 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -2519,7 +2519,7 @@ static int submit_extent_page(int rw, struct extent_io_tree *tree,
     int contig = 0;
     int this_compressed = bio_flags & EXTENT_BIO_COMPRESSED;
     int old_compressed = prev_bio_flags & EXTENT_BIO_COMPRESSED;
-    size_t page_size = min_t(size_t, size, PAGE_CACHE_SIZE);
+    size_t bio_size = min_t(size_t, size, PAGE_CACHE_SIZE);

     if (bio_ret && *bio_ret) {
         bio = *bio_ret;
@@ -2530,8 +2530,8 @@ static int submit_extent_page(int rw, struct extent_io_tree *tree,
             sector;

         if (prev_bio_flags != bio_flags || !contig ||
-            merge_bio(tree, page, offset, page_size, bio, bio_flags) ||
-            bio_add_page(bio, page, page_size, offset) < page_size) {
+            merge_bio(tree, page, offset, bio_size, bio, bio_flags) ||
+            bio_add_page(bio, page, bio_size, offset) < bio_size) {
             ret = submit_one_bio(rw, bio, mirror_num,
                                  prev_bio_flags);
             if (ret < 0)
@@ -2550,7 +2550,7 @@ static int submit_extent_page(int rw, struct extent_io_tree *tree,
     if (!bio)
         return -ENOMEM;

-    bio_add_page(bio, page, page_size, offset);
+    bio_add_page(bio, page, bio_size, offset);
     bio->bi_end_io = end_io_func;
     bio->bi_private = tree;

@@ -3168,14 +3168,28 @@ static void end_bio_extent_buffer_writepage(struct bio *bio, int err)
     int uptodate = err == 0;
     struct bio_vec *bvec = bio->bi_io_vec + bio->bi_vcnt - 1;
     struct extent_buffer *eb;
+    unsigned int offset;
+    unsigned int bv_len;
+    u64 start;
     int done;

     do {
         struct page *page = bvec->bv_page;
+        offset = bvec->bv_offset;
+        bv_len = bvec->bv_len;
+        start = ((u64)page->index << PAGE_CACHE_SHIFT) + offset;

         bvec--;
         eb = (struct extent_buffer *)page->private;
         BUG_ON(!eb);
+        if (eb->len < PAGE_SIZE) {
+            while (eb->start != start) {
+                eb = eb->next;
+                BUG_ON(!eb);
+            }
+        }
+
+next_eb:
         done = atomic_dec_and_test(&eb->io_pages);

         if (!uptodate || test_bit(EXTENT_BUFFER_IOERR, &eb->bflags)) {
@@ -3184,12 +3198,50 @@ static void end_bio_extent_buffer_writepage(struct bio *bio, int err)
             SetPageError(page);
         }

-        end_page_writeback(page);
+        if (eb->len >= PAGE_SIZE) {
+            end_page_writeback(page);

-        if (!done)
-            continue;
+            if (!done)
+                continue;

-        end_extent_buffer_writeback(eb);
+            end_extent_buffer_writeback(eb);
+        } else {
+            /* Sanity checks. */
+            if (!test_bit(EXTENT_BUFFER_WRITEBACK, &eb->bflags))
+                WARN_ON(1);
+
+            /* Ensure I/O page count is zero. */
+            if (!done)
+                WARN_ON(1);
+
+            /* Clear the extent buffer's writeback flag. */
+            end_extent_buffer_writeback(eb);
+
+            /*
+             * See if any other extent buffers exist within the
+             * page.
+             */
+            bv_len -= eb->len;
+            BUG_ON(bv_len & (eb->len - 1));
+            if (bv_len) {
+                eb = eb->next;
+                goto next_eb;
+            }
+
+            /* Clear the page writeback flag. */
+            eb = (struct extent_buffer *)page->private;
+            BUG_ON(!eb); /* Can this even happen? */
+            do {
+                if (!eb) {
+                    end_page_writeback(page);
+                    break;
+                }
+                if (test_bit(EXTENT_BUFFER_WRITEBACK,
+                             &eb->bflags))
+                    break;
+                eb = eb->next;
+            } while (1);
+        }
     } while (bvec >= bio->bi_io_vec);

     bio_put(bio);
@@ -3202,7 +3254,8 @@ static int write_one_eb(struct extent_buffer *eb,
                         struct extent_page_data *epd)
 {
     struct block_device *bdev = fs_info->fs_devices->latest_bdev;
-    u64 offset = eb->start;
+    u64 start = eb->start;
+    unsigned long offset = eb->start & (PAGE_CACHE_SIZE - 1);
     unsigned long i, num_pages;
     unsigned long bio_flags = 0;
     int rw = (epd->sync_io ? WRITE_SYNC : WRITE);
@@ -3219,10 +3272,10 @@ static int write_one_eb(struct extent_buffer *eb,

         clear_page_dirty_for_io(p);
         set_page_writeback(p);
-        ret = submit_extent_page(rw, eb->tree, p, offset >> 9,
-                                 PAGE_CACHE_SIZE, 0, bdev, &epd->bio,
-                                 -1, end_bio_extent_buffer_writepage,
-                                 0, epd->bio_flags, bio_flags);
+        ret = submit_extent_page(rw, eb->tree, p, start >> 9, eb->len,
+                                 offset, bdev, &epd->bio, -1,
+                                 end_bio_extent_buffer_writepage, 0,
+                                 epd->bio_flags, bio_flags);
         epd->bio_flags = bio_flags;
         if (ret) {
             set_bit(EXTENT_BUFFER_IOERR, &eb->bflags);
@@ -3232,7 +3285,7 @@ static int write_one_eb(struct extent_buffer *eb,
             ret = -EIO;
             break;
         }
-        offset += PAGE_CACHE_SIZE;
+        start += PAGE_CACHE_SIZE;
         update_nr_written(p, wbc, 1);
         unlock_page(p);
     }
@@ -3252,7 +3305,7 @@ int btree_write_cache_pages(struct address_space *mapping,
 {
     struct extent_io_tree *tree = &BTRFS_I(mapping->host)->io_tree;
     struct btrfs_fs_info *fs_info = BTRFS_I(mapping->host)->root->fs_info;
-    struct extent_buffer *eb, *prev_eb = NULL;
+    struct extent_buffer *eb, *next, *prev_eb = NULL;
     struct extent_page_data epd = {
         .bio = NULL,
         .tree = tree,
@@ -3326,17 +3379,41 @@ retry:
                 spin_unlock(&mapping->private_lock);
                 continue;
             }
+            prev_eb = eb;
+
+next_eb:
+            next = eb->next;

             ret = atomic_inc_not_zero(&eb->refs);
-            spin_unlock(&mapping->private_lock);
-            if (!ret)
-                continue;
+            if (eb->len >= PAGE_SIZE) {
+                spin_unlock(&mapping->private_lock);
+                if (!ret)
+                    continue;
+            } else {
+                if (!ret)
+                    goto inc_eb;
+                spin_unlock(&mapping->private_lock);
+
+                if (!test_bit(EXTENT_BUFFER_DIRTY,
+                              &eb->bflags)) {
+                    spin_lock(&mapping->private_lock);
+                    atomic_dec(&eb->refs);
+                    ret = 0;
+                    goto inc_eb;
+                }
+            }

-            prev_eb = eb;
             ret = lock_extent_buffer_for_io(eb, fs_info, &epd);
             if (!ret) {
+                if (!(eb->len >= PAGE_SIZE))
+                    spin_lock(&mapping->private_lock);
+
                 free_extent_buffer(eb);
-                continue;
+
+                if (eb->len >= PAGE_SIZE)
+                    continue;
+                else
+                    goto inc_eb;
             }

             ret = write_one_eb(eb, fs_info, wbc, &epd);
@@ -3345,8 +3422,26 @@ retry:
                 free_extent_buffer(eb);
                 break;
             }
+
+            if (eb->len >= PAGE_SIZE) {
+                free_extent_buffer(eb);
+                goto written;
+            }
+
+            if (next)
+                spin_lock(&mapping->private_lock);
             free_extent_buffer(eb);

+inc_eb:
+            if (!next) {
+                if (spin_is_locked(&mapping->private_lock))
+                    spin_unlock(&mapping->private_lock);
+                goto written;
+            }
+            eb = next;
+            goto next_eb;
+
+written:
             /*
              * the filesystem may choose to bump up nr_to_write.
              * We have to make sure to honor the new nr_to_write
@@ -4000,6 +4095,18 @@ static void __free_extent_buffer(struct extent_buffer *eb)
     kmem_cache_free(extent_buffer_cache, eb);
 }

+/* Helper function to free extent buffers when there are multiple
+ * extent buffers per page. */
+static void __free_extent_buffers(struct extent_buffer *eb)
+{
+    struct extent_buffer *next;
+
+    do {
+        next = eb->next;
+        __free_extent_buffer(eb);
+    } while ((eb = next));
+}
+
 static struct extent_buffer *__alloc_extent_buffer(struct extent_io_tree *tree,
                                                    u64 start,
                                                    unsigned long len,
@@ -4017,6 +4124,7 @@ static struct extent_buffer *__alloc_extent_buffer(struct extent_io_tree *tree,
     eb->len = len;
     eb->tree = tree;
     eb->bflags = 0;
+    eb->next = NULL;
     rwlock_init(&eb->lock);
     atomic_set(&eb->write_locks, 0);
     atomic_set(&eb->read_locks, 0);
@@ -4054,6 +4162,62 @@ static struct extent_buffer *__alloc_extent_buffer(struct extent_io_tree *tree,
     return eb;
 }

+/* Allocates an array of extent buffers for the specified page.
+ * Should be called with the mapping's spin lock set. */
+static struct extent_buffer *__alloc_extent_buffers(struct extent_io_tree *tree,
+                                                    struct page *page,
+                                                    gfp_t mask)
+{
+    u32 blocksize_bits;
+    struct btrfs_inode *inode;
+    struct extent_buffer *eb_head;
+    struct extent_buffer *eb_cur;
+    u64 start;
+    unsigned long len;
+    int i;
+
+    /* Initialize variables. */
+    inode = BTRFS_I(tree->mapping->host);
+    blocksize_bits = inode->vfs_inode.i_sb->s_blocksize_bits;
+
+    /* Calculate extent buffer dimensions. */
+    start = page->index << PAGE_CACHE_SHIFT;
+    len = inode->root->leafsize;
+
+    /* Allocate the head extent buffer. */
+    eb_head = __alloc_extent_buffer(tree, start, len, GFP_NOFS);
+    if (!eb_head) {
+        WARN_ON(1);
+        return NULL;
+    }
+    start += len;
+    eb_head->pages[0] = page;
+    eb_cur = eb_head;
+
+    /* Allocate the other extent buffers. */
+    for (i = 1; i < (PAGE_CACHE_SIZE >> blocksize_bits); i++) {
+        eb_cur->next = __alloc_extent_buffer(tree, start, len,
+                                             GFP_NOFS);
+        if (!eb_cur->next) {
+            WARN_ON(1);
+            goto free_ebs;
+        }
+        start += len;
+        eb_cur = eb_cur->next;
+        eb_cur->pages[0] = page;
+    }
+
+    /* Return the extent buffer head. */
+    return eb_head;
+
+free_ebs:
+    /* Free each extent buffer. */
+    // TODO: Implement.
+    pr_crit("HACK: Need to implement this...\n");
+    WARN_ON(1);
+    return NULL;
+}
+
 struct extent_buffer *btrfs_clone_extent_buffer(struct extent_buffer *src)
 {
     unsigned long i;
@@ -4170,12 +4334,121 @@ static void btrfs_release_extent_buffer_page(struct extent_buffer *eb,
 }

 /*
+ * Frees the page if all extent buffers belonging to the page are not
+ * referenced. The extent buffers themselves must be freed afterwards, too...
+ * ret: 0 if the page did not need to be freed; 1 if the page was freed.
+ */
+static int btrfs_release_extent_buffers_page(struct extent_buffer *eb,
+                                             struct extent_buffer **eb_head)
+{
+    struct extent_buffer *eb_cur;
+    struct extent_buffer *eb_temp;
+    struct page *page;
+    int mapped = !test_bit(EXTENT_BUFFER_DUMMY, &eb->bflags);
+    int ret = 0;
+
+    if (extent_buffer_under_io(eb))
+        BUG_ON(1);
+
+    // ...is this even possible?
+    if (!num_extent_pages(eb->start, eb->len)) {
+        WARN_ON(1);
+        return ret;
+    }
+
+    page = extent_buffer_page(eb, 0);
+    if (page && mapped) {
+        spin_lock(&page->mapping->private_lock);
+        /*
+         * We do this since we'll remove the pages after we've
+         * removed the eb from the radix tree, so we could race
+         * and have this page now attached to the new eb.  So
+         * only clear page_private if it's still connected to
+         * this eb.
+         */
+        if (!PagePrivate(page)) {
+            spin_unlock(&page->mapping->private_lock);
+        } else {
+            /* Find the page eb corresponding to our eb. */
+            eb_cur = (struct extent_buffer *)page->private;
+            while (eb_cur->start != eb->start) {
+                eb_cur = eb_cur->next;
+                BUG_ON(!eb_cur);
+            }
+
+            /* See if a new eb has been attached to the page. */
+            if (eb_cur != eb) {
+                spin_unlock(&page->mapping->private_lock);
+                ret = 1;
+                goto page_release;
+            }
+
+            /* See if any other extent_buffer is using the page. */
+            eb_cur = (struct extent_buffer *)page->private;
+            do {
+                /* Check for any other references on the eb. */
+                spin_lock(&eb_cur->refs_lock);
+                if (!atomic_dec_and_test(&eb_cur->refs)) {
+                    atomic_inc(&eb_cur->refs);
+                    spin_unlock(&eb_cur->refs_lock);
+                    eb_temp = eb_cur;
+                    eb_cur = (struct extent_buffer *)
+                        page->private;
+                    while (eb_cur != eb_temp) {
+                        atomic_inc(&eb_cur->refs);
+                        eb_cur = eb_cur->next;
+                    }
+                    spin_unlock(
+                        &page->mapping->private_lock);
+                    goto page_release;
+                }
+                spin_unlock(&eb_cur->refs_lock);
+            } while ((eb_cur = eb_cur->next) != NULL);
+
+            /* Sanity checks. */
+            eb_cur = (struct extent_buffer *)page->private;
+            do {
+                BUG_ON(extent_buffer_under_io(eb_cur));
+            } while ((eb_cur = eb_cur->next) != NULL);
+            BUG_ON(PageDirty(page));
+            BUG_ON(PageWriteback(page));
+            /*
+             * We need to make sure we haven't been attached
+             * to a new eb.
+             */
+            eb_cur = (struct extent_buffer *)page->private;
+            *eb_head = eb_cur;
+            eb_temp = NULL;
+            ClearPagePrivate(page);
+            set_page_private(page, 0);
+            /* One for the page private. */
+            page_cache_release(page);
+            ret = 1;
+            spin_unlock(&page->mapping->private_lock);
+        }
+    }
+
+page_release:
+    if (page) {
+        /* One for when we alloced the page */
+        page_cache_release(page);
+    }
+    return ret;
+}
+
+/*
  * Helper for releasing the extent buffer.
  */
 static inline void btrfs_release_extent_buffer(struct extent_buffer *eb)
 {
-    btrfs_release_extent_buffer_page(eb, 0);
-    __free_extent_buffer(eb);
+    if (eb->len >= PAGE_SIZE) {
+        btrfs_release_extent_buffer_page(eb, 0);
+        __free_extent_buffer(eb);
+    } else {
+        struct extent_buffer *eb_head;
+
+        if (btrfs_release_extent_buffers_page(eb, &eb_head))
+            __free_extent_buffers(eb_head);
+    }
 }

 static void check_buffer_tree_ref(struct extent_buffer *eb)
@@ -4222,16 +4495,153 @@ static void mark_extent_buffer_accessed(struct extent_buffer *eb)
 struct extent_buffer *alloc_extent_buffer(struct extent_io_tree *tree,
                                           u64 start, unsigned long len)
 {
-    unsigned long num_pages = num_extent_pages(start, len);
-    unsigned long i;
-    unsigned long index = start >> PAGE_CACHE_SHIFT;
+    /* Allocate a new extent_buffer depending on blocksize. */
+    if (len < PAGE_CACHE_SIZE)
+        return alloc_extent_buffer_multiple(tree, start, len);
+    return alloc_extent_buffer_single(tree, start, len);
+}
+
+struct extent_buffer *alloc_extent_buffer_multiple(struct extent_io_tree *tree,
+                                                   u64 start,
+                                                   unsigned long len) {
+
+    struct address_space *mapping;
+    u32 blocksize_bits;
+    struct btrfs_inode *btrfs_inode;
+    struct extent_buffer *eb_cur;
+    struct extent_buffer *eb_head;
+    struct extent_buffer *exists;
+    unsigned long index;
+    struct page *page;
+    int ret;
+
+    /* Initialize variables. */
+    btrfs_inode = BTRFS_I(tree->mapping->host);
+    blocksize_bits = btrfs_inode->vfs_inode.i_sb->s_blocksize_bits;
+
+    /* Sanity checks. */
+    WARN_ON(num_extent_pages(start, len) > 1);
+
+    /* See if the extent_buffer already exists in the radix tree. */
+    rcu_read_lock();
+    eb_cur = radix_tree_lookup(&tree->buffer, start >> blocksize_bits);
+    if (eb_cur && atomic_inc_not_zero(&eb_cur->refs)) {
+        rcu_read_unlock();
+        mark_extent_buffer_accessed(eb_cur);
+        return eb_cur;
+    }
+    rcu_read_unlock();
+
+    /* Find the page in the mapping. */
+    index = start >> PAGE_CACHE_SHIFT;
+    mapping = tree->mapping;
+    page = find_or_create_page(mapping, index, GFP_NOFS);
+    if (!page) {
+        WARN_ON(1);
+        return NULL;
+    }
+
+    /* Allocate each extent buffer for the page. */
+    eb_head = __alloc_extent_buffers(tree, page, GFP_NOFS);
+    if (!eb_head) {
+        WARN_ON(1);
+        return NULL;
+    }
+
+    /* See if extent buffers have already been allocated for
+     * this page. */
+    spin_lock(&mapping->private_lock);
+    if (PagePrivate(page)) {
+        /*
+         * We could have already allocated an eb for this page
+         * and attached one so lets see if we can get a ref on
+         * the existing eb, and if we can we know it's good and
+         * we can just return that one, else we know we can just
+         * overwrite page->private.
+         */
+        eb_cur = (struct extent_buffer *)page->private;
+        while (eb_cur->start != start) {
+            eb_cur = eb_cur->next;
+            BUG_ON(!eb_cur);
+        }
+        check_buffer_tree_ref(eb_cur);
+        spin_unlock(&mapping->private_lock);
+        unlock_page(page);
+        mark_extent_buffer_accessed(eb_cur);
+        __free_extent_buffers(eb_head);
+        return eb_cur;
+    }
+
+    /* Bind the extent buffer to the page. */
+    attach_extent_buffer_page(eb_head, page);
+    spin_unlock(&mapping->private_lock);
+    WARN_ON(PageDirty(page));
+    mark_page_accessed(page);
+
+again:
+    /* Set eb_cur to the buffer added. */
+    eb_cur = eb_head;
+    while (start != eb_cur->start) {
+        eb_cur = eb_cur->next;
+        BUG_ON(!eb_cur);
+    }
+
+    /* Preload the radix tree. */
+    ret = radix_tree_preload(GFP_NOFS & ~__GFP_HIGHMEM);
+    if (ret) {
+        WARN_ON(1);
+        return NULL;
+    }
+
+    /* Add the extent buffer to the radix tree. */
+    spin_lock(&tree->buffer_lock);
+    ret = radix_tree_insert(&tree->buffer,
+                            eb_cur->start >> blocksize_bits,
+                            eb_cur);
+    if (ret == -EEXIST) {
+        exists = radix_tree_lookup(&tree->buffer,
+                                   eb_cur->start >> blocksize_bits);
+        if (exists->start != start)
+            BUG_ON(1);
+        if (!atomic_inc_not_zero(&exists->refs)) {
+            spin_unlock(&tree->buffer_lock);
+            radix_tree_preload_end();
+            exists = NULL;
+            goto again;
+        }
+        spin_unlock(&tree->buffer_lock);
+        radix_tree_preload_end();
+        mark_extent_buffer_accessed(exists);
+        WARN_ON(!atomic_dec_and_test(&eb_cur->refs));
+        btrfs_release_extent_buffer(eb_cur);
+        return exists;
+    }
+
+    /* Set the extent buffer's tree-reference bits. */
+    check_buffer_tree_ref(eb_cur);
+    spin_unlock(&tree->buffer_lock);
+    radix_tree_preload_end();
+
+    /* Not quite sure what this does. */
+    SetPageChecked(eb_head->pages[0]);
+    unlock_page(eb_head->pages[0]);
+
+    return eb_cur;
+}
+
+struct extent_buffer *alloc_extent_buffer_single(struct extent_io_tree *tree,
+                                                 u64 start, unsigned long len) {
+    struct address_space *mapping = tree->mapping;
     struct extent_buffer *eb;
     struct extent_buffer *exists = NULL;
+    unsigned long i;
+    unsigned long index = start >> PAGE_CACHE_SHIFT;
+    unsigned long num_pages = num_extent_pages(start, len);
     struct page *p;
-    struct address_space *mapping = tree->mapping;
     int uptodate = 1;
     int ret;

+    /* See if the extent_buffer already exists */
     rcu_read_lock();
     eb = radix_tree_lookup(&tree->buffer, start >> PAGE_CACHE_SHIFT);
     if (eb && atomic_inc_not_zero(&eb->refs)) {
@@ -4350,9 +4760,17 @@ struct extent_buffer *find_extent_buffer(struct extent_io_tree *tree,
                                          u64 start, unsigned long len)
 {
     struct extent_buffer *eb;
+    struct btrfs_inode *btrfs_inode = BTRFS_I(tree->mapping->host);
+    u32 blocksize_bits = btrfs_inode->vfs_inode.i_sb->s_blocksize_bits;

     rcu_read_lock();
-    eb = radix_tree_lookup(&tree->buffer, start >> PAGE_CACHE_SHIFT);
+    // This branch needs to be fixed when the allocation code is merged.
+    // Seriously.
+    if (blocksize_bits >= PAGE_CACHE_SHIFT)
+        eb = radix_tree_lookup(&tree->buffer,
+                               start >> PAGE_CACHE_SHIFT);
+    else
+        eb = radix_tree_lookup(&tree->buffer, start >> blocksize_bits);
     if (eb && atomic_inc_not_zero(&eb->refs)) {
         rcu_read_unlock();
         mark_extent_buffer_accessed(eb);
@@ -4371,9 +4789,25 @@ static inline void btrfs_release_extent_buffer_rcu(struct rcu_head *head)
     __free_extent_buffer(eb);
 }

-/* Expects to have eb->eb_lock already held */
+/*
+ * The RCU head must point to the first extent buffer belonging to a page.
+ */
+static inline void btrfs_release_extent_buffers_rcu(struct rcu_head *head)
+{
+    struct extent_buffer *eb =
+        container_of(head, struct extent_buffer, rcu_head);
+
+    do {
+        call_rcu(&eb->rcu_head, btrfs_release_extent_buffer_rcu);
+    } while ((eb = eb->next));
+}
+
+/* Expects to have eb->refs_lock already held */
 static int release_extent_buffer(struct extent_buffer *eb, gfp_t mask)
 {
+    struct btrfs_inode *btrfs_inode = BTRFS_I(eb->tree->mapping->host);
+    u32 blocksize_bits = btrfs_inode->vfs_inode.i_sb->s_blocksize_bits;
+
     WARN_ON(atomic_read(&eb->refs) == 0);
     if (atomic_dec_and_test(&eb->refs)) {
         if (test_bit(EXTENT_BUFFER_DUMMY, &eb->bflags)) {
@@ -4381,17 +4815,35 @@ static int release_extent_buffer(struct extent_buffer *eb, gfp_t mask)
         } else {
             struct extent_io_tree *tree = eb->tree;

+            /* Dumb hack to make releasing the page easier. */
+            if (eb->len < PAGE_SIZE)
+                atomic_inc(&eb->refs);
+
             spin_unlock(&eb->refs_lock);

+            // This also needs to be fixed when allocation code is
+            // merged.
             spin_lock(&tree->buffer_lock);
-            radix_tree_delete(&tree->buffer,
-                              eb->start >> PAGE_CACHE_SHIFT);
+            if (eb->len >= PAGE_SIZE)
+                radix_tree_delete(&tree->buffer,
+                                  eb->start >> blocksize_bits);
+            else
+                radix_tree_delete(&tree->buffer,
+                                  eb->start >> blocksize_bits);
             spin_unlock(&tree->buffer_lock);
         }

         /* Should be safe to release our pages at this point */
-        btrfs_release_extent_buffer_page(eb, 0);
-        call_rcu(&eb->rcu_head, btrfs_release_extent_buffer_rcu);
+        if (eb->len >= PAGE_SIZE) {
+            btrfs_release_extent_buffer_page(eb, 0);
+            call_rcu(&eb->rcu_head,
+                     btrfs_release_extent_buffer_rcu);
+        } else {
+            struct extent_buffer *eb_head;
+
+            if (btrfs_release_extent_buffers_page(eb, &eb_head))
+                btrfs_release_extent_buffers_rcu(
+                    &eb_head->rcu_head);
+        }
         return 1;
     }
     spin_unlock(&eb->refs_lock);
@@ -4482,6 +4934,11 @@ int set_extent_buffer_dirty(struct extent_buffer *eb)
     for (i = 0; i < num_pages; i++)
         set_page_dirty(extent_buffer_page(eb, i));

+    /* Run an additional sanity check here instead of
+     * in btree_set_page_dirty() since we can't get the eb there for
+     * subpage blocksize. */
+    if (eb->len < PAGE_SIZE)
+        btrfs_assert_tree_locked(eb);
     return was_dirty;
 }

@@ -4503,11 +4960,14 @@ int clear_extent_buffer_uptodate(struct extent_buffer *eb)
     unsigned long num_pages;

     clear_bit(EXTENT_BUFFER_UPTODATE, &eb->bflags);
-    num_pages = num_extent_pages(eb->start, eb->len);
-    for (i = 0; i < num_pages; i++) {
-        page = extent_buffer_page(eb, i);
-        if (page)
-            ClearPageUptodate(page);
+    /* Ignore the page's uptodate flag for subpage blocksize. */
+    if (eb->len >= PAGE_SIZE) {
+        num_pages = num_extent_pages(eb->start, eb->len);
+        for (i = 0; i < num_pages; i++) {
+            page = extent_buffer_page(eb, i);
+            if (page)
+                ClearPageUptodate(page);
+        }
     }
     return 0;
 }
@@ -4518,11 +4978,16 @@ int set_extent_buffer_uptodate(struct extent_buffer *eb)
     struct page *page;
     unsigned long num_pages;

+    /* Set extent buffer up-to-date. */
     set_bit(EXTENT_BUFFER_UPTODATE, &eb->bflags);
-    num_pages = num_extent_pages(eb->start, eb->len);
-    for (i = 0; i < num_pages; i++) {
-        page = extent_buffer_page(eb, i);
-        SetPageUptodate(page);
+
+    /* Set pages up-to-date. */
+    if (eb->len >= PAGE_CACHE_SIZE) {
+        num_pages = num_extent_pages(eb->start, eb->len);
+        for (i = 0; i < num_pages; i++) {
+            page = extent_buffer_page(eb, i);
+            SetPageUptodate(page);
+        }
     }
     return 0;
 }
@@ -4606,7 +5071,7 @@ int read_extent_buffer_pages(struct extent_io_tree *tree,
         }
     }
     if (all_uptodate) {
-        if (start_i == 0)
+        if (start_i == 0 && eb->len >= PAGE_SIZE)
             set_bit(EXTENT_BUFFER_UPTODATE, &eb->bflags);
         goto unlock_exit;
     }
@@ -4693,7 +5158,7 @@ int map_private_extent_buffer(struct extent_buffer *eb, unsigned long start,
                               unsigned long *map_start,
                               unsigned long *map_len)
 {
-    size_t offset = start & (PAGE_CACHE_SIZE - 1);
+    size_t offset;
     char *kaddr;
     struct page *p;
     size_t start_offset = eb->start & ((u64)PAGE_CACHE_SIZE - 1);
@@ -4709,6 +5174,9 @@ int map_private_extent_buffer(struct extent_buffer *eb, unsigned long start,
         *map_start = 0;
     } else {
         offset = 0;
+        // I'm pretty sure that this is a) just plain wrong and
+        // b) will never realistically execute; not entirely sure,
+        // though...
         *map_start = ((u64)i << PAGE_CACHE_SHIFT) - start_offset;
     }

@@ -4722,7 +5190,7 @@ int map_private_extent_buffer(struct extent_buffer *eb, unsigned long start,
     p = extent_buffer_page(eb, i);
     kaddr = page_address(p);
     *map = kaddr + offset;
-    *map_len = PAGE_CACHE_SIZE - offset;
+    *map_len = (PAGE_CACHE_SIZE - offset) & (eb->len - 1);
     return 0;
 }

@@ -4996,6 +5464,7 @@ void memmove_extent_buffer(struct extent_buffer *dst, unsigned long dst_offset,
 int try_release_extent_buffer(struct page *page, gfp_t mask)
 {
     struct extent_buffer *eb;
+    int ret;

     /*
      * We need to make sure noboody is attaching this page to an eb right
@@ -5010,30 +5479,61 @@ int try_release_extent_buffer(struct page *page, gfp_t mask)
     eb = (struct extent_buffer *)page->private;
     BUG_ON(!eb);

-    /*
-     * This is a little awful but should be ok, we need to make sure that
-     * the eb doesn't disappear out from under us while we're looking at
-     * this page.
-     */
-    spin_lock(&eb->refs_lock);
-    if (atomic_read(&eb->refs) != 1 || extent_buffer_under_io(eb)) {
-        spin_unlock(&eb->refs_lock);
+    if (eb->len >= PAGE_SIZE) {
+        /*
+         * This is a little awful but should be ok, we need to make
+         * sure that the eb doesn't disappear out from under us while
+         * we're looking at this page.
+         */
+        spin_lock(&eb->refs_lock);
+        if (atomic_read(&eb->refs) != 1 || extent_buffer_under_io(eb)) {
+            spin_unlock(&eb->refs_lock);
+            spin_unlock(&page->mapping->private_lock);
+            return 0;
+        }
         spin_unlock(&page->mapping->private_lock);
-        return 0;
-    }
-    spin_unlock(&page->mapping->private_lock);

-    if ((mask & GFP_NOFS) == GFP_NOFS)
-        mask = GFP_NOFS;
+        if ((mask & GFP_NOFS) == GFP_NOFS)
+            mask = GFP_NOFS;

-    /*
-     * If tree ref isn't set then we know the ref on this eb is a real ref,
-     * so just return, this page will likely be freed soon anyway.
-     */
-    if (!test_and_clear_bit(EXTENT_BUFFER_TREE_REF, &eb->bflags)) {
-        spin_unlock(&eb->refs_lock);
-        return 0;
-    }
+        /*
+         * If tree ref isn't set then we know the ref on this eb is a
+         * real ref, so just return, this page will likely be freed
+         * soon anyway.
+         */
+        if (!test_and_clear_bit(EXTENT_BUFFER_TREE_REF, &eb->bflags)) {
+            spin_unlock(&eb->refs_lock);
+            return 0;
+        }

-    return release_extent_buffer(eb, mask);
+        return release_extent_buffer(eb, mask);
+    } else {
+        ret = 0;
+        do {
+            spin_lock(&eb->refs_lock);
+            if (atomic_read(&eb->refs) != 1 ||
+                extent_buffer_under_io(eb)) {
+                spin_unlock(&eb->refs_lock);
+                continue;
+            }
+            spin_unlock(&page->mapping->private_lock);
+
+            if ((mask & GFP_NOFS) == GFP_NOFS)
+                mask = GFP_NOFS;
+
+            if (!test_and_clear_bit(EXTENT_BUFFER_TREE_REF,
+                                    &eb->bflags)) {
+                spin_unlock(&eb->refs_lock);
+                spin_lock(&page->mapping->private_lock);
+                continue;
+            }
+
+            /* No idea what to do with the 'ret' here. */
+            ret |= release_extent_buffer(eb, mask);
+
+            spin_lock(&page->mapping->private_lock);
+        } while ((eb = eb->next) != NULL);
+        spin_unlock(&page->mapping->private_lock);
+        return ret;
+    }
 }
diff --git a/fs/btrfs/extent_io.h b/fs/btrfs/extent_io.h
index 2eacfab..955ef5e 100644
--- a/fs/btrfs/extent_io.h
+++ b/fs/btrfs/extent_io.h
@@ -163,6 +163,9 @@ struct extent_buffer {
     wait_queue_head_t lock_wq;
     struct page *inline_pages[INLINE_EXTENT_BUFFER_PAGES];
     struct page **pages;
+
+    /* Acyclic linked list of extent_buffers belonging to a single page. */
+    struct extent_buffer *next;
 };

 static inline void extent_set_compress_type(unsigned long *bio_flags,
@@ -270,6 +273,10 @@ void set_page_extent_mapped(struct page *page);

 struct extent_buffer *alloc_extent_buffer(struct extent_io_tree *tree,
                                           u64 start, unsigned long len);
+struct extent_buffer *alloc_extent_buffer_single(struct extent_io_tree *tree,
+                                                 u64 start, unsigned long len);
+struct extent_buffer *alloc_extent_buffer_multiple(struct extent_io_tree *tree,
+                                                   u64 start, unsigned long len);
 struct extent_buffer *alloc_dummy_extent_buffer(u64 start, unsigned long len);
 struct extent_buffer *btrfs_clone_extent_buffer(struct extent_buffer *src);
 struct extent_buffer *find_extent_buffer(struct extent_io_tree *tree,
diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c
index 3bff4d4..8745289 100644
--- a/fs/btrfs/file.c
+++ b/fs/btrfs/file.c
@@ -1340,7 +1340,7 @@ static noinline ssize_t __btrfs_buffered_write(struct file *file,
         }

         ret = btrfs_delalloc_reserve_space(inode,
-                                           num_pages << PAGE_CACHE_SHIFT);
+                                           write_bytes);
         if (ret)
             break;

@@ -1354,7 +1354,7 @@ static noinline ssize_t __btrfs_buffered_write(struct file *file,
                                 force_page_uptodate);
         if (ret) {
             btrfs_delalloc_release_space(inode,
-                                         num_pages << PAGE_CACHE_SHIFT);
+                                         write_bytes);
             break;
         }

@@ -1392,8 +1392,7 @@ static noinline ssize_t __btrfs_buffered_write(struct file *file,
                 spin_unlock(&BTRFS_I(inode)->lock);
             }
             btrfs_delalloc_release_space(inode,
-                                         (num_pages - dirty_pages) <<
-                                         PAGE_CACHE_SHIFT);
+                                         write_bytes - copied);
         }

         if (copied > 0) {
@@ -1402,7 +1401,7 @@ static noinline ssize_t __btrfs_buffered_write(struct file *file,
                                     NULL);
             if (ret) {
                 btrfs_delalloc_release_space(inode,
-                                             dirty_pages << PAGE_CACHE_SHIFT);
+                                             copied);
                 btrfs_drop_pages(pages, num_pages);
                 break;
             }
diff --git a/fs/btrfs/free-space-cache.c b/fs/btrfs/free-space-cache.c
index 59ea2e4..1c0e254 100644
--- a/fs/btrfs/free-space-cache.c
+++ b/fs/btrfs/free-space-cache.c
@@ -960,6 +960,8 @@ int __btrfs_write_out_cache(struct btrfs_root *root, struct inode *inode,

     if (block_group)
         start = block_group->key.objectid;
+    else // Hmm, I don't recall putting this here.
+        start = (u64)-1;

     while (block_group && (start < block_group->key.objectid +
                            block_group->key.offset)) {
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 3368c10..11ff3dd 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -2040,22 +2040,38 @@ static int btrfs_writepage_end_io_hook(struct page *page, u64 start, u64 end,
     struct btrfs_root *root = BTRFS_I(inode)->root;
     struct btrfs_ordered_extent *ordered_extent = NULL;
     struct btrfs_workers *workers;
+    u64 block_size = 1 << inode->i_blkbits;
+    u64 io_size;
+
+    if (block_size >= PAGE_CACHE_SIZE)
+        io_size = end - start + 1;
+    else
+        io_size = block_size;

     trace_btrfs_writepage_end_io_hook(page, start, end, uptodate);

     ClearPagePrivate2(page);
-    if (!btrfs_dec_test_ordered_pending(inode, &ordered_extent, start,
-                                        end - start + 1, uptodate))
-        return 0;
-
-    ordered_extent->work.func = finish_ordered_fn;
-    ordered_extent->work.flags = 0;
-
-    if (btrfs_is_free_space_inode(inode))
-        workers = &root->fs_info->endio_freespace_worker;
-    else
-        workers = &root->fs_info->endio_write_workers;
-    btrfs_queue_worker(workers, &ordered_extent->work);
+next_block:
+    if (btrfs_dec_test_ordered_pending(inode, &ordered_extent, start,
+                                       io_size, uptodate)) {
+        ordered_extent->work.func = finish_ordered_fn;
+        ordered_extent->work.flags = 0;
+
+        if (btrfs_is_free_space_inode(inode))
+            workers = &root->fs_info->endio_freespace_worker;
+        else
+            workers = &root->fs_info->endio_write_workers;
+        btrfs_queue_worker(workers, &ordered_extent->work);
+    }
+
+    // I think that writes are always block-size granularity.
+    if (block_size < PAGE_CACHE_SIZE)
+        BUG_ON(start & (io_size - 1)); // Welp, one way to make sure...
+    start += io_size;
+    if (start < end)
+        goto next_block;
+    // We overshot. I'm pretty sure that this is terrible.
+    BUG_ON(start != (end + 1));

     return 0;
 }
diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
index 657d83c..c0269df 100644
--- a/fs/btrfs/ioctl.c
+++ b/fs/btrfs/ioctl.c
@@ -3937,8 +3937,8 @@ long btrfs_ioctl(struct file *file, unsigned int
         return btrfs_ioctl_qgroup_create(file, argp);
     case BTRFS_IOC_QGROUP_LIMIT:
         return btrfs_ioctl_qgroup_limit(file, argp);
-    case BTRFS_IOC_DEV_REPLACE:
-        return btrfs_ioctl_dev_replace(root, argp);
+    //case BTRFS_IOC_DEV_REPLACE:
+    //    return btrfs_ioctl_dev_replace(root, argp);
     }

     return -ENOTTY;
-- 
1.7.1
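[To make the new layout easier to follow: the patch keeps one page-cache page
per group of metadata blocks but hangs one extent_buffer per tree block off of
page->private, chained through the new ->next member. Finding the buffer for a
given logical start is then a short list walk, and the block header's in-page
offset is just a mask of the start, as in the BTRFS_SETGET_HEADER_FUNCS hunk.
Below is a minimal userspace sketch of that arithmetic; struct eb,
alloc_page_ebs, and find_eb are simplified stand-ins for the kernel code, and
the 16K page / 4K block sizes are only illustrative.]

/*
 * Userspace sketch of the subpagesize-blocksize layout: one page backs
 * PAGE_SIZE / BLOCK_SIZE block-sized buffers chained through ->next.
 */
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

#define PAGE_SIZE  16384UL /* illustrative; think 64K pages on ppc64 */
#define BLOCK_SIZE 4096UL  /* subpagesize metadata block */

struct eb {
    uint64_t start;    /* logical start of this tree block */
    unsigned long len; /* == BLOCK_SIZE */
    struct eb *next;   /* next block within the same page, or NULL */
};

/* Mirrors __alloc_extent_buffers(): one eb per block in the page. */
static struct eb *alloc_page_ebs(uint64_t page_start)
{
    struct eb *head = NULL, **link = &head;
    unsigned long i;

    for (i = 0; i < PAGE_SIZE / BLOCK_SIZE; i++) {
        struct eb *cur = calloc(1, sizeof(*cur));

        assert(cur);
        cur->start = page_start + i * BLOCK_SIZE;
        cur->len = BLOCK_SIZE;
        *link = cur;
        link = &cur->next;
    }
    return head;
}

/* Mirrors the "while (eb->start != start) eb = eb->next;" walks. */
static struct eb *find_eb(struct eb *head, uint64_t start)
{
    while (head && head->start != start)
        head = head->next;
    assert(head); /* the kernel code BUG_ON()s here instead */
    return head;
}

int main(void)
{
    uint64_t page_start = 5 * PAGE_SIZE; /* arbitrary aligned start */
    struct eb *head = alloc_page_ebs(page_start);
    struct eb *eb = find_eb(head, page_start + 2 * BLOCK_SIZE);

    /* Same computation the ctree.h hunk adds: the byte offset of a
     * block's header within its backing page. */
    printf("eb->start %llu maps to in-page offset %lu\n",
           (unsigned long long)eb->start,
           (unsigned long)(eb->start & (PAGE_SIZE - 1)));

    while (head) { /* free the chain, as __free_extent_buffers() does */
        struct eb *next = head->next;
        free(head);
        head = next;
    }
    return 0;
}

[These two computations recur throughout the diff: the mask in the
BTRFS_SETGET_HEADER_FUNCS macros, and the ->next walks in the allocation,
checksumming, and I/O completion paths.]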
On Mon, Dec 17, 2012 at 11:13:25PM -0800, clinew@linux.vnet.ibm.com wrote:
> From: Wade Cline <clinew@linux.vnet.ibm.com>
>
> v1 -> v2:
>     - Added Signed-off-by tag (it's kind of important).
>
> This patch is only an RFC. My internship is ending and I was hoping
> to get some feedback and incorporate any suggestions people may
> have before my internship ends along with life as we know it (this
> Friday).
>
> The filesystem should mount/umount properly but tends towards the
> explosive side when writes start happening. My current focus is on
> checksumming issues and also an error when releasing extent buffers
> when creating a large file with 'dd'... and probably any other
> method. There's still a significant amount of work that needs to be
> done before this should be incorporated into mainline.
>
> A couple of notes:
>     - Based off of Josef's btrfs-next branch, commit
>       8d089a86e45b34d7bc534d955e9d8543609f7e42
>     - C99-style comments are "meta-comments" where I'd like more
>       feedback; they aren't permanent but make 'checkpatch' moan.
>     - extent_buffer allocation and freeing need their code paths
>       merged; they're currently in separate functions and are both
>       very ugly.
>     - The patch itself will eventually need to be broken down
>       into smaller pieces if at all possible...

Could you please first elaborate, in this patch's commit log, on why we
need this subpagesize stuff and what the use cases are? Or am I missing
something?

thanks,
liubo
> + */ > + if (eb->len < PAGE_SIZE && > + !extent_buffer_uptodate(eb)) { > + if (csum_tree_block(root, eb, 1)) { > + ret = -EIO; > + goto bad; > + } else { > + set_extent_buffer_uptodate(eb); > + } > + } > + > if (!verify_parent_transid(io_tree, eb, > parent_transid, 0)) > break; > @@ -385,6 +403,7 @@ static int btree_read_extent_buffer_pages(struct btrfs_root *root, > * there is no reason to read the other copies, they won''t be > * any less wrong. > */ > +bad: > if (test_bit(EXTENT_BUFFER_CORRUPT, &eb->bflags)) > break; > > @@ -416,29 +435,55 @@ static int btree_read_extent_buffer_pages(struct btrfs_root *root, > * checksum a dirty tree block before IO. This has extra checks to make sure > * we only fill in the checksum field in the first page of a multi-page block > */ > - > -static int csum_dirty_buffer(struct btrfs_root *root, struct page *page) > +static int csum_dirty_buffer(struct btrfs_root *root, struct page *page, > + unsigned int offset, unsigned int len) > { > - struct extent_io_tree *tree; > u64 start = (u64)page->index << PAGE_CACHE_SHIFT; > u64 found_start; > struct extent_buffer *eb; > > - tree = &BTRFS_I(page->mapping->host)->io_tree; > + if (!PageUptodate(page)) { > + WARN_ON(1); > + return 0; > + } > > eb = (struct extent_buffer *)page->private; > - if (page != eb->pages[0]) > - return 0; > + if (eb->len >= PAGE_SIZE) { > + if (eb->pages[0] != page) > + return 0; > + } else { > + start += offset; > + while (eb->start != start) { > + eb = eb->next; > + BUG_ON(!eb); > + } > +next: > + if (!test_bit(EXTENT_BUFFER_WRITEBACK, &eb->bflags)) > + WARN_ON(1); > + if (!test_bit(EXTENT_BUFFER_UPTODATE, &eb->bflags)) > + WARN_ON(1); > + if (eb->pages[0] != page) > + WARN_ON(1); > + } > + > found_start = btrfs_header_bytenr(eb); > if (found_start != start) { > WARN_ON(1); > return 0; > } > - if (!PageUptodate(page)) { > - WARN_ON(1); > - return 0; > - } > + > csum_tree_block(root, eb, 0); > + > + if (eb->len < PAGE_SIZE) { > + len -= eb->len; > + BUG_ON(len & (eb->len - 1)); > + if (len) { > + start += eb->len; > + eb = eb->next; > + goto next; > + } > + } > + > return 0; > } > > @@ -579,6 +624,19 @@ static int btree_readpage_end_io_hook(struct page *page, u64 start, u64 end, > > tree = &BTRFS_I(page->mapping->host)->io_tree; > eb = (struct extent_buffer *)page->private; > + if (eb->len < PAGE_SIZE) { > + /* Find the eb that tried to submit a read request. This is > + * a little bit funky. */ > + do { > + if (!atomic_read(&eb->io_pages)) > + continue; > + if (test_bit(EXTENT_BUFFER_DIRTY, &eb->bflags) || > + test_bit(EXTENT_BUFFER_WRITEBACK, &eb->bflags)) > + continue; > + break; > + } while ((eb = eb->next)); > + BUG_ON(!eb); > + } > > /* the pending IO might have been the only thing that kept this buffer > * in memory. Make sure we have a ref for all this other checks > @@ -615,8 +673,11 @@ static int btree_readpage_end_io_hook(struct page *page, u64 start, u64 end, > btrfs_set_buffer_lockdep_class(btrfs_header_owner(eb), > eb, found_level); > > - ret = csum_tree_block(root, eb, 1); > - if (ret) { > + /* > + * Subpagesize blocksize checksumming is currently done in > + * btree_read_extent_buffer_pages(). 
> + */ > + if (eb->len >= PAGE_SIZE && csum_tree_block(root, eb, 1)) { > ret = -EIO; > goto err; > } > @@ -631,8 +692,15 @@ static int btree_readpage_end_io_hook(struct page *page, u64 start, u64 end, > ret = -EIO; > } > > - if (!ret) > + /* > + * For subpagesize blocksize, only the page needs to be set > + * up-to-date; each extent_buffer is set up-to-date when it is > + * checksummed. > + */ > + if (eb->len >= PAGE_SIZE) > set_extent_buffer_uptodate(eb); > + else > + SetPageUptodate(eb->pages[0]); > err: > if (test_bit(EXTENT_BUFFER_READAHEAD, &eb->bflags)) { > clear_bit(EXTENT_BUFFER_READAHEAD, &eb->bflags); > @@ -828,7 +896,8 @@ static int btree_csum_one_bio(struct bio *bio) > WARN_ON(bio->bi_vcnt <= 0); > while (bio_index < bio->bi_vcnt) { > root = BTRFS_I(bvec->bv_page->mapping->host)->root; > - ret = csum_dirty_buffer(root, bvec->bv_page); > + ret = csum_dirty_buffer(root, bvec->bv_page, bvec->bv_offset, > + bvec->bv_len); > if (ret) > break; > bio_index++; > @@ -1007,9 +1076,13 @@ static int btree_set_page_dirty(struct page *page) > BUG_ON(!PagePrivate(page)); > eb = (struct extent_buffer *)page->private; > BUG_ON(!eb); > - BUG_ON(!test_bit(EXTENT_BUFFER_DIRTY, &eb->bflags)); > - BUG_ON(!atomic_read(&eb->refs)); > - btrfs_assert_tree_locked(eb); > + /* There doesn''t seem to be a method for passing the correct eb > + * to this function, so no sanity checks for subpagesize blocksize. */ > + if (eb->len >= PAGE_SIZE) { > + BUG_ON(!test_bit(EXTENT_BUFFER_DIRTY, &eb->bflags)); > + BUG_ON(!atomic_read(&eb->refs)); > + btrfs_assert_tree_locked(eb); > + } > #endif > return __set_page_dirty_nobuffers(page); > } > @@ -2400,11 +2473,14 @@ int open_ctree(struct super_block *sb, > goto fail_sb_buffer; > } > > +#if 0 > + // Hmm. How to deal wth this for subpagesize blocksize? 
> if (sectorsize != PAGE_SIZE) { > printk(KERN_WARNING "btrfs: Incompatible sector size(%lu) " > "found on %s\n", (unsigned long)sectorsize, sb->s_id); > goto fail_sb_buffer; > } > +#endif > > mutex_lock(&fs_info->chunk_mutex); > ret = btrfs_read_sys_array(tree_root); > diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c > index 1b319df..c1e052e 100644 > --- a/fs/btrfs/extent_io.c > +++ b/fs/btrfs/extent_io.c > @@ -2519,7 +2519,7 @@ static int submit_extent_page(int rw, struct extent_io_tree *tree, > int contig = 0; > int this_compressed = bio_flags & EXTENT_BIO_COMPRESSED; > int old_compressed = prev_bio_flags & EXTENT_BIO_COMPRESSED; > - size_t page_size = min_t(size_t, size, PAGE_CACHE_SIZE); > + size_t bio_size = min_t(size_t, size, PAGE_CACHE_SIZE); > > if (bio_ret && *bio_ret) { > bio = *bio_ret; > @@ -2530,8 +2530,8 @@ static int submit_extent_page(int rw, struct extent_io_tree *tree, > sector; > > if (prev_bio_flags != bio_flags || !contig || > - merge_bio(tree, page, offset, page_size, bio, bio_flags) || > - bio_add_page(bio, page, page_size, offset) < page_size) { > + merge_bio(tree, page, offset, bio_size, bio, bio_flags) || > + bio_add_page(bio, page, bio_size, offset) < bio_size) { > ret = submit_one_bio(rw, bio, mirror_num, > prev_bio_flags); > if (ret < 0) > @@ -2550,7 +2550,7 @@ static int submit_extent_page(int rw, struct extent_io_tree *tree, > if (!bio) > return -ENOMEM; > > - bio_add_page(bio, page, page_size, offset); > + bio_add_page(bio, page, bio_size, offset); > bio->bi_end_io = end_io_func; > bio->bi_private = tree; > > @@ -3168,14 +3168,28 @@ static void end_bio_extent_buffer_writepage(struct bio *bio, int err) > int uptodate = err == 0; > struct bio_vec *bvec = bio->bi_io_vec + bio->bi_vcnt - 1; > struct extent_buffer *eb; > + unsigned int offset; > + unsigned int bv_len; > + u64 start; > int done; > > do { > struct page *page = bvec->bv_page; > + offset = bvec->bv_offset; > + bv_len = bvec->bv_len; > + start = ((u64)page->index << PAGE_CACHE_SHIFT) + offset; > > bvec--; > eb = (struct extent_buffer *)page->private; > BUG_ON(!eb); > + if (eb->len < PAGE_SIZE) { > + while (eb->start != start) { > + eb = eb->next; > + BUG_ON(!eb); > + } > + } > + > +next_eb: > done = atomic_dec_and_test(&eb->io_pages); > > if (!uptodate || test_bit(EXTENT_BUFFER_IOERR, &eb->bflags)) { > @@ -3184,12 +3198,50 @@ static void end_bio_extent_buffer_writepage(struct bio *bio, int err) > SetPageError(page); > } > > - end_page_writeback(page); > + if (eb->len >= PAGE_SIZE) { > + end_page_writeback(page); > > - if (!done) > - continue; > + if (!done) > + continue; > > - end_extent_buffer_writeback(eb); > + end_extent_buffer_writeback(eb); > + } else { > + /* Sanity checks. */ > + if (!test_bit(EXTENT_BUFFER_WRITEBACK, &eb->bflags)) > + WARN_ON(1); > + > + /* Ensure I/O page count is zero. */ > + if (!done) > + WARN_ON(1); > + > + /* Clear the extent buffer''s writeback flag. */ > + end_extent_buffer_writeback(eb); > + > + /* > + * See if any other extent buffers exists within the > + * page. > + */ > + bv_len -= eb->len; > + BUG_ON(bv_len & (eb->len - 1)); > + if (bv_len) { > + eb = eb->next; > + goto next_eb; > + } > + > + /* Clear the page writeback flag. */ > + eb = (struct extent_buffer *)page->private; > + BUG_ON(!eb); /* Can this even happen? 
*/ > + do { > + if (!eb) { > + end_page_writeback(page); > + break; > + } > + if (test_bit(EXTENT_BUFFER_WRITEBACK, > + &eb->bflags)) > + break; > + eb = eb->next; > + } while (1); > + } > } while (bvec >= bio->bi_io_vec); > > bio_put(bio); > @@ -3202,7 +3254,8 @@ static int write_one_eb(struct extent_buffer *eb, > struct extent_page_data *epd) > { > struct block_device *bdev = fs_info->fs_devices->latest_bdev; > - u64 offset = eb->start; > + u64 start = eb->start; > + unsigned long offset = eb->start & (PAGE_CACHE_SIZE - 1); > unsigned long i, num_pages; > unsigned long bio_flags = 0; > int rw = (epd->sync_io ? WRITE_SYNC : WRITE); > @@ -3219,10 +3272,10 @@ static int write_one_eb(struct extent_buffer *eb, > > clear_page_dirty_for_io(p); > set_page_writeback(p); > - ret = submit_extent_page(rw, eb->tree, p, offset >> 9, > - PAGE_CACHE_SIZE, 0, bdev, &epd->bio, > - -1, end_bio_extent_buffer_writepage, > - 0, epd->bio_flags, bio_flags); > + ret = submit_extent_page(rw, eb->tree, p, start >> 9, eb->len, > + offset, bdev, &epd->bio, -1, > + end_bio_extent_buffer_writepage, 0, > + epd->bio_flags, bio_flags); > epd->bio_flags = bio_flags; > if (ret) { > set_bit(EXTENT_BUFFER_IOERR, &eb->bflags); > @@ -3232,7 +3285,7 @@ static int write_one_eb(struct extent_buffer *eb, > ret = -EIO; > break; > } > - offset += PAGE_CACHE_SIZE; > + start += PAGE_CACHE_SIZE; > update_nr_written(p, wbc, 1); > unlock_page(p); > } > @@ -3252,7 +3305,7 @@ int btree_write_cache_pages(struct address_space *mapping, > { > struct extent_io_tree *tree = &BTRFS_I(mapping->host)->io_tree; > struct btrfs_fs_info *fs_info = BTRFS_I(mapping->host)->root->fs_info; > - struct extent_buffer *eb, *prev_eb = NULL; > + struct extent_buffer *eb, *next, *prev_eb = NULL; > struct extent_page_data epd = { > .bio = NULL, > .tree = tree, > @@ -3326,17 +3379,41 @@ retry: > spin_unlock(&mapping->private_lock); > continue; > } > + prev_eb = eb; > + > +next_eb: > + next = eb->next; > > ret = atomic_inc_not_zero(&eb->refs); > - spin_unlock(&mapping->private_lock); > - if (!ret) > - continue; > + if (eb->len >= PAGE_SIZE) { > + spin_unlock(&mapping->private_lock); > + if (!ret) > + continue; > + } else { > + if (!ret) > + goto inc_eb; > + spin_unlock(&mapping->private_lock); > + > + if (!test_bit(EXTENT_BUFFER_DIRTY, > + &eb->bflags)) { > + spin_lock(&mapping->private_lock); > + atomic_dec(&eb->refs); > + ret = 0; > + goto inc_eb; > + } > + } > > - prev_eb = eb; > ret = lock_extent_buffer_for_io(eb, fs_info, &epd); > if (!ret) { > + if (!(eb->len >= PAGE_SIZE)) > + spin_lock(&mapping->private_lock); > + > free_extent_buffer(eb); > - continue; > + > + if (eb->len >= PAGE_SIZE) > + continue; > + else > + goto inc_eb; > } > > ret = write_one_eb(eb, fs_info, wbc, &epd); > @@ -3345,8 +3422,26 @@ retry: > free_extent_buffer(eb); > break; > } > + > + if (eb->len >= PAGE_SIZE) { > + free_extent_buffer(eb); > + goto written; > + } > + > + if (next) > + spin_lock(&mapping->private_lock); > free_extent_buffer(eb); > > +inc_eb: > + if (!next) { > + if (spin_is_locked(&mapping->private_lock)) > + spin_unlock(&mapping->private_lock); > + goto written; > + } > + eb = next; > + goto next_eb; > + > +written: > /* > * the filesystem may choose to bump up nr_to_write. > * We have to make sure to honor the new nr_to_write > @@ -4000,6 +4095,18 @@ static void __free_extent_buffer(struct extent_buffer *eb) > kmem_cache_free(extent_buffer_cache, eb); > } > > +/* Helper function to free extent buffers when there are multiple > + * extent buffers per page. 
*/ > +static void __free_extent_buffers(struct extent_buffer *eb) > +{ > + struct extent_buffer *next; > + > + do { > + next = eb->next; > + __free_extent_buffer(eb); > + } while ((eb = next)); > +} > + > static struct extent_buffer *__alloc_extent_buffer(struct extent_io_tree *tree, > u64 start, > unsigned long len, > @@ -4017,6 +4124,7 @@ static struct extent_buffer *__alloc_extent_buffer(struct extent_io_tree *tree, > eb->len = len; > eb->tree = tree; > eb->bflags = 0; > + eb->next = NULL; > rwlock_init(&eb->lock); > atomic_set(&eb->write_locks, 0); > atomic_set(&eb->read_locks, 0); > @@ -4054,6 +4162,62 @@ static struct extent_buffer *__alloc_extent_buffer(struct extent_io_tree *tree, > return eb; > } > > +/* Allocates an array of extent buffers for the specified page. > + * Should be called with the mapping''s spin lock set. */ > +static struct extent_buffer *__alloc_extent_buffers(struct extent_io_tree *tree, > + struct page *page, > + gfp_t mask) > +{ > + u32 blocksize_bits; > + struct btrfs_inode *inode; > + struct extent_buffer *eb_head; > + struct extent_buffer *eb_cur; > + u64 start; > + unsigned long len; > + int i; > + > + /* Initialize variables. */ > + inode = BTRFS_I(tree->mapping->host); > + blocksize_bits = inode->vfs_inode.i_sb->s_blocksize_bits; > + > + /* Calculate extent buffer dimensions. */ > + start = page->index << PAGE_CACHE_SHIFT; > + len = inode->root->leafsize; > + > + /* Allocate the head extent buffer. */ > + eb_head = __alloc_extent_buffer(tree, start, len, GFP_NOFS); > + if (!eb_head) { > + WARN_ON(1); > + return NULL; > + } > + start += len; > + eb_head->pages[0] = page; > + eb_cur = eb_head; > + > + /* Allocate the other extent buffers. */ > + for (i = 1; i < (PAGE_CACHE_SIZE >> blocksize_bits); i++) { > + eb_cur->next = __alloc_extent_buffer(tree, start, len, > + GFP_NOFS); > + if (!eb_cur->next) { > + WARN_ON(1); > + goto free_ebs; > + } > + start += len; > + eb_cur = eb_cur->next; > + eb_cur->pages[0] = page; > + } > + > + /* Return the extent buffer head. */ > + return eb_head; > + > +free_ebs: > + /* Free each extent buffer. */ > + // TODO: Implement. > + pr_crit(KERN_CRIT "HACK: Need to implement this...\n"); > + WARN_ON(1); > + return NULL; > +} > + > struct extent_buffer *btrfs_clone_extent_buffer(struct extent_buffer *src) > { > unsigned long i; > @@ -4170,12 +4334,121 @@ static void btrfs_release_extent_buffer_page(struct extent_buffer *eb, > } > > /* > + * Frees the page if all extent buffers belonging to the page are not > + * referernced. The extent buffers themselves must be free afterwards, too... > + * ret: 0 if the page did not need to be freed; 1 if the page was freed. > + */ > +static int btrfs_release_extent_buffers_page(struct extent_buffer *eb, > + struct extent_buffer **eb_head) > +{ > + struct extent_buffer *eb_cur; > + struct extent_buffer *eb_temp; > + struct page *page; > + int mapped = !test_bit(EXTENT_BUFFER_DUMMY, &eb->bflags); > + int ret = 0; > + > + if (extent_buffer_under_io(eb)) > + BUG_ON(1); > + > + // ...is this even possible? > + if (!num_extent_pages(eb->start, eb->len)) { > + WARN_ON(1); > + return ret; > + } > + > + page = extent_buffer_page(eb, 0); > + if (page && mapped) { > + spin_lock(&page->mapping->private_lock); > + /* > + * We do this since we''ll remove the pages after we''ve > + * removed the eb from the radix tree, so we could race > + * and have this page now attached to the new eb. So > + * only clear page_private if it''s still connected to > + * this eb. 
> + */ > + if (!PagePrivate(page)) { > + spin_unlock(&page->mapping->private_lock); > + } else { > + /* Find the page eb corresponding to our eb. */ > + eb_cur = (struct extent_buffer *)page->private; > + while (eb_cur->start != eb->start) { > + eb_cur = eb_cur->next; > + BUG_ON(!eb_cur); > + } > + > + /* See if a new eb has been attached to the page. */ > + if (eb_cur != eb) { > + spin_unlock(&page->mapping->private_lock); > + ret = 1; > + goto page_release; > + } > + > + /* See if any other extent_buffer is using the page. */ > + eb_cur = (struct extent_buffer *)page->private; > + do { > + /* Check for any other references on the eb. */ > + spin_lock(&eb_cur->refs_lock); > + if (!atomic_dec_and_test(&eb_cur->refs)) { > + atomic_inc(&eb_cur->refs); > + spin_unlock(&eb_cur->refs_lock); > + eb_temp = eb_cur; > + eb_cur = (struct extent_buffer *) > + page->private; > + while (eb_cur != eb_temp) { > + atomic_inc(&eb_cur->refs); > + eb_cur = eb_cur->next; > + } > + spin_unlock( > + &page->mapping->private_lock); > + goto page_release; > + } > + spin_unlock(&eb_cur->refs_lock); > + } while ((eb_cur = eb_cur->next) != NULL); > + > + /* Sanity checks. */ > + eb_cur = (struct extent_buffer *)page->private; > + do { > + BUG_ON(extent_buffer_under_io(eb_cur)); > + } while ((eb_cur = eb_cur->next) != NULL); > + BUG_ON(PageDirty(page)); > + BUG_ON(PageWriteback(page)); > + /* > + * We need to make sure we haven''t been attached > + * to a new eb. > + */ > + eb_cur = (struct extent_buffer *)page->private; > + *eb_head = eb_cur; > + eb_temp = NULL; > + ClearPagePrivate(page); > + set_page_private(page, 0); > + /* One for the page private. */ > + page_cache_release(page); > + ret = 1; > + spin_unlock(&page->mapping->private_lock); > + } > + } > + > +page_release: > + if (page) { > + /* One for when we alloced the page */ > + page_cache_release(page); > + } > + return ret; > +} > + > +/* > * Helper for releasing the extent buffer. > */ > static inline void btrfs_release_extent_buffer(struct extent_buffer *eb) > { > - btrfs_release_extent_buffer_page(eb, 0); > - __free_extent_buffer(eb); > + if (eb->len >= PAGE_SIZE) { > + btrfs_release_extent_buffer_page(eb, 0); > + __free_extent_buffer(eb); > + } else { > + struct extent_buffer *eb_head; > + if (btrfs_release_extent_buffers_page(eb, &eb_head)) > + __free_extent_buffers(eb_head); > + } > } > > static void check_buffer_tree_ref(struct extent_buffer *eb) > @@ -4222,16 +4495,153 @@ static void mark_extent_buffer_accessed(struct extent_buffer *eb) > struct extent_buffer *alloc_extent_buffer(struct extent_io_tree *tree, > u64 start, unsigned long len) > { > - unsigned long num_pages = num_extent_pages(start, len); > - unsigned long i; > - unsigned long index = start >> PAGE_CACHE_SHIFT; > + /* Allocate a new extent_buffer depending on blocksize*/ > + if (len < PAGE_CACHE_SIZE) > + return alloc_extent_buffer_multiple(tree, start, len); > + return alloc_extent_buffer_single(tree, start, len); > +} > + > +struct extent_buffer *alloc_extent_buffer_multiple(struct extent_io_tree *tree, > + u64 start, > + unsigned long len) { > + > + struct address_space *mapping; > + u32 blocksize_bits; > + struct btrfs_inode *btrfs_inode; > + struct extent_buffer *eb_cur; > + struct extent_buffer *eb_head; > + struct extent_buffer *exists; > + unsigned long index; > + struct page *page; > + int ret; > + > + /* Initialize variables. */ > + btrfs_inode = BTRFS_I(tree->mapping->host); > + blocksize_bits = btrfs_inode->vfs_inode.i_sb->s_blocksize_bits; > + > + /* Sanity checks. 
> +        WARN_ON(num_extent_pages(start, len) > 1);
> +
> +        /* See if the extent_buffer already exists in the radix tree. */
> +        rcu_read_lock();
> +        eb_cur = radix_tree_lookup(&tree->buffer, start >> blocksize_bits);
> +        if (eb_cur && atomic_inc_not_zero(&eb_cur->refs)) {
> +                rcu_read_unlock();
> +                mark_extent_buffer_accessed(eb_cur);
> +                return eb_cur;
> +        }
> +        rcu_read_unlock();
> +
> +        /* Find the page in the mapping. */
> +        index = start >> PAGE_CACHE_SHIFT;
> +        mapping = tree->mapping;
> +        page = find_or_create_page(mapping, index, GFP_NOFS);
> +        if (!page) {
> +                WARN_ON(1);
> +                return NULL;
> +        }
> +
> +        /* Allocate each extent buffer for the page. */
> +        eb_head = __alloc_extent_buffers(tree, page, GFP_NOFS);
> +        if (!eb_head) {
> +                WARN_ON(1);
> +                return NULL;
> +        }
> +
> +        /* See if extent buffers have already been allocated for
> +         * this page. */
> +        spin_lock(&mapping->private_lock);
> +        if (PagePrivate(page)) {
> +                /*
> +                 * We could have already allocated an eb for this page
> +                 * and attached one so lets see if we can get a ref on
> +                 * the existing eb, and if we can we know it's good and
> +                 * we can just return that one, else we know we can just
> +                 * overwrite page->private.
> +                 */
> +                eb_cur = (struct extent_buffer *)page->private;
> +                while (eb_cur->start != start) {
> +                        eb_cur = eb_cur->next;
> +                        BUG_ON(!eb_cur);
> +                }
> +                check_buffer_tree_ref(eb_cur);
> +                spin_unlock(&mapping->private_lock);
> +                unlock_page(page);
> +                mark_extent_buffer_accessed(eb_cur);
> +                __free_extent_buffers(eb_head);
> +                return eb_cur;
> +        }
> +
> +        /* Bind the extent buffer to the page. */
> +        attach_extent_buffer_page(eb_head, page);
> +        spin_unlock(&mapping->private_lock);
> +        WARN_ON(PageDirty(page));
> +        mark_page_accessed(page);
> +
> +again:
> +        /* Set eb_cur to the buffer added. */
> +        eb_cur = eb_head;
> +        while (start != eb_cur->start) {
> +                eb_cur = eb_cur->next;
> +                BUG_ON(!eb_cur);
> +        }
> +
> +        /* Preload the radix tree. */
> +        ret = radix_tree_preload(GFP_NOFS & ~__GFP_HIGHMEM);
> +        if (ret) {
> +                WARN_ON(1);
> +                return NULL;
> +        }
> +
> +        /* Add the extent buffer to the radix tree. */
> +        spin_lock(&tree->buffer_lock);
> +        ret = radix_tree_insert(&tree->buffer,
> +                                eb_cur->start >> blocksize_bits,
> +                                eb_cur);
> +        if (ret == -EEXIST) {
> +                exists = radix_tree_lookup(&tree->buffer,
> +                                           eb_cur->start >> blocksize_bits);
> +                if (exists->start != start)
> +                        BUG_ON(1);
> +                if (!atomic_inc_not_zero(&exists->refs)) {
> +                        spin_unlock(&tree->buffer_lock);
> +                        radix_tree_preload_end();
> +                        exists = NULL;
> +                        goto again;
> +                }
> +                spin_unlock(&tree->buffer_lock);
> +                radix_tree_preload_end();
> +                mark_extent_buffer_accessed(exists);
> +                WARN_ON(!atomic_dec_and_test(&eb_cur->refs));
> +                btrfs_release_extent_buffer(eb_cur);
> +                return exists;
> +        }
> +
> +        /* Set the extent buffer's tree-reference bits. */
> +        check_buffer_tree_ref(eb_cur);
> +        spin_unlock(&tree->buffer_lock);
> +        radix_tree_preload_end();
> +
> +        /* Not quite sure what this does. */
> +        SetPageChecked(eb_head->pages[0]);
> +        unlock_page(eb_head->pages[0]);
> +
> +        return eb_cur;
> +}
> +
> +struct extent_buffer *alloc_extent_buffer_single(struct extent_io_tree *tree,
> +                                                 u64 start, unsigned long len) {
> +        struct address_space *mapping = tree->mapping;
>          struct extent_buffer *eb;
>          struct extent_buffer *exists = NULL;
> +        unsigned long i;
> +        unsigned long index = start >> PAGE_CACHE_SHIFT;
> +        unsigned long num_pages = num_extent_pages(start, len);
>          struct page *p;
> -        struct address_space *mapping = tree->mapping;
>          int uptodate = 1;
>          int ret;
>  
> +        /* See if the extent_buffer already exists. */
>          rcu_read_lock();
>          eb = radix_tree_lookup(&tree->buffer, start >> PAGE_CACHE_SHIFT);
>          if (eb && atomic_inc_not_zero(&eb->refs)) {
> @@ -4350,9 +4760,17 @@ struct extent_buffer *find_extent_buffer(struct extent_io_tree *tree,
>                          u64 start, unsigned long len)
>  {
>          struct extent_buffer *eb;
> +        struct btrfs_inode *btrfs_inode = BTRFS_I(tree->mapping->host);
> +        u32 blocksize_bits = btrfs_inode->vfs_inode.i_sb->s_blocksize_bits;
>  
>          rcu_read_lock();
> -        eb = radix_tree_lookup(&tree->buffer, start >> PAGE_CACHE_SHIFT);
> +        // This branch needs to be fixed when the allocation code is merged.
> +        // Seriously.
> +        if (blocksize_bits >= PAGE_CACHE_SHIFT)
> +                eb = radix_tree_lookup(&tree->buffer,
> +                                       start >> PAGE_CACHE_SHIFT);
> +        else
> +                eb = radix_tree_lookup(&tree->buffer, start >> blocksize_bits);
>          if (eb && atomic_inc_not_zero(&eb->refs)) {
>                  rcu_read_unlock();
>                  mark_extent_buffer_accessed(eb);
> @@ -4371,9 +4789,25 @@ static inline void btrfs_release_extent_buffer_rcu(struct rcu_head *head)
>          __free_extent_buffer(eb);
>  }
>  
> -/* Expects to have eb->eb_lock already held */
> +/*
> + * The RCU head must point to the first extent buffer belonging to a page.
> + */
> +static inline void btrfs_release_extent_buffers_rcu(struct rcu_head *head)
> +{
> +        struct extent_buffer *eb =
> +                container_of(head, struct extent_buffer, rcu_head);
> +
> +        do {
> +                call_rcu(&eb->rcu_head, btrfs_release_extent_buffer_rcu);
> +        } while ((eb = eb->next));
> +}
> +
> +/* Expects to have eb->refs_lock already held */
>  static int release_extent_buffer(struct extent_buffer *eb, gfp_t mask)
>  {
> +        struct btrfs_inode *btrfs_inode = BTRFS_I(eb->tree->mapping->host);
> +        u32 blocksize_bits = btrfs_inode->vfs_inode.i_sb->s_blocksize_bits;
> +
>          WARN_ON(atomic_read(&eb->refs) == 0);
>          if (atomic_dec_and_test(&eb->refs)) {
>                  if (test_bit(EXTENT_BUFFER_DUMMY, &eb->bflags)) {
> @@ -4381,17 +4815,35 @@ static int release_extent_buffer(struct extent_buffer *eb, gfp_t mask)
>                  } else {
>                          struct extent_io_tree *tree = eb->tree;
>  
> +                        /* Dumb hack to make releasing the page easier. */
> +                        if (eb->len < PAGE_SIZE)
> +                                atomic_inc(&eb->refs);
> +
>                          spin_unlock(&eb->refs_lock);
>  
> +                        // This also needs to be fixed when allocation code is
> +                        // merged.
>                          spin_lock(&tree->buffer_lock);
> -                        radix_tree_delete(&tree->buffer,
> -                                          eb->start >> PAGE_CACHE_SHIFT);
> +                        if (eb->len >= PAGE_SIZE)
> +                                radix_tree_delete(&tree->buffer,
> +                                                  eb->start >> blocksize_bits);
> +                        else
> +                                radix_tree_delete(&tree->buffer,
> +                                                  eb->start >> blocksize_bits);
>                          spin_unlock(&tree->buffer_lock);
>                  }
>  
>                  /* Should be safe to release our pages at this point */
> -                btrfs_release_extent_buffer_page(eb, 0);
> -                call_rcu(&eb->rcu_head, btrfs_release_extent_buffer_rcu);
> +                if (eb->len >= PAGE_SIZE) {
> +                        btrfs_release_extent_buffer_page(eb, 0);
> +                        call_rcu(&eb->rcu_head,
> +                                 btrfs_release_extent_buffer_rcu);
> +                } else {
> +                        struct extent_buffer *eb_head;
> +
> +                        if (btrfs_release_extent_buffers_page(eb, &eb_head))
> +                                btrfs_release_extent_buffers_rcu(
> +                                        &eb_head->rcu_head);
> +                }
>                  return 1;
>          }
>          spin_unlock(&eb->refs_lock);
> @@ -4482,6 +4934,11 @@ int set_extent_buffer_dirty(struct extent_buffer *eb)
>  
>          for (i = 0; i < num_pages; i++)
>                  set_page_dirty(extent_buffer_page(eb, i));
> +        /* Run an additional sanity check here instead of
> +         * in btree_set_page_dirty() since we can't get the eb there for
> +         * subpage blocksize. */
> +        if (eb->len < PAGE_SIZE)
> +                btrfs_assert_tree_locked(eb);
>          return was_dirty;
>  }
>  
> @@ -4503,11 +4960,14 @@ int clear_extent_buffer_uptodate(struct extent_buffer *eb)
>          unsigned long num_pages;
>  
>          clear_bit(EXTENT_BUFFER_UPTODATE, &eb->bflags);
> -        num_pages = num_extent_pages(eb->start, eb->len);
> -        for (i = 0; i < num_pages; i++) {
> -                page = extent_buffer_page(eb, i);
> -                if (page)
> -                        ClearPageUptodate(page);
> +        /* Ignore the page's uptodate flag for subpage blocksize. */
> +        if (eb->len >= PAGE_SIZE) {
> +                num_pages = num_extent_pages(eb->start, eb->len);
> +                for (i = 0; i < num_pages; i++) {
> +                        page = extent_buffer_page(eb, i);
> +                        if (page)
> +                                ClearPageUptodate(page);
> +                }
>          }
>          return 0;
>  }
> @@ -4518,11 +4978,16 @@ int set_extent_buffer_uptodate(struct extent_buffer *eb)
>          struct page *page;
>          unsigned long num_pages;
>  
> +        /* Set extent buffer up-to-date. */
>          set_bit(EXTENT_BUFFER_UPTODATE, &eb->bflags);
> -        num_pages = num_extent_pages(eb->start, eb->len);
> -        for (i = 0; i < num_pages; i++) {
> -                page = extent_buffer_page(eb, i);
> -                SetPageUptodate(page);
> +
> +        /* Set pages up-to-date. */
> +        if (eb->len >= PAGE_CACHE_SIZE) {
> +                num_pages = num_extent_pages(eb->start, eb->len);
> +                for (i = 0; i < num_pages; i++) {
> +                        page = extent_buffer_page(eb, i);
> +                        SetPageUptodate(page);
> +                }
>          }
>          return 0;
>  }
> @@ -4606,7 +5071,7 @@ int read_extent_buffer_pages(struct extent_io_tree *tree,
>                  }
>          }
>          if (all_uptodate) {
> -                if (start_i == 0)
> +                if (start_i == 0 && eb->len >= PAGE_SIZE)
>                          set_bit(EXTENT_BUFFER_UPTODATE, &eb->bflags);
>                  goto unlock_exit;
>          }
> @@ -4693,7 +5158,7 @@ int map_private_extent_buffer(struct extent_buffer *eb, unsigned long start,
>                                unsigned long *map_start,
>                                unsigned long *map_len)
>  {
> -        size_t offset = start & (PAGE_CACHE_SIZE - 1);
> +        size_t offset;
>          char *kaddr;
>          struct page *p;
>          size_t start_offset = eb->start & ((u64)PAGE_CACHE_SIZE - 1);
> @@ -4709,6 +5174,9 @@ int map_private_extent_buffer(struct extent_buffer *eb, unsigned long start,
>                  *map_start = 0;
>          } else {
>                  offset = 0;
> +                // I'm pretty sure that this is a) just plain wrong and
> +                // b) will never realistically execute; not entirely sure,
> +                // though...
>                  *map_start = ((u64)i << PAGE_CACHE_SHIFT) - start_offset;
>          }
>  
> @@ -4722,7 +5190,7 @@ int map_private_extent_buffer(struct extent_buffer *eb, unsigned long start,
>          p = extent_buffer_page(eb, i);
>          kaddr = page_address(p);
>          *map = kaddr + offset;
> -        *map_len = PAGE_CACHE_SIZE - offset;
> +        *map_len = (PAGE_CACHE_SIZE - offset) & (eb->len - 1);
>          return 0;
>  }
>  
> @@ -4996,6 +5464,7 @@ void memmove_extent_buffer(struct extent_buffer *dst, unsigned long dst_offset,
>  int try_release_extent_buffer(struct page *page, gfp_t mask)
>  {
>          struct extent_buffer *eb;
> +        int ret;
>  
>          /*
>           * We need to make sure nobody is attaching this page to an eb right
> @@ -5010,30 +5479,61 @@ int try_release_extent_buffer(struct page *page, gfp_t mask)
>          eb = (struct extent_buffer *)page->private;
>          BUG_ON(!eb);
>  
> -        /*
> -         * This is a little awful but should be ok, we need to make sure that
> -         * the eb doesn't disappear out from under us while we're looking at
> -         * this page.
> -         */
> -        spin_lock(&eb->refs_lock);
> -        if (atomic_read(&eb->refs) != 1 || extent_buffer_under_io(eb)) {
> -                spin_unlock(&eb->refs_lock);
> +        if (eb->len >= PAGE_SIZE) {
> +                /*
> +                 * This is a little awful but should be ok, we need to make
> +                 * sure that the eb doesn't disappear out from under us while
> +                 * we're looking at this page.
> +                 */
> +                spin_lock(&eb->refs_lock);
> +                if (atomic_read(&eb->refs) != 1 || extent_buffer_under_io(eb)) {
> +                        spin_unlock(&eb->refs_lock);
> +                        spin_unlock(&page->mapping->private_lock);
> +                        return 0;
> +                }
>                  spin_unlock(&page->mapping->private_lock);
> -                return 0;
> -        }
> -        spin_unlock(&page->mapping->private_lock);
>  
> -        if ((mask & GFP_NOFS) == GFP_NOFS)
> -                mask = GFP_NOFS;
> +                if ((mask & GFP_NOFS) == GFP_NOFS)
> +                        mask = GFP_NOFS;
>  
> -        /*
> -         * If tree ref isn't set then we know the ref on this eb is a real ref,
> -         * so just return, this page will likely be freed soon anyway.
> -         */
> -        if (!test_and_clear_bit(EXTENT_BUFFER_TREE_REF, &eb->bflags)) {
> -                spin_unlock(&eb->refs_lock);
> -                return 0;
> -        }
> +                /*
> +                 * If tree ref isn't set then we know the ref on this eb is a
> +                 * real ref, so just return, this page will likely be freed
> +                 * soon anyway.
> +                 */
> +                if (!test_and_clear_bit(EXTENT_BUFFER_TREE_REF, &eb->bflags)) {
> +                        spin_unlock(&eb->refs_lock);
> +                        return 0;
> +                }
>  
> -        return release_extent_buffer(eb, mask);
> +                return release_extent_buffer(eb, mask);
> +        } else {
> +                ret = 0;
> +                do {
> +                        spin_lock(&eb->refs_lock);
> +                        if (atomic_read(&eb->refs) != 1 ||
> +                            extent_buffer_under_io(eb)) {
> +                                spin_unlock(&eb->refs_lock);
> +                                continue;
> +                        }
> +                        spin_unlock(&page->mapping->private_lock);
> +
> +                        if ((mask & GFP_NOFS) == GFP_NOFS)
> +                                mask = GFP_NOFS;
> +
> +                        if (!test_and_clear_bit(EXTENT_BUFFER_TREE_REF,
> +                                                &eb->bflags)) {
> +                                spin_unlock(&eb->refs_lock);
> +                                spin_lock(&page->mapping->private_lock);
> +                                continue;
> +                        }
> +
> +                        /* No idea what to do with the 'ret' here. */
> +                        ret |= release_extent_buffer(eb, mask);
> +
> +                        spin_lock(&page->mapping->private_lock);
> +                } while ((eb = eb->next) != NULL);
> +                spin_unlock(&page->mapping->private_lock);
> +                return ret;
> +        }
>  }
> diff --git a/fs/btrfs/extent_io.h b/fs/btrfs/extent_io.h
> index 2eacfab..955ef5e 100644
> --- a/fs/btrfs/extent_io.h
> +++ b/fs/btrfs/extent_io.h
> @@ -163,6 +163,9 @@ struct extent_buffer {
>          wait_queue_head_t lock_wq;
>          struct page *inline_pages[INLINE_EXTENT_BUFFER_PAGES];
>          struct page **pages;
> +
> +        /* Acyclic linked list of extent_buffers belonging to a single page. */
> +        struct extent_buffer *next;
>  };
>  
>  static inline void extent_set_compress_type(unsigned long *bio_flags,
> @@ -270,6 +273,10 @@ void set_page_extent_mapped(struct page *page);
>  
>  struct extent_buffer *alloc_extent_buffer(struct extent_io_tree *tree,
>                                            u64 start, unsigned long len);
> +struct extent_buffer *alloc_extent_buffer_single(struct extent_io_tree *tree,
> +                                                 u64 start, unsigned long len);
> +struct extent_buffer *alloc_extent_buffer_multiple(struct extent_io_tree *tree,
> +                                                   u64 start, unsigned long len);
>  struct extent_buffer *alloc_dummy_extent_buffer(u64 start, unsigned long len);
>  struct extent_buffer *btrfs_clone_extent_buffer(struct extent_buffer *src);
>  struct extent_buffer *find_extent_buffer(struct extent_io_tree *tree,
> diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c
> index 3bff4d4..8745289 100644
> --- a/fs/btrfs/file.c
> +++ b/fs/btrfs/file.c
> @@ -1340,7 +1340,7 @@ static noinline ssize_t __btrfs_buffered_write(struct file *file,
>                  }
>  
>                  ret = btrfs_delalloc_reserve_space(inode,
> -                                        num_pages << PAGE_CACHE_SHIFT);
> +                                        write_bytes);
>                  if (ret)
>                          break;
>  
> @@ -1354,7 +1354,7 @@ static noinline ssize_t __btrfs_buffered_write(struct file *file,
>                                            force_page_uptodate);
>                  if (ret) {
>                          btrfs_delalloc_release_space(inode,
> -                                        num_pages << PAGE_CACHE_SHIFT);
> +                                        write_bytes);
>                          break;
>                  }
>  
> @@ -1392,8 +1392,7 @@ static noinline ssize_t __btrfs_buffered_write(struct file *file,
>                          spin_unlock(&BTRFS_I(inode)->lock);
>                  }
>                  btrfs_delalloc_release_space(inode,
> -                                        (num_pages - dirty_pages) <<
> -                                        PAGE_CACHE_SHIFT);
> +                                        write_bytes - copied);
>          }
>  
>          if (copied > 0) {
> @@ -1402,7 +1401,7 @@ static noinline ssize_t __btrfs_buffered_write(struct file *file,
>                                          NULL);
>                  if (ret) {
>                          btrfs_delalloc_release_space(inode,
> -                                        dirty_pages << PAGE_CACHE_SHIFT);
> +                                        copied);
>                          btrfs_drop_pages(pages, num_pages);
>                          break;
>                  }
> diff --git a/fs/btrfs/free-space-cache.c b/fs/btrfs/free-space-cache.c
> index 59ea2e4..1c0e254 100644
> --- a/fs/btrfs/free-space-cache.c
> +++ b/fs/btrfs/free-space-cache.c
> @@ -960,6 +960,8 @@ int __btrfs_write_out_cache(struct btrfs_root *root, struct inode *inode,
>  
>          if (block_group)
>                  start = block_group->key.objectid;
> +        else    // Hmm, I don't recall putting this here.
> +                start = (u64)-1;
>  
>          while (block_group && (start < block_group->key.objectid +
>                                 block_group->key.offset)) {
> diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
> index 3368c10..11ff3dd 100644
> --- a/fs/btrfs/inode.c
> +++ b/fs/btrfs/inode.c
> @@ -2040,22 +2040,38 @@ static int btrfs_writepage_end_io_hook(struct page *page, u64 start, u64 end,
>          struct btrfs_root *root = BTRFS_I(inode)->root;
>          struct btrfs_ordered_extent *ordered_extent = NULL;
>          struct btrfs_workers *workers;
> +        u64 block_size = 1 << inode->i_blkbits;
> +        u64 io_size;
> +
> +        if (block_size >= PAGE_CACHE_SIZE)
> +                io_size = end - start + 1;
> +        else
> +                io_size = block_size;
>  
>          trace_btrfs_writepage_end_io_hook(page, start, end, uptodate);
>  
>          ClearPagePrivate2(page);
> -        if (!btrfs_dec_test_ordered_pending(inode, &ordered_extent, start,
> -                                            end - start + 1, uptodate))
> -                return 0;
> -
> -        ordered_extent->work.func = finish_ordered_fn;
> -        ordered_extent->work.flags = 0;
> -
> -        if (btrfs_is_free_space_inode(inode))
> -                workers = &root->fs_info->endio_freespace_worker;
> -        else
> -                workers = &root->fs_info->endio_write_workers;
> -        btrfs_queue_worker(workers, &ordered_extent->work);
> +next_block:
> +        if (btrfs_dec_test_ordered_pending(inode, &ordered_extent, start,
> +                                           io_size, uptodate)) {
> +                ordered_extent->work.func = finish_ordered_fn;
> +                ordered_extent->work.flags = 0;
> +
> +                if (btrfs_is_free_space_inode(inode))
> +                        workers = &root->fs_info->endio_freespace_worker;
> +                else
> +                        workers = &root->fs_info->endio_write_workers;
> +                btrfs_queue_worker(workers, &ordered_extent->work);
> +        }
> +
> +        // I think that writes are always block-size granularity.
> +        if (block_size < PAGE_CACHE_SIZE)
> +                BUG_ON(start & (io_size - 1));  // Welp, one way to make sure...
> +        start += io_size;
> +        if (start < end)
> +                goto next_block;
> +        // We overshot. I'm pretty sure that this is terrible.
> +        BUG_ON(start != (end + 1));
>  
>          return 0;
>  }
> diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
> index 657d83c..c0269df 100644
> --- a/fs/btrfs/ioctl.c
> +++ b/fs/btrfs/ioctl.c
> @@ -3937,8 +3937,8 @@ long btrfs_ioctl(struct file *file, unsigned int
>                  return btrfs_ioctl_qgroup_create(file, argp);
>          case BTRFS_IOC_QGROUP_LIMIT:
>                  return btrfs_ioctl_qgroup_limit(file, argp);
> -        case BTRFS_IOC_DEV_REPLACE:
> -                return btrfs_ioctl_dev_replace(root, argp);
> +        //case BTRFS_IOC_DEV_REPLACE:
> +        //        return btrfs_ioctl_dev_replace(root, argp);
>          }
>  
>          return -ENOTTY;
> --
> 1.7.1
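
The heart of the allocation changes quoted above is the per-page chain: when the block size is smaller than the page size, each page carries one extent_buffer per block, linked through the new ->next field, and lookups walk the chain until ->start matches. Below is a minimal userspace model of that scheme; the struct, sizes, and names (model_eb, alloc_chain, find_eb) are simplified stand-ins for illustration, not the kernel code, and the 64K-page/4K-block combination is just one plausible configuration.

#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

#define MODEL_PAGE_SIZE  65536UL  /* e.g. a 64K PowerPC page */
#define MODEL_BLOCKSIZE  4096UL   /* e.g. a 4K btrfs leafsize */

/* Toy extent_buffer: only the fields the chain walk needs. */
struct model_eb {
        uint64_t start;           /* logical start of this block */
        unsigned long len;        /* block length */
        struct model_eb *next;    /* next buffer sharing the same page */
};

/* Rough model of __alloc_extent_buffers(): one eb per block in a page. */
static struct model_eb *alloc_chain(uint64_t page_start)
{
        struct model_eb *head = NULL, **link = &head;
        unsigned long i;

        for (i = 0; i < MODEL_PAGE_SIZE / MODEL_BLOCKSIZE; i++) {
                struct model_eb *eb = calloc(1, sizeof(*eb));

                if (!eb)
                        exit(EXIT_FAILURE);
                eb->start = page_start + i * MODEL_BLOCKSIZE;
                eb->len = MODEL_BLOCKSIZE;
                *link = eb;
                link = &eb->next;
        }
        return head;
}

/* Rough model of the "walk until eb->start matches" lookups in the patch. */
static struct model_eb *find_eb(struct model_eb *eb, uint64_t start)
{
        while (eb && eb->start != start)
                eb = eb->next;
        return eb;
}

int main(void)
{
        uint64_t page_start = 1024 * 1024;
        struct model_eb *head = alloc_chain(page_start);
        struct model_eb *eb = find_eb(head, page_start + 3 * MODEL_BLOCKSIZE);

        printf("%lu buffers per page; block 3 starts at %llu\n",
               MODEL_PAGE_SIZE / MODEL_BLOCKSIZE,
               (unsigned long long)eb->start);

        /* Mirror __free_extent_buffers(): free the whole chain. */
        while (head) {
                struct model_eb *next = head->next;
                free(head);
                head = next;
        }
        return 0;
}

With these values the chain holds sixteen buffers; the same walk is what the patched checksum and writeback paths do to step from a page to the block that owns a particular I/O range.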
On Tue, 18 Dec 2012 15:30:51 +0800, Liu Bo wrote:
> On Mon, Dec 17, 2012 at 11:13:25PM -0800, clinew@linux.vnet.ibm.com wrote:
>> [patch description and diffstat trimmed]
>
> Could you please first elaborate why we need this subpagesize stuff and
> any user case in this patch's commit log?
> Or am I missing something?

It is used on the machines on which the page size is larger than 4KB
(such as PowerPC).

Thanks
Miao

>
> thanks,
> liubo
>
>> [full quoted patch trimmed]
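
Miao's answer is also why the patch re-keys the extent buffer radix tree: with several blocks per page, indexing tree->buffer by start >> PAGE_CACHE_SHIFT would make neighbouring tree blocks collide, so the subpage path indexes by start >> blocksize_bits instead. A standalone sketch of the arithmetic, with illustrative shift values (64K pages, 4K blocks), not values taken from the thread:

#include <stdint.h>
#include <stdio.h>

#define MODEL_PAGE_SHIFT  16  /* 64K pages */
#define MODEL_BLOCK_BITS  12  /* 4K blocks */

int main(void)
{
        uint64_t page_start = 16ULL << MODEL_PAGE_SHIFT;
        int i;

        /* Four neighbouring blocks inside one page. */
        for (i = 0; i < 4; i++) {
                uint64_t start = page_start +
                                 ((uint64_t)i << MODEL_BLOCK_BITS);

                printf("block %d: page key = %llu, block key = %llu\n", i,
                       (unsigned long long)(start >> MODEL_PAGE_SHIFT),
                       (unsigned long long)(start >> MODEL_BLOCK_BITS));
        }
        return 0;
}

All four blocks map to the same page key but to distinct block keys, which is what lets each subpage block own its own radix-tree slot.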
On 12/18/2012 12:49 AM, Miao Xie wrote:
> On tue, 18 Dec 2012 15:30:51 +0800, Liu Bo wrote:
>> On Mon, Dec 17, 2012 at 11:13:25PM -0800, clinew@linux.vnet.ibm.com wrote:
>>> [original patch description snipped]
>>
>> Could you please first elaborate why we need this subpagesize stuff and
>> any user case in this patch's commit log?
>> Or Am I missing something?
>
> It is used on the machines on which the page size is larger than 4KB (Such as powerpc)
>
> Thanks
> Miao

Yeah. Basically, if we create a btrfs filesystem with a 4k blocksize
then that filesystem is incompatible with architectures such as PowerPC
and MIPS, which have a page size larger than 4k.

-Wade
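To make that concrete: on ppc64 a single 64k page cache page covers sixteen 4k metadata blocks, which is why the patch threads all the extent_buffers sharing one page through the new next field and why csum_dirty_buffer() walks that chain to find the block an IO belongs to. Here is a toy userspace model of the layout and the lookup; the struct fields mirror the diff, and everything else is invented for illustration:

#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define PAGE_SIZE 65536ULL
#define BLOCKSIZE 4096ULL

/* Toy model: just the fields the subpage patch relies on. */
struct extent_buffer {
	uint64_t start;             /* logical start of the block */
	uint64_t len;               /* blocksize, 4k here */
	struct extent_buffer *next; /* acyclic list, one per page */
};

/* Build the per-page list: sixteen 4k buffers sharing one 64k page. */
static struct extent_buffer *build_page_ebs(uint64_t page_start)
{
	struct extent_buffer *head = NULL, **tail = &head;
	uint64_t off;

	for (off = 0; off < PAGE_SIZE; off += BLOCKSIZE) {
		struct extent_buffer *eb = calloc(1, sizeof(*eb));

		assert(eb);
		eb->start = page_start + off;
		eb->len = BLOCKSIZE;
		*tail = eb;
		tail = &eb->next;
	}
	return head;
}

/* The same walk csum_dirty_buffer() does to locate the buffer
 * at 'start': step through the chain until the start matches. */
static struct extent_buffer *find_eb(struct extent_buffer *eb, uint64_t start)
{
	while (eb->start != start) {
		eb = eb->next;
		assert(eb); /* BUG_ON(!eb) in the patch */
	}
	return eb;
}

int main(void)
{
	struct extent_buffer *head = build_page_ebs(1 << 20);
	struct extent_buffer *eb = find_eb(head, (1 << 20) + 3 * BLOCKSIZE);

	assert(eb->start == (1 << 20) + 3 * BLOCKSIZE);

	while (head) { /* tear down the toy list */
		struct extent_buffer *tmp = head->next;
		free(head);
		head = tmp;
	}
	return 0;
}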
Chris Samuel
2012-Dec-18 23:01 UTC
Re: [PATCH] [RFC v2] Btrfs: Subpagesize blocksize (WIP).
On 19/12/12 09:26, Wade Cline wrote:
> Yeah. Basically, if we create a btrfs filesystem with a 4k blocksize
> then that filesystem is incompatible with architectures such as PowerPC
> and MIPS, which have a page size larger than 4k.

What happens currently? Does the btrfs code detect the mismatch and
refuse to mount, or does it all go horribly wrong?

cheers,
Chris
--
Chris Samuel : http://www.csamuel.org/ : Melbourne, VIC
On 12/18/2012 03:01 PM, Chris Samuel wrote:
> On 19/12/12 09:26, Wade Cline wrote:
>
>> Yeah. Basically, if we create a btrfs filesystem with a 4k blocksize
>> then that filesystem is incompatible with architectures such as PowerPC
>> and MIPS, which have a page size larger than 4k.
>
> What happens currently? Does the btrfs code detect the mismatch and
> refuse to mount, or does it all go horribly wrong?
>
> cheers,
> Chris

I recall hacking the mkfs.btrfs tool, testing it, and finding that the
filesystem wouldn't mount. I haven't created a non-hacked filesystem on
x86 and ported it to PPC verbatim yet, but the same should happen there;
it shouldn't crash the kernel.

-Wade
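For context, the clean failure Wade describes is what a superblock sanity check at mount time would produce: if the sector size recorded by mkfs.btrfs doesn't match what the running kernel supports, the mount can be rejected before any real IO is attempted. The sketch below is a rough userspace approximation of such a check; the struct, field name, and message are invented, and the actual kernel-side logic may differ:

#include <stdint.h>
#include <stdio.h>
#include <unistd.h>

/* Illustrative stand-in for the on-disk superblock field. */
struct fake_super {
	uint32_t sectorsize; /* blocksize chosen at mkfs time */
};

static int check_mountable(const struct fake_super *sb)
{
	long page_size = sysconf(_SC_PAGESIZE);

	/* Without subpage support, btrfs of this era can only use
	 * filesystems whose blocksize equals the page size. */
	if (sb->sectorsize != (uint32_t)page_size) {
		fprintf(stderr,
			"refusing mount: sectorsize %u != page size %ld\n",
			(unsigned)sb->sectorsize, page_size);
		return -1;
	}
	return 0;
}

int main(void)
{
	struct fake_super sb = { .sectorsize = 4096 };

	/* On a 4k-page x86 box this mounts; on a 64k-page ppc64 box
	 * the same image is rejected cleanly instead of crashing. */
	return check_mountable(&sb) ? 1 : 0;
}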
On Tue, Dec 18, 2012 at 02:26:50PM -0800, Wade Cline wrote:
> On 12/18/2012 12:49 AM, Miao Xie wrote:
>
> > On tue, 18 Dec 2012 15:30:51 +0800, Liu Bo wrote:
> >> On Mon, Dec 17, 2012 at 11:13:25PM -0800, clinew@linux.vnet.ibm.com wrote:
> >>> [original patch description snipped]
> >>
> >> Could you please first elaborate why we need this subpagesize stuff and
> >> any user case in this patch's commit log?
> >> Or Am I missing something?
> >
> > It is used on the machines on which the page size is larger than 4KB (Such as powerpc)
> >
> > Thanks
> > Miao
>
> Yeah. Basically, if we create a btrfs filesystem with a 4k blocksize
> then that filesystem is incompatible with architectures such as PowerPC
> and MIPS, which have a page size larger than 4k.
>
> -Wade
>

I'm just saying there _should_ be some kind of description like that
about the patch in your commit log...

That's for those who don't know the background of the idea.

thanks,
liubo
On 12/18/2012 06:02 PM, Liu Bo wrote:
> On Tue, Dec 18, 2012 at 02:26:50PM -0800, Wade Cline wrote:
>> On 12/18/2012 12:49 AM, Miao Xie wrote:
>>
>>> [earlier discussion snipped]
>>
>> Yeah. Basically, if we create a btrfs filesystem with a 4k blocksize
>> then that filesystem is incompatible with architectures such as PowerPC
>> and MIPS, which have a page size larger than 4k.
>>
>> -Wade
>>
>
> I'm just saying there _should_ be some kind of description like that
> about the patch in your commit log...
>
> That's for those who don't know the background of the idea.
>
> thanks,
> liubo

Okay, I'll make sure to add that description next time I send the
patch out.

Thanks,
Wade