I am trying to understand how exactly the file extent back references work in btrfs. Could someone please tell me if the following is correct?

- The back references are accumulated in an in-memory balanced tree (delayed-ref.c and delayed-ref.h) and pushed to disk during the transaction commit (a part of a checkpoint). They are placed into the B-tree under the key (bytenr, BTRFS_EXTENT_REF_KEY, hash of the four fields of the record), so that they are stored next to the file extent forward references.

I am also wondering about the implications of copy on write: Imagine that you have an inode with four file extents and thus also four back references. COW of one of the extents then causes the COW of the inode. The new version of the inode has a different transaction ID, which is also one of the fields of back reference records. This causes the file system to add four new back reference records: one for the modified extent and three for the unmodified ones (since the transaction ID field has to be updated). Does this really happen, or is there some scheme to avoid adding these extra records?

Thank you,
Peter Macko

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
2009/9/6 Peter Macko <pmacko@eecs.harvard.edu>:
> I am trying to understand how exactly the file extent back references work
> in btrfs. Can please someone tell me if the following is correct? - The back
> references are accumulated in an in-memory balanced tree (delayed-ref.c and
> delayed-ref.h) and pushed to disk during the transaction commit (a part of a
> checkpoint). They are placed into the B-tree under the key (bytenr,
> BTRFS_EXTENT_REF_KEY, hash of the four fields of the record), so that they
> are stored next to the file extent forward references.
>

This was correct for btrfs in 2.6.30 and earlier versions. We introduced a new back reference format in 2.6.31. For more information about the new format, please read the comments in extent-tree.c.

> I am also wondering about the implications of copy on write: Imagine that
> you have an inode with four file extents and thus also four back references.
> COW of one of the extents then causes the COW of the inode. The new version
> of the inode has a different transaction ID, which is also one of the fields
> of back reference records. This causes the file system to add four new back
> reference records - one for the modified extent and three for the unmodified
> ones (since the transaction ID field has to be updated). Does this really
> happen, or is there some scheme to avoid adding these extra records?
>

It is avoided by using the new back reference format.

Yan, Zheng
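For readers following the key layout discussed above (the one used in 2.6.30 and earlier), here is a minimal Python sketch of why records keyed by (bytenr, type, hash) cluster by bytenr, and why changing one hashed field yields a separate record. It assumes keys compare as (objectid, type, offset) tuples. The numeric type values are quoted from memory and should be treated as assumptions, `ref_hash` is a stand-in built on CRC32 rather than the kernel's hash, and the four field values are placeholders.

```python
import zlib
from typing import NamedTuple, Tuple

# Key type values as recalled from ctree.h of that era; treat them as assumptions.
BTRFS_EXTENT_ITEM_KEY = 168
BTRFS_EXTENT_REF_KEY = 180


class Key(NamedTuple):
    """A (objectid, type, offset) key; tuple comparison gives the sort order."""
    objectid: int  # for extent tree items this is the extent's bytenr
    type: int
    offset: int


def ref_hash(fields: Tuple[int, int, int, int]) -> int:
    """Stand-in hash of the four back reference fields (NOT the kernel's hash)."""
    data = b"".join(value.to_bytes(8, "little") for value in fields)
    return zlib.crc32(data)


def back_ref_key(bytenr: int, fields: Tuple[int, int, int, int]) -> Key:
    """Build the old-format key (bytenr, BTRFS_EXTENT_REF_KEY, hash of fields)."""
    return Key(bytenr, BTRFS_EXTENT_REF_KEY, ref_hash(fields))


# Two extents and their back references. The second field of each tuple
# plays the role of the transaction ID in this illustration.
extent_a = Key(4096, BTRFS_EXTENT_ITEM_KEY, 8192)
extent_b = Key(1 << 20, BTRFS_EXTENT_ITEM_KEY, 4096)
ref_a_old = back_ref_key(4096, (5, 10, 257, 1))
ref_a_new = back_ref_key(4096, (5, 11, 257, 1))  # same extent, newer transaction
ref_b = back_ref_key(1 << 20, (5, 10, 258, 1))

ordered = sorted([ref_b, extent_b, ref_a_new, extent_a, ref_a_old])
for key in ordered:
    print(key)
```

Because the transaction ID is part of the hashed fields, `ref_a_old` and `ref_a_new` get different keys, which is the extra-record effect the question describes for the old format.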
Thanks! I have a follow-up question: Are back references reference counted? If so, this should mean that after the file system COWs an inode, it must increase the reference counts of its file extent back references. Do we know what the overhead is? If they are not reference counted, how does the system know when to drop the reference?

What are the bookend extents? Is the number of bookend requests in the fourth field of a file extent back reference the number of times the extent occurs within the file?

Thank you,
Peter Macko
On Tue, Sep 08, 2009 at 05:40:07PM -0400, Peter Macko wrote:
> Thanks! I have a follow up question: Are back references reference
> counted? If so, this should mean that after the file system COWs an
> inode, it must increase the reference counts of its file extent back
> references. Do we know what is the overhead? In the case they are
> not reference counted, how does the system know when to drop the
> reference?

The reference counts live in a tree that is maintained via cow but not reference counted.

>
> What are the bookend extents? Is the number of bookend requests in
> the fourth field of a file extent back reference the number of
> times the extent occurs within the file?

Bookends are how we do cow with large extents without needing to read in the entire large extent. Picture a large 128MB extent where you want to overwrite 4K in the middle. What we do is create two pointers to the original extent, and then make a new extent for the new 4K mod. Our pointers end up like this:

[ old extent part 1 ] [ new 4k extent ] [ old extent part 2 ]

A future mod will be to split and modify the old extent when we know there aren't any other reference holders on it. The bookend system assumes that a given extent is in use by multiple snapshots, where we aren't allowed to change the actual extent records because they are in use in other places.

-chris
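The bookend layout described above can be sketched in a few lines of Python. This is only an illustration of the pointer arithmetic, not the kernel's code: the `FileExtent` class, its field names, and `bookend_overwrite` are hypothetical, and the sketch assumes the overwrite falls entirely inside one existing extent.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class FileExtent:
    """Hypothetical file extent pointer (field names are illustrative)."""
    file_offset: int    # where this piece starts within the file
    disk_bytenr: int    # start of the on-disk extent it points into
    extent_offset: int  # offset of this piece inside that on-disk extent
    length: int         # number of bytes this pointer covers


def bookend_overwrite(old: FileExtent, write_offset: int, write_len: int,
                      new_bytenr: int) -> List[FileExtent]:
    """Replace part of `old` with a new extent, keeping the rest as bookends.

    The old on-disk extent is never read or rewritten; only pointers change.
    """
    old_end = old.file_offset + old.length
    write_end = write_offset + write_len
    if write_len <= 0 or write_offset < old.file_offset or write_end > old_end:
        raise ValueError("overwrite must lie inside the existing extent")

    pieces: List[FileExtent] = []
    head_len = write_offset - old.file_offset
    if head_len:
        # [ old extent part 1 ]: same on-disk extent, shortened pointer.
        pieces.append(FileExtent(old.file_offset, old.disk_bytenr,
                                 old.extent_offset, head_len))
    # [ new extent ]: freshly allocated space holding the new data.
    pieces.append(FileExtent(write_offset, new_bytenr, 0, write_len))
    tail_len = old_end - write_end
    if tail_len:
        # [ old extent part 2 ]: same on-disk extent, pointer starts later.
        pieces.append(FileExtent(write_end, old.disk_bytenr,
                                 old.extent_offset + head_len + write_len,
                                 tail_len))
    return pieces


MB = 1024 * 1024
big = FileExtent(file_offset=0, disk_bytenr=1 << 30, extent_offset=0,
                 length=128 * MB)
for piece in bookend_overwrite(big, 64 * MB, 4096, new_bytenr=2 << 30):
    print(piece)
```

In this sketch the 128MB on-disk extent now has two pointers into it from the file, and the 4K range in its middle stays allocated even though the file no longer uses it, which is why splitting the old extent is only possible once no other holder references it.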