Hello everyone,
I''ve finally pushed out the bulk of my work to increase concurrency in
btree operations. It is a large change, and some parts of the FS are
not yet updated. In particular, you''ll get an oops, corruption or
other
problems if you:
* Add/remove devices online (create a multi-device FS at mkfs time works
fine)
* Defrag the FS
* Run btrfs-vol -b
I''ll clean these up tomorrow. In general, the code is fairly stable,
but I expect to be tracking strange oopsen and other corruptions for a
while.
In order to get this stabilized and out the door more quickly, I put a
single mutex around the extent allocation operations. Since this is a
COW FS, the extent allocation tree performance factors in heavily to all
operations, but I still get a big benefit from the current code.
Locking is done in a top down fashion, and most of the heavy lifting is
done inside btrfs_search_slot. The basic algorithm is to:
take a lock on a block at a specific level of the btree
search for a key
balance as required for insertion or deletion
unlock any blocks at levels higher up in the tree
if leaf: return
if node: read lower level
The end result is that searching through the btree usually only has two
blocks locked at a time, (one parent node and one child) and extra steps
are taken to drop locks before doing IO (except when balancing). By the
time that btrfs_search_slot returns, the only lock held is usually on
the leaf corresponding to the search key.
The locks are freed when a path is released (via btrfs_release_path or
btrfs_free_path).
This means a few rules of the FS have changed.
1) It is no longer valid to hold two paths on the same tree root at the
same time. Searching for the second path will probably deadlock against
the first.
2) The only components of the path that can be trusted are those that
are locked. struct btrfs_path has a locks array that indicates which
levels of the path are locked. In some cases such as changing the first
or last key in a block, the code needs to maintain locks on higher
levels of the tree after btrfs_search_slot returns so that higher nodes
can be modified to reflect changes.
All of this is a modified form of Ohad Rodeh''s locking ideas that he
presented along with reference counted snapshot design a few years ago.
I''m just using the page lock right now, which is quick and easy to code
but will need to be replaced with a different locking structure later
on.
I''ll spend the rest of the week writing up some locking docs and
finishing off the bits that are still broken before wandering off on
vacation next week. Happy testing ;)
-chris
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs"
in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html