Chris Mason
2008-Jan-21 16:47 UTC
[Btrfs-devel] Initial Planning document for multiple device support
Hello everyone, I've spent some time outlining the support for multiple devices, and here is my basic plan so far. Any comments are welcome: http://oss.oracle.com/projects/btrfs/dist/documentation/btrfs-volumes.html -chris
sftf
2008-Jan-22 00:56 UTC
[Btrfs-devel] Initial Planning document for multiple device support
CM> http://oss.oracle.com/projects/btrfs/dist/documentation/btrfs-volumes.html

An Allocation Record is similar to LVM's physical extent (PE).
A Storage Chunk is similar to LVM's logical extent (LE).
Storage Chunk1 + Storage Chunk2 + Storage Chunk3 + ... is similar to LVM's volume group (VG).

And the filesystem uses the aggregation of Storage Chunks directly to store its metadata and data. Am I right?
Chris Mason
2008-Jan-23 04:52 UTC
[Btrfs-devel] Re: Initial Planning document for multiple device support
On Wednesday 23 January 2008, Andi Kleen wrote:
> Chris Mason <chris.mason@oracle.com> writes:
>
> Just commenting on something that tripped me while reading
> the document.
>
> >If Btrfs were to rely on device mapper or MD for mirroring, it would
> >not be able to resolve checksum failures by checking the mirrored
> >copy. The lower layers don't know the checksum or granularity of the
> >filesystem blocks, and so they are not able to verify the data they
> >return.
>
> I cannot imagine it would be that difficult to add a new READ_OTHER_COPY
> io operation that would cause MD/LVM/... to return the other copy
> in a mirror set.

This is something SGI recently proposed, and it is a very good idea I think. It also makes sense for hooks between MD and the FS to figure out which blocks are in use during a rebuild, and for the FS to tell LVM when blocks are freed to help make snapshots more efficient.

> Even without btrfs that might be even generally useful for other
> applications that do some checking on their files.
>
> e.g. I could well imagine a new system call to trigger this on the
> page cache level.
>
> There might be other reasons to reinvent another storage manager
> of course. Just that one above doesn't seem to be very convincing.
> I admit I haven't thought too deeply about the other issues you
> raise in the document.

The key problem that requires most of this infrastructure is mirroring metadata on a single spindle. Chunks aren't required to solve it, but they do add flexibility to do lots of other things. For example, relocating hot blocks on to the SSD portion of a combined SSD/spindle drive, or writing to the SSD when on battery and then transferring in bulk to the spindle.

The chunk code is basically a storage layer with three or four hooks into the FS. Once I have it working, I'll take a hard look at pushing it down into DM where it can be used for other things.

-chris
Andi Kleen
2008-Jan-30 05:10 UTC
[Btrfs-devel] Re: Initial Planning document for multiple device support
Chris Mason <chris.mason@oracle.com> writes:

Just commenting on something that tripped me while reading the document.

>If Btrfs were to rely on device mapper or MD for mirroring, it would
>not be able to resolve checksum failures by checking the mirrored
>copy. The lower layers don't know the checksum or granularity of the
>filesystem blocks, and so they are not able to verify the data they
>return.

I cannot imagine it would be that difficult to add a new READ_OTHER_COPY io operation that would cause MD/LVM/... to return the other copy in a mirror set.

Even without btrfs that might be even generally useful for other applications that do some checking on their files.

e.g. I could well imagine a new system call to trigger this on the page cache level.

There might be other reasons to reinvent another storage manager of course. Just that one above doesn't seem to be very convincing. I admit I haven't thought too deeply about the other issues you raise in the document.

-Andi
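
The checksum argument discussed in this thread can be illustrated with a rough sketch. It is not Btrfs code and not the proposed READ_OTHER_COPY interface. It only shows the idea that a reader who knows the expected checksum can reject a bad copy and try the other mirror, which a mirroring layer without the checksum cannot do on its own. The names `read_verified` and `ChecksumError`, the dict-based "devices", and the use of `zlib.crc32` as the checksum are all assumptions made for illustration.

```python
import zlib


class ChecksumError(Exception):
    """Raised when no mirror holds a copy matching the expected checksum."""


def read_verified(mirrors, block_no, expected_crc):
    """Return (data, mirror_index) for the first copy whose CRC matches.

    `mirrors` is a list of hypothetical devices, modelled here as dicts
    mapping block number -> bytes. A real filesystem would issue block I/O.
    """
    for index, device in enumerate(mirrors):
        data = device[block_no]
        # Only the layer that knows the checksum can tell good from bad.
        if zlib.crc32(data) == expected_crc:
            return data, index
    raise ChecksumError(f"block {block_no}: no mirror matches the checksum")


# Hypothetical example: the first mirror holds a corrupted copy of block 7.
good = b"btrfs metadata block"
expected = zlib.crc32(good)
primary = {7: b"btrfs metadata blocK"}   # last byte differs from the original
secondary = {7: good}

data, source = read_verified([primary, secondary], 7, expected)
```

In this sketch the loop over `mirrors` plays the role that a "read the other copy" request to MD/LVM would play: the caller, not the mirroring layer, decides that the first copy is unusable.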