Hi,

A couple of questions.

1. Does btrfs currently have anything like raid 5 or 6?

2. One guy on my LUG's mailing list is really excited about the
potential for setting redundancy on a per-file basis.
I.e. /home/eric/criticalfile gets mirrored across all of the drives in
the filesystem but /home/eric/temporaryfile gets striped. I'm skeptical.
Is it a good idea to allow people/programs to do this?

Cheers,
Eric
On Sat, 2008-09-06 at 23:43 -0600, Eric Anopolsky wrote:
> Hi,
>
> A couple of questions.
>
> 1. Does btrfs currently have anything like raid 5 or 6?
>

Not yet, it might one day.

> 2. One guy on my LUG's mailing list is really excited about the
> potential for setting redundancy on a per-file basis.
> I.e. /home/eric/criticalfile gets mirrored across all of the drives in
> the filesystem but /home/eric/temporaryfile gets striped. I'm skeptical.
> Is it a good idea to allow people/programs to do this?

In general, yes. Some files or directories are crucial, and some (swap
for example) don't need to survive a crash.

But, I think the flexibility should go a little further. The goal is to
be able to define drive groups and tie files or directory trees to the
drive groups. That way you can say these files go to the fastest drives
and these files go to some other drive type, etc etc.

-chris
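To make the drive-group idea concrete, here is a minimal sketch in Python
of a per-path policy table that ties directory trees to named drive groups
with different redundancy profiles. It is purely illustrative: btrfs exposes
no such interface, and every name here (POLICY, drive_group_for, the group
labels) is invented for the example.

# Hypothetical illustration only; nothing here is an existing btrfs API.
# A policy table maps directory trees to named drive groups, each with
# its own redundancy profile, in the spirit of the proposal above.
POLICY = {
    "/":                   {"group": "slow-striped",  "profile": "raid0"},  # default
    "/home/eric/critical": {"group": "fast-mirrored", "profile": "raid1"},
    "/var/tmp":            {"group": "slow-striped",  "profile": "raid0"},
}

def drive_group_for(path):
    """Return the policy of the longest directory prefix that matches path."""
    best = "/"
    for prefix in POLICY:
        if path == prefix or path.startswith(prefix.rstrip("/") + "/"):
            if len(prefix) > len(best):
                best = prefix
    return POLICY[best]

if __name__ == "__main__":
    print(drive_group_for("/home/eric/critical/report.odt"))  # fast-mirrored, raid1
    print(drive_group_for("/home/eric/scratch.img"))          # slow-striped, raid0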
On Mon, 2008-09-08 at 10:47 -0400, Chris Mason wrote:
> On Sat, 2008-09-06 at 23:43 -0600, Eric Anopolsky wrote:
> > Hi,
> >
> > A couple of questions.
> >
> > 1. Does btrfs currently have anything like raid 5 or 6?
> >
>
> Not yet, it might one day.
>
> > 2. One guy on my LUG's mailing list is really excited about the
> > potential for setting redundancy on a per-file basis.
> > I.e. /home/eric/criticalfile gets mirrored across all of the drives in
> > the filesystem but /home/eric/temporaryfile gets striped. I'm skeptical.
> > Is it a good idea to allow people/programs to do this?
>
> In general, yes. Some files or directories are crucial, and some (swap
> for example) don't need to survive a crash.

If a disk dies in a redundant configuration, I'd like to be able to hot
replace the failed disk and keep going without any interruption. So
losing parts of the paging file would be pretty bad in that case.

Isn't a partially failed array (where some files are accessible and
others are not, without any additional filesystem damage) a weird
failure mode? Do people know how to deal with this? Do applications know
how to deal with this?

What kind of file would be important enough to keep around but
unimportant enough that it could be lost at any time while the system is
up without anyone knowing or caring?

> But, I think the flexibility should go a little further. The goal is to
> be able to define drive groups and tie files or directory trees to the
> drive groups. That way you can say these files go to the fastest drives
> and these files go to some other drive type, etc etc.

Let's say I have 4 100GB drives (2 fast ones and 2 slow ones). I've
restricted a performance critical directory to the two fastest drives,
currently totaling 100GB of performance critical data. The rest of the
data on the system is striped.

How much free space do I have on the filesystem? 100GB (the amount of
data I can store in the performance critical directory)? 200GB (the
amount of data I can store outside the performance critical directory if
the striping is guaranteed)? 300GB (the amount of data I can store
outside the performance critical directory if the striping is best
effort)?

I'm open to being convinced otherwise, but I think issues like this
would crop up any time the filesystem is artificially prevented from
load balancing the data across the drives.

Cheers,
Eric
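For concreteness, the three readings of "free space" in the 4x100GB
scenario above can be worked through with plain arithmetic. This is only
an illustration of the question being asked, not of anything btrfs does,
and it assumes the critical data is merely restricted to the fast pair
rather than mirrored:

# Scenario from the mail above: four 100GB drives, two fast and two slow.
# The performance-critical directory is restricted to the fast pair and
# currently holds 100GB of data; everything else is striped.
FAST_CAPACITY = 2 * 100   # GB on the two fast drives
SLOW_CAPACITY = 2 * 100   # GB on the two slow drives
CRITICAL_USED = 100       # GB already in the critical directory

# Reading 1: space still available to the critical directory itself.
free_critical = FAST_CAPACITY - CRITICAL_USED            # 100 GB

# Reading 2: space for everything else, if the striping guarantee keeps
# that data off the fast drives entirely.
free_other_guaranteed = SLOW_CAPACITY                    # 200 GB

# Reading 3: space for everything else, if striping is best effort and
# may spill into whatever the fast drives have left.
free_other_best_effort = SLOW_CAPACITY + free_critical   # 300 GB

print(free_critical, free_other_guaranteed, free_other_best_effort)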
On Mon, 2008-09-08 at 18:46 -0600, Eric Anopolsky wrote:
> On Mon, 2008-09-08 at 10:47 -0400, Chris Mason wrote:
> > On Sat, 2008-09-06 at 23:43 -0600, Eric Anopolsky wrote:
> > > Hi,
> > >
> > > A couple of questions.
> > >
> > > 1. Does btrfs currently have anything like raid 5 or 6?
> > >
> >
> > Not yet, it might one day.
> >
> > > 2. One guy on my LUG's mailing list is really excited about the
> > > potential for setting redundancy on a per-file basis.
> > > I.e. /home/eric/criticalfile gets mirrored across all of the drives in
> > > the filesystem but /home/eric/temporaryfile gets striped. I'm skeptical.
> > > Is it a good idea to allow people/programs to do this?
> >
> > In general, yes. Some files or directories are crucial, and some (swap
> > for example) don't need to survive a crash.
>
> If a disk dies in a redundant configuration, I'd like to be able to hot
> replace the failed disk and keep going without any interruption. So
> losing parts of the paging file would be pretty bad in that case.
>
> Isn't a partially failed array (where some files are accessible and
> others are not, without any additional filesystem damage) a weird
> failure mode? Do people know how to deal with this? Do applications know
> how to deal with this?
>

These configurations are not new. Admins create different filesystems
on different storage all the time. From an admin's point of view, if one
file on their box isn't accessible, they still want to carry on. The
fact that it is one file in a single FS or one file among dozens of
filesystems doesn't change things.

> What kind of file would be important enough to keep around but
> unimportant enough that it could be lost at any time while the system is
> up without anyone knowing or caring?
>
> > But, I think the flexibility should go a little further. The goal is to
> > be able to define drive groups and tie files or directory trees to the
> > drive groups. That way you can say these files go to the fastest drives
> > and these files go to some other drive type, etc etc.
>
> Let's say I have 4 100GB drives (2 fast ones and 2 slow ones). I've
> restricted a performance critical directory to the two fastest drives,
> currently totaling 100GB of performance critical data. The rest of the
> data on the system is striped.
>
> How much free space do I have on the filesystem? 100GB (the amount of
> data I can store in the performance critical directory)? 200GB (the
> amount of data I can store outside the performance critical directory if
> the striping is guaranteed)? 300GB (the amount of data I can store
> outside the performance critical directory if the striping is best
> effort)?
>

People already create these configurations, they just do it with
multiple filesystems. And, when they want to resize the performance
critical section, it is a difficult (and often slow) operation.

More flexibility in managing storage is the end goal for btrfs, and
we're just barely getting to the point where we can start addressing
these difficult issues.

-chris
Replying to Chris Mason:

I'd like to step into this thread because it's relevant to my interests.

> > Let's say I have 4 100GB drives (2 fast ones and 2 slow ones). I've
> > restricted a performance critical directory to the two fastest drives,
> > currently totaling 100GB of performance critical data. The rest of the
> > data on the system is striped.
> >
> > How much free space do I have on the filesystem? 100GB (the amount of
> > data I can store in the performance critical directory)? 200GB (the
> > amount of data I can store outside the performance critical directory if
> > the striping is guaranteed)? 300GB (the amount of data I can store
> > outside the performance critical directory if the striping is best
> > effort)?
> >
>
> People already create these configurations, they just do it with
> multiple filesystems. And, when they want to resize the performance
> critical section, it is a difficult (and often slow) operation.
>
> More flexibility in managing storage is the end goal for btrfs, and
> we're just barely getting to the point where we can start addressing
> these difficult issues.

Some time ago I put together a list of features an ideal filesystem
would have. Currently, btrfs (counting everything proposed but not yet
implemented) comes very close.

For example, the ability to freely manage the media pool, in other
words to add and remove hard disks of arbitrary size, is very important
now that it's not uncommon to have a box with 24 hard drives, any of
which can fail at any time, and it's economically infeasible to keep a
spare pool of N drives of exactly the same size. The individual drive
size constraint, which matters a great deal in the traditional layered
raid-then-lvm-then-fs approach, is not present in this ideal case,
which lets us manage our storage more effectively.

Another point is per-object locality/redundancy policy. It's a killer
feature, because in the traditional world you have to manage (resize
and move) all those partitions, which is not very flexible given that
you might have 24 drives and then have to create one raid10, one raid6,
and one raid0 on top of them, juggling the underlying partition sizes,
and so on. It is essential to have a filesystem that will do this for
you, again with more efficiency than you can extract from the
30-year-old way of setting up "block devices".

--
Paul P 'Stingray' Komkoff Jr // http://stingr.net/key <- my pgp key
 This message represents the official view of the voices in my head
> > Let's say I have 4 100GB drives (2 fast ones and 2 slow ones). I've
> > restricted a performance critical directory to the two fastest drives,
> > currently totaling 100GB of performance critical data. The rest of the
> > data on the system is striped.
> >
> > How much free space do I have on the filesystem? 100GB (the amount of
> > data I can store in the performance critical directory)? 200GB (the
> > amount of data I can store outside the performance critical directory if
> > the striping is guaranteed)? 300GB (the amount of data I can store
> > outside the performance critical directory if the striping is best
> > effort)?
> >
>
> People already create these configurations, they just do it with
> multiple filesystems. And, when they want to resize the performance
> critical section, it is a difficult (and often slow) operation.

I think I'm starting to get it. btrfs would have drive groups, and no
file would have data on more than one drive group at once. That would
make it possible to make meaningful statements about how much free disk
space there is (per drive group). This is almost the same as having
multiple filesystems, except files cannot be assigned to filesystems on
an individual basis. So in a way, btrfs would be replacing some
functionality of the VFS (mapping files to filesystems). Is that right?

Cheers,
Eric
On Tue, 2008-09-09 at 19:32 -0600, Eric Anopolsky wrote:
> > > Let's say I have 4 100GB drives (2 fast ones and 2 slow ones). I've
> > > restricted a performance critical directory to the two fastest drives,
> > > currently totaling 100GB of performance critical data. The rest of the
> > > data on the system is striped.
> > >
> > > How much free space do I have on the filesystem? 100GB (the amount of
> > > data I can store in the performance critical directory)? 200GB (the
> > > amount of data I can store outside the performance critical directory if
> > > the striping is guaranteed)? 300GB (the amount of data I can store
> > > outside the performance critical directory if the striping is best
> > > effort)?
> > >
> >
> > People already create these configurations, they just do it with
> > multiple filesystems. And, when they want to resize the performance
> > critical section, it is a difficult (and often slow) operation.
>
> I think I'm starting to get it. btrfs would have drive groups, and no
> file would have data on more than one drive group at once. That would
> make it possible to make meaningful statements about how much free disk
> space there is (per drive group). This is almost the same as having
> multiple filesystems, except files cannot be assigned to filesystems on
> an individual basis.

Yes, I think this is a fair statement.

> So in a way, btrfs would be replacing some
> functionality of the VFS (mapping files to filesystems).

I think there are many different definitions of the VFS. Mostly what
the VFS does is maintain the dentry and inode caches, and provide a
basic locking framework around most file/inode operations. The VFS is
still doing all the mapping of files to filesystems, and the filesystem
is mapping files to disk blocks.

-chris