Hi all

Having managed ZFS for about two years, I want to post a wishlist.

INCLUDED IN ZFS

- Mirror existing single-drive filesystem, as in 'zpool attach'
- RAIDz-stuff - single and hopefully multiple-parity RAID configuration with block-level checksumming
- Background scrub/fsck
- Pool-like management with multiple RAIDs/mirrors (VDEVs)
- Autogrow as in ZFS autoexpand

NOT INCLUDED IN CURRENT ZFS

- Adding/removing drives from VDEVs
- Rebalancing a pool
- dedup

This may be a long shot, but can someone tell me whether this is doable in a year or five?

Vennlige hilsener / Best regards

roy
--
Roy Sigurd Karlsbakk
(+47) 97542685
roy@karlsbakk.net
http://blogg.karlsbakk.net/
--
In all pedagogy it is essential that the curriculum be presented intelligibly. It is an elementary imperative for all pedagogues to avoid excessive use of idioms of foreign origin. In most cases, adequate and relevant synonyms exist in Norwegian.
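For reference, the ZFS features named above correspond roughly to the following zpool commands (a sketch only; the pool name "tank" and the da0/da1 device names are placeholders, and exact syntax may vary between ZFS implementations):

    # Turn a single-drive pool into a mirror by attaching a second device
    zpool attach tank da0 da1
    # Kick off a background scrub of the whole pool
    zpool scrub tank
    # Let the pool grow automatically when its devices get bigger
    zpool set autoexpand=on tank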
Excerpts from Roy Sigurd Karlsbakk's message of 2011-03-01 13:35:42 -0500:
> Hi all
>
> Having managed ZFS for about two years, I want to post a wishlist.
>
> INCLUDED IN ZFS
>
> - Mirror existing single-drive filesystem, as in 'zpool attach'

This one is easy, we do plan on adding it.

> - RAIDz-stuff - single and hopefully multiple-parity RAID configuration with block-level checksumming

We'll have raid56, but it won't be variable stripe size. There will be
one stripe size for data and one for metadata but that's it.

> - Background scrub/fsck

These are in the works.

> - Pool-like management with multiple RAIDs/mirrors (VDEVs)

We have a pool of drives now... I'm not sure exactly what the vdevs are.

> - Autogrow as in ZFS autoexpand

We grow to the available storage now.

>
> NOT INCLUDED IN CURRENT ZFS
>
> - Adding/removing drives from VDEVs

We can add and remove drives on the fly today.

> - Rebalancing a pool

We can rebalance space between drives today.

> - dedup

ZFS does have dedup; we don't yet. This one has a firm maybe.

-chris
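For anyone who wants to try the btrfs side of these answers, the operations Chris mentions map roughly onto the following btrfs-progs commands (a sketch assuming a reasonably recent btrfs-progs; /mnt and the /dev/sdX names are placeholders, and the scrub and balance-convert commands only became available once the in-progress work landed):

    # Add a second device and convert data/metadata to mirrored (raid1)
    btrfs device add /dev/sdb /mnt
    btrfs balance start -dconvert=raid1 -mconvert=raid1 /mnt

    # Remove a device on the fly; its data is migrated off first
    btrfs device delete /dev/sdc /mnt

    # Rebalance space across the remaining drives
    btrfs balance start /mnt

    # Background scrub of all data and metadata checksums
    btrfs scrub start /mnt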
On Tue, Mar 1, 2011 at 10:39 AM, Chris Mason <chris.mason@oracle.com> wrote:
> Excerpts from Roy Sigurd Karlsbakk's message of 2011-03-01 13:35:42 -0500:
>
>> - Pool-like management with multiple RAIDs/mirrors (VDEVs)
>
> We have a pool of drives now... I'm not sure exactly what the vdevs are.

This functionality is in btrfs already, but it uses different terminology and configuration methods.

In ZFS, the lowest level in the storage stack is the physical block device. You group these block devices together into a virtual device (aka vdev). The possible vdevs are:

- single-disk vdev, with no redundancy
- mirror vdev, with any number of devices (n-way mirroring)
- raidz1 vdev, single-parity redundancy
- raidz2 vdev, dual-parity redundancy
- raidz3 vdev, triple-parity redundancy
- log vdev, a separate device for "journaling", or as a write cache
- cache vdev, a separate device that acts as a read cache

A ZFS pool is made up of a collection of these vdevs. For example, a simple, non-redundant pool setup for a laptop would be:

    zpool create laptoppool da0

To create a pool with a dual-parity vdev using 8 disks:

    zpool create mypool raidz2 da0 da1 da2 da3 da4 da5 da6 da7

To later add to the existing pool:

    zpool add mypool raidz2 da8 da9 da10 da11 da12 da13 da14 da15

Later, you create your ZFS filesystems on top of the pool.

With btrfs, you set up the redundancy and the filesystem all in one shot, thus combining the "vdev" with the "pool" (aka filesystem). ZFS has better separation of the different layers (device, pool, filesystem) and better tools for working with them (zpool / zfs), but similar functionality is (or at least appears to be) in btrfs already. Using device mapper / md underneath btrfs also gives you a similar setup to ZFS.

--
Freddie Cash
fjwcash@gmail.com
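To make the comparison concrete, here is roughly what the btrfs side of the same setups looks like (a sketch; the device names and the /mnt mount point are placeholders, and raid1 stands in for the redundancy level since btrfs had no raid5/6 yet at the time of this thread):

    # Single-device filesystem, analogous to the laptop pool
    mkfs.btrfs /dev/sda2

    # Multi-device filesystem with mirrored data and metadata,
    # created in one shot instead of vdev -> pool -> filesystem
    mkfs.btrfs -d raid1 -m raid1 /dev/sdb /dev/sdc

    # Grow it later by adding a device and spreading data onto it
    btrfs device add /dev/sdd /mnt
    btrfs balance start /mnt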
On 2011-03-01 19:39, Chris Mason wrote:

> We'll have raid56, but it won't be variable stripe size. There will be
> one stripe size for data and one for metadata but that's it.

Will the stripe *width* be configurable? If I have something like a
Sun Thor with 48 drives, I would probably not be entirely comfortable
having 46 drives data and 2 drives parity; too little redundancy for
my tastes. 2 drives parity per 10 drives data is more like what I
would run, but that would of course be an individual choice.

/Bellman
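On the ZFS side, the layout Thomas describes would look something like the sketch below: four raidz2 vdevs of 12 drives each (10 data + 2 parity) instead of one wide 46+2 stripe. The pool name and da0-da47 device names are invented. The trade-off is usable capacity: roughly 40 of 48 drives instead of 46 of 48, in exchange for tolerating two failures per 12-drive group.

    zpool create thor \
        raidz2 da0  da1  da2  da3  da4  da5  da6  da7  da8  da9  da10 da11 \
        raidz2 da12 da13 da14 da15 da16 da17 da18 da19 da20 da21 da22 da23 \
        raidz2 da24 da25 da26 da27 da28 da29 da30 da31 da32 da33 da34 da35 \
        raidz2 da36 da37 da38 da39 da40 da41 da42 da43 da44 da45 da46 da47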
On Wed, Mar 02, 2011 at 10:05:28AM +0100, Thomas Bellman wrote:
> On 2011-03-01 19:39, Chris Mason wrote:
>
>> We'll have raid56, but it won't be variable stripe size. There will be
>> one stripe size for data and one for metadata but that's it.
>
> Will the stripe *width* be configurable? If I have something like a
> Sun Thor with 48 drives, I would probably not be entirely comfortable
> having 46 drives data and 2 drives parity; too little redundancy for
> my tastes. 2 drives parity per 10 drives data is more like what I
> would run, but that would of course be an individual choice.

It's something that's been asked for, but isn't supported by the current (proposed) RAID-5/6 code, as far as I'm aware.

   Hugo.

--
=== Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk ==
  PGP key: 515C238D from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
      --- Try everything once, except incest and folk-dancing. ---
On 02/03/11 10:05, Thomas Bellman wrote:

> Will the stripe *width* be configurable? If I have something like a
> Sun Thor with 48 drives, I would probably not be entirely comfortable
> having 46 drives data and 2 drives parity; too little redundancy for
> my tastes. 2 drives parity per 10 drives data is more like what I
> would run, but that would of course be an individual choice.

One thing to remember is that the parity protects specific pieces of file system data. So your entire dataset is not at risk when write errors occur in only a few places on a few disks; only the file system objects that have data stored in those places are at immediate risk. This means that only files unlucky enough to have multiple failing sectors within the same stripe are really impacted. Of course this only matters as long as we're talking about bad sectors and not a full disk failure.

Regards,

justin....
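As an illustration of Justin's point, later versions of btrfs-progs can map a damaged logical address back to the files that actually reference it, so you can see exactly which objects a bad stripe touches (a sketch; the logical address and /mnt are made up, and this subcommand postdates this thread):

    # Suppose scrub or dmesg reports checksum errors at logical address 5291847680
    btrfs inspect-internal logical-resolve 5291847680 /mnt
    # Prints the path(s) of files whose extents cover that address;
    # files with no data in the damaged stripe are unaffected.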