Hello everyone,

I'm new to ZFS and OpenSolaris. I've been reading the docs on ZFS (the PDF "The Last Word on Filesystems" and Wikipedia, of course), and I'm trying to understand something.

So ZFS is self-healing, correct? This is accomplished via parity and/or metadata of some sort on the disk, right? So it protects against data corruption, but not against disk failure. Or is it the case that ZFS intelligently puts the parity and/or metadata on alternate disks to protect against disk failure, even without a RAID array?

Anyway, you can add mirrored, striped, raidz, or raidz2 arrays to the pool, right? But you can't "effortlessly" grow/shrink this protected array if you wanted to add a disk or two to increase your protected storage capacity. My understanding is that if you want to add storage to a RAID array, you must copy all your data off the array, destroy the array, recreate it with your extra disk(s), then copy all your data back.

I like the idea of a protected storage pool that can grow and shrink effortlessly, but if protecting your data against drive failure is not as effortless, then honestly, what's the point? In my opinion, the ease of use should be nearly that of the Drobo product. Which brings me to my final question: is there a GUI tool available? I can use the command line just like the next guy, but GUIs sure are convenient...

Thanks for your help!
-Steve
On Sat, May 24, 2008 at 3:12 AM, Steve Hull <p.witty at gmail.com> wrote:
> I like the idea of a protected storage pool that can grow and shrink
> effortlessly, but if protecting your data against drive failure is not as
> effortless, then honestly, what's the point? [...] Which brings me to my
> final question: is there a gui tool available?

You're thinking in terms of a home user. ZFS was designed for an enterprise environment. When they add disks, they don't add one disk at a time; it's a tray at a time at the very least. Because of this, they aren't ever copying data off of the array and back on, and no destruction is needed. You just add a raidz/raidz2 vdev at a time, striped across your 14 disks (or however large the tray of disks is).

The GUI is a web interface. Just point your browser at https://localhost:6789

--Tim
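For a concrete picture of what "adding a raidz at a time" looks like, here is a minimal sketch; the pool name "tank" and the cNtNdN device names are placeholders invented for illustration, not anything from the thread:

  # create a pool from one 5-disk raidz vdev
  zpool create tank raidz c1t0d0 c1t1d0 c1t2d0 c1t3d0 c1t4d0

  # later, grow the pool by striping a second raidz vdev alongside the first
  zpool add tank raidz c2t0d0 c2t1d0 c2t2d0 c2t3d0 c2t4d0

  # confirm the new layout
  zpool status tank

No data is copied off or destroyed; new writes simply start spreading across both vdevs.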
> Anyway you can add mirrored, [...], raidz, or raidz2 arrays to the pool, right?

Correct.

> add a disk or two to increase your protected storage capacity.

If it's a protected vdev, like a mirror or raidz, sure... One can force-add a single disk, but then the pool isn't protected until you attach a mirror to that single disk. One can't (currently) remove a vdev (shrink a pool), but one can increase each element of a vdev, increasing the size of the pool while maintaining the number of elements (disk count).

Rob
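A hedged sketch of the two operations Rob describes; the pool and device names here are made up for the example:

  # force-add a lone disk -- the pool is unprotected until that vdev gains a mirror
  zpool add -f tank c3t0d0

  # restore redundancy by attaching a second disk to that new vdev
  zpool attach tank c3t0d0 c3t1d0

  # grow an existing vdev by replacing each member with a larger drive,
  # one at a time, letting each resilver complete before starting the next
  zpool replace tank c1t0d0 c5t0d0
  zpool status tank    # watch the resilver progress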
Hi Steve,

On 24.05.2008 at 10:17, zfs-discuss-request at opensolaris.org wrote:

> So ZFS is self-healing, correct? This is accomplished via parity
> and/or metadata of some sort on the disk, right? So it protects
> against data corruption, but not against disk failure.

This is not entirely true, but possible. You can use the copies attribute to have some sort of redundancy on a single disk. But obviously, if you only use a single disk and it breaks completely, data loss cannot be avoided. Even without redundancy features, ZFS provides very good detection of block failures, and snapshots that can be used to avoid accidental deletion/unwanted changes of data.

> Or is it the case that ZFS intelligently puts the parity and/or
> metadata on alternate disks to protect against disk failure, even
> without a raid array?

You do not need a hardware RAID array to get these features, and you can theoretically even use partitions/slices on a single disk, but to get good protection and acceptable performance you will need multiple drives, since a drive can always fail in a way that makes it completely unusable (i.e. it does not spin up anymore).

> Anyway you can add mirrored, striped, raidz, or raidz2 arrays to the
> pool, right? But you can't "effortlessly" grow/shrink this
> protected array if you wanted to add a disk or two to increase your
> protected storage capacity.

A group of redundant disks is called a vdev - this is probably what you call "array". A vdev can be built from disks, files, iSCSI targets, or partitions. Several vdevs form a storage pool. You can increase the size of a pool by adding extra vdevs or by replacing all members of a vdev with bigger ones.

> My understanding is that if you want to add storage to a raid array,
> you must copy all your data off the array, destroy the array,
> recreate it with your extra disk(s), then copy all your data back.

This is currently true for shrinking a pool and for changing the number of devices in a raidz1/2 vdev. Some efforts have been made to change that - see
http://blogs.sun.com/ahl/entry/expand_o_matic_raid_z

Theoretically it should also be possible to "evacuate" vdevs (and remove them from a pool), but I do not think any code has been written to do so. The main reason is that Sun's paying customers are probably reasonably happy to just add a vdev to increase storage, so other features are much higher on their priority list.

> I like the idea of a protected storage pool that can grow and shrink
> effortlessly, but if protecting your data against drive failure is
> not as effortless, then honestly, what's the point? In my opinion,
> the ease of use should be nearly that of the Drobo product. Which
> brings me to my final question: is there a gui tool available?

I'd say: the point is "first things first". Sun provides a free, reasonably manageable, very robust storage concept that does not have all desirable features (yet). For a nice GUI tool you might have to wait for Mac OS X 10.6 ;-)

Hope this helps,
ralf
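A small sketch of the copies attribute Ralf mentions; the dataset name tank/home is hypothetical. Note that extra copies guard against bad blocks on a disk that is still working, not against losing the whole disk:

  # keep two copies of every block in this filesystem, even on a single disk
  zfs set copies=2 tank/home

  # verify the setting
  zfs get copies tank/home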
OK, so in my (admittedly basic) understanding of raidz and raidz2, these technologies are very similar to RAID-5 and RAID-6. BUT if you set up one disk as a raidz vdev, you (obviously) can't maintain data after a disk failure, but you are protected against data corruption that is NOT a result of disk failure. Right?

So is there a resource somewhere that I could look at that clearly spells out how many disks I could have vs. how much resulting space I would have that would still protect me against disk failure (a la the "Drobolator", http://www.drobo.com/drobolator/index.html)? I mean, if I have a raidz vdev with one disk, then I add a disk, am I protected from disk failure? Is it the case that I need to have disks in groups of 4 to maintain protection against single disk failure with raidz, and in groups of 5 for raidz2? It gets even more confusing if I wanted to add disks of varying sizes...

And you said I could add a disk (or disks) to a mirror -- can I force-add a disk (or disks) to a raidz or raidz2? Without destroying and rebuilding, as I read would be required somewhere else?

And if I create a zpool and add various single disks to it (without creating raidz/mirror/etc), is it the case that the zpool is essentially functioning like spanning RAID? I.e., no protection at all??

Please either point me to an existing resource that spells this out a little clearer or give me a little more explanation around it.

And... do you think that the Drobo (www.drobo.com) product is essentially just a box with OpenSolaris and ZFS on it?
I like the link you sent along... they did a nice job with that (but it does show that mixing and matching vastly different drive sizes is not exactly optimal...):
http://www.drobo.com/drobolator/index.html

Doing something like this for ZFS, letting people create pools by mixing/matching single drives, mirrors, and raidz/raidz2 vdevs in a zpool, would make for a pretty cool page. If one of the statistical gurus could add MTBF, MTTDL (mean time to data loss), etc. as a calculator at the bottom, that would be even better. (Someone did some static graphs for different Thumper configurations in the past... this would just make that more general-purpose/GUI-driven. Sounds like a cool project.)

No mention anywhere of "removing drives" thereby reducing capacity, though... RAID re-striping isn't all that much fun, especially with larger drives (and even ZFS lacks some features in this area for now).

See the answer to your other question below (from their FAQ).

-- MikeE

What file systems does drobo support?
RESOLUTION: Drobo is a usb external disk array that is formatted by the host operating system (Windows or OS X). We currently support NTFS, HFS+, and FAT32 file systems with firmware revision 1.0.2. Drobo is not a ZFS file system.
STATUS: Current specification 1.0.2
Applies to: Drobo DRO4D-U
Sooo... I've been reading a lot in various places. The conclusion I've drawn is this:

I can create raidz vdevs in groups of 3 disks and add them to my zpool to be protected against 1 drive failure. This is the current status of "growing protected space" in raidz. Am I correct here?
Steve Hull wrote:
> I can create raidz vdevs in groups of 3 disks and add them to my zpool to
> be protected against 1 drive failure. This is the current status of
> "growing protected space" in raidz. Am I correct here?

Correct. Here's some quick summary information:

A POOL is made of 1 or more VDEVs. POOLs consisting of more than 1 VDEV will stripe data across all the VDEVs. VDEVs may be freely added to any POOL, but cannot currently be removed from a POOL.

When a vdev is added to a pool, data on the existing vdevs is not automatically re-distributed. That is, say you have 3 vdevs of 1GB each, and add another vdev of 1GB. The system does not immediately attempt to re-distribute the data on the original 3 devices. It will re-balance the data as you WRITE to the pool. Thus, if you expand a pool like this, it is a good idea to copy the data around, i.e.:

  cp -r /zpool/olddir /zpool/newdir
  rm -rf /zpool/olddir

If there are more than 1 vdev in a pool, the pool's capacity is determined by the smallest device. Thus, if you have a 2GB, a 3GB, and a 5GB device in a pool, the pool's capacity is 3 x 2GB = 6GB, as ZFS will only do full-stripes. Thus, there really is no equivalent to Concatenation in other RAID solutions. However, if you replace ALL devices in a pool with larger ones, ZFS will automatically expand the pool size. Thus, if you replaced the 2GB devices in the above case with 4GB devices, then the pool would automatically appear to be 3 x 4GB = 12GB.

A VDEV can consist of:
  - any file
  - any disk slice/partition
  - a whole disk (preferred!)
  - a special sub-device: raidz/raidz1/raidz2/mirror/cache/log/spare

For the special sub-devices, here's a summary:

raidz (synonym raidz1):
  You must provide at LEAST 3 storage devices (where a file, slice, or disk
  is a storage device). 1 device's capacity is consumed by parity. However,
  parity is scattered around the devices, so this is roughly analogous to
  RAID-5. Currently, devices CANNOT be added to or removed from a raidz. It
  is possible to increase the size of a raidz by replacing each drive, ONE
  AT A TIME, with a larger drive. But altering the NUMBER of drives is not
  possible.

raidz2:
  You must have at LEAST 4 storage devices. 2 devices' capacity is consumed
  by parity. Like raidz, parity is scattered around the devices, improving
  I/O performance. Roughly analogous to RAID-6. Altering a raidz2 is
  exactly like doing a raidz.

mirror:
  You must provide at LEAST 2 storage devices. All data is replicated
  across all devices, acting as a "normal" mirror. You can add or detach
  devices from a mirror at will, so long as they are at least as big as the
  original mirror.

spare:
  Indicates a device which can be used as a hot spare.

log:
  Indicates an Intent Log, which is basically a transactional log of
  filesystem operations. Generally speaking, this is used only for certain
  high-performance cases, and tends to be used in association with
  enterprise-level devices, such as solid-state drives.

cache:
  Similar to an Intent Log, this provides a place to cache filesystem
  internals (metadata such as directory/file attributes), usually used in
  situations similar to log devices.

--------

All pools store redundant metadata, so they can automatically detect and repair most faults in metadata.
If your vdev is raidz, raidz2 or mirror, it also stores redundant data (which allows it to recover from losing a disk), so it can automatically detect AND repair block-level faults.

--
Erik Trimble
Java System Support
Mailstop: usca22-123
Phone: x17195
Santa Clara, CA
Timezone: US/Pacific (GMT-0800)
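To make Erik's summary concrete, here is a sketch of creating each vdev type; all pool and device names below are invented for the example:

  # mirror: at least 2 devices, all data replicated on each
  zpool create mpool mirror c1t0d0 c1t1d0

  # raidz (raidz1): at least 3 devices, one device's worth of parity
  zpool create rzpool raidz c2t0d0 c2t1d0 c2t2d0

  # raidz2: at least 4 devices, two devices' worth of parity
  zpool create rz2pool raidz2 c3t0d0 c3t1d0 c3t2d0 c3t3d0

  # a hot spare and a separate intent log can be added afterwards
  zpool add rz2pool spare c3t4d0
  zpool add rz2pool log c4t0d0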
On Sun, May 25, 2008 at 1:27 AM, Erik Trimble <Erik.Trimble at sun.com> wrote:
> If there are more than 1 vdev in a pool, the pool's capacity is
> determined by the smallest device. Thus, if you have a 2GB, a 3GB, and a
> 5GB device in a pool, the pool's capacity is 3 x 2GB = 6GB, as ZFS will
> only do full-stripes. Thus, there really is no equivalent to
> Concatenation in other RAID solutions.

Not true. If you have a mirrored or a raidz vdev, then the size of that vdev is determined by the size of the smallest disk, but if you have multiple vdevs, the size of the pool is the sum of the sizes of the vdevs. I have two pools off the top of my head that illustrate this: one with 3*120 and 1*200 that has ~550GB capacity, and one with an 8*320 raidz and an 8*500 raidz that has 4.47TB capacity.

Will
> Thus, if you have a 2GB, a 3GB, and a 5GB device in a pool,
> the pool's capacity is 3 x 2GB = 6GB

If you put the three into one raidz vdev it will be 2+2, until you replace the 2G disk with a 5G, at which point it will be 3+3; then when you replace the 3G with a 5G it will be 5+5G. And if you replace the 5G with a 10G it will still be 5+5G.

If one lists out the three disks so they are all their own vdevs, it will be 3x faster than the raidz and 2+3+5 in size (see the example below of mirror and raidz vdevs of different sizes).

> All pools store redundant metadata, so they can
> automatically detect and repair most faults in metadata.

And one can `zfs set copies=2 pool/home` with the 2+3+5 stripe to automatically detect and repair most faults in data as well, as there is an "attempt" to store the copies on different vdevs (mirrors are best).

7 % zpool iostat -v
                capacity     operations    bandwidth
pool          used  avail   read  write   read  write
----------   -----  -----  -----  -----  -----  -----
root         15.9G   100G      2      0   177K    800
  c2t0d0s7   15.9G   100G      2      0   177K    800
----------   -----  -----  -----  -----  -----  -----
z            3.28T  1.59T    379     19  26.6M   103K
  raidz1     1.83T  1.58T    207     12  14.9M  64.7K
    c0t2d0       -      -     69      6  3.84M  17.1K
    c4t1d0       -      -     69      6  3.84M  17.1K
    c0t6d0       -      -     69      6  3.84M  17.1K
    c0t4d0       -      -     69      6  3.84M  17.1K
    c4t3d0       -      -     69      6  3.84M  17.1K
  raidz1     1.44T  12.0G    172      7  11.7M  37.9K
    c4t4d0       -      -     58      5  3.06M  10.2K
    c4t6d0       -      -     58      5  3.06M  10.2K
    c0t3d0       -      -     58      5  3.06M  10.2K
    c4t2d0       -      -     58      5  3.06M  10.2K
    c0t5d0       -      -     58      5  3.06M  10.2K
----------   -----  -----  -----  -----  -----  -----

1 % zpool iostat -v
                  capacity     operations    bandwidth
pool            used  avail   read  write   read  write
------------   -----  -----  -----  -----  -----  -----
root           5.28G  24.0G      0      0    863  2.13K
  mirror       5.28G  24.0G      0      0    863  2.13K
    c0t1d0s0       -      -      0      0    297  4.76K
    c0t0d0s0       -      -      0      0    597  4.76K
------------   -----  -----  -----  -----  -----  -----
z               230G   500G     17     76   150K   461K
  mirror       83.8G   182G      6     25  52.1K   158K
    c0t0d0s7       -      -      2     15  85.1K   248K
    c0t1d0s7       -      -      2     15  86.7K   248K
  mirror       72.6G   159G      5     26  49.4K   161K
    c0t2d0         -      -      2     19  82.7K   251K
    c0t3d0         -      -      2     19  81.2K   251K
  mirror       74.0G   158G      5     23  48.3K   142K
    c0t4d0         -      -      2     18  72.3K   232K
    c0t5d0         -      -      2     18  71.9K   232K
------------   -----  -----  -----  -----  -----  -----
Will and several other people are correct. I had forgotten that ZFS does a funky form of concatenation when you use different-size vdevs. I tend to ignore this case, because it's kinda useless (I know, I know, there are people who use it, but, really... <wink>)

Basically, it will stripe across vdevs as it can. So, if you have a zpool like this:

  2GB vdev
  3GB vdev
  4GB vdev

ZFS will have a 3-wide stripe across the first 2GB of all devices, then a 2-wide stripe across the next 1GB of the two larger devices, then finally a single stripe (aka no stripe) in the 1GB left in the largest one (in this example). So you do get the full 9GB of space.

Naturally, this produces really weird performance curves for (random) data access. OK, maybe weird isn't the right word, but...

--
Erik Trimble
Java System Support
Mailstop: usca22-123
Phone: x17195
Santa Clara, CA
Timezone: US/Pacific (GMT-0800)
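If you want to see this behaviour without risking real disks, a file-backed pool is an easy experiment. Everything below (pool name, file paths, sizes) is made up for illustration, and it only demonstrates capacity, not performance:

  # three file-backed "disks" of different sizes
  mkfile 2g /var/tmp/d2g
  mkfile 3g /var/tmp/d3g
  mkfile 4g /var/tmp/d4g

  # a plain (unreplicated) pool built from all three
  zpool create demo /var/tmp/d2g /var/tmp/d3g /var/tmp/d4g

  # SIZE should come out near the 9GB sum, not 3 x 2GB
  zpool list demo

  # clean up afterwards
  zpool destroy demo
  rm /var/tmp/d2g /var/tmp/d3g /var/tmp/d4g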
THANK YOU VERY MUCH EVERYONE!! You have been very helpful and my questions are (mostly) resolved. While I am not (and probably will not become) a ZFS expert, I now at least feel confident that I can accomplish what I want to do.

My last comment is this: I realize that ZFS is designed and intended for enterprise use, but it also has many useful features that home and SOHO users appreciate. That being said, I feel that it still will leave most casual home and SOHO users a bit confused and wishing for other features (especially ease of use).

If Sun released a software alternative to the Drobo product, I feel certain that they would be able to very successfully market a product like this to home and SOHO users. Heck, I would buy such a piece of software (from Sun) in a hot second. Plus, if they based it off of ZFS and just "hid" most of the configuration options so that your pools were automatically configured with single parity (or a mirror for 2-drive setups), then added the "expand-o-matic raidz" feature, added a "shrink" feature, and added the ability to better utilize space on differently sized drives, it would be awesome, and a good part of the work would already be done (i.e., ZFS). It would be far superior to Drobo, and could probably undercut Drobo significantly on price point. Then it would truly be the holy grail of file systems.

In fact, depending on the license of OpenSolaris/ZFS, I wonder if a group of independent developers could package up VirtualBox, OpenSolaris, a modified ZFS, and a setup/admin utility to create such a product... that would be cool. Again, the "heavy lifting" would be modifying raidz so that it could expand/shrink/better utilize space on differently sized drives.