I have a question about using mixed vdevs in the same zpool and what the community opinion is on the matter. Here is my setup:

I have four 1TB drives and two 500GB drives. When I first set up ZFS I was under the assumption that it does not really care how you add devices to the pool and assumes you have thought things through. But when I tried to create a pool (called "group") with the four 1TB disks in a raidz and the two 500GB disks in a mirror, ZFS complained and said that if I wanted to do it I had to add -f (which I assume stands for "force"). So was ZFS trying to stop me from doing something generally considered bad?

Some other questions, assuming this setup isn't that bad (or it is that bad, and these questions will show why):

If one 500GB disk (c10dX) dies in the mirror and I choose not to replace it, would I be able to migrate the files that are on the surviving half of the mirror over to the drives in the raidz, assuming there is space? Would ZFS tell me which files are affected, as it does in other situations?

In this configuration, how does Solaris/ZFS decide which vdev to place a given write's data into?

Are there any situations where data would, for some reason, not be protected against a single disk failure?

Would this configuration survive a two-disk failure if the disks are in separate vdevs?

jsmith at corax:~# zpool status group
  pool: group
 state: ONLINE
 scrub: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        group       ONLINE       0     0     0
          raidz1    ONLINE       0     0     0
            c7t0d0  ONLINE       0     0     0
            c7t1d0  ONLINE       0     0     0
            c8t0d0  ONLINE       0     0     0
            c8t1d0  ONLINE       0     0     0
          mirror    ONLINE       0     0     0
            c10d0   ONLINE       0     0     0
            c10d1   ONLINE       0     0     0

errors: No known data errors
jsmith at corax:~# zfs list group
NAME    USED  AVAIL  REFER  MOUNTPOINT
group  94.4K  3.12T  23.7K  /group

This isn't for a production environment in some datacenter, but I would nevertheless like to make the data as reasonably secure as possible while maximizing total storage space.
-- 
This message posted from opensolaris.org
Justin wrote:
> I have a question about using mixed vdevs in the same zpool and what the
> community opinion is on the matter. Here is my setup:
>
> I have four 1TB drives and two 500GB drives. When I first set up ZFS I was
> under the assumption that it does not really care how you add devices to
> the pool and assumes you have thought things through. But when I tried to
> create a pool (called "group") with the four 1TB disks in a raidz and the
> two 500GB disks in a mirror, ZFS complained and said that if I wanted to
> do it I had to add -f (which I assume stands for "force"). So was ZFS
> trying to stop me from doing something generally considered bad?

Were any of these drives previously part of another pool?  ZFS will usually complain if it finds an existing signature on a drive, and make you use the '-f' option.  Otherwise, I don't /think/ it should care if you are being foolish. <wink>

> Some other questions, assuming this setup isn't that bad (or it is that
> bad, and these questions will show why):
>
> If one 500GB disk (c10dX) dies in the mirror and I choose not to replace
> it, would I be able to migrate the files that are on the surviving half of
> the mirror over to the drives in the raidz, assuming there is space? Would
> ZFS tell me which files are affected, as it does in other situations?

No, you can't currently.  Essentially what you are asking is whether you can remove the mirror vdev from the pool - that is not currently possible, though I'm hopeful it may happen in the not-so-distant future.

> In this configuration, how does Solaris/ZFS decide which vdev to place a
> given write's data into?

It will attempt to balance the data across the two vdevs (the mirror and the raidz) until it runs out of space on one (in your case, the mirror pair).  ZFS does not currently understand differences in underlying hardware performance or vdev layout, so it can't "magically" decide to write data to one particular vdev over the other.  In fact, I can't really come up with a sane way to approach that problem - there are simply too many variables to allow for automagic optimization like that.  Perhaps there could be some way to "hint" to ZFS at pool creation time, like "prefer vdev A for large writes, vdev B for small writes", but even so, I think that's marching off into a wilderness we don't want to visit, let alone spend any time in.

I would consider this a poor design, as the two vdevs have very different performance profiles, which hurts the overall performance of the pool significantly.

> Are there any situations where data would, for some reason, not be
> protected against a single disk failure?

No.  In your config, both vdevs can survive a single disk failure, so the pool is fine.

> Would this configuration survive a two-disk failure if the disks are in
> separate vdevs?

Yes.

> [zpool status and zfs list output snipped]
>
> This isn't for a production environment in some datacenter, but I would
> nevertheless like to make the data as reasonably secure as possible while
> maximizing total storage space.
If you are using Solaris (which you seem to be doing), my recommendation is that you use SVM to create a single 1TB concat device from the two 500GB drives, then use that 1TB concat device along with the other physical 1TB devices to create your pool.  If one 500GB drive fails, the whole concat device fails; ZFS treats the concat as a single "disk" and behaves accordingly.  Thus, my suggestion is something like this (using your cX layout from the example above):

# metainit d0 2 1 c10d0s2 1 c10d1s2
# zpool create tank raidz c7t0d0 c7t1d0 c8t0d0 c8t1d0 /dev/md/dsk/d0

This would get you a raidz of roughly 4TB capacity, able to survive one disk failure (or both 500GB drives failing, since together they count as a single raidz member).

-- 
Erik Trimble
Java System Support
Mailstop: usca22-123
Phone: x17195
Santa Clara, CA
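For anyone who wants to sanity-check the result, a rough sketch of what could be run afterwards - purely illustrative, reusing the d0 and tank names from Erik's example:

# metastat d0
# zpool status tank
# zfs list tank

metastat should show the concat built from c10d0s2 and c10d1s2, zpool status should show a single raidz1 vdev whose fifth member is /dev/md/dsk/d0, and zfs list should report available space somewhere around 3.6T (five 1TB members, one member's worth of parity, plus the usual decimal-vs-binary shrinkage and metadata overhead).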
Erik,

Thanks for your input; this has been a great learning experience for me on the workings of ZFS. I will take your suggestion, create the metadevice, and run raidz across the five "devices" for approximately the same total storage.
-- 
This message posted from opensolaris.org
On Fri, 26 Mar 2010, Erik Trimble wrote:

> It will attempt to balance the data across the two vdevs (the mirror and
> the raidz) until it runs out of space on one (in your case, the mirror
> pair).  ZFS does not currently understand differences in underlying
> hardware performance or vdev layout, so it can't "magically" decide to
> write data to one particular vdev over the other. [...]

ZFS can see the backlog of writes to the devices in a vdev (service time).  It can also see how much free space there is in each vdev.  Round-robin is also used to influence which vdev receives the next write.  Regardless, if data is written slowly, the free space available in the vdevs may ultimately become the dominant factor.

ZFS may work just fine if the vdevs are of different base types, or use different types of disk.  It is just not assured to work as well as when the storage is perfectly balanced and symmetrical.  Maybe someone should experiment with this and report their findings.

Bob
-- 
Bob Friesenhahn
bfriesen at simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,    http://www.GraphicsMagick.org/
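A simple way to start the experiment Bob suggests, assuming the pool name "group" from the original post (the 5-second interval is arbitrary):

# zpool iostat -v group 5

Run that in one terminal while copying a large amount of data into the pool from another; the -v output breaks operations and bandwidth down per vdev and per device, so any skew between the raidz and the mirror shows up directly.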
> when I tried to create a pool (called "group") with the four 1TB disks in
> a raidz and the two 500GB disks in a mirror, ZFS complained and said that
> if I wanted to do it I had to add -f

Honestly, I'm surprised by that.  I would think it's OK.  I am surprised by the -f, just as you are.

> If one 500GB disk (c10dX) dies in the mirror and I choose not to replace
> it,

Don't do that.  You cannot remove vdevs from your pool.  If one half of the mirror dies, you're now degraded, and if the second half dies, you lose your pool.  You cannot migrate data out of the failing mirror onto the raidz, even if there's empty space.  I'm sure the ability to remove vdevs will be added some day ... but not yet.

> Would this configuration survive a two-disk failure if the disks are in
> separate vdevs?

Yup, no problem.  Lose one disk in the raidz and one disk in the mirror and you're fine.  Just don't lose two disks from the same vdev at the same time.
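For anyone who wants to convince themselves of the failure behaviour without risking real disks, a throwaway pool built on files is enough to try it. This is only a sketch: the file names, sizes, and pool name are made up for the test, and the -f is there in case zpool objects to mixing raidz and mirror vdevs, as it did for the original poster:

# mkfile 256m /var/tmp/z1 /var/tmp/z2 /var/tmp/z3 /var/tmp/z4 /var/tmp/z5 /var/tmp/z6
# zpool create -f testpool raidz /var/tmp/z1 /var/tmp/z2 /var/tmp/z3 /var/tmp/z4 \
      mirror /var/tmp/z5 /var/tmp/z6
# zpool offline testpool /var/tmp/z2
# zpool offline testpool /var/tmp/z5
# zpool status testpool
# zpool destroy testpool
# rm /var/tmp/z1 /var/tmp/z2 /var/tmp/z3 /var/tmp/z4 /var/tmp/z5 /var/tmp/z6

After the two offlines (one raidz member, one mirror member) the pool should show DEGRADED but remain usable; trying to offline a second device in the same vdev should be refused because no valid replica would remain.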