I've never understood it. I've heard that for the Thumper, which uses 48 drives, you should not make them all into one zpool. Instead you should make them into several vdevs, and then combine all the vdevs into one zpool? Is that so? Why do you do that? Why not several zpools?
On Wed, 21 May 2008, Orvar Korvar wrote:

> I've heard that for the Thumper which uses 48 drives, you should not
> make them all into one zpool. Instead you should make them into several
> vdevs. And then you combine all vdevs into one zpool? Is it so? Why do

Right. A vdev is the smallest storage unit ZFS knows about. What the advice is saying is that you should create a number of vdevs from several devices each (how many depends on what sort of redundancy requirements you have), and then make a pool from the resulting vdevs.

> you do that? Why not several zpools?

Why have several zpools?

--
Rich Teer, SCSA, SCNA, SCSECA
CEO, My Online Home Inventory
URLs: http://www.rite-group.com/rich
      http://www.linkedin.com/in/richteer
      http://www.myonlinehomeinventory.com
Ok, so I make one vdev out of 8 discs, and I combine all vdevs into one large zpool? Is that correct?

I have an 8-port SATA card. I have 4 drives in one zpool. That is one vdev, right? Now I can add 4 new drives and make them into one zpool. And then I combine both zpools into one zpool? That can't be right? I don't get vdevs. Can someone explain?
On Fri, 2008-05-23 at 13:45 -0700, Orvar Korvar wrote:

> Ok, so I make one vdev out of 8 discs, and I combine all vdevs into one large zpool? Is that correct?
>
> I have an 8-port SATA card. I have 4 drives in one zpool.

If you did:

  zpool create mypool raidz1 disk0 disk1 disk2 disk3

you have a pool consisting of one vdev made up of 4 disks.

> That is one vdev, right? Now I can add 4 new drives and make them
> into one zpool.

You could do that and keep the pools separate, or you could add them as a single vdev to the existing pool:

  zpool add mypool raidz1 disk4 disk5 disk6 disk7

- Bill
Orvar Korvar wrote:

> Ok, so I make one vdev out of 8 discs, and I combine all vdevs into
> one large zpool? Is that correct?

I think it is easier to provide a couple of examples:

  zpool create pool c1t0d0 mirror c1t1d0 c1t2d0

This command would create a storage pool named 'pool' consisting of 2 top-level vdevs: the first is c1t0d0, the second is a mirror of c1t1d0 and c1t2d0. Though it is not recommended to combine top-level vdevs with different replication settings in one pool. ZFS will distribute blocks of data between these two top-level vdevs automatically, so blocks of data which end up on the mirror will be protected, and blocks of data which end up on the single disk will not be protected.

  zpool add pool c2t0d0

would add another single-disk top-level vdev to the pool 'pool'.

  zpool add pool raidz c3t0d0 c4t0d0 c5t0d0

would add another raidz top-level vdev to our pool.

Inside ZFS, the disks forming a mirror or raidz are called vdevs also, but not top-level vdevs.

Hth,
victor
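To make the top-level vs. inner vdev distinction concrete, the first pool above can be pictured as a tree. This is a conceptual sketch with the same made-up device names, not captured command output, though 'zpool status' prints a similar hierarchy:

  pool                 <- the zpool
    c1t0d0             <- top-level vdev (single disk, no redundancy)
    mirror             <- top-level vdev (redundant)
      c1t1d0           <- inner vdev (one side of the mirror)
      c1t2d0           <- inner vdev (the other side)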
Orvar Korvar wrote:

> Ok, so I make one vdev out of 8 discs, and I combine all vdevs into one large zpool? Is that correct?
>
> I have an 8-port SATA card. I have 4 drives in one zpool. That is one vdev, right? Now I can add 4 new drives and make them into one zpool. And then I combine both zpools into one zpool? That can't be right? I don't get vdevs. Can someone explain?

A 'vdev' is the basic unit that a zpool is made of.

There are several types of vdevs:

Single device type:
  This vdev is made from one storage device - generally a hard disk drive, but there are other possibilities. This type of vdev has *no* data redundancy, but ZFS will still be able to notice errors due to checksumming every block. ZFS will also keep redundant metadata on this one device, so metadata has the ability to survive block failures, but nothing will save data in this type of vdev from a full device failure. The size of this vdev is the size of the storage device it is made from.

Mirrored type:
  This type of vdev is made from 2 or more storage devices. All data is written to all devices, so there is data redundancy. The more devices in the mirror, the more copies of the data, and the more full device failures the vdev can survive. The size of this vdev is the size of the smallest storage device in the mirror. While devices (copies) can be added to and removed from the mirror vdev, this only changes the redundancy, not the size. Though if the smallest storage device in the mirror is replaced with a larger one, the size of the mirror should grow to the size of the new smallest device in the mirror.

RAIDZ or RAIDZ1 type:
  This type of vdev is made of 3 or more storage devices. This type of vdev has data redundancy, and can survive the loss of one device in the vdev at a time. The available space on a RAIDZ vdev is the size of the smallest storage device in the vdev times one less than the number of devices in the vdev ( minsize*(n-1) ), because 1 device's worth of space is used for parity information to provide the redundancy. This vdev type cannot (currently) have its size changed by adding or removing devices (changing 'n'). However, it can have its available space increased by replacing the current smallest device with a larger device (changing 'minsize') so that some other device becomes the 'smallest device'. NOTE: if the vdev started with identically sized devices, you'll need to replace all of them before you see any increase in the available space, since the 'size of the smallest device' will still be the same until they are all replaced. Posts by knowledgeable people on this mailing list have suggested that there is little benefit to having 10 or more devices in a RAIDZ vdev, and that devices should be split into multiple vdevs to keep the number in any one vdev in the single digits.

RAIDZ2 type:
  This type of vdev is made of 4 or more storage devices. It is basically just like RAIDZ1, except it has enough redundancy to survive 2 device failures at the same time, and the available space is the size of the smallest device times *two* less than the number of devices in the vdev ( minsize*(n-2) ), because 2 devices' worth of space are used to provide the redundancy. Changing the space in this type of vdev is limited the same way as for a RAIDZ vdev.

As noted above, the term 'storage device' in these descriptions is generally a hard disk drive, but it can be other things.
ZFS allows you to use files on another filesystem, slices (Solaris partitions) of a drive, fdisk partitions, hardware RAID LUNs, iSCSI targets, USB thumb drives, etc.

A zpool is made up of one or more of these vdevs. The size of a zpool is the sum of the sizes of the vdevs it is made from. The zpool doesn't add any redundancy itself; the vdevs are responsible for that. Which is why, while a zpool can be made of vdevs of differing types, it's not a good idea. The 'zpool create' command will warn you if you try to use a mix of redundant and non-redundant vdev types in the same pool. This is really a bad idea, since you can't control what data is placed on the redundant vdevs and what is placed on the non-redundant vdevs. If you have data that has different redundancy needs, you're better off creating more than one zpool.

Vdevs can be added to a zpool, but not removed (yet?). Therefore, to increase the size of a zpool, you have to either add another full vdev to it, or replace one (or more) devices in one of the existing vdevs so that the vdev contributes more space to the zpool.

I hope this helps.

-Kyle
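For concreteness, a rough sketch of how each vdev type described above is created, and of growing a vdev by replacement. Pool and device names here are made up for illustration:

  # single-device vdev, no redundancy
  zpool create tank c1t0d0

  # mirrored vdev (2-way)
  zpool create tank mirror c1t0d0 c1t1d0

  # RAIDZ1 vdev of 4 devices (usable space of roughly 3 of them)
  zpool create tank raidz c1t0d0 c1t1d0 c1t2d0 c1t3d0

  # RAIDZ2 vdev of 6 devices (usable space of roughly 4 of them)
  zpool create tank raidz2 c1t0d0 c1t1d0 c1t2d0 c1t3d0 c1t4d0 c1t5d0

  # grow a vdev's 'minsize' by swapping its smallest device for a larger one
  zpool replace tank c1t0d0 c2t0d0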
> There are several types of vdevs:

Wow, outstanding list Kyle!

> suggested that there is little benefit to having 10
> or more devices in a RAIDZ vdev.

The txg is split between vdevs, and the block written to a single raidz vdev is divided by the data elements in the raidz set. So let's say it's 128k: with a 4+1 raidz set each disk will see 32k, while a 9+1 would get 14.2k. And what if the block is less than 128k? Wouldn't it be better to have two sets of 4+1 and go twice as fast, splitting the blocks less in the process? (two vdevs)

Rob
NB: the zpool(1M) man page provides a rather extensive explanation of vdevs.

Rob at Logan.com wrote:

> > There are several types of vdevs:
>
> Wow, outstanding list Kyle!
>
> > suggested that there is little benefit to having 10
> > or more devices in a RAIDZ vdev.
>
> The txg is split between vdevs, and the block written to a single
> raidz vdev is divided by the data elements in the raidz set. So let's
> say it's 128k: with a 4+1 raidz set each disk will see 32k, while a
> 9+1 would get 14.2k. And what if the block is less than 128k?
> Wouldn't it be better to have two sets of 4+1 and go twice as fast,
> splitting the blocks less in the process? (two vdevs)

That is not exactly how it works. You are describing RAID-5, not raidz. raidz is dynamic and does not require that the data be spread across all the devices in the set. This becomes more important as you get to large numbers (which are not recommended) such as 47+1. However, your conclusion is correct: it is generally better for performance to have qty 2 of 4+1 rather than qty 1 of 9+1.
 -- richard
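A sketch of the two layouts being compared, with made-up device names - one wide 9+1 raidz top-level vdev versus two 4+1 raidz top-level vdevs in the same pool (ZFS stripes writes across the two vdevs in the second case):

  # one pool, one wide 9+1 raidz vdev
  zpool create tank raidz c0t0d0 c0t1d0 c0t2d0 c0t3d0 c0t4d0 \
                          c0t5d0 c0t6d0 c0t7d0 c0t8d0 c0t9d0

  # one pool, two 4+1 raidz vdevs
  zpool create tank raidz c0t0d0 c0t1d0 c0t2d0 c0t3d0 c0t4d0 \
                    raidz c0t5d0 c0t6d0 c0t7d0 c0t8d0 c0t9d0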
Ok, that was a very good explanation. Thanx a lot!

So, I have an 8-port SATA card, and I have one ZFS raid with 4 discs, 500GB each. These 4 discs are one vdev, right? And then I can add 4 more discs and create another vdev of them.

1) Vdev of 4 Samsung 500GB discs -> zpool consisting of 1.5TB
2) Vdev of 4 1TB discs -> add to zpool, 3TB + 1.5TB = 4.5TB

So I can add 2) to my existing zpool, right? Is it better if all drives have the same capacity throughout the zpool?
Orvar Korvar wrote:

> Ok, that was a very good explanation. Thanx a lot!
>
> So, I have an 8-port SATA card, and I have one ZFS raid with 4 discs,
> 500GB each. These 4 discs are one vdev, right?

Yes, you have a pool with one 4-disk *RAIDZ* type vdev.

> And then I can add 4 more discs and create another vdev of them.

Yes, you can add a second 4-disk RAIDZ type vdev to the zpool.

> 1) Vdev of 4 Samsung 500GB discs -> zpool consisting of 1.5TB
> 2) Vdev of 4 1TB discs -> add to zpool, 3TB + 1.5TB = 4.5TB
>
> So I can add 2) to my existing zpool, right? Is it better if all
> drives have the same capacity throughout the zpool?

Yes, you can add 4 1TB drives just as you described.

No, there isn't (to my knowledge) any benefit from making all the drives in a *zpool* the same size. However, making all the drives in a *vdev* (of almost any type) the same size has definite advantages.

The only reason you might want to upgrade your 500GB disks to 1TB disks is if you wanted to back up your data, destroy the pool, and then create a single 7+1 RAIDZ vdev, so that you only lose 1 disk to parity instead of 2. Or create a 6+2 disk RAIDZ2, where you still lose 2 disks to parity but gain additional redundancy. But that destroy-recreate operation is costly in several ways, so you'll want to think hard before undertaking it. :)

-Kyle
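A sketch of that expansion with hypothetical device names (the first four disks being the existing 500GB drives, the next four the new 1TB drives):

  # existing pool: one 4-disk raidz vdev of 500GB drives (~1.5TB usable)
  zpool create mypool raidz c1t0d0 c1t1d0 c1t2d0 c1t3d0

  # add a second 4-disk raidz vdev made of the 1TB drives (~3TB more)
  zpool add mypool raidz c1t4d0 c1t5d0 c1t6d0 c1t7d0

  # check the new capacity (roughly 4.5TB usable in total)
  zpool list mypool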
> making all the drives in a *zpool* the same size.

The only issue with having vdevs of different sizes is when one fills up, reducing the stripe size for writes.

> making all the drives in a *vdev* (of almost any type) the same

The only issue is the unused space of the largest device, but then we call that "short stroke" for speed :-)

> Yes you can add 4 1TB drives just as you described.

Yup..

Rob
So, it basically boils down to this: what operations can I do with a vdev? Any links? I've googled a bit, but there is no comprehensive list of what I can do.
Hi Orvar,

This section describes the operations you can do with a mirrored storage pool:

http://docs.sun.com/app/docs/doc/817-2271/gazhv?a=view

This section describes the operations you can do with a raidz storage pool:

http://docs.sun.com/app/docs/doc/817-2271/gcvjg?a=view

Go with mirrored storage pools because they provide much more flexibility if you need to add or remove disks.

Please see the ZFS Best Practices wiki for more recommendations:

http://www.solarisinternals.com/wiki/index.php/ZFS_Best_Practices_Guide

Cindy

Orvar Korvar wrote:

> So, it basically boils down to this: what operations can I do with a vdev? Any links? I've googled a bit, but there is no comprehensive list of what I can do.
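To illustrate the flexibility Cindy mentions, a few of the operations available with mirrored vdevs (pool and device names here are hypothetical):

  # attach a second disk to a single-disk vdev, turning it into a 2-way mirror
  zpool attach mypool c1t0d0 c1t4d0

  # detach one side of a mirror again (reduces redundancy, not size)
  zpool detach mypool c1t4d0

  # replace a disk in place, e.g. after a failure or to grow capacity
  zpool replace mypool c1t1d0 c1t5d0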
As a side bar, since you cannot remove a vdev, you can really shoot yourself in the foot if you're not careful (as I recently did). I had an X4500 system with 4 RAIDZ2 vdevs in one pool. When adding a disk to the system, I forgot to add it as a spare, and I was left with a stripe across the RAIDZ2 sets + a single disk. I had to blow everything away...

Any information on when disk evacuation will make its way into ZFS?

Robert

On May 25, 2008, at 3:11 PM, Kyle McDonald wrote:

> [Kyle's full explanation of vdev types, quoted above, snipped]
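A sketch of the distinction Robert ran into, with a hypothetical pool and disk name - the 'spare' keyword is what keeps the new disk from becoming an unremovable top-level vdev:

  # intended: add the new disk as a hot spare
  zpool add tank spare c5t7d0

  # the mistake: without the keyword, the disk becomes a new
  # non-redundant top-level vdev, which (currently) cannot be removed
  zpool add tank c5t7d0

  # 'zpool add -n' shows the layout that would result, without changing the pool
  zpool add -n tank c5t7d0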
Robert, thanx a lot for your warning! Now I know that I must be extremely cautious before adding my 4 new drives to my existing raidz with 4 drives. I will create a new vdev with the 4 new discs and add it to the existing pool, as you have suggested. Thanx a lot for your help! :o)