I currently have a 7x1.5TB raidz1. I want to add "phase 2", which is
another 7x1.5TB raidz1. Can I add the second phase to the first phase
and basically have two RAID5s striped (in RAID terms)?

Yes, I probably should upgrade the zpool format too. Currently running
snv_104; I should also upgrade to snv_110.

If that is possible, would anyone happen to have the simple command
lines to do it quickly? I assume I'd be creating another raidz1 and
then somehow growing the "tank" zpool?

Does this make sense, or is this stupid from a performance
perspective? Should I just have two separate zpools? Ideally I would
like to have one massive data storage target.

I'd be fine with somehow changing this into a raidz2 as well, I
suppose, since I had planned on it being another raidz1 anyway. Or
perhaps I could add tank #2 as a raidz2, move all the data off
tank #1, and then add disks individually from tank #1 until I have all
14 disks in a single raidz2?

Performance is not an absolute must - I can deal with a little bit of
overhead. Thanks in advance.

[root at nas01 ~]# zpool status
  pool: rpool
 state: ONLINE
status: The pool is formatted using an older on-disk format. The pool
        can still be used, but some features are unavailable.
action: Upgrade the pool using 'zpool upgrade'. Once this is done, the
        pool will no longer be accessible on older software versions.
 scrub: none requested
config:

        NAME          STATE     READ WRITE CKSUM
        rpool         ONLINE       0     0     0
          mirror      ONLINE       0     0     0
            c0t0d0s0  ONLINE       0     0     0
            c0t1d0s0  ONLINE       0     0     0

errors: No known data errors

  pool: tank
 state: ONLINE
status: The pool is formatted using an older on-disk format. The pool
        can still be used, but some features are unavailable.
action: Upgrade the pool using 'zpool upgrade'. Once this is done, the
        pool will no longer be accessible on older software versions.
 scrub: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        tank        ONLINE       0     0     0
          raidz1    ONLINE       0     0     0
            c2t0d0  ONLINE       0     0     0
            c0t2d0  ONLINE       0     0     0
            c0t3d0  ONLINE       0     0     0
            c0t4d0  ONLINE       0     0     0
            c0t5d0  ONLINE       0     0     0
            c0t6d0  ONLINE       0     0     0
            c0t7d0  ONLINE       0     0     0

errors: No known data errors

[root at nas01 ~]# zpool list
NAME    SIZE   USED  AVAIL   CAP  HEALTH  ALTROOT
rpool   149G  11.4G   138G    7%  ONLINE  -
tank   9.50T  9.34T   159G   98%  ONLINE  -
[root at nas01 ~]#

On Sat, Mar 28, 2009 at 8:12 AM, Michael Shadle <mike503 at gmail.com> wrote:

> I currently have a 7x1.5TB raidz1.
>
> I want to add "phase 2", which is another 7x1.5TB raidz1.
>
> Can I add the second phase to the first phase and basically have two
> RAID5s striped (in RAID terms)?

Yes, that's how it's done.

> Yes, I probably should upgrade the zpool format too. Currently
> running snv_104; I should also upgrade to snv_110.
>
> If that is possible, would anyone happen to have the simple command
> lines to do it quickly? I assume I'd be creating another raidz1 and
> then somehow growing the "tank" zpool?

zpool add tank raidz1 disk_1 disk_2 disk_3 ...

(The syntax is just like creating a pool, only with add instead of
create.)

> Does this make sense, or is this stupid from a performance
> perspective? Should I just have two separate zpools? Ideally I would
> like to have one massive data storage target.

It makes perfect sense. My Thumpers have a number of raidz vdevs
combined into a single pool. Your performance scales with the number
of vdevs, and it's better to combine them into a single pool, as you
combine the performance. Generally, unless you want different
behaviour from different pools, it's easier to combine them.

-- 
-Peter Tribble
http://www.petertribble.co.uk/ - http://ptribble.blogspot.com/

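(A hedged sketch of the two steps being discussed, to be run after the
OS upgrade. The c1tXd0 names for the new disks are placeholders, not
the OP's actual devices:

  zpool upgrade -v     # list the on-disk versions this build supports
  zpool upgrade tank   # bring tank up to the newest supported version
  zpool add tank raidz1 c1t0d0 c1t1d0 c1t2d0 c1t3d0 c1t4d0 c1t5d0 c1t6d0

As the status output warns, an upgraded pool can no longer be imported
on older builds.)
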
On Sat, Mar 28, 2009 at 1:37 AM, Peter Tribble <peter.tribble at gmail.com> wrote:

> zpool add tank raidz1 disk_1 disk_2 disk_3 ...
>
> (The syntax is just like creating a pool, only with add instead of
> create.)

So I can add individual disks to the existing tank zpool anytime I
want?

> It makes perfect sense. My Thumpers have a number of raidz vdevs
> combined into a single pool. Your performance scales with the number
> of vdevs, and it's better to combine them into a single pool, as you
> combine the performance. Generally, unless you want different
> behaviour from different pools, it's easier to combine them.

So essentially you're telling me to keep it at raidz1 (not raidz2, as
many people usually stress once you get up to a certain number of
disks - around 8 or so, most people start bringing it up a lot)?

On Sat, Mar 28, 2009 at 11:06 AM, Michael Shadle <mike503 at gmail.com> wrote:

> On Sat, Mar 28, 2009 at 1:37 AM, Peter Tribble <peter.tribble at gmail.com> wrote:
>
>> zpool add tank raidz1 disk_1 disk_2 disk_3 ...
>>
>> (The syntax is just like creating a pool, only with add instead of
>> create.)
>
> So I can add individual disks to the existing tank zpool anytime I
> want?

Yes, but you wouldn't want to do that. (And zpool might not like it.)

If you just add a disk, it just gets added as a new device, so you
have unprotected storage. In particular, you can't grow the existing
raidz. What you're doing here is adding a second raidz1 vdev. That's
good, because the second phase of your storage is just like the first
phase.

>> It makes perfect sense. My Thumpers have a number of raidz vdevs
>> combined into a single pool. Your performance scales with the number
>> of vdevs, and it's better to combine them into a single pool, as you
>> combine the performance. Generally, unless you want different
>> behaviour from different pools, it's easier to combine them.
>
> So essentially you're telling me to keep it at raidz1 (not raidz2, as
> many people usually stress once you get up to a certain number of
> disks - around 8 or so, most people start bringing it up a lot)?

The choice of raidz1 versus raidz2 is another matter. Given that
you've already got raidz1, and you can't (yet) grow that or expand it
to raidz2, there doesn't seem to be much point in having the second
half of your storage be more protected.

If you were starting from scratch, then you have a choice between a
single raidz2 vdev and a pair of raidz1 vdevs. (Lots of other choices
too, but that is really what you're asking here, I guess.) With 1.5T
drives, I would want a second layer of protection. If you didn't have
backups (by which I mean an independent copy of your important data),
then raidz2 is strongly indicated.

I have a Thumper that's a primary fileserver. It has a single pool
made up of a number of raidz2 vdevs. I'm replacing it with a pair of
machines (this gives me system redundancy as well); each one will have
a number of raidz1 vdevs, because I can always get the data back off
the other machine if something goes wrong.

-- 
-Peter Tribble
http://www.petertribble.co.uk/ - http://ptribble.blogspot.com/

On Sat, Mar 28, 2009 at 4:30 AM, Peter Tribble <peter.tribble at gmail.com> wrote:

>> So I can add individual disks to the existing tank zpool anytime I
>> want?
>
> Yes, but you wouldn't want to do that. (And zpool might not like it.)
>
> If you just add a disk, it just gets added as a new device. So you
> have unprotected storage.

So you're saying I should add 7 disks to match the existing setup (or
at least 2 disks so it has some sort of redundancy), and I would run

  zpool add tank raidz1 disk1 disk2 disk3 disk4 disk5 disk6 disk7

if my goal is to use 7 disks. This would allow it to become part of
one large storage pool with two identical redundancy setups (separate
from each other, like two physically different RAID sets combined,
which is fine).

> In particular, you can't grow the existing raidz. What you're doing
> here is adding a second raidz1 vdev. That's good, because the second
> phase of your storage is just like the first phase.

I guess this is redundant, but would I be able to "see" these as one
large storage pool, or would I essentially have tank and tank2? Is
there a way to combine them? Just the command above?

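(A hedged aside: 'zpool add' has a dry-run flag that prints the layout
the pool would end up with, without changing anything - handy here
because a top-level vdev cannot currently be removed once added. The
device names are placeholders:

  zpool add -n tank raidz1 c1t0d0 c1t1d0 c1t2d0 c1t3d0 c1t4d0 c1t5d0 c1t6d0

Drop the -n to perform the add for real.)
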
On Sat, 28 Mar 2009, Peter Tribble wrote:
>
> The choice of raidz1 versus raidz2 is another matter. Given that
> you've already got raidz1, and you can't (yet) grow that or expand
> it to raidz2, there doesn't seem to be much point in having the
> second half of your storage be more protected.

There is no harm in using a raidz2 vdev even if an existing vdev is
only raidz1. If raidz2 is an available option then it is wise to
choose it. Of course, starting out with raidz2 would have been even
better.

Bob
-- 
Bob Friesenhahn
bfriesen at simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,    http://www.GraphicsMagick.org/

On Sat, Mar 28, 2009 at 10:29 AM, Bob Friesenhahn
<bfriesen at simple.dallas.tx.us> wrote:

> On Sat, 28 Mar 2009, Peter Tribble wrote:
>>
>> The choice of raidz1 versus raidz2 is another matter. Given that
>> you've already got raidz1, and you can't (yet) grow that or expand
>> it to raidz2, there doesn't seem to be much point in having the
>> second half of your storage be more protected.
>
> There is no harm in using a raidz2 vdev even if an existing vdev is
> only raidz1. If raidz2 is an available option then it is wise to
> choose it. Of course, starting out with raidz2 would have been even
> better.

#1: Yes, there is harm: he may very well run into inconsistent
performance, which is a complete PITA to track down when you've got
differing RAID types underlying a volume.

#2: raidz2 isn't always "wise" to choose. It's a matter of
performance, space, and security requirements. 7+1 is fine for raidz1.
If he was pushing 10 data disks, that'd be another story.

--Tim

Michael Shadle wrote:
> On Sat, Mar 28, 2009 at 1:37 AM, Peter Tribble <peter.tribble at gmail.com> wrote:
>
>> zpool add tank raidz1 disk_1 disk_2 disk_3 ...
>>
>> (The syntax is just like creating a pool, only with add instead of
>> create.)
>
> So I can add individual disks to the existing tank zpool anytime I
> want?

Using the command above that Peter gave you would get you a result
similar to this:

        NAME        STATE     READ WRITE CKSUM
        storage2    ONLINE       0     0     0
          raidz1    ONLINE       0     0     0
            ad16    ONLINE       0     0     0
            ad14    ONLINE       0     0     0
            ad10    ONLINE       0     0     0
            ad12    ONLINE       0     0     0
          raidz1    ONLINE       0     0     0
            da2     ONLINE       0     0     0
            da0     ONLINE       0     0     0
            da1     ONLINE       0     0     0
            da3     ONLINE       0     0     0

The actual setup is a raidz1 of 1.5TB drives and a raidz1 of 500GB
drives, with the data striped across the two raidz vdevs. In your case
it would be 7 drives in each raidz, based on what you said before, but
I don't have *that* much money for my home file server.

> So essentially you're telling me to keep it at raidz1 (not raidz2, as
> many people usually stress once you get up to a certain number of
> disks - around 8 or so, most people start bringing it up a lot)?

This really depends on how valuable your data is. Richard Elling has a
lot of great information about MTTDL here:
http://blogs.sun.com/relling/tags/mttdl

On Sat, 28 Mar 2009, Tim wrote:
>
> #1: Yes, there is harm: he may very well run into inconsistent
> performance, which is a complete PITA to track down when you've got
> differing RAID types underlying a volume.

Inconsistent performance can come from many things, including a single
balky disk drive. The small difference between RAID types does not
seem like enough to worry about. If it was a mix between raidz2 and
mirrors, then there would be more cause for concern.

It is true that if the performance of the vdevs is not well balanced,
then some vdevs could fill up faster than others when the system is
under extremely heavy write loads.

> #2: raidz2 isn't always "wise" to choose. It's a matter of
> performance, space, and security requirements. 7+1 is fine for
> raidz1. If he was pushing 10 data disks, that'd be another story.

Many in the industry have already declared RAID5 to be "unsafe at any
speed" with today's huge SATA disk drives. The data recovery model for
raidz1 is similar to RAID5. If the user can afford it, then raidz2
offers considerably more peace of mind.

If you are using 750GB+ SATA drives, then your "7+1 is fine for
raidz1" notion does not seem so bright.

Bob
-- 
Bob Friesenhahn
bfriesen at simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,    http://www.GraphicsMagick.org/

On Sat, 28 Mar 2009, Jonathan wrote:
>
> This really depends on how valuable your data is. Richard Elling has
> a lot of great information about MTTDL here:
> http://blogs.sun.com/relling/tags/mttdl

Almost any data with a grade higher than "disposable junk" becomes
pretty valuable once you consider the time and effort to recover it.
Multiple terabytes of data take quite a long time to recover at
perhaps a few tens of megabytes per second, and critical backup
resources are consumed in the meantime. Meanwhile, business goes on.
Do the math in advance and decide if you are really willing to put
yourself in the middle of a long recovery situation.

Bob
-- 
Bob Friesenhahn
bfriesen at simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,    http://www.GraphicsMagick.org/

2009/3/28 Tim <tim at tcsac.net>:

>> There is no harm in using a raidz2 vdev even if an existing vdev is
>> only raidz1. If raidz2 is an available option then it is wise to
>> choose it. Of course, starting out with raidz2 would have been even
>> better.
>
> #2: raidz2 isn't always "wise" to choose. It's a matter of
> performance, space, and security requirements. 7+1 is fine for
> raidz1. If he was pushing 10 data disks, that'd be another story.

If I went raidz2, I'd want the entire 14-disk array in it, I think.

I'd rather not do a raidz2 with less than 100% of the disks and then a
second raidz1 (or 2), because I'd wind up losing much more disk space.
Essentially, I am willing to give up 2 of 14 disks (roughly, of
course) to parity.

On Sat, 28 Mar 2009, Michael Shadle wrote:

> If I went raidz2, I'd want the entire 14-disk array in it, I think.
>
> I'd rather not do a raidz2 with less than 100% of the disks and then
> a second raidz1 (or 2), because I'd wind up losing much more disk
> space. Essentially, I am willing to give up 2 of 14 disks (roughly,
> of course) to parity.

Hopefully you consider all of the costs before making this sort of
decision. If you are a lousy tipper, you can't expect very good
service the next time you come to visit. :-)

If 14 disks cost a "lot", then you should carefully balance the cost
of the "wasted" disk against the cost of lost performance or the cost
of lost availability. In many business environments, the potential for
lost availability more than justifies purchasing more "wasted" disk.
In many business environments, the potential for lousy performance
more than justifies purchasing more "wasted" disk. Any good
businessman should be able to specify a "dollars per hour" cost to the
business if the storage is not available, or unable to provide
sufficient performance to meet business needs.

Bob
-- 
Bob Friesenhahn
bfriesen at simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,    http://www.GraphicsMagick.org/

On Sat, Mar 28, 2009 at 11:12 AM, Bob Friesenhahn
<bfriesen at simple.dallas.tx.us> wrote:

> On Sat, 28 Mar 2009, Tim wrote:
>>
>> #1: Yes, there is harm: he may very well run into inconsistent
>> performance, which is a complete PITA to track down when you've got
>> differing RAID types underlying a volume.
>
> Inconsistent performance can come from many things, including a
> single balky disk drive. The small difference between RAID types does
> not seem like enough to worry about. If it was a mix between raidz2
> and mirrors, then there would be more cause for concern.
>
> It is true that if the performance of the vdevs is not well balanced,
> then some vdevs could fill up faster than others when the system is
> under extremely heavy write loads.

An extra parity disk is hardly a "small difference". You pay your
penalty at some point for the extra parity, and it will come back to
bite you in the ass. It's why NOBODY, including Sun, supports it on
enterprise arrays.

>> #2: raidz2 isn't always "wise" to choose. It's a matter of
>> performance, space, and security requirements. 7+1 is fine for
>> raidz1. If he was pushing 10 data disks, that'd be another story.
>
> Many in the industry have already declared RAID5 to be "unsafe at any
> speed" with today's huge SATA disk drives. The data recovery model
> for raidz1 is similar to RAID5. If the user can afford it, then
> raidz2 offers considerably more peace of mind.
>
> If you are using 750GB+ SATA drives, then your "7+1 is fine for
> raidz1" notion does not seem so bright.

Many in the industry make their money selling you disk drives; of
course they're going to declare you need to buy more. The "math" you
ask people to do points towards a 7+1 being more than acceptable if
you have a hot spare.

--Tim

* On 28 Mar 2009, Peter Tribble wrote:

> The choice of raidz1 versus raidz2 is another matter. Given that
> you've already got raidz1, and you can't (yet) grow that or expand
> it to raidz2, there doesn't seem to be much point in having the
> second half of your storage be more protected.
>
> If you were starting from scratch, then you have a choice between a
> single raidz2 vdev and a pair of raidz1 vdevs. (Lots of other choices
> too, but that is really what you're asking here, I guess.)

I've had too many joint failures in my life to put much faith in
raidz1, especially with 7 disks that likely come from the same
manufacturing batch and might exhibit the same flaws. A
single-redundancy system of 7 disks (gross) has too low an MTTDL for
my taste.

If you can sell yourself on raidz2 and the loss of two more disks'
worth of data - a loss which IMO is more than made up for by the gain
in security - consider this technique:

1. build a new zpool of a single raidz2;
2. migrate your data from the old zpool to the new one;
3. destroy the old zpool, releasing its volumes;
4. use 'zpool add' to add those old volumes to the new zpool as a
   second raidz2 vdev (see Richard Elling's previous post).

Now you have a single zpool consisting of two raidz2 vdevs. The
migration in step 2 can be done either by 'zfs send'ing each zfs in
the zpool, or by constructing analogous zfs in the new zpool and
rsyncing the files across in one go.

-- 
-D.    dgc at uchicago.edu    NSIT    University of Chicago

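(A hedged sketch of those four steps in command form. The c3tXd0 names
for the new disks are placeholders, and this assumes a build recent
enough to have 'zfs send -R':

  zpool create tank2 raidz2 c3t0d0 c3t1d0 c3t2d0 c3t3d0 c3t4d0 c3t5d0 c3t6d0
  zfs snapshot -r tank@migrate
  zfs send -R tank@migrate | zfs receive -F -d tank2
  zpool destroy tank
  zpool add tank2 raidz2 c2t0d0 c0t2d0 c0t3d0 c0t4d0 c0t5d0 c0t6d0 c0t7d0

Verify the copy before the destroy step - once the old pool is
destroyed, there is no going back.)
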
Well, this is for a home storage array for my DVDs and such. If I have
to turn it off to swap a failed disk, it's fine. It does not need to
be highly available, and I do not need extreme performance like a
database, for example. 45MB/sec would even be acceptable.

On Mar 28, 2009, at 10:47 AM, Bob Friesenhahn
<bfriesen at simple.dallas.tx.us> wrote:

> On Sat, 28 Mar 2009, Michael Shadle wrote:
>> If I went raidz2, I'd want the entire 14-disk array in it, I think.
>>
>> I'd rather not do a raidz2 with less than 100% of the disks and then
>> a second raidz1 (or 2), because I'd wind up losing much more disk
>> space. Essentially, I am willing to give up 2 of 14 disks (roughly,
>> of course) to parity.
>
> Hopefully you consider all of the costs before making this sort of
> decision. If you are a lousy tipper, you can't expect very good
> service the next time you come to visit. :-)
>
> If 14 disks cost a "lot", then you should carefully balance the cost
> of the "wasted" disk against the cost of lost performance or the cost
> of lost availability. [...] Any good businessman should be able to
> specify a "dollars per hour" cost to the business if the storage is
> not available, or unable to provide sufficient performance to meet
> business needs.

On Sat, 28 Mar 2009, Michael Shadle wrote:

> Well, this is for a home storage array for my DVDs and such. If I
> have to turn it off to swap a failed disk, it's fine. It does not
> need to be highly available, and I do not need extreme performance
> like a database, for example. 45MB/sec would even be acceptable.

I can see that 14 disks cost a lot for a home storage array, but to
you the data on your home storage array may be just as important as
the data on some business's enterprise storage array. In fact, it may
be even more critical, since it seems unlikely that you will have an
effective backup system in place like large businesses do.

The main problem with raidz1 is that if a disk fails and you replace
it, and a second disk then substantially fails during resilvering
(which needs to successfully read all data on the remaining disks),
your ZFS pool (or at least part of the files) may be toast. The more
data which must be read during resilvering, the higher the probability
that there will be a failure. If 12TB of data needs to be read to
resilver a 1TB disk, then that is a lot of successful reading which
needs to go on.

In order to lessen risk, you can schedule a periodic zfs scrub via a
cron job so that there is less probability of encountering data which
cannot be read. This will not save you from entirely failed disk
drives, though.

As far as Tim's post that NOBODY recommends using better than RAID5, I
hardly consider companies like IBM and NetApp to be "NOBODY". Only Sun
RAID hardware seems to lack RAID6, but Sun offers ZFS's raidz2, so it
does not matter.

Bob
-- 
Bob Friesenhahn
bfriesen at simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,    http://www.GraphicsMagick.org/

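(A hedged example of the cron approach - this assumes root's crontab
on Solaris and the usual path for zpool; adjust the pool name and
schedule to taste:

  # run a scrub every Sunday at 03:00
  0 3 * * 0 /usr/sbin/zpool scrub tank

Let one scrub finish before the next starts; on a nearly full 9.5TB
pool a scrub can take many hours.)
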
On Mar 28, 2009, at 5:22 PM, Bob Friesenhahn
<bfriesen at simple.dallas.tx.us> wrote:

> On Sat, 28 Mar 2009, Michael Shadle wrote:
>
>> Well, this is for a home storage array for my DVDs and such. If I
>> have to turn it off to swap a failed disk, it's fine. It does not
>> need to be highly available, and I do not need extreme performance
>> like a database, for example. 45MB/sec would even be acceptable.
>
> I can see that 14 disks cost a lot for a home storage array, but to
> you the data on your home storage array may be just as important as
> the data on some business's enterprise storage array. In fact, it may
> be even more critical, since it seems unlikely that you will have an
> effective backup system in place like large businesses do.

Well, I might back up the more important stuff offsite. But in theory
it's all replaceable. Just would be a pain.

Could I set up a raidz2 on the new disks, then destroy the old vdev
and rebuild it as a raidz2 too, technically, if I want? Then both sets
would have double redundancy, if I was feeling paranoid. But RAID5 has
served people well for a long time...

Is resilvering speed roughly the same as a RAID5 controller rebuild?

> The main problem with raidz1 is that if a disk fails and you replace
> it, and a second disk then substantially fails during resilvering
> (which needs to successfully read all data on the remaining disks),
> your ZFS pool (or at least part of the files) may be toast. The more
> data which must be read during resilvering, the higher the
> probability that there will be a failure. If 12TB of data needs to be
> read to resilver a 1TB disk, then that is a lot of successful reading
> which needs to go on.

This is good info to know. I guess I'm willing to take the risk of a
resilver. It's got a dedicated quad-core proc doing nothing other than
exporting Samba and ZFS... I wonder how long it would take.

> In order to lessen risk, you can schedule a periodic zfs scrub via a
> cron job so that there is less probability of encountering data which
> cannot be read. This will not save you from entirely failed disk
> drives, though.

I do a weekly scrub, and an fmadm faulty every 5 or 10 minutes to
email me if anything comes up...

Bob Friesenhahn wrote:
> On Sat, 28 Mar 2009, Michael Shadle wrote:
>
>> Well, this is for a home storage array for my DVDs and such. If I
>> have to turn it off to swap a failed disk, it's fine. It does not
>> need to be highly available, and I do not need extreme performance
>> like a database, for example. 45MB/sec would even be acceptable.
>
> I can see that 14 disks cost a lot for a home storage array, but to
> you the data on your home storage array may be just as important as
> the data on some business's enterprise storage array. [...]
>
> The main problem with raidz1 is that if a disk fails and you replace
> it, and a second disk then substantially fails during resilvering
> (which needs to successfully read all data on the remaining disks),
> your ZFS pool (or at least part of the files) may be toast. The more
> data which must be read during resilvering, the higher the
> probability that there will be a failure. If 12TB of data needs to be
> read to resilver a 1TB disk, then that is a lot of successful reading
> which needs to go on.

This is a very good point for anyone following this and wondering why
raidz2 is a good idea. I have seen, over the years, several large
RAID5 hardware arrays go belly up as a second drive fails during a
rebuild, with the end result of the entire RAID set being rendered
useless. If you can afford it, then you should use it. RAID6 or raidz2
was made for big SATA drives. If you do use it, though, make sure that
you have a reasonable CPU, as it requires a bit more grunt to run than
raidz.

The bigger the disks and the bigger the stripe, the more likely you
are to encounter an issue during the rebuild of a failed drive. Plain
and simple.

> As far as Tim's post that NOBODY recommends using better than RAID5,
> I hardly consider companies like IBM and NetApp to be "NOBODY". Only
> Sun RAID hardware seems to lack RAID6, but Sun offers ZFS's raidz2,
> so it does not matter.

Plenty of Sun hardware comes with RAID6 support out of the box these
days, Bob. Certainly all of the 4140, 4150, 4240 and 4250 two-socket
x86/x64 systems have hardware controllers for this.

Also, all of the 6140, 6540 and 6780 disk arrays have RAID6 if they
have Crystal firmware, and of course the Open Storage 7000 series
machines do as well, being that they are OpenSolaris and ZFS based.

-- 
Scott Lawson
Systems Architect
Manukau Institute of Technology, Auckland, New Zealand
mailto:scott at manukau.ac.nz    http://www.manukau.ac.nz

On Sat, Mar 28, 2009 at 7:22 PM, Bob Friesenhahn
<bfriesen at simple.dallas.tx.us> wrote:

> I can see that 14 disks cost a lot for a home storage array, but to
> you the data on your home storage array may be just as important as
> the data on some business's enterprise storage array. [...]
>
> As far as Tim's post that NOBODY recommends using better than RAID5,
> I hardly consider companies like IBM and NetApp to be "NOBODY". Only
> Sun RAID hardware seems to lack RAID6, but Sun offers ZFS's raidz2,
> so it does not matter.

I did NOT say nobody recommends using RAID5. What I *DID* say was that
NOBODY supports using RAID5 and RAID6 under a single pool of storage.
Which IBM array are you referring to that is supported with RAID5 and
6 in a single pool?

--Tim

On Sat, Mar 28, 2009 at 7:22 PM, Bob Friesenhahn
<bfriesen at simple.dallas.tx.us> wrote:

> As far as Tim's post that NOBODY recommends using better than RAID5,
> I hardly consider companies like IBM and NetApp to be "NOBODY". [...]

Oh, and NetApp supporting RAID5? Have any other good jokes for us?

--Tim

On Mar 29, 2009, at 00:41, Michael Shadle wrote:

> Well, I might back up the more important stuff offsite. But in theory
> it's all replaceable. Just would be a pain.

And what is the cost of the time to replace it versus the price of a
hard disk? Time ~ money.

There used to be a time when I liked fiddling with computer parts. I
now have other, more productive ways of wasting my time. :)

Okay, so riddle me this - can I create a raidz2 using the new disks
and move all the data from the existing vdev to it, then recreate a
raidz2, this time using the old 7 disks?

And have them all stay in the same zpool?

Side note: does the port I plug the drive into matter on the
controller? Does it have to be the same drive lineup, or does it work
based on drive UUID or something like that?

On Mar 29, 2009, at 8:58 AM, David Magda <dmagda at ee.ryerson.ca> wrote:

> On Mar 29, 2009, at 00:41, Michael Shadle wrote:
>
>> Well, I might back up the more important stuff offsite. But in
>> theory it's all replaceable. Just would be a pain.
>
> And what is the cost of the time to replace it versus the price of a
> hard disk? Time ~ money.
>
> There used to be a time when I liked fiddling with computer parts. I
> now have other, more productive ways of wasting my time. :)

On Sun, 29 Mar 2009, Michael Shadle wrote:

> Okay, so riddle me this - can I create a raidz2 using the new disks
> and move all the data from the existing vdev to it, then recreate a
> raidz2, this time using the old 7 disks?
>
> And have them all stay in the same zpool?

You will have to create a new pool, creating a raidz2 in the new pool
using the new disks, copy the data from the old pool to the new pool,
then destroy the old pool, and create a second raidz2 in the new pool
using the old disks. If you already have a lot of data, the end result
will have more data on one raidz2 vdev than on the other.

Bob
-- 
Bob Friesenhahn
bfriesen at simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,    http://www.GraphicsMagick.org/

On 03/29/09 11:58, David Magda wrote:
> On Mar 29, 2009, at 00:41, Michael Shadle wrote:
>
>> Well, I might back up the more important stuff offsite. But in
>> theory it's all replaceable. Just would be a pain.
>
> And what is the cost of the time to replace it versus the price of a
> hard disk? Time ~ money.

So what is best if you get a 4th drive for a 3-drive raidz? Is it
better to keep it separate and use it for backups of the replaceable
data (perhaps on a different machine), as a hot spare, as second
parity, or something else? Seems so un-green to have it spinning
uselessly :-) LTO-4 tape drives, at $200 just for the media? I guess
not...

> There used to be a time when I liked fiddling with computer parts. I
> now have other, more productive ways of wasting my time. :)

Quite.

On Sun, 29 Mar 2009, Frank Middleton wrote:
>
> So what is best if you get a 4th drive for a 3-drive raidz? Is it
> better to keep it separate and use it for backups of the replaceable
> data (perhaps on a different machine), as a hot spare, as second
> parity, or something else? Seems so un-green to have it spinning
> uselessly :-) LTO-4 tape drives, at $200 just for the media? I guess
> not...

With so few drives it does not make sense to use raidz2, particularly
since raidz2 still does not protect against user error, OS bugs,
severe over-voltage from a common power supply, or meteorite strike.

An external backup drive system that you can turn off and detach
(e.g. a USB or eSATA drive) is the appropriately green solution. For
this purpose I use two large USB drives with ZFS in a mirrored
configuration.

Bob
-- 
Bob Friesenhahn
bfriesen at simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,    http://www.GraphicsMagick.org/

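(A minimal sketch of that kind of detachable backup pool, assuming the
two USB drives show up as c5t0d0 and c6t0d0 - the names are
placeholders:

  zpool create backup mirror c5t0d0 c6t0d0
  zfs snapshot -r tank@backup1
  zfs send -R tank@backup1 | zfs receive -F -d backup
  zpool export backup     # then power the drives off and detach them

'zpool import backup' brings the pool back for the next cycle.)
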
On Mar 29, 2009, at 12:40, Frank Middleton wrote:

> So what is best if you get a 4th drive for a 3-drive raidz? Is it
> better to keep it separate and use it for backups of the replaceable
> data (perhaps on a different machine), as a hot spare, as second
> parity, or something else? Seems so un-green to have it spinning
> uselessly :-) LTO-4 tape drives, at $200 just for the media? I guess
> not...

ZFS doesn't (at this time) support going from one RAID level to
another dynamically. If you want to go from single parity (RAID-Z) to
dual parity (RAID-Z2), you would need to destroy the pool and
re-create it. If you don't want to re-create it: hot spare, I guess.

LTO-4 media is closer to about US$60:

http://www.tapeandmedia.com/lto_4_tape_ultrium_4_tapes_hp.asp
http://www.cdw.com/shop/products/default.aspx?EDC=1225207
http://search.pricewatch.com/media/lto-4-0.htm

Half-height SAS drives seem to be about US$3,000.

On Mar 29, 2009, at 12:17, Michael Shadle wrote:

> Okay, so riddle me this - can I create a raidz2 using the new disks
> and move all the data from the existing vdev to it, then recreate a
> raidz2, this time using the old 7 disks?
>
> And have them all stay in the same zpool?

Yes, I believe so. Create a new pool, move the data to it (zfs
send/recv), and destroy the old RAID-Z1 pool. At this point you would
have one pool with your data, and a bunch of unused disks. You could
then use the old disks to create a new RAID-Z2 pool and do another zfs
send/recv.

> Side note: does the port I plug the drive into matter on the
> controller? Does it have to be the same drive lineup, or does it work
> based on drive UUID or something like that?

UUIDs are used, so placement shouldn't matter. When you do a 'zpool
import poolX', the ZFS code goes to all the disks accessible by the
system and "tastes" them to see if they're part of "poolX".

If you want to play around with this, you can actually use plain files
instead of disks. The ZFS commands work just as well on files:

# mkdir /zfstest
# mkfile 200m /zfstest/myfile1
# mkfile 200m /zfstest/myfile2
# mkfile 200m /zfstest/myfile3
# mkfile 200m /zfstest/myfile4
# mkfile 200m /zfstest/myfile5
# zpool create testfiles raidz2 /zfstest/myfile1 /zfstest/myfile2 \
    /zfstest/myfile3 /zfstest/myfile4 /zfstest/myfile5

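(The same trick lets you rehearse the whole raidz1-to-raidz2 migration
discussed earlier without touching real disks - a hedged sketch, with
made-up file and pool names:

  # mkfile 200m /zfstest/old1 /zfstest/old2 /zfstest/old3
  # mkfile 200m /zfstest/new1 /zfstest/new2 /zfstest/new3 /zfstest/new4
  # zpool create oldpool raidz1 /zfstest/old1 /zfstest/old2 /zfstest/old3
  # zpool create newpool raidz2 /zfstest/new1 /zfstest/new2 \
      /zfstest/new3 /zfstest/new4
  # zfs snapshot -r oldpool@move
  # zfs send -R oldpool@move | zfs receive -F -d newpool
  # zpool destroy oldpool

'zpool destroy' on the test pools cleans up when you're done.)
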
On Mar 29, 2009, at 13:24, Bob Friesenhahn wrote:

> With so few drives it does not make sense to use raidz2, particularly
> since raidz2 still does not protect against user error, OS bugs,
> severe over-voltage from a common power supply, or meteorite strike.

I remember reading on this list that for 3-5 drives RAID-Z is
generally recommended, and for anything more than five drives RAID-Z2
would be the way to go. This had something to do with the probability
of failures and the space used for parity. I don't remember the
details, and I can't seem to find it in a search.

> An external backup drive system that you can turn off and detach
> (e.g. a USB or eSATA drive) is the appropriately green solution. For
> this purpose I use two large USB drives with ZFS in a mirrored
> configuration.

There are also drive docks which make it easy to attach bare drives:

http://www.newertech.com/products/voyagerq.php

There are less expensive units that have fewer connection options
(e.g., if you only care about USB). This is a good solution if you
don't care for the enclosure component and simply want to leverage
rotation of inexpensive disks.

Tim wrote:
>
> I did NOT say nobody recommends using RAID5. What I *DID* say was
> that NOBODY supports using RAID5 and RAID6 under a single pool of
> storage. Which IBM array are you referring to that is supported with
> RAID5 and 6 in a single pool?

Thanks for the clarification, Tim - I thought you might have lost your
head :-)

However, I do not believe Sun considers mixed zpools unsupported.
Clearly, it is possible to create them, and clearly they work as one
would expect. But Sun does not keep a list of things that are
"supported," so it is difficult to prove the negative. In most cases,
"support" depends on what the engineering group wishes to accept bugs
on - there is no consistent policy or logic. I think you will find the
ZFS team to be very receptive to accepting bugs, and they would be
more than mildly interested in discovering that mixed vdevs did not
work as designed.

NB: I quote "support" because I've never found two people with the
same operational definition of "support." There is a near 100% chance
that my definition is at odds with the reader's :-/

-- richard

On Sun, Mar 29, 2009 at 10:35 AM, David Magda <dmagda at ee.ryerson.ca> wrote:

> Create a new pool, move the data to it (zfs send/recv), and destroy
> the old RAID-Z1 pool.

Would send/recv be more efficient than just a massive rsync or
related?

Also, I'd have to reduce the data on my existing raidz1, as it is
almost full and the raidz2 it would be sending to would technically be
1.5TB smaller.

On Sun, Mar 29, 2009 at 1:37 PM, Michael Shadle <mike503 at gmail.com> wrote:

> On Sun, Mar 29, 2009 at 10:35 AM, David Magda <dmagda at ee.ryerson.ca> wrote:
>
>> Create a new pool, move the data to it (zfs send/recv), and destroy
>> the old RAID-Z1 pool.
>
> Would send/recv be more efficient than just a massive rsync or
> related?
>
> Also, I'd have to reduce the data on my existing raidz1, as it is
> almost full and the raidz2 it would be sending to would technically
> be 1.5TB smaller.

I'd personally say send/recv would be more efficient; rsync is awfully
slow on large data sets. But it depends on what build you are using!
Bug ID 6418042 (slow zfs send/recv) was fixed in build 105. It
impacted send/recv operations from local to remote; not sure if it
happens local to local, but I experienced it doing local-to-remote
send/recv.

Not sure of the best way to handle moving data around when space is
tight, though...

-- 
Brent Jones
brent at servuhome.net

On Sun, Mar 29, 2009 at 1:59 PM, Brent Jones <brent at servuhome.net> wrote:

> I'd personally say send/recv would be more efficient; rsync is
> awfully slow on large data sets. But it depends on what build you are
> using! Bug ID 6418042 (slow zfs send/recv) was fixed in build 105.
> [...]
>
> Not sure of the best way to handle moving data around when space is
> tight, though...

Well, one thing is, I've never used send/recv before, and I'm
comfortable with rsync - and rsync 3.x is a hell of a lot more
efficient with large numbers of files, too. Although most of these are
large files, not large file counts.

I'd probably try to upgrade this to snv_110 at the same time and
update the zpool format too while I'm at it. Hopefully that would
resolve any possible oddities... and not introduce new ones. Like how
I can't install snv_110 on my other machine properly: it just gives me
a grub prompt on reboot; it doesn't seem to install ZFS root properly
or something.

On Mar 29, 2009, at 16:37, Michael Shadle wrote:

> Would send/recv be more efficient than just a massive rsync or
> related?
>
> Also, I'd have to reduce the data on my existing raidz1, as it is
> almost full and the raidz2 it would be sending to would technically
> be 1.5TB smaller.

Potentially yes, especially with large quantities of files or
directories - not so much their size as their number. Rsync,
regardless of version, has to traverse the tree, get the file
metadata, and compute checksums over the length of each file.
Send/recv uses the file system structures to find these things and
figure out exactly which blocks of data in a ZFS dataset have changed,
without having to run checksums.

Depending on the types of data you have, you may want to consider
enabling compression on the new pool.

If you're new to ZFS, you may want to try doing a few different things
as an experiment, as a learning experience perhaps. (Time permitting.)

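(If you do enable compression, a hedged sketch - set it on the new
pool's top-level dataset before copying, so new datasets inherit it,
and check the result afterwards:

  zfs set compression=on tank2
  ...copy the data...
  zfs get compressratio tank2

One caveat: a 'zfs send -R' stream re-creates the source dataset
properties on the receiving side, so compression set locally matters
most for rsync-style copies. And for DVD rips the ratio will be close
to 1.00x anyway - video is already compressed - but it costs little to
try.)
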
Hello David and Michael,

>> Well, I might back up the more important stuff offsite. But in
>> theory it's all replaceable. Just would be a pain.
>
> And what is the cost of the time to replace it versus the price of a
> hard disk? Time ~ money.

This is true, but there is one counterpoint. If you do raidz2, you are
definitely paying for extra disk(s). If you stay with raidz1, the cost
of the time to recover the data would be incurred if and only if he
has a failure in the raidz1 followed by a second failure during the
rebuild process. So the statistically correct thing to do is to
multiply the cost of recovery by the probability and see if that
exceeds the cost of the new drives.

To be really accurate, the cost of the raidz2 option should also
include the cost of moving the data from the existing raidz1 to the
new raidz2 and then reformatting the raidz1 into raidz2.

However, all this calculating is probably not worthwhile. My feeling
is: it's just a home video server, and Michael still has the original
media (I think). Raidz1 is good enough.

Monish

My only question is how long it takes to resilver... Supposedly the
entire array has to be checked, which means reading 6x1.5TB. It has a
quad-core CPU that's basically dedicated to it. Anyone have any
estimates?

It sounds like it is a lot slower than a normal RAID5-style rebuild.
Is there a way to tune it so it can rebuild/resilver faster?

On Mar 29, 2009, at 9:43 PM, "Monish Shah" <monish at indranetworks.com> wrote:

> Hello David and Michael,
>
> This is true, but there is one counterpoint. If you do raidz2, you
> are definitely paying for extra disk(s). If you stay with raidz1, the
> cost of the time to recover the data would be incurred if and only if
> he has a failure in the raidz1 followed by a second failure during
> the rebuild process. [...]
>
> However, all this calculating is probably not worthwhile. My feeling
> is: it's just a home video server, and Michael still has the original
> media (I think). Raidz1 is good enough.
>
> Monish

On Mar 30, 2009, at 13:48, Michael Shadle wrote:

> My only question is how long it takes to resilver... Supposedly the
> entire array has to be checked, which means reading 6x1.5TB. It has a
> quad-core CPU that's basically dedicated to it. Anyone have any
> estimates?
>
> It sounds like it is a lot slower than a normal RAID5-style rebuild.
> Is there a way to tune it so it can rebuild/resilver faster?

There is a background process in ZFS (see "scrub" in zpool(1M)) that
goes through and makes sure all the checksums match reality (and
corrects things if it can). It reads all the data, but unlike hardware
RAID arrays, it only checks the space actually used.

It basically walks the file system structure hierarchy, and if there's
unused space on the array it doesn't bother - since no data blocks
point to it, there's nothing to check. The scrubbing process is the
same whether you're using mirrors, RAID-Z1, or RAID-Z2. It can be
kicked off manually, or you can launch it via cron / SMF.

Not sure about tuning (e.g., allocating bandwidth / priority). If you
start a scrub, the output of "zpool status" will give the progress (%)
and the ETA to finish.

There is (was?) a bug where creating a new snapshot reset the scrub.

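(For reference, the progress line appears in 'zpool status' while a
scrub or resilver is running; the figures below are illustrative only,
not from a real run:

  # zpool scrub tank
  # zpool status tank
   ...
   scrub: scrub in progress for 2h31m, 41.20% done, 3h36m to go
   ...

The same line reports resilver progress after a disk replacement.)
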
On Mon, Mar 30, 2009 at 4:00 PM, David Magda <dmagda at ee.ryerson.ca> wrote:

> There is a background process in ZFS (see "scrub" in zpool(1M)) that
> goes through and makes sure all the checksums match reality (and
> corrects things if it can). It reads all the data, but unlike
> hardware RAID arrays, it only checks the space actually used.
> [...]
>
> Not sure about tuning (e.g., allocating bandwidth / priority). If you
> start a scrub, the output of "zpool status" will give the progress
> (%) and the ETA to finish.
>
> There is (was?) a bug where creating a new snapshot reset the scrub.

Well, basically I am trying to weigh giving up 1/7th of my space
against the off chance that one drive fails during resilvering. I just
don't know what kind of time to expect for a resilver. I'm sure it
also depends on the build of Nevada, and various bugs...

Normally it seems like RAID5 is perfectly fine for a workload like
this, but maybe I'd sleep better at night knowing I could have 2 disks
fail. The odds of that are pretty slim, though. I've never had 2 disks
fail, and if I did, the whole array has probably failed / the actual
unit itself got damaged, and then probably more than the 2 disks have
been destroyed anyway.

It looks like there are already two open requests for speeding up and
slowing down the resilvering process, so it does not sound like you
can tune it. But it would be nice to have some sort of number to
expect.

On Mar 30, 2009, at 19:13, Michael Shadle wrote:

> Normally it seems like RAID5 is perfectly fine for a workload like
> this, but maybe I'd sleep better at night knowing I could have 2
> disks fail. The odds of that are pretty slim, though. I've never had
> 2 disks fail, and if I did, the whole array has probably failed / the
> actual unit itself got damaged, and then probably more than the 2
> disks have been destroyed anyway.

Given ZFS's checksums, it's quite amazing what it can survive:

http://tinyurl.com/ytyzs6
http://www.joyeur.com/2008/01/22/bingodisk-and-strongspace-what-happened

They also discuss it on their podcast:

http://tinyurl.com/29msbb
http://www.joyeur.com/2008/01/24/new-podcast-quad-core-episode-2

Michael Shadle wrote:
> Well, basically I am trying to weigh giving up 1/7th of my space
> against the off chance that one drive fails during resilvering. I
> just don't know what kind of time to expect for a resilver. I'm sure
> it also depends on the build of Nevada, and various bugs...

Resilver performance is constrained as follows:

1. the priority of other activity - resilvering occurs at a lower
   priority than most other file system operations
2. the write bandwidth of the drive being resilvered
3. the read iops of the source drives

Since you are only resilvering data, this may go fast, or it may take
a while. One thing is certain: it is difficult to predict in advance.

> Normally it seems like RAID5 is perfectly fine for a workload like
> this, but maybe I'd sleep better at night knowing I could have 2
> disks fail. The odds of that are pretty slim, though. I've never had
> 2 disks fail, and if I did, the whole array has probably failed /
> the actual unit itself got damaged, and then probably more than the
> 2 disks have been destroyed anyway.

As the disks get bigger, the odds become worse. Seagate claims that a
1.5TB Barracuda 7200.11 has an UER of 10^-14. Or, to put this in
layman's terms, expect an unrecoverable read for every 8.3 times you
read the entire disk. Fortunately, the UER seems to be a conservative
specification.

> It looks like there are already two open requests for speeding up and
> slowing down the resilvering process, so it does not sound like you
> can tune it. But it would be nice to have some sort of number to
> expect.

The more you value your data, the more redundancy you will need.
-- richard

On Mon, Mar 30, 2009 at 4:13 PM, Michael Shadle <mike503 at gmail.com> wrote:

> Well, basically I am trying to weigh giving up 1/7th of my space
> against the off chance that one drive fails during resilvering. [...]
>
> It looks like there are already two open requests for speeding up and
> slowing down the resilvering process, so it does not sound like you
> can tune it. But it would be nice to have some sort of number to
> expect.

Well, after all this discussion I think I've come to a conclusion: I
will just create two zpools. One called "duo" or "dual" or something,
and one called "single" or some other creative/Latin/etc. word for it.
The stuff that is easy to replace just by re-ripping it to disk (other
than the time and effort to do so) I will keep on the raidz1. The new
disks I'll make into a raidz2, and I'll keep the more important /
harder-to-find stuff and backups on the raidz2.

I just don't know if I want to go through setting up the raidz2,
moving everything off the existing pool (9TB or so) to the new raidz2
plus another temporary area, and redoing the existing raidz1 into
raidz2. I'm not sure it's -that- important. If my chassis supported
more disks, I wouldn't be as frugal, but I have limited space and I
would like to squeeze more space out of it if I can.

Sounds like a reasonable idea, no?

Follow-up question: can I add a single disk to the existing raidz2
later on (if somehow I found more space in my chassis), so that
instead of a 7-disk raidz2 (5+2) it becomes 6+2?

Thanks...

Michael Shadle wrote:
> Sounds like a reasonable idea, no?
>
> Follow-up question: can I add a single disk to the existing raidz2
> later on (if somehow I found more space in my chassis), so that
> instead of a 7-disk raidz2 (5+2) it becomes 6+2?

No. There is no way to expand a RAIDZ or RAIDZ2 at this point. It is a
feature that is often discussed and that people would like, but it has
been seen by Sun as more of a feature for home users than for
enterprise users. Enterprise users are expected to buy 4 or more disks
and create another RAIDZ2 vdev and add it to the pool to increase
space. You would of course have this option.

However, by the time you fill it there might be a solution. Adam
Leventhal proposed a way this could be implemented on his blog, so I
suspect at some point in the next few years somebody will implement it
and you will possibly have the option to do so then (after an OS and
ZFS version upgrade):

http://blogs.sun.com/ahl/entry/expand_o_matic_raid_z

On Tue, Mar 31, 2009 at 1:31 AM, Scott Lawson <Scott.Lawson at manukau.ac.nz> wrote:

> No. There is no way to expand a RAIDZ or RAIDZ2 at this point. It is
> a feature that is often discussed and that people would like, but it
> has been seen by Sun as more of a feature for home users than for
> enterprise users. Enterprise users are expected to buy 4 or more
> disks and create another RAIDZ2 vdev and add it to the pool to
> increase space. You would of course have this option.

Yeah, I get it. It definitely would seem to be more for the lower-cost
market, since enterprises have $$ :)

> However, by the time you fill it there might be a solution. Adam
> Leventhal proposed a way this could be implemented on his blog, so I
> suspect at some point in the next few years somebody will implement
> it and you will possibly have the option to do so then (after an OS
> and ZFS version upgrade):
>
> http://blogs.sun.com/ahl/entry/expand_o_matic_raid_z

Well, years/etc. is not my timeline... In a few years I could buy a
normal-size chassis, put 4TB disks in there, and not care about my
physical limitations :)

On Mar 31, 2009, at 04:31, Scott Lawson wrote:
> http://blogs.sun.com/ahl/entry/expand_o_matic_raid_z

There's a more recent post on bp (block pointer) rewriting that will
allow for moving blocks around (part of cleaning up the scrub code):

http://blogs.sun.com/ahrens/entry/new_scrub_code

This should help with vacating devices, and perhaps restriping and
defragmenting as well.
I'm going to try to move one of my disks off my rpool tomorrow (since
it's a mirror) to a different controller.

According to what I've heard before, ZFS should automagically recognize
this new location and have no problem, right?

Or do I need to do some sort of detach/etc. process first?
Michael Shadle wrote:
> I'm going to try to move one of my disks off my rpool tomorrow (since
> it's a mirror) to a different controller.
>
> According to what I've heard before, ZFS should automagically
> recognize this new location and have no problem, right?
>
> Or do I need to do some sort of detach/etc. process first?

I've got a 4-way RaidZ pool made from IDE disks that I've connected in
three different ways:

- via a Firewire-to-IDE case
- via a 4-port PCI-to-IDE card
- via 4 SATA-to-IDE converters

Each transition resulted in different device IDs, but I could always
see them with 'zpool import'. You should 'zpool export' your pool
first.

Rob T
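A sketch of the export/import dance Rob describes, for a non-root pool
named tank (the assumption here being that rpool, as the boot pool,
cannot simply be exported while the system is running from it):

# zpool export tank
# (shut down, move the disks or cables to the new controller, boot)
# zpool import tank

If the pool doesn't show up by name, a bare zpool import with no
arguments lists whatever exported pools the system can find on the
attached devices.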
On Wed, Apr 1, 2009 at 3:19 AM, Michael Shadle <mike503 at gmail.com> wrote:
> I'm going to try to move one of my disks off my rpool tomorrow (since
> it's a mirror) to a different controller.
>
> According to what I've heard before, ZFS should automagically
> recognize this new location and have no problem, right?

I have now seen firsthand how nicely ZFS locates disks across different
controllers/ports. Except for rpool - ZFS boot. Moving those creates a
huge PITA.

Now a quick question - if I have a raidz2 named 'tank' already, can I
expand the pool by doing:

zpool attach tank raidz2 device1 device2 device3 ... device7

It will make 'tank' larger and each group of disks (vdev? or zdev?)
will be dual parity. It won't create a mirror, will it?
On Tue, 7 Apr 2009, Michael Shadle wrote:
> Now a quick question - if I have a raidz2 named 'tank' already, can I
> expand the pool by doing:
>
> zpool attach tank raidz2 device1 device2 device3 ... device7
>
> It will make 'tank' larger and each group of disks (vdev? or zdev?)
> will be dual parity. It won't create a mirror, will it?

No. The two vdevs will be load-shared rather than mirrored. This should
double your multi-user performance.

Bob
--
Bob Friesenhahn
bfriesen at simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,    http://www.GraphicsMagick.org/
Michael Shadle wrote:
> I have now seen firsthand how nicely ZFS locates disks across different
> controllers/ports. Except for rpool - ZFS boot. Moving those creates a
> huge PITA.
>
> Now a quick question - if I have a raidz2 named 'tank' already, can I
> expand the pool by doing:
>
> zpool attach tank raidz2 device1 device2 device3 ... device7

You cannot expand a RAIDZ or RAIDZ2 at all. You must back up the data
and destroy the pool if you wish to alter the number of disks in a
single RAIDZ or RAIDZ2 stripe. You may, however, add an additional
RAIDZ or RAIDZ2 vdev to an existing storage pool.

Your pool would look something like the below if you add additional
RAIDZs. This is output from a J4500 with 48 x 1TB drives, with multiple
RAIDZs in a single pool yielding ~30TB or so.

  pool: nbupool
 state: ONLINE
 scrub: none requested
config:

        NAME         STATE     READ WRITE CKSUM
        nbupool      ONLINE       0     0     0
          raidz1     ONLINE       0     0     0
            c2t2d0   ONLINE       0     0     0
            c2t3d0   ONLINE       0     0     0
            c2t4d0   ONLINE       0     0     0
            c2t5d0   ONLINE       0     0     0
            c2t6d0   ONLINE       0     0     0
          raidz1     ONLINE       0     0     0
            c2t7d0   ONLINE       0     0     0
            c2t8d0   ONLINE       0     0     0
            c2t9d0   ONLINE       0     0     0
            c2t10d0  ONLINE       0     0     0
            c2t11d0  ONLINE       0     0     0
          raidz1     ONLINE       0     0     0
            c2t12d0  ONLINE       0     0     0
            c2t13d0  ONLINE       0     0     0
            c2t14d0  ONLINE       0     0     0
            c2t15d0  ONLINE       0     0     0
            c2t16d0  ONLINE       0     0     0
          raidz1     ONLINE       0     0     0
            c2t17d0  ONLINE       0     0     0
            c2t18d0  ONLINE       0     0     0
            c2t19d0  ONLINE       0     0     0
            c2t20d0  ONLINE       0     0     0
            c2t21d0  ONLINE       0     0     0
          raidz1     ONLINE       0     0     0
            c2t22d0  ONLINE       0     0     0
            c2t23d0  ONLINE       0     0     0
            c2t24d0  ONLINE       0     0     0
            c2t25d0  ONLINE       0     0     0
            c2t26d0  ONLINE       0     0     0
          raidz1     ONLINE       0     0     0
            c2t27d0  ONLINE       0     0     0
            c2t28d0  ONLINE       0     0     0
            c2t29d0  ONLINE       0     0     0
            c2t30d0  ONLINE       0     0     0
            c2t31d0  ONLINE       0     0     0
          raidz1     ONLINE       0     0     0
            c2t32d0  ONLINE       0     0     0
            c2t33d0  ONLINE       0     0     0
            c2t34d0  ONLINE       0     0     0
            c2t35d0  ONLINE       0     0     0
            c2t36d0  ONLINE       0     0     0
          raidz1     ONLINE       0     0     0
            c2t37d0  ONLINE       0     0     0
            c2t38d0  ONLINE       0     0     0
            c2t39d0  ONLINE       0     0     0
            c2t40d0  ONLINE       0     0     0
            c2t41d0  ONLINE       0     0     0
          raidz1     ONLINE       0     0     0
            c2t42d0  ONLINE       0     0     0
            c2t43d0  ONLINE       0     0     0
            c2t44d0  ONLINE       0     0     0
            c2t45d0  ONLINE       0     0     0
            c2t46d0  ONLINE       0     0     0
        spares
          c2t47d0    AVAIL
          c2t48d0    AVAIL
          c2t49d0    AVAIL
* On 07 Apr 2009, Michael Shadle wrote:
> Now a quick question - if I have a raidz2 named 'tank' already, can I
> expand the pool by doing:
>
> zpool attach tank raidz2 device1 device2 device3 ... device7
>
> It will make 'tank' larger and each group of disks (vdev? or zdev?)
> will be dual parity. It won't create a mirror, will it?

That's correct. Anything you're unsure about, you can test. Just create
a zpool using files instead of devices:

for i in 1 2 3 4; do
    mkfile 256m /tmp/file$i
done
zpool create testpool raidz /tmp/file1 /tmp/file2 /tmp/file3 /tmp/file4

...and experiment on that. No data risk this way.

--
-D.    dgc at uchicago.edu    NSIT    University of Chicago
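Extending that file-backed experiment to the exact question at hand - a
sketch only; everything below runs on throwaway files, so the names are
arbitrary:

for i in 5 6 7 8; do
    mkfile 256m /tmp/file$i
done
# add a second raidz vdev to the existing test pool
zpool add testpool raidz /tmp/file5 /tmp/file6 /tmp/file7 /tmp/file8
# status should now show two raidz vdevs striped in one pool, no mirror
zpool status testpool
# clean up when done
zpool destroy testpool
rm /tmp/file[1-8]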
On Tue, Apr 7, 2009 at 5:22 PM, Bob Friesenhahn
<bfriesen at simple.dallas.tx.us> wrote:
> No. The two vdevs will be load-shared rather than mirrored. This
> should double your multi-user performance.

Cool - now a follow-up:

When I attach this new raidz2, will ZFS auto-"rebalance" data between
the two, or will it keep the other one empty and do some sort of load
balancing between the two for future writes only?

Is there a way (perhaps a scrub? or something?) to get the data spread
around to both?
Michael Shadle wrote:
> When I attach this new raidz2, will ZFS auto-"rebalance" data between
> the two, or will it keep the other one empty and do some sort of load
> balancing between the two for future writes only?

Future writes only, as far as I am aware. You will however potentially
get increased IO. (The total increase will depend on controller layouts,
etc.)

> Is there a way (perhaps a scrub? or something?) to get the data spread
> around to both?

No. You could back up and restore, though. (Or, if you have a small
number of big files, you could I guess copy them around inside the pool
to get them "rebalanced".)

--
Scott Lawson
Systems Architect
Manukau Institute of Technology
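A rough sketch of the copy-based rebalance Scott mentions, assuming the
bulk of the data lives in a filesystem called tank/media (a hypothetical
name):

# zfs create tank/media.new
# (copy the contents of tank/media into tank/media.new with cp -rp or
#  rsync; the rewritten blocks get allocated across all vdevs in the pool)
# zfs destroy tank/media
# zfs rename tank/media.new tank/media

This needs enough free space for a second copy while both filesystems
exist, and any snapshots or clones on the old filesystem would not
carry over.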
Michael,

You can't attach disks to an existing RAIDZ vdev, but you can add
another RAIDZ vdev. Also keep in mind that you can't detach disks from
RAIDZ pools either. See the syntax below.

Cindy

# zpool create rzpool raidz2 c1t0d0 c1t1d0 c1t2d0
# zpool status
  pool: rzpool
 state: ONLINE
 scrub: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        rzpool      ONLINE       0     0     0
          raidz2    ONLINE       0     0     0
            c1t0d0  ONLINE       0     0     0
            c1t1d0  ONLINE       0     0     0
            c1t2d0  ONLINE       0     0     0

errors: No known data errors
# zpool add rzpool raidz2 c1t3d0 c1t4d0 c1t5d0
# zpool status
  pool: rzpool
 state: ONLINE
 scrub: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        rzpool      ONLINE       0     0     0
          raidz2    ONLINE       0     0     0
            c1t0d0  ONLINE       0     0     0
            c1t1d0  ONLINE       0     0     0
            c1t2d0  ONLINE       0     0     0
          raidz2    ONLINE       0     0     0
            c1t3d0  ONLINE       0     0     0
            c1t4d0  ONLINE       0     0     0
            c1t5d0  ONLINE       0     0     0

# zpool attach rzpool c1t5d0 c1t8d0 c1t9d0 c1t10d0
too many arguments
usage: attach [-f] <pool> <device> <new_device>
# zpool attach rzpool c1t5d0 c1t8d0
cannot attach c1t8d0 to c1t5d0: can only attach to mirrors and top-level disks
# zpool detach rzpool c1t2d0
cannot detach c1t2d0: only applicable to mirror and replacing vdevs

Michael Shadle wrote:
> Now a quick question - if I have a raidz2 named 'tank' already, can I
> expand the pool by doing:
>
> zpool attach tank raidz2 device1 device2 device3 ... device7
>
> It will make 'tank' larger and each group of disks (vdev? or zdev?)
> will be dual parity. It won't create a mirror, will it?
>>>>> "ms" == Michael Shadle <mike503 at gmail.com> writes:ms> When I attach this new raidz2, will ZFS auto "rebalance" data ms> between the two, or will it keep the other one empty and do ms> some sort of load balancing between the two for future writes ms> only? the second choice. You can see how things are balanced right now, in terms of both size used on each vdev and io/s on each device, using ''zpool iostat <pool> 1''. the interval timing seems kind of goofy though---when the pool is busy the intervals are longer than one second, which sort of defeats the purpose of bucketing, so it is just a rough-guess kind of tool, not the serious statistics collector it looks like. -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 304 bytes Desc: not available URL: <http://mail.opensolaris.org/pipermail/zfs-discuss/attachments/20090408/990cd61e/attachment.bin>
On Wed, Apr 8, 2009 at 9:39 AM, Miles Nordin <carton at ivy.net> wrote:
>>>>>> "ms" == Michael Shadle <mike503 at gmail.com> writes:
>
>    ms> When I attach this new raidz2, will ZFS auto "rebalance" data
>    ms> between the two, or will it keep the other one empty and do
>    ms> some sort of load balancing between the two for future writes
>    ms> only?
>
> The second choice.

I actually have to move a bunch of files around anyway, so what I am
planning on doing is waiting until tonight (hopefully) when I add my
second raidz2 vdev, and then doing the move. It's between two ZFS
filesystems on the same zpool, so hopefully it helps force a
"rebalance" of the data a bit (an idea someone had). Either way it's
something I have to do, so I might get some additional benefit out of
it too :)

Also, Cindy: thanks for the "add" vs. "attach" correction.