Steven Sim
2009-Feb-09 16:05 UTC
ZFS Automatic Growth after replacing original disk with a larger sized disk
All;

There's been some negative posting about ZFS recently. I've been using ZFS for more than 13 months now; my system has gone through 3 major upgrades and one critical failure, and the data is still fully intact. I am thoroughly impressed with ZFS, in particular its sheer reliability. As for flexibility, I am also impressed, but am of the opinion that it could do with some improvement here (like allowing LUNs to be removed).

Recently I had at my place a ZFS pool consisting of 4 x 320 Gbyte SATA II drives in a RAID-Z configuration. I had almost used up all the available space and sought a way to expand the space without attaching any additional drives.

root@sunlight:/root# zfs list myplace
NAME      USED  AVAIL  REFER  MOUNTPOINT
myplace   874G  4.05G  28.4K  /myplace

I bought 4 x 1 TB SATA II drives and deliberately replaced one of the older 320 Gbyte drives with a 1 TB drive.

root@sunlight:/root# zpool status -v myplace
  pool: myplace
 state: DEGRADED
status: One or more devices could not be opened. Sufficient replicas exist for
        the pool to continue functioning in a degraded state.
action: Attach the missing device and online it using 'zpool online'.
   see: http://www.sun.com/msg/ZFS-8000-2Q
 scrub: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        myplace     DEGRADED     0     0     0
          raidz1    DEGRADED     0     0     0
            c4d0    ONLINE       0     0     0
            c5d0    ONLINE       0     0     0
            c6d0    ONLINE       0     0     0
            c7d0    UNAVAIL      0   220     0  cannot open

Yup! As expected, ZFS reported c7d0 as faulted... so:

root@sunlight:/root# zpool replace myplace c7d0
root@sunlight:/root# zpool status -v myplace
  pool: myplace
 state: DEGRADED
status: One or more devices is currently being resilvered. The pool will
        continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
 scrub: resilver in progress for 0h0m, 0.00% done, 328h41m to go
config:

        NAME           STATE     READ WRITE CKSUM
        myplace        DEGRADED     0     0     0
          raidz1       DEGRADED     0     0     0
            c4d0       ONLINE       0     0     0  260K resilvered
            c5d0       ONLINE       0     0     0  252K resilvered
            c6d0       ONLINE       0     0     0  260K resilvered
            replacing  DEGRADED     0     0     0
              c7d0s0/o FAULTED      0   972     0  corrupted data
              c7d0     ONLINE       0     0     0  418K resilvered

errors: No known data errors

Almost three hours later..

root@sunlight:/root# zpool status -v myplace
  pool: myplace
 state: ONLINE
status: The pool is formatted using an older on-disk format. The pool can
        still be used, but some features are unavailable.
action: Upgrade the pool using 'zpool upgrade'. Once this is done, the pool
        will no longer be accessible on older software versions.
 scrub: resilver completed after 2h53m with 0 errors on Fri Feb  6 19:32:35 2009
config:

        NAME        STATE     READ WRITE CKSUM
        myplace     ONLINE       0     0     0
          raidz1    ONLINE       0     0     0
            c4d0    ONLINE       0     0     0  171M resilvered
            c5d0    ONLINE       0     0     0  171M resilvered
            c6d0    ONLINE       0     0     0  168M resilvered
            c7d0    ONLINE       0     0     0  292G resilvered

errors: No known data errors

Subsequently I did a ZFS upgrade..

root@sunlight:/root# zpool upgrade
This system is currently running ZFS pool version 13.

The following pools are out of date, and can be upgraded. After being
upgraded, these pools will no longer be accessible by older software versions.

VER  POOL
---  ------------
10   myplace

Use 'zpool upgrade -v' for a list of available versions and their associated
features.

root@sunlight:/root# zpool upgrade myplace
This system is currently running ZFS pool version 13.

Successfully upgraded 'myplace' from version 10 to version 13

Ok, upgrade successful.
Subsequently I replaced the next drive, waited for ZFS to completely resilver the replacement 1 TB drive, and repeated the sequence until I had replaced all of the 320 Gbyte drives with 1 Tbyte drives.

And after replacing and completely resilvering the 4th drive (output shows the 4th drive being resilvered):

root@sunlight:/root# zpool status -v myplace
  pool: myplace
 state: DEGRADED
status: One or more devices is currently being resilvered. The pool will
        continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
 scrub: resilver in progress for 0h17m, 5.15% done, 5h16m to go
config:

        NAME           STATE     READ WRITE CKSUM
        myplace        DEGRADED     0     0     0
          raidz1       DEGRADED     0     0     0
            replacing  DEGRADED     0     0     0
              c4d0s0/o FAULTED      0 6.35K     0  corrupted data
              c4d0     ONLINE       0     0     0  15.0G resilvered
            c5d0       ONLINE       0     0     0  48.7M resilvered
            c6d0       ONLINE       0     0     0  48.3M resilvered
            c7d0       ONLINE       0     0     0  18.7M resilvered

errors: No known data errors

After the 4th drive was completely resilvered:

root@sunlight:/root# zpool status -v myplace
  pool: myplace
 state: ONLINE
 scrub: resilver completed after 4h28m with 0 errors on Sat Feb  7 14:51:05 2009
config:

        NAME        STATE     READ WRITE CKSUM
        myplace     ONLINE       0     0     0
          raidz1    ONLINE       0     0     0
            c4d0    ONLINE       0     0     0  292G resilvered
            c5d0    ONLINE       0     0     0  173M resilvered
            c6d0    ONLINE       0     0     0  170M resilvered
            c7d0    ONLINE       0     0     0  134M resilvered

errors: No known data errors

$ zfs list myplace
NAME      USED  AVAIL  REFER  MOUNTPOINT
myplace   874G  1.82T  28.4K  /myplace

AUTOMAGICALLY! 1.82T available, for ALL my file systems within the ZFS pool!

Awesome! Simply awesome! No other file system or volume manager I know of has this absolutely wonderful ability. Certainly enterprise systems will not use this method to increase their storage space, but it certainly proved a boon for me!

Warmest Regards
Steven Sim
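[Editor's note: a minimal sketch of the disk-at-a-time sequence described above, using the pool and device names from the post (myplace, c4d0..c7d0); treat it as illustrative under those assumptions, not as tested commands, and keep current backups before attempting it.]

# Replace one drive at a time and never start the next swap until the
# previous resilver has finished cleanly.
for disk in c4d0 c5d0 c6d0 c7d0; do
    # physically swap the 320 GB drive in this slot for a 1 TB drive, then:
    zpool replace myplace $disk
    # poll until the resilver for this drive is no longer in progress
    while zpool status myplace | grep -q "resilver in progress"; do
        sleep 600
    done
    zpool status -v myplace     # confirm the vdev is ONLINE with 0 errors
done
zfs list myplace                # the extra space shows up once the last disk is resilvered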
Bob Friesenhahn
2009-Feb-09 17:48 UTC
[zfs-discuss] ZFS Automatic Growth after replacing original disk with a larger sized disk
On Tue, 10 Feb 2009, Steven Sim wrote:
>
> I had almost used up all the available space and sought a way to
> expand the space without attaching any additional drives.

It is good that you succeeded, but the approach you used seems really
risky. If possible, it is far safer to temporarily add the extra drive
without disturbing the redundancy of your raidz1 configuration. The most
likely time to encounter data loss is while resilvering, since so much
data needs to be successfully read. Since you removed the redundancy (by
intentionally causing a disk "failure"), any read failure could have lost
files, or even the entire pool!

Bob

======================================
Bob Friesenhahn
bfriesen at simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,    http://www.GraphicsMagick.org/
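[Editor's note: a minimal sketch of the safer variant Bob describes, replacing into a new drive attached alongside the old one so the raidz1 keeps its redundancy throughout the resilver. The spare device name c8d0 is a hypothetical example; myplace and c7d0 are taken from the original post.]

# Assumes the new 1 TB drive appears as c8d0 (hypothetical) while the old
# 320 GB drive c7d0 is still present and healthy.
zpool replace myplace c7d0 c8d0     # old disk stays active until the resilver completes
zpool status -v myplace             # the vdev never drops to DEGRADED during the copy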
David Dyer-Bennet
2009-Feb-09 18:59 UTC
[zfs-discuss] ZFS Automatic Growth after replacing original disk with a larger sized disk
On Mon, February 9, 2009 11:48, Bob Friesenhahn wrote:
> On Tue, 10 Feb 2009, Steven Sim wrote:
>>
>> I had almost used up all the available space and sought a way to
>> expand the space without attaching any additional drives.
>
> It is good that you succeeded, but the approach you used seems really
> risky.  If possible, it is far safer to temporarily add the extra
> drive without disturbing the redundancy of your raidz1 configuration.
> The most likely time to encounter data loss is while resilvering since
> so much data needs to be successfully read.  Since you removed the
> redundancy (by intentionally causing a disk "failure"), any read
> failure could have lost files, or even the entire pool!

Most people run most of their lives with no redundancy in their data,
though. If you make sure the backups are up to date, I don't see any
serious risk in using the swap-one-disk-at-a-time approach for upgrading a
home server, where you can have it out of service more easily (or at least
tell people not to count on anything they write being safe) than in a
commercial environment.

And as I recall, many of the discussions about this technique involve
people who do not, in fact, have the ability to replicate the entire vdev.
Often people with 4 hot-swap bays running a 4-disk raidz.

--
David Dyer-Bennet, dd-b at dd-b.net; http://dd-b.net/
Snapshots: http://dd-b.net/dd-b/SnapshotAlbum/data/
Photos: http://dd-b.net/photography/gallery/
Dragaera: http://dragaera.info
Andrew Gabriel
2009-Feb-09 19:13 UTC
Re: ZFS Automatic Growth after replacing original disk with a larger sized disk
David Dyer-Bennet wrote:
> On Mon, February 9, 2009 11:48, Bob Friesenhahn wrote:
>> On Tue, 10 Feb 2009, Steven Sim wrote:
>>>
>>> I had almost used up all the available space and sought a way to
>>> expand the space without attaching any additional drives.
>>
>> It is good that you succeeded, but the approach you used seems really
>> risky.  If possible, it is far safer to temporarily add the extra
>> drive without disturbing the redundancy of your raidz1 configuration.
>> The most likely time to encounter data loss is while resilvering since
>> so much data needs to be successfully read.  Since you removed the
>> redundancy (by intentionally causing a disk "failure"), any read
>> failure could have lost files, or even the entire pool!
>
> Most people run most of their lives with no redundancy in their data,
> though.  If you make sure the backups are up-to-date I don't see any
> serious risk in using the swap-one-disk-at-a-time approach for upgrading
> a home server, where you can have it out of service more easily (or at
> least tell people not to count on anything they write being safe) than
> in a commercial environment.
>
> And as I recall, many of the discussions about this technique involve
> people who do not, in fact, have the ability to replicate the entire
> vdev.  Often people with 4 hot-swap bays running a 4-disk raidz.

If you're going to do this, at least get a clean zpool scrub run completed
(with no checksum errors) before you start. Otherwise you may well find you
have some corrupt blocks in files you hardly ever access (and so haven't
seen), but of course they will be needed to reconstruct the raidz on the
new disk, at which point you are stuffed.

Actually, it's probably a good idea to get a clean zpool scrub between each
disk swap too, in case one of the new disks turns out to be giving errors.

--
Andrew
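[Editor's note: a minimal sketch of the scrub-before-each-swap routine Andrew suggests, using the pool name from the original post (myplace); illustrative only.]

# Run before the first swap and again between swaps; only pull the next
# drive once the scrub has finished with no errors reported.
zpool scrub myplace
zpool status -v myplace     # wait for "scrub completed ... with 0 errors"
                            # and a CKSUM count of 0 on every device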
Bob Friesenhahn
2009-Feb-09 20:12 UTC
[zfs-discuss] ZFS Automatic Growth after replacing original disk with a larger sized disk
On Mon, 9 Feb 2009, David Dyer-Bennet wrote:
> Most people run most of their lives with no redundancy in their data,
> though.  If you make sure the backups are up-to-date I don't see any
> serious risk in using the swap-one-disk-at-a-time approach for upgrading
> a home server, where you can have it out of service more easily (or at
> least tell people not to count on anything they write being safe) than
> in a commercial environment.

The risk of some level of failure is pretty high, particularly when using
large SATA drives. The typical home user does not do true backups. Even
with true backups, the pool needs to be configured from scratch to
reproduce what was there before. If they at least copy their files
somewhere, it can still take a day or two to put the pool back together.
These are good arguments for home users to use raidz2 rather than raidz1
so that the risk is dramatically diminished.

> And as I recall, many of the discussions about this technique involve
> people who do not, in fact, have the ability to replicate the entire
> vdev.  Often people with 4 hot-swap bays running a 4-disk raidz.

There is almost always a way to add a disk to a system, even if via slow
USB. Some people use a USB chassis which will accept the new drive, zpool
replace the array drive with this new drive, and then physically install
the new drive in the array. Zfs export and import is required.

Bob

======================================
Bob Friesenhahn
bfriesen at simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,    http://www.GraphicsMagick.org/
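[Editor's note: a sketch of the USB-chassis trick Bob mentions. The device names are hypothetical examples (c8t0d0 for the USB-attached drive, c7d0 for the bay being upgraded), the pool name is from the original post, and the exact steps may differ on your system.]

# 1. Put the new 1 TB drive in the USB chassis (assume it shows up as c8t0d0)
zpool replace myplace c7d0 c8t0d0   # resilver onto the USB drive; raidz1 redundancy stays intact
zpool status -v myplace             # wait for "resilver completed ... with 0 errors"

# 2. Move the new drive from the USB chassis into the internal bay
zpool export myplace                # release the pool before pulling drives
# ...physically move the new drive into the bay that held c7d0...
zpool import myplace                # ZFS relocates the moved disk by its on-disk label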