Is it possible to create a (degraded) zpool with placeholders specified instead of actual disks (parity or mirrors)? This is possible in Linux mdadm (the "missing" keyword), so I had hoped the same could be done in Solaris, but I didn't manage to.

Use-case scenario:

I have a single server (or home workstation) with 4 HDD bays, sold with 2 drives. Initially the system was set up with a ZFS mirror for data slices. Now we got 2 more drives and want to replace the mirror with a larger RAIDZ2 set (say I don't want a RAID10, which is trivial to make).

Technically I think it should be possible to force creation of a degraded raidz2 array with two actual drives and two missing drives. Then I'd copy data from the old mirror pool to the new degraded raidz2 pool (zfs send | zfs recv), destroy the mirror pool and attach its two drives to "repair" the raidz2 pool.

While obviously not an "enterprise" approach, this is useful when expanding home systems, where I don't have a spare tape backup to dump my files on and restore afterwards.

I think it's an (intended?) limitation in the zpool command itself, since the kernel can very well live with degraded pools.

//Jim
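For comparison, the mdadm capability referred to above is the literal "missing" placeholder accepted in the device list; a minimal sketch with hypothetical device names, shown here for a 3-disk RAID-5 created with one member absent:

{code}
# Create a degraded RAID-5 with one member deliberately absent;
# the array comes up degraded and can be completed later with mdadm --add.
mdadm --create /dev/md0 --level=5 --raid-devices=3 /dev/sdb1 /dev/sdc1 missing
{code}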
Jim Klimov wrote:
> Is it possible to create a (degraded) zpool with placeholders specified instead
> of actual disks (parity or mirrors)? This is possible in Linux mdadm (the "missing"
> keyword), so I had hoped the same could be done in Solaris, but I didn't manage to.
> [...]
> While obviously not an "enterprise" approach, this is useful when expanding
> home systems, where I don't have a spare tape backup to dump my files on
> and restore afterwards.

I would say it is definitely not a recommended approach for those who love their data, whether "enterprise" or not. But my opinion is really a result of our environment at Sun (or any systems vendor). Being here blinds us to some opportunities. Please file an RFE at http://bugs.opensolaris.org

 -- richard
On 15 January, 2009 - Jim Klimov sent me these 1,3K bytes:

> Is it possible to create a (degraded) zpool with placeholders specified instead
> of actual disks (parity or mirrors)? This is possible in Linux mdadm (the "missing"
> keyword), so I had hoped the same could be done in Solaris, but I didn't manage to.
> [...]
> I think it's an (intended?) limitation in the zpool command itself, since the kernel
> can very well live with degraded pools.

You can fake it..

kalv:/tmp# mkfile 64m realdisk1
kalv:/tmp# mkfile 64m realdisk2
kalv:/tmp# mkfile -n 64m fakedisk1
kalv:/tmp# mkfile -n 64m fakedisk2
kalv:/tmp# ls -la real* fake*
-rw------T 1 root root 67108864 2009-01-15 17:02 fakedisk1
-rw------T 1 root root 67108864 2009-01-15 17:02 fakedisk2
-rw------T 1 root root 67108864 2009-01-15 17:02 realdisk1
-rw------T 1 root root 67108864 2009-01-15 17:02 realdisk2
kalv:/tmp# du real* fake*
65555   realdisk1
65555   realdisk2
133     fakedisk1
133     fakedisk2

In reality, realdisk* should point at real disks, but fakedisk* should still point at sparse mkfiles of the same size as your real disks (300GB or whatever).

kalv:/tmp# zpool create blah raidz2 /tmp/realdisk1 /tmp/realdisk2 /tmp/fakedisk1 /tmp/fakedisk2
kalv:/tmp# zpool status blah
  pool: blah
 state: ONLINE
 scrub: none requested
config:

        NAME                STATE     READ WRITE CKSUM
        blah                ONLINE       0     0     0
          raidz2            ONLINE       0     0     0
            /tmp/realdisk1  ONLINE       0     0     0
            /tmp/realdisk2  ONLINE       0     0     0
            /tmp/fakedisk1  ONLINE       0     0     0
            /tmp/fakedisk2  ONLINE       0     0     0

errors: No known data errors

Ok, so it's created fine. Let's "accidentally" introduce some problems..

kalv:/tmp# rm /tmp/fakedisk1
kalv:/tmp# rm /tmp/fakedisk2
kalv:/tmp# zpool scrub blah
kalv:/tmp# zpool status blah
  pool: blah
 state: DEGRADED
status: One or more devices could not be opened.  Sufficient replicas exist for
        the pool to continue functioning in a degraded state.
action: Attach the missing device and online it using 'zpool online'.
   see: http://www.sun.com/msg/ZFS-8000-2Q
 scrub: scrub completed after 0h0m with 0 errors on Thu Jan 15 17:03:38 2009
config:

        NAME                STATE     READ WRITE CKSUM
        blah                DEGRADED     0     0     0
          raidz2            DEGRADED     0     0     0
            /tmp/realdisk1  ONLINE       0     0     0
            /tmp/realdisk2  ONLINE       0     0     0
            /tmp/fakedisk1  UNAVAIL      0     0     0  cannot open
            /tmp/fakedisk2  UNAVAIL      0     0     0  cannot open

errors: No known data errors

Still working. At this point, you can start filling blah with data.
Then after a while, let's bring in the other real disks:

kalv:/tmp# mkfile 64m realdisk3
kalv:/tmp# mkfile 64m realdisk4
kalv:/tmp# zpool replace blah /tmp/fakedisk1 /tmp/realdisk3
kalv:/tmp# zpool replace blah /tmp/fakedisk2 /tmp/realdisk4
kalv:/tmp# zpool status blah
  pool: blah
 state: ONLINE
 scrub: resilver completed after 0h0m with 0 errors on Thu Jan 15 17:04:31 2009
config:

        NAME                STATE     READ WRITE CKSUM
        blah                ONLINE       0     0     0
          raidz2            ONLINE       0     0     0
            /tmp/realdisk1  ONLINE       0     0     0
            /tmp/realdisk2  ONLINE       0     0     0
            /tmp/realdisk3  ONLINE       0     0     0
            /tmp/realdisk4  ONLINE       0     0     0

Of course, try it out a bit before doing it for real.

/Tomas
-- 
Tomas Ögren, stric at acc.umu.se, http://www.acc.umu.se/~stric/
|- Student at Computing Science, University of Umeå
`- Sysadmin at {cs,acc}.umu.se
Thanks Tomas, I haven't checked yet, but your workaround seems feasible. I've posted an RFE and referenced your approach as a workaround.

That's nearly what zpool should do under the hood, and perhaps it can be approximated in the meantime with a wrapper script that detects min(physical storage sizes) ;)

//Jim
Tomas Ögren wrote:
> On 15 January, 2009 - Jim Klimov sent me these 1,3K bytes:
>
>> Is it possible to create a (degraded) zpool with placeholders specified instead
>> of actual disks (parity or mirrors)? This is possible in Linux mdadm (the "missing"
>> keyword), so I had hoped the same could be done in Solaris, but I didn't manage to.
>> [...]
>
> You can fake it..
[snip command set]

Summary: yes, that actually works and I've done it, but it's very slow! I essentially did this myself when I migrated a 4x 2-way mirror pool to a pool of two 4-disk raidz vdevs (4x 500GB and 4x 1.5TB). I can say from experience that it works, but since I used 2 sparse files to simulate 2 disks on a single physical disk, performance sucked and it took a long time to do the migration. IIRC it took over 2 days to transfer 2TB of data. I used rsync; at the time I either didn't know about or forgot about zfs send/receive, which would probably work better. It took a couple more days to verify that everything transferred correctly with no bit rot (rsync -c).

I think Sun avoids making things like this too easy because from a business standpoint it's easier just to spend the money on enough hardware to do it properly, without the chance of data loss and the extended down time. "Doesn't invest the time in" may be a better phrase than "avoids", though. I doubt Sun actually goes out of their way to make things harder for people.

Hope that helps,
Jonathan
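For reference, the zfs send/receive alternative Jonathan mentions would look roughly like this; a minimal sketch, assuming hypothetical pool names oldpool and newpool and a freshly taken recursive snapshot:

{code}
# Snapshot the whole old pool recursively (the snapshot name is arbitrary)
zfs snapshot -r oldpool@migrate

# Replicate all datasets, properties and snapshots into the new pool
zfs send -R oldpool@migrate | zfs recv -vF -d newpool
{code}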
Jim Klimov wrote:
> Is it possible to create a (degraded) zpool with placeholders specified instead
> of actual disks (parity or mirrors)? This is possible in Linux mdadm (the "missing"
> keyword), so I had hoped the same could be done in Solaris, but I didn't manage to.

Create sparse files with the size of the disks (mkfile -n ...). Create a zpool with the free disks and the sparse files (zpool create -f ...). Then immediately put the sparse files offline (zpool offline ...). Copy the files to the new zpool, destroy the old one and replace the sparse files with the now freed-up disks (zpool replace ...).

Remember: during data migration you are running without safety belts. If a disk fails during migration you will lose data.

Daniel
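Spelled out as commands, Daniel's sequence might look like the following; this is a rough sketch only, with hypothetical device names (c1t0d0/c1t1d0 hold the old mirror, c1t2d0/c1t3d0 are the new disks) and an example placeholder size:

{code}
# Sparse placeholder files, same size as the real disks (size is an example)
mkfile -n 500g /var/tmp/fake1 /var/tmp/fake2

# Degraded raidz2 from the two new disks plus the two placeholders
zpool create -f newpool raidz2 c1t2d0 c1t3d0 /var/tmp/fake1 /var/tmp/fake2

# Take the placeholders offline before writing any data
# (note: later in this thread, Solaris 10u6 refused to offline the second one)
zpool offline newpool /var/tmp/fake1
zpool offline newpool /var/tmp/fake2

# ... copy the data over (e.g. zfs send | zfs recv), then free the old disks:
zpool destroy oldpool
zpool replace newpool /var/tmp/fake1 c1t0d0
zpool replace newpool /var/tmp/fake2 c1t1d0
{code}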
Johan Hartzenberg wrote:
On Thu, Jan 15, 2009 at 5:18 PM, Jim Klimov <jimklimov at cos.ru> wrote:
> Use-case scenario:
>
> I have a single server (or home workstation) with 4 HDD bays, sold with 2 drives.
> Initially the system was set up with a ZFS mirror for data slices. Now we got 2
> more drives and want to replace the mirror with a larger RAIDZ2 set (say I don't
> want a RAID10, which is trivial to make).
> [...]

1. Buy, borrow or steal two external USB disk enclosures (if you don't have two).
2. Install two new disks internally, and connect the other two via the USB external enclosures.
3. Set up the zpool.
4. Copy the data over.
5. Export both pools (see the sketch below).
6. Shut down.
7. Remove the two old disks.
8. Move the two disks from the external USB enclosures into the system.
9. Start back up, and ...
10. Import the new pool.

-- 
Any sufficiently advanced technology is indistinguishable from magic.
   Arthur C. Clarke

My blog: http://initialprogramload.blogspot.com
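Steps 5 to 10 are essentially an export/import cycle; a minimal sketch, assuming the pools are named oldpool and newpool (names are placeholders):

{code}
# Before shutting down:
zpool export oldpool
zpool export newpool

# After moving the disks into the internal bays and booting:
zpool import newpool
{code}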
Thanks to all those who helped, even despite the "non-enterprise approach" of this question ;)

While experimenting I discovered that Solaris /tmp doesn't seem to support sparse files: "mkfile -n" still creates full-sized files, which can either use up the swap space or not fit there. ZFS and UFS filesystems handle sparse files okay, though. This was tested on Solaris 10u4, 10u6 and OpenSolaris b103.

Other than this detail, the scenario suggested by Tomas Ögren and updated by Daniel Rock seems to work. Of these two, the variant with "zpool offline" for the sparse files is preferable: it only takes one command to complete and is more straightforward (less error-prone). The variant with removing a sparse file requires "zpool scrub", otherwise the file remains open on the filesystem and grows (consumes space) while I copy data to the test pool. The consumed space is released after "zpool scrub", when the removed file is finally unlinked from the FS. "zpool replace" works in both cases.
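A quick way to check whether a given filesystem really honours sparse files before relying on them; the size and paths here are arbitrary examples, and /var/tmp is assumed to sit on ZFS or UFS rather than tmpfs:

{code}
# On tmpfs-backed /tmp the file reportedly occupies its full size;
# on ZFS/UFS a sparse file should consume only a few blocks.
mkfile -n 512m /tmp/sparsetest
du -k /tmp/sparsetest

mkfile -n 512m /var/tmp/sparsetest
du -k /var/tmp/sparsetest
{code}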
And one more note: while I could offline both "fake drives" in the OpenSolaris tests, the Solaris 10u6 box refused to offline the second drive, since that would leave the pool without parity.

{code}
[root@t2k1 /]# zpool status test
  pool: test
 state: ONLINE
 scrub: none requested
config:

        NAME           STATE     READ WRITE CKSUM
        test           ONLINE       0     0     0
          raidz2       ONLINE       0     0     0
            /pool/rf1  ONLINE       0     0     0
            /pool/rf2  ONLINE       0     0     0
            /pool/rf3  ONLINE       0     0     0
            /pool/rf4  ONLINE       0     0     0

errors: No known data errors

[root@t2k1 /]# zpool offline test /pool/rf1
[root@t2k1 /]# zpool offline test /pool/rf2
cannot offline /pool/rf2: no valid replicas
{code}

There is no force option in either Solaris or OpenSolaris:

{code}
[root@t2k1 /]# zpool offline -f test /pool/rf2
invalid option 'f'
usage:
        offline [-t] <pool> <device> ...
{code}

So this method is okay when used with file-based fake drives (which can be unlinked and scrubbed away), but needs some more research into trickery on the supported Solaris releases (e.g. if I tried to use this trick to resize a raidz2 array of 4 real disks).

//Jim
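For file-backed placeholders, the rm-and-scrub trick shown earlier in the thread still works around this restriction when the second offline is refused; a minimal sketch using the same hypothetical paths:

{code}
# Remove the second placeholder file and scrub so ZFS marks it UNAVAIL;
# the pool then runs degraded, as in Tomas's example above.
rm /pool/rf2
zpool scrub test
zpool status test
{code}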
...and, apparently, I can replace both drives at (roughly) the same time, in two commands, and resilvering goes on in parallel:

{code}
[root@t2k1 /]# zpool status pool
  pool: pool
 state: DEGRADED
status: One or more devices could not be opened.  Sufficient replicas exist for
        the pool to continue functioning in a degraded state.
action: Attach the missing device and online it using 'zpool online'.
   see: http://www.sun.com/msg/ZFS-8000-2Q
 scrub: resilver completed after 0h0m with 0 errors on Sun Jan 18 15:11:24 2009
config:

        NAME          STATE     READ WRITE CKSUM
        pool          DEGRADED     0     0     0
          raidz2      DEGRADED     0     0     0
            c1t0d0s3  ONLINE       0     0     0
            /ff1      OFFLINE      0     0     0
            c1t2d0s3  ONLINE       0     0     0
            /ff2      UNAVAIL      0     0     0  cannot open

errors: No known data errors

[root@t2k1 /]# zpool replace pool /ff1 c1t1d0s3; zpool replace pool /ff2 c1t3d0s3
{code}

This took a while, about half a minute. Now, how is the array rebuild going?

{code}
[root@t2k1 /]# zpool status pool
  pool: pool
 state: DEGRADED
status: One or more devices is currently being resilvered.  The pool will
        continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
 scrub: resilver in progress for 0h0m, 0.48% done, 1h9m to go
config:

        NAME            STATE     READ WRITE CKSUM
        pool            DEGRADED     0     0     0
          raidz2        DEGRADED     0     0     0
            c1t0d0s3    ONLINE       0     0     0
            replacing   DEGRADED     0     0     0
              /ff1      OFFLINE      0     0     0
              c1t1d0s3  ONLINE       0     0     0
            c1t2d0s3    ONLINE       0     0     0
            replacing   DEGRADED     0     0     0
              /ff2      UNAVAIL      0     0     0  cannot open
              c1t3d0s3  ONLINE       0     0     0

errors: No known data errors
{code}

The progress meter tends to lie at first: resilvering takes roughly 30 minutes for this raidz2 of 4x 60GB slices.

BTW, an earlier poster reported very slow synchronization using real disks and sparse files on a single disk. I removed the sparse files as soon as the array was initialized, and writing to two separate drives went reasonably well. I sent data from the latest snapshot of the old pool to the new pool with

{code}
zfs send -R oldpool@20090118-02-postUpgrade | zfs recv -vF -d newpool
{code}

Larger datasets went in the normal range of 13-20 MB/s (of course, smaller datasets and snapshots weighing in at a few kilobytes took more time to open and close than to actually copy, so their estimated speed was bytes or kilobytes per second).

//Jim