Is it possible to create a (degraded) zpool with placeholders specified instead of actual disks (parity or mirrors)? This is possible in Linux mdadm (the "missing" keyword), so I had hoped the same could be done in Solaris, but I didn't manage to.

Use-case scenario:

I have a single server (or home workstation) with 4 HDD bays, sold with 2 drives. Initially the system was set up with a ZFS mirror for data slices. Now we got 2 more drives and want to replace the mirror with a larger RAIDZ2 set (say I don't want a RAID10, which is trivial to make).

Technically I think it should be possible to force creation of a degraded raidz2 array with two actual drives and two missing drives. Then I'd copy data from the old mirror pool to the new degraded raidz2 pool (zfs send | zfs recv), destroy the mirror pool and attach its two drives to "repair" the raidz2 pool.

While obviously not an "enterprise" approach, this is useful when expanding home systems, where I don't have a spare tape backup to dump my files on and restore afterwards.

I think it's an (intended?) limitation in the zpool command itself, since the kernel can very well live with degraded pools.

//Jim
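For comparison, the mdadm capability referred to above is the literal "missing" placeholder accepted in the device list; a minimal sketch with hypothetical device names, shown here for a 3-disk RAID-5 created with one member absent:

{code}
# Create a degraded RAID-5 with one member deliberately absent;
# the array comes up degraded and can be completed later with mdadm --add.
mdadm --create /dev/md0 --level=5 --raid-devices=3 /dev/sdb1 /dev/sdc1 missing
{code}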
Jim Klimov wrote:
> Is it possible to create a (degraded) zpool with placeholders specified instead
> of actual disks (parity or mirrors)? This is possible in Linux mdadm (the "missing"
> keyword), so I had hoped the same could be done in Solaris, but I didn't manage to.
> [...]
> While obviously not an "enterprise" approach, this is useful when expanding
> home systems, where I don't have a spare tape backup to dump my files on
> and restore afterwards.

I would say it is definitely not a recommended approach for those who love their data, whether "enterprise" or not. But my opinion is really a result of our environment at Sun (or any systems vendor). Being here blinds us to some opportunities. Please file an RFE at http://bugs.opensolaris.org

 -- richard
On 15 January, 2009 - Jim Klimov sent me these 1,3K bytes:

> Is it possible to create a (degraded) zpool with placeholders specified instead
> of actual disks (parity or mirrors)? This is possible in Linux mdadm (the "missing"
> keyword), so I had hoped the same could be done in Solaris, but I didn't manage to.
> [...]
> I think it's an (intended?) limitation in the zpool command itself, since the kernel
> can very well live with degraded pools.

You can fake it..

kalv:/tmp# mkfile 64m realdisk1
kalv:/tmp# mkfile 64m realdisk2
kalv:/tmp# mkfile -n 64m fakedisk1
kalv:/tmp# mkfile -n 64m fakedisk2
kalv:/tmp# ls -la real* fake*
-rw------T 1 root root 67108864 2009-01-15 17:02 fakedisk1
-rw------T 1 root root 67108864 2009-01-15 17:02 fakedisk2
-rw------T 1 root root 67108864 2009-01-15 17:02 realdisk1
-rw------T 1 root root 67108864 2009-01-15 17:02 realdisk2
kalv:/tmp# du real* fake*
65555   realdisk1
65555   realdisk2
133     fakedisk1
133     fakedisk2

In reality, realdisk* should point at real disks, but fakedisk* should still point at sparse mkfiles of the same size as your real disks (300GB or whatever).

kalv:/tmp# zpool create blah raidz2 /tmp/realdisk1 /tmp/realdisk2 /tmp/fakedisk1 /tmp/fakedisk2
kalv:/tmp# zpool status blah
  pool: blah
 state: ONLINE
 scrub: none requested
config:

        NAME                STATE     READ WRITE CKSUM
        blah                ONLINE       0     0     0
          raidz2            ONLINE       0     0     0
            /tmp/realdisk1  ONLINE       0     0     0
            /tmp/realdisk2  ONLINE       0     0     0
            /tmp/fakedisk1  ONLINE       0     0     0
            /tmp/fakedisk2  ONLINE       0     0     0

errors: No known data errors

Ok, so it's created fine. Let's "accidentally" introduce some problems..

kalv:/tmp# rm /tmp/fakedisk1
kalv:/tmp# rm /tmp/fakedisk2
kalv:/tmp# zpool scrub blah
kalv:/tmp# zpool status blah
  pool: blah
 state: DEGRADED
status: One or more devices could not be opened.  Sufficient replicas exist for
        the pool to continue functioning in a degraded state.
action: Attach the missing device and online it using 'zpool online'.
   see: http://www.sun.com/msg/ZFS-8000-2Q
 scrub: scrub completed after 0h0m with 0 errors on Thu Jan 15 17:03:38 2009
config:

        NAME                STATE     READ WRITE CKSUM
        blah                DEGRADED     0     0     0
          raidz2            DEGRADED     0     0     0
            /tmp/realdisk1  ONLINE       0     0     0
            /tmp/realdisk2  ONLINE       0     0     0
            /tmp/fakedisk1  UNAVAIL      0     0     0  cannot open
            /tmp/fakedisk2  UNAVAIL      0     0     0  cannot open

errors: No known data errors

Still working. At this point, you can start filling blah with data.
Then after a while, let's bring in the other real disks:

kalv:/tmp# mkfile 64m realdisk3
kalv:/tmp# mkfile 64m realdisk4
kalv:/tmp# zpool replace blah /tmp/fakedisk1 /tmp/realdisk3
kalv:/tmp# zpool replace blah /tmp/fakedisk2 /tmp/realdisk4
kalv:/tmp# zpool status blah
  pool: blah
 state: ONLINE
 scrub: resilver completed after 0h0m with 0 errors on Thu Jan 15 17:04:31 2009
config:

        NAME                STATE     READ WRITE CKSUM
        blah                ONLINE       0     0     0
          raidz2            ONLINE       0     0     0
            /tmp/realdisk1  ONLINE       0     0     0
            /tmp/realdisk2  ONLINE       0     0     0
            /tmp/realdisk3  ONLINE       0     0     0
            /tmp/realdisk4  ONLINE       0     0     0

Of course, try it out a bit before doing it for real.

/Tomas
-- 
Tomas Ögren, stric at acc.umu.se, http://www.acc.umu.se/~stric/
|- Student at Computing Science, University of Umeå
`- Sysadmin at {cs,acc}.umu.se
Thanks Tomas, I haven't checked yet, but your workaround seems feasible. I've posted an RFE and referenced your approach as a workaround.

That's nearly what zpool should do under the hood, and perhaps it can be approximated in the meantime with a wrapper script that detects min(physical storage sizes) ;)

//Jim
Tomas Ögren wrote:
> On 15 January, 2009 - Jim Klimov sent me these 1,3K bytes:
>
>> Is it possible to create a (degraded) zpool with placeholders specified instead
>> of actual disks (parity or mirrors)? This is possible in Linux mdadm (the "missing"
>> keyword), so I had hoped the same could be done in Solaris, but I didn't manage to.
>> [...]
>
> You can fake it..
[snip command set]

Summary: yes, that actually works and I've done it, but it's very slow! I essentially did this myself when I migrated a 4x 2-way mirror pool to a pool of two 4-disk raidz vdevs (4x 500GB and 4x 1.5TB). I can say from experience that it works, but since I used 2 sparse files to simulate 2 disks on a single physical disk, performance sucked and it took a long time to do the migration. IIRC it took over 2 days to transfer 2TB of data. I used rsync; at the time I either didn't know about or forgot about zfs send/receive, which would probably work better. It took a couple more days to verify that everything transferred correctly with no bit rot (rsync -c).

I think Sun avoids making things like this too easy because from a business standpoint it's easier just to spend the money on enough hardware to do it properly, without the chance of data loss and the extended down time. "Doesn't invest the time in" may be a better phrase than "avoids", though. I doubt Sun actually goes out of their way to make things harder for people.

Hope that helps,
Jonathan
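For reference, the zfs send/receive alternative Jonathan mentions would look roughly like this; a minimal sketch, assuming hypothetical pool names oldpool and newpool and a freshly taken recursive snapshot:

{code}
# Snapshot the whole old pool recursively (the snapshot name is arbitrary)
zfs snapshot -r oldpool@migrate

# Replicate all datasets, properties and snapshots into the new pool
zfs send -R oldpool@migrate | zfs recv -vF -d newpool
{code}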
Jim Klimov wrote:
> Is it possible to create a (degraded) zpool with placeholders specified instead
> of actual disks (parity or mirrors)? This is possible in Linux mdadm (the "missing"
> keyword), so I had hoped the same could be done in Solaris, but I didn't manage to.

Create sparse files with the size of the disks (mkfile -n ...). Create a zpool with the free disks and the sparse files (zpool create -f ...). Then immediately put the sparse files offline (zpool offline ...). Copy the files to the new zpool, destroy the old one and replace the sparse files with the now freed-up disks (zpool replace ...).

Remember: during data migration you are running without safety belts. If a disk fails during migration you will lose data.

Daniel
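Spelled out as commands, Daniel's sequence might look like the following; this is a rough sketch only, with hypothetical device names (c1t0d0/c1t1d0 hold the old mirror, c1t2d0/c1t3d0 are the new disks) and an example placeholder size:

{code}
# Sparse placeholder files, same size as the real disks (size is an example)
mkfile -n 500g /var/tmp/fake1 /var/tmp/fake2

# Degraded raidz2 from the two new disks plus the two placeholders
zpool create -f newpool raidz2 c1t2d0 c1t3d0 /var/tmp/fake1 /var/tmp/fake2

# Take the placeholders offline before writing any data
# (note: later in this thread, Solaris 10u6 refused to offline the second one)
zpool offline newpool /var/tmp/fake1
zpool offline newpool /var/tmp/fake2

# ... copy the data over (e.g. zfs send | zfs recv), then free the old disks:
zpool destroy oldpool
zpool replace newpool /var/tmp/fake1 c1t0d0
zpool replace newpool /var/tmp/fake2 c1t1d0
{code}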
Johan Hartzenberg wrote:
On Thu, Jan 15, 2009 at 5:18 PM, Jim Klimov <jimklimov at cos.ru> wrote:
> Use-case scenario:
>
> I have a single server (or home workstation) with 4 HDD bays, sold with 2 drives.
> Initially the system was set up with a ZFS mirror for data slices. Now we got 2
> more drives and want to replace the mirror with a larger RAIDZ2 set (say I don't
> want a RAID10, which is trivial to make).
> [...]

1. Buy, borrow or steal two external USB disk enclosures (if you don't have two).
2. Install two new disks internally, and connect the other two via the USB external enclosures.
3. Set up the zpool.
4. Copy the data over.
5. Export both pools (see the sketch below).
6. Shut down.
7. Remove the two old disks.
8. Move the two disks from the external USB enclosures into the system.
9. Start back up, and ...
10. Import the new pool.

-- 
Any sufficiently advanced technology is indistinguishable from magic.
   Arthur C. Clarke

My blog: http://initialprogramload.blogspot.com
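Steps 5 to 10 are essentially an export/import cycle; a minimal sketch, assuming the pools are named oldpool and newpool (names are placeholders):

{code}
# Before shutting down:
zpool export oldpool
zpool export newpool

# After moving the disks into the internal bays and booting:
zpool import newpool
{code}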
Thanks to all those who helped, even despite the "non-enterprise approach" of this question ;)

While experimenting I discovered that Solaris /tmp doesn't seem to support sparse files: "mkfile -n" still creates full-sized files, which can either use up the swap space or not fit there. ZFS and UFS filesystems handle sparse files okay, though. This was tested on Solaris 10u4, 10u6 and OpenSolaris b103.

Other than this detail, the scenario suggested by Tomas Ögren and updated by Daniel Rock seems to work. Of these two, the variant with "zpool offline" for the sparse files is preferable: it only takes one command to complete and is more straightforward (less error-prone). The variant with removing a sparse file requires "zpool scrub", otherwise the file remains open on the filesystem and grows (consumes space) while I copy data to the test pool. The consumed space is released after "zpool scrub", when the removed file is finally unlinked from the FS. "zpool replace" works in both cases.
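A quick way to check whether a given filesystem really honours sparse files before relying on them; the size and paths here are arbitrary examples, and /var/tmp is assumed to sit on ZFS or UFS rather than tmpfs:

{code}
# On tmpfs-backed /tmp the file reportedly occupies its full size;
# on ZFS/UFS a sparse file should consume only a few blocks.
mkfile -n 512m /tmp/sparsetest
du -k /tmp/sparsetest

mkfile -n 512m /var/tmp/sparsetest
du -k /var/tmp/sparsetest
{code}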
And one more note: while I could offline both "fake drives" in the OpenSolaris tests, the Solaris 10u6 box refused to offline the second drive, since that would leave the pool without parity.

{code}
[root@t2k1 /]# zpool status test
  pool: test
 state: ONLINE
 scrub: none requested
config:

        NAME           STATE     READ WRITE CKSUM
        test           ONLINE       0     0     0
          raidz2       ONLINE       0     0     0
            /pool/rf1  ONLINE       0     0     0
            /pool/rf2  ONLINE       0     0     0
            /pool/rf3  ONLINE       0     0     0
            /pool/rf4  ONLINE       0     0     0

errors: No known data errors

[root@t2k1 /]# zpool offline test /pool/rf1
[root@t2k1 /]# zpool offline test /pool/rf2
cannot offline /pool/rf2: no valid replicas
{code}

There is no force option in either Solaris or OpenSolaris:

{code}
[root@t2k1 /]# zpool offline -f test /pool/rf2
invalid option 'f'
usage:
        offline [-t] <pool> <device> ...
{code}

So this method is okay when used with file-based fake drives (which can be unlinked and scrubbed away), but needs some more research into trickery on the supported Solaris releases (e.g. if I tried to use this trick to resize a raidz2 array of 4 real disks).

//Jim
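For file-backed placeholders, the rm-and-scrub trick shown earlier in the thread still works around this restriction when the second offline is refused; a minimal sketch using the same hypothetical paths:

{code}
# Remove the second placeholder file and scrub so ZFS marks it UNAVAIL;
# the pool then runs degraded, as in Tomas's example above.
rm /pool/rf2
zpool scrub test
zpool status test
{code}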
...and, apparently, I can replace both drives at (roughly) the same time, in two commands, and resilvering goes on in parallel:

{code}
[root@t2k1 /]# zpool status pool
  pool: pool
 state: DEGRADED
status: One or more devices could not be opened.  Sufficient replicas exist for
        the pool to continue functioning in a degraded state.
action: Attach the missing device and online it using 'zpool online'.
   see: http://www.sun.com/msg/ZFS-8000-2Q
 scrub: resilver completed after 0h0m with 0 errors on Sun Jan 18 15:11:24 2009
config:

        NAME          STATE     READ WRITE CKSUM
        pool          DEGRADED     0     0     0
          raidz2      DEGRADED     0     0     0
            c1t0d0s3  ONLINE       0     0     0
            /ff1      OFFLINE      0     0     0
            c1t2d0s3  ONLINE       0     0     0
            /ff2      UNAVAIL      0     0     0  cannot open

errors: No known data errors

[root@t2k1 /]# zpool replace pool /ff1 c1t1d0s3; zpool replace pool /ff2 c1t3d0s3
{code}

This took a while, about half a minute. Now, how is the array rebuild going?

{code}
[root@t2k1 /]# zpool status pool
  pool: pool
 state: DEGRADED
status: One or more devices is currently being resilvered.  The pool will
        continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
 scrub: resilver in progress for 0h0m, 0.48% done, 1h9m to go
config:

        NAME            STATE     READ WRITE CKSUM
        pool            DEGRADED     0     0     0
          raidz2        DEGRADED     0     0     0
            c1t0d0s3    ONLINE       0     0     0
            replacing   DEGRADED     0     0     0
              /ff1      OFFLINE      0     0     0
              c1t1d0s3  ONLINE       0     0     0
            c1t2d0s3    ONLINE       0     0     0
            replacing   DEGRADED     0     0     0
              /ff2      UNAVAIL      0     0     0  cannot open
              c1t3d0s3  ONLINE       0     0     0

errors: No known data errors
{code}

The progress meter tends to lie at first: resilvering takes roughly 30 minutes for this raidz2 of 4x 60GB slices.

BTW, an earlier poster reported very slow synchronization using real disks and sparse files on a single disk. I removed the sparse files as soon as the array was initialized, and writing to two separate drives went reasonably well. I sent data from the latest snapshot of the old pool to the new pool with

{code}
zfs send -R oldpool@20090118-02-postUpgrade | zfs recv -vF -d newpool
{code}

Larger datasets went in the normal range of 13-20 MB/s (of course, smaller datasets and snapshots weighing in at a few kilobytes took more time to open and close than to actually copy, so their estimated speed was bytes or kilobytes per second).

//Jim