I'm setting up a ZFS fileserver using a bunch of spare drives. I'd like some redundancy and to maximize disk usage, so my plan was to use raid-z. The problem is that the drives are considerably mismatched, and I haven't found documentation (though I don't see why it shouldn't be possible) on striping smaller drives together to match bigger ones. The drives are: 1x750, 2x500, 2x400, 2x320, 2x250. Is it possible to accomplish the following with those drives:

raid-z
  750
  500+250=750
  500+250=750
  400+320=720
  400+320=720

and if so, what is the command? The only way I've thought of to implement this is to create striped pools for each subset of drives, create a file that fills the whole pool, and use the file as a vdev. That seems like a nasty workaround though, and an unnecessary one. If there's any trouble understanding the question I can elaborate a bit. Any advice is greatly appreciated!

This message posted from opensolaris.org
On Thu, 21 Aug 2008, John wrote:

> I'm setting up a ZFS fileserver using a bunch of spare drives. I'd
> like some redundancy and to maximize disk usage, so my plan was to
> use raid-z. The problem is that the drives are considerably
> mismatched and I haven't found documentation (though I don't see why
> it shouldn't be possible) to stripe smaller drives together to match
> bigger ones. The drives are: 1x750, 2x500, 2x400, 2x320, 2x250. Is
> it possible to accomplish the following with those drives:

The ZFS vdev will only use up to the size of the smallest device in it. If your smallest device is 250GB and another device in the same vdev is 350GB, then 100GB of that device will be ignored.

While I would not really recommend it, a way out of the predicament is to use partitioning (via 'format') to partition a large drive into several smaller partitions which are similar in size to your smaller drives. The reason why this is not recommended is that a single drive failure could then take out several logical devices, and the vdev and pool could be toast. With care, it could work ok with simple mirrors, but mirrors waste 1/2 the physical disk space.

The better solution is to try to build your vdevs out of similar-sized disk drives from the start.

Bob
======================================
Bob Friesenhahn
bfriesen at simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,    http://www.GraphicsMagick.org/
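To make the warning concrete, here is a minimal, untested sketch of the partitioning approach Bob describes; the device names (c1t0d0 and so on) are made up for illustration:

  # With format(1M), split the 750GB drive (say c1t0d0) into a ~500GB slice s0
  # and a ~250GB slice s1, then pair each slice with the matching whole disks:
  zpool create tank \
      raidz c2t0d0 c2t1d0 c1t0d0s0 \
      raidz c3t0d0 c3t1d0 c1t0d0s1
  # First raidz: the two 500GB disks plus the 500GB slice.
  # Second raidz: the two 250GB disks plus the 250GB slice.
  # Caveat: the 750 now sits in both vdevs, so one dead drive degrades
  # both of them at once -- which is exactly why Bob advises against it.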
Hi,

John wrote:
> I'm setting up a ZFS fileserver using a bunch of spare drives. I'd like some redundancy and to maximize disk usage, so my plan was to use raid-z. The problem is that the drives are considerably mismatched and I haven't found documentation (though I don't see why it shouldn't be possible) to stripe smaller drives together to match bigger ones. The drives are: 1x750, 2x500, 2x400, 2x320, 2x250. Is it possible to accomplish the following with those drives:
>
> raid-z
> 750
> 500+250=750
> 500+250=750
> 400+320=720
> 400+320=720

Though I've never used this in production, it seems possible to layer ZFS on good old SDS (aka SVM, disksuite).

At least I managed to create a trivial pool on what-10-mins-ago-was-my-swap-slice:

haggis:/var/tmp# metadb -f -a -c 3 /dev/dsk/c5t0d0s7
haggis:/var/tmp# metainit d10 1 1 /dev/dsk/c5t0d0s1
d10: Concat/Stripe is setup
haggis:/var/tmp# zpool create test /dev/md/dsk/d10
haggis:/var/tmp# zpool status test
  pool: test
 state: ONLINE
 scrub: none requested
config:

        NAME               STATE     READ WRITE CKSUM
        test               ONLINE       0     0     0
          /dev/md/dsk/d10  ONLINE       0     0     0

So it looks like you could do the following:

* Put a small slice (10-20MB should suffice; by convention it's slice 7 on the first cylinders) on each of your disks and make them the metadb, if you are not using SDS already:

    metadb -f -a -c 3 <all your slices_7>

  Make slice 0 the remainder of each disk.

* For your 500/250G and 400/320G drives, create a concat (stripe not possible) for each pair. For clarity, I'd recommend including the 750G disk as well (syntax from memory, apologies if I'm wrong with details):

    metainit d11 1 1 <750G disk>s0
    metainit d12 2 1 <500G disk>s0 1 <250G disk>s0
    metainit d13 2 1 <500G disk>s0 1 <250G disk>s0
    metainit d14 2 1 <400G disk>s0 1 <320G disk>s0
    metainit d15 2 1 <400G disk>s0 1 <320G disk>s0

* Create a raidz pool on your metadevices:

    zpool create <name> raidz /dev/md/dsk/d11 /dev/md/dsk/d12 /dev/md/dsk/d13 /dev/md/dsk/d14 /dev/md/dsk/d15

Again: I have never tried this, so please don't blame me if this doesn't work.

Nils

This message posted from opensolaris.org
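If anyone tries Nils's recipe, here is a small, equally untested sketch of how one might sanity-check the result before trusting data to it (the pool name bigpool is just an example):

  metastat                    # each of d11-d15 should report roughly 720-750GB
  zpool create bigpool raidz /dev/md/dsk/d11 /dev/md/dsk/d12 \
      /dev/md/dsk/d13 /dev/md/dsk/d14 /dev/md/dsk/d15
  zpool list bigpool          # sanity-check the reported size before copying data
  zpool scrub bigpool         # exercise every member end to end once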
About the best I can see:

  zpool create dirtypool raidz 250a 250b 320a raidz 320b 400a 400b raidz 500a 500b 750a

And you have to do them in that order. Each raidz vdev will be created using the size of its smallest device. This gets you about 2140GB (500 + 640 + 1000) of space. Your desired method gets 2880GB (720 * 4) but is WAY harder to set up and maintain, especially if you get into the SDS configuration.

I, for one, welcome our convoluted configuration overlords. I'd also like to see what the zpool looks like if it works. This is, obviously, untested.

chris

On Fri, Aug 22, 2008 at 11:03 AM, Nils Goroll <nils.goroll at hamburg.de> wrote:

> Hi,
>
> John wrote:
> > I'm setting up a ZFS fileserver using a bunch of spare drives. [...]
>
> Though I've never used this in production, it seems possible to layer ZFS
> on good old SDS (aka SVM, disksuite).
> [...]
> Again: I have never tried this, so please don't blame me if this doesn't
> work.
>
> Nils

--
chris -at- microcozm -dot- net
=== Si Hoc Legere Scis Nimium Eruditionis Habes
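Roughly what I'd expect zpool status to show for dirtypool if it works - a guess modeled on the output format Nils pasted above, not an actual run, with 250a and friends standing in for real cXtYdZ device names:

  # zpool status dirtypool
    pool: dirtypool
   state: ONLINE
   scrub: none requested
  config:

          NAME         STATE     READ WRITE CKSUM
          dirtypool    ONLINE       0     0     0
            raidz1     ONLINE       0     0     0
              250a     ONLINE       0     0     0
              250b     ONLINE       0     0     0
              320a     ONLINE       0     0     0
            raidz1     ONLINE       0     0     0
              320b     ONLINE       0     0     0
              400a     ONLINE       0     0     0
              400b     ONLINE       0     0     0
            raidz1     ONLINE       0     0     0
              500a     ONLINE       0     0     0
              500b     ONLINE       0     0     0
              750a     ONLINE       0     0     0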
Chris Cosby wrote:
> About the best I can see:
>
> zpool create dirtypool raidz 250a 250b 320a raidz 320b 400a 400b raidz
> 500a 500b 750a
>
> And you have to do them in that order. Each raidz vdev will be created
> using the size of its smallest device. This gets you about 2140GB
> (500 + 640 + 1000) of space. Your desired method gets 2880GB (720 * 4)
> but is WAY harder to set up and maintain, especially if you get into
> the SDS configuration.
>
> I, for one, welcome our convoluted configuration overlords. I'd also
> like to see what the zpool looks like if it works. This is, obviously,
> untested.

I don't think I'd be that comfortable doing it, but I suppose you could just add each drive as a separate vdev and set copies=2. Even that would only get you about 1845GB (if my math is right, the disks add up to 3690GB).

-Kyle

> [...]
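A sketch of what Kyle is describing, hedged the same way: every drive becomes its own top-level vdev and copies=2 provides the only redundancy. The device and pool names are placeholders, and whether this actually survives the loss of a whole disk is debated further down the thread:

  zpool create junkpool c1t0d0 c1t1d0 c1t2d0 c1t3d0 c1t4d0 \
      c1t5d0 c1t6d0 c1t7d0 c1t8d0
  zfs set copies=2 junkpool     # applies only to data written from now on
  zfs get copies junkpool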
Heikki Suonsivu on list forwarder
2008-Aug-23 22:04 UTC
[zfs-discuss] Possible to do a stripe vdev?
Kyle McDonald wrote:
> Chris Cosby wrote:
>> About the best I can see:
>>
>> zpool create dirtypool raidz 250a 250b 320a raidz 320b 400a 400b raidz
>> 500a 500b 750a
>>
>> And you have to do them in that order. Each raidz vdev will be created
>> using the size of its smallest device. This gets you about 2140GB
>> (500 + 640 + 1000) of space. Your desired method gets 2880GB (720 * 4)
>> but is WAY harder to set up and maintain, especially if you get into
>> the SDS configuration.
>>
>> I, for one, welcome our convoluted configuration overlords. I'd also
>> like to see what the zpool looks like if it works. This is, obviously,
>> untested.
>
> I don't think I'd be that comfortable doing it, but I suppose you could
> just add each drive as a separate vdev and set copies=2. Even that would
> only get you about 1845GB (if my math is right, the disks add up to 3690GB).
>
> -Kyle

There seems to be confusion about whether this works or not.

- Marketing speak says metadata is redundant and, with at least two
  disks, is distributed across at least two disks.

- With copies=2 on a filesystem, the same should apply to file data.

- Which should mean that the above configuration should be redundant and
  tolerate the loss of one disk.

- People having trouble on the list say that it does not work: if, for any
  reason after a disk failure, the system shuts down, crashes, etc., you
  cannot mount the pools - they are in an unavailable state - even though
  according to the marketing speak it should be possible to mount them, carry
  on, recover all files with copies=2+, and get a report of which of the
  remaining files are bad.

- So, the QUESTION is: is the marketing speak totally bogus, or is there
  missing code/a bug/etc. which prevents bringing a pool with a lost disk
  online (looping back to the first question)?

Heikki

> [...]
Heikki Suonsivu on list forwarder
2008-Aug-24 10:26 UTC
[zfs-discuss] Redundancy with a stripe vdev and copies=2
Nils Goroll wrote:
> Hi,
>
> Heikki Suonsivu on list forwarder wrote:
>> - So, the QUESTION is: is the marketing speak totally bogus, or is there
>> missing code/a bug/etc. which prevents bringing a pool with a lost disk
>> online (looping back to the first question)?
>
> Besides those practical aspects, for me the question is whether there is any
> guarantee that with copies>=2, all copies will be placed on different
> vdevs if possible.

That is what the manual page says:

  : The copies are stored on different disks, if possible.

(Though "if possible" is not defined. I have not peeked into the code yet.)

> Nils

Heikki
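For reference, the knob being discussed, with example pool and filesystem names; note that only data written after the change gets the extra copies:

  zfs set copies=2 tank/important
  zfs get copies tank/important
  # NAME            PROPERTY  VALUE   SOURCE
  # tank/important  copies    2       local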
Heikki Suonsivu on list forwarder wrote:
> Kyle McDonald wrote:
>> Chris Cosby wrote:
>>> About the best I can see:
>>>
>>> zpool create dirtypool raidz 250a 250b 320a raidz 320b 400a 400b raidz
>>> 500a 500b 750a
>>> [...]
>>
>> I don't think I'd be that comfortable doing it, but I suppose you could
>> just add each drive as a separate vdev and set copies=2. Even that would
>> only get you about 1845GB (if my math is right, the disks add up to 3690GB).
>>
>> -Kyle
>
> There seems to be confusion about whether this works or not.

Of course it works as designed...

> - Marketing speak says metadata is redundant and, with at least two
>   disks, is distributed across at least two disks.

I'm not sure what "marketing speak" you're referring to; there are very few marketing materials for ZFS. Do you have a pointer?

> - With copies=2 on a filesystem, the same should apply to file data.

Yes, by definition copies=2 makes the data doubly redundant and the metadata triply redundant.

> - Which should mean that the above configuration should be redundant and
>   tolerate the loss of one disk.

It depends on the failure mode. If the disk suffers a catastrophic death, then you are in a situation where the full set of top-level vdevs is no longer available. Depending on the exact configuration, loss of a top-level vdev will cause the pool to not be importable. For the more common failure modes, it should recover nicely. I believe that the most common use cases for copies=2 are truly important data or the single-vdev case.

> - People having trouble on the list say that it does not work: if, for any
>   reason after a disk failure, the system shuts down, crashes, etc., you
>   cannot mount the pools - they are in an unavailable state - even though
>   according to the marketing speak it should be possible to mount them, carry
>   on, recover all files with copies=2+, and get a report of which of the
>   remaining files are bad.

It depends on the failure mode. Most of the ZFS versions out there do have the ability to identify files which have been corrupted, via "zpool status -xv".

> - So, the QUESTION is: is the marketing speak totally bogus, or is there
>   missing code/a bug/etc. which prevents bringing a pool with a lost disk
>   online (looping back to the first question)?

Real life dictates that there is no one, single, true answer -- just a series of trade-offs. If you ask me, I say make your data redundant by at least one method. More redundancy is better.

more below...

> Heikki
>
>>>> John wrote:
>>>> > I'm setting up a ZFS fileserver using a bunch of spare drives. I'd
>>>> > like some redundancy and to maximize disk usage, so my plan was to
>>>> > use raid-z. [...] The drives are: 1x750, 2x500, 2x400, 2x320,
>>>> > 2x250. Is it possible to accomplish the following with those drives:
>>>> > [...]

Don't worry about how much space you have; worry about how much space you need, over time. Consider growing your needs into the space over time. For example, if you need 100 GBytes today, 400 GBytes in 6 months, and 1 TByte next year, then start with:

  zpool create mypool mirror 320 320
  [turn off the remaining disks, no need to burn the power or lifetime]

in 6 months:

  zpool add mypool mirror 400 400

next year:

  zpool add mypool mirror 500 500

US street price for disks runs about $100, but density increases over time, so you could also build in a migration every two years or so:

  zpool replace mypool 320 1500   [do this for each side]

-- richard
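Spelled out with placeholder device names (the cXtYdZ names are made up; this is the same plan Richard outlines, not a tested transcript):

  # today: one mirrored pair of 320s
  zpool create mypool mirror c1t2d0 c1t3d0

  # in 6 months: add the 400s as a second mirrored pair
  zpool add mypool mirror c1t4d0 c1t5d0

  # next year: add the 500s
  zpool add mypool mirror c1t6d0 c1t7d0

  # every couple of years, swap in bigger disks one side at a time,
  # letting the mirror resilver before touching the other side
  zpool replace mypool c1t2d0 c2t0d0
  zpool replace mypool c1t3d0 c2t1d0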
[clarification below...]

Richard Elling wrote:
> Heikki Suonsivu on list forwarder wrote:
>> There seems to be confusion about whether this works or not.
> [...]
>> - Which should mean that the above configuration should be redundant and
>>   tolerate the loss of one disk.
>
> It depends on the failure mode. If the disk suffers a catastrophic
> death, then you are in a situation where the full set of top-level
> vdevs is no longer available. Depending on the exact configuration,
> loss of a top-level vdev will cause the pool to not be importable.

Clarification: the assumption I made here was that the top-level vdev is not protected. If the top-level vdev is protected (mirror, raidz[12]), then the loss of a disk will still leave the pool importable. The key concept here is that the copies feature works above and in addition to any vdev redundancy.

-- richard

> For the more common failure modes, it should recover nicely.
> I believe that the most common use cases for copies=2 are truly
> important data or the single-vdev case.
> [...]
Heikki Suonsivu on list forwarder
2008-Aug-25 06:25 UTC
[zfs-discuss] Possible to do a stripe vdev?
Richard Elling wrote:
> Heikki Suonsivu on list forwarder wrote:
>> There seems to be confusion about whether this works or not.
>
> Of course it works as designed...
>
>> - Marketing speak says metadata is redundant and, with at least two
>>   disks, is distributed across at least two disks.
>
> I'm not sure what "marketing speak" you're referring to; there are
> very few marketing materials for ZFS. Do you have a pointer?

This particular claim is from the zfs manual page.

>> - With copies=2 on a filesystem, the same should apply to file data.
>
> Yes, by definition copies=2 makes the data doubly redundant and
> the metadata triply redundant.
>
>> - Which should mean that the above configuration should be redundant and
>>   tolerate the loss of one disk.
>
> It depends on the failure mode. If the disk suffers a catastrophic
> death, then you are in a situation where the full set of top-level
> vdevs is no longer available. Depending on the exact configuration,
> loss of a top-level vdev will cause the pool to not be importable.
> For the more common failure modes, it should recover nicely.
> I believe that the most common use cases for copies=2 are truly
> important data or the single-vdev case.

Out of the last ten failures or so, I remember increasing bad blocks in two cases, a drive dying within a few minutes of startup in one case, and total drive death in all the others - funky noises or none at all. So let's assume this is the most common case: the drive simply dies. To simplify the question, let's assume it is replaced with an empty one (as one would do with a real RAID). That would mean that every block read from that disk returns data which does not match its checksum. As all metadata and data have been written to multiple drives, this should be a recoverable situation, and the machine should come up with all files accessible, with some log warnings about the situation. Why would it not be?

>> - People having trouble on the list say that it does not work: if, for any
>>   reason after a disk failure, the system shuts down, crashes, etc., you
>>   cannot mount the pools - they are in an unavailable state - even though
>>   according to the marketing speak it should be possible to mount them, carry
>>   on, recover all files with copies=2+, and get a report of which of the
>>   remaining files are bad.
>
> It depends on the failure mode. Most of the ZFS versions out there
> do have the ability to identify files which have been corrupted, via
> "zpool status -xv".
> [...]
>
> -- richard
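For what it's worth, the mechanics Heikki describes map onto something like the following (pool and device names are examples, and whether the import succeeds in the first place is exactly the open question in this thread):

  # disk c1t4d0 has died and been swapped for a blank drive
  zpool status -xv tank        # report unhealthy pools and any damaged files
  zpool replace tank c1t4d0    # resilver onto the new disk in the same slot
  zpool status tank            # watch the resilver progress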