Hi,

I have a strange requirement. My pool consists of two 500GB disks in a
stripe, which I am trying to convert into a RAIDZ setup without data
loss, but I have only two additional disks: 750GB and 1TB. So, here is
what I thought:

1. Carve a 500GB slice (A) out of the 750GB disk and two 500GB slices
   (B, C) out of the 1TB disk.
2. Create a RAIDZ pool out of these 3 slices. Performance will be bad
   because of seeks on the same disk for B and C, but it's just
   temporary.
3. zfs send | recv my current pool data into the new pool.
4. Destroy the current pool.
5. In the new pool, replace B with a 500GB disk freed by the
   destruction of the current pool.
6. Optionally, replace C with the second 500GB disk to free up the
   750GB disk completely.

So, essentially I have slices from 3 separate disks giving me my needed
1TB of space. The additional 500GB on the 1TB drive can be used for
scratch, non-important data, or maybe even mirrored with a slice from
the 750GB disk.

Will this work as I am hoping it should? Any potential gotchas?

Thanks a bunch!

-devsk
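P.S. For concreteness, here is roughly the command sequence I have in
mind, assuming hypothetical device names (c1t0d0/c1t1d0 = the existing
500GB disks, c1t2d0 = 750GB, c1t3d0 = 1TB) and slices already laid out
with format(1M):

    # A = c1t2d0s0, B = c1t3d0s0, C = c1t3d0s1 (placeholder names)
    zpool create newpool raidz c1t2d0s0 c1t3d0s0 c1t3d0s1

    zfs snapshot -r oldpool@migrate
    zfs send -R oldpool@migrate | zfs recv -F -d newpool

    zpool destroy oldpool
    zpool replace newpool c1t3d0s0 c1t0d0   # step 5: B -> freed 500GB disk
    zpool replace newpool c1t3d0s1 c1t1d0   # step 6: C -> second 500GB disk

I would wait for each resilver to finish (zpool status) before issuing
the next replace.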
On 04/20/10 04:13 PM, Sunil wrote:
> Hi,
>
> I have a strange requirement. My pool consists of 2 500GB disks in
> stripe which I am trying to convert into a RAIDZ setup without data
> loss but I have only two additional disks: 750GB and 1TB. So, here is
> what I thought:
>
> 1. Carve a 500GB slice (A) in 750GB and 2 500GB slices (B,C) in 1TB.
> 2. Create a RAIDZ pool out of these 3 slices. Performance will be bad
>    because of seeks in the same disk for B and C but its just
>    temporary.

If the 1TB drive fails, you're buggered. So there's not a lot of point
setting up a raidz. You may as well create a pool on the 1TB drive and
copy to that.

> 3. zfs send | recv my current pool data into the new pool.
> 4. Destroy the current pool.
> 5. In the new pool, replace B with the 500GB disk freed by the
>    destruction of the current pool.
> 6. Optionally, replace C with second 500GB to free up the 750GB
>    completely.

Or use the two 500GB and the 750GB drive for the raidz.

--
Ian.
> On 04/20/10 04:13 PM, Sunil wrote:
>> 1. Carve a 500GB slice (A) in 750GB and 2 500GB slices (B,C) in 1TB.
>> 2. Create a RAIDZ pool out of these 3 slices. Performance will be
>>    bad because of seeks in the same disk for B and C but its just
>>    temporary.
>
> If the 1TB drive fails, you're buggered. So there's not a lot of
> point setting up a raidz.

This is temporary. Please read my post again.

> You may as well create a pool on the 1TB drive and copy to that.
>
> Or use the two 500GB and the 750 GB drive for the raidz.

And lose my existing data on those 2 500GB disks? Please, at least read
the post before replying....:(
On 04/20/10 05:00 PM, Sunil wrote:
>> If the 1TB drive fails, you're buggered. So there's not a lot of
>> point setting up a raidz.
>
> This is temporary. Please read my post again.

I know; the comment still stands.

>> You may as well create a pool on the 1TB drive and copy to that.
>>
>> Or use the two 500GB and the 750 GB drive for the raidz.
>
> And lose my existing data on those 2 500GB disks?

Copy it back from the temporary pool; you are replacing your existing
pool, aren't you? So you'll lose the data on it regardless.

> Please, at least read the post before replying....:(

I did.

--
Ian.
On Tue, Apr 20, 2010 at 12:07 PM, Ian Collins <ian at ianshome.com> wrote:
>> And lose my existing data on those 2 500GB disks?
>
> Copy it back from the temporary pool, you are replacing your existing
> pool, aren't you? So you'll lose the data on it regardless.
>
>> Please, at least read the post before replying....:(
>
> I did.

To make it a little bit easier to read:

Current condition:
- 2x 500GB disks striped, contains the data
- empty 750GB and 1TB disks

Target:
- 3x 500GB raidz, with the third disk coming from the 1TB/750GB

Originally proposed method:
- create a 3x 500GB raidz using a 500GB slice of the 750GB disk and
  2x 500GB slices of the 1TB disk
- move data from the striped source to the new raidz pool
- destroy the striped pool
- replace one 500GB slice on the 1TB disk with a 500GB disk from the
  previously striped pool
- optionally replace the 500GB slice on the 750GB (or 1TB) disk with
  the other previously striped 500GB disk

Method suggested by Ian:
- create a 1TB pool using the 1TB disk
- move data from the striped source to that temporary pool
- destroy the striped pool
- create a 3x 500GB raidz pool using the 2x 500GB disks and the 750GB
  disk
- move the data back to the new raidz pool

Using the former method will be slower because it puts two pool
components on one disk, increasing the write load on it.

--
O< ascii ribbon campaign - stop html mail - www.asciiribbon.org
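A rough sketch of Ian's method in commands, reusing the hypothetical
device names from earlier in the thread (c1t0d0/c1t1d0 = 500GB,
c1t2d0 = 750GB, c1t3d0 = 1TB):

    zpool create temppool c1t3d0                  # whole 1TB disk
    zfs snapshot -r oldpool@move1
    zfs send -R oldpool@move1 | zfs recv -F -d temppool
    zpool destroy oldpool

    zpool create tank raidz c1t0d0 c1t1d0 c1t2d0  # 2x 500GB + 750GB
    zfs snapshot -r temppool@move2
    zfs send -R temppool@move2 | zfs recv -F -d tank
    zpool destroy temppool                        # frees the 1TB drive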
Ouch! My apologies! I did not understand what you were trying to say.

I was gearing towards:

1. Using the newer 1TB in the eventual RAIDZ. Newer hardware typically
   means (slightly) faster access times and sequential throughput.
2. Getting the RAIDZ serviceable quickly. Your method will cause two
   full copy operations. Data will likely be copied to the same extent
   with my method, but it will become and remain available (almost) all
   the time (minus the 1TB failing on me during the transition).
On Tue, Apr 20, 2010 at 12:32 PM, Sunil <funtoos at yahoo.com> wrote:
> I was gearing towards:
>
> 1. Using the newer 1TB in the eventual RAIDZ. Newer hardware
>    typically means (slightly) faster access times and sequential
>    throughput.
> 2. Getting the RAIDZ serviceable quick. Your method will cause two
>    full copy operations. Data will likely be copied to the same
>    extent with my method but it will become and remain available
>    (almost) all the time (minus 1TB failing on me during the
>    transition).

You could probably take the second slice on the 1TB offline after
creating the raidz pool, to increase speed.

--
O< ascii ribbon campaign - stop html mail - www.asciiribbon.org
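That might look something like this, continuing with the hypothetical
device names used earlier in the thread:

    zpool create newpool raidz c1t2d0s0 c1t3d0s0 c1t3d0s1
    zpool offline newpool c1t3d0s1   # deliberately degrade: one slice per disk
    # ... zfs send | zfs recv the data into the degraded pool ...
    zpool replace newpool c1t3d0s1 c1t0d0   # later: swap in a freed 500GB disk

The copy then runs without the double seeks on the 1TB disk, at the
cost of having no redundancy in the new pool until the replace
completes.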
On 04/20/10 05:32 PM, Sunil wrote:
> ouch! My apologies! I did not understand what you were trying to say.
>
> I was gearing towards:
>
> 1. Using the newer 1TB in the eventual RAIDZ. Newer hardware
>    typically means (slightly) faster access times and sequential
>    throughput.

Using a slice on a newer 1TB drive will probably be slower than using
the whole of a 750GB one (write cache and all that).

> 2. Getting the RAIDZ serviceable quick. Your method will cause two
>    full copy operations. Data will likely be copied to the same
>    extent with my method but it will become and remain available
>    (almost) all the time (minus 1TB failing on me during the
>    transition).

It may take two copies, but:

a) you end up with a better solution;
b) I've never been inclined to try it, but a raidz with two slices on
   one drive will probably run like a three-legged dog! Replicating 1TB
   of data doesn't take that long, so two copies with sensible pool
   topologies may be quicker than one with a bad one;
c) you will have a spare 1TB drive to put in a USB enclosure and use
   for backups!

--
Ian.
On Apr 20, 2010, at 12:13 AM, Sunil <funtoos at yahoo.com> wrote:
> I have a strange requirement. My pool consists of 2 500GB disks in
> stripe which I am trying to convert into a RAIDZ setup without data
> loss but I have only two additional disks: 750GB and 1TB.
> [...]
> Will this work as I am hoping it should?
>
> Any potential gotchas?

Wouldn't it just be easier to zfs send to a file on the 1TB, build your
raidz, then zfs recv into the new raidz from this file?

-Ross
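A rough sketch of that approach (placeholder names again; one caveat
worth hedging: a send stream stored in a file is not verified until it
is received, so a single corrupted block in the file makes the whole
receive fail):

    zpool create temppool c1t3d0                  # plain pool on the 1TB
    zfs snapshot -r oldpool@migrate
    zfs send -R oldpool@migrate > /temppool/oldpool.zsend
    zpool destroy oldpool
    zpool create tank raidz c1t0d0 c1t1d0 c1t2d0  # 2x 500GB + 750GB
    zfs recv -F -d tank < /temppool/oldpool.zsend
    zpool destroy temppool                        # frees the 1TB drive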
On Mon, Apr 19, 2010 at 9:13 PM, Sunil <funtoos at yahoo.com> wrote:
> Any potential gotchas?

As others mentioned, doing raidz with two slices on the same disk is
pointless from a redundancy perspective. You may as well just create a
pool using only the 1TB drive, copy the data over, then create a raidz
using the 500s and the 750.

How much data is in the current pool? Depending on the files, you may
be able to create a mirror using the 750GB & 1TB drives and fit your
data using compression and/or dedup.

-B

--
Brandon High : bhigh at freaks.com
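If the data fits, that alternative might look like this (a sketch with
placeholder device names; the mirror is limited to the smaller 750GB
disk's capacity, and dedup needs a lot of RAM to perform well):

    zpool create tank mirror c1t2d0 c1t3d0   # ~750GB usable
    zfs set compression=on tank
    zfs set dedup=on tank                    # optional, RAM-hungry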
Ian Collins wrote:
> On 04/20/10 04:13 PM, Sunil wrote:
>> 1. Carve a 500GB slice (A) in 750GB and 2 500GB slices (B,C) in 1TB.
>> 2. Create a RAIDZ pool out of these 3 slices. Performance will be
>>    bad because of seeks in the same disk for B and C but its just
>>    temporary.
>
> If the 1TB drive fails, you're buggered. So there's not a lot of
> point setting up a raidz.

It is possible to survive the failure of a single drive that has
multiple slices in the same pool. It requires using a RAIDZ level equal
to or greater than the number of slices on that drive: RAIDZ2 on a 1TB
drive with two slices will survive the same as RAIDZ1 with one slice.

(I'm focusing on data survival here. Performance will be worse than
usual, but even this impact may be mitigated by using a dedicated ZIL.
Remote and cloud-based storage using remote iSCSI devices and local ZIL
devices has been shown to have much better performance characteristics
than would otherwise be expected from a cloud-based system. See
http://blogs.sun.com/jkshah/entry/zfs_with_cloud_storage_and )

With RAIDZ3, you can survive the loss of one drive with 3 slices on it
that are all in one pool. (Of course, at that point you can't handle
any further failures.) Reliability with this kind of configuration is
at worst equal to RAIDZ1, but likely better on average, because you can
tolerate some specific multiple-drive failure combinations that RAIDZ1
cannot handle. A similar comparison might be made between the
reliability of a 4-drive RAIDZ2 pool and 4 drives in a stripe-mirror
arrangement: you get similar usable space, but in one case you can lose
any 2 drives, while in the other you can lose any 1 drive and only some
combinations of 2 drives.

I shared a variation of this idea a while ago in a comment here:
http://blogs.sun.com/ahl/entry/expand_o_matic_raid_z

> You may as well create a pool on the 1TB drive and copy to that.
>
> Or use the two 500GB and the 750 GB drive for the raidz.

An option to get all drives included:

1. Move all data to the 1TB drive.
2. Create a RAIDZ1/RAIDZ2 pool using the 2x 500GB drives, the 750GB
   drive, and a sparse file that you delete right after the pool is
   created. Your pool will be degraded by the deletion of the sparse
   file but will still work (because it is a RAIDZ). Use RAIDZ2 if you
   want ZFS's protections to be active immediately (as you'll have 3
   out of 4 devices available).
3. Move all data from the 1TB drive to the RAIDZ pool.
4. Replace the sparse-file device with the 1TB drive (or a 500GB slice
   of the 1TB drive).
5. Resilver the pool.

A variation on this is to create a RAIDZ2 using the 2x 500GB drives,
the 750GB drive, and 2 sparse files. After the data is moved from the
1TB drive to the RAIDZ2, two 500GB slices are created on the 1TB drive,
and these 2 slices are in turn used to replace the 2 sparse files.
You'll end up with 3x 500GB of usable space and protection from at
least 1 drive failure (the 1TB drive) up to 2 drive failures (any of
the other drives). A command sketch is below.
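A minimal sketch of the sparse-file trick, using mkfile(1M) and
hypothetical device names; the file must be declared at least as large
as the smallest real device, but being sparse it allocates no actual
space:

    mkfile -n 500g /var/tmp/fake0
    zpool create tank raidz2 c1t0d0 c1t1d0 c1t2d0 /var/tmp/fake0
    zpool offline tank /var/tmp/fake0    # take it out before deleting it
    rm /var/tmp/fake0

    # ... move the data in from the 1TB drive ...

    zpool replace tank /var/tmp/fake0 c1t3d0   # swap in the real drive

If the original path no longer identifies the missing vdev, zpool
status prints a numeric guid for it that can be used in the replace
command instead.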
Performance caveats of 2 slices on one drive apply. If you like, you can later add a fifth drive relatively easily by replacing one of the slices with a whole drive.
> If you like, you can later add a fifth drive relatively easily by
> replacing one of the slices with a whole drive.

How does this affect my available storage if I were to replace both of
those sparse 500GB files with a real 1TB drive? Will it be the same, or
will I have expanded my storage? If I understand correctly, I would
need to replace the other 3 drives with 1TB drives as well to expand
beyond 3x 500GB.

So, in essence, I can go from 3x 500GB to 3x 1000GB in place with this
scheme in the future, if I have the money to upgrade all the drives to
1TB, WITHOUT needing any movement of data to temp? Please say
yes!....:-)
Sunil wrote:
>> If you like, you can later add a fifth drive relatively easily by
>> replacing one of the slices with a whole drive.
>
> So, in essence, I can go from 3x 500GB to 3x 1000GB in place with
> this scheme in the future, if I have the money to upgrade all the
> drives to 1TB, WITHOUT needing any movement of data to temp? Please
> say yes!....:-)

It should work to replace devices the way you describe. The only time
you need some temporary storage space is if you want to change the
arrangement of devices that make up the pool, e.g. to go from striped
mirrors to RAIDZ2, or RAIDZ1 to RAIDZ2, or some other combination. If
you just want to replace devices with identically sized or larger
devices, you don't need to move the data anywhere.

The capacity will expand to the lowest common denominator. In some
OpenSolaris builds I believe this happened automatically when all
member devices had been upgraded. In later builds I think it was
changed to require manual intervention, to prevent problems (like the
pool suddenly growing to fill all the new big drives when the admin
really wanted the unused space to stay unused, say for
partition/slice-based short stroking, or when smaller drives were being
kept around as spares). If ZFS had the ability to shrink and use
smaller devices, this would not have been as big of a problem.

As I understand it from the documentation, replacement can happen two
ways. First, you can connect the replacement device to the system while
the original device is still working, and then issue the zpool replace
command. I think this technique is safe, as the original device is
still available during the replacement procedure and could be used to
provide redundancy to the rest of the pool until the new device
finishes resilvering. (Does anyone know if this is really the case,
i.e. whether redundancy is preserved during the replacement operation
when both original and new devices are connected simultaneously and
both are functioning correctly? One way to verify this might be to run
zpool replace on a non-redundant pool while both devices are
connected.)

The second way is to (physically) disconnect the original device and
connect the new device in its place. The pool will be degraded because
a member device is missing: if you have RAIDZ1, you have no redundancy
remaining; if you have RAIDZ2, you still have 1 level of redundancy
intact. The zpool replace command should be able to rebuild the missing
data onto the replacement device.
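A sketch of that in-place upgrade, with hypothetical device names:

    # replace each 500GB unit with a 1TB drive, one at a time,
    # letting each resilver finish (watch zpool status)
    zpool replace tank c1t0d0 c2t0d0
    zpool replace tank c1t1d0 c2t1d0
    # ... and so on for the remaining members ...

    zpool set autoexpand=on tank   # on builds with the autoexpand property

On builds that predate the autoexpand pool property, exporting and
re-importing the pool after the last replacement has traditionally
triggered the capacity expansion.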