Posted for my friend Marko:

I've been reading up on ZFS with the idea of building a home NAS.

My ideal home NAS would have:

- high performance via striping
- fault tolerance through selective use of the multiple-copies attribute
- low cost by getting the most efficient space utilization possible (not raidz, not mirroring)
- scalability

I was hoping to start with 4 1TB disks in a single striped pool, with only some filesystems set to copies=2.

I would be able to survive a single disk failure for any of my data that was on a copies=2 filesystem (trusting that I had enough free space across multiple disks that copies=2 writes were not placed on the same physical disk).

I could grow this pool just by adding single disks.

Theoretically, at some point I would switch to copies=3 to increase my chances of surviving two disk failures. The block checksums would be useful for early detection of failing disks.

The major snag I discovered is that if a striped pool loses a disk, I can still read and write the remaining data, but I cannot reboot and remount the surviving part of the stripe, even with -f.

For example, if I lost some of my "single copies" data, I'd like to still access the good data, pop in a new (potentially larger) disk, re-"cp" the important data to have multiple copies rebuilt, and not have to rebuild the entire pool structure.

So the feature request would be for ZFS to allow selective disk removal from striped pools, with the resultant data loss, but with any data that survived, either by chance (living on the remaining disks) or by policy (multiple copies), still accessible.

Is there some underlying reason in ZFS that precludes this functionality? If the filesystem partially survives while a striped pool member disk fails and the box is still up, why not after a reboot?
-- 
This message posted from opensolaris.org
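[For readers following along, a minimal sketch of the layout Marko describes; the pool name "tank" and the device names are hypothetical:]

```shell
# Striped pool across four 1TB disks -- no redundancy at the vdev level
zpool create tank c1t0d0 c1t1d0 c1t2d0 c1t3d0

# Most filesystems keep the default of one copy per block
zfs create tank/scratch

# Important data gets two copies of every block; ZFS tries to place
# the copies on different disks, but this is best-effort, not guaranteed
zfs create tank/important
zfs set copies=2 tank/important
```

[Note that copies=2 only affects blocks written after the property is set; data already on disk has to be rewritten, e.g. with cp, to gain the extra copy -- which is exactly the re-"cp" step Marko mentions.]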
MC
2008-Nov-22 07:45 UTC
[zfs-discuss] So close to better, faster, cheaper.... zfs stripe pool survival
> So the feature request would be for zfs to allow selective disk
> removal from striped pools, with the resultant data loss, but any
> data that survived, either by chance (living on the remaining disks)
> or policy (multiple copies) would still be accessible.
>
> Is there some underlying reason in zfs that precludes this
> functionality? If the filesystem partially-survives while the
> striped pool member disk fails and the box is still up, why not
> after a reboot?

You may never get a good answer to this, so I'll give it to you straight up. ZFS doesn't do this because no business using Sun products wants to do this. Thus nobody at Sun ever made ZFS do this. Maybe you can convince someone at Sun to care about this feature, but I doubt it, because it is a pretty fringe use case.

In the end you can probably work around this problem, though. Striping doesn't improve performance that much, and it doesn't provide that much more space. Next year we'll be using 2TB hard drives, and when you can make a 6TB raidz array with 4 hard drives one year and a 7.5TB one the year after, and put them both in the same pool so it looks like 13.5TB coming from 8 drives that can tolerate one failure in each 4-drive group, that isn't too shabby.
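[MC's incremental-raidz suggestion, as a sketch; the pool name, device names, and the 4x2TB-now / 4x2.5TB-later sizing are all hypothetical:]

```shell
# Year one: a 4-disk raidz vdev, roughly 6TB usable from 4x2TB disks
zpool create tank raidz c1t0d0 c1t1d0 c1t2d0 c1t3d0

# Year two: grow the pool by adding a second raidz vdev
# (roughly 7.5TB usable from 4x2.5TB disks); ZFS stripes new
# writes across both vdevs automatically
zpool add tank raidz c2t0d0 c2t1d0 c2t2d0 c2t3d0

# Each raidz vdev can survive one disk failure independently
```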
Darren J Moffat
2008-Nov-24 10:27 UTC
[zfs-discuss] So close to better, faster, cheaper....
Kam wrote:
> My ideal home NAS would have:
>
> - high performance via striping
> - fault tolerance with selective use of multiple copies attribute
> - cheap by getting the most efficient space utilization possible (not raidz, not mirroring)
> - scalability

So you want it all but don't want to pay for it?

Why not raidz? Is it because you can't boot from it? Do you really have that much data that you need to maximise space, yet don't have space to add more disks?

-- 
Darren J Moffat
At this point, this IS an academic exercise. I've tried to outline the motivations/justifications for wanting this particular functionality. I believe my architectural "why not?" and "is it possible?" question is sufficiently valid.

It's not about disk cost. It's about being able to grow the pool easily, without having to replace all the drives within the pool at the same time. It's about having the flexibility of not having to pre-determine, at the time I build the pool, what amount of data I want protected against failure.

This theoretical NAS doesn't exist yet. But I don't foresee being able to build it out of a machine with more than 8 SATA bays. At this point, it's likely to be a cheap Dell server with an external 4-bay enclosure.

-Marko
On Mon, Nov 24, 2008 at 11:41 AM, marko b <largelybored at gmail.com> wrote:
> It's not about disk cost. It's about being able to grow the pool
> easily, and without having to replace all the drives within the pool
> at the same time.

It's about what you want in a home device, not what Sun's target enterprise market uses. I suggest looking into Windows Home Server; it meets your requirements today.

--Tim
Darren J Moffat
2008-Nov-24 17:55 UTC
[zfs-discuss] So close to better, faster, cheaper....
marko b wrote:
> It's not about disk cost. It's about being able to grow the pool
> easily, and without having to replace all the drives within the pool
> at the same time.

Then the best answer is mirroring. To grow the size of the pool you replace all sides of a mirror (I say "all" rather than "both" because you can have n-disk mirrors). This is done one disk at a time, but the additional space doesn't appear until all sides have been replaced.

For example, start out with 2 disks in the pool in a mirror:

  time  action       config            size
  t0    create       d1 mirror d2      min(d1,d2)
  t1    add          d1 mirror d2      min(d1,d2) + min(d3,d4)
                     d3 mirror d4
  t2    replace d2   d1 mirror d2'     min(d1,d2) + min(d3,d4)
                     d3 mirror d4
  t3    replace d1   d1' mirror d2'    min(d1,d2) + min(d3,d4)
                     d3 mirror d4
  t4    reimport     d1' mirror d2'    min(d1',d2') + min(d3,d4)
                     d3 mirror d4

Note that at this time you can only boot from simple (i.e. non-striped) mirrors.

-- 
Darren J Moffat
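[Darren's timeline might look like the following in zpool commands; the pool name "tank" and the device names are hypothetical, and d1_bigger/d2_bigger stand in for the larger replacement disks d1'/d2':]

```shell
# t0: create the pool as a single mirror
zpool create tank mirror d1 d2

# t1: grow the pool by striping in a second mirror
zpool add tank mirror d3 d4

# t2, t3: swap in larger disks one side at a time,
# letting each resilver finish before the next replace
zpool replace tank d2 d2_bigger
zpool replace tank d1 d1_bigger

# t4: export and re-import so the pool picks up the new mirror size
zpool export tank
zpool import tank
```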
Are there any performance penalties incurred by mixing vdevs? Say you start with a raidz1 of three 500GB disks, then over time you add a mirror of two 1TB disks.
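[The mixed-vdev pool described in the question, as a sketch with hypothetical device names:]

```shell
# First vdev: a 3-disk raidz1 of 500GB disks (~1TB usable)
zpool create tank raidz1 c1t0d0 c1t1d0 c1t2d0

# Later, stripe in a 2-disk mirror of 1TB disks (~1TB usable);
# zpool warns about the mismatched replication level unless -f is given
zpool add -f tank mirror c2t0d0 c2t1d0
```

[New writes are spread across both vdevs, weighted by free space, so I/O that spans them sees the two vdevs' different performance characteristics.]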
Darren,

Perhaps I misspoke when I said that it wasn't about cost. It is _partially_ about cost.

The monetary cost of drives isn't a major concern, at about $110-150 each. Loss of efficiency is a concern: mirroring gives up 50%, raidz1 about 25% with four disks. The expense of SATA bays, whether in a single chassis or an external enclosure, is a concern. The expense of replacing entire pool sets of disks in order to grow is a concern.

At this point raidz seems to be my only choice if I want 3TB of actual space. I may gain some benefit by using compression.

Windows Home Server doesn't help me because I want xVM, iSCSI and NFS. But the 'features' of WHS do approximate the fantasy setup I outlined.

Mirroring: let me see if I'm understanding your suggestion. A stripe of mirrored pairs: I can grow by resizing an existing mirrored pair, or by just attaching another mirrored pair to the stripe? This gives me growth capability at the cost of 50% disk capacity and a large number of SATA ports/bays.

I'm still holding out for an answer to my original question. :)
On Mon, Nov 24, 2008 at 4:04 PM, marko b <largelybored at gmail.com> wrote:
> Windows Home Server doesn't help me because I want xVM, iSCSI and NFS.
> But the 'features' of WHS do approximate the fantasy setup I outlined.

Windows Home Server will eventually have Hyper-V, and currently has VirtualBox or VMware Workstation/Server. iSCSI? There's an iSCSI initiator built in. If you're talking iSCSI targets, let me tell you right now you're fricking nuts if you think you're going to run an iSCSI target backed by a non-redundant SATA disk. If you insist, that's already been created as well: http://www.rocketdivision.com/wind.html

NFS can be had with Services for UNIX, or Hummingbird software if you have something against free NFS. It's included in Vista.

> I'm still holding out for an answer to my original question. :)

Your original question has been answered: Sun's market has absolutely no use for those features. They aren't building ZFS features for home users that make them no money. They're a business; their goal is to make money. You have the source, feel free to code it if you want it ;) Just don't expect to ever see it from Sun.
Darren J Moffat
2008-Nov-25 11:52 UTC
[zfs-discuss] So close to better, faster, cheaper....
marko b wrote:
> Let me see if I'm understanding your suggestion. A stripe of mirrored
> pairs. I can grow by resizing an existing mirrored pair, or just
> attaching another mirrored pair to the stripe?

Both: by adding an additional mirrored pair to the stripe, and by replacing the sides of an existing mirror with larger disks.

-- 
Darren J Moffat