Hey folks,

I guess this is an odd question to be asking here, but I could do with some feedback from anybody who's actually using ZFS in anger.

I'm about to go live with ZFS in our company on a new fileserver, but I have some real concerns about whether I can really trust ZFS to keep my data alive if things go wrong. This is a big step for us: we're a 100% Windows company and I'm really going out on a limb by pushing Solaris.

The problems with zpool status hanging concern me, knowing that I can't hot plug drives is an issue, and the long resilver times bug is also a potential problem. I suspect I can work around the hot plug bug with a big warning label on the server, but knowing the pool can hang so easily makes me worry about how well ZFS will handle other faults.

On my drive home tonight I was wondering whether I'm going to have to swallow my pride and order a hardware RAID controller for this server, letting that deal with the drive issues, and just using ZFS as a very basic filesystem.

What has me re-considering ZFS, though, is that on the other hand I know the Thumpers have sold well for Sun, and they pretty much have to use ZFS. So there's a big installed base out there using it, and that base has been using it for a few years. I know from the Thumper manual that you have to unconfigure drives before removal on those servers, which goes a long way towards making me think that should be a relatively safe way to work.

The question is whether I can make a server I can be confident in. I'm now planning a very basic OpenSolaris server just using ZFS as an NFS server; is there anybody out there who can reassure me that such a server can work well and handle real-life drive failures?

thanks,

Ross
Ross wrote:
> Hey folks,
>
> I guess this is an odd question to be asking here, but I could do with some feedback from anybody who's actually using ZFS in anger.
>
> I'm about to go live with ZFS in our company on a new fileserver, but I have some real concerns about whether I can really trust ZFS to keep my data alive if things go wrong. This is a big step for us: we're a 100% Windows company and I'm really going out on a limb by pushing Solaris.
> [...]

Hi

What kind of hardware etc. is the fileserver going to be running, and what zpool layout is being planned?

As for Thumpers, once 138053-02 (the marvell88sx driver patch) releases within the next two weeks (assuming no issues are found), the Thumper platform running the S10 updates will be up to date in terms of marvell88sx driver fixes, which addresses some pretty important issues for Thumper. I strongly suggest applying this patch to Thumpers going forward. U6 will have the fixes by default.

Enda
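As a rough sketch of checking for and applying the patch Enda mentions (the staging directory is just an example path, and these are standard Solaris 10 patch commands, not anything ZFS-specific):

Check whether the patch is already installed:

  # showrev -p | grep 138053

Apply it from wherever the unpacked patch was downloaded to:

  # patchadd /var/tmp/138053-02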
On Thu, Jul 31, 2008 at 16:25, Ross <myxiplx at hotmail.com> wrote:
> The problems with zpool status hanging concern me, knowing that I can't hot plug drives is an issue, and the long resilver times bug is also a potential problem. I suspect I can work around the hot plug drive bug with a big warning label on the server, but knowing the pool can hang so easily makes me worry about how well ZFS will handle other faults.

Other hardware-failure type things can cause what appear to be big problems, too. We have a scsi->sata enclosure here with some embedded firmware, and it's connected to a scsi controller on an x4150. I swapped some disks in the enclosure and updated the controller configuration, then rebooted the controller... and the host box died, because ZFS decided that too many disks were unavailable to continue, so it panicked the box.

At first I thought this behavior was terrible---my server is down!---but on some reflection, it makes sense: it's better to quit before anything else on the filesystem is corrupted rather than write garbage across a whole pool because of controller failure or something to that effect. In any case, I thought you'd be interested in this property of zpools. It's not likely to happen in general (especially with DAS and a dumb controller, like you have), and it's better than the alternative of potentially scribbling on a pool, but other services running on the same box could suffer if you were incautious.

> On my drive home tonight I was wondering whether I'm going to have to swallow my pride and order a hardware raid controller for this server, letting that deal with the drive issues, and just using ZFS as a very basic filesystem.

Letting ZFS handle one layer of redundancy is always recommended, if you're going to use it at all. Otherwise it can get into a situation where it finds checksum errors and can't do anything about them.

> The question is whether I can make a server I can be confident in. I'm now planning a very basic OpenSolaris server just using ZFS as a NFS server, is there anybody out there who can re-assure me that such a server can work well and handle real life drive failures?

We haven't had any "real life" drive failures at work, but at home I took some old flaky IDE drives and put them in a Pentium 3 box running Nevada. Several of them were known to cause errors under Linux, so I mirrored them in approximately-the-same-size pairs and set up weekly scrubs. Two drives out of six failed entirely, and were nicely retired, before I gave up on the idea and bought new disks. I didn't lose any data with this scheme, and ZFS told me every once in a while that it had recovered from a checksum error. Good drives are always recommended, of course, but I saw nothing but good behavior with old broken hardware while I was using it.

Finally, at work we're switching everything over to ZFS because it's so convenient... but we keep tape backups nonetheless. I strongly recommend having up-to-date backups in any situation, but even more so with ZFS. It's been very reliable for me personally and at work, but I've seen horror stories of corrupt pools from which all data is lost. I'd rather be sitting around the campfire quaking in my boots at story time than have a flashlight pointed at my face doing the telling, if you catch my drift.

Will
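Will's home setup (mirrored pairs plus weekly scrubs) can be sketched roughly as follows; the pool and device names here are made up purely for illustration:

Mirror the old drives in similar-sized pairs:

  # zpool create junk mirror c1d0 c2d0 mirror c1d1 c2d1 mirror c3d0 c3d1

Schedule a weekly scrub from root's crontab, for example every Sunday at 03:00:

  0 3 * * 0 /usr/sbin/zpool scrub junk

Then review the results afterwards:

  # zpool status -v junk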
On Thu, 2008-07-31 at 13:25 -0700, Ross wrote:
> Hey folks,
>
> I guess this is an odd question to be asking here, but I could do with some feedback from anybody who's actually using ZFS in anger.

ZFS in anger? That's an interesting way of putting it :-)

> but I have some real concerns about whether I can really trust ZFS to
> keep my data alive if things go wrong. This is a big step for us,
> we're a 100% windows company and I'm really going out on a limb by
> pushing Solaris.

I can appreciate how this could be considered a risk, especially if it is your idea. But let's put this all in perspective and you'll see why it isn't even remotely a question. I have put all sorts of file servers into production with things like Online Disk Suite 1.0 and NFS V1 - and slept like a baby. Now, for the non-historians on the list, the quality of Online Disk Suite 1.0 led directly to the creation of the volume management marketplace and Veritas in particular (hey - that's a joke, OK? But only marginally).

> The question is whether I can make a server I can be confident in.
> I'm now planning a very basic OpenSolaris server just using ZFS as a
> NFS server, is there anybody out there who can re-assure me that such
> a server can work well and handle real life drive failures?

There are two questions in there - can it be built, and are you comfortable with it? Those are two different things.

The simple answer to the first is yes. If this is mission critical (and things like NFS servers generally are - even if they are only serving up iTunes music libraries - ask my daughter), then Enda's point about the Marvell driver updates for Solaris 10 should be carefully considered. If it's just an NFS server then the vast majority of OpenSolaris benefits won't be applicable (newer GNOME, better packaging, better Linux interoperability, etc). Putting this on Solaris 10 with Live Upgrade and a service contract would make me sleep like a baby.

Now, for the other question - if you are looking at this like an appliance then you might not be quite as happy. It does take a little care and feeding, but nearly every piece of technology more complicated than a toaster needs a little love every once in a while. I would much rather put a Solaris/ZFS file server into a Windows environment than a Windows file server into a Unix environment :-)

Bob
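The Live Upgrade workflow Bob refers to boils down to keeping an alternate boot environment to patch and fall back to. A minimal sketch, assuming a spare UFS slice and made-up BE and patch directory names:

Create an alternate boot environment on a spare slice:

  # lucreate -n s10_patched -m /:/dev/dsk/c0t1d0s0:ufs

Patch the inactive BE, activate it, and reboot into it:

  # luupgrade -t -n s10_patched -s /var/tmp/patches 138053-02
  # luactivate s10_patched
  # init 6

If the new environment misbehaves, the previous BE is still there to boot back into.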
> We haven't had any "real life" drive failures at work, but at home I
> took some old flaky IDE drives and put them in a Pentium 3 box running
> Nevada.

Similar story here. Some IDE and SATA drive burps under Linux (and please don't tell me how wonderful Reiser4 is - 'cause it's banned in this house forever.... arrrrgh) and Windows. It ate my entire iTunes library. Yeah, lurve that silent data corruption feature.

> Several of them were known to cause errors under Linux, so I
> mirrored them in approximately-the-same-size pairs and set up weekly
> scrubs. Two drives out of six failed entirely, and were nicely
> retired, before I gave up on the idea and bought new disks.

Pretty cool, eh?

> Finally, at work we're switching everything over to ZFS because it's
> so convenient... but we keep tape backups nonetheless.

A very good idea. Disasters will still occur. With enough storage, snapshots can eliminate the routine file-by-file restores, but a complete meltdown is always a possibility. So backups aren't optional, but I find myself doing very few restores any more.

Bob
On Jul 31, 2008, at 2:56 PM, Bob Netherton wrote:
> On Thu, 2008-07-31 at 13:25 -0700, Ross wrote:
>> I guess this is an odd question to be asking here, but I could do
>> with some feedback from anybody who's actually using ZFS in anger.
>
> ZFS in anger? That's an interesting way of putting it :-)

If you watch Phil Liggett and/or Paul Sherwen commentating on a cycling event, you're pretty much guaranteed to hear "turning the pedals in anger" at some point when a rider goes on the attack.
We have 50,000 users' worth of mail spool on ZFS, so we've been trusting it for production usage for THE most critical and visible enterprise app. Works fine.

Our stores are ZFS RAID-10 built of LUNs from pairs of 3510FC arrays. Had an entire array go down once; the system kept going fine. Brought the array back online and ran a scrub to be certain of the data - it came up clean.

Running a checksum integrity scrub while online, THAT is the killer app that makes me sleep better.
Enda O'Connor wrote:
>
> As for thumpers, once 138053-02 ( marvell88sx driver patch ) releases
> within the next two weeks ( assuming no issues found ), then the thumper
> platform running s10 updates will be up to date in terms of marvell88sx
> driver fixes, which fixes some pretty important issues for thumper.
> Strongly suggest applying this patch to thumpers going forward.
> u6 will have the fixes by default.

I'm assuming the fixes listed in these patches are already committed in OpenSolaris (b94 or greater)?

--
Dave
>>>>> "r" == Ross <myxiplx at hotmail.com> writes:

    r> This is a big step for us, we're a 100% windows company and
    r> I'm really going out on a limb by pushing Solaris.

I'm using it in anger. I'm angry at it, and can't afford anything that's better. Whatever I replaced ZFS with, I would make sure it had:

 * snapshots

 * weekly scrubbing

 * dual-parity, to make the rebuild succeed after a disk fails in case the frequent scrubbing is not adequate, and also to deal with the infant-mortality problem and the relatively high 6% annual failure rate

 * checksums (block- or filesystem-level, either one is fine)

 * a fix for the RAID5 write hole (either FreeBSD-style RAID3, which is analogous to the ZFS full-stripe-write approach, or battery-backed NVRAM)

 * being built from only drives that have been burned in for 1 month

ZFS can have all those things, except the weekly scrubbing. I'm sure the scrubbing works really well for some people like Vincent, but for me it takes much longer than scrubbing took with pre-ZFS RAID, and increases filesystem latency a lot more, too. This is probably partly my broken iSCSI setup, but I'm not sure. I'm having problems where the combined load of 'zpool scrub' and some filesystem activity bogs down the Linux iSCSI targets so much that ZFS marks the whole pool faulted, so I have to use the pool "gently" during scrub. :(

RAID-on-a-card doesn't usually have these bullet points, so I would use ZFS over RAID-on-a-card. There are too many horror stories about those damn cards, even the "good" ones. Even if they worked well, which in my opinion they do not, they make getting access to your pool dependent on getting replacement cards of the same vintage, and on getting the right drivers for this proprietary, obscure card for the (possibly just re-installed different version of) the OS, possibly cards with silently-different "steppings" or "firmware revisions" or some other such garbage. Also, with RAID-on-a-card there is no clear way to get a support contract that stands behind the whole system, in terms of the data's availability, either. With Sun ZFS stuff there sort-of is, and there definitely is with a traditional storage hardware vendor, so optimistically, even if you are not covered by a contract yourself because you downloaded Solaris or bought a Filer on eBay, some other customer is, so the product (optimistically) won't make some colossally stupid mistakes that some RAID-on-a-card companies make. I would stay well away from that card crap.

Many ZFS problems discussed here sound like the fixes are going into s10u6, so they are not available on Solaris 10 yet, and are drastic enough to introduce some regressions. I don't think ZFS in stable Solaris will be up to my stability expectations until the end of the year---for now, "that's fixed in weeks-old b94" probably doesn't fit your application. Maybe for a scrappy super-competitive high-roller shared hosting shop, but not for a plodding Windows shop. And having fully-working drivers for the X4500 right after its replacement is announced makes me think maybe you should buy an X4500, not the replacement. :(

ZFS has been included in stable Solaris for two full years already, and you're still asking questions about it. The Solaris CIFS server I've never tried, but it is even newer, so I think you would be crazy to make yourself the black sheep pushing that within a conservative, hostile environment. If you have some experience with Samba in your environment, maybe that's OK to use in place of CIFS.
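For what it's worth, a dual-parity ZFS pool covering most of that list is a one-liner to create; the pool and device names below are placeholders:

Six-disk raidz2 (any two disks can fail) plus a hot spare:

  # zpool create tank raidz2 c1t0d0 c1t1d0 c1t2d0 c1t3d0 c1t4d0 c1t5d0 spare c1t6d0

Checksums are on by default; scrubbing still has to be run (or scheduled) by hand:

  # zpool scrub tank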
If you want something more out-of-the-box than Samba, you could get a NetApp StoreVault. I've never had one myself, though, so maybe I'll regret having this suggestion archived on the Interweb forever. I think that unlike Samba the StoreVault can accommodate the Windows security model without kludginess. To my view that's not necessarily a good thing, but it IS probably what a Windows shop wants. The StoreVault has all those reliability bullet points above, AIUI. It's advertised as a crippled version of their real Filer's software. It may annoy you by missing certain basic, dignified features (like it is web-managed only?!, and maybe you have to pay more to "unlock" the snapshot feature with some stupid registration code), but it should have most of the silent reliability/availability tricks that are in the higher-end NetApps.

Something cheaper than NetApp, like the Adaptec SNAP filer, has snapshots, scrubbing, and I assume a fix for the RAID5 hole, and something like the support-contract-covering-your-data, though obviously not anything to set beside NetApp. Also the Windows-security-model support is kludgy. I'm not sure SNAP has dual-parity or checksums, and I've found it slightly sketchy---it was locking up every week until I forced an XFS fsck, and there is no supported way to force an XFS fsck. Their integration work does seem to hide some of the Linux crappiness, but not all. LVM2 seems to be relatively high-quality on the inside compared to current ZFS.

    r> The problems with zpool status hanging concern me,

Yes. You might distinguish bugs that affect availability from bugs that can cause data loss. The 'zpool status' not always working is half-way in between, because it interferes with responding to failures.

The disk-pulled problems, the slow-mirror-component-makes-whole-mirror-slow problems, and the problems of proper error handling being put off for over two years with the excuse "we're integrating FMA" and then FMA, once integrated, not behaving reasonably, are all in the availability category, so maybe they aren't show-stoppers? For people using ZFS on top of an expensive storage solution, they may not care at all---if there is some weird chain of events leading to an availability problem, use the excuse "you should have paid more and set up multipath"---the availability demands on ZFS are lower with big FC arrays.

However, the reports of "my pool is corrupt, help" / <silence> and "the kernel {panics, runs out of memory and freezes} every time I do XXX"---these scare the shit out of me, because it means you lose your data in this frustrating way, as if it were encrypted by a data-for-ransom Internet worm: some day, maybe a year from now, the bug will be fixed and maybe you can get your data back. In the meantime, you're SOL with thousands of dollars of (possibly leased) disk, while the data is just barely out of reach, perhaps sucking your time away with desperate, futile maybe-this-will-work attempts. I have fairly high confidence I can recover most of the data off an abused UFS-over-SVM mirror with dd and fsck, but I don't have that confidence at all with supposedly "always-consistent" ZFS.
Besides several tiers of storage-layer and ZFS-layer redundancy, experience here suggests you also need rsync-level redundancy---either to another ZFS pool, or to some other cheap backup filesystem; a backup filesystem that might be acceptable even with some of the problems in the bulleted list, like not being dual-parity, not having snapshots, or having a RAID5 write hole (but it still needs to be scrubbed).

If you get an integrated NAS like the StoreVault, the ZFS machine will probably be cheaper, so you could use it as the cheaper backup filesystem---rsync the StoreVault onto the ZFS filesystem every night. You can do this for a couple of years, so you will have a chance to notice if ZFS stability is improving, and maybe conduct more experiments in provoking it.
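A minimal sketch of that nightly rsync, assuming the NAS export is reachable via the automounter, that the backup pool is called backuppool, and that rsync lives in /usr/bin (all of these are assumptions to adjust locally):

Root's crontab, pulling the NAS contents onto the ZFS pool every night at 01:30:

  30 1 * * * /usr/bin/rsync -a --delete /net/storevault/vol0/ /backuppool/storevault/

Snapshot the copy afterwards so files removed by --delete are still recoverable (the % signs must be escaped in a crontab):

  30 5 * * * /usr/sbin/zfs snapshot backuppool/storevault@nightly-`date +\%Y\%m\%d`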
Ross wrote:
> Hey folks,
>
> I guess this is an odd question to be asking here, but I could do with some feedback from anybody who's actually using ZFS in anger.

I've been using ZFS for nearly 3 years now. It has been my (mirrored :-) home directory for that time. I've never lost any of that data, though I do spend some time torturing ZFS and hardware. Inside Sun, we use ZFS home directories for a large number of developers, and these servers are upgraded every build. As marketing would say, we eat our own dog food.

> I'm about to go live with ZFS in our company on a new fileserver, but I have some real concerns about whether I can really trust ZFS to keep my data alive if things go wrong. This is a big step for us, we're a 100% windows company and I'm really going out on a limb by pushing Solaris.

I'm not that familiar with running Windows file systems for large numbers of users, but my personal experience with them has been fraught with data loss and, going back a few years, "ABORT, RETRY, GIVE UP".

> The problems with zpool status hanging concern me, knowing that I can't hot plug drives is an issue, and the long resilver times bug is also a potential problem. I suspect I can work around the hot plug drive bug with a big warning label on the server, but knowing the pool can hang so easily makes me worry about how well ZFS will handle other faults.

While you've demonstrated hot unplug problems with USB drives, that is a very different software path than the more traditional hot-plug SAS/FC/UltraSCSI devices. USB devices are considered removable media and have a very different use case than what is normally considered for enterprise-class storage devices.

> On my drive home tonight I was wondering whether I'm going to have to swallow my pride and order a hardware raid controller for this server, letting that deal with the drive issues, and just using ZFS as a very basic filesystem.

If you put all of your trust in the hardware RAID controller, then one day you may be disappointed. This is why we tend to recommend using some sort of data protection at the ZFS level, regardless of the hardware. If you look at this forum's archive, you will see someone who has discovered a faulty RAID controller, switch, HBA, or some other device by using ZFS. With other file systems, it would be difficult to isolate the fault.

> What has me re-considering ZFS though is that on the other hand I know the Thumpers have sold well for Sun, and they pretty much have to use ZFS. So there's a big installed base out there using it, and that base has been using it for a few years. I know from the Thumper manual that you have to unconfigure drives before removal on those servers, which goes a long way towards making me think that should be a relatively safe way to work.

You can run Windows, RHEL, FreeBSD, and probably another dozen or two OSes on Thumpers. We have customers who run many different OSes on our open and industry-standard hardware.

> The question is whether I can make a server I can be confident in. I'm now planning a very basic OpenSolaris server just using ZFS as a NFS server, is there anybody out there who can re-assure me that such a server can work well and handle real life drive failures?

Going back to your USB remove test, if you protect that disk at the ZFS level, such as in a mirror, then when the disk is removed it will be detected as removed, and zpool status will show its state as "removed" and the pool as "degraded", but it will continue to function, as expected.
Replacing the USB device will bring it back online, again as expected, and it should resilver automatically. To reiterate, it is best to let ZFS do the data protection regardless of the storage used.
 -- richard
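In practice the re-attach Richard describes looks roughly like this; the pool and device names are only examples:

See which pools and devices ZFS thinks are unhealthy:

  # zpool status -x

After re-inserting the same disk, bring it back into service if it stayed offline:

  # zpool online tank c1t0d0

Or, if a brand new disk went into that slot, trigger a resilver onto it:

  # zpool replace tank c1t0d0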
> Going back to your USB remove test, if you protect that disk
> at the ZFS level, such as a mirror, then when the disk is removed
> then it will be detected as removed and zpool status will show its
> state as "removed" and the pool as "degraded" but it will continue
> to function, as expected.
> -- richard

Except it doesn't. The reason I'm doing these single disk tests is that pulling a single SATA drive out of my main pool (5 sets of 3-way mirrors) hangs the whole pool (or, if I set failmode=continue, crashes Solaris, even though it's a data pool and holds nothing the OS needs at all).

I also saw before with mirrored iSCSI drives that pulling the network cable on one hung the ZFS pool for 3 minutes. ZFS handles checksum errors great, but it doesn't seem to cope with the loss of devices at all.
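For anyone following along, on builds that have the failmode pool property the behaviour Ross mentions can be inspected and changed like this (the pool name is just an example):

Show how the pool reacts when all paths to a device are lost:

  # zpool get failmode tank

Switch between wait (the default, blocks I/O), continue (returns errors), or panic:

  # zpool set failmode=continue tank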
On Thu, Jul 31, 2008 at 11:03 PM, Ross <myxiplx at hotmail.com> wrote:
> > Going back to your USB remove test, if you protect that disk
> > at the ZFS level, such as a mirror, then when the disk is removed
> > then it will be detected as removed and zpool status will show its
> > state as "removed" and the pool as "degraded" but it will continue
> > to function, as expected.
> > -- richard
>
> Except it doesn't. The reason I'm doing these single disk tests is that
> pulling a single SATA drive out of my main pool (5 sets of 3-way mirrors)
> hangs the whole pool (or if I set failmode=continue, crashes Solaris, even
> though it's a data pool and holds nothing the OS needs at all).
>
> I also saw before with mirrored iSCSI drives that pulling the network cable
> on one hung the ZFS pool for 3 minutes. ZFS handles checksum errors great,
> but it doesn't seem to cope with the loss of devices at all.

This conversation piques my interest. I have been reading a lot about OpenSolaris/Solaris for the last few weeks, and have even spoken to Sun storage techs about bringing in Thumper/Thor for our storage needs.

I have recently brought online a Dell server with a DAS (14 SCSI drives). This will be part of my tests now: physically removing a member of the pool before issuing the removal command for that particular drive.

One other issue I have now also: how do you physically locate a failing/failed drive in ZFS? With hardware RAID sets, if the RAID controller itself detects the error, it will initiate a BLINK command to that drive, so the individual drive is now flashing red/amber/whatever on the RAID enclosure. How would this be possible with ZFS? Say you have a JBOD enclosure (14, hell maybe 48 drives). Knowing c0d0xx failed is no longer helpful if only ZFS catches an error. Will you be able to isolate the drive quickly, to replace it? Or will you be going "does the enclosure start at logical zero... left to right.. hrmmm"?

Thanks

--
Brent Jones
brent at servuhome.net
Hey Brent,

On the Sun hardware like the Thumper you do get a nice bright blue "ready to remove" LED as soon as you issue the "cfgadm -c unconfigure xxx" command. On other hardware it takes a little more care; I'm labelling our drive bays up *very* carefully to ensure we always remove the right drive. Stickers are your friend: mine will probably be labelled "sata1/0", "sata1/1", "sata1/2", etc.

I know Sun are working to improve the LED support, but I don't know whether that support will ever be extended to 3rd party hardware:
http://blogs.sun.com/eschrock/entry/external_storage_enclosures_in_solaris

I'd love to use Sun hardware for this, but while things like x2200 servers are great value for money, Sun don't have anything even remotely competitive to a standard 3U server with 16 SATA bays. The x4240 is probably closest, but is at least double the price. Even the J4200 arrays are more expensive than this entire server.

Ross

PS. Once you've tested SCSI removal, could you add your results to my thread? Would love to hear how that went.
http://www.opensolaris.org/jive/thread.jspa?threadID=67837&tstart=0

> This conversation piques my interest. I have been reading a lot about OpenSolaris/Solaris for the last few weeks.
> Have even spoken to Sun storage techs about bringing in Thumper/Thor for our storage needs.
> [...]
> --
> Brent Jones
> brent at servuhome.net
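For reference, the unconfigure-before-pull sequence looks roughly like the following; the attachment point (sata1/3) and device name are only examples and vary by controller:

Optionally take the disk out of active use first:

  # zpool offline tank c1t3d0

Unconfigure the SATA port (on a Thumper the blue "ready to remove" LED then lights):

  # cfgadm -c unconfigure sata1/3

Swap the disk, reconfigure the port, and let ZFS resilver onto the new drive:

  # cfgadm -c configure sata1/3
  # zpool replace tank c1t3d0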
Enda O'Connor (Sun Microsystems Ireland) - 2008-Aug-01 09:08 UTC - [zfs-discuss] Can I trust ZFS?
Dave wrote:
>
> Enda O'Connor wrote:
>>
>> As for thumpers, once 138053-02 ( marvell88sx driver patch ) releases
>> within the next two weeks ( assuming no issues found ), then the
>> thumper platform running s10 updates will be up to date in terms of
>> marvell88sx driver fixes, which fixes some pretty important issues for
>> thumper.
>> Strongly suggest applying this patch to thumpers going forward.
>> u6 will have the fixes by default.
>
> I'm assuming the fixes listed in these patches are already committed in
> OpenSolaris (b94 or greater)?
>
> --
> Dave

Yep. I know this is an OpenSolaris list, but a lot of folk asking questions do seem to be running various update releases.

Enda
Hello Ross,

I personally know many environments that have been using ZFS in production for quite some time, quite often in business-critical environments. Some of them are small, some of them are rather large (hundreds of TBs), some of them are clustered. Different usages like file servers, MySQL on ZFS, Oracle on ZFS, mail on ZFS, virtualization on ZFS, ...

So far I haven't seen them losing any data. I hit some issues from time to time, but nothing which couldn't be worked around.

That being said, ZFS is still a relatively young technology, so if your top priority regardless of anything else is stability and confidence, I would go with UFS or VxFS/VxVM, which have been in the market for many, many years and are proven in a lot of deployments.

--
Best regards,
Robert Milkowski             mailto:milek at task.gda.pl
                             http://milek.blogspot.com
I have done a bit of testing, and so far so good really. I have a Dell 1800 with a Perc4e and a 14-drive Dell PowerVault 220S. I have a RAID-Z2 volume named 'tank' that spans 6 drives, and I have made 1 drive available as a spare to ZFS.

Normal array:

# zpool status
  pool: tank
 state: ONLINE
 scrub: scrub completed with 0 errors on Fri Aug  1 19:37:33 2008
config:

        NAME          STATE     READ WRITE CKSUM
        tank          ONLINE       0     0     0
          raidz2      ONLINE       0     0     0
            c0t1d0    ONLINE       0     0     0
            c0t2d0    ONLINE       0     0     0
            c0t3d0    ONLINE       0     0     0
            c0t4d0    ONLINE       0     0     0
            c0t5d0    ONLINE       0     0     0
            c0t6d0    ONLINE       0     0     0
        spares
          c0t13d0     AVAIL

errors: No known data errors

One drive removed:

# zpool status
  pool: tank
 state: DEGRADED
status: One or more devices could not be opened. Sufficient replicas exist for
        the pool to continue functioning in a degraded state.
action: Attach the missing device and online it using 'zpool online'.
   see: http://www.sun.com/msg/ZFS-8000-D3
 scrub: resilver completed with 0 errors on Fri Aug  1 20:30:39 2008
config:

        NAME          STATE     READ WRITE CKSUM
        tank          DEGRADED     0     0     0
          raidz2      DEGRADED     0     0     0
            c0t1d0    ONLINE       0     0     0
            c0t2d0    ONLINE       0     0     0
            spare     DEGRADED     0     0     0
              c0t3d0  UNAVAIL      0     0     0  cannot open
              c0t13d0 ONLINE       0     0     0
            c0t4d0    ONLINE       0     0     0
            c0t5d0    ONLINE       0     0     0
            c0t6d0    ONLINE       0     0     0
        spares
          c0t13d0     INUSE     currently in use

errors: No known data errors

Now let's remove the hot spare ;)

# zpool status
  pool: tank
 state: DEGRADED
status: One or more devices could not be opened. Sufficient replicas exist for
        the pool to continue functioning in a degraded state.
action: Attach the missing device and online it using 'zpool online'.
   see: http://www.sun.com/msg/ZFS-8000-D3
 scrub: resilver completed with 0 errors on Fri Aug  1 20:30:39 2008
config:

        NAME          STATE     READ WRITE CKSUM
        tank          DEGRADED     0     0     0
          raidz2      DEGRADED     0     0     0
            c0t1d0    ONLINE       0     0     0
            c0t2d0    ONLINE       0     0     0
            spare     UNAVAIL      0   656     0  insufficient replicas
              c0t3d0  UNAVAIL      0     0     0  cannot open
              c0t13d0 UNAVAIL      0     0     0  cannot open
            c0t4d0    ONLINE       0     0     0
            c0t5d0    ONLINE       0     0     0
            c0t6d0    ONLINE       0     0     0
        spares
          c0t13d0     INUSE     currently in use

errors: No known data errors

Now, this Perc4e doesn't support JBOD, so each drive is a standalone RAID-0 (how annoying). With that, I cannot plug the drives back in with the system running; the controller will keep them offline until I enter the BIOS. But in my scenario, this does demonstrate that ZFS tolerates hot removal of drives without issuing a graceful removal of the device. I was copying MP3s to the volume the whole time, and the copy continued uninterrupted, without error. I verified all data was written as well. All data should be online when I reboot and put the pool back in a normal state.

I am very happy with the test. I don't know many hardware controllers that'll lose 3 drives out of an array of 6 (with spare) and still function normally (even if the controller supports RAID-6, I've seen major issues where writes were not committed).

I'll add my results to your forum thread as well.

Regards

Brent Jones
brent at servuhome.net

On Thu, Jul 31, 2008 at 11:56 PM, Ross Smith <myxiplx at hotmail.com> wrote:
> Hey Brent,
>
> On the Sun hardware like the Thumper you do get a nice bright blue "ready
> to remove" LED as soon as you issue the "cfgadm -c unconfigure xxx"
> command. On other hardware it takes a little more care; I'm labelling our
> drive bays up *very* carefully to ensure we always remove the right drive.
> Stickers are your friend: mine will probably be labelled "sata1/0",
> "sata1/1", "sata1/2", etc.
> [...]
According to the hard disk drive guide at http://www.storagereview.com/guide2000/ref/hdd/index.html, a whopping 36% of data loss is due to human error, and 49% is due to hardware or system malfunction. With proper pool design, ZFS addresses most of the 49% of data loss due to hardware malfunction. You can do as much MTTDL analysis as you want based on drive reliability and read failure rates, but it still only addresses that 49% of data loss.

ZFS makes human error really easy. For example:

  $ zpool destroy mypool
  $ zfs destroy mypool/mydata

The commands are almost instantaneous and are much faster than the classic:

  $ rm -rf /mydata

or

  % newfs /dev/rdsk/c0t0d0s6 < /dev/null

Most problems we hear about on this list are due to one of these issues:

  * Human error
  * Beta-level OS software
  * System memory error (particularly non-ECC memory)
  * Wrong pool design

ZFS is a tool which can lead to exceptional reliability. Some forms of human error can be limited by facilities such as snapshots. System administrator human error is still a major factor.

Bob
======================================
Bob Friesenhahn
bfriesen at simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,    http://www.GraphicsMagick.org/
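As a rough illustration of the snapshot safety net Bob mentions (the dataset and snapshot names are made up):

Snapshot before doing anything destructive:

  $ zfs snapshot mypool/mydata@before-cleanup

If the rm -rf turns out to be a mistake, roll the filesystem back:

  $ zfs rollback mypool/mydata@before-cleanup

Or just copy individual files back out of the read-only snapshot directory:

  $ ls /mypool/mydata/.zfs/snapshot/before-cleanup/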
On Sun, 2008-08-03 at 11:42 -0500, Bob Friesenhahn wrote:
> Zfs makes human error really easy. For example
>
> $ zpool destroy mypool

Note that "zpool destroy" can be undone by "zpool import -D" (if you get to it before the disks are overwritten).
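For the record, that recovery looks like this (the pool name is just an example):

List destroyed pools that can still be recovered:

  # zpool import -D

Re-import one of them before its disks get reused:

  # zpool import -D mypool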