Casper.Dik at Sun.COM wrote:
> I would suggest that you follow my recipe: not check the boot-archive
> during a reboot. And then report back. (I'm assuming that that will take
> several weeks)

We are back at square one; or, at the subject line.
I did a zpool status -v, everything was hunky dory.
Next, a power failure, 2 hours later, and this is what zpool status -v
thinks:

zpool status -v
  pool: rpool
 state: ONLINE
status: One or more devices has experienced an error resulting in data
        corruption.  Applications may be affected.
action: Restore the file in question if possible.  Otherwise restore the
        entire pool from backup.
   see: http://www.sun.com/msg/ZFS-8000-8A
 scrub: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        rpool       ONLINE       0     0     0
          c1d0s0    ONLINE       0     0     0

errors: Permanent errors have been detected in the following files:

        //etc/svc/repository-boot-20090419_174236

I know, the hard-core defenders of ZFS will repeat for the umpteenth
time that I should be grateful that ZFS can NOTICE and inform about the
problem.
Others might want to repeat that this is not supposed to happen in the
first place.

Reliability at power failure? That was my question, and I had to learn
that the answer is 'no'.
How about my proposal to always have a proper snapshot available? And
after some 4 days without any CKSUM error, how can yanking the power
cord mess up boot-stuff?

Uwe
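(For reference, the recovery path the ZFS-8000-8A message points at looks
roughly like the following. This is a hedged sketch only: it assumes the
damaged file is expendable, which seems plausible here since, as noted
later in the thread, the repository-boot-* file is created fresh at each
boot.)

   # restore the file from backup, or simply remove it if it is expendable
   rm //etc/svc/repository-boot-20090419_174236
   # re-verify the pool; once the bad blocks are gone, a scrub plus a
   # clear will normally drop the stale error record
   zpool scrub rpool
   zpool status -v rpool
   zpool clear rpool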
Casper.Dik at Sun.COM
2009-Apr-19 12:16 UTC
[zfs-discuss] [on-discuss] Reliability at power failure?
>We are back at square one; or, at the subject line.
>I did a zpool status -v, everything was hunky dory.
>Next, a power failure, 2 hours later, and this is what zpool status -v
>thinks:
>
>zpool status -v
>  pool: rpool
> state: ONLINE
>status: One or more devices has experienced an error resulting in data
>        corruption.  Applications may be affected.
>action: Restore the file in question if possible.  Otherwise restore the
>        entire pool from backup.
>   see: http://www.sun.com/msg/ZFS-8000-8A
> scrub: none requested
>config:
>
>        NAME        STATE     READ WRITE CKSUM
>        rpool       ONLINE       0     0     0
>          c1d0s0    ONLINE       0     0     0
>
>errors: Permanent errors have been detected in the following files:
>
>        //etc/svc/repository-boot-20090419_174236
>
>I know, the hard-core defenders of ZFS will repeat for the umpteenth
>time that I should be grateful that ZFS can NOTICE and inform about the
>problem.

:-)

The file is created on boot and I assume this was created directly after
the boot after the power-failure.

Am I correct in thinking that:
	the last boot happened on 2009/04/19_17:42:36
	the system hasn't rebooted since that time

>Others might want to repeat that this is not supposed to happen in the
>first place.

ZFS guarantees that this cannot happen, unless the hardware is bad.  Bad
means here "the hardware doesn't promise what ZFS believes the hardware
promises".

But anything can cause this:

	hardware problems:
		- bad memory
		- bad disk
		- bad disk controller
		- bad power supply

	software problems:
		- memory corruption through any odd driver
		- any part of the ZFS stack

My money would still be on a hardware problem.  I remember a particular
case where ZFS continuously found checksum errors; replacing the power
supply fixed that.

Casper
Casper.Dik at Sun.COM wrote:
>> We are back at square one; or, at the subject line.
>> I did a zpool status -v, everything was hunky dory.
>> Next, a power failure, 2 hours later, and this is what zpool status -v
>> thinks:
>>
>> zpool status -v
>>   pool: rpool
>>  state: ONLINE
>> status: One or more devices has experienced an error resulting in data
>>         corruption.  Applications may be affected.
>> action: Restore the file in question if possible.  Otherwise restore the
>>         entire pool from backup.
>>    see: http://www.sun.com/msg/ZFS-8000-8A
>>  scrub: none requested
>> config:
>>
>>         NAME        STATE     READ WRITE CKSUM
>>         rpool       ONLINE       0     0     0
>>           c1d0s0    ONLINE       0     0     0
>>
>> errors: Permanent errors have been detected in the following files:
>>
>>         //etc/svc/repository-boot-20090419_174236
>>
>> I know, the hard-core defenders of ZFS will repeat for the umpteenth
>> time that I should be grateful that ZFS can NOTICE and inform about the
>> problem.
>>
>
> :-)
>
> The file is created on boot and I assume this was created directly after
> the boot after the power-failure.
>
> Am I correct in thinking that:
>	the last boot happened on 2009/04/19_17:42:36
>	the system hasn't rebooted since that time
>

Good guess, but wrong. Another two to go ... :)

>
>> Others might want to repeat that this is not supposed to happen in the
>> first place.
>>
>
> ZFS guarantees that this cannot happen, unless the hardware is bad.  Bad
> means here "the hardware doesn't promise what ZFS believes the hardware
> promises".
>
> But anything can cause this:
>
>	hardware problems:
>		- bad memory
>		- bad disk
>		- bad disk controller
>		- bad power supply
>
>	software problems:
>		- memory corruption through any odd driver
>		- any part of the ZFS stack
>
> My money would still be on a hardware problem.  I remember a particular
> case where ZFS continuously found checksum errors; replacing the power
> supply fixed that.
>

Chances are. Yet the Ubuntu double boot here never finds anything wrong,
crashes, etc.
And again, someone will inform me that this is the beauty of ZFS: that I
know of the corruption.

After a scrub, what I see is:

zpool status -v
  pool: rpool
 state: ONLINE
status: One or more devices has experienced an error resulting in data
        corruption.  Applications may be affected.
action: Restore the file in question if possible.  Otherwise restore the
        entire pool from backup.
   see: http://www.sun.com/msg/ZFS-8000-8A
 scrub: scrub completed after 0h48m with 1 errors on Sun Apr 19 19:09:26 2009
config:

        NAME        STATE     READ WRITE CKSUM
        rpool       ONLINE       0     0     1
          c1d0s0    ONLINE       0     0     2

errors: Permanent errors have been detected in the following files:

        <0xa6>:<0x4f002>

Which file to replace?
Seriously, what would a normal user be expected to do here? No, I don't
have a backup of a file that has only recently been created, true, at
17:42 on April 19th.
Reinstall? While everything was okay 12 hours ago, after some 30 crashes
due to power failures, that were - until recently - rectified with
crashes at boot, Failsafe, reboot.
A system that has been going up and down without much hassle for 1.5
years, both on OpenSolaris on UFS and Ubuntu?

(Let's not forget the thread started with my question "Why do I have to
Failsafe so frequently after a power failure, to correct a corrupted
boot archive?")

Uwe
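(A side note on the <0xa6>:<0x4f002> notation: the two hex numbers are the
dataset ID and the object number within that dataset, and they can often
be mapped back with zdb. A hedged sketch only - zdb output varies between
builds, the dataset name below is a placeholder, and 0xa6 is 166, 0x4f002
is 323586 in decimal.)

   # list the pool's datasets together with their IDs; look for ID 166
   zdb -d rpool
   # dump that object's details, including its path if it still exists
   zdb -dddd rpool/<dataset-with-ID-166> 323586

(If the object has already been freed - likely for a transient boot-time
file - zdb will find nothing, and the error entry typically disappears
once the blocks are released and a later scrub completes.)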
dick hoogendijk
2009-Apr-19 14:58 UTC
[zfs-discuss] [on-discuss] Reliability at power failure?
On Sun, 19 Apr 2009 18:15:31 +0800
Uwe Dippel <udippel at gmail.com> wrote:

> Reliability at power failure? That was my question, and I had to
> learn that the answer is 'no'.

Sorry Uwe, but the answer is yes. Assuming that your hardware is in
order. I've read quite some messages from you here recently and all of
them make me think you're no fan of ZFS at all. Why don't you quit using
it and focus a little more on installing SunStudio (which isn't that hard
to do; at least not as hard as you want us to believe it is in another
thread). All I ever had to do was start the installer (in a GUI) and
-all- software was placed where it was supposed to go.

> And after some 4 days without any CKSUM error, how can yanking the
> power cord mess boot-stuff?

Maybe because on the fifth day some hardware failure occurred? ;-)

-- 
Dick Hoogendijk -- PGP/GnuPG key: 01D2433D
+ http://nagual.nl/ | nevada / opensolaris
+ All that's really worth doing is what we do for others (Lewis Carroll)
On 19-Apr-09, at 10:38 AM, Uwe Dippel wrote:

> Casper.Dik at Sun.COM wrote:
>>> We are back at square one; or, at the subject line.
>>> I did a zpool status -v, everything was hunky dory.
>>> Next, a power failure, 2 hours later, and this is what zpool
>>> status -v thinks:
>>>
>>> zpool status -v
>>>   pool: rpool
>>>  state: ONLINE
>>> status: One or more devices has experienced an error resulting in
>>>         data corruption.  Applications may be affected.
>>> action: Restore the file in question if possible.  Otherwise
>>>         restore the entire pool from backup.
>>>    see: http://www.sun.com/msg/ZFS-8000-8A
>>>  scrub: none requested
>>> config:
>>>
>>>         NAME        STATE     READ WRITE CKSUM
>>>         rpool       ONLINE       0     0     0
>>>           c1d0s0    ONLINE       0     0     0
>>>
>>> errors: Permanent errors have been detected in the following files:
>>>
>>>         //etc/svc/repository-boot-20090419_174236
>>>
>>> I know, the hard-core defenders of ZFS will repeat for the
>>> umpteenth time that I should be grateful that ZFS can NOTICE and
>>> inform about the problem.
>>>
>>
>> :-)
>>
>> The file is created on boot and I assume this was created directly
>> after the boot after the power-failure.
>>
>> Am I correct in thinking that:
>> 	the last boot happened on 2009/04/19_17:42:36
>> 	the system hasn't rebooted since that time
>>
>
> Good guess, but wrong. Another two to go ... :)
>
>>
>>> Others might want to repeat that this is not supposed to happen
>>> in the first place.
>>>
>>
>> ZFS guarantees that this cannot happen, unless the hardware is
>> bad.  Bad means here "the hardware doesn't promise what ZFS
>> believes the hardware promises".
>>
>> But anything can cause this:
>>
>> 	hardware problems:
>> 		- bad memory
>> 		- bad disk
>> 		- bad disk controller
>> 		- bad power supply
>>
>> 	software problems:
>> 		- memory corruption through any odd driver
>> 		- any part of the ZFS stack
>>
>> My money would still be on a hardware problem.  I remember a
>> particular case where ZFS continuously found checksum errors;
>> replacing the power supply fixed that.
>>
>
> Chances are. Yet the Ubuntu double boot here never finds anything
> wrong, crashes, etc.

Why should it? It isn't designed to do so.

> And again, someone will inform me that this is the beauty of ZFS:
> that I know of the corruption.
>
> After a scrub, what I see is:
>
> zpool status -v
>   pool: rpool
>  state: ONLINE
> status: One or more devices has experienced an error resulting in data
>         corruption.  Applications may be affected.
> action: Restore the file in question if possible.  Otherwise
>         restore the entire pool from backup.
>    see: http://www.sun.com/msg/ZFS-8000-8A
>  scrub: scrub completed after 0h48m with 1 errors on Sun Apr 19
>         19:09:26 2009
> config:
>
>         NAME        STATE     READ WRITE CKSUM
>         rpool       ONLINE       0     0     1
>           c1d0s0    ONLINE       0     0     2
>
> errors: Permanent errors have been detected in the following files:
>
>         <0xa6>:<0x4f002>
>
> Which file to replace?

Have you thoroughly checked your hardware?

Why are you running a non-redundant pool?

--Toby

>
> Seriously, what would a normal user be expected to do here? No, I
> don't have a backup of a file that has only recently been created,
> true, at 17:42 on April 19th.
> Reinstall? While everything was okay 12 hours ago, after some 30
> crashes due to power failures, that were - until recently -
> rectified with crashes at boot, Failsafe, reboot.
> A system that has been going up and down without much hassle for
> 1.5 years, both on OpenSolaris on UFS and Ubuntu?
>
> (Let's not forget the thread started with my question "Why do I
> have to Failsafe so frequently after a power failure, to correct a
> corrupted boot archive?")
>
> Uwe
>
>
> _______________________________________________
> zfs-discuss mailing list
> zfs-discuss at opensolaris.org
> http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
dick hoogendijk wrote:
> Why don't you quit using it
> and focus a little more on installing SunStudio (which isn't that hard
> to do; at least not as hard as you want us to believe it is in another
> thread). All I ever had to do was start the installer (in a GUI) and
> -all- software was placed where it was supposed to go.
>

Lucky you. So you doubt that I ran the ./installer, the GUI came up, and
in the end NetBeans wasn't there? Why should I make that up?? It took me
until here
http://docs.sun.com/app/docs/doc/820-2972/gabcd?a=view
to find the solution, even though the title didn't fit.

> Maybe because on the fifth day some hardware failure occurred? ;-)
>

That would be which? The system works and is up and running beautifully.
OpenSolaris, as of now.
Ah, you're hinting at a rare hardware glitch as the underlying problem?
AFAIU, it is a proclaimed feature of ZFS that writes are atomic, done
and over with.

Uwe,
who is a big fan of a ZFS that fulfills all of its promises. Snapshots
and luupgrade have yet to fail me on it. And a few other beautiful
things. It is the reliability that makes me wonder if UFS/FFS/ext3 are
not better choices in this respect. Blaming standard, off-the-shelf
hardware as 'too cheap' is too slippery a slope, btw.
Toby Thain wrote:
>
>> Chances are. Yet the Ubuntu double boot here never finds anything
>> wrong, crashes, etc.
>
> Why should it? It isn't designed to do so.

I knew this would inevitably creep up. :)

>
> Why are you running a non-redundant pool?

Because.
90+% of normal desktop users will run a non-redundant pool, and expect
their filesystems not to add operational failures, but to come back
after a yanked power cord without fail.

Uwe
Dennis Clarke
2009-Apr-19 15:55 UTC
[zfs-discuss] [on-discuss] Reliability at power failure?
>> And after some 4 days without any CKSUM error, how can yanking the
>> power cord mess boot-stuff?
>
> Maybe because on the fifth day some hardware failure occurred? ;-)

ha ha! Sorry .. that was pretty funny.

-- 
Dennis
Bob Friesenhahn
2009-Apr-19 16:24 UTC
[zfs-discuss] [on-discuss] Reliability at power failure?
On Sun, 19 Apr 2009, Uwe Dippel wrote:
>>
>> Why are you running a non-redundant pool?
>
> Because.
> 90+% of normal desktop users will run a non-redundant pool, and expect
> their filesystems not to add operational failures, but to come back
> after a yanked power cord without fail.

OpenSolaris desktop users are surely less than 0.5% of the desktop
population. Are the 90+% of normal desktop users you are talking about
the Microsoft Windows users, which is indeed something like 90%? If you
really want to be part of the majority, perhaps you installed the wrong
operating system. If you want to be included in the 0.5% of the desktop
population who are smart enough to run OpenSolaris, maybe you should add
a mirror drive.

Bob
-- 
Bob Friesenhahn
bfriesen at simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,    http://www.GraphicsMagick.org/
dick hoogendijk
2009-Apr-19 16:38 UTC
[zfs-discuss] [on-discuss] Reliability at power failure?
On Sun, 19 Apr 2009 11:24:26 -0500 (CDT)
Bob Friesenhahn <bfriesen at simple.dallas.tx.us> wrote:

> If you want to be included in the 0.5% of the desktop population who
> are smart enough to run OpenSolaris, maybe you should add a mirror
> drive.

You took the words right out of my mouth.
I often see/read messages from people who seem to think (Open)Solaris
is some kind of Windows or even Linux. The latter is famous for running
on cheap and often even very old hardware. That's OK, because Linux is
not only a modern system, it's also a geek system. And Windows runs on
almost everything, BSODs included.
Solaris/OpenSolaris is not a system for everyone. It has hardware
demands; that is, if you want to run it safely. I know people who run
ZFS on a 32-bit system and that often goes well. Until the system comes
under heavy load and strange errors appear.
Although mirroring existed in hardware, and software "solutions" were
present in the OSes I ran before Solaris, it's only since I run my
systems on ZFS (S10/nevada/OpenSolaris) that all my drives are mirrored.
Prices are cheap (and I mean CHEAP). If you still run ZFS on a single
drive (or worse: on a part of one) you don't follow the "rules". That's
not "professional" and even for home users it's not wise.

-- 
Dick Hoogendijk -- PGP/GnuPG key: 01D2433D
+ http://nagual.nl/ | nevada / opensolaris
+ All that's really worth doing is what we do for others (Lewis Carroll)
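(In practical terms, a single-disk root pool can usually be turned into a
mirror after the fact by attaching a second disk of at least the same
size. A hedged sketch, where c2d0s0 is a hypothetical spare slice,
already labelled and sized like the existing one:)

   # attach the second device; ZFS resilvers the existing data onto it
   zpool attach rpool c1d0s0 c2d0s0
   # watch the resilver and confirm both sides end up ONLINE
   zpool status rpool
   # for an x86 root pool, also put the boot blocks on the new disk
   installgrub /boot/grub/stage1 /boot/grub/stage2 /dev/rdsk/c2d0s0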
Bob Friesenhahn wrote:
>
> OpenSolaris desktop users are surely less than 0.5% of the desktop
> population. Are the 90+% of normal desktop users you are talking
> about the Microsoft Windows users, which is indeed something like 90%?
> If you really want to be part of the majority, perhaps you installed
> the wrong operating system. If you want to be included in the 0.5% of
> the desktop population who are smart enough to run OpenSolaris, maybe
> you should add a mirror drive.

Thanks for the advice, Bob! Though I don't insist on belonging to that
less than 0.5% of the population who are smart enough to run OpenSolaris
and add a mirror drive, I'd still like to run OpenSolaris, and without a
mirror drive. Where does that put me?

Uwe
dick hoogendijk
2009-Apr-19 16:52 UTC
[zfs-discuss] [on-discuss] Reliability at power failure?
On Mon, 20 Apr 2009 00:41:49 +0800
Uwe Dippel <udippel at gmail.com> wrote:

> I'd still like to run OpenSolaris, and without a mirror drive.
> Where does that put me?

Somewhere I wouldn't want to be. NOT if I were running production
servers, that is. Systems to play with are OK of course. You need
redundancy and you don't get that on a single drive. A sound use of ZFS
needs it. Otherwise the system is "crippled" before you start using it.
The only place I run ZFS on single drives is in an xVM guest ;-)

-- 
Dick Hoogendijk -- PGP/GnuPG key: 01D2433D
+ http://nagual.nl/ | nevada / opensolaris
+ All that's really worth doing is what we do for others (Lewis Carroll)
Bob Friesenhahn
2009-Apr-19 17:14 UTC
[zfs-discuss] [on-discuss] Reliability at power failure?
On Sun, 19 Apr 2009, Eric D. Mudama wrote:
>
> Additionally, over the last few months I'm pretty sure I've seen this
> same discussion and report of corruption when the person *did* have
> mirrored boot and had an unsafe power fail. I'll have to dig to find
> it though.

You are right that there have been reports of boot archive corruption,
but that corruption happens at a higher level than ZFS.

Bob
-- 
Bob Friesenhahn
bfriesen at simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,    http://www.GraphicsMagick.org/
Eric D. Mudama
2009-Apr-19 17:14 UTC
[zfs-discuss] [on-discuss] Reliability at power failure?
On Sun, Apr 19 at 18:38, dick hoogendijk wrote:
>On Sun, 19 Apr 2009 11:24:26 -0500 (CDT)
>Bob Friesenhahn <bfriesen at simple.dallas.tx.us> wrote:
>
>> If you want to be included in the 0.5% of the desktop population who
>> are smart enough to run OpenSolaris, maybe you should add a mirror
>> drive.
>
>You took the words right out of my mouth.
>I often see/read messages from people who seem to think (Open)Solaris
>is some kind of Windows or even Linux. The latter is famous for running
>on cheap and often even very old hardware. That's OK, because Linux is
>not only a modern system, it's also a geek system. And Windows runs on
>almost everything, BSODs included.

Just to play devil's advocate, those new Sun blades have a single flash
DIMM per processing node as storage.

Additionally, over the last few months I'm pretty sure I've seen this
same discussion and report of corruption when the person *did* have
mirrored boot and had an unsafe power fail. I'll have to dig to find
it though.

--eric

-- 
Eric D. Mudama
edmudama at mail.bounceswoosh.org
Oscar del Rio
2009-Apr-19 17:17 UTC
[zfs-discuss] [on-discuss] Reliability at power failure?
Uwe Dippel wrote:
> Next, a power failure, 2 hours later, and this is what zpool status -v
> thinks:

> Reliability at power failure? That was my question, and I had to learn

Your question should be about HARDWARE reliability after power failure.
Some (cheap) hardware is very unreliable, either the HDD or the PSU or
both. Many systems (Linux, Windows, whatever) silently become corrupted
until the day they no longer boot, and an HDD surface scan usually finds
several bad sectors.

Just this week I had to low-level reformat a box - the partition table
became unreadable/unwritable after a dirty shutdown. (A desktop machine,
not a server, and the HDD showed only ONE bad sector, so replacing the
HDD was not justifiable in this case.)
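(For anyone who wants to rule the hardware in or out on
Solaris/OpenSolaris before blaming the filesystem, a few read-only
checks - a sketch, with device names as placeholders for your own:)

   # per-device soft/hard/transport error counters and vendor info
   iostat -En
   # recent disk and driver complaints logged by the kernel
   dmesg | egrep -i 'error|retry|timeout'
   # a non-destructive surface scan can be run from format(1M):
   #   format -> select the disk -> analyze -> read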
Mario Goebbels
2009-Apr-19 17:38 UTC
[zfs-discuss] [on-discuss] Reliability at power failure?
>> Because.
>> 90+% of normal desktop users will run a non-redundant pool, and
>> expect their filesystems not to add operational failures, but to come
>> back after a yanked power cord without fail.
>
> OpenSolaris desktop users are surely less than 0.5% of the desktop
> population. Are the 90+% of normal desktop users you are talking
> about the Microsoft Windows users, which is indeed something like 90%?
> If you really want to be part of the majority, perhaps you installed the
> wrong operating system. If you want to be included in the 0.5% of the
> desktop population who are smart enough to run OpenSolaris, maybe you
> should add a mirror drive.

Not to be a party pooper, but once the Apple brigade gets their filthy
hands on ZFS (post-Snow Leopard?), it will be an issue.

Personally, I run a mirror.

-mg
Richard Elling
2009-Apr-19 20:56 UTC
[zfs-discuss] [on-discuss] Reliability at power failure?
Uwe Dippel wrote:
> Casper.Dik at Sun.COM wrote:
>>
>> I would suggest that you follow my recipe: not check the boot-archive
>> during a reboot. And then report back. (I'm assuming that that will
>> take several weeks)
>>
>
> We are back at square one; or, at the subject line.
> I did a zpool status -v, everything was hunky dory.
> Next, a power failure, 2 hours later, and this is what zpool status -v
> thinks:
>
> zpool status -v
>   pool: rpool
>  state: ONLINE
> status: One or more devices has experienced an error resulting in data
>         corruption.  Applications may be affected.
> action: Restore the file in question if possible.  Otherwise restore the
>         entire pool from backup.
>    see: http://www.sun.com/msg/ZFS-8000-8A
>  scrub: none requested
> config:
>
>         NAME        STATE     READ WRITE CKSUM
>         rpool       ONLINE       0     0     0
>           c1d0s0    ONLINE       0     0     0
>
> errors: Permanent errors have been detected in the following files:
>
>         //etc/svc/repository-boot-20090419_174236

This file is created at boot time, not when power has failed.
So the fault likely occurred during the boot.  With this knowledge,
the rest of your argument makes no sense.
 -- richard

>
> I know, the hard-core defenders of ZFS will repeat for the umpteenth
> time that I should be grateful that ZFS can NOTICE and inform about
> the problem.
> Others might want to repeat that this is not supposed to happen in the
> first place.
>
> Reliability at power failure? That was my question, and I had to learn
> that the answer is 'no'.
> How about my proposal to always have a proper snapshot available? And
> after some 4 days without any CKSUM error, how can yanking the power
> cord mess up boot-stuff?
>
> Uwe
>
>
> _______________________________________________
> zfs-discuss mailing list
> zfs-discuss at opensolaris.org
> http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Marion Hakanson
2009-Apr-19 21:17 UTC
[zfs-discuss] [on-discuss] Reliability at power failure?
udippel at gmail.com said:
> dick at nagual.nl wrote:
>> Maybe because on the fifth day some hardware failure occurred? ;-)
>
> That would be which? The system works and is up and running beautifully.
> OpenSolaris, as of now.

Running beautifully as long as the power stays on?  Is it hard to believe
hardware might glitch at power-failure (or power-on-after-failure)?

> Ah, you're hinting at a rare hardware glitch as the underlying problem?
> AFAIU, it is a proclaimed feature of ZFS that writes are atomic, done
> and over with.

Not only does ZFS advertise atomic updates, it also _depends_ on them,
and checks for them having happened, likely more so than other
filesystems.  Is it hard to believe that ZFS is exercising and/or
checking up on your hardware in ways that Linux does not?

> Uwe,
> who is a big fan of a ZFS that fulfills all of its promises. Snapshots
> and luupgrade have yet to fail me on it. And a few other beautiful
> things. It is the reliability that makes me wonder if UFS/FFS/ext3 are
> not better choices in this respect. Blaming standard, off-the-shelf
> hardware as 'too cheap' is too slippery a slope, btw.

Sorry to hear you're still having this issue.  I can only offer anecdotal
experience:  Running Solaris 10 here, non-mirrored ZFS root/boot since
last December (other ZFS filesystems, mirrored and non-mirrored, for 2
years prior), on a standard off-the-shelf PC, slightly more than 5 years
old.  This system has been through multiple power failures, never with
any corruption.  Same goes for a 2-year-old Dell desktop PC at work, with
mirrored ZFS root/boot; multiple power failures, never any reported
checksum errors or other corruption.

We also have Solaris 10 systems at work, non-ZFS-boot, but with ZFS
running without redundancy on non-Sun fibre-channel RAID gear.  These
have had power failures and other SAN outages without causing corruption
of ZFS filesystems.

We have experienced a number of times where systems failed to boot after
a power failure because the boot archive was out of date.  Not corrupted,
just out of date.  Annoying and inconvenient for production systems, but
nothing at all to do with ZFS.

So, I personally have not found ZFS to be any less reliable in the
presence of power failures than Solaris 10/UFS or Linux on the same
hardware.  I wonder what it is that's unique or rare about your
situation, that OpenSolaris and/or ZFS is uncovering?  I also wonder how
hard it might be to make ZFS resilient to whatever unique/rare
circumstances you have, as compared to finding/fixing/avoiding those
circumstances.

Regards,

Marion
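(The out-of-date boot archive case Marion describes is normally repaired
without failsafe gymnastics by rebuilding the archive; a brief sketch
using bootadm, whose exact output varies by release:)

   # report whether the boot archive is stale, without changing anything
   bootadm update-archive -vn
   # rebuild it (a clean 'reboot' or 'init 6' does this automatically)
   bootadm update-archive -v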
David Magda
2009-Apr-19 21:56 UTC
[zfs-discuss] [on-discuss] Reliability at power failure?
On Apr 19, 2009, at 12:52, dick hoogendijk wrote:

> You need redundancy and you don't get that on a single drive. A
> sound use of ZFS needs it.

Not quite the same, but...

	"zfs set copies=2 myzfsfs" ?
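(A note on what copies=2 does and does not buy you, as a sketch with a
placeholder dataset name: it stores two copies of each block on the same
disk, which helps against isolated bad sectors and checksum errors but
not against losing the whole drive, and it only affects data written
after the property is set.)

   # keep two copies of every newly written block in this dataset
   zfs set copies=2 rpool/export/home
   # verify the setting
   zfs get copies rpool/export/home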
dick hoogendijk
2009-Apr-19 22:17 UTC
[zfs-discuss] [on-discuss] Reliability at power failure?
On Sun, 19 Apr 2009 17:56:54 -0400
David Magda <dmagda at ee.ryerson.ca> wrote:

> On Apr 19, 2009, at 12:52, dick hoogendijk wrote:
>
> > You need redundancy and you don't get that on a single drive. A
> > sound use of ZFS needs it.
>
> Not quite the same, but...
>
> "zfs set copies=2 myzfsfs" ?

Like you say: not quite the same. If your drive fails, you're scr**d.

-- 
Dick Hoogendijk -- PGP/GnuPG key: 01D2433D
+ http://nagual.nl/ | nevada / opensolaris
+ All that's really worth doing is what we do for others (Lewis Carroll)
Richard Elling wrote:
>>
>> //etc/svc/repository-boot-20090419_174236
>
> This file is created at boot time, not when power has failed.
> So the fault likely occurred during the boot.  With this knowledge,
> the rest of your argument makes no sense.

reboot    system boot                   Sun Apr 19 17:46
reboot    system down                   Sun Apr 19 17:45
reboot    system boot                   Sun Apr 19 17:44
reboot    system down                   Sun Apr 19 17:44
reboot    system boot                   Sun Apr 19 17:43
reboot    system down                   Sat Apr 18 15:09

The result that you saw was the one after the last boot at 17:46. You
are probably correct with your statement that the fault occurred at
boot time.

Uwe
Robert Thurlow
2009-Apr-20 16:27 UTC
[zfs-discuss] [on-discuss] Reliability at power failure?
dick hoogendijk wrote:
> Sorry Uwe, but the answer is yes. Assuming that your hardware is in
> order. I've read quite some messages from you here recently and all of
> them make me think you're no fan of ZFS at all. Why don't you quit
> using it and focus a little more on installing SunStudio

I would really like to NOT chase people away from ZFS for any reason.
There's no need.

ZFS is currently a little too expert-friendly.  I'm used to ZFS, so when
it shows me messages, I know what it's saying.  But when I read them a
second time, I always wonder if we could word them to be more
approachable without losing the precision.  I would like to see
alternate wordings suggested in RFEs, since I think some folks had good
suggestions.  As an example of wording that needs an upgrade:

> errors: Permanent errors have been detected in the following files:
>
>         <0xa6>:<0x4f002>

Could we not offer a clue that this was in metadata, even if it is
darned hard to print a meaningful path name?

Obligatory positive message:  I was rewiring my monitors yesterday to
get them all on a switchable power bar, and bumped a power switch
briefly.  The old dual-Opteron machine hosting my storage pool did not
power up again after that.  I had an external FireWire case the pool had
been destined for, so I removed the drives, put them in the external
case, and plugged the case into my Sun Blade 2500.  'zpool import -f'
went nicely, and I didn't lose a thing.  I don't think any other
filesystem or OS would make a recovery operation like this any easier.

Oh yeah, this was after a mostly effortless ZFS-accelerated Live Upgrade
from snv_91 to snv_112 (almost a year) on another box.

Rob T
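(For anyone wanting to repeat that migration, the general pattern looks
roughly like the following - a sketch, with 'tank' as a placeholder pool
name; -f is only needed because the dead machine never got to export the
pool cleanly:)

   # on the old host, if it is still alive:
   zpool export tank
   # on the new host, after connecting the disks:
   zpool import            # lists pools found on the attached devices
   zpool import -f tank    # -f overrides the "in use elsewhere" check
   zpool status tank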