Nomen Nescio
2012-Aug-30 14:08 UTC
[zfs-discuss] ZFS ok for single disk dev box? <D1B1A95FBD CF7341AC8EB0A97FCCC477127FD5D4@SN2PRD0410MB372.namprd04.prod.outlook.com>
> > Hi. I have a spare off-the-shelf consumer PC and was thinking about loading Solaris on it for a development box since I use Studio @work and like it better than gcc. I was thinking maybe it isn't so smart to use ZFS since it has only one drive. If ZFS detects something bad it might kernel panic and lose the whole system, right? I realize UFS /might/ be ignorant of any corruption but it might be more usable and go happily on its way without noticing? Except then I have to size all the partitions and lose out on compression etc. Any suggestions thankfully received.
>
> Suppose you start getting checksum errors. Then you *do* want to notice.

I'm not convinced. I understand the theoretical value of ZFS, but it introduces a whole new layer of problems other filesystems don't have. Even if it's right in theory, it doesn't always make things better in reality. I like the features it provides, and not having to size filesystems like in the old days is great, but ZFS can and does have bugs and, like anything else, is not perfect.

Aside from Microsoft, which used to be guaranteed to corrupt filesystems, I haven't ever had corruption that caused me any problems. Certainly there must have been corruptions because of software bugs and crappy hardware, but they had no visible effect, and that is good enough for me in the situation I asked about. I feel this issue is a little overblown given that most of the world runs on other enterprise filesystems and the world hasn't come to an end yet. ZFS is an important step in the right direction, but that doesn't mean you can't live without its error detection. We lived without it until now. What I find hard to live without is the management features it gives you, which is why I have a dilemma.

In this specific use case I would rather have a system that's still bootable and runs as best it can than an unbootable system that has detected an integrity problem, especially at this point in ZFS's life. If ZFS would not panic the kernel and instead gave the option to fail or mark file(s) bad, I would like it more. But having the ability to manage the disk with one pool and the other nice features like compression, plus the fact that it works nicely on good hardware, make it hard to go back once you've made the jump. Choices, choices.

> > Even if your system does crash, at least you now have an opportunity to recognize there is a problem, and think about your backups, rather than allowing the corruption to proliferate.

This isn't a production box. As I said, it's an unused PC with a single drive, and I don't have anybody's bank accounts on it. I can rsync whatever I work on that day to a backup server. It won't be a disaster if UFS suddenly becomes unreliable and I lose a file or two, or if a drive fails, but it would be very annoying if ZFS barfed on a technicality and I had to reinstall the whole OS because of a kernel panic and an unbootable system.
Sašo Kiselkov
2012-Aug-30 14:21 UTC
On 08/30/2012 04:08 PM, Nomen Nescio wrote:
>>> Hi. I have a spare off-the-shelf consumer PC and was thinking about loading Solaris on it for a development box since I use Studio @work and like it better than gcc. I was thinking maybe it isn't so smart to use ZFS since it has only one drive. If ZFS detects something bad it might kernel panic and lose the whole system, right? I realize UFS /might/ be ignorant of any corruption but it might be more usable and go happily on its way without noticing? Except then I have to size all the partitions and lose out on compression etc. Any suggestions thankfully received.
>>
>> Suppose you start getting checksum errors. Then you *do* want to notice.
>
> I'm not convinced. I understand the theoretical value of ZFS but it introduces a whole new layer of problems other filesystems don't have. Even if it's right in theory it doesn't always make things better in reality. I like the features it provides and not having to size filesystems like in the old days is great, but ZFS can and does have bugs and like anything else is not perfect. Aside from Microsoft which used to be guaranteed to corrupt filesystems I haven't ever had corruption that caused me any problems. Certainly there must have been corruptions because of software bugs and crappy hardware but they had no visible effect and that is good enough for me in this situation I asked about. I feel this issue is a little overblown given most of the world runs on other enterprise filesystems and the world hasn't come to an end yet. ZFS is an important step in the right direction but it doesn't mean you can't live without its error detection. We lived without it until now. What I find hard to live without is the management features it gives you which is why I have a dilemma.

1) Anecdotal evidence is nearly worthless in matters of technology.

2) Data corruption does happen, and HDD manufacturers can even pin a number to it (the typical bit error rate on modern HDDs is around 10^-13, i.e. one bit error per ~10TB transferred). That it didn't hit your sensitive data but only some random pixel in an MPEG movie is good for you. But ZFS was built to handle environments where all data is critically important.

3) Data corruption also happens in transit on the SATA/SAS buses and in memory (that's why there is such a thing as ECC memory).

4) If it bothers you so much, simply set checksum=off and fly without the parachute (a single core of a modern CPU can checksum at a rate upwards of 4GB/s, but if the few CPU cycles are so important to you, turn it off).

> In this specific use case I would rather have a system that's still bootable and runs as best it can than an unbootable system that has detected an integrity problem especially at this point in ZFS's life. If ZFS would not panic the kernel and give the option to fail or mark file(s) bad, I would like it more.

ZFS doesn't panic in case of an unrecoverable single-block error; it simply returns an I/O error to the calling application. A panic *can* only take place in case of a catastrophic pool failure, and even then it isn't the default. See man zpool(1M) for the description of the "failmode" option.

> But having the ability manage the disk with one pool and the other nice features like compression plus the fact it works nicely on good hardware make it hard to go back once you made the jump. Choices, choices.

So you want to enable compression (which is a huge CPU hog) and worry about checksumming (which is tiny in comparison)? If you're compressing data, you've got all the more reason to enable checksumming, since compression tends to make all data corruption much, much worse (e.g. that's why a single-bit error in a compressed MPEG stream doesn't simply slightly alter the color of a single pixel, but typically results in a whole macroblock or row of macroblocks messing up completely).

>>> Even if your system does crash, at least you now have an opportunity to recognize there is a problem, and think about your backups, rather than allowing the corruption to proliferate.
>
> This isn't a production box as I said it's an unused PC with a single drive, and I don't have anybody's bank accounts on it. I can rsync whatever I work on that day to a backup server. It won't be a disaster if UFS suddenly becomes unreliable and I lose a file or two, or if a drive fails, but it would be very annoying if ZFS barfed on a technicality and I had to reinstall the whole OS because of a kernel panic and an unbootable system.

As noted before, simple checksum errors won't panic your box, and neither will catastrophic pool failure (the default is failmode=wait). You have to explicitly tell ZFS that you want it to panic your system in this situation.

Cheers,
--
Saso
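The pool-level behaviour described above is governed by the failmode property documented in zpool(1M), whose values are wait (the default), continue and panic. A minimal sketch of inspecting and changing it follows. The pool name rpool and the DRY_RUN/run helpers are assumptions made for illustration, not something from the thread; by default the script only prints the commands it would run.

```shell
#!/bin/sh
# Sketch: inspect and set the ZFS pool "failmode" property.
# Assumptions: the pool is called "rpool"; DRY_RUN and run() are
# illustrative helpers, not part of ZFS.
POOL="${POOL:-rpool}"

# Print the command instead of executing it unless DRY_RUN=0.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        printf '+ %s\n' "$*"
    else
        "$@"
    fi
}

# Show the current failmode of the pool.
show_failmode() {
    run zpool get failmode "$POOL"
}

# Set failmode; $1 must be one of: wait, continue, panic.
set_failmode() {
    case "$1" in
        wait|continue|panic) run zpool set "failmode=$1" "$POOL" ;;
        *) echo "unknown failmode: $1" >&2; return 1 ;;
    esac
}

show_failmode
set_failmode continue
```

Running it with DRY_RUN=0 on a machine with ZFS would apply the change for real, so it is worth checking the printed commands first.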
Justin Stringfellow
2012-Aug-30 14:30 UTC
> would be very annoying if ZFS barfed on a technicality and I had to reinstall the whole OS because of a kernel panic and an unbootable system.

Is this a known scenario with ZFS then? I can't recall hearing of this happening. I've seen plenty of UFS filesystems dying with "panic: freeing free", and then the ensuing fsck-athon convinces the user to just rebuild the fs in question.

cheers,
--justin
Fajar A. Nugraha
2012-Aug-30 14:47 UTC
On Thu, Aug 30, 2012 at 9:08 PM, Nomen Nescio <nobody at dizum.com> wrote:
> In this specific use case I would rather have a system that's still bootable and runs as best it can

That's what would happen if the corruption is confined to part of the disk (e.g. a bad sector).

> than an unbootable system that has detected an integrity problem especially at this point in ZFS's life. If ZFS would not panic the kernel and give the option to fail or mark file(s) bad,

You'd be unable to access that particular file. Access to other files would still be fine.

> it would be very annoying if ZFS barfed on a technicality and I had to reinstall the whole OS because of a kernel panic and an unbootable system.

It shouldn't do that.

Plus, if you look around a bit, you'll find some tutorials on backing up the entire OS using zfs send-receive. So even if for some reason the OS becomes unbootable (e.g. blocks of some critical file are corrupted, which would cause a panic/crash no matter what filesystem you use), the "reinstall" process is basically just a zfs send-receive plus installing the bootloader, so it can be VERY fast.

This is what I do on Linux (Ubuntu + zfsonlinux). Two notebooks and one USB disk (which functions as a rescue/backup disk) basically store the same copy of the OS dataset, with very small variations (only four files) for each environment. I can even update one of them and copy the result of the update (using incremental send) to the others, making sure I will always have the same working environment no matter which notebook I'm working on.

--
Fajar
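The backup scheme described above can be sketched roughly as follows: a full replication stream the first time, then incremental streams for later updates. The dataset name rpool/ROOT, the destination pool backup, the snapshot names, and the run helper are all hypothetical placeholders rather than details from the post. The script only prints the commands unless DRY_RUN=0 is set.

```shell
#!/bin/sh
# Sketch: replicate an OS dataset to a backup pool with zfs send/receive.
# Assumptions (not from the thread): source dataset "rpool/ROOT",
# destination pool "backup", snapshot names passed as arguments.
SRC="${SRC:-rpool/ROOT}"
DST="${DST:-backup}"

# Print the command line instead of executing it unless DRY_RUN=0.
# sh -c is used so that pipelines can be passed as one string.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        printf '+ %s\n' "$*"
    else
        sh -c "$*"
    fi
}

# First backup: snapshot recursively, then send a full replication stream.
full_backup() {
    run "zfs snapshot -r $SRC@$1"
    run "zfs send -R $SRC@$1 | zfs receive -Fdu $DST"
}

# Later backups: send only the changes between snapshots $1 and $2.
incremental_backup() {
    run "zfs snapshot -r $SRC@$2"
    run "zfs send -R -I $SRC@$1 $SRC@$2 | zfs receive -Fdu $DST"
}

full_backup monday
incremental_backup monday tuesday
```

With receive -d, the datasets land under the destination pool with the source pool name stripped, so rpool/ROOT would appear as backup/ROOT in this sketch.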
> > would be very annoying if ZFS barfed on a technicality and I had to reinstall the whole OS because of a kernel panic and an unbootable system.
>
> It shouldn't do that.

I agree, but it seems like other people have had it happen.

> Plus, if you look around a bit, you'll find some tutorials to back up the entire OS using zfs send-receive. So even if for some reason the OS becomes unbootable (e.g. blocks of some critical file are corrupted, which would cause panic/crash no matter what filesystem you use), the "reinstall" process is basically just a zfs send-receive plus installing the bootloader, so it can be VERY fast.

Now that is interesting. But how do you do a receive before you reinstall? Live CD??

Thanks
> Now that is interesting. But how do you do a receive before you reinstall? Live cd??

Just boot off of the CD (or jumpstart server) to single user mode. Format your new disk, create a zpool, zfs recv, installboot (or installgrub), reboot and done.
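The steps listed above can be sketched for a Solaris x86 system as follows. This is an outline under stated assumptions, not a tested recovery procedure: the disk slice c0t0d0s0, the boot environment name solaris, and the snapshot backup/ROOT@monday on an already imported backup pool are hypothetical, and the disk is assumed to have been labelled by hand with format(1M) first. On SPARC, installboot would take the place of installgrub. The script only prints the commands unless DRY_RUN=0 is set.

```shell
#!/bin/sh
# Sketch of the restore sequence, run from the install CD in
# single-user mode. All names below are hypothetical placeholders.
DISK="${DISK:-c0t0d0s0}"             # slice that will hold the root pool
BE="${BE:-solaris}"                  # boot environment under rpool/ROOT
SNAP="${SNAP:-backup/ROOT@monday}"   # saved snapshot to restore from

# Print the command line instead of executing it unless DRY_RUN=0.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        printf '+ %s\n' "$*"
    else
        sh -c "$*"
    fi
}

restore_root_pool() {
    # 1. Create the new root pool, mounted under the alternate root /a.
    run "zpool create -f -R /a rpool $DISK"
    # 2. Receive the saved datasets into it without mounting them.
    run "zfs send -R $SNAP | zfs receive -Fdu rpool"
    # 3. Tell the pool which dataset to boot from.
    run "zpool set bootfs=rpool/ROOT/$BE rpool"
    # 4. Install the boot loader on the new disk (x86).
    run "installgrub /boot/grub/stage1 /boot/grub/stage2 /dev/rdsk/$DISK"
}

restore_root_pool
```

After the last step the machine can be rebooted from the restored disk.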
I asked what I thought was a simple question, but most of the answers don't have much to do with the question. Now it seems to have become an argument that your filesystem is better than any other filesystem. I don't think it is, because I have seen the horror stories lurking on this list. I had no intention of getting into this and I think you shouldn't either. I like ZFS, I use it at work, and I am not here to knock it.

> 1) Anecdotal evidence is nearly worthless in matters of technology.

Agreed, but I fail to see the relevance. Bug reports on this list aren't worthless or the list wouldn't exist.

> 2) Data corruption does happen, and HDD manufacturers can even pin a number to it (the typical bit error rate on modern HDDs is around 10^-13, i.e. one bit error per ~10TB transferred). That it didn't hit your sensitive data but only some random pixel in an MPEG movie is good for you. But ZFS was built to handle environments where all data is critically important.

I don't think I have 10TB of source code ;) Other file systems also handle critically important data. Every design has its tradeoffs, and I don't believe ZFS is superior to everything else, although it has many nice management features which aren't available in the same feature set elsewhere. I am not criticising ZFS, but I don't believe it solves every problem either.

> 3) Data corruption also happens in-transit on the SATA/SAS buses and in memory (that's why there is a thing as ECC memory).

Right.

> 4) If it so bothers you, simply set checksum=off and fly without the parachute (a single core of a modern CPU can checksum at a rate upwards of 4GB/s, but if the few CPU cycles are so important to you, turn it off).

You're making up imaginary motives and blaming them on me? I didn't say I don't want to spend cycles on checksumming. I said I don't want to lose a system because of a filesystem error. There's no need to be snide or condescending. Maybe you need a vacation? Who's your boss?

> > In this specific use case I would rather have a system that's still bootable and runs as best it can than an unbootable system that has detected an integrity problem especially at this point in ZFS's life. If ZFS would not panic the kernel and give the option to fail or mark file(s) bad, I would like it more.
>
> ZFS doesn't panic in case of an unrecoverable single-block error, it simply returns an I/O error to the calling application. The panic only *can* take place in case of a catastrophic pool failure and isn't the default anyway. See man zpool(1M) for the description of the "failmode" option.

ZFS is not perfect, and although it may be designed to do what you say, I think errors in ZFS are more likely than bit errors on hard drives. I'm betting on hardware, and /in this scenario/ I would prefer a filesystem that tolerates it, even ignorantly, rather than one protecting me from myself. What I'd really like is an option (maybe it exists) in ZFS to say: when a block fails a checksum, tell me which file it affects and let me decide to proceed or dump.

> > But having the ability manage the disk with one pool and the other nice features like compression plus the fact it works nicely on good hardware make it hard to go back once you made the jump. Choices, choices.
>
> So you want to enable compression (which is a huge CPU hog) and worry about checksumming (which is tiny in comparison)?

Yes, you got it right this time. You're the one trying to put words in my mouth. Nowhere did I ever suggest CPU cycles are an issue. The issue is what I said. Scroll up.

> If you're compressing data, you've got all the more reason to enable checksumming, since compression tends to make all data corruption much, much worse (e.g. that's why a single-bit error in a compressed MPEG stream doesn't simply slightly alter the color of a single pixel, but typically instead results in a whole macroblock or row of macroblocks messing up completely).

Sounds reasonable.

> >>> Even if your system does crash, at least you now have an opportunity to recognize there is a problem, and think about your backups, rather than allowing the corruption to proliferate.
> >
> > This isn't a production box as I said it's an unused PC with a single drive, and I don't have anybody's bank accounts on it. I can rsync whatever I work on that day to a backup server. It won't be a disaster if UFS suddenly becomes unreliable and I lose a file or two, or if a drive fails, but it would be very annoying if ZFS barfed on a technicality and I had to reinstall the whole OS because of a kernel panic and an unbootable system.
>
> As noted before, simple checksum errors won't panic your box, and neither will catastrophic pool failure (the default failmode=wait). You have to explicitly tell ZFS that you want it to panic your system in this situation.

I have read reports on this list that show ZFS does panic the system by default in some cases. It may not have been for checksum failures, and I have no idea why it did, but enough people wrote about crashed boxes to make me ask the question I asked.

Thanks for the copies suggestion. I'm too busy to argue with you, so please pretend this thread never happened.
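As a back-of-the-envelope check on the bit error rate quoted in this exchange: a rate of one error in 10^N bits can be turned into an average volume of data per error by dividing by 8 bits per byte. The sketch below assumes decimal terabytes (10^12 bytes), and the function name is illustrative. By this arithmetic one error in 10^13 bits works out to about 1.25 TB, while the ~10TB figure corresponds more closely to one error in 10^14 bits; which of the two was intended is not clear from the thread.

```shell
#!/bin/sh
# tb_per_bit_error N: average terabytes (10^12 bytes) transferred per
# bit error, for an error rate of one bit in 10^N.
tb_per_bit_error() {
    awk -v n="$1" 'BEGIN { printf "%.2f\n", (10 ^ n) / 8 / 1e12 }'
}

# Compare a few commonly quoted exponents.
for n in 13 14 15; do
    printf '1 in 10^%s bits: about %s TB per error\n' \
        "$n" "$(tb_per_bit_error "$n")"
done
```

Either way, a source tree is far smaller than that, but the same disk also holds the OS and gets read repeatedly over its lifetime, so the total volume transferred is what matters for this estimate.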
> I asked what I thought was a simple question but most of the answers don't have too much to do with the question.

Hehe, welcome to mailing lists ;).

> What I'd really like is an option (maybe it exists) in ZFS to say when a block fails a checksum tell me which file it affects

It does exactly that.

> I have read reports on this list that show ZFS does panic the system by default in some cases. It may not have been for checksum failures, I have no idea why it did, but enough people wrote about crashed boxes to make me ask the question I asked.

I've never heard of or experienced anything like that.
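The per-file reporting mentioned above is available through zpool status -v, which lists the files affected by unrecoverable errors, typically after a scrub has read and verified the pool. A minimal sketch, assuming a pool named rpool (the pool name and the dry-run wrapper are illustrative assumptions; the commands are only printed unless DRY_RUN=0 is set):

```shell
#!/bin/sh
# Sketch: scrub a pool and list files with unrecoverable errors.
# Assumption: the pool is called "rpool".
POOL="${POOL:-rpool}"

# Print the command instead of executing it unless DRY_RUN=0.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        printf '+ %s\n' "$*"
    else
        "$@"
    fi
}

# Start a scrub, which reads and verifies every allocated block.
start_scrub() {
    run zpool scrub "$POOL"
}

# Show pool health; -v adds the list of files with permanent errors.
show_errors() {
    run zpool status -v "$POOL"
}

start_scrub
show_errors
```

The scrub runs in the background, so the error list is only complete once the status output reports the scrub as finished.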
On Thu, Aug 30, 2012 at 11:15 PM, Nomen Nescio <nobody at dizum.com> wrote:
>> Plus, if you look around a bit, you'll find some tutorials to back up the entire OS using zfs send-receive. So even if for some reason the OS becomes unbootable (e.g. blocks of some critical file are corrupted, which would cause panic/crash no matter what filesystem you use), the "reinstall" process is basically just a zfs send-receive plus installing the bootloader, so it can be VERY fast.
>
> Now that is interesting. But how do you do a receive before you reinstall? Live cd??

Live CD, live USB, or better yet, a full-blown installation on a USB disk. The latter differs from a live USB in that it's faster and you can customize it (i.e. add/remove packages) just like a normal installation.

--
Fajar
Thanks, sounds awesome! Pretty much takes away my concern about using ZFS!

Stu

> > Now that is interesting. But how do you do a receive before you reinstall? Live cd??
>
> Just boot off of the CD (or jumpstart server) to single user mode. Format your new disk, create a zpool, zfs recv, installboot (or installgrub), reboot and done.