This subject may have been ridden to death... I missed it if so.

Not wanting to start a flame fest or whatever but...

As a common slob who isn't very skilled, I'd like to see some commentary from some of the pros here comparing ZFS against btrfs. I realize btrfs is a lot less `finished', but I see it is starting to show up as an option in some Linux install routines... Debian and Ubuntu I noticed, and probably many others.

My main reasons for using ZFS are pretty basic compared to some here, and I wondered how btrfs stacks up on the basic qualities.
On Mon, Oct 17, 2011 at 8:29 AM, Harry Putnam <reader at newsguy.com> wrote:
> This subject may have been ridden to death... I missed it if so.
>
> Not wanting to start a flame fest or whatever but...
>
> As a common slob who isn't very skilled, I'd like to see some commentary
> from some of the pros here as to any comparison of zfs against btrfs.
>
> I realize btrfs is a lot less `finished' but I see it is starting to
> show up as an option on some linux install routines... Debian and
> Ubuntu I noticed and probably many others.
>
> My main reasons for using zfs are pretty basic compared to some here
> and I wondered how btrfs stacks up on the basic qualities.

If you only want RAID0 or RAID1, then btrfs is okay. There's no support for RAID5+ as yet, and it's been "in development" for a couple of years now.

There's no working fsck tool for btrfs. It's been "in development" and "released in two weeks" for over a year now.

Don't put any data you need onto btrfs. It's extremely brittle in the face of power loss.

My biggest gripe with btrfs is that they have come up with all-new terminology that only applies to them. "Filesystem" now means "a collection of block devices grouped together", while "sub-volume" is what we'd normally call a filesystem. And there are a few other weird terms thrown in as well.

From all that I've read on the btrfs mailing list, and news sites around the web, btrfs is not ready for production use on any system with data that you can't afford to lose.

If you absolutely must run Linux on your storage server, for whatever reason, then you probably won't be running ZFS. For the next year or two, it would probably be safer to run software RAID (md), with LVM on top, with XFS or Ext4 on top. It's not the easiest setup to manage, but it would be safer than btrfs.

If you don't need to run Linux on your storage server, then definitely give ZFS a try. There are many options, depending on your level of expertise: FreeNAS for plug-n-play simplicity with a web GUI, FreeBSD for a simpler OS that runs well on x86/amd64 systems, any of the OpenSolaris-based distros, or even Solaris if you have the money.

With ZFS you get:
- working single, dual, triple parity raidz (RAID5, RAID6, "RAID7" equivalence)
- n-way mirroring
- end-to-end checksums for all data/metadata blocks
- unlimited snapshots
- pooled storage
- unlimited filesystems
- send/recv capabilities
- built-in compression
- built-in dedupe
- built-in encryption (in ZFSv31, which is currently only in Solaris 11)
- built-in CIFS/NFS sharing (on Solaris-based systems; FreeBSD uses normal nfsd and Samba for this)
- automatic hot-spares (on Solaris-based systems; FreeBSD only supports manual spares)
- and more

Maybe in another 5 years or so, btrfs will be up to the point where ZFS is today. Just imagine where ZFS will be in 5 years or so. :)

--
Freddie Cash
fjwcash at gmail.com
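As an aside for readers new to ZFS, a minimal sketch of a few of the features in the list above; the pool name, disk names, and backup host here are hypothetical and not from this thread:

  # double-parity raidz pool across six whole disks
  zpool create tank raidz2 c0t0d0 c0t1d0 c0t2d0 c0t3d0 c0t4d0 c0t5d0
  # a pooled, compressed filesystem, then a snapshot of it
  zfs create -o compression=on tank/home
  zfs snapshot tank/home@monday
  # replicate the snapshot to another pool/host with send/recv
  zfs send tank/home@monday | ssh backuphost zfs recv backup/home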
On Mon, Oct 17, 2011 at 11:29 AM, Harry Putnam <reader at newsguy.com> wrote:
> My main reasons for using zfs are pretty basic compared to some here

What are they? (the reasons for using ZFS)

> and I wondered how btrfs stacks up on the basic qualities.

I use ZFS @ work because it is the only FS we have been able to find that scales to what we need (hundreds of millions of small files in ONE filesystem).

I use ZFS @ home because I really can't afford to have my data corrupted and I can't afford Enterprise-grade hardware.

--
{--------1---------2---------3---------4---------5---------6---------7---------}
Paul Kraus
-> Senior Systems Architect, Garnet River ( http://www.garnetriver.com/ )
-> Sound Coordinator, Schenectady Light Opera Company ( http://www.sloctheater.org/ )
-> Technical Advisor, RPI Players
Or, if you absolutely must run Linux for the operating system, see: http://zfsonlinux.org/

On Oct 17, 2011, at 8:55 AM, Freddie Cash wrote:
> If you absolutely must run Linux on your storage server, for whatever reason, then you
> probably won't be running ZFS. For the next year or two, it would probably be safer to
> run software RAID (md), with LVM on top, with XFS or Ext4 on top. It's not the easiest
> setup to manage, but it would be safer than btrfs.
Freddie Cash <fjwcash at gmail.com> writes:
> If you only want RAID0 or RAID1, then btrfs is okay. There's no support for
> RAID5+ as yet, and it's been "in development" for a couple of years now.

[...] snipped excellent information

Thanks much, I'm very appreciative of the good information. Much better to hear from actual users than poring through web pages to get a picture.

I'm googling the citations you posted: FreeNAS and FreeBSD.

Maybe you can give a little synopsis of those too. I mean when it comes to utilizing ZFS; is it much the same as if running it on Solaris?

I knew FreeBSD had a port, but assumed it would stack up kind of sorry compared to Solaris ZFS. Maybe something on the order of the Linux fuse/zfs adaptation in usability. Is that assumption wrong?

I actually have some experience with FreeBSD (long before there was a ZFS port), and it is very Linux-like in many ways.
On Mon, Oct 17, 2011 at 10:50 AM, Harry Putnam <reader at newsguy.com> wrote:
> Freddie Cash <fjwcash at gmail.com> writes:
>> If you only want RAID0 or RAID1, then btrfs is okay. There's no support
>> for RAID5+ as yet, and it's been "in development" for a couple of years now.
>
> [...] snipped excellent information
>
> Thanks much, I'm very appreciative of the good information. Much
> better to hear from actual users than poring through web pages to get a
> picture.
>
> I'm googling the citations you posted: FreeNAS and FreeBSD.
>
> Maybe you can give a little synopsis of those too. I mean when it
> comes to utilizing zfs; is it much the same as if running it on
> solaris?

FreeBSD 8-STABLE (what will become 8.3) and 9.0-RELEASE (which will be released hopefully this month) both include ZFSv28, the latest open-source version of ZFS. This includes raidz3 and dedupe support, same as OpenSolaris, Illumos, and other OSol-based distros. Not sure what the latest version of ZFS is in Solaris 10.

The ZFS bits work the same as on Solaris, with only 2 small differences:
- the sharenfs property just writes data to /etc/zfs/exports, which is read by the standard NFS daemons (it's easier to just use /etc/exports to share ZFS filesystems)
- the sharesmb property doesn't do anything; you have to use Samba to share ZFS filesystems

The only real differences are how the OSes themselves work. If you are fluent in Solaris, then FreeBSD will seem strange (and vice-versa). If you are fluent in Linux, then FreeBSD will be similar (but a lot more cohesive and "put-together").

> I knew freebsd had a port, but assumed it would stack up kind of sorry
> compared to Solaris zfs.
>
> Maybe something on the order of the linux fuse/zfs adaptation in usability.
>
> Is that assumption wrong?

Absolutely, completely, and utterly false. :) The FreeBSD port of ZFS is pretty much on par with ZFS on OpenSolaris. The Linux port of ZFS is just barely usable. No comparison at all. :)

> I actually have some experience with Freebsd, (long before there was a
> zfs port), and it is very linux like in many ways.

That's like saying that OpenIndiana is very Linux-like in many ways. :)

--
Freddie Cash
fjwcash at gmail.com
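For illustration, a minimal sketch of the /etc/exports route described above on FreeBSD; the dataset, path, and network are hypothetical:

  zfs create tank/export
  echo '/tank/export -network 192.168.1.0 -mask 255.255.255.0' >> /etc/exports
  # with nfs_server_enable="YES" and mountd_enable="YES" in /etc/rc.conf,
  # have mountd re-read the exports file:
  kill -HUP `cat /var/run/mountd.pid`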
> From: zfs-discuss-bounces at opensolaris.org [mailto:zfs-discuss-
> bounces at opensolaris.org] On Behalf Of Harry Putnam
>
> As a common slob who isn't very skilled, I'd like to see some commentary
> from some of the pros here as to any comparison of zfs against btrfs.

I recently put my first btrfs system into production. Here are the similarities and differences I noticed between btrfs and zfs:

Differences:
* Obviously, one is meant for linux and the other solaris (etc)
* In btrfs, there is only raid1. They don't have raid5, 6, etc yet.
* In btrfs, snapshots are read-write. They cannot be made read-only without quotas, which aren't implemented yet.
* zfs supports quotas. Also, by default it creates snapshots read-only, but they can be made read-write by cloning.
* In btrfs, there is no equivalent or alternative to "zfs send | zfs receive"
* In zfs, you have the hidden ".zfs" subdir that contains your snapshots.
* In btrfs, your snapshots need to be mounted somewhere, inside the same filesystem. So in btrfs, you do something like this... Create a filesystem, then create a subvol called "@" and use it to store all your work. Later, when you create snapshots, you essentially duplicate that subvol as "@2011-10-18-07-40-00" or something. (See the sketch after these lists.)
* btrfs is able to shrink. zfs is not able to shrink.
* btrfs is able to defrag. zfs doesn't have defrag yet.
* btrfs is able to balance. (After adding new blank devices, rebalance, so the data & workload are distributed across all the devices.) zfs is not able to do this yet.
* zfs has storage tiering. (cache & log devices, such as SSDs to accelerate performance.) btrfs doesn't have this yet.
* btrfs has no dedup yet. They are planning to do offline dedup. ZFS has online dedup. I wouldn't recommend zfs dedup yet until performance issues are resolved, which seems like never. But when and if zfs dedup performance issues are resolved, online dedup should greatly outperform offline dedup, both in terms of speed and disk usage.
* zfs has the concept of a zvol, which you can export over iscsi or format with any filesystem you like. If you want to do the same in btrfs, you have to create a file and use it loopback. This accomplishes the same thing, but the creation time is much longer (zero time versus linear time, could literally be called "infinitely" longer) ... so this is an advantage for zfs.
* zfs has filesystem property inheritance and recursion of commands like "snapshot" and "send." btrfs doesn't.
* zfs has permissions - allow users or groups to create/destroy snapshots and stuff like that. In btrfs you'll have to kludge something through sudo or whatever.

Similarities:
* Both are able to grow. (Add devices & storage)
* Neither one has a fsck. They both have scrub. (btrfs calls it "scan" and zfs calls it "scrub.") (Correction ... In the latest btrfs beta, I see there exists btrfsck, but I don't know if it's a full-fledged fsck. Maybe it's just a frontend for scan? People are still saying there is no fsck.)
* Both do compression. By default zfs compression is fast but you could use zlib if you want. By default btrfs uses zlib, but you could opt for fast if you want.
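For illustration, a rough sketch of the two snapshot workflows described above; mount points, pool, and dataset names are hypothetical:

  # btrfs: work lives in a subvolume, snapshots are sibling subvolumes
  btrfs subvolume create /mnt/btr/@
  btrfs subvolume snapshot /mnt/btr/@ /mnt/btr/@2011-10-18-07-40-00

  # zfs: snapshots appear under the hidden .zfs directory
  zfs snapshot tank/work@2011-10-18-07-40-00
  ls /tank/work/.zfs/snapshot/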
> From: zfs-discuss-bounces at opensolaris.org [mailto:zfs-discuss-
> bounces at opensolaris.org] On Behalf Of Harry Putnam
>
> FreeNAS and freebsd.
>
> Maybe you can give a little synopsis of those too. I mean when it
> comes to utilizing zfs; is it much the same as if running it on
> solaris?

For somebody who didn't want to start a flame war, you sure picked the wrong question. ;-)

I personally will say: I personally use only Solaris. I have reasons for that, but there are a lot of other people here who use other systems.
On 10/18/11 13:18, Edward Ned Harvey wrote:
> * btrfs is able to balance. (After adding new blank devices, rebalance, so
> the data & workload are distributed across all the devices.) zfs is not
> able to do this yet.

ZFS does slightly bias new vdevs for new writes so that we will get to a more even spread. It doesn't go and move already-written blocks onto the new vdevs, though. So while there isn't an admin interface to rebalancing, ZFS does do something in this area.

This is implemented in metaslab_alloc_dva():

http://src.opensolaris.org/source/xref/onnv/onnv-gate/usr/src/uts/common/fs/zfs/metaslab.c

See lines 1356-1378

--
Darren J Moffat
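As an aside (not from the original message): one way to watch how new writes spread across vdevs after adding one is to compare per-vdev allocation and write activity, for example with a hypothetical pool name:

  zpool iostat -v tank 5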
2011-10-18 16:26, Darren J Moffat wrote:
> On 10/18/11 13:18, Edward Ned Harvey wrote:
>> * btrfs is able to balance. (After adding new blank devices, rebalance, so
>> the data & workload are distributed across all the devices.) zfs is not
>> able to do this yet.
>
> ZFS does slightly bias new vdevs for new writes so that we will get
> to a more even spread. It doesn't go and move already written blocks
> onto the new vdevs though. So while there isn't an admin interface to
> rebalancing ZFS does do something in this area.
>
> This is implemented in metaslab_alloc_dva()
>
> http://src.opensolaris.org/source/xref/onnv/onnv-gate/usr/src/uts/common/fs/zfs/metaslab.c
>
> See lines 1356-1378

And the admin interface would be what exactly?.. After adding a device, I'd kick it to "go rewrite old data, including snapshots and clones, so it's written in a balanced manner anew"? Kind of like send-recv in the same pool? Why is it not done yet? ;)

//Jim
On 10/18/11 14:04, Jim Klimov wrote:
> 2011-10-18 16:26, Darren J Moffat wrote:
>> ZFS does slightly bias new vdevs for new writes so that we will get
>> to a more even spread. It doesn't go and move already written blocks
>> onto the new vdevs though. So while there isn't an admin interface to
>> rebalancing ZFS does do something in this area.
>
> And the admin interface would be what exactly?..

As I said, there isn't one, because that isn't how it works today; it is all automatic and only for new writes.

I was pointing out that ZFS does do 'something', not that it had an exactly matching feature.

--
Darren J Moffat
I looked into btrfs some time ago for the same reasons. I had a Linux system on which I wanted to do more intelligent things with storage. However, I reverted to Ext3/4 and MD because of the portions of btrfs that haven't been completed.

It seems that btrfs development is very slow, which doesn't give me confidence that a bug that I find (or even a fsck tool) will be fixed/provided.

Another item that made me nervous was my experience with ZFS. Even when called 'ready for production', a number of bugs were found that were pretty nasty. They've since been fixed (years ago), but there were some surprises there that I'd rather not encounter on a Linux system.

While I like to try the latest thing, I've spent quite a bit of time generating/collecting my data. I really don't want to lose it if I can avoid it. :-)

I came to the conclusion that btrfs isn't ready for prime time. I'll re-evaluate as development continues and the missing portions are provided.

I'm seriously thinking about converting the Linux system in question into a FreeBSD system so that I can use ZFS.

On Oct 17, 2011, at 9:29 AM, Harry Putnam wrote:
> This subject may have been ridden to death... I missed it if so.
>
> Not wanting to start a flame fest or whatever but...
>
> As a common slob who isn't very skilled, I'd like to see some commentary
> from some of the pros here as to any comparison of zfs against btrfs.
>
> I realize btrfs is a lot less `finished' but I see it is starting to
> show up as an option on some linux install routines... Debian and
> Ubuntu I noticed and probably many others.
>
> My main reasons for using zfs are pretty basic compared to some here
> and I wondered how btrfs stacks up on the basic qualities.

-----
Gregory Shaw, Enterprise IT Architect
Phone: (303) 246-5411
Oracle Global IT Service Design Group
500 Eldorado Blvd, UBRM02-157     greg.shaw at oracle.com (work)
Broomfield, CO 80021              gregs at fmsoft.com (home)

Hoping the problem magically goes away by ignoring it is the "microsoft approach to programming" and should never be allowed. (Linus Torvalds)
Edward Ned Harvey <opensolarisisdeadlongliveopensolaris at nedharvey.com> writes:
> I recently put my first btrfs system into production. Here are the
> similarities/differences I noticed between btrfs and zfs:

Great input... thanks for the details.
Gregory Shaw <greg.shaw at oracle.com> writes:
> I looked into btrfs some time ago for the same reasons. I had a Linux
> system on which I wanted to do more intelligent things with storage.

Great details, thanks.
On Tue, Oct 18, 2011 at 9:13 AM, Darren J Moffat <darrenm at opensolaris.org> wrote:
> On 10/18/11 14:04, Jim Klimov wrote:
>> And the admin interface would be what exactly?..
>
> As I said, there isn't one, because that isn't how it works today; it is all
> automatic and only for new writes.
>
> I was pointing out that ZFS does do 'something', not that it had an exactly
> matching feature.

I have done a "poor man's" rebalance by copying data after adding devices. I know this is not a substitute for a real online rebalance, but it gets the job done (if you can take the data offline; I do it a small chunk at a time).

--
{--------1---------2---------3---------4---------5---------6---------7---------}
Paul Kraus
-> Senior Systems Architect, Garnet River ( http://www.garnetriver.com/ )
-> Sound Coordinator, Schenectady Light Opera Company ( http://www.sloctheater.org/ )
-> Technical Advisor, RPI Players
On Tue, 18 Oct 2011, Gregory Shaw wrote:
> I'm seriously thinking about converting the Linux system in question
> into a FreeBSD system so that I can use ZFS.

FreeBSD is a wonderfully stable, coherent, and well-documented system which has stood the test of time and has an excellent development team. ZFS v28 is fairly new to FreeBSD, but there is every reason to believe that it will be close to "production" grade when FreeBSD 9.0 is released. The main shortcoming of zfs in FreeBSD is that kernel memory allocation is not yet coherent/shared as it is in Solaris. If you install enough memory, then this becomes a non-issue.

If you are planning to build an NFS server, then it is good to know that Solaris does NFS better than Linux or FreeBSD.

Bob
--
Bob Friesenhahn
bfriesen at simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,    http://www.GraphicsMagick.org/
On 10/18/11 07:18 AM, Edward Ned Harvey wrote:
>> From: zfs-discuss-bounces at opensolaris.org [mailto:zfs-discuss-
>> bounces at opensolaris.org] On Behalf Of Harry Putnam
>>
>> As a common slob who isn't very skilled, I'd like to see some commentary
>> from some of the pros here as to any comparison of zfs against btrfs.
>
> * Neither one has a fsck. They both have scrub. (btrfs calls it "scan" and
> zfs calls it "scrub.") (Correction ... In the latest btrfs beta, I see
> there exists btrfsck, but I don't know if it's a full-fledged fsck. Maybe
> it's just a frontend for scan? People are still saying there is no fsck.)

I just wanted to add something on fsck on ZFS - because for me that used to make ZFS 'not ready for prime-time' in 24x7 5+ 9s uptime environments. Where ZFS doesn't have an fsck command - and that really used to bug me - it does now have a -F option on zpool import. To me it's the same functionality for my environment - the ability to try to roll back to a 'hopefully' good state and get the filesystem mounted up, leaving the corrupted data objects corrupted. So if the 10-1000 files and objects that went missing aren't required for my 24x7 5+ 9s application to run (e.g. log files), I can get it rolling again without them quickly, and then get those files recovered from backup afterwards as needed, without having to recover the entire pool from backup.

cheers,
Brian

--
-----------------------------------------------------------------------------------
Brian Wilson, Solaris SE, UW-Madison DoIT
Room 3114 CS&S            608-263-8047
brian.wilson(a)doit.wisc.edu
'I try to save a life a day. Usually it's my own.' - John Crichton
-----------------------------------------------------------------------------------
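For illustration, the recovery-mode import mentioned above looks roughly like this (pool name hypothetical; -n first does a dry run that reports what would be discarded):

  zpool import -F -n tank
  zpool import -F tank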
On Tue, Oct 18, 2011 at 9:35 AM, Brian Wilson <bfwilson at doit.wisc.edu> wrote:
> I just wanted to add something on fsck on ZFS - because for me that used to
> make ZFS 'not ready for prime-time' in 24x7 5+ 9s uptime environments.
> Where ZFS doesn't have an fsck command - and that really used to bug me - it
> does now have a -F option on zpool import. To me it's the same
> functionality for my environment - the ability to try to roll back to a
> 'hopefully' good state and get the filesystem mounted up, leaving the
> corrupted data objects corrupted. [...]

Yes, that's exactly what it is. There's no point calling it fsck because fsck fixes individual filesystems, while ZFS fixups need to happen at the volume level (at volume import time).

It's true that this should have been in ZFS from the word go. But it's there now, and that's what matters, IMO.

It's also true that this was never necessary with hardware that doesn't lie, but it's good to have it anyway, and it is critical for personal systems such as laptops.

Nico
On Oct 18, 2011, at 11:09 AM, Nico Williams wrote:
> Yes, that's exactly what it is. There's no point calling it fsck
> because fsck fixes individual filesystems, while ZFS fixups need to
> happen at the volume level (at volume import time).
>
> It's true that this should have been in ZFS from the word go. But
> it's there now, and that's what matters, IMO.

Doesn't a scrub do more than what 'fsck' does?

> It's also true that this was never necessary with hardware that
> doesn't lie, but it's good to have it anyway, and it is critical for
> personal systems such as laptops.

IIRC, fsck was seldom needed at my former site once UFS journalling became available. Sweet update.

Mark
On 10/18/11 11:46 AM, Mark Sandrock wrote:
> On Oct 18, 2011, at 11:09 AM, Nico Williams wrote:
>> Yes, that's exactly what it is. There's no point calling it fsck
>> because fsck fixes individual filesystems, while ZFS fixups need to
>> happen at the volume level (at volume import time).
>
> Doesn't a scrub do more than what 'fsck' does?

Oh yes, I wasn't trying to talk about scrub in comparison with 'fsck' - I was talking about zpool import -F. I believe scrub does a lot more.

--
-----------------------------------------------------------------------------------
Brian Wilson, Solaris SE, UW-Madison DoIT
Room 3114 CS&S            608-263-8047
brian.wilson(a)doit.wisc.edu
'I try to save a life a day. Usually it's my own.' - John Crichton
-----------------------------------------------------------------------------------
On Tue, Oct 18, 2011 at 11:46 AM, Mark Sandrock <mark.sandrock at oracle.com> wrote:
> On Oct 18, 2011, at 11:09 AM, Nico Williams wrote:
>> Yes, that's exactly what it is. There's no point calling it fsck
>> because fsck fixes individual filesystems, while ZFS fixups need to
>> happen at the volume level (at volume import time).
>>
>> It's true that this should have been in ZFS from the word go. But
>> it's there now, and that's what matters, IMO.
>
> Doesn't a scrub do more than what 'fsck' does?

Not really. fsck will work on an offline filesystem to correct errors and bring it back online. Scrub won't even work until the filesystem is already imported and online. If it's corrupted you can't even import it, hence the -F flag addition. Plus, IIRC, scrub won't actually correct any errors, it will only flag them. Manually fixing what scrub finds can be a giant pain.

> IIRC, fsck was seldom needed at
> my former site once UFS journalling
> became available. Sweet update.
>
> Mark

We all hope to never have to run fsck, but not having it at all is a bit of a non-starter in most environments.

--Tim
On 10/19/11 03:12 AM, Paul Kraus wrote:
> On Tue, Oct 18, 2011 at 9:13 AM, Darren J Moffat
> <darrenm at opensolaris.org> wrote:
>> I was pointing out that ZFS does do 'something', not that it had an exactly
>> matching feature.
>
> I have done a "poor man's" rebalance by copying data after adding
> devices. I know this is not a substitute for a real online rebalance,
> but it gets the job done (if you can take the data offline; I do it a
> small chunk at a time).

I do the same. Whether you do the balance by hand or the filesystem does it, the data still has to be moved around, which can be resource intensive. I'd rather do that at a time of my choosing.

--
Ian.
On 10/19/11 01:18 AM, Edward Ned Harvey wrote:
> I recently put my first btrfs system into production. Here are the
> similarities/differences I noticed between btrfs and zfs:
>
> [...]
> * zfs has storage tiering. (cache & log devices, such as SSDs to
> accelerate performance.) btrfs doesn't have this yet.

So does it suffer the same performance issues as zfs (without a log device) when serving over NFS?

> [...]
> * Both do compression. By default zfs compression is fast but you could use
> zlib if you want. By default btrfs uses zlib, but you could opt for fast if
> you want.

Good input, thanks.

Does btrfs have NFSv4 ACL support?

--
Ian.
On Tue, 18 Oct 2011 12:05:29 -0500, Tim Cook <tim at cook.ms> wrote:
>> Doesn't a scrub do more than what
>> 'fsck' does?
>
> Not really. fsck will work on an offline filesystem to correct errors and
> bring it back online. Scrub won't even work until the filesystem is already
> imported and online. If it's corrupted you can't even import it, hence the
> -F flag addition. Plus, IIRC, scrub won't actually correct any errors, it
> will only flag them. Manually fixing what scrub finds can be a giant pain.

IIRC scrub will correct errors if the pool has sufficient redundancy. So will any read of a corrupted block.

http://hub.opensolaris.org/bin/view/Community+Group+zfs/selfheal
--
  (  Kees Nuyt
  )
c[_]
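For reference, the commands involved are roughly as follows (pool name hypothetical):

  zpool scrub tank      # read and verify every block, repairing from redundancy where possible
  zpool status -v tank  # show scrub progress and any files with unrecoverable errors
  zpool clear tank      # reset the error counters once the cause is dealt with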
On Tue, Oct 18, 2011 at 2:41 PM, Kees Nuyt <k.nuyt at zonnet.nl> wrote:
> IIRC scrub will correct errors if the pool has sufficient
> redundancy. So will any read of a corrupted block.
>
> http://hub.opensolaris.org/bin/view/Community+Group+zfs/selfheal

Every scrub I've ever done that has found an error required manual fixing. Every pool I've ever created has been raid-z or raid-z2, so the silent healing, while a great story, has never actually happened in practice in any environment I've used ZFS in.

--Tim
On Tue, Oct 18, 2011 at 8:52 PM, Tim Cook <tim at cook.ms> wrote:
> Every scrub I've ever done that has found an error required manual fixing.
> Every pool I've ever created has been raid-z or raid-z2, so the silent
> healing, while a great story, has never actually happened in practice in any
> environment I've used ZFS in.

You have, of course, reported each such failure, because if that was indeed the case then it's a clear and obvious bug?

For what it's worth, I've had ZFS repair data corruption on several occasions - both during normal operation and as a result of a scrub - and I've never had to intervene manually.

--
-Peter Tribble
http://www.petertribble.co.uk/ - http://ptribble.blogspot.com/
On Tue, Oct 18, 2011 at 3:06 PM, Peter Tribble <peter.tribble at gmail.com> wrote:
> You have, of course, reported each such failure, because if that
> was indeed the case then it's a clear and obvious bug?
>
> For what it's worth, I've had ZFS repair data corruption on
> several occasions - both during normal operation and as a
> result of a scrub - and I've never had to intervene manually.

Given that there are guides on how to manually fix the corruption, I don't see any need to report it. It's considered acceptable and expected behavior from everyone I've talked to at Sun...

http://dlc.sun.com/osol/docs/content/ZFSADMIN/gbbwl.html

--Tim
On Tue, Oct 18, 2011 at 9:12 PM, Tim Cook <tim at cook.ms> wrote:
> Given that there are guides on how to manually fix the corruption, I don't
> see any need to report it. It's considered acceptable and expected behavior
> from everyone I've talked to at Sun...
> http://dlc.sun.com/osol/docs/content/ZFSADMIN/gbbwl.html

If you have adequate redundancy, ZFS will - and does - repair errors. The document you quote is for the case where you don't actually have adequate redundancy: ZFS will refuse to make up data for you, and report back where the problem was. Exactly as designed.

(And yes, I've come across systems without redundant storage, or had multiple simultaneous failures. The original statement was that if you have redundant copies of the data or, in the case of raidz, enough information to reconstruct it, then ZFS will repair it for you. Which has been exactly in accord with my experience.)

--
-Peter Tribble
http://www.petertribble.co.uk/ - http://ptribble.blogspot.com/
On Tue, Oct 18, 2011 at 3:27 PM, Peter Tribble <peter.tribble at gmail.com> wrote:
> If you have adequate redundancy, ZFS will - and does -
> repair errors. The document you quote is for the case
> where you don't actually have adequate redundancy: ZFS
> will refuse to make up data for you, and report back where
> the problem was. Exactly as designed.

I had and have redundant storage; it has *NEVER* automatically fixed it. You're the first person I've heard from who has had it automatically fix it.

Per the page: "or an unlikely series of events conspired to corrupt multiple copies of a piece of data."

Their unlikely series of events, which goes unnamed, is not that unlikely in my experience.

--Tim
On Tue, Oct 18, 2011 at 4:31 PM, Tim Cook <tim at cook.ms> wrote:
> I had and have redundant storage; it has *NEVER* automatically fixed it.
> You're the first person I've heard from who has had it automatically fix it.

I have had ZFS automatically repair corrupted raw data when one component of the redundancy failed, just as DiskSuite (SLVM) will resync a failed mirror.

I think you may be using different definitions of "corrupt". In my case, the backend storage / drive that was part of a redundant zpool failed (or became unreliable). Once the issue was resolved, a resilver operation rewrote the data that had been corrupted on the failing component. No corrupt data was ever presented to the application.

--
{--------1---------2---------3---------4---------5---------6---------7---------}
Paul Kraus
-> Senior Systems Architect, Garnet River ( http://www.garnetriver.com/ )
-> Sound Coordinator, Schenectady Light Opera Company ( http://www.sloctheater.org/ )
-> Technical Advisor, RPI Players
On Tue, Oct 18, 2011 at 10:31 PM, Tim Cook <tim at cook.ms> wrote:
> I had and have redundant storage; it has *NEVER* automatically fixed it.
> You're the first person I've heard from who has had it automatically fix it.

Well, here comes another person - I have ZFS automatically fixing corrupted data on a number of raidz pools. Moreover, my laptop (single drive) with copies=2 experienced a number of corruptions that were fixed automatically due to the extra copy of the relevant data.

I am pretty sure there are many more people with similar experience...

--
Regards,
  Cyril
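For illustration, the copies=2 setting mentioned above is just a dataset property (the dataset name here is hypothetical); note it only affects data written after the property is set:

  zfs set copies=2 rpool/export/home
  zfs get copies rpool/export/home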
On 10/19/11 09:31 AM, Tim Cook wrote:
> I had and have redundant storage; it has *NEVER* automatically fixed
> it. You're the first person I've heard from who has had it automatically
> fix it.

I'm another; I have had many cases of ZFS fixing corrupted data on a number of different pool configurations.

> Per the page: "or an unlikely series of events conspired to corrupt
> multiple copies of a piece of data."
>
> Their unlikely series of events, which goes unnamed, is not that
> unlikely in my experience.

The only one I've seen where ZFS reported, but was unable to repair, was data corruption caused by bad memory. I haven't seen any of those since adopting a "no ZFS without ECC" rule.

I would probably still be blissfully unaware of the corruption if I wasn't using ZFS...

--
Ian.
On 2011-Oct-18 23:18:02 +1100, Edward Ned Harvey <opensolarisisdeadlongliveopensolaris at nedharvey.com> wrote:
> I recently put my first btrfs system into production. Here are the
> similarities/differences I noticed between btrfs and zfs:

Thanks for that.

> * zfs has storage tiering. (cache & log devices, such as SSDs to
> accelerate performance.) btrfs doesn't have this yet.

I'd call that "multi-level caching and journalling". To me, storage tiering means something like HSM - something that lets me push rarely used data to near-line storage (eg big green SATA drives that are spun down most of the time) whilst retaining the ability to transparently access it.

On 2011-Oct-19 03:46:30 +1100, Mark Sandrock <mark.sandrock at oracle.com> wrote:
> Doesn't a scrub do more than what 'fsck' does?

It does different things. I'm not sure about "more".

fsck verifies the logical consistency of a filesystem. For UFS, this includes: used data blocks are allocated to exactly one file, directory entries point to valid inodes, allocated inodes have at least one link, the number of links in an inode exactly matches the number of directory entries pointing to that inode, directories form a single tree without loops, file sizes are consistent with the number of allocated blocks, unallocated data/inode blocks are in the relevant free bitmaps, and redundant superblock data is consistent. It can't verify data.

scrub uses checksums to verify the contents of all blocks and attempts to correct errors using redundant copies of blocks. This implicitly detects some types of logical errors. I don't know if scrub includes explicit logic to detect things like directory loops, missing free blocks, unreachable allocated blocks, multiply allocated blocks, etc.

> IIRC, fsck was seldom needed at
> my former site once UFS journalling
> became available. Sweet update.

Whilst Solaris very rarely insists we run fsck, we have had a number of cases where we have found files corrupted following a crash - even with UFS journalling enabled. Unfortunately, this isn't the sort of thing that fsck could detect.

--
Peter Jeremy
On Wed, 19 Oct 2011, Peter Jeremy wrote:
>> Doesn't a scrub do more than what 'fsck' does?
>
> It does different things. I'm not sure about "more".

ZFS scrub validates user data while 'fsck' does not. I consider that as being definitely "more".

Bob
--
Bob Friesenhahn
bfriesen at simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,    http://www.GraphicsMagick.org/
On Tue, Oct 18, 2011 at 8:38 PM, Gregory Shaw <greg.shaw at oracle.com> wrote:
> I came to the conclusion that btrfs isn't ready for prime time. I'll re-evaluate
> as development continues and the missing portions are provided.

For someone with an @oracle.com email address, you could probably arrive at that conclusion faster by asking Chris Mason directly :)

> I'm seriously thinking about converting the Linux system in question into a
> FreeBSD system so that I can use ZFS.

FreeBSD? Not Solaris? Hmmm ... :)

Anyway, the way I see it now, Linux has more choices. You can try out either btrfs or zfs (even without a separate /boot) with a few tweaks. Neither is labeled production-ready, but that doesn't stop some people (who, presumably, know what they're doing) from putting it in production.

I'm still hoping Oracle will release source updates to zfs soon so other OSes can also use its new features (e.g. encryption).

--
Fajar
On Tue, Oct 18, 2011 at 7:18 PM, Edward Ned Harvey <opensolarisisdeadlongliveopensolaris at nedharvey.com> wrote:
> I recently put my first btrfs system into production. Here are the
> similarities/differences I noticed between btrfs and zfs:
>
> Differences:
> * Obviously, one is meant for linux and the other solaris (etc)
> * In btrfs, there is only raid1. They don't have raid5, 6, etc yet.
> * In btrfs, snapshots are read-write. They cannot be made read-only without
> quotas, which aren't implemented yet.

Minor correction: btrfs supports read-only snapshots. It's available on vanilla Linux, but IIRC it requires an (unofficial) updated btrfs-progs (which basically tracks patches sent but not yet integrated into the official tree), but it works.

> * zfs supports quotas. Also, by default it creates snapshots read-only, but
> they can be made read-write by cloning.

There are proposed patches for btrfs quota support, but the kernel part has not been accepted upstream.

> * In btrfs, there is no equivalent or alternative to "zfs send | zfs
> receive"

Planned. No actual working implementation yet.

> * In zfs, you have the hidden ".zfs" subdir that contains your snapshots.
> * In btrfs, your snapshots need to be mounted somewhere, inside the same
> filesystem. So in btrfs, you do something like this... Create a
> filesystem, then create a subvol called "@" and use it to store all your
> work. Later, when you create snapshots, you essentially duplicate that
> subvol as "@2011-10-18-07-40-00" or something.

Yes. Basically btrfs treats a subvolume and a snapshot in the same way.

> * Both do compression. By default zfs compression is fast but you could use
> zlib if you want. By default btrfs uses zlib, but you could opt for fast if
> you want.

lzo is planned to be the default in the future.

--
Fajar
> From: zfs-discuss-bounces at opensolaris.org [mailto:zfs-discuss-
> bounces at opensolaris.org] On Behalf Of Paul Kraus
>
> I have done a "poor man's" rebalance by copying data after adding
> devices. I know this is not a substitute for a real online rebalance,
> but it gets the job done (if you can take the data offline; I do it a
> small chunk at a time).

I have done the same thing. It's uncomfortable. It was like this...

I want to rebalance, or add compression to existing data, or one of the other reasons somebody might want to do this. I find a directory that is temporarily static, and I do this:

  (cd workdir ; sudo tar cpf - .) | (mkdir workdir2 ; cd workdir2 ; sudo tar xpf -)
  sudo mv workdir trash ; sudo mv workdir2 workdir ; sudo rm -rf trash

Unfortunately that failed. The idea was to reconstruct the data without anybody noticing, and then perform an instantaneous "mv" operation to put it into place. Unfortunately, if anything in the old dir is in use at all, then the mv fails, and I end up with workdir/workdir2 and two copies on disk.

In practice, I only found this to work:

  sudo rm -rf workdir ; mkdir workdir
  (cd /blah/snapshot/mysnap ; sudo tar cpf - .) | (cd workdir ; sudo tar xpf -)

Hence, I say, it's uncomfortable.
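An alternative sketch, assuming the data can be quiesced for the switch-over: send/recv within the same pool preserves properties and snapshots, and the newly received copy is written across all of the pool's current vdevs (the dataset and snapshot names here are hypothetical):

  zfs snapshot -r tank/work@rebalance
  zfs send -R tank/work@rebalance | zfs recv tank/work.new
  zfs rename tank/work tank/work.old
  zfs rename tank/work.new tank/work
  zfs destroy -r tank/work.old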
> From: zfs-discuss-bounces at opensolaris.org [mailto:zfs-discuss-
> bounces at opensolaris.org] On Behalf Of Tim Cook
>
> I had and have redundant storage; it has *NEVER* automatically fixed
> it. You're the first person I've heard from who has had it automatically fix it.

That's probably just because it's normal and expected behavior to automatically fix it - I always have redundancy, and every cksum error I ever find is always automatically fixed. I never tell anyone here because it's normal and expected.

If you have redundancy, and cksum errors, and it's not automatically fixed, then you should report the bug.

I do have a few suggestions, possible ways that you may think you have redundancy and still have such an error...

If you're using hardware raid, then ZFS will only see one virtual aggregate device. There's no interface to tell the hardware "go read the other copy, because this one was bad." You have to present the individual JBOD disks to the OS, and let ZFS assemble a raid volume out of them. Then ZFS will manage the redundant copies.

If your cksum error happened in memory, or on the bus or something, then even fetching new copies from the (actually good) disks might still be received corrupted in memory and result in a cksum error.
> From: zfs-discuss-bounces at opensolaris.org [mailto:zfs-discuss-
> bounces at opensolaris.org] On Behalf Of Bob Friesenhahn
>
> On Wed, 19 Oct 2011, Peter Jeremy wrote:
>>> Doesn't a scrub do more than what 'fsck' does?
>>
>> It does different things. I'm not sure about "more".
>
> ZFS scrub validates user data while 'fsck' does not. I consider that
> as being definitely "more".

Yes, but when scrub encounters uncorrectable errors, it doesn't attempt to correct them. Fsck will do things like recover lost files into the lost+found directory, and stuff like that... So, scrub does more of one thing, and fsck does more of a different thing... Which one you call "more" is a matter of perspective. I would just call them different, and each one "better" in its own way, depending on your needs.
> From: Fajar A. Nugraha [mailto:work at fajar.net]
> Sent: Tuesday, October 18, 2011 7:46 PM
>
>> * In btrfs, there is no equivalent or alternative to "zfs send | zfs
>> receive"
>
> Planned. No actual working implementation yet.

In fact, I saw that actual work started on this task about a month ago. So it's not just planned, it's really in the works. Now we're talking open source timelines here, which means, "you'll get it when it's ready," and nobody knows when that will be. As mentioned elsewhere in this thread, there are some other major features that have been "ready in 2 weeks" for like 2 years now... YMMV. But to me personally, zfs send is one of the HUGEST winning characteristics, so I'm really eager for btrfs send to exist... That's one of the biggest missing characteristics that makes btrfs seriously less attractive than ZFS for me right now.

But I'll sure tell you, building a Time Machine server (Mac) using the latest netatalk on an Ubuntu beta is sure a HECK of a lot easier than doing the same thing on Solaris right now. ;-) Not to mention, I'm happy to run Ubuntu on Dell servers where Solaris was formerly a crash & burn. So I'm using btrfs anywhere that Linux is required, and using ZFS anywhere that is OS-agnostic (or Solaris-advantaged) and I just need a filesystem.
On Oct 18, 2011, at 20:26, Edward Ned Harvey wrote:
> Yes, but when scrub encounters uncorrectable errors, it doesn't attempt to
> correct them. Fsck will do things like recover lost files into the
> lost+found directory, and stuff like that...

You say "recover lost files" like you know that they're actually recovered properly. :)

Fsck does place things in lost+found, but there is no guarantee of their usefulness. I recently had to redeploy a VM because the hosting machine's NIC was corrupting data, and so the underlying disk image became completely hosed. The Linux guest instance merrily went on trying to run even though large parts of the Ext3 file system were a mess. After first noticing the problem we did an fsck, and lost+found had several thousand entries. It was simpler to redeploy from scratch than wade through the 'recovered' files.
On Oct 18, 2011, at 20:35, Edward Ned Harvey wrote:
> In fact, I saw that actual work started on this task about a month ago. So it's
> not just planned, it's really in the works. Now we're talking open source
> timelines here, which means, "you'll get it when it's ready," and nobody
> knows when that will be. As mentioned elsewhere in this thread, there are
> some other major features that have been "ready in 2 weeks" for like 2 years
> now... YMMV.

To be fair, we've been waiting for bp* rewrite for a while as well. :)
On Oct 18, 2011, at 10:35, Brian Wilson wrote:
> Where ZFS doesn't have an fsck command - and that really used to bug me - it does now
> have a -F option on zpool import. To me it's the same functionality for my environment -
> the ability to try to roll back to a 'hopefully' good state and get the filesystem mounted
> up, leaving the corrupted data objects corrupted. [...]

To a certain extent fsck is a false sense of security: while the utility has walked the file system and fixed some data structures (and perhaps put some stuff in lost+found), what guarantees does that actually give you that you don't have corrupted files from incomplete, in-flight operations? Without checksums you're assuming everything is fine.

Faith may be fine for some aspects of life, but not necessarily for others. :)
On Oct 18, 2011, at 5:21 PM, Edward Ned Harvey wrote:
>> From: zfs-discuss-bounces at opensolaris.org [mailto:zfs-discuss-
>> bounces at opensolaris.org] On Behalf Of Tim Cook
>>
>> I had and have redundant storage; it has *NEVER* automatically fixed
>> it. You're the first person I've heard from who has had it automatically fix it.
>
> That's probably just because it's normal and expected behavior to
> automatically fix it - I always have redundancy, and every cksum error I
> ever find is always automatically fixed. I never tell anyone here because
> it's normal and expected.

Yes, and in fact the automated tests for ZFS developers intentionally corrupt data so that the repair code can be tested. Also, the same checksum code is used to calculate the checksum when writing and reading.

> If you have redundancy, and cksum errors, and it's not automatically fixed,
> then you should report the bug.

For modern Solaris-based implementations, each checksum mismatch that is repaired reports the bitmap of the corrupted vs expected data. Obviously, if the data cannot be repaired, you cannot know the expected data, so the error is reported without identification of the broken bits.

In the archives, you can find reports of recoverable and unrecoverable errors attributed to:
1. ZFS software (rare, but a bug a few years ago mishandled a raidz case)
2. SAN switch firmware
3. "Hardware" RAID array firmware
4. Power supplies
5. RAM
6. HBA
7. PCI-X bus
8. BIOS settings
9. CPU and chipset errata

Personally, I've seen all of the above except #7, because PCI-X hardware is hard to find now. If you consistently see unrecoverable data from a system that has protected data, then there may be an issue with a part of the system that is a single point of failure. Very, very, very few x86 systems are designed with no SPOF.
 -- richard

--
ZFS and performance consulting
http://www.RichardElling.com
VMworld Copenhagen, October 17-20
OpenStorage Summit, San Jose, CA, October 24-27
LISA '11, Boston, MA, December 4-9
On Wed, Oct 19, 2011 at 08:40:59AM +1100, Peter Jeremy wrote:> fsck verifies the logical consistency of a filesystem. For UFS, this > includes: used data blocks are allocated to exactly one file, > directory entries point to valid inodes, allocated inodes have at > least one link, the number of links in an inode exactly matches the > number of directory entries pointing to that inode, directories form a > single tree without loops, file sizes are consistent with the number > of allocated blocks, unallocated data/inodes blocks are in the > relevant free bitmaps, redundant superblock data is consistent. It > can''t verify data.

Well said. I''d add that people who insist on ZFS having a fsck are missing the whole point of the ZFS transactional model and copy-on-write design.

Fsck can only fix known file system inconsistencies in file system structures. Because there is no atomicity of operations in UFS and other file systems, it is possible that when you remove a file, your system crashes between removing the directory entry and freeing the inode or blocks. This is expected with UFS; that''s why there is fsck, to verify that no such thing happened.

In ZFS, on the other hand, there are no inconsistencies like that. If all blocks match their checksums and you still find a directory loop or something like that, it is a bug in ZFS, not an expected inconsistency. It should be fixed in ZFS, not worked around with some fsck for ZFS.

--
Pawel Jakub Dawidek                       http://www.wheelsystems.com
FreeBSD committer                         http://www.FreeBSD.org
Am I Evil? Yes, I Am!                     http://yomoli.com
2011-10-19 15:52, Richard Elling wrote:> In the archives, you can find reports of recoverable and unrecoverable errors > attributed to: > 1. ZFS software (rare, but a bug a few years ago mishandled a raidz case) > 2. SAN switch firmware > 3. "Hardware" RAID array firmware > 4. Power supplies > 5. RAM > 6. HBA > 7. PCI-X bus > 8. BIOS settings > 9. CPU and chipset errata

10. Broken HDDs ;)

For weird, inexplicable bugs, insufficient or faulty power supplies and cooling are often the root cause, at least in "enthusiast PCs". Perhaps the PSU is okay under normal load but fails under some peak loads, and that leads to random bits being generated in RAM or on connection buses... Also, some interference can be caused by motors, etc. in the HDDs and cooling fans - with older audio cards you could actually hear your HDD or CDROM spin up as a characteristic buzz in the headphones or on the loudspeakers. Whether other components would fail or not under such EMI - that depends. //Jim
2011-10-19 15:52, Richard Elling wrote:> In the archives, you can find reports of recoverable and unrecoverable errors > attributed to: > ...

Ah, yes, and 11. Faulty disk cabling (i.e. plastic connectors that soften with heat and fall off) - that has happened to cause strange behavior as well ;) Even if the connectors don''t fall off, an unreliable physical connection (including oxidation of the metal plugs) leads to all sorts of noise on the wire which may be misinterpreted as random bits. These can often be fixed (and diagnosed) by pulling the connectors and plugging them back in - the oxide film is scratched off, and the cable works again, for a few months more... //Jim
I''d argue that from a *developer* point of view, an fsck tool for ZFS might well be useful. Isn''t that what zdb is for? :-) But ordinary administrative users should never need something like this, unless they have encountered a bug in ZFS itself. (And bugs are as likely to exist in the checker tool as in the filesystem. ;-) - Garrett On Oct 19, 2011, at 2:15 PM, Pawel Jakub Dawidek wrote:> On Wed, Oct 19, 2011 at 08:40:59AM +1100, Peter Jeremy wrote: >> fsck verifies the logical consistency of a filesystem. For UFS, this >> includes: used data blocks are allocated to exactly one file, >> directory entries point to valid inodes, allocated inodes have at >> least one link, the number of links in an inode exactly matches the >> number of directory entries pointing to that inode, directories form a >> single tree without loops, file sizes are consistent with the number >> of allocated blocks, unallocated data/inodes blocks are in the >> relevant free bitmaps, redundant superblock data is consistent. It >> can''t verify data. > > Well said. I''d add that people who insist on ZFS having a fsck are > missing the whole point of ZFS transactional model and copy-on-write > design. > > Fsck can only fix known file system inconsistencies in file system > structures. Because there is no atomicity of operations in UFS and other > file systems it is possible that when you remove a file, your system can > crash between removing directory entry and freeing inode or blocks. > This is expected with UFS, that''s why there is fsck to verify that no > such thing happend. > > In ZFS on the other hand there are no inconsistencies like that. If all > blocks match their checksums and you find directory loop or something > like that, it is a bug in ZFS, not expected inconsistency. It should be > fixed in ZFS and not work-arounded with some fsck for ZFS. > > -- > Pawel Jakub Dawidek http://www.wheelsystems.com > FreeBSD committer http://www.FreeBSD.org > Am I Evil? Yes, I Am! http://yomoli.com > _______________________________________________ > zfs-discuss mailing list > zfs-discuss at opensolaris.org > http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
On Oct 19, 2011, at 1:52 PM, Richard Elling wrote:> On Oct 18, 2011, at 5:21 PM, Edward Ned Harvey wrote: > >>> From: zfs-discuss-bounces at opensolaris.org [mailto:zfs-discuss- >>> bounces at opensolaris.org] On Behalf Of Tim Cook >>> >>> I had and have redundant storage, it has *NEVER* automatically fixed >>> it. You''re the first person I''ve heard that has had it automatically fix >> it. >> >> That''s probably just because it''s normal and expected behavior to >> automatically fix it - I always have redundancy, and every cksum error I >> ever find is always automatically fixed. I never tell anyone here because >> it''s normal and expected. > > Yes, and in fact the automated tests for ZFS developers intentionally corrupts data > so that the repair code can be tested. Also, the same checksum code is used to > calculate the checksum when writing and reading. > >> If you have redundancy, and cksum errors, and it''s not automatically fixed, >> then you should report the bug. > > For modern Solaris-based implementations, each checksum mismatch that is > repaired reports the bitmap of the corrupted vs expected data. Obviously, if the > data cannot be repaired, you cannot know the expected data, so the error is > reported without identification of the broken bits. > > In the archives, you can find reports of recoverable and unrecoverable errors > attributed to: > 1. ZFS software (rare, but a bug a few years ago mishandled a raidz case) > 2. SAN switch firmware > 3. "Hardware" RAID array firmware > 4. Power supplies > 5. RAM > 6. HBA > 7. PCI-X bus > 8. BIOS settings > 9. CPU and chipset errata > > Personally, I''ve seen all of the above except #7, because PCI-X hardware is > hard to find now.I''ve seen #7. I have some PCI-X hardware that is flaky in my home lab. ;-) There was a case of #1 not very long ago, but it was a difficult to trigger race and is fixed in illumos and I presume other derivatives (including NexentaStor). - Garrett> > If consistently see unrecoverable data from a system that has protected data, then > there may be an issue with a part of the system that is a single point of failure. Very, > very, very few x86 systems are designed with no SPOF. > -- richard > > -- > > ZFS and performance consulting > http://www.RichardElling.com > VMworld Copenhagen, October 17-20 > OpenStorage Summit, San Jose, CA, October 24-27 > LISA ''11, Boston, MA, December 4-9 > > > > > > > > > > > > > > _______________________________________________ > zfs-discuss mailing list > zfs-discuss at opensolaris.org > http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Thank you. The following is the best "layman''s" explanation of _why_ ZFS does not have an fsck equivalent (or even need one). On the other hand, there are situations where you really do need to force ZFS to do something that may not be a "good idea", but is the best of a bad set of choices. Hence zpool import -F (and other such tools available via zdb). While the ZFS data may not be corrupt, it is possible to corrupt the ZFS metadata, uberblock, and labels in such a way that force is necessary.

On Wed, Oct 19, 2011 at 8:15 AM, Pawel Jakub Dawidek <pjd at freebsd.org> wrote:> Well said. I''d add that people who insist on ZFS having a fsck are > missing the whole point of the ZFS transactional model and copy-on-write > design. > > Fsck can only fix known file system inconsistencies in file system > structures. Because there is no atomicity of operations in UFS and other > file systems it is possible that when you remove a file, your system can > crash between removing the directory entry and freeing the inode or blocks. > This is expected with UFS, that''s why there is fsck to verify that no > such thing happened. > > In ZFS on the other hand there are no inconsistencies like that. If all > blocks match their checksums and you find a directory loop or something > like that, it is a bug in ZFS, not an expected inconsistency. It should be > fixed in ZFS and not worked around with some fsck for ZFS. > > -- > Pawel Jakub Dawidek   http://www.wheelsystems.com > FreeBSD committer   http://www.FreeBSD.org > Am I Evil? Yes, I Am!   http://yomoli.com

--
{--------1---------2---------3---------4---------5---------6---------7---------}
Paul Kraus
-> Senior Systems Architect, Garnet River ( http://www.garnetriver.com/ )
-> Sound Coordinator, Schenectady Light Opera Company ( http://www.sloctheater.org/ )
-> Technical Advisor, RPI Players
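To make the "labels and uberblocks" remark concrete: the low-level structures Paul mentions can be inspected read-only with zdb before reaching for any forceful option (a sketch; the device path and pool name are invented):

  # zdb -l /dev/dsk/c0t0d0s0    (dump the vdev labels stored on that device)
  # zpool import -Fn tank       (ask whether rewinding to an older transaction group would make the pool importable, without changing anything)

Looking before forcing costs nothing, since both commands above only read.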
On Wed, October 19, 2011 08:15, Pawel Jakub Dawidek wrote:> Fsck can only fix known file system inconsistencies in file system > structures. Because there is no atomicity of operations in UFS and other > file systems it is possible that when you remove a file, your system can > crash between removing directory entry and freeing inode or blocks. > This is expected with UFS, that''s why there is fsck to verify that no > such thing happend.Slightly OT, but this non-atomic delay between meta-data updates and writes to the disk is exploited by "soft updates" with FreeBSD''s UFS: http://www.freebsd.org/doc/en/books/handbook/configtuning-disk.html#SOFT-UPDATES It may be of some interest to the file system geeks on the list.
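For those who want to experiment, soft updates are a per-filesystem flag toggled with tunefs on FreeBSD (a sketch; the device and mount point are invented, and tunefs generally wants the filesystem unmounted):

  # umount /data
  # tunefs -n enable /dev/ada0p2    (turn soft updates on for that UFS filesystem)
  # mount /dev/ada0p2 /data

New filesystems can get the same behavior at creation time with newfs -U.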
On 10/18/11 03:31 PM, Tim Cook wrote:> > > On Tue, Oct 18, 2011 at 3:27 PM, Peter Tribble > <peter.tribble at gmail.com <mailto:peter.tribble at gmail.com>> wrote: > > On Tue, Oct 18, 2011 at 9:12 PM, Tim Cook <tim at cook.ms > <mailto:tim at cook.ms>> wrote: > > > > > > On Tue, Oct 18, 2011 at 3:06 PM, Peter Tribble > <peter.tribble at gmail.com <mailto:peter.tribble at gmail.com>> > > wrote: > >> > >> On Tue, Oct 18, 2011 at 8:52 PM, Tim Cook <tim at cook.ms > <mailto:tim at cook.ms>> wrote: > >> > > >> > Every scrub I''ve ever done that has found an error required > manual > >> > fixing. > >> > Every pool I''ve ever created has been raid-z or raid-z2, so > the silent > >> > healing, while a great story, has never actually happened in > practice in > >> > any > >> > environment I''ve used ZFS in. > >> > >> You have, of course, reported each such failure, because if that > >> was indeed the case then it''s a clear and obvious bug? > >> > >> For what it''s worth, I''ve had ZFS repair data corruption on > >> several occasions - both during normal operation and as a > >> result of a scrub, and I''ve never had to intervene manually. > >> > >> -- > >> -Peter Tribble > >> http://www.petertribble.co.uk/ - http://ptribble.blogspot.com/ > > > > > > Given that there are guides on how to manually fix > the corruption, I don''t > > see any need to report it. It''s considered acceptable and > expected behavior > > from everyone I''ve talked to at Sun... > > http://dlc.sun.com/osol/docs/content/ZFSADMIN/gbbwl.html > > If you have adequate redundancy, ZFS will - and does - > repair errors. The document you quote is for the case > where you don''t actually have adequate redundancy: ZFS > will refuse to make up data for you, and report back where > the problem was. Exactly as designed. > > (And yes, I''ve come across systems without redundant > storage, or had multiple simultaneous failures. The original > statement was that if you have redundant copies of the data > or, in the case of raidz, enough information to reconstruct > it, then ZFS will repair it for you. Which has been exactly in > accord with my experience.) > > -- > -Peter Tribble > http://www.petertribble.co.uk/ - http://ptribble.blogspot.com/ > > > > > I had and have redundant storage, it has *NEVER* automatically fixed > it. You''re the first person I''ve heard that has had it automatically > fix it. > Per the page "or an unlikely series of events conspired to corrupt > multiple copies of a piece of data." > > Their unlikely series of events, that goes unnamed, is not that > unlikely in my experience. > > --TimJust another 2 cents towards a euro/dollar/yen. I''ve only had data redundancy in ZFS via mirrors (not that it should matter as long as there''s redundancy), and in every case I''ve had it repair data automatically via a scrub. The one case where it didn''t was when the disk controller both drives happened to share (bad design, yes) started erroring and corrupting writes to both disks in parallel, so there was no good data to fix it with. I was still happy to be using ZFS, as a filesystem without a scrub/scan of some sort wouldn''t have even noticed in my experience - I suspect btrfs would have if it''s scan works similarly. 
cheers, Brian

--
-----------------------------------------------------------------------------------
Brian Wilson, Solaris SE, UW-Madison DoIT
Room 3114 CS&S    608-263-8047
brian.wilson(a)doit.wisc.edu
''I try to save a life a day. Usually it''s my own.'' - John Crichton
-----------------------------------------------------------------------------------
On Wed, Oct 19, 2011 at 7:24 AM, Garrett D''Amore <Garrett.DAmore at nexenta.com> wrote:> I''d argue that from a *developer* point of view, an fsck tool for ZFS might well be useful. ?Isn''t that what zdb is for? :-) > > But ordinary administrative users should never need something like this, unless they have encountered a bug in ZFS itself. ?(And bugs are as likely to exist in the checker tool as in the filesystem. ;-)zdb can be useful for admins -- say, to gather stats not reported by the system, to explore the fs/vol layout, for educational purposes, and so on. Nico --
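A few read-only examples of the kind of thing Nico means (the pool name is made up, and zdb options differ a bit between releases, so check zdb(1M) on your system):

  # zdb -C tank     (print the cached pool configuration)
  # zdb -b tank     (traverse all blocks and report space accounting and leak statistics)
  # zdb -DD tank    (summarize the dedup table - handy for estimating RAM needs before enabling dedup)

Because zdb reads the pool directly rather than going through the live kernel state, the numbers on a busy pool can be slightly stale, but for rough statistics that rarely matters.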
On Wed, Oct 19, 2011 at 10:13:56AM -0400, David Magda wrote:> On Wed, October 19, 2011 08:15, Pawel Jakub Dawidek wrote: > > > Fsck can only fix known file system inconsistencies in file system > > structures. Because there is no atomicity of operations in UFS and other > > file systems it is possible that when you remove a file, your system can > > crash between removing the directory entry and freeing the inode or blocks. > > This is expected with UFS, that''s why there is fsck to verify that no > > such thing happened. > > Slightly OT, but this non-atomic delay between meta-data updates and > writes to the disk is exploited by "soft updates" with FreeBSD''s UFS: > > http://www.freebsd.org/doc/en/books/handbook/configtuning-disk.html#SOFT-UPDATES > > It may be of some interest to the file system geeks on the list.

Well, thanks to careful ordering of operations, soft updates allow the file system to be mounted even in an inconsistent state and fsck to be run in the background, because the only possible inconsistencies are resource leaks - a directory entry will never point at an unallocated inode and an inode will never point at an unallocated block, etc. This is still not atomic. With recent versions of FreeBSD, soft updates were extended to journal those resource leaks, so background fsck is not needed anymore.

--
Pawel Jakub Dawidek                       http://www.wheelsystems.com
FreeBSD committer                         http://www.FreeBSD.org
Am I Evil? Yes, I Am!                     http://yomoli.com
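If I remember the details correctly, the journaling variant Pawel mentions (SU+J, in FreeBSD 9.0 and later) is also just a tunefs flag (device name invented; run with the filesystem unmounted):

  # tunefs -j enable /dev/ada0p2    (enable soft updates journaling on top of soft updates)
  # tunefs -p /dev/ada0p2           (print the current tuning flags to confirm)

With SU+J enabled, a crash is followed by a quick journal replay instead of a background fsck.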
Paul Kraus wrote:>> My main reasons for using zfs are pretty basic compared to some here > > What are they ? (the reasons for using ZFS)

All technical reasons aside, I can tell you one huge reason I love ZFS, and it''s one that is clearly being completely ignored by btrfs: ease of use. The zfs command set is wonderful and very English-like (for a unix command set). It''s simple, clear, and logical. The grammar makes sense. I almost never have to refer to the man page. The last time I looked, the commands for btrfs were the usual incomprehensible gibberish with a thousand squiggles and numbers. It looked like a real freaking headache, to be honest. With zfs I can do really complex operations off the top of my head. It''s very clear to me that someone spent a lot of time making the commands work that way, and that the commands have a lot of intelligence behind the scenes. After many years spent poring over manuals for SVM and VxFS and writing meter-long commands with a thousand fiddly little parameters, it is SUCH a relief. It''s a pleasure to use. Like swimming in crystal clear water after years in murky soup.

I haven''t used btrfs. But just from what I''ve heard, I have two suggestions for it:

1) Change the stupid name. "Btrfs" is neither a pronounceable word nor a good acronym. "ButterFS" sounds stupid. Just call it "BFS" or something, please.

2) After renaming it BFS, steal the entire ZFS command set and change the "z"s to "b"s. Have ''bpool'' and ''bfs'' commands, and exactly copy their syntax. The source code underneath may be copyrighted, but I doubt you can copyright command names, and probably even Oracle wouldn''t be petty enough to raise a legal stink (though you never know with them).

It would be nice if, for once, people writing software actually took usability into account, and the ulcers of sysadmins. Kudos to ZFS for bucking the horrible trend of impossibly complex syntax.
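To make the ergonomics point concrete for anyone who has not used it, a typical ZFS session really does read almost like English (pool and dataset names are made up):

  # zfs create tank/home
  # zfs set compression=on tank/home
  # zfs snapshot tank/home@before-upgrade
  # zfs rollback tank/home@before-upgrade
  # zfs list -t snapshot

Every command follows the same verb-then-object pattern, which is a large part of why it sticks in memory.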
On Fri, Nov 11, 2011 at 1:39 PM, Linder, Doug <Doug.Linder at merchantlink.com> wrote:> Paul Kraus wrote: > >>> My main reasons for using zfs are pretty basic compared to some here >> >> What are they ? (the reasons for using ZFS) > > All technical reasons aside, I can tell you one huge reason I love ZFS, and it''s one that is clearly being completely ignored by btrfs: ease of use. ?The zfs command set is wonderful and very English-like (for a unix command set). ?It''s simple, clear, and logical. ?The grammar makes sense. ?I almost never have to refer to the man page. ?The last time I looked, the commands for btrfs were the usual incomprehensible gibberish with a thousand squiggles and numbers. ?It looked like a real freaking headache, to be honest. >The command syntax paradigm of zfs (command sub-command object parameters) is not unique to zfs, but seems to have been the "way of doing things" in Solaris 10. The _new_ functions of Solaris 10 were all this way (to the best of my knowledge)... zonecfg zoneadm svcadm svccfg ... and many others are written this way. To boot the zone named foo you use the command "zoneadm -z foo boot", to disable the service named sendmail, "svcadm disable sendmail", etc. Someone at Sun was thinking :-) -- {--------1---------2---------3---------4---------5---------6---------7---------} Paul Kraus -> Senior Systems Architect, Garnet River ( http://www.garnetriver.com/ ) -> Sound Coordinator, Schenectady Light Opera Company ( http://www.sloctheater.org/ ) -> Technical Advisor, RPI Players
On Fri, Nov 11, 2011 at 4:27 PM, Paul Kraus <paul at kraus-haus.org> wrote:> The command syntax paradigm of zfs (command sub-command object > parameters) is not unique to zfs, but seems to have been the "way of > doing things" in Solaris 10. The _new_ functions of Solaris 10 were > all this way (to the best of my knowledge)... > > zonecfg > zoneadm > svcadm > svccfg > ... and many others are written this way. To boot the zone named foo > you use the command "zoneadm -z foo boot", to disable the service > named sendmail, "svcadm disable sendmail", etc. Someone at Sun was > thinking :-)

I''d have preferred "zoneadm boot foo". The -z zone command thing is a bit of a sore point, IMO. But yes, all these new *adm(1M) and *cfg(1M) commands in S10 are wonderful, especially when compared to past and present alternatives in the Unix/Linux world. Nico --
> From: zfs-discuss-bounces at opensolaris.org [mailto:zfs-discuss- > bounces at opensolaris.org] On Behalf Of Linder, Doug > > All technical reasons aside, I can tell you one huge reason I love ZFS, and it''s > one that is clearly being completely ignored by btrfs: ease of use. The zfs > command set is wonderful and very English-like (for a unix command set). > It''s simple, clear, and logical. The grammar makes sense. I almost never have > to refer to the man page. The last time I looked, the commands for btrfs > were the usual incomprehensible gibberish with a thousand squiggles and > numbers. It looked like a real freaking headache, to be honest.

Maybe you''re doing different things from me. btrfs subvol create, delete, snapshot, mkfs, ... For me, both ZFS and BTRFS have "normal" user interfaces and/or command syntax.

> 1) Change the stupid name. "Btrfs" is neither a pronounceable word nor a > good acronym. "ButterFS" sounds stupid. Just call it "BFS" or something, > please.

LOL. Well, for what it''s worth, there are three common pronunciations for btrfs: Butterfs, Betterfs, and B-Tree FS (because it''s based on b-trees). Check wikipedia. (This isn''t really true, but I like to joke, after saying something like that, that I wrote the wikipedia page just now.) ;-)

Speaking of which: zettabyte filesystem. ;-) Is it just a dumb filesystem with a lot of address bits? Or is it something that offers functionality that other filesystems don''t have? .... ;-)
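For concreteness, here is roughly how the two command sets compare (paths and names invented; based on the btrfs-progs syntax of that era, so details may differ on newer versions):

  # btrfs subvolume create /data/projects
  # btrfs subvolume snapshot /data/projects /data/projects-snap
  # btrfs subvolume list /data

  # zfs create tank/projects
  # zfs snapshot tank/projects@snap
  # zfs list -t all

Both follow a tool-subcommand-object pattern; the arguments differ mainly because btrfs addresses subvolumes by path while zfs addresses datasets by pool-relative name.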
On Sat, Nov 12, 2011 at 9:25 AM, Edward Ned Harvey <opensolarisisdeadlongliveopensolaris at nedharvey.com> wrote:>> From: zfs-discuss-bounces at opensolaris.org [mailto:zfs-discuss- >> bounces at opensolaris.org] On Behalf Of Linder, Doug >> >> All technical reasons aside, I can tell you one huge reason I love ZFS, > and it''s >> one that is clearly being completely ignored by btrfs: ease of use. ?The > zfs >> command set is wonderful and very English-like (for a unix command set). >> It''s simple, clear, and logical. ?The grammar makes sense. ?I almost never > have >> to refer to the man page. ?The last time I looked, the commands for btrfs >> were the usual incomprehensible gibberish with a thousand squiggles and >> numbers. ?It looked like a real freaking headache, to be honest. > > Maybe you''re doing different things from me. ?btrfs subvol create, delete, > snapshot, mkfs, ... > For me, both ZFS and BTRFS have "normal" user interfaces and/or command > syntax.the gramatically-correct syntax would be "btrfs create subvolume", but the current tool/syntax is an improvement over the old ones (btrfsctl, btrfs-vol, etc).> > >> 1) Change the stupid name. ? "Btrfs" is neither a pronounceable word nor a >> good acromyn. ?"ButterFS" sounds stupid. ?Just call it "BFS" or something, >> please. > > LOL. ?Well, for what it''s worth, there are three common pronunciations for > btrfs. ?Butterfs, Betterfs, and B-Tree FS (because it''s based on b-trees.)... as long as you don''t call it BiTterly bRoken FS :) -- Fajar
> LOL. Well, for what it''s worth, there are three common pronunciations for > btrfs. Butterfs, Betterfs, and B-Tree FS (because it''s based on b-trees.) > Check wikipedia. (This isn''t really true, but I like to joke, after > saying something like that, I wrote the wikipedia page just now.) ;-)You forget Broken Tree File System, Badly Trashed File System, etc. Follow the newsgroup and you''ll get plenty more ideas for names ;-)
On 11/13/2011 05:18 PM, Nomen Nescio wrote:>> LOL. Well, for what it''s worth, there are three common pronunciations for >> btrfs. Butterfs, Betterfs, and B-Tree FS (because it''s based on b-trees.) >> Check wikipedia. (This isn''t really true, but I like to joke, after >> saying something like that, I wrote the wikipedia page just now.) ;-) > > You forget Broken Tree File System, Badly Trashed File System, etc. Follow > the newsgroup and you''ll get plenty more ideas for names ;-)

Why not cut Btrfs some slack? You can always drop an email to its mailing list about any issue you are not satisfied with. Satire and lampooning do not do any open source project much good.

Thanks,
-Jeff
> From: zfs-discuss-bounces at opensolaris.org [mailto:zfs-discuss- > bounces at opensolaris.org] On Behalf Of Jeff Liu > > Why not cut Btrfs some slack? You can always drop an email to > its mailing list about any issue you are not satisfied with. > Satire and lampooning do not do any open source project much good.

Agreed. Not only that, but probably most people who use zfs would also have an interest in btrfs and actually like it. It''s not like posting an anti-MS email on a pro-Apple mailing list or something... ZFS is more mature and btrfs is comparatively lacking some important features, though each is better in its own way. For most things, right now, zfs comes out ahead simply due to maturity.
On Fri, Nov 11, 2011 at 9:25 PM, Edward Ned Harvey <opensolarisisdeadlongliveopensolaris at nedharvey.com> wrote:> LOL. Well, for what it''s worth, there are three common pronunciations for > btrfs. Butterfs, Betterfs, and B-Tree FS (because it''s based on b-trees.) > Check wikipedia. (This isn''t really true, but I like to joke, after saying > something like that, I wrote the wikipedia page just now.) ;-)

Is it really B-Tree based? Apple''s HFS+ is B-Tree based and falls apart (in terms of performance) when you get too many objects in one FS, which is specifically what drove us to ZFS. We had 4.5 TB of data in about 60 million files/directories on an Apple X-Serve and X-RAID and the overall response was terrible. We moved the data to ZFS and the performance was limited by the Windows client at that point.

> Speaking of which. zettabyte filesystem. ;-) Is it just a dumb filesystem > with a lot of address bits? Or is it something that offers functionality > that other filesystems don''t have? .... ;-)

The stories I have heard indicate that the name came after the TLA. "zfs" came first and "zettabyte" later.

--
{--------1---------2---------3---------4---------5---------6---------7---------}
Paul Kraus
-> Senior Systems Architect, Garnet River ( http://www.garnetriver.com/ )
-> Sound Coordinator, Schenectady Light Opera Company ( http://www.sloctheater.org/ )
-> Technical Advisor, RPI Players
On Mon, Nov 14, 2011 at 14:40, Paul Kraus <paul at kraus-haus.org> wrote:> On Fri, Nov 11, 2011 at 9:25 PM, Edward Ned Harvey > <opensolarisisdeadlongliveopensolaris at nedharvey.com> wrote: > >> LOL. ?Well, for what it''s worth, there are three common pronunciations for >> btrfs. ?Butterfs, Betterfs, and B-Tree FS (because it''s based on b-trees.) >> Check wikipedia. ?(This isn''t really true, but I like to joke, after saying >> something like that, I wrote the wikipedia page just now.) ? ;-) > > Is it really B-Tree based? Apple''s HFS+ is B-Tree based and falls > apart (in terms of performance) when you get too many objects in one > FS, which is specifically what drove us to ZFS. We had 4.5 TB of data > in about 60 million files/directories on an Apple X-Serve and X-RAID > and the overall response was terrible. We moved the data to ZFS and > the performance was limited by the Windows client at that point. > >> Speaking of which. zettabyte filesystem. ? ;-) ?Is it just a dumb filesystem >> with a lot of address bits? ?Or is it something that offers functionality >> that other filesystems don''t have? ? .... ? ;-) > > The stories I have heard indicate that the name came after the TLA. > "zfs" came first and "zettabyte" later.as Jeff told it (IIRC), the "expanded" version of zfs underwent several changes during the development phase, until it was decided one day to attach none of them to "zfs" and just have it be "the last word in filesystems". (perhaps he even replied to a similar message on this list ... check the archives :-) regards -- Michael Schuster http://recursiveramblings.wordpress.com/
> From: zfs-discuss-bounces at opensolaris.org [mailto:zfs-discuss- > bounces at opensolaris.org] On Behalf Of Paul Kraus > > Is it really B-Tree based? Apple''s HFS+ is B-Tree based and falls > apart (in terms of performance) when you get too many objects in one > FS, which is specifically what drove us to ZFS. We had 4.5 TB of data

According to wikipedia, btrfs is a b-tree. I know in ZFS, the DDT is an AVL tree, but what about the rest of the filesystem?

B-trees should be logarithmic time, which is the best O() you can possibly achieve. So if HFS+ is dog slow, it''s an implementation detail and not a general fault of b-trees.
On Mon, Nov 14, 2011 at 8:33 AM, Edward Ned Harvey <opensolarisisdeadlongliveopensolaris at nedharvey.com> wrote:>> From: zfs-discuss-bounces at opensolaris.org [mailto:zfs-discuss- >> bounces at opensolaris.org] On Behalf Of Paul Kraus >> >> Is it really B-Tree based? Apple''s HFS+ is B-Tree based and falls >> apart (in terms of performance) when you get too many objects in one >> FS, which is specifically what drove us to ZFS. We had 4.5 TB of data > > According to wikipedia, btrfs is a b-tree. > I know in ZFS, the DDT is an AVL tree, but what about the rest of the > filesystem?ZFS directories are hashed. Aside from this, the filesystem (and volume) have a tree structure, but that''s not what''s interesting here -- what''s interesting is how directories are indexed.> B-trees should be logarithmic time, which is the best O() you can possibly > achieve. ?So if HFS+ is dog slow, it''s an implementation detail and not a > general fault of b-trees.Hash tables can do much better than O(log N) for searching: O(1) for best case, and O(n) for the worst case. Also, b-trees are O(log_b N), where b is the number of entries per-node. 6e7 entries/directory probably works out to 2-5 reads (assuming 0% cache hit rate) depending on the size of each directory entry and the size of the b-tree blocks. Nico --
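As a rough sanity check on that 2-5 read estimate, the depth depends only on the fan-out; assuming, purely for illustration, about 1024 entries per b-tree node:

  $ awk "BEGIN { print log(60000000) / log(1024) }"
  2.58385

So a tree with roughly 1000-way fan-out indexes 6e7 entries in about 3 levels; smaller nodes or larger directory entries push that toward the upper end of Nico''s range.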
> From: Nico Williams [mailto:nico at cryptonector.com] > > > B-trees should be logarithmic time, which is the best O() you can possibly > > achieve. So if HFS+ is dog slow, it''s an implementation detail and not a > > general fault of b-trees. > > Hash tables can do much better than O(log N) for searching: O(1) for > best case, and O(n) for the worst case.You''re right to challenge me saying O(log) is the best you can possibly achieve - The assumption I was making is that the worst case is what matters, and that''s not always true. Which is better? An algorithm whose best case and worse case are both O(log n), or an algorithm that takes O(1) in the best case and O(n) in the worst case? The answer is subjective - and the question might be completely irrelevant, as it doesn''t necessarily relate to any of the filesystems we''re talking about anyway. ;-)