What would a version FS buy us that cron + zfs snapshots doesn't?

--
Regards,
Jeremy
On Thu, Oct 05, 2006 at 10:18:18PM +0800, Jeremy Teo wrote:
> What would a version FS buy us that cron + zfs snapshots doesn't?

Instant file copy. With cron you could make multiple changes between snapshot runs, and only the last of them would be captured.

-brian
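To make the granularity point concrete, here is a minimal Python sketch (an illustration only, not something posted in the thread). It assumes a file's history is a list of `(minute, contents)` saves and that a cron job takes a snapshot every `interval` minutes. The function names `snapshot_states` and `versioned_states` are hypothetical, and no real ZFS command is modelled.

```python
def snapshot_states(saves, interval):
    """Return the distinct file states a periodic snapshot job would capture.

    saves    -- list of (minute, contents) tuples, sorted by minute
    interval -- minutes between snapshot runs (first run at `interval`)
    """
    if interval <= 0:
        raise ValueError("interval must be positive")
    captured = []
    if not saves:
        return captured
    last_save = saves[-1][0]
    t = interval
    while True:
        # State of the file at snapshot time t: the latest save at or before t.
        state = None
        for minute, contents in saves:
            if minute <= t:
                state = contents
        # Record it only if the file exists and differs from the last snapshot.
        if state is not None and (not captured or captured[-1] != state):
            captured.append(state)
        if t >= last_save:
            break
        t += interval
    return captured


def versioned_states(saves):
    """A versioning filesystem (as assumed here) keeps one version per save."""
    return [contents for _, contents in saves]


# Four saves within 70 minutes, snapshots taken hourly.
saves = [(5, "draft 1"), (12, "draft 2"), (48, "draft 3"), (70, "draft 4")]
hourly = snapshot_states(saves, 60)
lost = [v for v in versioned_states(saves) if v not in hourly]
print("snapshots kept:", hourly)
print("never captured:", lost)
```

With hourly snapshots the two early drafts never reach a snapshot, while the per-save model keeps all four. Shortening the interval narrows the gap but cannot close it.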
On 10/5/06, Jeremy Teo <white.wristband at gmail.com> wrote:
> What would a version FS buy us that cron + zfs snapshots doesn't?

Finer granularity; no chance of missing a change.

TOPS-20 did this, and it was *tremendously* useful. Snapshots, source control, and other alternatives aren't, in fact, alternatives. They're useful in and of themselves, very useful indeed, but they don't address the same needs as versioning.

--
David Dyer-Bennet, <mailto:dd-b at dd-b.net>, <http://www.dd-b.net/dd-b/>
RKBA: <http://www.dd-b.net/carry/>
Pics: <http://www.dd-b.net/dd-b/SnapshotAlbum/>
Dragaera/Steven Brust: <http://dragaera.info/>
On Thu, Oct 05, 2006 at 11:19:19AM -0700, David Dyer-Bennet wrote:
> On 10/5/06, Jeremy Teo <white.wristband at gmail.com> wrote:
>> What would a version FS buy us that cron + zfs snapshots doesn't?
>
> Finer granularity; no chance of missing a change.
>
> TOPS-20 did this, and it was *tremendously* useful. Snapshots, source control, and other alternatives aren't, in fact, alternatives. They're useful in and of themselves, very useful indeed, but they don't address the same needs as versioning.

VMS _still_ does this, and it's one of my favorite features of the OS.

-brian
Brian Hechinger wrote:
> On Thu, Oct 05, 2006 at 11:19:19AM -0700, David Dyer-Bennet wrote:
>> On 10/5/06, Jeremy Teo <white.wristband at gmail.com> wrote:
>>> What would a version FS buy us that cron + zfs snapshots doesn't?
>>
>> Finer granularity; no chance of missing a change.
>>
>> TOPS-20 did this, and it was *tremendously* useful. Snapshots, source control, and other alternatives aren't, in fact, alternatives. They're useful in and of themselves, very useful indeed, but they don't address the same needs as versioning.
>
> VMS _still_ does this, and it's one of my favorite features of the OS.
>
> -brian

I too remember VMS and the FS versioning feature.

Doing versioning at the file-system layer allows block-level changes to be stored, so it doesn't consume enormous amounts of extra space. In fact, it's more efficient than any versioning software (CVS, SVN, Teamware, etc.) for storing versions.

However, there are three BIG drawbacks to using versioning in your FS (and that assumes it is a tunable parameter that can be turned off for a FS when not desired):

(1) File listing semantics become a bit of a mess. VMS stores versions as <filename>;<version>. That is, it uses the semicolon as a divider. Now, I'm not at all sure how we can make ZFS POSIX-compliant and still do something like this. Versioning filesystems tend to be a complete mess - it is hard to present usable information about which versions are available, and at the same time keep things clean. Even keeping versions in a hidden dir (say .zfs_versions) in each directory still leaves that directory filled with a huge mess of files.

(2) File versioning is no replacement for source code control, as you miss all the extra features (tagging, branching, comments, etc.) that go with a file version check-in.

(3) Many apps continuously save either temp copies or actual copies of the file you are working on. This leads to a version explosion, where you end up with 100s of versions of a commonly used file. This tends to be worse than useless, as people have an incredibly hard time figuring out which (older) version they might actually want to look at. And this problem ISN'T ever going to go away, as it would require apps to understand filesystem features for ZFS, which isn't going to happen.

I'd discourage file versioning at this late stage in UNIX. Source code control systems fulfill the need for serious uses, and casual usage is obviated by the mantra of "save early, save often" that has been beaten into the userbase. Trying to change that is a recipe for disaster.

Maybe when we change filesystems to a DB, we can look at automatic versioning again, as a DB can mitigate issues #1 and #3 above, and can actually implement #2 completely.

OracleFS, here I come. (<groan>)

-Erik
Brian Hechinger wrote:
> On Thu, Oct 05, 2006 at 11:19:19AM -0700, David Dyer-Bennet wrote:
>> On 10/5/06, Jeremy Teo <white.wristband at gmail.com> wrote:
>>> What would a version FS buy us that cron + zfs snapshots doesn't?
>>
>> Finer granularity; no chance of missing a change.
>>
>> TOPS-20 did this, and it was *tremendously* useful. Snapshots, source control, and other alternatives aren't, in fact, alternatives. They're useful in and of themselves, very useful indeed, but they don't address the same needs as versioning.
>
> VMS _still_ does this, and it's one of my favorite features of the OS.

It is a real PITA if you are unfortunate enough to use quotas :-(

-- richard
> Brian Hechinger wrote:
>> On Thu, Oct 05, 2006 at 11:19:19AM -0700, David Dyer-Bennet wrote:
>>> On 10/5/06, Jeremy Teo <white.wristband at gmail.com> wrote:
>>>> What would a version FS buy us that cron + zfs snapshots doesn't?
>>>
>>> Finer granularity; no chance of missing a change.
>>>
>>> TOPS-20 did this, and it was *tremendously* useful. Snapshots, source control, and other alternatives aren't, in fact, alternatives. They're useful in and of themselves, very useful indeed, but they don't address the same needs as versioning.
>>
>> VMS _still_ does this, and it's one of my favorite features of the OS.
>
> It is a real PITA if you are unfortunate enough to use quotas :-(

It's one of the things I hated about VMS, so I quickly wrote a script which on logout purged all extra copies and renamed all files back to *;1.

Casper
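Casper's logout script is not shown in the thread. The following is only a sketch of the idea in Python, working on a plain list of VMS-style `name;version` strings rather than a real directory. The helper names `split_version` and `purge_and_renumber` are made up for illustration, and keeping only the highest-numbered version is an assumption about what "purged all extra copies" meant.

```python
def split_version(name):
    """Split 'NAME;7' into ('NAME', 7); the version must be a decimal number."""
    base, sep, version = name.rpartition(";")
    if not sep or not base or not version.isdigit():
        raise ValueError(f"not a versioned name: {name!r}")
    return base, int(version)


def purge_and_renumber(names):
    """Plan a cleanup: keep only the newest version of each file, renamed to ;1.

    Returns (deleted, renamed) where `deleted` is a sorted list of names to
    remove and `renamed` maps each surviving name to its new ';1' name.
    Nothing is touched on disk; this only computes the plan.
    """
    newest = {}
    for name in names:
        base, version = split_version(name)
        # Compare versions as integers so that ;12 is newer than ;4.
        if base not in newest or version > newest[base]:
            newest[base] = version

    deleted = sorted(
        name for name in names
        if split_version(name)[1] != newest[split_version(name)[0]]
    )
    renamed = {f"{base};{ver}": f"{base};1" for base, ver in sorted(newest.items())}
    return deleted, renamed


listing = ["LOGIN.COM;3", "LOGIN.COM;4", "NOTES.TXT;1", "LOGIN.COM;12"]
deleted, renamed = purge_and_renumber(listing)
print("delete:", deleted)
print("rename:", renamed)
```

Comparing version numbers as integers matters here: a plain string sort would rank `;4` above `;12` and keep the wrong copy.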
On 10/5/06, Erik Trimble <Erik.Trimble at sun.com> wrote:
> Doing versioning at the file-system layer allows block-level changes to be stored, so it doesn't consume enormous amounts of extra space. In fact, it's more efficient than any versioning software (CVS, SVN, teamware, etc) for storing versions.

Comparing to cvs/svn misses the point; as I said, they address completely different needs.

> However, there are three BIG drawbacks for using versioning in your FS (that assumes that it is a tunable parameter and can be turned off for a FS when not desired):
>
> (1) File listing semantics become a bit of a mess. VMS stores versions as <filename>;<version> That is, it uses the semi-colon as a divider. Now, I'm not at all sure how we can make ZFS POSIX-compliant and still do something like this. Versioning filesystems tend to be a complete mess - it is hard to present usable information about which versions are available, and at the same time keep things clean. Even keeping versions in a hidden dir (say .zfs_versions) in each directory still leaves that directory filled with a huge mess of files.

"Complete mess" is certainly not my experience (I worked with TOPS-20 from 1977 to 1985 and VMS from 1979 to 1985). The key is that you need to *clean up*; specifically, you need to use the command which deletes all but the most recent copy of each file in a directory at the end of pretty much each work session.

It's trivial to present information on which versions are available; you simply list each one as a file, which has the date info any file has, and the version number.

> (2) File Versioning is no replacement for source code control, as you miss all the extra features (tagging, branching, comments, etc) that go with a file version check-in.

It's very definitely not an alternative or replacement for source code control, no. It provides a very useful feature to use *alongside* source control. Source code control is also not a replacement for file versioning (I end up creating spare copies of files with funny names for things I'd otherwise get from versioning; and I end up losing time through not having thought to create such a file, whereas versioning is automatic).

> (3) Many apps continuously save either temp copies or actual copies of the file you are working on. This leads to a version explosion, where you end up with 100s of versions of a commonly used file. This tends to be worse than useless, as people have an incredibly hard time figuring out which (older) version they might actually want to look at. And, this problem ISN'T ever going to go away, as it would require apps to understand filesystem features for ZFS, which isn't going to happen.

Files treated that way are often deleted at the end of the session automatically, so no problem there. Or else they'll be cleaned up when you do your session-end cleanup. What the heck was that command on TOPS-20 anyway? Maybe "purge"? Sorry, 20-year-old memories are fuzzy on some details.

File versioning worked a lot better on TOPS-20 than on VMS, as I remember it. The facility looked the same, but actually working with it was much cleaner and easier.

Making it somewhat controllable would be useful, starting with maybe an inheritable default, so some directory trees could be set not to version.

> I'd discourage File Versioning at this late stage in UNIX. Source Code control systems fulfill the need for serious uses, and casual usage is obviated by the mantra of "save early, save often" that has been beaten into the userbase. Trying to change that is a recipe for disaster.

Actually, "save early and often" is exactly why versioning is important. If you discover you've gone down a blind alley in some code, it makes it easy to get back to the earlier spots. This, in my experience, happens at a detail level where you won't (in fact can't) be doing checkins to version control.

--
David Dyer-Bennet
On Thu, Oct 05, 2006 at 04:08:13PM -0700, David Dyer-Bennet wrote:
> when you do your session-end cleanup. What the heck was that command on TOPS-20 anyway? Maybe "purge"? Sorry, 20-year-old memories are fuzzy on some details.

It's PURGE under VMS, so knowing DEC, it was named PURGE under TOPS-20 as well.

Hmmmm, gotta get the DECsystem-2020 powered up one of these days.

-brian
On Thu, 2006-10-05 at 16:08 -0700, David Dyer-Bennet wrote:
> On 10/5/06, Erik Trimble <Erik.Trimble at sun.com> wrote:
>> Doing versioning at the file-system layer allows block-level changes to be stored, so it doesn't consume enormous amounts of extra space. In fact, it's more efficient than any versioning software (CVS, SVN, teamware, etc) for storing versions.
>
> Comparing to cvs/svn misses the point; as I said, they address completely different needs.

I was making a general point, to make it clear FS versioning isn't a disk pig.

>> However, there are three BIG drawbacks for using versioning in your FS (that assumes that it is a tunable parameter and can be turned off for a FS when not desired):
>>
>> (1) File listing semantics become a bit of a mess. VMS stores versions as <filename>;<version> That is, it uses the semi-colon as a divider. Now, I'm not at all sure how we can make ZFS POSIX-compliant and still do something like this. Versioning filesystems tend to be a complete mess - it is hard to present usable information about which versions are available, and at the same time keep things clean. Even keeping versions in a hidden dir (say .zfs_versions) in each directory still leaves that directory filled with a huge mess of files.
>
> "Complete mess" is certainly not my experience (I worked with TOPS-20 from 1977 to 1985 and VMS from 1979 to 1985). The key is that you need to *clean up*; specifically, you need to use the command which deletes all but the most recent copy of each file in a directory at the end of pretty much each work session.
>
> It's trivial to present information on which versions are available; you simply list each one as a file, which has the date info any file has, and the version number.

I stand by the "complete mess" statement. _You_ have trained yourself to get around the problem by eliminating most of the reason for file versioning - you delete everything when you log out. A normal user (or even most scripts) isn't going to do this. Indeed, I would argue that it makes no sense to implement versioning if all you are going to use it for is on a per-session basis.

And try thinking of a directory with a few dozen files in it, each with a dozen or more versions. That's hideous from a normal user's standpoint. VMS's implementation of <filename>;<version> is completely unwieldy if you have more than a few files, or more than a few versions. And, in modern typical use, it is _highly_ likely both will be true.

>> (2) File Versioning is no replacement for source code control, as you miss all the extra features (tagging, branching, comments, etc) that go with a file version check-in.
>
> It's very definitely not an alternative or replacement for source code control, no. It provides a very useful feature to use *alongside* source control. Source code control is also not a replacement for file versioning (I end up creating spare copies of files with funny names for things I'd otherwise get from versioning; and I end up losing time through not having thought to create such a file, whereas versioning is automatic).

File versioning would certainly be nice in many cases, but I think it's better implemented in the application (think of Photoshop's unlimited undo feature, though better than that) than in the FS, where it creates a whole lot of clutter and confusion real fast and is only specifically useful for a very limited selection of files.

>> (3) Many apps continuously save either temp copies or actual copies of the file you are working on. This leads to a version explosion, where you end up with 100s of versions of a commonly used file. This tends to be worse than useless, as people have an incredibly hard time figuring out which (older) version they might actually want to look at. And, this problem ISN'T ever going to go away, as it would require apps to understand filesystem features for ZFS, which isn't going to happen.
>
> Files treated that way are often deleted at the end of the session automatically, so no problem there. Or else they'll be cleaned up when you do your session-end cleanup. What the heck was that command on TOPS-20 anyway? Maybe "purge"? Sorry, 20-year-old memories are fuzzy on some details.

So, here's a question: if I delete file X;1, do I delete X;x? That is, do I delete all versions of a file when I delete the actual file? What about deleting a (non-head) version? And exactly how many different files have to be cleaned up when you log out? How does this get configured? Who does the configuring? What if I _want_ versions of some files, but not the others?

And what about network sharing? Or non-interactive use (i.e. via SAMBA, or other apps where you're not looking at the FS via a command prompt)?

> File versioning worked a lot better on TOPS-20 than on VMS, as I remember it. The facility looked the same, but actually working with it was much cleaner and easier.
>
> Making it somewhat controllable would be useful. Starting with maybe an inheritable default, so some directory trees could be set not to version.
>
>> I'd discourage File Versioning at this late stage in UNIX. Source Code control systems fulfill the need for serious uses, and casual usage is obviated by the mantra of "save early, save often" that has been beaten into the userbase. Trying to change that is a recipe for disaster.
>
> Actually, "save early and often" is exactly why versioning is important. If you discover you've gone down a blind alley in some code, it makes it easy to get back to the earlier spots. This, in my experience, happens at a detail level where you won't (in fact can't) be doing checkins to version control.

Then, IMHO, you aren't using VC properly. File versioning should NEVER, EVER, EVER be used for anything around VC. It might be useful for places VC isn't traditionally used (Office documents, small scripts, etc.), but the example you provide is one which is easily solved by use of frequent checkins to VC - indeed, that's what VC is supposed to be for.

File versioning is really only useful when we can hide the versioning mess from the end-user, and yet provide them with some reasonable mechanism for accessing the file versions if need be. And we keep versions around, period. I don't see that as being possible using the traditional UNIX/POSIX filesystem layout. Like I said before, maybe when the FS becomes a RDBMS, but even then...

--
Erik Trimble
Java System Support
Mailstop: usca14-102
Phone: x17195
Santa Clara, CA
Timezone: US/Pacific (GMT-0800)
On Thu, Oct 05, 2006 at 04:40:09PM -0700, Erik Trimble wrote:
> So, here's a question: if I delete file X;1, do I delete X;x ? That is, do I delete all versions of a file when I delete the actual file? what about deleting a (non-head) version?

Under VMS at least, that is entirely up to you: you can delete X;1, X;2 or X;* if you so desire.

> And, exactly how many different files have to be cleaned up when you logout? How does this get configured? Who does the configuring? What if I _want_ versions of some files, but not the others?

That is where it gets tricky. Under the DEC !UNIX OSes, file versioning was just a way of life since it was on all the time for everyone, period. Trying to apply that to UNIX, where file versioning previously didn't exist? Not so easy. ;)

> And, what about network-sharing? For non-interactive use? (i.e. via SAMBA, or other apps where you're not looking at the FS via a command prompt?)

A way to not allow those access to the versioning system sounds reasonable.

> File versioning is really only useful when we can hide the versioning mess from the end-user, and yet provide them with some reasonable mechanism for accessing the file versions if need be. And we keep versions around, period. I don't see that as being possible using the traditional UNIX/POSIX filesystem layout. Like I said before, maybe when the FS becomes a RDBMS, but even then...

The way Digital did it is spot on; however, the use of ; is a problem once you apply UNIX/POSIX filesystem requirements to it. It may not work.

On the other hand ODS *is* an RDBMS really, so................. ;)

-brian
A lot of this we're clearly not going to agree on and I've said what I had to contribute. There's one remaining point, though...

On 10/5/06, Erik Trimble <Erik.Trimble at sun.com> wrote:
> On Thu, 2006-10-05 at 16:08 -0700, David Dyer-Bennet wrote:
>> Actually, "save early and often" is exactly why versioning is important. If you discover you've gone down a blind alley in some code, it makes it easy to get back to the earlier spots. This, in my experience, happens at a detail level where you won't (in fact can't) be doing checkins to version control.
>
> Then, IMHO, you aren't using VC properly. File Versioning should NEVER, EVER, EVER be used for anything around VC. It might be useful for places VC isn't traditionally used (Office documents, small scripts, etc.), but the example you provide is one which is easily solved by use of frequent checkins to VC - indeed, that's what VC is supposed to be for.

No, any sane VC protocol must specifically forbid the checkin of the stuff I want versioning (or file copies or whatever) for. It's partial changes, probably doesn't compile, nearly certainly doesn't work. This level of work product *cannot* be committed to the repository.

Well, unless you have a better VCS than CVS or SVN. I first met this as an obscure, buggy, expensive, short-lived SUN product, actually; I believe it was called NSE, the Network Software Engineering environment. And I used one commercial product (written by an NSE user after NSE was discontinued) that supported the feature needed. Both of these had what I might call a two-level VCS. Each developer had one or more private repositories (the way people have working directories now with SVN), but you had full VCS checkin/checkout (and compare and rollback and so forth) within that. Then, when your code was ready for the repository, you did a "commit" step that pushed it up from your private repository to the public repository.

One of the big problems with CVS and SVN and Microsoft SourceSafe is that you don't have the benefits of version control most of the time, because all commits are *public*.

--
David Dyer-Bennet
On Thu, 2006-10-05 at 17:25 -0700, David Dyer-Bennet wrote:
> Well, unless you have a better VCS than CVS or SVN. I first met this as an obscure, buggy, expensive, short-lived SUN product, actually; I believe it was called NSE, the Network Software Engineering environment. And I used one commercial product (written by an NSE user after NSE was discontinued) that supported the feature needed. Both of these had what I might call a two-level VCS. Each developer had one or more private repositories (the way people have working directories now with SVN), but you had full VCS checkin/checkout (and compare and rollback and so forth) within that. Then, when your code was ready for the repository, you did a "commit" step that pushed it up from your private repository to the public repository.
>
> One of the big problems with CVS and SVN and Microsoft SourceSafe is that you don't have the benefits of version control most of the time, because all commits are *public*.

Just FYI: that buggy, expensive, short-lived SUN product eventually became "Teamware".

Check out (no pun intended) Mercurial and similar products, which have similar behavior to Teamware - each developer has a "workspace" for code, and you can do VC inside that workspace without having to do a putback into the "main" tree. That way, you do frequent VC checkins, but don't putback to the main tree until things actually work. Or, at least, you _claim_ them to work. :-)

--
Erik Trimble
On 10/6/06, David Dyer-Bennet <dd-b at dd-b.net> wrote:
> One of the big problems with CVS and SVN and Microsoft SourceSafe is that you don't have the benefits of version control most of the time, because all commits are *public*.

David,

That is exactly what "branch" is for in CVS and SVN. Dunno much about M$ SourceSafe.

--
Just me,
Wire ...
On Oct 5, 2006, at 5:40 PM, Erik Trimble wrote:
> And, try thinking of a directory with a few dozen files in it, each with a dozen or more versions. that's hideous, from a normal user standpoint. VMS's implementation of <filename>;<version> is completely unwieldy if you have more than a few files,

No it is not. I worked for DEC and used VMS up through 1993 and never found it unwieldy, even if I had 100 versions of one file. It is

1) what you are used to
2) what you are trained to do

that makes it unwieldy or not.

I find the "unix" conventions of storying a file and file~ or any of the other myriad billion ways of doing it that each app has invented to be much more unwieldy.

Yes, you have to "purge" your directories once in a while. The same way you have to clean up any file "mess" you make on your computer (download area, desktop, etc).

> or more than a few versions. And, in modern typical use, it is _highly_ likely both will be true.

So what if you have more than a few versions of a file?

Beauty is in the eye of the beholder, and just because YOU find it unwieldy does not make it so for the general user or anyone else.

I would LOVE to have a VMS style (sorry, my TOPS-20 usage was very little so I have no remembrance of it there) file versioning built in to the system.

"save early, save often" ONLY makes sense with a file versioning system, or else you lose previous edits if you decide you have gone down a wrong alley.

Chad

---
Chad Leigh -- Shire.Net LLC
Your Web App and Email hosting provider
chad at shire.net
On October 5, 2006 5:25:17 PM -0700 David Dyer-Bennet <dd-b at dd-b.net> wrote:
> Well, unless you have a better VCS than CVS or SVN. I first met this as an obscure, buggy, expensive, short-lived SUN product, actually; I believe it was called NSE, the Network Software Engineering environment. And I used one commercial product (written by an NSE user after NSE was discontinued) that supported the feature needed. Both of these had what I might call a two-level VCS. Each developer had one or more private repositories (the way people have working directories now with SVN), but you had full VCS checkin/checkout (and compare and rollback and so forth) within that. Then, when your code was ready for the repository, you did a "commit" step that pushed it up from your private repository to the public repository.

I wouldn't call that 2-level, it's simply branching, and all VCS/SCM systems have this, even rcs. Some expose all changes in the private branch to everyone (modulo protection mechanisms), some only expose changes that are "put back" (to use Sun teamware terminology).

Both CVS and SVN have this.

-frank
On Oct 5, 2006, at 7:47 PM, Chad Leigh -- Shire.Net LLC wrote:
> I find the "unix" conventions of storying a file and file~ or any of the other myriad billion ways of doing it that each app has invented to be much more unwieldy.

sorry, "storing" a file, not "storying"

---
Chad Leigh -- Shire.Net LLC
Your Web App and Email hosting provider
chad at shire.net
On Oct 5, 2006, at 6:48 PM, Frank Cusack wrote:
> On October 5, 2006 5:25:17 PM -0700 David Dyer-Bennet <dd-b at dd-b.net> wrote:
>> Well, unless you have a better VCS than CVS or SVN. I first met this as an obscure, buggy, expensive, short-lived SUN product, actually; I believe it was called NSE, the Network Software Engineering environment. And I used one commercial product (written by an NSE user after NSE was discontinued) that supported the feature needed. Both of these had what I might call a two-level VCS. Each developer had one or more private repositories (the way people have working directories now with SVN), but you had full VCS checkin/checkout (and compare and rollback and so forth) within that. Then, when your code was ready for the repository, you did a "commit" step that pushed it up from your private repository to the public repository.
>
> I wouldn't call that 2-level, it's simply branching, and all VCS/SCM systems have this, even rcs. Some expose all changes in the private branch to everyone (modulo protection mechanisms), some only expose changes that are "put back" (to use Sun teamware terminology).
>
> Both CVS and SVN have this.
>
> -frank

David is describing a different behavior. Even a branch is still ultimately on the single, master server with CVS, SVN, and most other versioning systems. Teamware, and a few other versioning systems, let you have more arbitrary parent and child relationships.

In Teamware, you can create a project gate, have a variety of people check code into this project gate, and do all of this without ever touching the parent gate. When the project is done, you then check in the changes to the project gate's parent.

The gate parent may itself be a child of some other gate, making the above project gate a grandchild of some higher gate. You can also change a child's parent, so you could in fact skip the parent and go straight to the "grand" parent if you wish. For that matter, you can re-parent the "parent" to sync with the former child if you had some reason to do so.

A Teamware putback really isn't a matter of exposure. Until you do a putback to the parent, the code is not physically (or even logically) present in the parent.

Teamware's biggest drawbacks are a lack of change sets (like how Subversion tracks simultaneous, individual changes as a group) and that it only runs via file access (no network protocol; local filesystem or NFS only).

Mercurial seems to be similar to Teamware in terms of parenting, but with network protocol support built in. Which is presumably why OpenSolaris will be using it.

ckl
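The parent/child behavior described above can be sketched as a toy model in Python. This is an illustration under stated assumptions, not Teamware's or Mercurial's actual commands or storage format: the `Workspace` class and its `commit`, `putback`, and `reparent` methods are hypothetical names, and a "change" is just a string.

```python
class Workspace:
    """Toy model of a workspace ("gate") with a changeable parent."""

    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent
        # A new child starts from a copy of its parent's history.
        self.changes = list(parent.changes) if parent is not None else []

    def commit(self, change):
        """Record a change locally; the parent does not see it."""
        self.changes.append(change)

    def putback(self):
        """Copy changes the current parent lacks into that parent."""
        if self.parent is None:
            raise ValueError(f"{self.name} has no parent to put back to")
        for change in self.changes:
            if change not in self.parent.changes:
                self.parent.changes.append(change)

    def reparent(self, new_parent):
        """Point this workspace at a different parent."""
        self.parent = new_parent


main = Workspace("main")
project = Workspace("project", parent=main)
dev = Workspace("dev", parent=project)

dev.commit("fix A")          # visible only in dev
dev.putback()                # now in project, still not in main
dev.commit("fix B")
dev.reparent(main)           # skip the project gate
dev.putback()                # both fixes land directly in main

print(main.changes)
print(project.changes)
```

In this model a commit never touches the parent, which is the property David wanted from a "two-level" VCS, and re-parenting is only a pointer change followed by an ordinary putback.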
On October 5, 2006 7:02:29 PM -0700 Chad Lewis <Chad.Lewis at Sun.COM> wrote:
> On Oct 5, 2006, at 6:48 PM, Frank Cusack wrote:
>> On October 5, 2006 5:25:17 PM -0700 David Dyer-Bennet <dd-b at dd-b.net> wrote:
>>> Well, unless you have a better VCS than CVS or SVN. I first met this as an obscure, buggy, expensive, short-lived SUN product, actually; I believe it was called NSE, the Network Software Engineering environment. And I used one commercial product (written by an NSE user after NSE was discontinued) that supported the feature needed. Both of these had what I might call a two-level VCS. Each developer had one or more private repositories (the way people have working directories now with SVN), but you had full VCS checkin/checkout (and compare and rollback and so forth) within that. Then, when your code was ready for the repository, you did a "commit" step that pushed it up from your private repository to the public repository.
>>
>> I wouldn't call that 2-level, it's simply branching, and all VCS/SCM systems have this, even rcs. Some expose all changes in the private branch to everyone (modulo protection mechanisms), some only expose changes that are "put back" (to use Sun teamware terminology).
>>
>> Both CVS and SVN have this.
>>
>> -frank
>
> David is describing a different behavior. Even a branch is still ultimately on the single, master server with CVS, SVN, and most other versioning systems. Teamware, and a few other versioning systems, let you have more arbitrary parent and child relationships.

How are branches not arbitrary parent and child relationships? (except in cvs where branches pretty much suck but still it's close)

> A Teamware putback really isn't a matter of exposure. Until you do a putback to the parent, the code is not physically (or even logically) present in the parent.

That is what I meant by exposure -- whether or not "private" code is available to others. But how does that matter? The difference between teamware (or git or bk or mercurial) and cvs (or svn or p4) here is that everyone can see all private branches and everyone can see each change in a private branch (again, modulo protections). That doesn't matter to the main branch. The code is not in the main branch logically (physically doesn't matter) until you integrate or putback.

My point is that having a private branch, where you can check in changes to your heart's content, and re-branch at will, and don't have to follow "must compile" rules, can be handled by most any VCS. Which is what David was saying is needed for it to replace the functionality of a versioned filesystem.

Some of them (eg p4) handle branching much better than others, making this easier, but all of them can do it.

Wow, I'm surprised teamware doesn't have changelists or a similar concept. Talk about stone ages. :-)

-frank
I seem to remember that one could configure the max. number of versions VMS would retain for you on a per-file basis - setting this to 1 would de facto turn off versioning. IFF versioning were implemented in ZFS, AND was made configurable on a per-file basis (everything else wouldn''t make any sense at all, IMO), the default could be set to 1, to avoid the various horror scenarios that have been painted here, and people could increase the number of versions they want for those files that need it. cheers Michael Chad Leigh -- Shire.Net LLC wrote:> > On Oct 5, 2006, at 5:40 PM, Erik Trimble wrote: > >> And, try thinking of a directory with a few dozen files in it, each with >> a dozen or more versions. that''s hideous, from a normal user standpoint. >> VMS''s implementation of <filename>;<version> is completely unwieldy if >> you have more than a few files, > > No it is not. I worked for DEC and used VMS up through 1993 and never > found it unwieldy. Even if I had 100 versions of one file. It is > > 1) what you are used to > > 2) what you are trained to do > > that makes it unwieldy or not > > I find the "unix" conventions of storying a file and file~ or any of the > other myriad billion ways of doing it that each app has invented to be > much more unwieldy. > > Yes, you have to "purge" your directories once in a while. The same way > you have to clean up any file "mess" you make on you computer (download > area, desktop, etc). > >> or more than a few versions. And, in >> modern typical use, it is _highly_ likely both will be true. > > So what if you have more than a few versions of a file. > > Beauty is in the eye of the beholder, and just because YOU find it > unwieldy does not make it so for the general user or anyone else. > > I would LOVE to have a VMS style (sorry, my TOPS-20 usage was very > little so I have no remembrance of it there) file versioning built in to > the system. 
> > "save early, save often" ONLY makes sense with a file versioning system, > or else you lose previous edits if you decide you have gone down a wrong > alley. > > Chad > > --- > Chad Leigh -- Shire.Net LLC > Your Web App and Email hosting provider > chad at shire.net > > > > > ------------------------------------------------------------------------ > > _______________________________________________ > zfs-discuss mailing list > zfs-discuss at opensolaris.org > http://mail.opensolaris.org/mailman/listinfo/zfs-discuss-- Michael Schuster +49 89 46008-2974 / x62974 Recursion, n.: see ''Recursion''
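The per-file version limit described above (a default of 1, raised only for files that need history, with the oldest versions purged automatically) can be sketched as a toy in-memory model. This is only an illustration in Python, not VMS or ZFS code: the class name `VersionedFile`, the `max_versions` parameter and the method names are all hypothetical.

```python
from collections import deque


class VersionedFile:
    """Toy model of a file that keeps at most `max_versions` saved versions."""

    def __init__(self, name, max_versions=1):
        if max_versions < 1:
            raise ValueError("max_versions must be at least 1")
        self.name = name
        self.max_versions = max_versions
        # Each entry is (version_number, contents), oldest first.
        self._versions = deque()
        self._next_number = 1

    def save(self, contents):
        """Store a new version, then auto-purge the oldest beyond the limit."""
        self._versions.append((self._next_number, contents))
        self._next_number += 1
        while len(self._versions) > self.max_versions:
            self._versions.popleft()

    def listing(self):
        """Return names in the VMS-like <filename>;<version> form, newest first."""
        return [f"{self.name};{n}" for n, _ in reversed(self._versions)]

    def read(self, version=None):
        """Return the newest contents, or those of a specific retained version."""
        if not self._versions:
            raise FileNotFoundError(self.name)
        if version is None:
            return self._versions[-1][1]
        for n, contents in self._versions:
            if n == version:
                return contents
        raise KeyError(f"{self.name};{version} was purged or never existed")
```

With `max_versions=1` the model behaves like an ordinary file: every save replaces the only retained version, which is the "versioning de facto turned off" default suggested in the message.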
> What would a version FS buy us that cron+ zfs
> snapshots doesn't?

Some people are making money on the concept, so I suppose there are those who perceive benefits: http://en.wikipedia.org/wiki/Rational_ClearCase

(I dimly remember DSEE on the Apollos; also some sort of versioning file type on (probably long-dead) Harris VOS real-time OS.)

This message posted from opensolaris.org
On Oct 6, 2006, at 1:07 AM, Michael Schuster wrote:> I seem to remember that one could configure the max. number of > versions VMS would retain for you on a per-file basis - setting > this to 1 would de facto turn off versioning. > IFF versioning were implemented in ZFS, AND was made configurable > on a per-file basis (everything else wouldn''t make any sense at > all, IMO), the default could be set to 1, to avoid the various > horror scenarios that have been painted here, and people could > increase the number of versions they want for those files that need > it.Yes, it was configurable. I don''t remember if it was per file or per directory but per file would make sense. It would have to fit into a "unix" way of things so the top most version (most recent) would have to have the "plain" name so that it would work with standard unix apps that expect certain names... I am not one to expound on how that would be done or details as I am not by any means a guru of "low level unix-style things. But I would dearly like to have a versioning capability. Best Chad> > cheers > Michael > > Chad Leigh -- Shire.Net LLC wrote: >> On Oct 5, 2006, at 5:40 PM, Erik Trimble wrote: >>> And, try thinking of a directory with a few dozen files in it, >>> each with >>> a dozen or more versions. that''s hideous, from a normal user >>> standpoint. >>> VMS''s implementation of <filename>;<version> is completely >>> unwieldy if >>> you have more than a few files, >> No it is not. I worked for DEC and used VMS up through 1993 and >> never found it unwieldy. Even if I had 100 versions of one file. >> It is >> 1) what you are used to >> 2) what you are trained to do >> that makes it unwieldy or not >> I find the "unix" conventions of storying a file and file~ or any >> of the other myriad billion ways of doing it that each app has >> invented to be much more unwieldy. >> Yes, you have to "purge" your directories once in a while. 
The >> same way you have to clean up any file "mess" you make on you >> computer (download area, desktop, etc). >>> or more than a few versions. And, in >>> modern typical use, it is _highly_ likely both will be true. >> So what if you have more than a few versions of a file. >> Beauty is in the eye of the beholder, and just because YOU find it >> unwieldy does not make it so for the general user or anyone else. >> I would LOVE to have a VMS style (sorry, my TOPS-20 usage was very >> little so I have no remembrance of it there) file versioning built >> in to the system. >> "save early, save often" ONLY makes sense with a file versioning >> system, or else you lose previous edits if you decide you have >> gone down a wrong alley. >> Chad >> --- >> Chad Leigh -- Shire.Net LLC >> Your Web App and Email hosting provider >> chad at shire.net >> --------------------------------------------------------------------- >> --- >> _______________________________________________ >> zfs-discuss mailing list >> zfs-discuss at opensolaris.org >> http://mail.opensolaris.org/mailman/listinfo/zfs-discuss > > > -- > Michael Schuster +49 89 46008-2974 / x62974 > Recursion, n.: see ''Recursion''--- Chad Leigh -- Shire.Net LLC Your Web App and Email hosting provider chad at shire.net -------------- next part -------------- A non-text attachment was scrubbed... Name: smime.p7s Type: application/pkcs7-signature Size: 2411 bytes Desc: not available URL: <http://mail.opensolaris.org/pipermail/zfs-discuss/attachments/20061006/91927f78/attachment.bin>
On Fri, Oct 06, 2006 at 01:14:23AM -0600, Chad Leigh -- Shire.Net LLC wrote:
>
> But I would dearly like to have a versioning capability.

Me too. Example (real-life scenario): there is a Samba server with about 200 concurrently connected users. They keep mainly doc/xls files on the server. From time to time they (somehow) corrupt their files (they share the files, so it is possible), and the files then have to be recovered from backup. With versioning, they could be told that if their main file is corrupted they can open the previous version and keep working. ZFS snapshots are not a solution in this case because we would have to create snapshots for 400 filesystems (yes, each user has their own filesystem; I said there are 200 concurrent connections, but there are many more accounts on the server) every hour or so.

przemol
Hello, On 10/6/06, przemolicc at poczta.fm <przemolicc at poczta.fm> wrote:> On Fri, Oct 06, 2006 at 01:14:23AM -0600, Chad Leigh -- Shire.Net LLC wrote: > > > > But I would dearly like to have a versioning capability. > > Me too. > Example (real life scenario): there is a samba server for about 200 > concurrent connected users. They keep mainly doc/xls files on the > server. From time to time they (somehow) currupt their files (they > share the files so it is possible) so they are recovered from backup. > Having versioning they could be said that if their main file is > corrupted they can open previous version and keep working. > ZFS snapshots is not solution in this case because we would have to > create snapshots for 400 filesystems (yes, each user has its filesystem > and I said that there are 200 concurrent connections but there much more > accounts on the server) each hour or so.So, if I build it, people will want it? ;) -- Regards, Jeremy
On Thu, Oct 05, 2006 at 02:19:46PM -0700, Erik Trimble wrote:
> Doing versioning at the file-system layer allows block-level changes to
> be stored, so it doesn't consume enormous amounts of extra space. In
> fact, it's more efficient than any versioning software (CVS, SVN,
> teamware, etc) for storing versions.

Depends on the kinds of changes... Insert a small amount of code at the head of a large source file and most of the blocks might change...

> However, there are three BIG drawbacks for using versioning in your FS
> (that assumes that it is a tunable parameter and can be turned off for a
> FS when not desired):
>
> [...]
>
> Maybe when we change filesystems to a DB, we can look at automatic
> versioning again, as a DB can mitigate #1 and #3 issues above, and can
> actually implement #2 completely. OracleFS, here I come. (<groan>)

The way a DB would do this would be by effectively having version-aware interfaces. I think we could extend ZFS to provide versioning, but we'd have to make relevant applications version-aware, and versioning would have to be invisible to applications that aren't version-aware (which also must not create versions automatically).

Nico
--
A couple of use cases I was considering offhand:

1. Oops, I truncated my file.
2. Oops, I saved over my file.
3. Oops, an app corrupted my file.
4. Oops, I ran rm -rf on the wrong directory.

All of these can be solved by periodic snapshots, but versioning gives us immediacy.

So is immediacy worth it to you folks? I'd rather not embark on writing and finishing code for something no one wants besides me.

-- Regards, Jeremy
On Fri, Oct 06, 2006 at 11:25:29PM +0800, Jeremy Teo wrote:
> A couple of use cases I was considering off hand:
>
> 1. Oops i truncated my file
> 2. Oops i saved over my file
> 3. Oops an app corrupted my file.
> 4. Oops i rm -rf the wrong directory.
> All of which can be solved by periodic snapshots, but versioning gives
> us immediacy.

There's been talk of making every transaction a snapshot. Of course, there'd be no information as to whether a transaction includes a file close, or truncation, or whatever.

IMO a file versioning API would be good, but file versioning should normally be invisible, particularly to applications that are not aware of it (which would be every application to date).

So think about the interfaces first. I think ls(1) would have to be made version-aware. And cp(1)/mv(1)/ln(1). That would be enough for a start. Then add find/sfind and tar/star support. And GNOME support.

Nico
--
ClearCase is a version control system, though - not the same as file versioning. This message posted from opensolaris.org
On Fri, Oct 06, 2006 at 09:18:16AM -0700, Anton B. Rang wrote:
> ClearCase is a version control system, though - not the same as file versioning.

But they have a filesystem interface. Crucially, this involves additional interfaces. VC cannot be automatic.
On 10/5/06, Wee Yeh Tan <weeyeh at gmail.com> wrote:> On 10/6/06, David Dyer-Bennet <dd-b at dd-b.net> wrote: > > One of the big problems with CVS and SVN and Microsoft SourceSafe is > > that you don''t have the benefits of version control most of the time, > > because all commits are *public*. > > David, > > That is exactly what "branch" is for in CVS and SVN. Dunno much about > M$ SourceSafe.I''ve never encountered branch being used that way, anywhere. It''s used for things like developing release 2.0 while still supporting 1.5 and 1.6. However, especially with merge in svn it might be feasible to use a branch that way. What''s the operation to update the branch from the trunk in that scenario? -- David Dyer-Bennet, <mailto:dd-b at dd-b.net>, <http://www.dd-b.net/dd-b/> RKBA: <http://www.dd-b.net/carry/> Pics: <http://www.dd-b.net/dd-b/SnapshotAlbum/> Dragaera/Steven Brust: <http://dragaera.info/>
On Fri, Oct 06, 2006 at 09:40:22AM +0200, przemolicc at poczta.fm wrote:
> Example (real life scenario): there is a samba server for about 200
> concurrent connected users. They keep mainly doc/xls files on the
> server. From time to time they (somehow) corrupt their files (they
> share the files so it is possible) so they are recovered from backup.
> Having versioning they could be said that if their main file is
> corrupted they can open previous version and keep working.
> ZFS snapshots is not solution in this case because we would have to
> create snapshots for 400 filesystems (yes, each user has its filesystem
> and I said that there are 200 concurrent connections but there much more
> accounts on the server) each hour or so.

Why is creating that many snapshots a problem? The somewhat recent addition of recursive snapshots (zfs snapshot -r) reduces this to a single command.

Taking individual snapshots of each filesystem can take a decent amount of time, but I was under the impression that recursive snapshots would be much faster due to the snapshots being committed in a single transaction. Is this not correct?

Ed Plese
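As a rough illustration of the "single command" point, the sketch below builds the argument list for one recursive snapshot that a cron job could run hourly. It is written in Python and only constructs the command rather than executing it. The dataset name `tank/home`, the `hourly` prefix and the timestamp format are assumptions made for the example; only `zfs snapshot -r` itself comes from the message.

```python
import datetime


def recursive_snapshot_argv(dataset, when=None, prefix="hourly"):
    """Build the argv for one recursive snapshot of `dataset` and its descendants."""
    when = when or datetime.datetime.now()
    # Hypothetical naming scheme: <prefix>-YYYYMMDD-HHMM
    label = f"{prefix}-{when:%Y%m%d-%H%M}"
    # `zfs snapshot -r pool/fs@name` snapshots the dataset and every
    # descendant filesystem under the same snapshot name.
    return ["zfs", "snapshot", "-r", f"{dataset}@{label}"]


if __name__ == "__main__":
    # Print the command instead of running it; running it requires ZFS
    # and suitable privileges.
    print(" ".join(recursive_snapshot_argv("tank/home")))
```

If the 400 per-user filesystems all live under one parent dataset, a single invocation like this covers every one of them.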
przemolicc at poczta.fm wrote:
> On Fri, Oct 06, 2006 at 01:14:23AM -0600, Chad Leigh -- Shire.Net LLC wrote:
>> But I would dearly like to have a versioning capability.
>
> Me too.
> Example (real life scenario): there is a samba server for about 200
> concurrent connected users. They keep mainly doc/xls files on the
> server. From time to time they (somehow) corrupt their files (they
> share the files so it is possible) so they are recovered from backup.
> Having versioning they could be said that if their main file is
> corrupted they can open previous version and keep working.
> ZFS snapshots is not solution in this case because we would have to
> create snapshots for 400 filesystems (yes, each user has its filesystem
> and I said that there are 200 concurrent connections but there much more
> accounts on the server) each hour or so.

I completely disagree. In this scenario (and almost all others), use of regular snapshots will solve the problem. 'zfs snapshot -r' is extremely fast, and I'm working on some new features that will make using snapshots for this even easier and better-performing.

If you disagree, please tell us *why* you think snapshots don't solve the problem.

--matt
Jeremy Teo wrote:
> A couple of use cases I was considering off hand:
>
> 1. Oops i truncated my file
> 2. Oops i saved over my file
> 3. Oops an app corrupted my file.
> 4. Oops i rm -rf the wrong directory.
> All of which can be solved by periodic snapshots, but versioning gives
> us immediacy.
>
> So is immediacy worth it to you folks? I rather not embark on writing
> and finishing code on something no one wants besides me.

In my opinion, the marginal benefit of per-write(2) versions over snapshots (which can be per-transaction, i.e. every ~5 seconds) does not outweigh the complexity of implementation and use/administration.

--matt
On Oct 6, 2006, at 1:02 PM, Matthew Ahrens wrote:
> Jeremy Teo wrote:
>> A couple of use cases I was considering off hand:
>> 1. Oops i truncated my file
>> 2. Oops i saved over my file
>> 3. Oops an app corrupted my file.
>> 4. Oops i rm -rf the wrong directory.
>> All of which can be solved by periodic snapshots, but versioning
>> gives
>> us immediacy.
>> So is immediacy worth it to you folks? I rather not embark on writing
>> and finishing code on something no one wants besides me.
>
> In my opinion, the marginal benefit of per-write(2) versions over
> snapshots (which can be per-transaction, ie. every ~5 seconds) does
> not outweigh the complexity of implementation and use/administration.

disclaimer: I have not used zfs snapshots a lot as I am still experimenting with zfs, but they appear to be similar to freebsd snapshots, with which I am familiar.

The user experience with snapshots, in terms of file versioning (#1, #2, maybe #3) is much worse than a true file versioning user experience. People are oriented to their files, not to snapshots. And I may not want versioning with all my files (object files etc) which you would get with the snapshots.

Chad

---
Chad Leigh -- Shire.Net LLC
Your Web App and Email hosting provider
chad at shire.net
Matthew Ahrens wrote:
>
> If you disagree, please tell us *why* you think snapshots don't solve
> the problem.

Technically there's a race condition here. If you're taking regular snapshots, you might see:

10:25 - snapshot 1 - myfile.xls version 21
10:26 -            - myfile.xls version 22
10:27 -            - myfile.xls version 23 - corrupted
10:30 - snapshot 2 - myfile.xls version 23 - corrupted

So if you need to roll back to a previous version, the most recent non-corrupt version (22) is lost. Snapshots are a decent alternative but not as comprehensive and perhaps automatic as people would like.

--joe
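The timeline in this message can be replayed with a small simulation to see which versions a periodic snapshot schedule actually preserves. This is a minimal sketch in Python under stated assumptions: the function name is hypothetical, and the 10:20 save time for version 21 is invented for the example, since the message only says that version 21 existed by 10:25.

```python
def versions_captured(saves, snapshot_times):
    """Return the set of versions preserved by at least one snapshot.

    saves: list of (time, version) tuples in increasing time order.
    snapshot_times: times at which a snapshot was taken.
    A snapshot preserves whichever version was the latest at that moment.
    Times are zero-padded "HH:MM" strings, which compare correctly as text.
    """
    captured = set()
    for snap in snapshot_times:
        latest = None
        for time, version in saves:
            if time <= snap:
                latest = version
        if latest is not None:
            captured.add(latest)
    return captured


if __name__ == "__main__":
    saves = [("10:20", 21), ("10:26", 22), ("10:27", 23)]  # 23 is the corrupted one
    snapshots = ["10:25", "10:30"]
    kept = versions_captured(saves, snapshots)
    lost = {version for _, version in saves} - kept
    print("kept:", sorted(kept))  # kept: [21, 23]
    print("lost:", sorted(lost))  # lost: [22]
```

Any version that is both created and overwritten between two snapshots falls into the `lost` set, which is exactly the gap the message describes.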
On 10/6/06, Matthew Ahrens <Matthew.Ahrens at sun.com> wrote:
> Jeremy Teo wrote:
> > A couple of use cases I was considering off hand:
> >
> > 1. Oops i truncated my file
> > 2. Oops i saved over my file
> > 3. Oops an app corrupted my file.
> > 4. Oops i rm -rf the wrong directory.
> > All of which can be solved by periodic snapshots, but versioning gives
> > us immediacy.
> >
> > So is immediacy worth it to you folks? I rather not embark on writing
> > and finishing code on something no one wants besides me.
>
> In my opinion, the marginal benefit of per-write(2) versions over
> snapshots (which can be per-transaction, ie. every ~5 seconds) does not
> outweigh the complexity of implementation and use/administration.

It may quite possibly not be worth adding the second, fairly similar, facility. In addition to the points you cite, trying to explain to average users what the two are and when to use each one would be fairly challenging.

All the arguments about piles of versions seem to apply in spades to taking snapshots every 5 seconds. And given the snapshot hierarchy, it's much harder to find your file in the snapshot you want (let's say your file is 5 or 10 directories down, quite common in source trees in my experience; you have to go back to the top and navigate to ~/.zfs/<weirdsnapshotdirectoryname>/foo/bar/mumble/bag/baz/etc/the-file-I-want.cpp in each snapshot that might have the version you're looking for).

I'd say the snapshot system is not as good as file versioning for the tasks I think file versioning is best for. However, snapshotting at very frequent intervals would definitely capture close enough to the version I need to retrieve to make it a tolerable alternative. The user interface to it for retrieving a file is rather harder to use, it seems to me, and that might possibly discourage use when it would have been helpful.
-- David Dyer-Bennet, <mailto:dd-b at dd-b.net>, <http://www.dd-b.net/dd-b/> RKBA: <http://www.dd-b.net/carry/> Pics: <http://www.dd-b.net/dd-b/SnapshotAlbum/> Dragaera/Steven Brust: <http://dragaera.info/>
On Fri, Oct 06, 2006 at 12:02:16PM -0700, Matthew Ahrens wrote:
> In my opinion, the marginal benefit of per-write(2) versions over
> snapshots (which can be per-transaction, ie. every ~5 seconds) does not
> outweigh the complexity of implementation and use/administration.

Per-write(2) versions would be worse than useless in many, if not most cases. Even per-close(2) versions wouldn't always be useful.

Versions need to be captured / snapshots need to be taken when it makes sense given what the application/user is doing. Versioning cannot be automated; taking periodic snapshots != capturing application state.

FS-wide snapshots make a lot of sense in general, and can serve as a basic versioning tool for filesystems that have very specific uses (e.g., for a database).

Per-file snapshots (versions, whatever) make sense more generally, but need to be used by applications/users that are aware of them, else the feature would go unused or, worse, it'd be worse than useless.

IMO, there's no urgent need for any new features around this, but if there is a need, then it's for snapshots that aren't filesystem-wide, with APIs so that applications can be aware of it.

Nico
--
Chad Leigh -- Shire.Net LLC wrote:
>
> disclaimer: I have not used zfs snapshots a lot as I am still
> experimenting with zfs, but they appear to be similar to freebsd
> snapshots, with which I am familiar.
>
> The user experience with snapshots, in terms of file versioning (#1,
> #2, maybe #3) is much worse than a true file versioning user
> experience. People are oriented to their files, not to snapshots.
> And I may not want versioning with all my files (object files etc)
> which you would get with the snapshots.

disclaimer: ditto

I tend to agree with Chad though. If you are taking snapshots every 5 seconds like Matthew suggests in an earlier reply, how does a user easily go back to previous versions without encountering a bunch of duplicated "versions" in the myriad of snapshots that are being taken?

If the latest snapshot is number 2000, for example, and my file was last changed in snapshot 450, how do I easily figure that out without walking through snapshots 1999 down to 451 before finding it?

--joe
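One way to avoid eyeballing thousands of snapshots by hand is to let a script walk them and report each distinct state of the file only once. The sketch below, in Python, is a rough illustration and not an existing ZFS tool. It assumes snapshots are visible under the filesystem's `.zfs/snapshot` directory, that snapshot names sort in chronological order, and that two copies with the same modification time and size are the same version; the function name is hypothetical.

```python
import os


def distinct_versions(fs_root, relative_path):
    """Return [(snapshot_name, mtime_ns, size)], one entry per distinct file state.

    Each entry names the earliest snapshot (in name order) holding that state.
    """
    snap_root = os.path.join(fs_root, ".zfs", "snapshot")
    seen = set()
    result = []
    # Assumption: snapshot names sort chronologically (e.g. timestamped names).
    for name in sorted(os.listdir(snap_root)):
        candidate = os.path.join(snap_root, name, relative_path)
        try:
            st = os.stat(candidate)
        except (FileNotFoundError, NotADirectoryError):
            continue  # the file did not exist when this snapshot was taken
        key = (st.st_mtime_ns, st.st_size)
        if key not in seen:
            seen.add(key)
            result.append((name, st.st_mtime_ns, st.st_size))
    return result
```

This still visits every snapshot, so it does not remove the cost of the walk; it only collapses the duplicated "versions" so that a user sees one line per real change.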
First of all, let's agree that this discussion of File Versioning makes no more reference to its usage as Version Control. That is, we aren't going to talk about it being useful for source code, other than in the context where a source code file is a document, like any other text document. File Versioning and Version Control are separate things, with different purposes and feature sets.

OK. So, now we're on to FV. As Nico pointed out, FV is going to need a new API. Using the VMS convention of simply creating file names with a version string afterwards is unacceptable, as it creates enormous directory pollution, not to mention user confusion. So, FV has to be invisible to non-aware programs.

Now we have a problem: how do we access FV for non-local (e.g. SAMBA/NFS) clients? Since the VAST majority of usefulness of FV is in the network file server arena, unless we can use FV over the network, it is useless. You can't modify the SMB or NFS protocol (easily or quickly) to add FV functionality (look how hard it was to add ACLs to these protocols).

About the only way I can think around this problem is to store versions in a special subdir of each directory (e.g. .zfs_version), which would then be browsable over the network, using tools not normally FV-aware. But this puts us back into the problem of a directory which potentially has hundreds or thousands of files.

Also, "save-early-save-often" results in a version explosion, as does auto-save in the app. While this may indeed mean that you have all of your changes around, figuring out which version has them can be massively time-consuming. Let's say you have auto-save set for 5 minutes (very common in MS Word). That gives you 12 versions per hour. If you suddenly decide you want to back up a couple of hours, that leaves you with looking at a whole bunch of files, trying to figure out which one you want. E.g. I want a file from about 3 hours ago. Do I want the one from 2:45, 2:50, 2:55, 3:00, 3:05, 3:10, or 3:15 hours ago? And, what if I've mis-remembered, and it really was closer to 4 hours ago? Yes, the data is eventually there. However, wouldn't a 1-hour snapshot capability have saved you an enormous amount of time, by being able to simplify your search? (And, yes, you won't have _exactly_ the version you want, but odds are you will have something close, and you can put all the time you would have spent searching the FV tree into restarting work from the snapshot-ed version.)

Remember, FV's main audience is going to be "naive" users, not us technical users, who generally have the problem that FV solves under control (yes, FV would make it easier for us, but we're not the primary target). Version explosion (and the consequential problem of picking the right version to edit) is a huge problem for the naive audience.

Also, a big difference between Snapshots and FV tends to be who controls EOL-ing a version/Snapshot. Snapshots tend to be done by the Admin, and their aging is strictly controlled and defined (e.g. "we keep hourly snapshots for 1 week"). File versioning is typically under the control of the End-User, as its utility is much more nebulously defined. Certainly, there is no ability to truncate based on number of versions (e.g. "we only allow 100 versions to be kept"), since the frequency of versioning a file varies widely. Aging on a version is possibly a better answer, but this runs into a problem of user education, where we have to retrain our users to stop making frequent copies of important documents (like they do now, in absence of FV), but _do_ remember to dig through the FV archive periodically to save a desirable old copy. Also, if managing FV is to be a User task, how are they to do it over NFS/SAMBA? And, "log into the NFS server to do a cleanup" isn't an acceptable answer.

Also, FV is only useful for apps which do a "close()" on a file (or at least, I'm assuming we wait for a file to signal that it is closed before taking a version - otherwise, we do what? take a version every X minutes while the file is still open? I shudder to think about the implementation of this, and its implications...). How many apps keep a file open for a long period of time? FV isn't useful to them, only an "unlimited undo" functionality INSIDE the app.

Lastly, consider the additional storage requirement of FV, and exactly how much utility you gain for sacrificing disk space. Look at this scenario: I'm editing a file, making 1MB of change per 5 minutes (a likely scenario when actively editing any Office-style document), of which only 50% do I actually make permanent (the rest being temp edits for ideas I decide to change or throw out). If I'm auto-saving every 5 minutes, that means I use 12MB of version space per hour. If I took an hourly snapshot, then I need only 6MB of storage. The situation gets worse, for the primary usefulness of FV is for files which are frequently edited - meaning that they have rapid content change, and not in append-mode. Such a usage pattern means that FV will take up a much greater amount of space than periodic snapshots, as the longer interval in snapshots will allow the changes to "settle".

To me, FV is/was very useful in TOPS-20 and VMS, where you were looking at a system DESIGNED with the idea in mind, which already had a user base trained to use and expect it, and where virtually all usage was local (i.e. no network filesharing). None of this is true in the UNIX/POSIX world.

-Erik
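The storage comparison in this message is simple arithmetic, and writing it out makes the assumptions explicit. The sketch below, in Python, uses only the figures from the message (1MB of change per 5-minute auto-save, 50% of edits kept). The function names are hypothetical, and the model assumes every auto-save stores its full changed data while an hourly snapshot retains only the edits that survive until the snapshot is taken.

```python
def versions_per_hour(autosave_minutes):
    """Number of auto-saved versions created in one hour."""
    return 60 // autosave_minutes


def file_versioning_mb_per_hour(change_mb_per_save, autosave_minutes):
    """With per-save versions, every auto-save keeps its own changed data."""
    return change_mb_per_save * versions_per_hour(autosave_minutes)


def hourly_snapshot_mb(change_mb_per_save, autosave_minutes, kept_fraction):
    """With one snapshot per hour, only edits that survive are retained."""
    total_change = change_mb_per_save * versions_per_hour(autosave_minutes)
    return total_change * kept_fraction


if __name__ == "__main__":
    print(versions_per_hour(5))               # 12
    print(file_versioning_mb_per_hour(1, 5))  # 12
    print(hourly_snapshot_mb(1, 5, 0.5))      # 6.0
```

Changing `kept_fraction` or the auto-save interval shows how sensitive the 2:1 ratio is to those two guesses: if nearly all edits are kept, the gap between the two approaches almost disappears.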
Chad Leigh -- Shire.Net LLC wrote:
> disclaimer: I have not used zfs snapshots a lot as I am still
> experimenting with zfs, but they appear to be similar to freebsd
> snapshots, with which I am familiar.
>
> The user experience with snapshots, in terms of file versioning (#1,
> #2, maybe #3) is much worse than a true file versioning user
> experience. People are oriented to their files, not to snapshots.
> And I may not want versioning with all my files (object files etc)
> which you would get with the snapshots.
>
> Chad
>

You can't turn off and on File Versioning at the file level. At least, I can't imagine trying to support (i.e. write) this kind of functionality into ZFS.

File Versioning would be a tunable parameter for each filesystem. So, you'd have to store your object files on a different filesystem than your code. Which would make snapshots no different than FV, w/r/t keeping versions of the code, and not the object files.

-Erik
On Oct 6, 2006, at 3:14 PM, Erik Trimble wrote:
> Chad Leigh -- Shire.Net LLC wrote:
>> disclaimer: I have not used zfs snapshots a lot as I am still
>> experimenting with zfs, but they appear to be similar to freebsd
>> snapshots, with which I am familiar.
>>
>> The user experience with snapshots, in terms of file versioning
>> (#1, #2, maybe #3) is much worse than a true file versioning user
>> experience. People are oriented to their files, not to
>> snapshots. And I may not want versioning with all my files
>> (object files etc) which you would get with the snapshots.
>>
>> Chad
>>
> You can't turn off and on File Versioning at the file level. At
> least, I can't imaging trying to support (i.e. write) this kind of
> functionality into ZFS.

??? I will admit that I am not involved with the ZFS code, but it would seem that extensible meta data should make this easy. From reading some threads in a forum about Apple's possible use of ZFS (conjecture in some forums) a Sun engineer mentioned that ZFS was easily extensible in the meta data arena so that Apple should have no problems meeting their requirements. Was this incorrect?

> File Versioning would be a tunable parameter for each filesystem.
> So, you'd have to store your object files on a different filesystem
> than your code. Which would make snapshots no different than FV, w/
> r/t keeping versions of the code, and not the object files.

The problem is that you are stuck on snapshots and cannot think "outside of the box". All your implementations you are thinking of are constrained by your tunnel vision. This is not meant as a personal attack. It is just that the arguments you put forth (in your long post which I am in the middle of replying to) show that this tunnel vision is readily apparent.

Chad

>
> -Erik

---
Chad Leigh -- Shire.Net LLC
Your Web App and Email hosting provider
chad at shire.net
On Oct 6, 2006, at 3:08 PM, Erik Trimble wrote:> First of all, let''s agree that this discussion of File Versioning > makes no more reference to its usage as Version Control. That is, > we aren''t going to talk about it being useful for source code, > other than in the context where a source code file is a document, > like any other text document. File Versioning and Version Control > are separate things, with different purposes and feature sets. > > > OK. So, now we''re on to FV. As Nico pointed out, FV is going to > need a new API. Using the VMS convention of simply creating file > names with a version string afterwards is unacceptible, as it > creates enormous directory pollution,Assumption, not supported. "Eye of the beholder."> not to mention user confusion.Assumption, not supported.> So, FV has to be invisible to non-aware programs.yes> > Now we have a problem: how do we access FV for non-local (e.g. > SAMBA/NFS) clients? Since the VAST majority of usefulness of FV is > in the network file server arena,Assumption, and definitely not supported. It is very useful outside of the file sharing arena.> unless we can use FV over the network, it is useless.Wrong> You can''t modify the SMB or NFS protocol (easily or quickly) to add > FV functionality (look how hard it was to add ACLs to these > protocols). > > About the only way I can think around this problem is to store > versions in a special subdir of each directory (e.g. .zfs_version), > which would then be browsable over the network, using tools not > normally FV-aware. But this puts us back into the problem of a > directory which potentially has hundreds or thousands of files.This directory way of doing it is not a good way. It fails the ease of use to the end user test. The VMS way is far superior. The problem is that you have to make sure that apps that are not FV aware have no problems, which means you cannot just append something to the actual file name. 
It has to be some sort of meta data.> > Also, "save-early-save-often" results in a version explosion, as > does auto-save in the app.Does not have to. In VMS it is configurable on how many versions you want to save before it does an auto purge. A simple purge command then cleans things up for you. Very minimal requirements for "retraining" the user. Set the default configuration to be a max of 1 version and you have no problems unless you turn it on.> While this may indeed mean that you have all of your changes > around, figuring out which version has them can be massively time- > consuming.Your assumption. (And much less hard than using snapshots).> Let''s say you have auto-save set for 5 minutes (very common in MS > Word). That gives you 12 versions per hour.So?> If you suddenly decide you want to back up a couple of hours, that > leaves you with looking at a whole bunch of files, trying to figure > out which one you want. E.g. I want a file from about 3 hours ago. > Do I want the one from 2:45, 2:50, 2:55, 3:00, 3:05, 3:10, or 3:15 > hours ago?Look at the file create time. Take a quick look at the contents if you are confused. At least you HAVE the capability to go back.> And, what if I''ve mis-remembered, and it really was closer to 4 > hours ago?Simple file system tools help me find it.> Yes, the data is eventually there. However, wouldn''t a 1-hour > snapshot capability have saved you an enormous amount of time,No. Managing the versions is not hard like you say. I lived on VMS for years and it was never a problem. 
It is your mindset and your preconceived notions that are the problem.

> by being able to simplify your search (and, yes, you won't have
> _exactly_ the version you want, but odds are you will have
> something close, and you can put all the time you would have spent
> searching the FV tree into restarting work from the snapshot-ed
> version).

I would much rather take an extra 2 minutes futzing around with the FV saved versions than try to recreate what I had done. And snapshots are not user friendly from a UI perspective -- funny strange directories and having to dig around in them.

> Remember, FV's main audience is going to be "naive" users, not us
> technical users,

No, it is US technical users as much as the naive user.

> who generally have the problem that FV solves under control (yes,
> FV would make it easier for us, but we're not the primary target).

We do? I have often edited system files and then wanted to go back to something I deleted earlier as I realized it was the wrong one.

> Version explosion (and the consequential problem of picking the
> right version to edit) is a huge problem for the naive audience.

This statement is naive itself and is unsupportable. Where are the usability tests that support this? VMS has a LONG HISTORY and is/was used by a lot of what you call "naive" users. FV never caused any problems that I encountered or indeed that DEC encountered, as it never once came up as an issue with VMS usability.

> Also, a big difference between Snapshots and FV tends to be who
> controls EOL-ing a version/Snapshot. Snapshots tend to be done by
> the Admin, and their aging strictly controlled and defined (e.g.
> "we keep hourly snapshots for 1 week"). File versioning is
> typically under the control of the End-User, as their utility is
> much more nebulously defined. Certainly, there is no ability to
> truncate based on number of versions (e.g. "we only allow 100
> versions to be kept"), since the frequency of versioning a file
> varies widely.
> Aging on a version is possibly a better answer, but
> this runs into a problem of user education, where we have to
> retrain our users to stop making frequent copies of important
> documents (like they do now, in absence of FV), but _do_ remember
> to dig through the FV archive periodically to save a desirable old
> copy. Also, if managing FV is to be a User task, how are they to
> do it over NFS/SAMBA? And, "log into the NFS server to do a
> cleanup" isn't an acceptable answer.
>
> Also, FV is only useful for apps which do a "close()" on a file (or
> at least, I'm assuming we wait for a file to signal that it is
> closed before taking a version - otherwise, we do what? take a
> version every X minutes while the file is still open? I shudder to
> think about the implementation of this, and its implications...).
> How many apps keep a file open for a long period of time? FV isn't
> useful to them, only an "unlimited undo" functionality INSIDE the app.

Yes, any time you do a close() or equivalent. The idea is not to implement a universal undo stack. You can always find a scenario where FV doesn't help. So what. There are lots of scenarios where it does help. More positive scenarios than you can dream up negatives for.

> Lastly, consider the additional storage requirement of FV, and
> exactly how much utility you gain for sacrificing disk space.

We have GB and TB of cheap space. A few extra versions lying around until people hit their quotas is the users' issue, not the sysadmin's.

> Look at this scenario: I'm editing a file, making 1MB of change
> per 5 minutes (a likely scenario when actively editing any
> Office-style document), of which only 50% do I actually make
> permanent (the rest being temp edits for ideas I decide to change
> or throw out). If I'm auto-saving every 5 minutes, that means I use
> 12MB of version space per hour. If I took an hourly snapshot, then
> I need only 6MB of storage.

So.
Your snapshot is much less useful, and 12MB is nothing in today's GBs of cheap space. Probably compressed too, so even less usage than you envision.

> The situation gets worse, for the primary usefulness of FV is for
> files which are frequently edited - meaning that they have rapid
> content change, and not in append-mode. Such a usage pattern means
> that FV will take up a much greater amount of space than periodic
> snapshots, as the longer interval in snapshots will allow the
> changes to "settle".

Not an issue. Cheap disk space.

> To me, FV is/was very useful in TOPS-20 and VMS, where you were
> looking at a system DESIGNED with the idea in mind, already have a
> user base trained to use and expect it, and virtually all usage was
> local (i.e. no network filesharing). None of this is true in the
> UNIX/POSIX world.

And that does not affect its usefulness.

Chad

> -Erik

---
Chad Leigh -- Shire.Net LLC
Your Web App and Email hosting provider
chad at shire.net
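The space argument in this exchange comes down to simple arithmetic. A minimal sketch of it, using only the figures from Erik's scenario (1MB of change per auto-save, 12 auto-saves per hour, half of the changes surviving the hour); the function name and parameters are illustrative and are not part of any ZFS tool:

```python
def space_per_hour_mb(change_mb_per_save, saves_per_hour, surviving_fraction):
    """Return (file-versioning MB/hour, hourly-snapshot MB/hour)."""
    # With file versioning every auto-save becomes a version,
    # so every change is stored, including the ones later thrown away.
    fv_mb = change_mb_per_save * saves_per_hour
    # An hourly snapshot only captures the changes that survived the hour.
    snapshot_mb = fv_mb * surviving_fraction
    return fv_mb, snapshot_mb


fv_mb, snapshot_mb = space_per_hour_mb(1.0, 12, 0.5)
print(f"versions: {fv_mb:.0f} MB/hour, hourly snapshot: {snapshot_mb:.0f} MB/hour")
```

Both posters accept the 12MB versus 6MB figures; the disagreement is over whether a factor of two on that scale matters.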
On 10/6/06, Erik Trimble <Erik.Trimble at sun.com> wrote:

> First of all, let's agree that this discussion of File Versioning makes
> no more reference to its usage as Version Control. That is, we aren't
> going to talk about it being useful for source code, other than in the
> context where a source code file is a document, like any other text
> document. File Versioning and Version Control are separate things, with
> different purposes and feature sets.

Hmm; the most important uses of file versioning come, in my opinion, when working on source code. But for handling very different situations than source control does.

> OK. So, now we're on to FV. As Nico pointed out, FV is going to need a
> new API. Using the VMS convention of simply creating file names with a
> version string afterwards is unacceptable, as it creates enormous
> directory pollution, not to mention user confusion. So, FV has to be
> invisible to non-aware programs.

Strongly disagree, twice. Having FV invisible to programs not updated to specially support it is IMHO unacceptable, and would render the feature useless. I remember it being a bit inconvenient on VMS. It wasn't on TOPS-20. I'll have to look into what the TOPS-20 conventions were again (I used TOPS-20 from 1977 to 1985, but have hardly touched it since), but I found them very friendly and easy to work with, not confusing, etc. They weren't *that* different from the VMS approach, but this is probably one of those situations where tiny tweaks to user interface make a huge difference to user experience.

> Also, FV is only useful for apps which do a "close()" on a file (or at
> least, I'm assuming we wait for a file to signal that it is closed
> before taking a version - otherwise, we do what? take a version every X
> minutes while the file is still open? I shudder to think about the
> implementation of this, and its implications...). How many apps keep a
> file open for a long period of time?
> FV isn't useful to them, only an
> "unlimited undo" functionality INSIDE the app.

It's the rewrite scenario; when we open or rename a file on top of an existing file, the new file gets an incremented version number, and the old file stays around.

> Lastly, consider the additional storage requirement of FV, and exactly
> how much utility you gain for sacrificing disk space.

It was something we could afford, and did afford, on TOPS-20 systems where having three RP06 disk pack systems (at 200MB each) was considered rather a lot of storage. Today it's a complete non-issue. Disk space is free.

> To me, FV is/was very useful in TOPS-20 and VMS, where you were looking
> at a system DESIGNED with the idea in mind, already have a user base
> trained to use and expect it, and virtually all usage was local (i.e. no
> network filesharing). None of this is true in the UNIX/POSIX world.

When TOPS-20 was introduced, essentially nobody was used to file versioning. When VMS was introduced, very few people were used to file versioning (and the TOPS-20 community mostly moved to Unix rather than VMS). TOPS-20 wasn't the first system I used (it was the fifth, I think), or even the first timesharing system (the third, I believe). File versioning was one of those "instant love" features; it was instantly obvious how it worked, how to use it, and how beneficial it was.

I see network file access as a non-issue; the version gets treated as part of the file name, just as it did on all the previous systems that supported file versioning.

I'm *still* not really sure it's actually worth the trouble of adding, if 5-second snapshots are really feasible. They're less convenient to use by quite a bit, but the important use cases arise relatively rarely, and the value is high when they arise, so that's not *too* big an issue.
And adding it means more code complexity and more user confusion (I don't think versioning is terribly complex to understand, but certainly snapshots plus versioning is more complex than snapshots alone).

But if people are going to decide against file versioning, I'd prefer it to be based on a more accurate understanding of how it plays to users :-).

--
David Dyer-Bennet, <mailto:dd-b at dd-b.net>, <http://www.dd-b.net/dd-b/>
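The "rewrite scenario" described in this message can be modelled in a few lines. This is a toy in-memory sketch, not how TOPS-20, VMS, or ZFS store anything; the class and method names are made up for illustration:

```python
class VersionedDirectory:
    """Toy model: each name maps to a list of contents, oldest first."""

    def __init__(self):
        self._files = {}

    def create(self, name, data):
        """Create a file, or write a new one on top of an existing name.

        Returns the version number given to the new file. Older
        versions are kept instead of being overwritten.
        """
        versions = self._files.setdefault(name, [])
        versions.append(data)
        return len(versions)  # versions are numbered from 1

    def read(self, name, version=None):
        """Read the newest version, or a specific one if requested."""
        versions = self._files[name]
        if version is None:
            return versions[-1]
        if not 1 <= version <= len(versions):
            raise KeyError(f"{name};{version}")
        return versions[version - 1]
```

Writing "foo.bar" twice through `create()` leaves both contents readable: `read("foo.bar")` returns the second, and `read("foo.bar", 1)` returns the first.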
On Fri, Oct 06, 2006 at 03:30:20PM -0600, Chad Leigh -- Shire.Net LLC wrote:

> On Oct 6, 2006, at 3:08 PM, Erik Trimble wrote:
> >OK. So, now we're on to FV. As Nico pointed out, FV is going to
> >need a new API. Using the VMS convention of simply creating file
> >names with a version string afterwards is unacceptable, as it
> >creates enormous directory pollution,
>
> Assumption, not supported. "Eye of the beholder."

No, you really need an API, otherwise you have to guess when to snapshot versions of files.

> >not to mention user confusion.
>
> Assumption, not supported.

Maybe Erik would find it confusing. I know I would find it _annoying_.

> >So, FV has to be invisible to non-aware programs.
>
> yes

Interesting that you agree with this when you disagree with Erik's other points! To me this statement implies FV APIs.

> >Now we have a problem: how do we access FV for non-local (e.g.
> >SAMBA/NFS) clients? Since the VAST majority of usefulness of FV is
> >in the network file server arena,
>
> Assumption, and definitely not supported. It is very useful outside
> of the file sharing arena.

I agree with you, and I agree with Erik. We, Sun engineers that is, need to look at the big picture, and network access is part of the big picture.

> >unless we can use FV over the network, it is useless.
>
> Wrong

Yes, but we have to provide for it.

> >You can't modify the SMB or NFS protocol (easily or quickly) to add
> >FV functionality (look how hard it was to add ACLs to these
> >protocols).
> >
> >About the only way I can think around this problem is to store
> >versions in a special subdir of each directory (e.g. .zfs_version),
> >which would then be browsable over the network, using tools not
> >normally FV-aware. But this puts us back into the problem of a
> >directory which potentially has hundreds or thousands of files.
>
> This directory way of doing it is not a good way.
> It fails the ease
> of use to the end user test.

No, it doesn't: it doesn't preclude having FV-aware UIs that make it easier to access versions. All Erik's .zfs_version proposal is about is remote access, not a user interface.

> The VMS way is far superior. The problem is that you have to make
> sure that apps that are not FV aware have no problems, which means
> you cannot just append something to the actual file name. It has to
> be some sort of meta data.

I.e., APIs.

The big question though is: how to snapshot file versions when they are touched/created by applications that are not aware of FV?

Certainly not with every write(2). At fsync(2), close(2), open(2) for write/append? What if an application deals in multiple files? Etc...

Automatically capturing file versions isn't possible in the general case with applications that aren't aware of FV.

> >While this may indeed mean that you have all of your changes
> >around, figuring out which version has them can be massively
> >time-consuming.
>
> Your assumption. (And much less hard than using snapshots).

I agree that with ZFS snapshots it could be hard to find the file versions you want. I don't agree that the same isn't true with FV *except* where you have FV-aware applications.

> Yes, any time you do a close() or equivalent. The idea is not to
> implement a universal undo stack.

Or open(2) for write, fsync(2)s, unlinks. Maybe. It could work for some apps and not for others.

(I really wouldn't want building code to result in lots of file versions of intermediate and end-result files!)

Nico
--
On Oct 6, 2006, at 3:53 PM, Nicolas Williams wrote:

> On Fri, Oct 06, 2006 at 03:30:20PM -0600, Chad Leigh -- Shire.Net
> LLC wrote:
>> On Oct 6, 2006, at 3:08 PM, Erik Trimble wrote:
>>> OK. So, now we're on to FV. As Nico pointed out, FV is going to
>>> need a new API. Using the VMS convention of simply creating file
>>> names with a version string afterwards is unacceptable, as it
>>> creates enormous directory pollution,
>>
>> Assumption, not supported. "Eye of the beholder."
>
> No, you really need an API, otherwise you have to guess when to
> snapshot versions of files.

What does "snapshot versions of files" mean?

My line "Assumption, not supported. "Eye of the beholder"" was in reference to "enormous directory pollution".

>>> not to mention user confusion.
>>
>> Assumption, not supported.
>
> Maybe Erik would find it confusing. I know I would find it
> _annoying_.

Then leave it set to 1 version.

>>> So, FV has to be invisible to non-aware programs.
>>
>> yes
>
> Interesting that you agree with this when you disagree with Erik's
> other points! To me this statement implies FV APIs.

It has to do with the implementation details. I don't know what sort of APIs you are saying are needed. Maybe they are needed and maybe they would be handy. I am not disputing that.

The above should be simple to do however -- a program does an open of a file name "foo.bar". ZFS / the file system routine would use the most recent version by default if no version info is given.

>>> Now we have a problem: how do we access FV for non-local (e.g.
>>> SAMBA/NFS) clients? Since the VAST majority of usefulness of FV is
>>> in the network file server arena,
>>
>> Assumption, and definitely not supported. It is very useful outside
>> of the file sharing arena.
>
> I agree with you, and I agree with Erik. We, Sun engineers that is,
> need to look at the big picture, and network access is part of the big
> picture.

Sure

>>> unless we can use FV over the network, it is useless.
>>
>> Wrong
>
> Yes, but we have to provide for it.

I never said that file sharing is not useful (in this or any context). I just said that FV is not useless outside of the "over the network" use. And if it did not support filesharing scenarios, at least in the beginning, it would still have great use. In the same way that apache not supporting lockfiles on nfs file systems does not make apache or nfs "useless", FV that is not 100% in every nook and cranny is not useless. I would find it of tremendous use just in managing system and configuration files.

>>> You can't modify the SMB or NFS protocol (easily or quickly) to add
>>> FV functionality (look how hard it was to add ACLs to these
>>> protocols).
>>>
>>> About the only way I can think around this problem is to store
>>> versions in a special subdir of each directory (e.g. .zfs_version),
>>> which would then be browsable over the network, using tools not
>>> normally FV-aware. But this puts us back into the problem of a
>>> directory which potentially has hundreds or thousands of files.
>>
>> This directory way of doing it is not a good way. It fails the ease
>> of use to the end user test.
>
> No, it doesn't: it doesn't preclude having FV-aware UIs that make it
> easier to access versions. All Erik's .zfs_version proposal is
> about is remote access, not a user interface.

one UI is the command line shell

>> The VMS way is far superior. The problem is that you have to make
>> sure that apps that are not FV aware have no problems, which means
>> you cannot just append something to the actual file name. It has to
>> be some sort of meta data.
>
> I.e., APIs.

Well, file system level meta data that the file system uses may or may not need APIs to expose it -- depends on how the final implementation works.
However, I never came out against APIs.

> The big question though is: how to snapshot file versions when they
> are touched/created by applications that are not aware of FV?

Don't use the word snapshot as it may draw in unintended comparisons to snapshot features.

> Certainly not with every write(2).

no

> At fsync(2), close(2), open(2) for
> write/append?

probably

> What if an application deals in multiple files?

so?

> Etc...
>
> Automatically capturing file versions isn't possible in the general
> case with applications that aren't aware of FV.

In most cases it is possible. At worst you make a copy on open and work on the copy, making it the most recent version.

>>> While this may indeed mean that you have all of your changes
>>> around, figuring out which version has them can be massively
>>> time-consuming.
>>
>> Your assumption. (And much less hard than using snapshots).
>
> I agree that with ZFS snapshots it could be hard to find the file
> versions you want. I don't agree that the same isn't true with FV
> *except* where you have FV-aware applications.

How so? The shell / desktop is enough of a UI to deal with it.

>> Yes, any time you do a close() or equivalent. The idea is not to
>> implement a universal undo stack.
>
> Or open(2) for write, fsync(2)s, unlinks. Maybe. It could work for
> some apps and not for others.

See my comments above -- worst case is to copy the file on open and then do everything on the copy as normal.

> (I really wouldn't want building code to result in lots of file
> versions of intermediate and end-result files!)

No harm as they get deleted by the build process anyway. And if you "enhance" the FV, you can set directories like scratch directories to not allow more than 1 FV per file.

Chad

> Nico
> --

---
Chad Leigh -- Shire.Net LLC
Your Web App and Email hosting provider
chad at shire.net
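Chad's two suggestions (a configurable maximum number of versions with a purge, and a limit of 1 on scratch directories) could look roughly like the sketch below. Everything here is hypothetical: the per-directory `limits` table, the default, and both function names are assumptions for illustration, not VMS or ZFS behaviour.

```python
import posixpath


def purge(versions, keep=1):
    """Return (kept, deleted) version numbers, keeping the `keep` newest."""
    if keep < 1:
        raise ValueError("keep must be at least 1")
    ordered = sorted(versions)
    return ordered[-keep:], ordered[:-keep]


def version_limit(path, limits, default):
    """Find the limit set on the nearest ancestor directory of `path`."""
    directory = posixpath.dirname(path)
    while True:
        if directory in limits:
            return limits[directory]
        parent = posixpath.dirname(directory)
        if parent == directory:  # reached the root: nothing was configured
            return default
        directory = parent
```

With `limits = {"/scratch": 1}`, a build product such as `/scratch/build/foo.o` resolves to a limit of 1, so `purge()` would keep only its newest version, while files elsewhere fall back to the default.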
On 10/6/06, Nicolas Williams <Nicolas.Williams at sun.com> wrote:

> On Fri, Oct 06, 2006 at 03:30:20PM -0600, Chad Leigh -- Shire.Net LLC wrote:
> > On Oct 6, 2006, at 3:08 PM, Erik Trimble wrote:
> > >OK. So, now we're on to FV. As Nico pointed out, FV is going to
> > >need a new API. Using the VMS convention of simply creating file
> > >names with a version string afterwards is unacceptable, as it
> > >creates enormous directory pollution,
> >
> > Assumption, not supported. "Eye of the beholder."
>
> No, you really need an API, otherwise you have to guess when to snapshot
> versions of files.

First of all, "snapshot versions of files" is a very confusing phrase, especially in this discussion. But, if you mean what I think you mean, then the existing file API gives you all the information you need. Whenever you create a new file, you create a new version. The only thing that changes is, if an *old* version already exists, it doesn't get deleted the way it used to.

> > >Now we have a problem: how do we access FV for non-local (e.g.
> > >SAMBA/NFS) clients? Since the VAST majority of usefulness of FV is
> > >in the network file server arena,
> >
> > Assumption, and definitely not supported. It is very useful outside
> > of the file sharing arena.
>
> I agree with you, and I agree with Erik. We, Sun engineers that is,
> need to look at the big picture, and network access is part of the big
> picture.

Yes, I have to agree here also. So much of people's file access is over a network these days that a local-only facility isn't very interesting / useful.

> > >You can't modify the SMB or NFS protocol (easily or quickly) to add
> > >FV functionality (look how hard it was to add ACLs to these
> > >protocols).
> > >
> > >About the only way I can think around this problem is to store
> > >versions in a special subdir of each directory (e.g. .zfs_version),
> > >which would then be browsable over the network, using tools not
> > >normally FV-aware.
> > >But this puts us back into the problem of a
> > >directory which potentially has hundreds or thousands of files.
> >
> > This directory way of doing it is not a good way. It fails the ease
> > of use to the end user test.
>
> No, it doesn't: it doesn't preclude having FV-aware UIs that make it
> easier to access versions. All Erik's .zfs_version proposal is about is
> remote access, not a user interface.

Requiring special software to access this kind of feature is death. People don't want to learn new tools; they want to use existing tools. Depending on the user, that's ls, or awk, or grep, or find, or Emacs dired, or this or that or the other thing. One of the reasons ZFS snapshots (and other snapshots, in my limited experience) work easily is that they appear as ordinary files within the directory structure, and do *not* require special tools to access.

> > The VMS way is far superior. The problem is that you have to make
> > sure that apps that are not FV aware have no problems, which means
> > you cannot just append something to the actual file name. It has to
> > be some sort of meta data.
>
> I.e., APIs.

I don't think I understand the issues being raised here. My off-the-cuff impression is that they don't exist at all, or are at most moderate molehills, not mountains.

When writing an application for TOPS-20 or VMS, you didn't have to do anything to specifically deal with file versioning. It just worked. If the user wanted the most recent version of the file, they typed the name without the version, or else with the most current version. If they *did* want an older version, they had to type very slightly more, by appending the version number. And (on TOPS-20) of course we had filename completion and inline help to make it easy to refresh your memory on what versions existed in the middle of doing this.

So, one small feature built into the filesystem OPEN code: if a version is not specified for a file, use the most recent version.
NO special code in any application is needed.

There are public-access TOPS-20 systems on the net today (I've got an account on one, though that data is at home and I'm in Palo Alto this week). And I've still got the small TOPS-20 system manual (I didn't keep the big twenty-something volume set though) where I can look up the details when I'm home. This technology isn't completely lost yet :-).

--
David Dyer-Bennet, <mailto:dd-b at dd-b.net>, <http://www.dd-b.net/dd-b/>
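The "one small feature" in the OPEN path can be sketched as a name-resolution step. This assumes the VMS-style `name;version` spelling mentioned earlier in the thread; the function names and the flat list standing in for a directory are illustrative only:

```python
def split_version(name):
    """Split 'foo.bar;3' into ('foo.bar', 3); a bare name gives (name, None)."""
    base, sep, suffix = name.rpartition(";")
    if sep and suffix.isdigit():
        return base, int(suffix)
    return name, None


def resolve(requested, stored_names):
    """Map a requested name onto one stored 'name;version' entry.

    A name without a version resolves to the highest stored version;
    a name with a version must match a stored entry exactly.
    """
    base, wanted = split_version(requested)
    available = sorted(
        version
        for stored_base, version in map(split_version, stored_names)
        if stored_base == base and version is not None
    )
    if not available:
        raise FileNotFoundError(requested)
    if wanted is None:
        wanted = available[-1]
    elif wanted not in available:
        raise FileNotFoundError(requested)
    return f"{base};{wanted}"
```

An application that only ever asks for `foo.bar` never sees the version suffix; a user who wants an older copy asks for `foo.bar;1` through the same call.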
Chad, I think our problem is that we look at FV from different angles. I look at it from the point of view of people who have NEVER used FV, and you look at it from the view of people who have ALWAYS used FV.

For those of us who have never had FV available, technical users have used VC tools for important files forever (scripts, config files, etc), and will continue to use VC for those purposes, even if FV is implemented, as VC has decided advantages for these uses (history, management, etc.). For the technical user, FV is primarily useful when editing documents which were never put under VC in the pre-FV era. This is virtually identical to the usage for "naive" users. That is, FV is highly useful for keeping multiple copies of documents under active editing.

In order for an FV implementation to be useful for this stated purpose, it must fulfill the following requirements:

(1) Clean interface for users. That is, one must NOT be presented with a complete list of all versions unless explicitly asked for it, and it should be simple to select a version based on some reasonable criteria (date of creation/modification, version number, etc.)

(2) Simple way to decide if a file should be versioned or not. Either automatically version all files (or none at all), or provide a mechanism to turn FV on/off on a per-file or per-directory basis.

(3) Network-FS awareness. Without this, FV is severely limited. Given my preconditions above (that is, the current usage pattern of us in the non-FV world), limiting FV to those on the local system restricts its usefulness to the point where it isn't worth the effort.
So, we have two scenarios for the implementation here:

(a) FV requires no special API, and all programs using the Filesystem automatically have access to versions

(b) FV uses a new API, so versions are only available to applications using the new API

For case (a), you are going to have to store the versions as files _somewhere_, in which case you run into the "directory pollution" problem I quote (if you store the versions next to the "current" version), or the "where is my version" problem that you quote w/r/t snapshots (if you store them elsewhere).

In case (b), you will have to re-write _all_ FS-access apps to make them FV-aware, in the same manner work had to be done to make apps ACL-aware. And, to get requirement (3) above, you have to modify the network FS protocols to support the API calls.

Also, regardless of which implementation mechanism you use (a) or (b), you will need some sort of tool to indicate which files are to be versioned (to satisfy requirement (2) above), how many versions are to be kept, and other FV administration utilities. These tools will all need to be netFS-aware/usable.

Disk space consumption is NOT irrelevant. Else, why is there so much concern around the ZFS compression project? Disk is NOT cheap - on the desktop, yes, but I'm sorry, networked disk systems are not really cheap, and tape archivers less so. Allocating several GB of disk space per end-user is not uncommon, so 1000 users requires multi-terabyte systems just for "normal" storage (i.e. no backups/versions/snapshots/archives). Take a look at what a typical system costs: $10+/GB for workgroup-level storage (Sun 3510FC class, 1-20TB), $30+/GB for nice mid-level SAN storage arrays (Sun 6920-class, >10TB). If I have to increase my storage requirements 25-50% for FV, most of which is unused versions, this is a decidedly non-trivial amount. This applies as well to the 5-second snapshot proposal.

For source code, FV isn't really needed - the problem has already been solved.
If your particular VC/editor/IDE doesn't handle the problem correctly, then switch. There are many VC and IDE combinations on all platforms which provide a solution to the same problem FV solves. Mercurial, RationalRose, BitKeeper, Git, and others on the VC side; NetBeans, CodeWarrior, Visual Studio, and even Emacs can be configured to handle the problem on the IDE side.

-Erik
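Erik's cost argument can be put into numbers. A rough sketch using his prices ($10/GB and $30/GB) and his 25-50% overhead range; the 5 GB per user is an assumed stand-in for his "several GB", and the function name is illustrative:

```python
def fv_storage_cost(users, gb_per_user, dollars_per_gb, overhead):
    """Extra spend, in dollars, for the additional space FV versions take."""
    base_gb = users * gb_per_user   # "normal" storage, no versions
    extra_gb = base_gb * overhead   # additional space taken by versions
    return extra_gb * dollars_per_gb


# 5 GB per user is an assumption; the thread only says "several GB".
for price in (10, 30):
    low = fv_storage_cost(1000, 5, price, 0.25)
    high = fv_storage_cost(1000, 5, price, 0.50)
    print(f"${price}/GB: ${low:,.0f} to ${high:,.0f} extra")
```

Under that assumption the overhead comes to $12,500 to $25,000 at workgroup prices and $37,500 to $75,000 at SAN prices for 1000 users, which is the scale of expense Erik is pointing at.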
On Fri, Oct 06, 2006 at 04:06:37PM -0600, Chad Leigh -- Shire.Net LLC wrote:

> On Oct 6, 2006, at 3:53 PM, Nicolas Williams wrote:
> >On Fri, Oct 06, 2006 at 03:30:20PM -0600, Chad Leigh -- Shire.Net
> >LLC wrote:
> >>On Oct 6, 2006, at 3:08 PM, Erik Trimble wrote:
> >>>OK. So, now we're on to FV. As Nico pointed out, FV is going to
> >>>need a new API. Using the VMS convention of simply creating file
> >>>names with a version string afterwards is unacceptable, as it
> >>>creates enormous directory pollution,
> >>
> >>Assumption, not supported. "Eye of the beholder."
> >
> >No, you really need an API, otherwise you have to guess when to
> >snapshot versions of files.
>
> What does "snapshot versions of files" mean?

The act of creating file versions a la VMS.

> My line "Assumption, not supported. "Eye of the beholder"" was in
> reference to "enormous directory pollution"

Ah. 'Twasn't clear.

> >>>not to mention user confusion.
> >>
> >>Assumption, not supported.
> >
> >Maybe Erik would find it confusing. I know I would find it
> >_annoying_.
>
> Then leave it set to 1 version

Per-directory? Per-filesystem?

> >>>So, FV has to be invisible to non-aware programs.
> >>
> >>yes
> >
> >Interesting that you agree with this when you disagree with Erik's
> >other points! To me this statement implies FV APIs.
>
> It has to do with the implementation details. I don't know what sort
> of APIs you are saying are needed. Maybe they are needed and maybe
> they would be handy. I am not disputing that.
>
> The above should be simple to do however -- a program does an open of
> a file name "foo.bar". ZFS / the file system routine would use the
> most recent version by default if no version info is given.

How can version information be given without changing the APIs or putting the version number/string into the file name?

Putting the version number/string into the file name is hard for me to accept. It's what would lead to polluting my directories.
Now, if the default is 1 version (i.e., keep the current version only), then I might live with it because I'd never change that setting. But if we don't encode the version number/string in the file name and instead enhance APIs and UIs so that by default I can keep N>1 versions without them polluting my directories, THEN I would set N>1.

> one UI is the command line shell

Indeed! And command-line tools, like ls(1), find(1), etc...

What I'm saying is that I'd like to be able to keep multiple versions of my files without "echo *" or "ls" showing them to me by default. I'd like an option for ls(1), find(1) and friends to show file versions, and a way to copy (or, rather, un-hide) selected versions of files so that I could now refer to them as usual -- when I do this I don't care to see version numbers in the file name, I just want to give them names. And, maybe, I'd like a way to write globs that match file versions (think of extended globbing, as in KSH).

GUIs would, presumably, have a way to show/hide file versions, search for them, select them, etc...

> >Certainly not with every write(2).
>
> no

Good.

> >At fsync(2), close(2), open(2) for
> >write/append?
>
> probably

Which?

> >What if an application deals in multiple files?
>
> so?

So, file versions aren't useful unless the application explicitly tells the OS when to make them.

Similarly with applications that keep files open but keep writing transactions in ways that the OS can't isolate without input from the app. E.g., databases. fsync(2) helps here, but lots and lots of fsync(2)s would result in no useful versioning.

Nico
--
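The listing behaviour Nico asks for here (versions kept, but hidden from a plain "ls" unless requested) amounts to a display filter. A minimal sketch, assuming versions are stored under `name;version` entries as on VMS; `list_directory` and its `show_versions` flag are hypothetical and are not options of any real ls(1):

```python
def list_directory(stored_names, show_versions=False):
    """List a directory, collapsing 'name;version' entries by default."""
    if show_versions:
        return sorted(stored_names)
    # Strip the ';version' suffix; entries without one are kept as-is.
    return sorted({name.rpartition(";")[0] or name for name in stored_names})
```

The open question in the thread is not whether such a filter can be written, but where it lives: in each tool, as here, or in the filesystem interface itself.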
On 10/6/06, Nicolas Williams <Nicolas.Williams at sun.com> wrote:

> On Fri, Oct 06, 2006 at 04:06:37PM -0600, Chad Leigh -- Shire.Net LLC wrote:
> > On Oct 6, 2006, at 3:53 PM, Nicolas Williams wrote:
> > >Maybe Erik would find it confusing. I know I would find it
> > >_annoying_.
> >
> > Then leave it set to 1 version
>
> Per-directory? Per-filesystem?

Whatever. What's the actual issue here? I don't recall that on TOPS-20 it was possible to not version. What you could do is set your logout.cmd file to purge your space down to one copy when you logged out. This worked fine for the users I knew, even on a system that didn't have as much as a gigabyte of disk storage total to support a few dozen software engineers.

> > The above should be simple to do however -- a program does an open of
> > a file name "foo.bar". ZFS / the file system routine would use the
> > most recent version by default if no version info is given.
>
> How can version information be given without changing the APIs or
> putting the version number/string into the file name?

The version number is part of the file name in all the examples I know about. I'd find it useless without that; it has to be a real part of the filesystem, usable by everybody, not a special addon accessible only with one or two dedicated applications.

> Putting the version number/string into the file name is hard for me to
> accept. It's what would lead to polluting my directories.

Set your ls default to not show versions. Isn't the problem then solved? Maybe add that option to the GUI filesystem explorer as well.

In practice, it never was a problem that I noticed, or that other people noticed. And remember that this was on slower systems with smaller screens and often rather slower screen update.

Do you not like the idea based on theory, or did you actually use TOPS-20 for a while and find the versioning troublesome?

> > one UI is the command line shell
>
> Indeed! And command-line tools, like ls(1), find(1), etc...
>
> What I'm saying is that I'd like to be able to keep multiple versions of
> my files without "echo *" or "ls" showing them to me by default.

And I find that completely unacceptable; useless. The whole point of putting versioning in the filesystem is that that makes it accessible to all programs.

> > >What if an application deals in multiple files?
> >
> > so?
>
> So, file versions aren't useful unless the application explicitly
> decides tells the OS when to make them.

File versions are created when a file is created. In the scenario where, today, an existing file would be overwritten (deleted), instead the old file is kept and the new file is given the version number +1 of the old file.

> Similarly with applications that keep files open but keep writing
> transactions in ways that the OS can't isolate without input from the
> app. E.g., databases. fsync(2) helps here, but lots and lots of
> fsync(2)s would result in no useful versioning.

None of those are candidates for file versioning, and a darned good thing, too.

--
David Dyer-Bennet, <mailto:dd-b at dd-b.net>, <http://www.dd-b.net/dd-b/>
Nicolas Williams wrote:
> On Fri, Oct 06, 2006 at 03:30:20PM -0600, Chad Leigh -- Shire.Net LLC wrote:
>> On Oct 6, 2006, at 3:08 PM, Erik Trimble wrote:
>>> OK. So, now we're on to FV. As Nico pointed out, FV is going to need a new API. Using the VMS convention of simply creating file names with a version string afterwards is unacceptable, as it creates enormous directory pollution,
>>
>> Assumption, not supported. "Eye of the beholder."
>
> No, you really need an API, otherwise you have to guess when to snapshot versions of files.

David Dyer-Bennet's post gives a hint of how this could be done without any API. Simply augment a few system calls like open(), unlink(), etc.: the calls that can potentially change files. Since you can't change a file unless it is open()'ed with one of the write flags such as O_WRONLY or O_RDWR, this could be an ideal place to create the version.

One could probably write a "poor man's" FV LD_PRELOAD library to do this without the filesystem's knowledge at all. It wouldn't be as efficient with space as could be done at the filesystem level, but as someone said, disk is cheap.
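The open()-time versioning described above can be sketched in user space. This is only an illustration, not a real LD_PRELOAD shim: it wraps Python's built-in open() instead of interposing on libc, and the names versioned_open and next_version, along with the "name;N" naming of old copies, are assumptions made for the example.

```python
import os
import re
import shutil


def next_version(path):
    """Return the next free version number for path (1 if no versions exist)."""
    directory, base = os.path.split(os.path.abspath(path))
    pattern = re.compile(re.escape(base) + r";(\d+)")
    numbers = []
    for name in os.listdir(directory):
        match = pattern.fullmatch(name)
        if match:
            numbers.append(int(match.group(1)))
    return max(numbers, default=0) + 1


def versioned_open(path, mode="r", **kwargs):
    """Like open(), but copy the existing file aside before a write-mode open."""
    opens_for_write = any(flag in mode for flag in "wa+")
    if opens_for_write and os.path.isfile(path):
        # Hypothetical naming: the old contents are kept as "<name>;<N>".
        shutil.copy2(path, f"{path};{next_version(path)}")
    return open(path, mode, **kwargs)
```

A read-only open creates nothing; each write-mode open of an existing file leaves one more "name;N" copy behind. Copying whole files is the space inefficiency mentioned above, and an editor that keeps a file open and saves repeatedly would still get only one version per open().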
Nicolas Williams wrote:
> The big question though is: how to snapshot file versions when they are touched/created by applications that are not aware of FV?
>
> Certainly not with every write(2). At fsync(2), close(2), open(2) for write/append? What if an application deals in multiple files? Etc...
>
> Automatically capturing file versions isn't possible in the general case with applications that aren't aware of FV.

Don't snapshots have the same problem? A snapshot could potentially be taken when a file is partially written or updated, no?

For example, I start to write a large file, zfs's buffers fill up and it flushes them to disk in the middle of the file I'm writing. If a snapshot came along at about the same time, the file would be incomplete/corrupt, no?
David Dyer-Bennet wrote:
> On 10/6/06, Nicolas Williams <Nicolas.Williams at sun.com> wrote:
>>>> Maybe Erik would find it confusing. I know I would find it _annoying_.
>>>
>>> Then leave it set to 1 version
>>
>> Per-directory? Per-filesystem?
>
> Whatever. What's the actual issue here?
>
> I don't recall that on TOPS-20 it was possible to not version. What you could do is set your logout.cmd file to purge your space down to one copy when you logged out.

But see, that assumes you have a logout-type functionality to use. Which indeed is possible for command-line usage, but then only in a very limited way. During a typical session, I access almost 20 NFS-mounted directories. And anyone using autofs/automount trees gets even more. You're saying that my logout script has to know about all of them to keep things clean? That's unrealistic.

And that still doesn't solve the problem of people who use SAMBA or NFS from machines which don't have an interactive shell logout system (e.g. Windows).

> This worked fine for the users I knew; even on a system that didn't have as much as a gigabyte of disk storage total to support a few dozen software engineers.

The problem is we are comparing apples to oranges in user bases here. TOPS-20 systems had a couple of dozen users (or, at most, a few hundred). VMS only slightly more. UNIX/POSIX systems have 10s of thousands.

Plus, the number of files being created under typical modern systems is at least two (and probably three or four) orders of magnitude greater. I've got 100,000 files under /usr in Solaris, and almost 1,000 under my home directory. And I don't have anything significant in my /home (no source code, no build/test trees, just misc business stuff). What is manageable with a few files quickly becomes unwieldy with more than a few dozen.
This is what Nico and I are talking about: if you turn on file versioning automatically (even for just a directory, and not a whole filesystem), the number of files being created explodes geometrically.>> > The above should be simple to do however -- a program does an open of >> > a file name "foo.bar". ZFS / the file system routine would use the >> > most recent version by default if no version info is given. >> >> How can version information be given without changing the APIs or >> putting the version number/string into the file name? > > The version number is part of the file name in all the examples I know > about. I''d find it useless without that; it has to be a real part of > the filesystem, usable by everybody, not a special addon accessible > only with one or two dedicated applications. > >> Putting the version number/string into the file name is hard for me to >> accept. It''s what would lead to polluting my directories. > > Set your ls default to not show versions. Isn''t the problem then > solved? Maybe add that option to the GUI filesystem explorer as well. >But this requires modifying all the relevant apps, which is the same amount of work as modifying them to use a new FV API. It''s not transparent to the end-user.> In practice, it never was a problem that I noticed, or that other > people noticed. And remember that this was on slower systems with > smaller screens and often rather slower screen update. > > Do you not like the idea based on theory, or did you actually use > TOPS-20 for a while and find the versioning troublesome? >Putting the file version number as part of the file name breaks things. Apps unaware of the special significance of this format will tend to write similar names, which can screw everything royally. Example: Say we use <file>;<version> In emacs, I edit FOO:2 it will write out a temp file "FOO:2~". So, how does the FS deal with this the next time they need to create a new version? 
The problem lies in that under VMS, the ';' was a special character, and unusable in normal naming. I suspect a similar situation exists under TOPS-20. No such luck in a POSIX filesystem - nearly all printable (and many unprintable) characters are valid for use in filenames. So you _CAN'T_ use them to delineate File Versioning, without risking blowing the entire scheme when some random app decides to either use your FV marker for its own needs, or something similar to the emacs case above.

>>> one UI is the command line shell
>>
>> Indeed! And command-line tools, like ls(1), find(1), etc...
>>
>> What I'm saying is that I'd like to be able to keep multiple versions of my files without "echo *" or "ls" showing them to me by default.
>
> And I find that completely unacceptable; useless. The whole point of putting versioning in the filesystem is that that makes it accessible to all programs.

But, because of the explosion in the number of files, you CAN'T automatically show all versions. Users will NEVER accept this. The only clean way to do this is to show file versions only upon request. Not by default.

>>>> What if an application deals in multiple files?
>>>
>>> so?
>>
>> So, file versions aren't useful unless the application explicitly decides tells the OS when to make them.
>
> File versions are created when a file is created. In the scenario where, today, an existing file would be overwritten (deleted), instead the old file is kept and the new file is given the version number +1 of the old file.
>
>> Similarly with applications that keep files open but keep writing transactions in ways that the OS can't isolate without input from the app. E.g., databases. fsync(2) helps here, but lots and lots of fsync(2)s would result in no useful versioning.
>
> None of those are candidates for file versioning, and a darned good thing, too.

Honestly, as far as file versioning goes, the time to make a new version is when calling open() with the appropriate arguments to allow for append or modification. You obviously don't want to create a new version if you are only opening a file for read-only access; changing version on fsync() is ludicrous; and versioning on close() doesn't differentiate between a file which has been modified and one which has not.

Given this, we're back to the problem FV is supposed to solve. It is entirely possible for an editor to keep a file open for a long time, periodically writing out your changes without issuing a new open(). Word with auto-save turned off is a prime example. Given this, you've only created a new version when you first load the document, and all your intermediary changes are lost, since it only saves the document on close().

Thus, in order to get benefits from FV, your editor must issue periodic close() and open() calls on the same file, as you edit, all without your intervention. Exactly how many editors do this? I have no idea.

So, the only way to enable FV is to require the user to periodically push the "Save" button. How is that any different from the current situation?

-Erik
On Oct 6, 2006, at 7:33 PM, Erik Trimble wrote:
> This is what Nico and I are talking about: if you turn on file versioning automatically (even for just a directory, and not a whole filesystem), the number of files being created explodes geometrically.

But it doesn't. Unless you are editing geometrically more files.

Chad

---
Chad Leigh -- Shire.Net LLC
Your Web App and Email hosting provider
chad at shire.net
Joseph Mocker wrote:
> Nicolas Williams wrote:
>> The big question though is: how to snapshot file versions when they are touched/created by applications that are not aware of FV?
>>
>> Certainly not with every write(2). At fsync(2), close(2), open(2) for write/append? What if an application deals in multiple files? Etc...
>>
>> Automatically capturing file versions isn't possible in the general case with applications that aren't aware of FV.
>
> Don't snapshots have the same problem. A snapshot could potentially be taken when a file is partially written or updated, no?
>
> For example, I start to write a large file, zfs's buffers fill up and it flushes them to disk during the middle of the file I'm writing. If a snapshot came along at about the same time, the file would be incomplete/corrupt, no?

The developers can answer this definitively, but I believe the answer to your questions is NO. That is, if there is anything in the buffer waiting to be written when a snapshot request comes along, the buffer is written out so that the file is consistent with the last write(). So, snapshotting should NEVER cause file corruption in this manner.

That said, if you are doing the following:

1. App issues write() for data A
2. Snapshot request
3. App issues write() for data B

then yes, the file in the snapshot will only contain data A, and not data B, which might lead to an inconsistency in the app's behavior, if both A and B were important to be written together. But if that were the case, then the app should have written A and B atomically.

So, if you are writing to a file, it works better to write everything at once in a stream, rather than a character (or byte) at a time. :-)

-Erik
Erik Trimble wrote:
> The developers can answer this definitively, but I believe the answer to your questions is NO. That is, if there is anything in the buffer waiting to be written when a snapshot request comes along, the buffer is written out so that the file is consistent with the last write(). So, snapshotting should NEVER cause a file corruption in this matter.
>
> That said, if you are doing the following:
>
> 1. App issues write() for data A
> 2. snapshot request
> 3. App issues write for data B
>
> Then yes, the snapshot file will only contain data A, and not data B, which might lead to an inconsistency in the app's behavior, if both A and B were important to be written together.

Yes, this is what I was talking about.

> But if that were the case, then the app should have written A and B atomically.

And how realistic is that? You are suggesting, for example, that every application that writes an XML file should buffer the _entire_ XML stream in memory and issue a single atomic write of that entire document. That's not realistic, and in some cases not even possible.

Otherwise, uh, we better fix sed then :-)

 --joe
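One common way for an application to get "A and B together, or neither" without holding the whole document in memory is to stream into a temporary file and rename it over the target once everything is written. The sketch below is an illustration of that general pattern, not something proposed in the thread; it assumes a POSIX-style atomic rename, and atomic_write is a name made up for the example.

```python
import os
import tempfile


def atomic_write(path, chunks):
    """Stream chunks (bytes) into a temp file, then rename it over path."""
    directory = os.path.dirname(os.path.abspath(path))
    # The temp file lives in the same directory so the rename stays on one filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "wb") as tmp:
            for chunk in chunks:      # written piece by piece, never fully buffered
                tmp.write(chunk)
            tmp.flush()
            os.fsync(tmp.fileno())    # push the data to disk before the rename
        os.replace(tmp_path, path)    # path now names the complete new file
    except BaseException:
        # On any failure, discard the partial temp file and leave path untouched.
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
```

Under that assumption, a snapshot taken at any point should see either the complete old file or the complete new one under the target name, though it may also capture a partially written temp file next to it.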
On Oct 6, 2006, at 7:33 PM, Erik Trimble wrote:> David Dyer-Bennet wrote: >> On 10/6/06, Nicolas Williams <Nicolas.Williams at sun.com> wrote: >> >>> > >Maybe Erik would find it confusing. I know I would find it >>> > >_annoying_. >>> > >>> > Then leave it set to 1 version >>> >>> Per-directory? Per-filesystem? >> >> Whatever. What''s the actual issue here? >> >> I don''t recall that on TOPS-20 it was possible to not version. What >> you could do is set your logout.cmd file to purge your space down to >> one copy when you logged out. > But see, that assumes you have a logout-type functionality to use. > Which indeed is possible for command-line usage, but then only in a > very limited way. During a typical session, I access almost 20 > NFS-mounted directories. And anyone using autofs/automount trees > gets even more. You''re saying that my logout script has to know > about all of them to keep things clean? That''s unrealistic.It is up to you to come up with a scheme to keep things clean, the same way you do now anyway (downloads, etc),> And that still doesn''t solve the problem of people who use SAMBA > or NFS from machines which don''t have an interactive shell logout > system (i.e. Windows).It is still mounted on their desktops and they can still delete files with FV the same way they do now No real issue.> >> This worked fine for the users I knew; even on a system that didn''t >> have as much as a gigabyte of disk storage total to support a few >> dozen software engineers. >> > The problem is we are comparing apples to oranges in user bases > here. TOPS-20 systems had a couple of dozen users (or, at most, a > few hundred). VMS only slightly more. UNIX/POSIX systems have 10s > of thousands.Rarely. Most of them have in the same range as VMS now or then.> Plus, the number of files being created under typical modern > systems is at least two (and probably three or four) orders of > magnitude greater. I''ve got 100,000 files under /usr in Solaris,so? 
You are not editing these are you?> and almost 1,000 under my home directory.again, FV only matters when you edit them> And I don''t have anything significant in my /home (no source code, > no build/test trees, just misc business stuff). What is managable > with a few files quickly becomes unwieldy with more than a few dozen.I think you admitted you had not used FV before. Is that the case? Then how can you speak about what becomes unwieldy? FV is not any more unwieldy with 1000 files in a dir than with 10. Most people are not editing the 1000 files sitting in their directory.> > This is what Nico and I are talking about: if you turn on file > versioning automatically (even for just a directory, and not a > whole filesystem), the number of files being created explodes > geometrically.Again, it does not. Files are only versioned when they are edited.> >>> > The above should be simple to do however -- a program does an >>> open of >>> > a file name "foo.bar". ZFS / the file system routine would use >>> the >>> > most recent version by default if no version info is given. >>> >>> How can version information be given without changing the APIs or >>> putting the version number/string into the file name? >> >> The version number is part of the file name in all the examples I >> know >> about. I''d find it useless without that; it has to be a real part of >> the filesystem, usable by everybody, not a special addon accessible >> only with one or two dedicated applications. >> >>> Putting the version number/string into the file name is hard for >>> me to >>> accept. It''s what would lead to polluting my directories. >> >> Set your ls default to not show versions. Isn''t the problem then >> solved? Maybe add that option to the GUI filesystem explorer as >> well. >> > But this requires modifying all the relevant apps, which is the > same amount of work as modifying them to use a new FV API. 
It''s > not transparent to the end-user.Because the semantics of a file name are different on a unix/posix system than they are on a VMS or TOPS-20 system, which had more structured filenames. I would say that the version cannot be an actual part of the file name but would have to be meta data. However, it could display as part of the username and the underlying system can be made to do the right thing ie, "foo" gets you the latest "foo" Specifically entering in foo;7 gets you version 7 or the latest if there are less than 7 versions available. The app can think of it as being part of the file name, but the underlying system would have to know how to do the right thing in extracting the version out and making it meta data. Takes some thinking and I am not claiming to have all the answers right now, but hardly undoable. No app changes are necessary.> >> In practice, it never was a problem that I noticed, or that other >> people noticed. And remember that this was on slower systems with >> smaller screens and often rather slower screen update. >> >> Do you not like the idea based on theory, or did you actually use >> TOPS-20 for a while and find the versioning troublesome? >> > Putting the file version number as part of the file name breaks > things. Apps unaware of the special significance of this format > will tend to write similar names, which can screw everything royally. > Example: > > Say we use <file>;<version> > > In emacs, I edit FOO:2 > > it will write out a temp file "FOO:2~". So, how does the FS deal > with this the next time they need to create a new version? > > The problem lies in that under VMS, the '';'' was a special > character, and unusable in normal naming. I suspect a similar > situation exists under TOPS-20. No such luck in a POSIX filesystem > - all printable (and many unprintable) characters are valid for use > in filenames. 
So you _CAN''T_ use them to deliniate File Versioning, > without risking blowing the entire scheme when some random app > decides to either use your FV marker for its own needs, or > something similar to the emacs case above.Yes, this needs to be thought about but is hardly a show stopper. There are most likely many possible solutions that will work for most people, and if you make it configurable then those people who run into issues can reconfigure it. Ie, say you do use <file>;<version> and that proves unworkable in a specific case, then the system can be reconfigured to display / decode using a different character. Or perhaps, in that case, the user needs to supply a \ character in front of the ; that exists in a real file to not have it decoded as a version identifier.> > > >>> > one UI is the command line shell >>> >>> Indeed! And command-line tools, like ls(1), find(1), etc... >>> >>> What I''m saying is that I''d like to be able to keep multiple >>> versions of >>> my files without "echo *" or "ls" showing them to me by default. >> >> And I find that completely unacceptable; useless. The whole point of >> putting versioning in the filesystem is that that makes it accessible >> to all programs. >> > But, because of the explosion in the number of files,There is no explosion. You have not made any case except your claim in mail that such an explosion is real and not just your personal fear. Remember, only files that are edited/changed are FVed.> you CAN''T automatically show all versions.Sure you can.> Users will NEVER accept this.Have you done usability testing? There is no explosion of files like you claim so most users would probably not object.> The only clean way to do this is to show file versions only upon > request. Not by default.Your claim. I claim otherwise. You''d have to do some real testing to see if it really is a problem.> > >>> > >What if an application deals in multiple files? >>> > >>> > so? 
>>> >>> So, file versions aren''t useful unless the application explicitly >>> decides tells the OS when to make them. >> >> File versions are created when a file is created. In the scenario >> where, today, an existing file would be overwritten (deleted), >> instead >> the old file is kept and the new file is given the version number +1 >> of the old file.Exactly>> >>> Similarly with applications that keep files open but keep writing >>> transactions in ways that the OS can''t isolate without input from >>> the >>> app. E.g., databases. fsync(2) helps here, but lots and lots of >>> fsync(2)s would result in no useful versioning. >> >> None of those are candidates for file versioning, and a darned >> good thing, too. > > Honestly, as far as file versioning goes, the time to make a new > version is when calling open() with the appropriate arguments to > allow for append or modification.exactly> You obviously don''t want to create a new version if you are only > opening a file for read-only access, and changing version on fsync > () is ludicrous,yes> and on close() doesn''t differentiate between a file which has been > modified or not.ok. I am not an expert on low level file operations so I don''t know what knowledge if around of a file having been changed or not. However, I''d have to think back to my VMS dev days -- I think that when using the LSE editor, whenever I did a write it did create a new version. I cannot remember for sure. Would have to find a VMS system to test on. This would have to be thought out some.> > Given this, we''re back into the problem FV is supposed to solve. > It is entirely possible for an editor to keep open a file for a > long time, periodically writing out your changes without issuing a > new open(). Word with auto-save turned off is a prime example.ok> Given this, you''ve only created a new version when you first load > the document, and all your intermediary changes are lost, since it > only saves the document on close().ok. 
FV is not a panacea for all problems. But most people do not sit there with a file open forever. FV solves a lot of problems. It still acts as a checkpoint for file edits -- especially for most people's standard usage: open a file, edit it, close it, go watch TV, think of something else, come back and open it again and edit it, etc.

> Thus, in order to get benefits from FV, your editor must issue periodic close() and open() commands on the same file, as you edit, all without your intervention.

No, you get the benefits of FV, just across editing sessions and not internal to an editing session.

> Exactly how many editors do this? I have no idea. So, the only way to enable FV is to require the user to periodically push the "Save" button. Which is how much more different than the current situation?

I edit a file. I realize I screwed up. I can go back to the previous version (or 2 ago or whatever). I cannot do that in the current situation.

Chad

> -Erik

---
Chad Leigh -- Shire.Net LLC
Your Web App and Email hosting provider
chad at shire.net
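The "foo" versus "foo;7" lookup Chad describes, with a backslash to protect a literal ';', could be sketched like this. The function names split_version and resolve are hypothetical, and the fallback rule (an unknown version number resolves to the latest) is one reading of "the latest if there are less than 7 versions available".

```python
import re

# A trailing ";<digits>" counts as a version suffix unless the ';' is escaped.
_VERSION_SUFFIX = re.compile(r"(?<!\\);(\d+)\Z")


def split_version(name):
    """Split 'foo;7' into ('foo', 7); a name without a suffix gives (name, None)."""
    match = _VERSION_SUFFIX.search(name)
    if match:
        base, version = name[:match.start()], int(match.group(1))
    else:
        base, version = name, None
    # An escaped "\;" stands for a literal ';' in the real file name.
    return base.replace("\\;", ";"), version


def resolve(name, available):
    """Pick a version of name, given the version numbers that exist for it."""
    if not available:
        raise LookupError("no versions exist")
    base, version = split_version(name)
    if version is None or version not in available:
        return base, max(available)   # default, or fallback, is the latest
    return base, version
```

With this rule, emacs-style backup names such as "FOO;2~" are left alone because the suffix is not at the end of the name, but a file that really is called "FOO;2" would have to be written as "FOO\;2", which is the collision Erik is worried about.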
>I think our problem is that we look at FV from different angles. I look >at it from the point of view of people who have NEVER used FV, and you >look at it from the view of people who have ALWAYS used FV.That''s certainly a part of it. It''s interesting reading this discussion, as someone who used VMS heavily through about the mid-1980s & then became a UNIX sysadmin. File versioning was one of the items I really missed. There are a lot of interesting use cases (walk away from terminal, come back, quit emacs, get prompted whether to save file -- go ahead, save it, use ''diff'' to determine what changed, and then delete the newly written file version if the changes are unwanted). Directory pollution really turned out not to be an issue in practice, perhaps because the default version limit was relatively small (3 in VMS). It could be set per-file or (IIRC) per-directory. If someone didn''t want versioning at all, they could just set their directory to use one version, and old versions simply didn''t exist. Alternatively, to see just the most recent versions, '';'' would refer to the most recent version, so ''dir ;'' showed only the filenames and no old versions (going from memory here). Having a delimiter character really did help. On UNIX, we have ''/'', though POSIX prohibits, or at least highly discourages, its use. Mac OS X uses ''/'' to access named forks (aka named streams, aka extended attributes in the Solaris sense). If I do ''ls xyz'' I see just xyz. If I do ''ls xyz/rsrc'' and xyz has a ''resource fork'' then I see that. No real reason why I couldn''t do ''ls xyz/versions'', or, preferably, ''ls -V'' :-) and see versions. ''diff xyz xyz/-1'' would diff xyz with its immediately preceding version; ''diff xyz xyz/2'' with version 2. (This would be interesting to prototype and do some usability testing.) I agree that we want a clean interface, that versions should be optional, and that they should be exposed via the network. 
(My home directory is NFS-mounted.)

While disk space is not irrelevant, quotas really help in the multi-user scenario. If someone is close to their limits, they can use 'purge' (VMS syntax) to remove old versions of their files, or delete specific versions.

I don't agree that version control systems solve the same problem as file versioning. I don't want to check *every change* that I make into version control -- it makes the history unwieldy. At the same time, if I make a change that turns out to work really poorly, I'd like to revert to the previous code -- not necessarily the code which is checked in. (I suspect there may be some versioning systems which allow intermediate versions to be deleted, and I just haven't used them, but this still seems complex compared to only checking in known-good code.)

This message posted from opensolaris.org
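The relative and absolute references ('xyz/-1', 'xyz/2') and the small per-file version limit can be modelled on a plain list of version numbers. This is a toy sketch: pick_version and purge are made-up names, the default limit of 3 is the VMS default recalled above, and nothing here touches a real filesystem.

```python
def pick_version(versions, spec):
    """Select from versions, an ascending list whose last entry is the current file.

    A positive spec names an exact version ('xyz/2'); zero or a negative spec
    counts back from the current version ('xyz/-1' is the one just before it).
    """
    if not versions:
        raise LookupError("no versions exist")
    if spec > 0:
        if spec not in versions:
            raise LookupError(f"version {spec} does not exist")
        return spec
    index = len(versions) - 1 + spec
    if index < 0:
        raise LookupError(f"no version {-spec} steps back")
    return versions[index]


def purge(versions, keep=3):
    """Return (kept, removed) when only the newest keep versions are retained."""
    if keep < 1:
        raise ValueError("keep must be at least 1")
    return versions[-keep:], versions[:-keep]
```

For a file with versions 5, 6 and 7, pick_version(versions, -1) selects 6, and purge(versions, keep=1) keeps only 7, which corresponds to setting a directory to one version.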
> People are oriented to their files, not to snapshots.

True, though with NetApp-style snapshots, it's not that difficult to translate 'src/file.c' to '.snapshot/hourly.0/src/file.c' and see what it was like an hour ago. I imagine that a syntax like '.snapshot/22:20/src/file.c' would also be easy to use. (On the other hand, zfs currently requires knowledge of where the file system is rooted, and knowledge of where the current directory is within that filesystem, which IMHO is somewhat confusing to users and requires far too much typing.)

I don't have an answer to your question about how to find an earlier version of a file with snapshots, though given an intelligent file system, there's no reason why we couldn't have a '.version' pseudodirectory (or the like) which understood file changes and was virtually populated by analyzing differences in the snapshots.
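The translation from a live path to its copy inside a ZFS snapshot is mechanical once the filesystem's mountpoint is known, which is exactly the knowledge complained about above. A small sketch, assuming the snapshots are reachable under the '.zfs/snapshot' directory at the root of the filesystem; snapshot_path and the example paths are hypothetical.

```python
import os


def snapshot_path(path, mountpoint, snapshot):
    """Map a file under mountpoint to the same file inside the named snapshot."""
    path = os.path.abspath(path)
    mountpoint = os.path.abspath(mountpoint)
    relative = os.path.relpath(path, mountpoint)
    if relative == os.pardir or relative.startswith(os.pardir + os.sep):
        raise ValueError(f"{path} is not inside {mountpoint}")
    return os.path.join(mountpoint, ".zfs", "snapshot", snapshot, relative)
```

The caller still has to supply the mountpoint and the snapshot name; a shell wrapper would need to look both up, which is the extra typing a '.snapshot'-style directory in every directory avoids.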
> Versioning cannot be automated; taking periodic snapshots != capturing application state.

But I think we have existence proofs of operating systems which do automate versioning. It's true that capturing a new version each time a file has been modified and closed may not be perfect, but if it works for 99% of user cases, that's good for almost everyone. We have a lot of 99% tools (even 'ls' is pretty useless in a ten-million-file directory).

If we introduce a new API, users won't see the benefits because nobody is going to update all of vi, vim, emacs, rsync, ftp, sed, cat, cp ....
On Oct 6, 2006, at 21:17, Joseph Mocker wrote:
> Nicolas Williams wrote:
>> On Fri, Oct 06, 2006 at 03:30:20PM -0600, Chad Leigh -- Shire.Net LLC wrote:
>>> On Oct 6, 2006, at 3:08 PM, Erik Trimble wrote:
>>>> OK. So, now we're on to FV. As Nico pointed out, FV is going to need a new API. Using the VMS convention of simply creating file names with a version string afterwards is unacceptible, as it creates enormous directory pollution,
>>>
>>> Assumption, not supported. "Eye of the beholder."
>>
>> No, you really need an API, otherwise you have to guess when to snapshot versions of files.
>
> David Dyer-Bennet's post gives a hint of how this could be done without any API. Simply augment a few system calls like open(), unlink(), etc. Calls that can potentially change files. Since you can't change a file unless is open()'ed with various write flags like O_WRONLY, O_RDWR, etc, this could be an ideal place to create the version.
>
> One could probably write a "poor man's" FV LD_PRELOAD library to do this without the filesystem's knowledge at all.

With the stackable approach, versionfs does this with compression and a number of other configurable policies; see:

http://filesystems.org/project-versionfs.html

> It wouldn't be as efficient with space as could be done at the filesystem level, but as someone said, disk is cheap.

True, but it's still finite - there's typically a notion of recycling or cleaning that is introduced, such as in elephant:

http://www.hpl.hp.com/personal/Alistair_Veitch/papers/elephant-hotos/index.html

or certain versioning implementations that have been written around SAM-FS and its recycler policies using the archiver.log:

http://www.hmk-computer.com/docs/products/synstar_restoreme.htm

.je
Erik Trimble wrote:
> The problem is we are comparing apples to oranges in user bases here. TOPS-20 systems had a couple of dozen users (or, at most, a few hundred). VMS only slightly more. UNIX/POSIX systems have 10s of thousands.

IIRC, I had about a dozen files under VMS, not counting versions.

> Plus, the number of files being created under typical modern systems is at least two (and probably three or four) orders of magnitude greater. I've got 100,000 files under /usr in Solaris, and almost 1,000 under my home directory.

wimp :-) I count 88,148 in my main home directory. I'll bet just running gnome and firefox will get you in the ballpark of 1,000 :-/

 -- richard
On Oct 6, 2006, at 10:18 PM, Richard Elling - PAE wrote:
> Erik Trimble wrote:
>> The problem is we are comparing apples to oranges in user bases here. TOPS-20 systems had a couple of dozen users (or, at most, a few hundred). VMS only slightly more. UNIX/POSIX systems have 10s of thousands.
>
> IIRC, I had about a dozen files under VMS, not counting versions.

You mean in your system? There was a lot more than that...

>> Plus, the number of files being created under typical modern systems is at least two (and probably three or four) orders of magnitude greater. I've got 100,000 files under /usr in Solaris, and almost 1,000 under my home directory.
>
> wimp :-) I count 88,148 in my main home directory. I'll bet just running gnome and firefox will get you in the ballpark of 1,000 :-/

None (well, maybe 1 or 2) of which you edit and hence would not generate versions.

Chad

---
Chad Leigh -- Shire.Net LLC
Your Web App and Email hosting provider
chad at shire.net
On Oct 6, 2006, at 23:42, Anton B. Rang wrote:
> I don't agree that version control systems solve the same problem as file versioning. I don't want to check *every change* that I make into version control -- it makes the history unwieldy. At the same time, if I make a change that turns out to work really poorly, I'd like to revert to the previous code -- not necessarily the code which is checked in. (I suspect there may be some versioning systems which allow intermediate versions to be deleted, and I just haven't used them, but this still seems complex compared to only checking in known-good code.)

The use cases are somewhat different here. I would venture to say that a *personal* file versioning system needs to be thought of differently from a *group* co-ordination formal version control system. Of course there is a fair amount of overlap in both use cases, particularly when you consider a global namespace and concurrent access problems, as you can see in the Cedar or Plan 9 systems (fossil/venti):

http://portal.acm.org/citation.cfm?doid=42392.42398
http://cm.bell-labs.com/plan9/

And if we were to also consider dynamic linking and versioning for deprecated functions, there's another whole level of parallel backwards-compatibility interface problems that become much easier to approach.

While this is an FV discussion, I do believe that we need some sort of clearer distinction between FV, VC, DR, CDP, and Snapshotting, structured around the usability cases and close/sync vs a forced version mark/branch. There's too much confusion in this space, often with conflicting goals misapplied to solve similar problems.

.je
On Oct 6, 2006, at 12:18 PM, David Dyer-Bennet wrote:
> On 10/5/06, Wee Yeh Tan <weeyeh at gmail.com> wrote:
>> On 10/6/06, David Dyer-Bennet <dd-b at dd-b.net> wrote:
>>> One of the big problems with CVS and SVN and Microsoft SourceSafe is that you don't have the benefits of version control most of the time, because all commits are *public*.
>>
>> David,
>>
>> That is exactly what "branch" is for in CVS and SVN. Dunno much about M$ SourceSafe.
>
> I've never encountered branch being used that way, anywhere. It's used for things like developing release 2.0 while still supporting 1.5 and 1.6.
>
> However, especially with merge in svn it might be feasible to use a branch that way. What's the operation to update the branch from the trunk in that scenario?

We use personal branches all the time; in fact each developer has at least one, sometimes several if they are working on orthogonal issues or experimenting with a couple of different approaches to the same problem. Personal branches are for messy code, unfinished patches - basically anything that took longer than 15 minutes to write. Keeping that stuff on just one machine is unworkable as I code from many locations, not to mention the server is backed up more often.

Note that when I say 'personal', I mean intended for the use of one particular person. Some people refer to these as 'private' branches, but we don't do access control in svn other than on a per-project level, so other users can take a look at what I'm up to. This allows me to ask for suggestions or advice without having to email diffs around.

Updating from trunk is slightly irritating as svn doesn't do merge tracking ATM (it's in the works, though). Currently I just grep the commit log for the last merge from trunk (I use a consistent log message so this is easy).

    svn log https://svn.example.com/project/branches/ben | grep 'Merged from trunk'
    (note last merged revision)
    svn merge -r$LAST_MERGED_REV:HEAD https://svn.example.com/project/trunk /path/to/wc
    (fix any conflicts)
    svn ci /path/to/wc -m "Merged from trunk r$LAST_MERGED_REV"

Of course, you can also cherry-pick changes from other branches or tags if you know the revision number(s).

From what I've seen on the svn mailing lists, this is a pretty common pattern to use. I don't think it's very common in CVS though, simply because branching and merging are more difficult.

-- Ben
On Oct 6, 2006, at 6:15 PM, Nicolas Williams wrote:
> What I'm saying is that I'd like to be able to keep multiple versions of my files without "echo *" or "ls" showing them to me by default.

Hmm, what about file.txt -> ._file.txt.1, ._file.txt.2, etc? If you don't like the _ you could use @ or some other character.

> I'd like an option for ls(1), find(1) and friends to show file versions, and a way to copy (or, rather, un-hide) selected versions of files so that I could now refer to them as usual -- when I do this I don't care to see version numbers in the file name, I just want to give them names.

    ln -s ._file.txt.1 first_published_draft.txt
    ln -s ._file.txt.5 second_published_draft.txt

> And, maybe, I'd like a way to write globs that match file versions (think of extended globbing, as in KSH).

Hmm, I'm not exactly sure what you mean by this, but using a dotfile scheme would allow you to easily glob for the file names.

> Similarly with applications that keep files open but keep writing transactions in ways that the OS can't isolate without input from the app. E.g., databases. fsync(2) helps here, but lots and lots of fsync(2)s would result in no useful versioning.

Presumably you'd create a different fs for your database, turning the versioning property off. You'd be likely to want to adjust other fs parameters anyway, judging from some recent posts discussing how to get the best database performance.

-- Ben
> If you disagree, please tell us *why* you think snapshots don't solve the problem.

Three reasons.

First of all, unless we have per-file snapshots, there's no way to keep old versions of particularly important files without keeping old versions of everything else. If I have a 4 GB video in my home directory, and my 50 KB file containing finance data, keeping the last version of the 50 KB file (last edited two weeks ago) means keeping the 4 GB file around. Forever, if I never make another change to the 50 KB file.

Second, with any rational implementation of file versioning, the end user has control over the number of versions kept for a particular file. Generally snapshots are administratively defined rather than end-user defined, and not at file granularity.

Third, snapshots are tied to time, not change. A real-life example: One day I logged into a VAX to test a small program and discovered the Ada compiler wasn't working because it was complaining about an error in its configuration file. It turned out I'd edited that some eight months earlier (two semesters ago) and made an error which had never been caught as I'd finished the course involved. I simply deleted the current (bad) version, reverting to the last good version. Even if I'd had snapshots going back that far, it would have been painful to find which one (if any) had the correct version of the file. Similarly, I can edit a script, test it, find that the change doesn't work, and go back, all within 150 seconds or so. The chances that a snapshot would pick up the previous version in this scenario are low.

Technically, these could be seen as arguing against the current *implementation* of snapshots. One can envision per-file, user-configurable snapshots. Those would come close, though the third argument above is still an issue. (I can also imagine a "snapshot only if modified" command which might help there.)

That said, do file versions fit into UNIX? I think they could be made to, but they would change existing behavior, which could confuse either users (amply demonstrated in these threads) or applications.

(For what it's worth, incidentally, most users don't use the command line, believe it or not....)

This message posted from opensolaris.org
Erik Trimble
2006-Oct-07 10:13 UTC
Snapshots of an active file (was: Re: [zfs-discuss] A versioning FS)
Joseph Mocker wrote:
> Erik Trimble wrote:
>> The developers can answer this definitively, but I believe the answer to your questions is NO. That is, if there is anything in the buffer waiting to be written when a snapshot request comes along, the buffer is written out so that the file is consistent with the last write(). So, snapshotting should NEVER cause a file corruption in this manner. That said, if you are doing the following:
>>
>> 1. App issues write() for data A
>> 2. snapshot request
>> 3. App issues write() for data B
>>
>> Then yes, the snapshot file will only contain data A, and not data B, which might lead to an inconsistency in the app's behavior, if both A and B were important to be written together.
>
> Yes, this is what I was talking about.
>
>> But if that were the case, then the app should have written A and B atomically.
>
> And how realistic is that? You are suggesting, for example, that every application that writes an XML file should buffer the _entire_ XML stream in memory and issue a single atomic write of that entire document. That's not realistic, and in some cases not even possible.
>
> Otherwise, uh, we better fix sed then :-)
>
> --joe

There is no real answer to this problem. If I have to wait for the app to issue a close() on a file before taking a copy for the snapshot, then I can wait indefinitely, and could potentially NEVER get anything for the snapshot. If I somehow manage to do some magic and set aside a copy of the file upon open(), just to put that file into a (possible) snapshot, then yes, I get a consistent file, but one which is out-of-date w/r/t the current one (and which kinda wrecks the concept of a snapshot being "what is there at time X"). And, as you note, if I wait for a pending write() to finish before immediately taking a copy, then I run into the problem of apps possibly not leaving the file in a consistent state.

No, you won't ever have file Corruption (i.e. an incomplete write() screwing the file completely), but you certainly might have file Inconsistency, which is an application problem, and not one that can be dealt with at the FileSystem level, as it requires a knowledge of what makes a "consistent" file, from the app's standpoint.

This is the exact same problem as backup software has had forever, so it is not new to snapshots. Which is why you want to take backups (and snapshots) of a quiet filesystem if at all possible.

-Erik
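One common way for an application to sidestep the A/B inconsistency described above is to write the complete new contents to a temporary file and then rename it over the original. The following is only a minimal Python sketch of that pattern, not something proposed in the thread: `atomic_write` is a hypothetical helper name, and it assumes the temporary file is created on the same filesystem as the target so that the rename is atomic.

```python
import os
import tempfile


def atomic_write(path, chunks):
    """Write all chunks (bytes) to a temp file, then rename it over path."""
    directory = os.path.dirname(os.path.abspath(path))
    # Create the temp file next to the target so both are on one filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".tmp-")
    try:
        with os.fdopen(fd, "wb") as handle:
            for chunk in chunks:
                handle.write(chunk)
            handle.flush()
            os.fsync(handle.fileno())  # push the data to disk before renaming
        # On POSIX a rename onto an existing name replaces it atomically.
        os.replace(tmp_path, path)
    except BaseException:
        # Leave the original untouched and clean up the partial temp file.
        try:
            os.unlink(tmp_path)
        except FileNotFoundError:
            pass
        raise
```

With this approach the name `path` should always refer to either the complete old contents or the complete new contents, so a snapshot or backup taken mid-update would not capture a half-written document under that name. It does not help applications that update a file in place or keep it open indefinitely.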
Chad Leigh -- Shire.Net LLC wrote:
>>> Plus, the number of files being created under typical modern systems is at least two (and probably three or four) orders of magnitude greater. I've got 100,000 files under /usr in Solaris, and almost 1,000 under my home directory.
>>
>> wimp :-) I count 88,148 in my main home directory. I'll bet just running gnome and firefox will get you in the ballpark of 1,000 :-/
>
> None (well, maybe 1 or 2) of which you edit and hence would not generate versions.
>
> Chad

Richard actually brings up a good point, which answers another question Chad had for me: exactly how many files do I edit? That directly impacts the "directory pollution" problem I've been talking about. There are essentially three scenarios:

(a) FV is turned on on a per-file basis
(b) FV is turned on on a per-directory basis
(c) FV is turned on on a per-filesystem basis

Now, I think we can all see that you get geometric file explosion in case (c), as absolutely anything that writes to the filesystem gets versioned. Things like Web Browser caches alone would kill you.

In case (b), there's quite a bit of explosion, too. There are lots of apps which create, update, and destroy files frequently in various directories. Most Office and similar large user apps do this. So it is very, very easy to have many versions quickly. This can be somewhat mitigated by NOT turning on FV in directories which are commonly used as temp dirs (e.g. ~/tmp).

In case (a), you are down to files you actively tell FV to use, which I agree can be quite manageable. I tend to actively edit a couple of dozen files frequently, so that number can be manageable, so long as the number of versions is held down to some limit.

However, in both case (a) and (b) for netFS users, exactly how are they supposed to indicate that they want FV turned on? There are no semantics for doing this in any netFS protocol, so we'd have to have custom API/tools for them to run to turn on FV.

Also, something to think about: under FV, do old versions of a file which was deleted (via unlink() or similar) also get deleted?

-Erik
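To make "held down to some limit" concrete, here is a rough Python sketch of a per-file pruning policy. It is purely illustrative: it assumes the hidden `._<name>.<N>` naming floated earlier in the thread for old versions, and `prune_versions` and its `keep` parameter are hypothetical names, not part of ZFS or any existing tool.

```python
import os
import re


def prune_versions(path, keep):
    """Remove all but the newest `keep` hidden versions of `path`.

    Versions are assumed to be siblings named ._<name>.<N>, where a larger
    N means a newer version. Returns the names that were removed.
    """
    directory, name = os.path.split(path)
    directory = directory or "."
    pattern = re.compile(r"\._" + re.escape(name) + r"\.(\d+)")

    versions = []
    for entry in os.listdir(directory):
        match = pattern.fullmatch(entry)
        if match:
            versions.append((int(match.group(1)), entry))
    versions.sort()  # numeric order, so version 10 sorts after version 9

    doomed = versions[:-keep] if keep > 0 else versions
    for _, entry in doomed:
        os.remove(os.path.join(directory, entry))
    return [entry for _, entry in doomed]
```

A real implementation would also have to answer the question raised above about what happens to the versions when the current file is unlinked; this sketch leaves them in place.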
Chad Leigh -- Shire.Net LLC wrote:
>> But see, that assumes you have a logout-type functionality to use. Which indeed is possible for command-line usage, but then only in a very limited way. During a typical session, I access almost 20 NFS-mounted directories. And anyone using autofs/automount trees gets even more. You're saying that my logout script has to know about all of them to keep things clean? That's unrealistic.
>
> It is up to you to come up with a scheme to keep things clean, the same way you do now anyway (downloads, etc),

Which is entirely reasonable if the number of places where FV is turned on is limited, but completely unrealistic if FV is turned on for a large number of places. And much more difficult for those restricted to accessing File Versioned directories over a netFS, where scripting cleanups can be difficult or highly impractical.

>> And that still doesn't solve the problem of people who use SAMBA or NFS from machines which don't have an interactive shell logout system (i.e. Windows).
>
> It is still mounted on their desktops and they can still delete files with FV the same way they do now
>
> No real issue.

Well.... If the versions of everything are kept in the same directory, then you are going to have a VERY bad user experience with people using GUI file browsers. Cleaning up multiple versions of the same file name is going to be tricky, and you will find people very frequently accidentally delete the wrong thing. More importantly, people are going to consider it a big hassle to have to keep things tidy by hand. If the versioning is kept somewhere different than the "current" file version, then this mitigates things a bit, but you still don't want to require people to clean this stuff up via a GUI. And, with Windows, asking users to use the command prompt for what is normally a GUI operation isn't acceptable, from a general usability standpoint.

>> This worked fine for the users I knew; even on a system that didn't
>> The problem is we are comparing apples to oranges in user bases here. TOPS-20 systems had a couple of dozen users (or, at most, a few hundred). VMS only slightly more. UNIX/POSIX systems have 10s of thousands.
>
> Rarely. Most of them have in the same range as VMS now or then.

Very, very few VMS systems that I know about had more than a couple hundred users. MIT's main VMS server had only about 2000, with less than half that active. A couple of Fortune 500 companies I've worked at in the 90s had VMS systems, and they had very restricted user bases. VMS simply was never used as a general-purpose file server, and if there were a fairly large number of users, they were logged in via some custom app, and never really used the system in the manner we are discussing here.

On the other hand, virtually all the companies I've worked for have had a UNIX-based file server, with at least a hundred or more UIDs. And with Single Sign-on and LDAP becoming the way to go, even mid-sized companies have systems with over 1000 users. 10,000 active users isn't hard to come up with at all. And, given that Enterprises are a main target for ZFS, millions of users are entirely within reason.

>> But this requires modifying all the relevant apps, which is the same amount of work as modifying them to use a new FV API. It's not transparent to the end-user.
>
> Because the semantics of a file name are different on a unix/posix system than they are on a VMS or TOPS-20 system, which had more structured filenames. I would say that the version cannot be an actual part of the file name but would have to be meta data. However, it could display as part of the username and the underlying system can be made to do the right thing
>
> ie,
>
> "foo" gets you the latest "foo"
>
> Specifically entering in foo;7 gets you version 7 or the latest if there are less than 7 versions available. The app can think of it as being part of the file name, but the underlying system would have to know how to do the right thing in extracting the version out and making it meta data. Takes some thinking and I am not claiming to have all the answers right now, but hardly undoable.
>
> No app changes are necessary.

No, this is untrue. Remember that you can't reserve any character to indicate FV, as all characters are valid in POSIX file names (well, except '/' and NUL). You CAN'T say "foo;8" gives me version 8 of the file "foo", because there very well might be a completely different file named "foo;8" that is NOT any version of the file foo. VMS and TOPS had reserved characters for file versioning, and thus you were set. This isn't true in UNIX filesystems. The only way to do FV in the POSIX concept is to either keep the file versions in a separate file tree from the "current" files, or to use some sort of an API to access them, and otherwise keep them normally hidden from view.

You can't dodge this by simply saying "oh, well, then change the FV delimiter if it causes you problems". Aside from the fact that you are breaking POSIX compatibility by reserving some character for special use, how confusing would it be to users if the FV delimiter is ";" in this directory, "&" in that directory, "_" in the one over there, etc.? That's entirely possible, given the demands of many Windows apps for file naming.

-Erik
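The name clash described above is easy to demonstrate. This small Python sketch (the helper name `semicolon_name_clash` is made up for illustration) creates two ordinary, unrelated files in a scratch directory on a typical POSIX filesystem, one called `foo` and one called `foo;8`:

```python
import os
import tempfile


def semicolon_name_clash():
    """Create 'foo' and 'foo;8' as two unrelated files and list the directory."""
    with tempfile.TemporaryDirectory() as directory:
        for name in ("foo", "foo;8"):
            # ';' has no special meaning to the filesystem; only the shell
            # treats it specially, and only when it is left unquoted.
            with open(os.path.join(directory, name), "w") as handle:
                handle.write("contents of " + name + "\n")
        return sorted(os.listdir(directory))


print(semicolon_name_clash())
```

Because both names already denote legitimate, independent files, a filesystem that started interpreting `foo;8` as "version 8 of foo" would change the meaning of names that existing programs may already be using.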
"Jeremy Teo" <white.wristband at gmail.com> wrote:
> A couple of use cases I was considering off hand:
>
> 1. Oops i truncated my file
> 2. Oops i saved over my file
> 3. Oops an app corrupted my file.
> 4. Oops i rm -rf the wrong directory.
>
> All of which can be solved by periodic snapshots, but versioning gives us immediacy.

I am sure that the same people who accidentally type rm -rf * would type rm -rf *\;*

And note that this feature would cause a need to change a lot of utilities, including all shells (see path name expansion).

Jörg

--
EMail:joerg at schily.isdn.cs.tu-berlin.de (home) Jörg Schilling D-13353 Berlin
      js at cs.tu-berlin.de (uni)
      schilling at fokus.fraunhofer.de (work) Blog: http://schily.blogspot.com/
URL: http://cdrecord.berlios.de/old/private/ ftp://ftp.berlios.de/pub/schily
Nicolas Williams <Nicolas.Williams at sun.com> wrote:
> On Fri, Oct 06, 2006 at 12:02:16PM -0700, Matthew Ahrens wrote:
> > In my opinion, the marginal benefit of per-write(2) versions over snapshots (which can be per-transaction, ie. every ~5 seconds) does not outweigh the complexity of implementation and use/administration.
>
> Per-write(2) versions would be worse than useless in many, if not most cases. Even per-close(2) versions wouldn't always be useful.

Even if there is a proper way to find the right time for a micro snapshot, if the versions live in the standard namespace of the filesystem, it would cause POSIX compatibility problems and we would need to change many programs.

Jörg
"David Dyer-Bennet" <dd-b at dd-b.net> wrote:
> On 10/6/06, Erik Trimble <Erik.Trimble at sun.com> wrote:
> > First of all, let's agree that this discussion of File Versioning makes no more reference to its usage as Version Control. That is, we aren't going to talk about it being useful for source code, other than in the context where a source code file is a document, like any other text document. File Versioning and Version Control are separate things, with different purposes and feature sets.
>
> Hmm; the most important uses of file versioning come, in my opinion, when working on source code. But for handling very different situations than source control does.
>
> > OK. So, now we're on to FV. As Nico pointed out, FV is going to need a new API. Using the VMS convention of simply creating file names with a version string afterwards is unacceptable, as it creates enormous directory pollution, not to mention user confusion. So, FV has to be invisible to non-aware programs.
>
> Strongly disagree, twice.
>
> Having FV invisible to programs not updated to specially support it is IMHO unacceptable, and would render the feature useless.

Making it visible to programs causes many problems with POSIX compatibility and will force changes to many programs.

Jörg
Erik Trimble <Erik.Trimble at Sun.COM> wrote:
> In order for an FV implementation to be useful for this stated purpose, it must fulfill the following requirements:
>
> (1) Clean interface for users. That is, one must NOT be presented with a complete list of all versions unless explicitly asked for it, and it should be simple to select a version based on some reasonable criteria (date of creation/modification, version number, etc.)
>
> (2) Simple way to decide if a file should be versioned or not. Either automatically version all files (or none at all), or provide a mechanism to turn FV on/off on a per-file or per-directory basis.
>
> (3) Network-FS awareness. Without this, FV is severely limited. Given my preconditions above (that is, the current usage pattern of us in the non-FS world), limiting FV to those on the local system restricts its usefulness to the point where it isn't worth the effort.

The only idea I can come up with that matches these criteria is to have the versions in the extended attribute name space.

Jörg
"Anton B. Rang" <Anton.Rang at Sun.COM> wrote:
> > People are oriented to their files, not to snapshots.
>
> True, though with NetApp-style snapshots, it's not that difficult to translate 'src/file.c' to '.snapshot/hourly.0/src/file.c' and see what it was like an hour ago. I imagine that a syntax like '.snapshot/22:20/src/file.c' would also be easy to use. (On the other hand, zfs currently requires knowledge of where the file system is rooted, and knowledge of where the current directory is within that filesystem, which IMHO is somewhat confusing to users and requires far too much typing.)

AFAIR, NetApp has problems caused by the fact that the inode numbers for the snapshots reside on the same fs.

Jörg
It seems that Windows 2003 supports file versioning (and Vista will too). I am not familiar with the implementation. AFAIR it uses the "alternate data stream" feature built into NTFS to work with the versions and hide them from the user.

Certainly in Vista they will have to handle at least the question of the GUI user interface. The ADS feature already has some way of being accessed from the user interface.

--
A. C. Censi
accensi [em] gmail [ponto] com
accensi [em] montreal [ponto] com [ponto] br
accensi [em] gmail [ponto] com - Google Talk
"A. C. Censi" <accensi at gmail.com> wrote:
> It seems that Windows 2003 (and VIsta will too), supports file versioning. I am not familiar with the implementation. AFAIR it is using the "alternate data stream" builtin in NTFS, to work with the versions and hide the versions from the user.

This looks like a simplified version of Sun Extended Attributes.

Jörg
Just to put the references I read in the past about it:

http://www.microsoft.com/technet/windowsvista/library/4ac505e6-dd8b-4ae7-80fa-b9d77cd8104d.mspx

Windows 2003 Server implementation (for server-side copies of client user files):
Working with the Windows Server 2003 Volume Shadow Copy Service
http://www.windowsnetworking.com/articles_tutorials/Windows-Server-2003-Volume-Shadow-Copy-Service.html

Summary:
- named "shadow copy"
- works by NTFS volume (equivalent to *ix filesystem)
- config number of copies or a % of the volume space for copies
- config automatic copies per day
- from the GUI it is accessed by the Properties/Previous Versions tab

(What if the original file is deleted? the oldest is promoted?)

ACC

On 10/7/06, Joerg Schilling <Joerg.Schilling at fokus.fraunhofer.de> wrote:
> "A. C. Censi" <accensi at gmail.com> wrote:
> > It seems that Windows 2003 (and VIsta will too), supports file versioning. I am not familiar with the implementation. AFAIR it is using the "alternate data stream" builtin in NTFS, to work with the versions and hide the versions from the user.
>
> This looks like a simplified version of Sun Extended Attributes.
>
> Jörg
Erik Trimble wrote:
> [...]
> No, you won't ever have file Corruption (i.e. incomplete write() screwing the file completely), but you certainly might have file Inconsistency, which is an application problem, and not one that can be dealt with at the FileSystem level, as it requires a knowledge of what makes a "consistent" file, from the app's standpoint.
>
> This is the exact same problem as backup software has had forever, so it is not new to snapshots. Which is why you want to take backups (and snapshots) of a quiet filesystem if at all possible.

Which brings me back to the point of file versioning. If an implementation were based on something like when a file is open()ed with write bits set, there would be no potential for broken files like this.

Also, it would seem that your statement about snapshots as being "what is there at time X", in a nutshell, describes why snapshots are different than file versioning. File versioning is not temporal in the same way.

--joe
Joseph Mocker wrote:
> Which brings me back to the point of file versioning. If an implementation were based on something like when a file is open()ed with write bits set. There would be no potential for broken files like this.

I'm showing my lack of knowledge on this one but I thought SAM-FS could do something like this. Anyone know for sure?

Of course this doesn't help for apps that keep files open all the time.
> So, if I build it, people will want it? ;)

I think implementing this feature would help Apple adopt ZFS for Time Machine, which is essentially a versioning FS in practice. Actually I don't know if Apple does this, but you can increment versions with kernel notifications of file changes (Spotlight).

Cheers
David Dyer-Bennet wrote:
> Actually, "save early and often" is exactly why versioning is important. If you discover you've gone down a blind alley in some code, it makes it easy to get back to the earlier spots. This, in my experience, happens at a detail level where you won't (in fact can't) be doing checkins to version control.

Isn't that what your editor's undo command is for?

Ian
> I'm showing my lack of knowledge on this one but I thought SAM-FS could do something like this. Anyone know for sure?

It's not quite the same, and not out-of-the-box.

SAM-FS has the ability to create an archive copy of files onto disk or tape when the files are closed after having been modified. These copies may not be made immediately; their timing depends on rules set by the system administrator. Hence they are not "instant" versions.

More importantly, there is (currently) no easy way to retrieve an old version. When the archiver makes a copy, the location of the copy is logged. It is possible to use this log to retrieve an older copy even after the file has been overwritten, and there are several third parties who have written software which enables this.

Things get trickier when tape recycling comes into play, since the recycler does not know about the desire to keep old versions. At sites which require recycling, the positions of old versions have to be logged into a new file system; or the recycler has to be replaced with a variant which understands versioning. Third parties have done both of these in the past, but we (Sun) don't currently ship this.

There are plans to add a more robust versioning feature to SAM-FS but I don't believe there is a definite date or release attached yet.
On Fri, Oct 06, 2006 at 06:17:12PM -0700, Joseph Mocker wrote:
> David Dyer-Bennet's post gives a hint of how this could be done without
> any API. Simply augment a few system calls like open(), unlink(), etc.
> Calls that can potentially change files. Since you can't change a file
> unless it is open()'ed with various write flags like O_WRONLY, O_RDWR, etc,
> this could be an ideal place to create the version.

I wrote about the same thing. These are but heuristics.

> One could probably write a "poor man's" FV LD_PRELOAD library to do this
> without the filesystem's knowledge at all.

Indeed, and I believe it has been done before (search freshmeat.net).

> It wouldn't be as efficient with space as could be done at the
> filesystem level, but as someone said, disk is cheap.

Suppose ZFS gave you a primitive for efficiently "snapshotting" individual files, rather than entire filesystems. That's the primitive you'd need to implement this LD_PRELOADable object space efficiently.

Nico
--
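The "version on open-for-write" heuristic discussed above can be sketched in a few lines. This is only an illustration of the idea, not an LD_PRELOAD library and not anything ZFS provides: the wrapper name open_versioned and the VMS-like "<name>;N" naming are assumptions made up for the example, and the copy is a full file copy (the space-inefficient variant Joseph mentions).

```python
import os
import shutil

# O_RDONLY is 0, so any of these bits being set means "opened for writing".
WRITE_FLAGS = os.O_WRONLY | os.O_RDWR


def next_version_path(path):
    """Return the first unused '<path>;N' name (hypothetical VMS-like naming)."""
    n = 1
    while os.path.exists(f"{path};{n}"):
        n += 1
    return f"{path};{n}"


def open_versioned(path, flags, mode=0o644):
    """Like os.open(), but copy an existing file aside before a writable open."""
    if flags & WRITE_FLAGS and os.path.isfile(path):
        # Full copy of the old contents: simple, but not space efficient.
        shutil.copy2(path, next_version_path(path))
    return os.open(path, flags, mode)
```

Read-only opens create nothing; every writable open of an existing file leaves one more "<name>;N" copy behind, which is exactly the directory clutter the rest of the thread argues about.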
On Fri, Oct 06, 2006 at 06:22:01PM -0700, Joseph Mocker wrote:
> Nicolas Williams wrote:
> >Automatically capturing file versions isn't possible in the general case
> >with applications that aren't aware of FV.
> >
> Don't snapshots have the same problem. A snapshot could potentially be
> taken when a file is partially written or updated, no?

And backups in general.

Nico
--
Joseph Mocker wrote:
> Which brings me back to the point of file versioning. If an
> implementation were based on something like when a file is open()ed
> with write bits set. There would be no potential for broken files like
> this.
>
> Also, it would seem that your statement about snapshots as being "what
> is there at time X" in a nutshell, describes why snapshots are
> different than file versioning. File versioning is not temporal in
> the same way.
>

That is correct. File Versioning is primarily User-Driven (that is, executed and completed at the End-User's command), if you implement it using open() and close() as the drivers (which, it seems to be the consensus, is the sane way to do FV). So, in theory, FV should never result in any file Inconsistency or Corruption. Snapshots are essentially System-driven, and as such are about a Point-in-Time for the System, not a Point-in-State of an App, which FV centers on.

Snapshots and Backup definitely can result in Inconsistency, as they don't tend to communicate with the app holding a file open. Backups have mitigated this problem with certain apps which tend to hold files open for extended time (primarily DBs), by allowing the Backup program to talk to the app, and have the app write a consistent state to disk before the Backup program runs.

Snapshots are indeed a different beast than FV, in both subtle and not-so-subtle ways. They fit much more into the class of Backup. Honestly, I've thought that through this FV thread, we should never reference snapshots for functionality, as they really aren't comparable. Apples to Oranges and all.

-Erik
Joerg Schilling wrote:
> Erik Trimble <Erik.Trimble at Sun.COM> wrote:
>
>
>> In order for an FV implementation to be useful for this stated purpose,
>> it must fulfill the following requirements:
>>
>> (1) Clean interface for users. That is, one must NOT be presented with
>> a complete list of all versions unless explicitly asked for it, and it
>> should be simple to select a version based on some reasonable criteria
>> (date of creation/modification, version number, etc.)
>>
>> (2) Simple way to decide if a file should be versioned or not. Either
>> automatically version all files (or none at all), or provide a mechanism
>> to turn FV on/off on a per-file or per-directory basis.
>>
>> (3) Network-FS awareness. Without this, FV is severely limited. Given
>> my preconditions above (that is, the current usage pattern of us in the
>> non-FS world), limiting FV to those on the local system restricts its
>> usefulness to the point where it isn't worth the effort.
>>
>
> The only idea I get that matches this criteria is to have the versions
> in the extended attribute name space.
>
> Jörg
>
>

Realistically speaking, that's my conclusion, if we want a nice clean, well-designed solution. You need to hide the versioning info in the meta-tags, and create a whole new API for accessing/manipulating them. This easily solves (1) and (2) above, but (3) is the huge problem, as having a new API means you need to change the SMB/NFS protocols to allow for client machines to access the new API. With the new Windows NTFS "versioning", we at least have something to hook into for Windows, but UNIX clients will need to have a whole new suite of tools written, and a raft of current apps modified to take advantage of FV.

That said, FV may very well be worth it, and it certainly is worthy of a community-driven exploratory implementation.

-Erik
On Sat, Oct 07, 2006 at 01:43:29PM +0200, Joerg Schilling wrote:
> The only idea I get that matches this criteria is to have the versions
> in the extended attribute name space.

Indeed. All that's needed then, CLI UI-wise, beyond what we have now is a way to rename version extended attributes to new files, or at least copy them (we have the latter). And it nicely hides versions. And it nicely provides an API for creating them on demand ("magic" extended attributes), and remote access.

Nico
--
On 10/7/06, David Dyer-Bennet <dd-b at dd-b.net> wrote:
> I've never encountered branch being used that way, anywhere. It's
> used for things like developing release 2.0 while still supporting 1.5
> and 1.6.
>
> However, especially with merge in svn it might be feasible to use a
> branch that way. What's the operation to update the branch from the
> trunk in that scenario?

You "merge" the changes from the main trunk.

--
Just me, Wire ...
On 10/7/06, Ben Gollmer <ben at jatosoft.com> wrote:
> On Oct 6, 2006, at 6:15 PM, Nicolas Williams wrote:
> > What I'm saying is that I'd like to be able to keep multiple
> > versions of
> > my files without "echo *" or "ls" showing them to me by default.
>
> Hmm, what about file.txt -> ._file.txt.1, ._file.txt.2, etc? If you
> don't like the _ you could use @ or some other character.

You missed Nicolas's point.

It does not matter which delimiter you use. I still want my "for i in *; do ..." to work as per now.

We want to differentiate files that are created intentionally from those that are just versions. If files start showing up on their own, a lot of my scripts will break. Still, an FV-aware shell/program/API can accept an environment setting that may quiesce the version output. E.g. "export show-version=off/on".

--
Just me, Wire ...
On Oct 8, 2006, at 21:40, Wee Yeh Tan wrote:
> On 10/7/06, Ben Gollmer <ben at jatosoft.com> wrote:
>> On Oct 6, 2006, at 6:15 PM, Nicolas Williams wrote:
>> > What I'm saying is that I'd like to be able to keep multiple
>> > versions of
>> > my files without "echo *" or "ls" showing them to me by default.
>>
>> Hmm, what about file.txt -> ._file.txt.1, ._file.txt.2, etc? If you
>> don't like the _ you could use @ or some other character.
>
> You missed Nicolas's point.
>
> It does not matter which delimiter you use. I still want my "for i in
> *; do ..." to work as per now.
>
> We want to differentiate files that are created intentionally from
> those that are just versions. If files start showing up on their
> own, a lot of my scripts will break. Still, an FV-aware
> shell/program/API can accept an environment setting that may quiesce
> the version output. E.g. "export show-version=off/on".
>

if we're talking implementation - i think it would make more sense to store the block version differences in the base dnode itself rather than creating new dnode structures to handle the different versions. You'd then structure different tools or flags to handle the versions (copy them to a new file/dnode, etc) - standard or existing tools don't need to know about the underlying versions.

.je
On Thu, Oct 05, 2006 at 05:25:17PM -0700, David Dyer-Bennet wrote:
> No, any sane VC protocol must specifically forbid the checkin of the
> stuff I want versioning (or file copies or whatever) for. It's
> partial changes, probably doesn't compile, nearly certainly doesn't
> work. This level of work product *cannot* be committed to the
> repository.
>
> [...]
>
> One of the big problems with CVS and SVN and Microsoft SourceSafe is
> that you don't have the benefits of version control most of the time,
> because all commits are *public*.

I think what you're saying is something like this: a VC repository is one thing, but when I'm working on something not ready to put into that repository I still want versioning in my "workspace."

That's still VC though!

In Teamware you use SCCS for version control in your workspace, then, if you have wx (a script built atop Teamware) you collapse the SCCS deltas to remove all the intermediate work and 'putback' just the end result to the parent repository. In Teamware the distinction between repository and workspace isn't :)

But you can work that way in many other VCs. In PRCS, for example, you can check out a project, check it into a new repository, check in changes as you go, then later do this again with the "trunk," merge, then check in to the original repository. Or you can use one repository and delete unsightly history.

Mercurial supports the model of development we use in ON based on Teamware. So you can also get version control for your intermediate versions using Mercurial and lose the unsightly history when you're ready to commit your changes to the gate.

It's been a while since I've used ClearCase, but I'm pretty sure there's something like this there as well.

And, in any case, I think any good VC supports this. And all should, because with file versioning a la VMS I don't get a lot of things I need, like comments, branches, history, merges, etc...

Nico
--
On Mon, Oct 09, 2006 at 09:27:14AM +0800, Wee Yeh Tan wrote:
> On 10/7/06, David Dyer-Bennet <dd-b at dd-b.net> wrote:
> >I've never encountered branch being used that way, anywhere. It's
> >used for things like developing release 2.0 while still supporting 1.5
> >and 1.6.
> >
> >However, especially with merge in svn it might be feasible to use a
> >branch that way. What's the operation to update the branch from the
> >trunk in that scenario?
>
> You "merge" the changes from the main trunk.

I think David meant something else.

History of intermediate changes is often useless, particularly if some of those changes don't build.

In ON development we've used Teamware for years, and for years we've had a policy that intermediate deltas must be collapsed. We have a script, 'wx', that can do that trivially, and good thing too, because collapsing deltas without it is a pain.

(I.e., in Teamware terms, if you bringover version 1.7 of some file, check in 1.8, then 1.9, then putback to the parent workspace, you'll be creating versions 1.8 and 1.9 in the parent when no one needs to see 1.8, so what you want to do is collapse those two deltas, which then become version 1.8, and that's what you putback.)

But this is a lame argument for FV! Because any good VC lets you version intermediate work without polluting the main trunk when you're done.

Nico
--
On Sun, Oct 08, 2006 at 10:28:06PM -0400, Jonathan Edwards wrote:
> On Oct 8, 2006, at 21:40, Wee Yeh Tan wrote:
> >On 10/7/06, Ben Gollmer <ben at jatosoft.com> wrote:
> >>Hmm, what about file.txt -> ._file.txt.1, ._file.txt.2, etc? If you
> >>don't like the _ you could use @ or some other character.
> >
> >It does not matter which delimiter you use. I still want my "for i in
> >*; do ..." to work as per now.

.<prefix> might be acceptable, but it rubs me the wrong way because of this:

> >We want to differentiate files that are created intentionally from
> >those that are just versions. If files start showing up on their
> >own, a lot of my scripts will break. Still, an FV-aware
> >shell/program/API can accept an environment setting that may quiesce
> >the version output. E.g. "export show-version=off/on".

Exactly.

> if we're talking implementation - i think it would make more sense to
> store the block version differences in the base dnode itself rather than
> creating new dnode structures to handle the different versions. You'd
> then structure different tools or flags to handle the versions (copy
> them
> to a new file/dnode, etc) - standard or existing tools don't need to
> know
> about the underlying versions.

You're arguing for treating FV as extended/named attributes :)

I think that'd be the right thing to do, since we have tools that are aware of those already. Of course, we're talking about somewhat magical attributes, but I think that's fine (though, IIRC, NFSv4 [RFC3530] has some strange verbiage limiting attributes to "applications").

Nico
--
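To make the delimiter point above concrete, here is a small sketch of what a "*" expansion would see under the two naming schemes that came up: a VMS-like "file.txt;1" and the dot-prefixed "._file.txt.1". It uses Python's glob module as a stand-in for the shell (it follows the same rule that a leading dot is only matched by a pattern that starts with a dot); the file names are just the examples from the thread, created in a throwaway directory.

```python
import glob
import os
import tempfile


def visible_names(directory):
    """Names a shell-style '*' would expand to in `directory`, sorted."""
    pattern = os.path.join(directory, "*")
    return sorted(os.path.basename(p) for p in glob.glob(pattern))


def demo():
    """Create one real file plus two 'version' files and list what '*' sees."""
    with tempfile.TemporaryDirectory() as d:
        for name in ("file.txt", "file.txt;1", "._file.txt.1"):
            open(os.path.join(d, name), "w").close()
        return visible_names(d)


if __name__ == "__main__":
    print(demo())  # ['file.txt', 'file.txt;1']
```

The dot-prefixed version stays out of "for i in *" loops, while the ";1" form shows up and would break scripts. It still appears in "ls -a" and in any tool that walks the directory itself, which is one reason the thread moves on to extended attributes.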
On 10/9/06, Jonathan Edwards <Jonathan.Edwards at sun.com> wrote:
> > We want to differentiate files that are created intentionally from
> > those that are just versions. If files start showing up on their
> > own, a lot of my scripts will break. Still, an FV-aware
> > shell/program/API can accept an environment setting that may quiesce
> > the version output. E.g. "export show-version=off/on".
>
> if we're talking implementation - i think it would make more sense to
> store the block version differences in the base dnode itself rather than
> creating new dnode structures to handle the different versions. You'd
> then structure different tools or flags to handle the versions (copy
> them
> to a new file/dnode, etc) - standard or existing tools don't need to
> know
> about the underlying versions.

The beauty of extending the dnode is that it will continue to behave nicely through renames or multiple hardlinks. However, handling Erik's concerns about recovering deleted files will require a bit more work (mainly concerns about how a user will recover his file(s)).

There may also be performance considerations if mass version purging happens often.

--
Just me, Wire ...
On Oct 8, 2006, at 22:46, Nicolas Williams wrote:
> On Sun, Oct 08, 2006 at 10:28:06PM -0400, Jonathan Edwards wrote:
>> On Oct 8, 2006, at 21:40, Wee Yeh Tan wrote:
>>> On 10/7/06, Ben Gollmer <ben at jatosoft.com> wrote:
>>>> Hmm, what about file.txt -> ._file.txt.1, ._file.txt.2, etc? If you
>>>> don't like the _ you could use @ or some other character.
>>>
>>> It does not matter which delimiter you use. I still want my "for
>>> i in
>>> *; do ..." to work as per now.
>
> .<prefix> might be acceptable, but it rubs me the wrong way because of
> this:
>
>>> We want to differentiate files that are created intentionally from
>>> those that are just versions. If files start showing up on their
>>> own, a lot of my scripts will break. Still, an FV-aware
>>> shell/program/API can accept an environment setting that may quiesce
>>> the version output. E.g. "export show-version=off/on".
>
> Exactly.
>
>> if we're talking implementation - i think it would make more sense to
>> store the block version differences in the base dnode itself
>> rather than
>> creating new dnode structures to handle the different versions.
>> You'd
>> then structure different tools or flags to handle the versions (copy
>> them
>> to a new file/dnode, etc) - standard or existing tools don't need to
>> know
>> about the underlying versions.
>
> You're arguing for treating FV as extended/named attributes :)

kind of - but one of the problems with EAs is the increase/bloat in the inode/dnode structures and corresponding incompatibilities with other applications or tools. Another approach might be to put it all into the block storage rather than trying to stuff it into the metadata on top. If we look at the zfs on-disk structure instead and simply extend the existing block pointer mappings to handle the diffs along with a header block to handle the version numbers - this might be an easier way out rather than trying to redefine or extend the dnode structure.

Of course you'd still need a single attribute to flag reading the version block header and corresponding diff blocks, but this could go anywhere - even a magic acl perhaps .. i would argue that the overall goal should be aimed toward the reduction of complexity in the metadata nodes rather than attempting to extend them and increase the seek/parse time.

.je
On Sun, Oct 08, 2006 at 11:16:21PM -0400, Jonathan Edwards wrote:
> On Oct 8, 2006, at 22:46, Nicolas Williams wrote:
> >You're arguing for treating FV as extended/named attributes :)
>
> kind of - but one of the problems with EAs is the increase/bloat in
> the inode/dnode structures and corresponding incompatibilities with
> other applications or tools.

This in a thread where folks [understandably] claim that storage is cheap and abundant. And I agree that it is.

Plus, I think you may be jumping to conclusions about the bloat of extended attributes:

> Another approach might be to put it all
> into the block storage rather than trying to stuff it into the
> metadata on top. If we look at the zfs on-disk structure instead and
> simply extend the existing block pointer mappings to handle the diffs
> along with a header block to handle the version numbers - this might
> be an easier way out rather than trying to redefine or extend the
> dnode structure. Of course you'd still need a single attribute to
> flag reading the version block header and corresponding diff blocks,
> but this could go anywhere - even a magic acl perhaps .. i would
> argue that the overall goal should be aimed toward the reduction of
> complexity in the metadata nodes rather than attempting to extend
> them and increase the seek/parse time.

Wait a minute -- the extended attribute idea is about *interfaces*, not internal implementation. I certainly did not argue that a file version should be copied into an EA.

Let's keep interface and implementation details separate. Most of this thread has been about interfaces precisely because that's what users will interact with; users won't care one bit about how it's all implemented under the hood.

Nico
--
On Sun, Oct 08, 2006 at 03:38:54PM -0700, Erik Trimble wrote:
> Joseph Mocker wrote:
> >Which brings me back to the point of file versioning. If an
> >implementation were based on something like when a file is open()ed
> >with write bits set. There would be no potential for broken files like
> >this.
> >
> >Also, it would seem that your statement about snapshots as being "what
> >is there at time X" in a nutshell, describes why snapshots are
> >different than file versioning. File versioning is not temporal in
> >the same way.
> >
> That is correct. File Versioning is primarily User-Driven (that is,
> executed and completed at the End-User's command), if you implement it
> using open() and close() as the drivers. (which, seems to be the
> consensus, is the sane way to do FV). So, in theory, FV should never
> result in any file Inconsistency or Corruption. Snapshots are
> essentially System-driven, and as such, about a Point-in-Time for the
> System, not a Point-in-State of an App, which FV centers on.

I don't agree entirely. For many apps heuristic FV boundaries will do. There are apps for which it won't.

And we've not talked about files unlinked on rename. Should the new file's FV history replace the old one's? Should the histories be merged?

I think heuristic FV has its place, but it won't do in general.

> Snapshots and Backup definitely can result in Inconsistency, as they
> don't tend to communicate with the app holding a file open. Backups have
> mitigated this problem with certain apps which tend to hold files open
> for extended time (primarily DBs), by allowing the Backup program to
> talk to the app, and have the app write a consistent state to disk
> before the Backup program run.

Sort of. If you can quiesce the application then you can snapshot the FS safely. Or if the application has any sort of recovery (journalling + rollback, say), then you may be able to snapshot safely at any time.

Of course, if you have large filesystems dedicated to multiple apps (e.g., home directories), then you typically can't quiesce all of those apps.

> Snapshots are indeed a different beast than FV, in both subtle and
> not-so-subtle ways. They fit much more into the class of Backup.
> Honestly, I've thought that through this FV thread, we should never
> reference snapshots for functionality, as they really aren't comparable.
> Apples to Oranges and all.

More like giant oranges (snapshots) to mini-mandarins (FV).

I'm not saying that FV is a bad idea -- IMO it's a good idea. I'm concerned about the interfaces. VMS-style in-your-face FV seems like a bad idea to me, and heuristic-only FV does too. And then there's semantics to iron out (e.g., unlinks on rename).

Things to work out:

- APIs for creating FV (magic EAs provide an easy answer)
- APIs for accessing FV (EAs provide an easy answer)
- UIs for accessing FV (EAs provide an easy answer)
- heuristics for automatic FV (open for write, atomic appends, fsyncs, unlinks, ...)
- UIs for controlling automatic FV (ideas? FS properties come to mind; EAs; LD_PRELOAD has been mentioned)
- FV semantics in the face of POSIX

Nico
--
On Fri, Oct 06, 2006 at 05:11:54PM -0700, David Dyer-Bennet wrote:
> I don't recall that on TOPS-20 it was possible to not version. What
> you could do is set your logout.cmd file to purge your space down to
> one copy when you logged out.

I never used TOPS-20. I did use VMS. As I recall it didn't have anything like hard links (ok, let's not pick nits here: it did have links, but not link counts, and so everyone avoided links; orphaned files were a pain).

No hard links simplifies a lot of things. OTOH, we're stuck with hard links.

My point? There's not necessarily an obvious, workable mapping of FV in non-POSIX-like OSes to POSIX ones.

Nico
--
On Fri, Oct 06, 2006 at 06:33:14PM -0700, Erik Trimble wrote:
> But, because of the explosion in the number of files, you CAN'T
> automatically show all versions. Users will NEVER accept this. The only
> clean way to do this is to show file versions only upon request. Not by
> default.

Besides, what good does it do a user to have FVs visible in every directory listing all the time? None, except maybe to give them the warm fuzzies. The user wants those FVs only when he/she screws up :)

OTOH, I know at least one user who would find his directory listings exploding if FV were on by default: me :)

And don't tell me that I could turn it off, or that it'd be off by default, as I'm sure our IT department would probably turn it on for all users by default and would make it difficult for me to get it turned off.

Nico
--
On Fri, Oct 06, 2006 at 07:37:47PM -0600, Chad Leigh -- Shire.Net LLC wrote:
> On Oct 6, 2006, at 7:33 PM, Erik Trimble wrote:
> >This is what Nico and I are talking about: if you turn on file
> >versioning automatically (even for just a directory, and not a
> >whole filesystem), the number of files being created explodes
> >geometrically.
>
> But it doesn't. Unless you are editing geometrically more files.

Perhaps my filing habits aren't very good, as I have many files that I've edited over the years in very few directories. Why punish me?

(Also, I believe in the search better, search more, file/sort less model that Gmail and friends promote. Filing is a pain. Searching should be easy and fast. Until we get to where searching is always simpler/faster than scrolling through directory listings I simply could not accept in-your-face FV.)

Nico
--
Nicolas Williams wrote:
> On Thu, Oct 05, 2006 at 05:25:17PM -0700, David Dyer-Bennet wrote:
>
>> No, any sane VC protocol must specifically forbid the checkin of the
>> stuff I want versioning (or file copies or whatever) for. It's
>> partial changes, probably doesn't compile, nearly certainly doesn't
>> work. This level of work product *cannot* be committed to the
>> repository.
>>
>> [...]
>>
>> One of the big problems with CVS and SVN and Microsoft SourceSafe is
>> that you don't have the benefits of version control most of the time,
>> because all commits are *public*.
>>
>
> I think what you're saying is something like this: a VC repository is
> one thing, but when I'm working on something not ready to put into that
> repository I still want versioning in my "workspace."
>
> That's still VC though!
>

This is just one class of problem that I think VC might be useful for. We could go on, specific case by case, coming up with best practices for applications, but it seems to me that FV is trying to solve a general problem in a general way. Whether that is a good idea or a bad idea I don't know. However, wouldn't it be great if I could somehow easily FV a file I am working on with some arbitrary (closed) application I am forced to use, without the application really knowing about it, and with little or no action on my part?
On Fri, Oct 06, 2006 at 11:57:36AM -0700, Matthew Ahrens wrote:
> przemolicc at poczta.fm wrote:
> >On Fri, Oct 06, 2006 at 01:14:23AM -0600, Chad Leigh -- Shire.Net LLC
> >wrote:
> >>But I would dearly like to have a versioning capability.
> >
> >Me too.
> >Example (real life scenario): there is a samba server for about 200
> >concurrent connected users. They keep mainly doc/xls files on the
> >server. From time to time they (somehow) corrupt their files (they
> >share the files so it is possible) so they are recovered from backup.
> >Having versioning they could be told that if their main file is
> >corrupted they can open previous version and keep working.
> >ZFS snapshots is not solution in this case because we would have to
> >create snapshots for 400 filesystems (yes, each user has its filesystem
> >and I said that there are 200 concurrent connections but there much more
> >accounts on the server) each hour or so.
>
> I completely disagree. In this scenario (and almost all others), use of
> regular snapshots will solve the problem. 'zfs snapshot -r' is
> extremely fast, and I'm working on some new features that will make
> using snapshots for this even easier and better-performing.
>
> If you disagree, please tell us *why* you think snapshots don't solve
> the problem.

Matt, think of the night, when some (maybe 5%) people still work. With snapshots I would still have to create snapshots for 400 filesystems each hour, because I don't know which of them are being worked on. And what about the weekend? Still 400 snapshots each hour? And 'zfs list' will list me 400*24*2=19200 lines? And how about organizations which have thousands of people and keep their files on one server? Or ISPs/free e-mail account providers who have millions?

Imagine just ordinary people who use ZFS in their homes and forget to create snapshots. Or they turn their computer on once and then don't turn it off: they work daily (and create a snapshot an hour) and don't turn it off in the evening but leave it working and downloading some films and music. Still one snapshot an hour? How many snapshots a day, a week, a month? Thousands? And with ZFS being _so_easy_ to use, is managing so many snapshots a ZFS-like feature? (ZFS-like: extremely easy.)

The way ZFS is working right now is that it cares about disks (checksumming), redundancy (raid*) and performance. Having versioning would let ZFS care about people's mistakes. And people make mistakes. Yes, Matt, you are right that snapshots are a feature which might be used here, but it is not the most convenient in such scenarios. Snapshots are probably much more useful than versioning in "predictable" scenarios: backup at night, software development (commit new version) etc. In a highly unpredictable environment (many users working at _different_ hours in different parts of the world) you would have to create many thousands of snapshots. Dealing with them might be painful.

Matt, I agree with you that having snapshots *solves* the problem with 400 filesystems, because in an SVM/UFS environment I _wouldn't_ have such a solution. But I feel that versioning would be much more convenient here. Imagine that you are the admin of the server and ZFS has versioning: having a choice, what would you choose in this case?

przemol
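The snapshot-count arithmetic in the message above is easy to reproduce. The sketch below only restates przemol's own figures (400 filesystems, one snapshot per hour, a two-day weekend); the 30-day line is an extra illustration added here, not a number from the thread, and the function name is made up.

```python
def snapshot_count(filesystems, snapshots_per_day, days):
    """Snapshots accumulated when every filesystem is snapshotted on a fixed schedule."""
    return filesystems * snapshots_per_day * days


# przemol's weekend example: 400 filesystems, hourly snapshots, 2 days.
weekend = snapshot_count(400, 24, 2)    # 19200

# Same schedule kept for 30 days without pruning (illustrative assumption).
month = snapshot_count(400, 24, 30)     # 288000

if __name__ == "__main__":
    print(weekend, month)
```

The count grows linearly in each factor, so the only ways to shrink it are fewer filesystems, a coarser schedule, pruning, or skipping snapshots when nothing has changed.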
przemolicc at poczta.fm wrote:
>> I completely disagree. In this scenario (and almost all others), use of
>> regular snapshots will solve the problem. 'zfs snapshot -r' is
>> extremely fast, and I'm working on some new features that will make
>> using snapshots for this even easier and better-performing.
>>
>> If you disagree, please tell us *why* you think snapshots don't solve
>> the problem.
>
> think of night when some (maybe 5 %) people still work. Having snapshot
> I would still have to create snapshots for 400 filesystems each hour because I
> don't know which of them are working. And what about weekend ? Still
> 400 snapshots each hour ? And 'zfs list' will list me 400*24*2=19200 lines ?
> And how about organizations which has thousands people and keep their
> files on one server ? Or ISP/free e-mail account providers who have millions ?

Yes, this is unfortunate. I have some forthcoming changes that will allow you to take periodic snapshots, but only when changes are occurring. This will greatly decrease the "snapshot explosion" you point out.

> Matt, I agree with you that having snapshots *solve* the problem with
> 400 filesystems because in SVM/UFS environment I _wouldn't_ have such
> solution. But I feel that versioning would be much more convenient here.
> Imagine that you are the admin of the server and ZFS has versioning: having a choice
> what would you choose in this case ?

I think that my preference would depend a lot on the details of the file versioning. Certainly, if there is some implementation of file versioning which allows me to find the old data I'm looking for more easily than snapshots, and it does not impose an undue performance burden[*] on the system, then I would choose that. However, as we're discovering on this thread, finding such a scheme is nontrivial :-)

--matt

[*] This includes disk space usage. For example, how would the space used by file versions be accounted/expressed? How would file versioning interact with snapshots? Including old file versions in snapshots might be contrary to the user's expectations, and wasteful of space. But any other behavior may be prohibitively complicated to implement.
On Fri, Oct 06, 2006 at 02:08:34PM -0700, Erik Trimble wrote:
> Also, "save-early-save-often" results in a version explosion, as does > auto-save in the app. While this may indeed mean that you have all of > your changes around, figuring out which version has them can be > massively time-consuming. Let's say you have auto-save set for 5 > minutes (very common in MS Word). That gives you 12 versions per hour. > If you suddenly decide you want to back up a couple of hours, that > leaves you with looking at a whole bunch of files, trying to figure out > which one you want. E.g. I want a file from about 3 hours ago. Do I > want the one from 2:45, 2:50, 2:55, 3:00, 3:05, 3:10, or 3:15 hours > ago? And, what if I've mis-remembered, and it really was closer to 4 > hours ago? Yes, the data is eventually there. However, wouldn't a > 1-hour snapshot capability have saved you an enormous amount of time, by > being able to simplify your search (and, yes, you won't have _exactly_ > the version you want, but odds are you will have something close, and > you can put all the time you would have spent searching the FV tree into > restarting work from the snapshot-ed version).
Erik, versioning could be governed by a versioning policy that users manage. E.g. if a file that is about to be saved (auto-saving) already has a previous version saved within the last 30 minutes, don't create another "previous" version:
10:00 open file f.xls
10:10 (...working...)
10:20 file.xls;1 (...auto save ...)
10:30 (...working...)
10:40 (...auto save ...) - don't create another version, because a previous version already exists within the last 30 minutes
Another policy might be based on the number of previous versions: e.g. if there are more than 10, purge the oldest.
> [...] > > > To me, FV is/was very useful in TOPS-20 and VMS, where you were looking > at a system DESIGNED with the idea in mind, already have a user base > trained to use and expect it, and virtually all usage was local (i.e. no > network filesharing). None of this is true in the UNIX/POSIX world.
Versioning could be turned off per filesystem. It could also be inherited from a parent, exactly like compression is today.
przemol
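The two policies proposed above (skip a new version if one was saved within the last 30 minutes, and keep at most 10 versions) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `record_save`, `simulate`, and the two constants are hypothetical names, the timestamps stand in for real file versions, and nothing here reflects an actual ZFS interface.

```python
from datetime import datetime, timedelta

# Assumed policy knobs, taken from the two examples in the message above.
MIN_INTERVAL = timedelta(minutes=30)  # skip a new version if one is this recent
MAX_VERSIONS = 10                     # purge the oldest versions beyond this count


def record_save(versions, now, min_interval=MIN_INTERVAL, max_versions=MAX_VERSIONS):
    """Return the retained version timestamps after a save at time `now`."""
    if versions and now - versions[-1] < min_interval:
        # A recent previous version already exists: do not create another.
        return versions
    # Keep the new version and drop the oldest ones past the cap.
    return (versions + [now])[-max_versions:]


def simulate(hours, autosave_minutes=5, **policy):
    """Replay periodic auto-saves and return the versions that survive."""
    start = datetime(2006, 10, 6, 10, 0)
    versions = []
    for i in range(1, hours * 60 // autosave_minutes + 1):
        now = start + timedelta(minutes=i * autosave_minutes)
        versions = record_save(versions, now, **policy)
    return versions


if __name__ == "__main__":
    # No policy at all: every 5-minute auto-save becomes a version.
    print(len(simulate(3, min_interval=timedelta(0), max_versions=1000)))  # 36
    # The 30-minute / 10-version policy from the message.
    print(len(simulate(3)))  # 6
```

With a 5-minute auto-save, three hours of editing yields 36 versions without a policy (the 12 per hour from the quoted message) and 6 with the 30-minute rule, which is the reduction the proposal is after.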
Erik Trimble <Erik.Trimble at Sun.COM> wrote:
> > The only idea I get that matches this criteria is to have the versions > > in the extended attribute name space. > > > > Jörg > > > > > Realistically speaking, that's my conclusion, if we want a nice clean, > well-designed solution. You need to hide the versioning info in the > meta-tags, and create a whole new API for accessing/manipulating them. > This easily solves (1) and (2) above, but (3) is the huge problem, as > having a new API means you need to change the SMB/NFS protocols to allow > for client machines to access the new API. With the new Windows NTFS
There is no need to extend NFS, as NFSv4 already supports extended attributes.
Jörg -- EMail:joerg at schily.isdn.cs.tu-berlin.de (home) Jörg Schilling D-13353 Berlin js at cs.tu-berlin.de (uni) schilling at fokus.fraunhofer.de (work) Blog: http://schily.blogspot.com/ URL: http://cdrecord.berlios.de/old/private/ ftp://ftp.berlios.de/pub/schily
Nicolas Williams <Nicolas.Williams at Sun.COM> wrote:
> On Sat, Oct 07, 2006 at 01:43:29PM +0200, Joerg Schilling wrote: > > The only idea I get that matches this criteria is to have the versions > > in the extended attribute name space. > > Indeed. All that's needed then, CLI UI-wise, beyond what we have now is > a way to rename versions extended attributes to new files, or at least > copy them (we have the latter). And it nicely hides versions. And it > nicely provides an API for creating them on demand ("magic" extended > attributes), and remote access. >
The infrastructure is there, local or remote via NFSv4. The problem is that the extended attribute name space lacks definitions for usage.
Jörg
Nicolas Williams <Nicolas.Williams at Sun.COM> wrote:
> You're arguing for treating FV as extended/named attributes :) > > I think that'd be the right thing to do, since we have tools that are > aware of those already. Of course, we're talking about somewhat magical > attributes, but I think that's fine (though, IIRC, NFSv4 [RFC3530] has > some strange verbiage limiting attributes to "applications").
I thought NFSv4 supports extended attributes. What "limiting" are you aware of?
Jörg
Joseph Mocker wrote:
> However would it be great if I could somehow easily FV a file I am > working on with some arbitrary (closed) application I am forced to use > without the application really knowing about it, and with little or no > actions I have to take to do so? >
To paraphrase an old wives' tale: "That ain't gonna happen in my lifetime." I think that this discussion thread has determined that you DON'T want to make file versioning visible to un-modified applications. That said, given that a new FV API looks like the currently favored implementation approach, I see no reason why you can't open your favorite app, edit FOO, and have ZFS do versioning on it behind the scenes. You just won't be able to see the versions from inside your app. You'd need an FV-aware app (whether a GUI file browser or a cmdline util) to access the old versions, and potentially copy an older version to a new filename, allowing you to edit it in your non-FV-aware app.
-Erik
On Mon, Oct 09, 2006 at 12:44:34PM +0200, Joerg Schilling wrote:
> Nicolas Williams <Nicolas.Williams at Sun.COM> wrote: > > > You're arguing for treating FV as extended/named attributes :) > > > > I think that'd be the right thing to do, since we have tools that are > > aware of those already. Of course, we're talking about somewhat magical > > attributes, but I think that's fine (though, IIRC, NFSv4 [RFC3530] has > > some strange verbiage limiting attributes to "applications"). > > I thought NFSv4 supports extended attributes. What "limiting" are you > aware of?
It does. I meant this on pg. 12:

[...] Named attributes are meant to be used by client applications as a method to associate application specific data with a regular file or directory.

and this on pg. 36:

Named attributes are intended for data needed by applications rather than by an NFS client implementation. NFS implementors are strongly encouraged to define their new attributes as recommended attributes by bringing them to the IETF standards-track process.

and this on pg. 232:

17.1. Named Attribute Definition

The NFS version 4 protocol provides for the association of named attributes to files. The name space identifiers for these attributes are defined as string names. The protocol does not define the specific assignment of the name space for these file attributes. Even though the name space is not specifically controlled to prevent collisions, an IANA registry has been created for the registration of NFS version 4 named attributes. Registration will be achieved through the publication of an Informational RFC and will require not only the name of the attribute but the syntax and semantics of the named attribute contents; the intent is to promote interoperability where common interests exist. While application developers are allowed to define and use attributes as needed, they are encouraged to register the attributes with IANA.

Nico --
On Oct 8, 2006, at 23:54, Nicolas Williams wrote:
> On Sun, Oct 08, 2006 at 11:16:21PM -0400, Jonathan Edwards wrote: >> On Oct 8, 2006, at 22:46, Nicolas Williams wrote: >>> You're arguing for treating FV as extended/named attributes :) >> >> kind of - but one of the problems with EAs is the increase/bloat in >> the inode/dnode structures and corresponding incompatibilities with >> other applications or tools. > > This in a thread where folks [understandably] claim that storage is > cheap and abundant. And I agree that it is. > > Plus, I think you may be jumping to conclusions about the bloat of > extended attributes: > >> Another approach might be to put it all >> into the block storage rather than trying to stuff it into the >> metadata on top. If we look at the zfs on-disk structure instead and >> simply extend the existing block pointer mappings to handle the diffs >> along with a header block to handle the version numbers - this might >> be an easier way out rather than trying to redefine or extend the >> dnode structure. Of course you'd still need a single attribute to >> flag reading the version block header and corresponding diff blocks, >> but this could go anywhere - even a magic acl perhaps .. i would >> argue that the overall goal should be aimed toward the reduction of >> complexity in the metadata nodes rather than attempting to extend >> them and increase the seek/parse time. > > Wait a minute -- the extended attribute idea is about *interfaces*, > not > internal implementation. I certainly did not argue that a file > version > should be copied into an EA.
true, but I find that the EA discussion is just as loaded as the FV discussion, in that it too often focuses on improvements in the metadata space rather than the block data space. I'm not talking about the file version data .. rather the bplist for the file version data, and possibly having this live in the block data space instead of the dnode DMU. 
This way the FV will be completely accessible within the filesystem block data structure instead of being abstracted back out of the dnode DMU. I would hold that the version data space consumption should also be readily apparent on the filesystem level and that versioned access should not impede the regular file lookup or attribute caching. It's a slight deviation from the typical EA approach, but an important distinction to make to keep the metadata structures relatively lean.
> Let's keep interface and implementation details separate. Most of > this > thread has been about interfaces precisely because that's what users > will interact with; users won't care one bit about how it's all > implemented under the hood.
I'm not so sure you can separate the two without creating a hack. I would also argue that users (particularly the ones creating the interfaces) will care about the implementation details since those are the real underlying issues they'll be wrestling with.
.je
On Mon, Oct 09, 2006 at 11:16:41AM -0400, Jonathan Edwards wrote:
> On Oct 8, 2006, at 23:54, Nicolas Williams wrote: > >Let's keep interface and implementation details separate. Most of > >this > >thread has been about interfaces precisely because that's what users > >will interact with; users won't care one bit about how it's all > >implemented under the hood. > > I'm not so sure you can separate the two without creating a hack. I > would also argue that users (particularly the ones creating the > interfaces) will care about the implementation details since those > are the real underlying issues they'll be wrestling with.
I'm sure that we can. And I'm sure that most users won't care one bit how FV is implemented.
Nico --
On 10/6/06, Erik Trimble <Erik.Trimble at sun.com> wrote:
> David Dyer-Bennet wrote: > > On 10/6/06, Nicolas Williams <Nicolas.Williams at sun.com> wrote: > > > >> > >Maybe Erik would find it confusing. I know I would find it > >> > >_annoying_. > >> > > >> > Then leave it set to 1 version > >> > >> Per-directory? Per-filesystem? > > > > Whatever. What's the actual issue here? > > > > I don't recall that on TOPS-20 it was possible to not version. What > > you could do is set your logout.cmd file to purge your space down to > > one copy when you logged out. > But see, that assumes you have a logout-type functionality to use. Which > indeed is possible for command-line usage, but then only in a very > limited way. During a typical session, I access almost 20 NFS-mounted > directories. And anyone using autofs/automount trees gets even more. > You're saying that my logout script has to know about all of them to > keep things clean? That's unrealistic. And that still doesn't solve > the problem of people who use SAMBA or NFS from machines which don't > have an interactive shell logout system (i.e. Windows).
Seems entirely realistic to me that your logout script would know about the things you routinely use. People who don't log into any system are more of a problem, though. Various things come to mind, like having a default number of files (so it doesn't expand without limits), and maybe a regular cron job; but I've never worked in an environment doing versioning for non-login users over the network, so they're all theory, and I have no idea how they'd work in practice.
> > > This worked fine for the users I knew; even on a system that didn't > > have as much as a gigabyte of disk storage total to support a few > > dozen software engineers. > > > The problem is we are comparing apples to oranges in user bases here. > TOPS-20 systems had a couple of dozen users (or, at most, a few > hundred). VMS only slightly more. UNIX/POSIX systems have 10s of > thousands. 
Plus, the number of files being created under typical modern > systems is at least two (and probably three or four) orders of magnitude > greater. I've got 100,000 files under /usr in Solaris, and almost 1,000 > under my home directory. And I don't have anything significant in my > /home (no source code, no build/test trees, just misc business stuff). > What is manageable with a few files quickly becomes unwieldy with more > than a few dozen.
I have to ask again -- is this theory? Or have you actually worked on a versioning filesystem? And specifically on TOPS-20? (I remember, vaguely, that people found VMS versioning MUCH less comfortable to work with than TOPS-20, and I don't know at this distance if that was just because it was different, or because of subtle UI differences.) I don't think the number of files under /usr is relevant; how often do you edit them by hand? I'd expect an installation procedure to clean up old versions when it was done installing new software; but if not, a simple purge would settle the matter. I don't recall my directories having many fewer files then than now. I have more *directories* now, but the number of files in a directory is set by human issues and by development process issues, not by disk space available.
> This is what Nico and I are talking about: if you turn on file > versioning automatically (even for just a directory, and not a whole > filesystem), the number of files being created explodes geometrically.
I don't see it; new versions are created *when you do something* to a file, not from the file just sitting there. And the number of files I poke in a day, again, isn't controlled much by the disk space available; it's controlled by *my time*, and so has stayed more constant over the years.
> >> > The above should be simple to do however -- a program does an open of > >> > a file name "foo.bar". ZFS / the file system routine would use the > >> > most recent version by default if no version info is given. 
> >> > >> How can version information be given without changing the APIs or > >> putting the version number/string into the file name? > > > > The version number is part of the file name in all the examples I know > > about. I'd find it useless without that; it has to be a real part of > > the filesystem, usable by everybody, not a special addon accessible > > only with one or two dedicated applications. > > > >> Putting the version number/string into the file name is hard for me to > >> accept. It's what would lead to polluting my directories. > > > > Set your ls default to not show versions. Isn't the problem then > > solved? Maybe add that option to the GUI filesystem explorer as well. > > > But this requires modifying all the relevant apps, which is the same > amount of work as modifying them to use a new FV API. It's not > transparent to the end-user.
I think the relevant apps are very different in the two cases. File listing tools are much rarer than file using tools, and in my case you only need to modify the file listing tools. In your case, you have to modify every single file using tool.
> > In practice, it never was a problem that I noticed, or that other > > people noticed. And remember that this was on slower systems with > > smaller screens and often rather slower screen update. > > > > Do you not like the idea based on theory, or did you actually use > > TOPS-20 for a while and find the versioning troublesome? > > > Putting the file version number as part of the file name breaks things. > Apps unaware of the special significance of this format will tend to > write similar names, which can screw everything royally. > > Example: > > Say we use <file>;<version> > > In emacs, I edit FOO:2 > > it will write out a temp file "FOO:2~". So, how does the FS deal with > this the next time they need to create a new version?
Whatever. None of the choices are a disaster. None of them "break" anything. 
I essentially never have to look at these, any version of them, so it doesn't matter very much what their names are. Possibly some clever definitions of how things are handled could make the results cleaner, and that's worth looking at, but the worst results I can imagine from this scenario are unimportant; they don't hurt anything.
> The problem lies in that under VMS, the ';' was a special character, and > unusable in normal naming. I suspect a similar situation exists under > TOPS-20. No such luck in a POSIX filesystem - all printable (and many > unprintable) characters are valid for use in filenames. So you _CAN'T_ > use them to delineate File Versioning, without risking blowing the > entire scheme when some random app decides to either use your FV marker > for its own needs, or something similar to the emacs case above.
This is theory again. In practice, there aren't such schemes in use anywhere I can find. If there are, yes, some file-versioning schemes would break them, and those apps would have to be updated. A theoretically clean approach is desirable, but an approach that actually works is more important. An approach that requires programs to be updated before they can use file versioning doesn't, by my standards, "work"; I wouldn't be able to use it any time soon with the files and applications where it's valuable to me. When you talk about a new API for versioning -- how do you envision information being conveyed from the command lines of programs to this new API? Isn't it likely that it would end up becoming a part of file name syntax, and changing the rules about allowable characters in filenames? And in that case, you can make the whole change in the "open" and "link" calls, and get the same end effect.
> >> > one UI is the command line shell > >> > >> Indeed! And command-line tools, like ls(1), find(1), etc... 
> >> > >> What I'm saying is that I'd like to be able to keep multiple versions of > >> my files without "echo *" or "ls" showing them to me by default. > > > > And I find that completely unacceptable; useless. The whole point of > > putting versioning in the filesystem is that that makes it accessible > > to all programs. > > > But, because of the explosion in the number of files, you CAN'T > automatically show all versions. Users will NEVER accept this. The only > clean way to do this is to show file versions only upon request. Not by > default.
Is this theory, or do you have some experience to support it? You say "can't"; I'm not at all worried about it, myself. I've worked in these environments, and liked it very much. I've watched new people get introduced to them. People like this when they see it well-implemented. I don't accept your assertion that directories people edit files in have more files in them today than they used to, in general. I also don't accept the assertion that the number of extra versions scales with the number of files in the directory -- it scales with the number of files you re-write in the directory, which is limited more by human working speed and time in the day, not by number of files there.
> >> > >What if an application deals in multiple files? > >> > > >> > so? > >> > >> So, file versions aren't useful unless the application explicitly > >> decides tells the OS when to make them. > > > > File versions are created when a file is created. In the scenario > > where, today, an existing file would be overwritten (deleted), instead > > the old file is kept and the new file is given the version number +1 > > of the old file. > > > >> Similarly with applications that keep files open but keep writing > >> transactions in ways that the OS can't isolate without input from the > >> app. E.g., databases. fsync(2) helps here, but lots and lots of > >> fsync(2)s would result in no useful versioning. 
> > > > None of those are candidates for file versioning, and a darned good > > thing, too. > > Honestly, as far as file versioning goes, the time to make a new version > is when calling open() with the appropriate arguments to allow for > append or modification. You obviously don't want to create a new version > if you are only opening a file for read-only access, and changing > version on fsync() is ludicrous, and on close() doesn't differentiate > between a file which has been modified or not.
Yes, versioning is a file-create feature.
> Given this, we're back into the problem FV is supposed to solve. It is > entirely possible for an editor to keep open a file for a long time, > periodically writing out your changes without issuing a new open().
You describe this as a problem, but *I* see it as the exact thing that makes file versioning useful. It DOESN'T save random magically chosen moments; it saves exactly the versions that *you*, the user, saved at some point in the editing session.
> Word with auto-save turned off is a prime example. Given this, you've > only created a new version when you first load the document, and all > your intermediary changes are lost, since it only saves the document on > close().
You're forgetting that the user, unless he's stupid, will save regularly during the editing session.
> Thus, in order to get benefits from FV, your editor must > issue periodic close() and open() commands on the same file, as you > edit, all without your intervention. Exactly how many editors do this? > I have no idea. So, the only way to enable FV is to require the user to > periodically push the "Save" button. Which is how much more different > than the current situation?
It is completely and utterly different from the current situation. In the current situation, when I type the "save" command *I am deleting a previous version*. 
That's dangerous, because people don't think of it as performing a destructive operation, and hence don't give it the care and consideration they give to an explicit "rm". And that's precisely what file versioning fixes; saving a file is no longer a destructive operation.
-- David Dyer-Bennet
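The disagreement above about putting the version in the file name can be made concrete with a small sketch. It assumes a hypothetical `name;version` convention (the `<file>;<version>` form mentioned in the quoted text) layered on POSIX names; `split_version` is an illustrative helper, not part of any existing filesystem or tool.

```python
def split_version(name, marker=";"):
    """Split 'base;N' into (base, N); names without a numeric suffix are unversioned.

    Hypothetical convention for illustration only: the text after the last
    marker must consist solely of ASCII digits to count as a version number.
    """
    base, sep, suffix = name.rpartition(marker)
    if sep and base and suffix.isascii() and suffix.isdigit():
        return base, int(suffix)
    return name, None


if __name__ == "__main__":
    for candidate in ["FOO", "FOO;2", "FOO;2~", "report;final", "a;b;10"]:
        print(candidate, "->", split_version(candidate))
```

Under this convention an editor backup such as `FOO;2~` is simply an unversioned file with an odd name, which is close to David's point that nothing breaks. Erik's concern also shows up: since POSIX permits `;` in names, a file that some application deliberately called `notes;7` would be indistinguishable from version 7 of `notes`.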
On 10/7/06, Erik Trimble <Erik.Trimble at sun.com> wrote:
> Chad Leigh -- Shire.Net LLC wrote: > >>> Plus, the number of files being created under typical > >>> modern systems is at least two (and probably three or four) orders > >>> of magnitude greater. I've got 100,000 files under /usr in Solaris, > >>> and almost 1,000 under my home directory. > >> > >> wimp :-) I count 88,148 in my main home directory. I'll bet just > >> running gnome and firefox will get you in the ballpark of 1,000 :-/ > > > > None (well, maybe 1 or 2) of which you edit and hence would not > > generate versions. > > > > Chad > > Richard actually brings up a good point, which answers another question > Chad had for me: exactly how many files do I edit? Which directly > impacts the "directory pollution" problem I've been talking about. > > There are essentially three scenarios: > > (a) FV is turned on on a per-file basis > > (b) FV is turned on on a per-directory basis > > (c) FV is turned on on a per-filesystem basis > > > Now, I think we can all see that you get geometric file explosion in case > (c), as absolutely anything that writes to the filesystem gets > versioned. Things like Web Browser caches alone would kill you.
Web browser caches (as normally used) would *never* generate a single additional file version. The web browsers use a naming algorithm to prevent overwriting the same file, and overwriting is the situation in which a new version is created. They delete the files they decide they don't need directly, rather than by overwriting the same name. Your use of "writes to the filesystem" suggests to me you're thinking of a different implementation of versioning than was in TOPS-20 and VMS, and that (I think) most of us are discussing here. The kind of versioning I'm talking about works by keeping old versions of a file *when it's overwritten by a new version*. 
It's the operation of creating a new file with the same name as an old file that triggers it; in current Unix semantics the old file is deleted, but in the kind of FV I'm talking about, the old version is *kept* and the new version is given an incremented version number to keep the names unique. It has nothing to do with writing to files; if you update a file in place, a new version isn't generated.
-- David Dyer-Bennet
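The create-time semantics described above can be modelled with a toy in-memory directory. This is a sketch of the behaviour as stated in the message (a new version on create-over-existing, none on in-place update, explicit purge), not of any real TOPS-20, VMS, or ZFS implementation; the `VersionedDir` class and its method names are hypothetical.

```python
class VersionedDir:
    """Toy in-memory directory with create-time versioning."""

    def __init__(self):
        self._files = {}  # name -> {version number: bytes}

    def create(self, name, data=b""):
        """Create `name`; an existing file is kept and the new one gets version + 1."""
        versions = self._files.setdefault(name, {})
        new_version = max(versions, default=0) + 1
        versions[new_version] = bytes(data)
        return new_version

    def write_in_place(self, name, data):
        """Update the latest version in place; no new version is generated."""
        versions = self._files[name]
        versions[max(versions)] = bytes(data)

    def read(self, name, version=None):
        """Read the given version, or the most recent one by default."""
        versions = self._files[name]
        return versions[max(versions) if version is None else version]

    def versions(self, name):
        return sorted(self._files[name])

    def purge(self, name, keep=1):
        """Delete all but the `keep` highest-numbered versions."""
        versions = self._files[name]
        for old in sorted(versions)[:-keep]:
            del versions[old]


if __name__ == "__main__":
    d = VersionedDir()
    d.create("foo.bar", b"first draft")
    d.create("foo.bar", b"second draft")       # "save" no longer destroys the old copy
    d.write_in_place("foo.bar", b"second draft, amended")
    print(d.versions("foo.bar"))               # [1, 2]
    print(d.read("foo.bar", version=1))        # b'first draft'
    d.purge("foo.bar")
    print(d.versions("foo.bar"))               # [2]
```

The model also shows why a browser cache that never reuses a name would not accumulate versions: every `create` with a fresh name starts at version 1 and stays there.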
Why not see if you can find (or write, or have written) an editor that does the version name changes for you? I.e., each time you save, or on each auto-save, it writes a different version of the file, and when you exit, it asks whether you'd like to retain the other versions. That sounds like it would be a LOT simpler to do, and with snapshots for everything else, I don't see a need for a version-name-changing filesystem.
On Fri, 2006-10-06 at 00:07 -0700, Richard L. Hamilton wrote:
> Some people are making money on the concept, so I > suppose there are those who perceive benefits: > > http://en.wikipedia.org/wiki/Rational_ClearCase > > (I dimly remember DSEE on the Apollos; ...)
I used both fairly extensively. Much of the Apollo DSEE team left HP to write ClearCase. Neither is a versioning filesystem; instead, both are software configuration management systems which export a limited virtual filesystem interface. With such systems, versioning is not transparent but instead involves interaction with a CLI or GUI around checkout/checkin.
Nicolas Williams <Nicolas.Williams at Sun.COM> wrote:
> On Mon, Oct 09, 2006 at 12:44:34PM +0200, Joerg Schilling wrote: > > Nicolas Williams <Nicolas.Williams at Sun.COM> wrote: > > > > > You're arguing for treating FV as extended/named attributes :) > > > > > > I think that'd be the right thing to do, since we have tools that are > > > aware of those already. Of course, we're talking about somewhat magical > > > attributes, but I think that's fine (though, IIRC, NFSv4 [RFC3530] has > > > some strange verbiage limiting attributes to "applications"). > > > > I thought NFSv4 supports extended attributes. What "limiting" are you > > aware of? > > It does. I meant this on pg. 12: > > [...] Named attributes > are meant to be used by client applications as a method to associate > application specific data with a regular file or directory.
FreeBSD and Linux implement something different that is also called extended attributes. There should be a way to map from FreeBSD/Linux to Solaris.
> and this on pg. 36: > > Named attributes are intended for data needed by applications rather > than by an NFS client implementation. NFS implementors are strongly > encouraged to define their new attributes as recommended attributes > by bringing them to the IETF standards-track process.
See above... Ever since extended attributes appeared on Solaris (8 update???), I have been looking for a way to map simple extended attribute implementations such as those on Mac OS, FreeBSD and Linux to the more general implementation on Solaris. Before we start defining the first official functionality for this Sun feature, we should define a mapping for Mac OS, FreeBSD and Linux. It may make sense to define a sub-directory of the attribute directory for keeping old versions of a file. 
Jörg
On Wed, Oct 11, 2006 at 08:24:13PM +0200, Joerg Schilling wrote:
> Before we start defining the first official functionality for this Sun feature, > we should define a mapping for Mac OS, FreeBSD and Linux. It may make sense, to > define a sub directory for the attribute directory for keeping old versions > of a file.
A sub-directory would definitely be needed, yes, but I don't agree with the first part.
Nicolas Williams <Nicolas.Williams at Sun.COM> wrote:
> On Wed, Oct 11, 2006 at 08:24:13PM +0200, Joerg Schilling wrote: > > Before we start defining the first official functionality for this Sun feature, > > we should define a mapping for Mac OS, FreeBSD and Linux. It may make sense, to > > define a sub directory for the attribute directory for keeping old versions > > of a file. > > Definitely a sub-directory would be needed yes, and I don't agree to the > first part.
Why not?
Jörg
On Fri, Oct 13, 2006 at 11:03:51AM +0200, Joerg Schilling wrote:
> Nicolas Williams <Nicolas.Williams at Sun.COM> wrote: > > > On Wed, Oct 11, 2006 at 08:24:13PM +0200, Joerg Schilling wrote: > > > Before we start defining the first official functionality for this Sun feature, > > > we should define a mapping for Mac OS, FreeBSD and Linux. It may make sense, to > > > define a sub directory for the attribute directory for keeping old versions > > > of a file. > > > > Definitely a sub-directory would be needed yes, and I don't agree to the > > first part. > > Why not?
Because I don't see how creating a sub-directory of the EA namespace for storing FVs will step on the toes of anyone trying to map other platforms' notions of EA onto Solaris'. Is this being too optimistic?
Nico --